From 26248cc3112897d420ced1413d0bbf015c8c9be9 Mon Sep 17 00:00:00 2001
From: Andrew Lukoshko
Date: Mon, 29 Jun 2026 11:18:51 +0000
Subject: [PATCH] Recreate RHEL 5.14.0-687.19.1 from CS9/upstream backports

---
 ...ctly-check-for-maximum-number-of-act.patch | 508 ++++ ...kip-unrelated-mode-changes-in-dsc-va.patch | 112 + ...kb2-cb-in-ip6-err-gen-icmpv6-unreach.patch | 63 + ...untime-uaf-during-format-change-stop.patch | 137 + ...list-corruption-by-removing-work-lis.patch | 208 ++ ...d-instead-of-task-for-selinux-checks.patch | 332 +++ ...octou-race-when-granting-write-lease.patch | 115 + ...er-for-opening-kernel-internal-files.patch | 122 + ...zalloc-into-alloc-empty-file-helpers.patch | 124 + ...ainer-for-internal-files-with-fake-f.patch | 253 ++ ...tify-events-on-underlying-real-files.patch | 75 + ...anup-from-init-file-into-its-callers.patch | 82 + ...rameter-in-security-binder-transfer-.patch | 119 + ...es-use-kiocb-start-end-write-helpers.patch | 73 + SOURCES/1327-fs-fix-kernel-doc-warnings.patch | 188 ++ ...s-rename-mnt-want-drop-write-helpers.patch | 300 ++ ...-for-an-open-backing-file-s-real-pat.patch | 109 + ...er-path-for-user-displayed-mapped-fi.patch | 153 + ...ad-of-fake-path-in-backing-file-f-pa.patch | 240 ++ ...ble-filesystems-backing-file-helpers.patch | 243 ++ ...backing-file-read-write-iter-helpers.patch | 600 ++++ ...cking-file-splice-read-write-helpers.patch | 189 ++ ...-factor-out-backing-file-mmap-helper.patch | 124 + ...-lsm-add-helper-for-blob-allocations.patch | 158 + ...37-ovl-fix-nested-backing-file-paths.patch | 74 + ...ptr-in-backing-file-accessor-helpers.patch | 105 + ...remove-unneeded-non-const-conversion.patch | 38 + ...undant-iocb-dio-caller-comp-clearing.patch | 48 + ...-event-path-names-with-backing-files.patch | 89 + ...-for-adding-lsm-blob-to-backing-file.patch | 85 + .../1343-lsm-add-backing-file-lsm-hooks.patch | 535 ++++ ...ayfs-mmap-and-mprotect-access-checks.patch | 444 +++ ...l-only-hotfix-for-execmem-regression.patch |
130 + ...s-fix-matcher-action-template-attach.patch | 323 +++ ...mlx5-hws-remove-unused-element-array.patch | 178 ++ ...t-mlx5-hws-make-pool-single-resource.patch | 700 +++++ ...lx5-hws-refactor-pool-implementation.patch | 760 +++++ ...5-hws-cleanup-after-pool-refactoring.patch | 265 ++ ...x5-hws-add-fullness-tracking-to-pool.patch | 108 + ...-mlx5-hws-fix-pool-size-optimization.patch | 53 + ...t-mlx5-hws-implement-action-ste-pool.patch | 585 ++++ ...mlx5-hws-use-the-new-action-ste-pool.patch | 190 ++ ...hws-cleanup-matcher-action-ste-table.patch | 875 ++++++ ...x5-hws-free-unused-action-ste-tables.patch | 254 ++ ...-export-action-ste-tables-to-debugfs.patch | 99 + ...rmatting-of-ptp-rq0-csum-complete-ta.patch | 100 + ...stakes-in-mlx5-core-dbg-message-and-.patch | 58 + ...net-mlx5-hws-fix-ip-version-decision.patch | 136 + ...hws-harden-ip-version-definer-checks.patch | 127 + ...s-disallow-matcher-ip-version-mixing.patch | 256 ++ ...-upon-firmware-failure-for-rq-destru.patch | 142 + ...t-mlx5-support-software-tx-timestamp.patch | 78 + ...tion-mlx5hws-table-ft-set-next-ft-in.patch | 77 + ...finer-function-to-get-field-name-str.patch | 263 ++ ...pose-polling-function-in-header-file.patch | 120 + ...mlx5-hws-introduce-isolated-matchers.patch | 414 +++ ...et-mlx5-hws-support-complex-matchers.patch | 1740 +++++++++++ ...ce-rehash-when-rule-insertion-failed.patch | 93 + ...fix-counting-of-rules-in-the-matcher.patch | 99 + ...undant-extension-of-action-templates.patch | 171 ++ ...1373-net-mlx5-hws-rework-rehash-loop.patch | 209 ++ ...mlx5-hws-dump-bad-completion-details.patch | 191 ++ .../1375-net-mlx5-use-to-delayed-work.patch | 40 + ...5-sws-fix-reformat-id-error-handling.patch | 196 ++ ...ws-register-reformat-actions-with-fw.patch | 246 ++ ...78-net-mlx5-hws-fix-typo-nope-to-nop.patch | 220 ++ ...dle-modify-header-actions-dependency.patch | 228 ++ ...handling-inmlx5-query-nic-vport-qkey.patch | 64 + ...-setting-mac-address-of-representors.patch | 41 + 
...ing-in-mlx5-query-nic-vport-node-gui.patch | 63 + ...r-code-in-mlx5hws-bwc-rule-create-co.patch | 43 + ...es-are-always-allocated-on-same-numa.patch | 45 + ...e-when-searching-for-existing-flow-g.patch | 64 + ...5-hws-init-mutex-on-the-correct-path.patch | 51 + ...ssing-ip-version-handling-in-definer.patch | 40 + ...e-the-uplink-is-the-last-destination.patch | 98 + ...fix-leak-of-geneve-tlv-option-object.patch | 81 + ...hecking-to-hws-bwc-rule-complex-hash.patch | 78 + ...race-between-dim-disable-and-net-dim.patch | 101 + ...5e-add-new-prio-for-promiscuous-mode.patch | 116 + ...rectly-set-gso-size-when-lro-is-used.patch | 86 + ...net-mlx5-fix-memory-leak-in-cmd-exec.patch | 49 + ...-peer-miss-rules-to-use-peer-eswitch.patch | 247 ++ ...-convert-timeouts-to-secs-to-jiffies.patch | 70 + ...he-redundant-mlx5-ib-stage-uar-stage.patch | 103 + ...-support-for-200gbps-per-lane-speeds.patch | 60 + ...ma-mlx5-avoid-flexible-array-warning.patch | 109 + ...-event-obj-sub-list-before-xa-insert.patch | 103 + ...rs-query-for-non-representor-devices.patch | 48 + ...a-mlx5-fix-cc-counters-query-for-mpv.patch | 40 + ...x5-fix-vport-loopback-for-mpv-device.patch | 89 + ...xpose-serial-numbers-in-devlink-info.patch | 132 + ...ampo-reorganize-mlx5-rq-shampo-alloc.patch | 242 ++ ...mlx5e-shampo-remove-redundant-params.patch | 113 + ...o-improve-hw-gro-capability-checking.patch | 68 + ...x5e-shampo-separate-pool-for-headers.patch | 304 ++ ...eue-mgmt-ops-and-single-channel-swap.patch | 150 + ...port-ethtool-tcp-data-split-settings.patch | 141 + ...-prios-to-rdma-transport-steering-do.patch | 103 + ...ctor-for-general-object-capabilities.patch | 75 + ...its-for-pcie-congestion-event-object.patch | 90 + ...vice-with-net-namespace-supplied-fro.patch | 99 + ...ling-in-rq-memory-model-registration.patch | 57 + ...fix-rdma-transport-init-cleanup-flow.patch | 89 + ...k-device-memory-pointer-before-usage.patch | 75 + ...mentation-for-setting-tc-bw-on-rate-.patch | 95 + 
...d-support-for-setting-tc-bw-on-nodes.patch | 467 +++ ...ass-scheduling-support-for-vport-qos.patch | 687 +++++ ...er-nodes-and-implement-full-support-.patch | 507 ++++ ...e-unused-create-dest-array-parameter.patch | 115 + ...et-mlx5-hws-remove-incorrect-comment.patch | 41 + ...-net-mlx5-hws-export-rule-skip-logic.patch | 69 + ...et-mlx5-hws-refactor-rule-skip-logic.patch | 65 + ...ws-create-stes-directly-from-matcher.patch | 201 ++ ...hws-decouple-matcher-rx-and-tx-sizes.patch | 362 +++ ...hws-track-matcher-sizes-individually.patch | 472 +++ ...range-to-prevent-forward-declaration.patch | 293 ++ ...0-net-mlx5-hws-shrink-empty-matchers.patch | 127 + ...5-add-hws-as-secondary-steering-mode.patch | 43 + ...pelling-mistake-disabliing-disabling.patch | 38 + ...mlx5-migrate-to-the-rxfh-context-ops.patch | 352 +++ ...used-vlan-insertion-logic-in-tx-path.patch | 121 + ...act-a-memcmp-from-a-spinlock-section.patch | 57 + ...ive-vlan-push-handling-with-an-itera.patch | 89 + ...hen-write-combining-is-not-supported.patch | 44 + ...-rx-remove-unnecessary-rqt-redirects.patch | 67 + ...pability-bits-for-mkey-max-page-size.patch | 41 + ...-fix-umr-modifying-of-mkey-page-size.patch | 80 + ...ned-fr-counter-through-hca-capabilit.patch | 45 + ...lx5-ifc-updates-for-disabled-host-pf.patch | 41 + ...destroy-pcie-congestion-event-object.patch | 271 ++ ...device-pcie-congestion-ethtool-stats.patch | 360 +++ ...err-vs-null-bug-in-esw-qos-move-node.patch | 44 + ...psec-hardware-offload-in-legacy-mode.patch | 46 + ...-mlx5e-fix-kdoc-warning-on-eswitch-h.patch | 43 + ...s-rcu-protected-qdisc-sleeping-varia.patch | 50 + ...its-to-support-rss-for-ipsec-offload.patch | 140 + ...ifc-bits-and-enums-for-buf-ownership.patch | 60 + ...-cable-length-field-in-pfcc-register.patch | 89 + ...mpo-cleanup-reservation-size-formula.patch | 145 + ...e-mlx5e-shampo-get-log-hd-entry-size.patch | 83 + ...ve-duplicate-mkey-from-shampo-header.patch | 142 + ...ph-expose-pcie-tph-get-st-table-size.patch | 95 + 
...456-net-mlx5-expose-ifc-bits-for-tph.patch | 65 + ...-add-support-for-device-steering-tag.patch | 348 +++ ...ix-build-wframe-larger-than-warnings.patch | 220 ++ SOURCES/1459-net-fix-typos.patch | 42 + ...y-port-buffer-size-in-pbmc-before-up.patch | 53 + ...b-secpath-if-xfrm-state-is-not-found.patch | 113 + ...deadlock-by-deferring-rx-timeout-rec.patch | 157 + ...-networks-during-ipsec-macs-initiali.patch | 158 + ...tis-via-devlink-tx-reporter-diagnose.patch | 78 + ...rectly-set-gso-segs-when-lro-is-used.patch | 61 + ...hws-fix-bad-parameter-in-cq-creation.patch | 39 + ...s-fix-simple-rules-rehash-error-flow.patch | 146 + ...-fix-complex-rules-rehash-error-flow.patch | 126 + ...nt-rehash-from-filling-up-the-queues.patch | 60 + ...h-on-every-kind-of-insertion-failure.patch | 57 + ...-net-mlx5-hws-fix-table-creation-uid.patch | 179 ++ ...x5-ct-use-the-correct-counter-offset.patch | 48 + ...-base-ecvf-devlink-port-attrs-from-0.patch | 49 + ...qos-group-and-attach-vports-directly.patch | 299 ++ ...preserve-tc-bw-during-parent-changes.patch | 110 + ...os-element-when-no-configuration-rem.patch | 151 + ...ence-leak-in-vport-enable-error-path.patch | 44 + ...-scheduling-node-cleanup-on-vport-en.patch | 41 + ...-mlx5e-query-fw-for-buffer-ownership.patch | 142 + ...d-buffer-capacity-during-headroom-up.patch | 110 + ...leak-in-hws-pool-buddy-init-error-pa.patch | 43 + ...leak-in-hws-action-get-shared-stc-ni.patch | 45 + ...alized-variables-in-mlx5hws-pat-calc.patch | 57 + ...-destruction-in-mlx5hws-pat-get-patt.patch | 52 + ...oad-auxiliary-drivers-on-fw-activate.patch | 53 + ...assertion-on-sync-reset-unload-event.patch | 259 ++ ...nack-sync-reset-when-sfs-are-present.patch | 100 + ...eering-mode-changes-in-switchdev-mod.patch | 62 + ...mlx5e-set-local-xoff-after-fw-update.patch | 52 + ...-netdev-access-against-device-unbind.patch | 154 + ...-miss-level-for-ipsec-crypto-offload.patch | 103 + ...nore-flow-level-for-multi-dest-table.patch | 119 + 
...c-rs-stats-for-rs-544-514-interleave.patch | 43 + ...-apis-pre-destroy-cq-and-post-destro.patch | 148 + ...riorities-support-to-rdma-transport-.patch | 148 + ...raw-in-user-namespace-for-flow-creat.patch | 47 + ...raw-in-user-namespace-for-anchor-cre.patch | 47 + ...raw-in-user-namespace-for-devx-creat.patch | 47 + ...kc-page-size-capability-check-to-prm.patch | 142 + ...-mlx5-optimize-dmabuf-mkey-page-size.patch | 571 ++++ ...nt-check-on-err-on-return-expression.patch | 41 + ...eturned-type-from-mlx5r-umr-zap-mkey.patch | 119 + ...rdma-mlx5-fix-incorrect-mkey-masking.patch | 43 + ...04-rdma-mlx5-add-dmah-object-support.patch | 171 ++ ...rt-for-reg-user-mr-reg-user-dmabuf-m.patch | 394 +++ ...ctor-optional-counters-steering-code.patch | 358 +++ ...mismatch-for-srq-event-subscriptions.patch | 52 + ...lx5-don-t-use-pk-through-tracepoints.patch | 56 + ...rdware-definitions-needed-for-adjace.patch | 215 ++ ...che-vport-vhca-id-on-first-cap-query.patch | 167 ++ ...switch-set-query-hca-cap-via-vhca-id.patch | 180 ++ ...t-mlx5-export-mlx5-vport-get-vhca-id.patch | 213 ++ ...-query-to-see-if-host-pf-is-disabled.patch | 80 + ...-net-mlx5-support-disabling-host-pfs.patch | 257 ++ ...burst-period-for-tx-and-rx-reporters.patch | 89 + ...ove-kconfig-co-dependency-with-vxlan.patch | 57 + ...vport-acls-root-namespaces-to-xarray.patch | 327 +++ ...port-acls-root-namespaces-creation-t.patch | 271 ++ ...pport-for-adjacent-functions-vports-.patch | 420 +++ ...-acls-root-namespace-for-adjacent-vp.patch | 60 + ...ter-representors-for-adjacent-vports.patch | 134 + ...presentor-attributes-for-adjacent-vf.patch | 138 + ...e-the-cached-vhca-id-for-this-device.patch | 133 + ...psp-capabilities-structures-and-bits.patch | 266 ++ ...egister-read-logic-into-helper-funct.patch | 96 + ...upport-getcyclesx-and-getcrosscycles.patch | 148 + ...-add-rs-fec-histogram-infrastructure.patch | 116 + ...cqe-compress-type-via-devlink-params.patch | 399 +++ ...ement-devlink-enable-sriov-parameter.patch | 
308 ++ ...mplement-devlink-total-vfs-parameter.patch | 218 ++ ...estion-event-thresholds-configurable.patch | 359 +++ ...e-counter-for-pcie-congestion-events.patch | 89 + ...t-mlx5-fix-typo-in-pci-irq-c-comment.patch | 42 + ...actor-devcom-to-use-match-attributes.patch | 330 +++ ...ove-devcom-registration-to-lag-layer.patch | 145 + ...-add-net-namespace-support-to-devcom.patch | 149 + ...t-mlx5-lag-add-net-namespace-support.patch | 131 + ...ertion-fields-from-wqe-ether-segment.patch | 56 + ...-refactor-macsec-wqe-metadata-shifts.patch | 152 + ...tadata-conflicts-between-timestampin.patch | 79 + ...5-fix-typo-of-mlx5-eq-doorbel-offset.patch | 50 + ...used-offset-field-from-mlx5-sq-bfreg.patch | 92 + ...xsk-param-of-mlx5e-build-xdpsq-param.patch | 78 + ...ore-the-global-doorbell-in-mlx5-priv.patch | 371 +++ ...pare-for-using-multiple-tx-doorbells.patch | 194 ++ ...are-for-using-different-cq-doorbells.patch | 168 ++ ...-net-mlx5e-use-multiple-tx-doorbells.patch | 142 + ...-net-mlx5e-use-multiple-cq-doorbells.patch | 106 + ...-use-the-num-doorbells-devlink-param.patch | 140 + ...igned-for-mlx5e-get-max-num-channels.patch | 50 + ...r-access-and-odp-page-fault-counters.patch | 58 + ...s-to-match-on-undecrypted-esp-packet.patch | 345 +++ ...ate-decrypted-packets-into-ttc-table.patch | 121 + ...s-for-the-packets-decrypted-by-crypt.patch | 275 ++ ...-rules-for-the-decrypted-esp-packets.patch | 402 +++ ...move-dead-code-from-total-vfs-setter.patch | 70 + ...-format-specifier-for-error-pointers.patch | 553 ++++ ...r-access-and-odp-page-fault-counters.patch | 84 + ...-ifc-bit-for-tir-sq-order-capability.patch | 46 + ...balance-id-and-lag-per-mp-group-bits.patch | 58 + ...r-command-response-if-interface-goes.patch | 56 + ...eclaim-race-during-command-interface.patch | 59 + ...mlx5-fw-reset-add-reset-timeout-work.patch | 99 + ...ombining-test-reliability-for-arm64-.patch | 164 ++ ...mlx5-hws-generalize-complex-matchers.patch | 2553 +++++++++++++++++ 
...ng-switchdev-mode-with-inconsistent-.patch | 105 + ...or-messages-with-actual-depth-values.patch | 68 + ...nused-mdev-param-from-rss-indir-init.patch | 104 + ...lx5e-introduce-mlx5e-rss-init-params.patch | 288 ++ ...x5e-rss-params-for-rss-configuration.patch | 247 ++ ...lx5e-use-extack-in-set-rxfh-callback.patch | 85 + ...mode-conflicts-between-fdb-and-nic-i.patch | 134 + ...-reformat-when-tunnel-mode-not-allow.patch | 188 ++ ...ix-pre-2-40-binutils-assembler-error.patch | 49 + ...ad-of-0-in-invalid-case-in-mlx5e-mpw.patch | 67 + ...ing-skb-from-non-linear-xdp-buff-for.patch | 70 + ...ing-skb-from-non-linear-xdp-buff-for.patch | 122 + ...hcr-to-pcam-supported-registers-mask.patch | 43 + ...tor-devcom-to-return-null-on-failure.patch | 302 ++ ...x5-fix-ipsec-cleanup-over-mpv-device.patch | 201 ++ ...ser-count-when-destroying-fdb-tables.patch | 82 + ...ue-in-case-of-module-eeprom-read-err.patch | 77 + ...ror-assignment-in-mlx5e-xfrm-add-sta.patch | 47 + ...the-length-of-the-num-doorbell-error.patch | 45 + ...raparound-in-threshold-between-units.patch | 60 + ...-in-rate-limiting-for-values-above-2.patch | 65 + ...potentially-misleading-debug-message.patch | 64 + ...mlx5-fix-default-values-in-create-cq.patch | 298 ++ ...-new-irq-glue-on-request-irq-failure.patch | 163 ++ ...ix-validation-logic-in-rate-limiting.patch | 65 + ...le-data-direct-with-relaxed-ordering.patch | 141 + ...imate-max-qp-wr-to-reflect-wqe-count.patch | 117 + ...port-loopback-forcing-for-mpv-device.patch | 118 + ...size-bitmap-calculation-for-ksm-mode.patch | 50 + ...-format-specifier-for-error-pointers.patch | 140 + ...query-vports-mac-address-from-device.patch | 223 ++ ...se-common-mlx5-same-hw-devs-function.patch | 77 + ...are-system-image-guid-infrastructure.patch | 424 +++ ...x5-refactor-ptp-clock-devcom-pairing.patch | 84 + ...-net-mlx5-refactor-hca-cap-2-setting.patch | 78 + ...id-support-for-lag-multiplane-groups.patch | 86 + ...nt-tstamp-pointer-from-channel-struc.patch | 139 + 
...sary-tstamp-local-variable-in-mlx5i-.patch | 58 + ...rename-hwstamp-functions-to-hwtstamp.patch | 93 + ...-timestamp-fields-to-hwtstamp-config.patch | 194 ++ ...rt-to-new-hwtstamp-get-set-interface.patch | 322 +++ ...on-structures-for-self-loopback-prev.patch | 136 + ...-use-tir-api-in-mlx5e-modify-tirs-lb.patch | 152 + ...self-loopback-prevention-bits-on-tir.patch | 107 + ...self-loopback-prevention-in-tir-init.patch | 55 + ...y-tir-loopback-configuration-if-not-.patch | 52 + ...els-as-argument-to-mlx5e-switch-priv.patch | 122 + ...-closure-to-reduce-interface-down-ti.patch | 67 + ...tph-expose-pcie-tph-get-st-table-loc.patch | 87 + ...-add-direct-st-mode-support-for-rdma.patch | 107 + ...x5-add-other-eswitch-hw-capabilities.patch | 173 ++ ...-eswitch-support-for-steering-tables.patch | 203 ++ ...set-non-default-device-per-namespace.patch | 168 ++ ...d-support-for-dynamic-enable-disable.patch | 247 ++ ...switch-support-eswitch-inactive-mode.patch | 471 +++ ...se-definition-for-1600gbps-link-mode.patch | 39 + ...mlx5-extract-grxrings-from-get-rxnfc.patch | 119 + ...-query-error-handling-to-return-stat.patch | 234 ++ ...-on-excessive-ptp-tx-timestamp-delta.patch | 126 + ...nt-bw-share-minimal-value-assignment.patch | 49 + ...nds-if-all-command-slots-are-stalled.patch | 134 + ...5-use-eopnotsupp-instead-of-enotsupp.patch | 113 + ...itialize-events-outside-devlink-lock.patch | 116 + ...de-notifier-chain-outside-the-devlin.patch | 156 + ...vent-notifier-outside-of-the-devlink.patch | 303 ++ ...table-notifier-outside-the-devlink-l.patch | 295 ++ ...le-notifiers-outside-the-devlink-loc.patch | 278 ++ ...le-notifier-registration-outside-the.patch | 241 ++ ...64-instead-of-u64-in-ieee-setmaxrate.patch | 42 + ...er-limit-mbps-to-upper-limit-100mbps.patch | 58 + ...x-instead-of-hard-coded-magic-number.patch | 42 + ...nit-definitions-for-bandwidth-conver.patch | 95 + ...date-xdp-features-in-switch-channels.patch | 150 + ...t-xdp-target-xmit-with-dummy-program.patch | 88 + 
...et-mlx5-make-enable-mpesw-idempotent.patch | 60 + ...le-unregister-of-hca-ports-component.patch | 84 + ...ar-reset-requested-on-drain-fw-reset.patch | 47 + ...-firmware-reset-in-shutdown-callback.patch | 40 + ...er-validate-format-string-parameters.patch | 196 ++ ...acer-handle-escaped-percent-properly.patch | 86 + ...erialize-firmware-reset-with-devlink.patch | 208 ++ ...okup-instead-of-ipv6-dst-lookup-flow.patch | 52 + ...or-resolution-for-unresolved-destina.patch | 63 + ...bql-of-old-txqs-during-channel-recon.patch | 67 + ...ive-priority-for-routes-with-smaller.patch | 59 + ...er-dereference-in-ioctl-module-eepro.patch | 53 + ...-error-message-due-to-invalid-module.patch | 52 + ...h-on-profile-change-rollback-failure.patch | 232 ++ ...mlx5e-priv-in-mlx5e-dev-devlink-priv.patch | 152 + ...-mlx5e-destroy-netdev-instead-of-pri.patch | 163 ++ ...ying-state-bit-after-profile-cleanup.patch | 71 + ...y-leak-in-esw-acl-ingress-lgcy-setup.patch | 48 + ...ding-uplink-netdev-in-switchdev-mode.patch | 159 + ...delete-flows-only-for-existing-peers.patch | 134 + ...-for-netdev-stats-in-ndo-get-stats64.patch | 77 + ...e-mismatch-in-mlx5-esw-vport-vhca-id.patch | 47 + ...-cap-check-in-tx-flow-table-root-dis.patch | 47 + ...d-access-call-trace-use-before-alloc.patch | 157 + ...y-window-setup-for-ipsec-crypto-offl.patch | 53 + ...-device-for-lag-slaves-in-rdma-trans.patch | 138 + ...eswitch-support-for-devx-destruction.patch | 67 + ...rdma-mlx5-refactor-get-prio-function.patch | 151 + ...-eswitch-support-to-userspace-tables.patch | 82 + ...-size-when-5-level-paging-is-enabled.patch | 277 ++ ...mpo-fix-header-mapping-for-64k-pages.patch | 126 + ...mpo-fix-skb-size-check-for-64k-pages.patch | 58 + ...der-formulas-for-higher-mtus-and-64k.patch | 212 ++ ...rict-rtnl-area-to-avoid-a-lock-cycle.patch | 114 + ...peer-miss-rules-host-disabled-checks.patch | 79 + ...ulti-buf-frag-counting-for-legacy-rq.patch | 132 + ...-crash-when-moving-to-switchdev-mode.patch | 150 + 
...a-caps-leak-on-notifier-init-failure.patch | 55 + ...ti-buf-frag-counting-for-striding-rq.patch | 148 + ...-vlan-filter-lost-on-add-delete-race.patch | 72 + ...iavf-vlan-is-new-to-iavf-vlan-adding.patch | 87 + ...an-filters-from-pf-on-interface-down.patch | 233 ++ ...rmation-before-removing-vlan-filters.patch | 189 ++ ...d-vlan-to-success-completion-handler.patch | 60 + ...ecording-stale-or-retransmitted-init.patch | 66 + ...tale-init-after-handshake-completion.patch | 52 + ...le-free-on-pvrdma-alloc-ucontext-err.patch | 34 + ...lance-running-cmpxchg-when-balance-i.patch | 187 ++ ...d-serialize-affect-newidle-balancing.patch | 50 + ...g-of-prevent-user-access-and-set-kua.patch | 97 + ...-ancient-workaround-for-gcc-pr-58670.patch | 81 + ...-gcc-bugs-with-asm-goto-with-outputs.patch | 666 +++++ ...-asm-goto-tied-output-test-with-dash.patch | 52 + ...e-workarounds-for-gcc-asm-goto-issue.patch | 127 + ...onfig-gcc-asm-goto-output-workaround.patch | 257 ++ ...-fall-through-in-mlx5-ib-dev-res-srq.patch | 53 + SPECS/kernel.spec | 1162 +++++++- 384 files changed, 61990 insertions(+), 2 deletions(-) create mode 100644 SOURCES/1313-netfilter-flowtable-strictly-check-for-maximum-number-of-act.patch create mode 100644 SOURCES/1314-drm-amd-display-do-not-skip-unrelated-mode-changes-in-dsc-va.patch create mode 100644 SOURCES/1315-ipv6-icmp-clear-skb2-cb-in-ip6-err-gen-icmpv6-unreach.patch create mode 100644 SOURCES/1316-alsa-aloop-fix-peer-runtime-uaf-during-format-change-stop.patch create mode 100644 SOURCES/1317-rdma-iwcm-fix-workqueue-list-corruption-by-removing-work-lis.patch create mode 100644 SOURCES/1318-binder-use-cred-instead-of-task-for-selinux-checks.patch create mode 100644 SOURCES/1319-locks-fix-toctou-race-when-granting-write-lease.patch create mode 100644 SOURCES/1320-fs-use-a-helper-for-opening-kernel-internal-files.patch create mode 100644 SOURCES/1321-fs-move-kmem-cache-zalloc-into-alloc-empty-file-helpers.patch create mode 100644 
SOURCES/1322-fs-use-backing-file-container-for-internal-files-with-fake-f.patch create mode 100644 SOURCES/1323-ovl-enable-fsnotify-events-on-underlying-real-files.patch create mode 100644 SOURCES/1324-fs-move-cleanup-from-init-file-into-its-callers.patch create mode 100644 SOURCES/1325-lsm-constify-the-file-parameter-in-security-binder-transfer-.patch create mode 100644 SOURCES/1326-cachefiles-use-kiocb-start-end-write-helpers.patch create mode 100644 SOURCES/1327-fs-fix-kernel-doc-warnings.patch create mode 100644 SOURCES/1328-fs-rename-mnt-want-drop-write-helpers.patch create mode 100644 SOURCES/1329-fs-get-mnt-writers-count-for-an-open-backing-file-s-real-pat.patch create mode 100644 SOURCES/1330-fs-create-helper-file-user-path-for-user-displayed-mapped-fi.patch create mode 100644 SOURCES/1331-fs-store-real-path-instead-of-fake-path-in-backing-file-f-pa.patch create mode 100644 SOURCES/1332-fs-prepare-for-stackable-filesystems-backing-file-helpers.patch create mode 100644 SOURCES/1333-fs-factor-out-backing-file-read-write-iter-helpers.patch create mode 100644 SOURCES/1334-fs-factor-out-backing-file-splice-read-write-helpers.patch create mode 100644 SOURCES/1335-fs-factor-out-backing-file-mmap-helper.patch create mode 100644 SOURCES/1336-lsm-add-helper-for-blob-allocations.patch create mode 100644 SOURCES/1337-ovl-fix-nested-backing-file-paths.patch create mode 100644 SOURCES/1338-fs-constify-file-ptr-in-backing-file-accessor-helpers.patch create mode 100644 SOURCES/1339-ovl-remove-unneeded-non-const-conversion.patch create mode 100644 SOURCES/1340-ovl-remove-redundant-iocb-dio-caller-comp-clearing.patch create mode 100644 SOURCES/1341-perf-core-fix-mmap-event-path-names-with-backing-files.patch create mode 100644 SOURCES/1342-fs-prepare-for-adding-lsm-blob-to-backing-file.patch create mode 100644 SOURCES/1343-lsm-add-backing-file-lsm-hooks.patch create mode 100644 SOURCES/1344-selinux-fix-overlayfs-mmap-and-mprotect-access-checks.patch create mode 100644 
SOURCES/1345-selinux-rhel-only-hotfix-for-execmem-regression.patch create mode 100644 SOURCES/1346-net-mlx5-hws-fix-matcher-action-template-attach.patch create mode 100644 SOURCES/1347-net-mlx5-hws-remove-unused-element-array.patch create mode 100644 SOURCES/1348-net-mlx5-hws-make-pool-single-resource.patch create mode 100644 SOURCES/1349-net-mlx5-hws-refactor-pool-implementation.patch create mode 100644 SOURCES/1350-net-mlx5-hws-cleanup-after-pool-refactoring.patch create mode 100644 SOURCES/1351-net-mlx5-hws-add-fullness-tracking-to-pool.patch create mode 100644 SOURCES/1352-net-mlx5-hws-fix-pool-size-optimization.patch create mode 100644 SOURCES/1353-net-mlx5-hws-implement-action-ste-pool.patch create mode 100644 SOURCES/1354-net-mlx5-hws-use-the-new-action-ste-pool.patch create mode 100644 SOURCES/1355-net-mlx5-hws-cleanup-matcher-action-ste-table.patch create mode 100644 SOURCES/1356-net-mlx5-hws-free-unused-action-ste-tables.patch create mode 100644 SOURCES/1357-net-mlx5-hws-export-action-ste-tables-to-debugfs.patch create mode 100644 SOURCES/1358-net-mlx5e-ethtool-fix-formatting-of-ptp-rq0-csum-complete-ta.patch create mode 100644 SOURCES/1359-net-mlx5-fix-spelling-mistakes-in-mlx5-core-dbg-message-and-.patch create mode 100644 SOURCES/1360-net-mlx5-hws-fix-ip-version-decision.patch create mode 100644 SOURCES/1361-net-mlx5-hws-harden-ip-version-definer-checks.patch create mode 100644 SOURCES/1362-net-mlx5-hws-disallow-matcher-ip-version-mixing.patch create mode 100644 SOURCES/1363-rdma-mlx5-fix-error-flow-upon-firmware-failure-for-rq-destru.patch create mode 100644 SOURCES/1364-net-mlx5-support-software-tx-timestamp.patch create mode 100644 SOURCES/1365-net-mlx5-hws-expose-function-mlx5hws-table-ft-set-next-ft-in.patch create mode 100644 SOURCES/1366-net-mlx5-hws-add-definer-function-to-get-field-name-str.patch create mode 100644 SOURCES/1367-net-mlx5-hws-expose-polling-function-in-header-file.patch create mode 100644 
SOURCES/1368-net-mlx5-hws-introduce-isolated-matchers.patch create mode 100644 SOURCES/1369-net-mlx5-hws-support-complex-matchers.patch create mode 100644 SOURCES/1370-net-mlx5-hws-force-rehash-when-rule-insertion-failed.patch create mode 100644 SOURCES/1371-net-mlx5-hws-fix-counting-of-rules-in-the-matcher.patch create mode 100644 SOURCES/1372-net-mlx5-hws-fix-redundant-extension-of-action-templates.patch create mode 100644 SOURCES/1373-net-mlx5-hws-rework-rehash-loop.patch create mode 100644 SOURCES/1374-net-mlx5-hws-dump-bad-completion-details.patch create mode 100644 SOURCES/1375-net-mlx5-use-to-delayed-work.patch create mode 100644 SOURCES/1376-net-mlx5-sws-fix-reformat-id-error-handling.patch create mode 100644 SOURCES/1377-net-mlx5-hws-register-reformat-actions-with-fw.patch create mode 100644 SOURCES/1378-net-mlx5-hws-fix-typo-nope-to-nop.patch create mode 100644 SOURCES/1379-net-mlx5-hws-handle-modify-header-actions-dependency.patch create mode 100644 SOURCES/1380-net-mlx5-core-add-error-handling-inmlx5-query-nic-vport-qkey.patch create mode 100644 SOURCES/1381-net-mlx5e-allow-setting-mac-address-of-representors.patch create mode 100644 SOURCES/1382-net-mlx5-add-error-handling-in-mlx5-query-nic-vport-node-gui.patch create mode 100644 SOURCES/1383-net-mlx5-hws-fix-an-error-code-in-mlx5hws-bwc-rule-create-co.patch create mode 100644 SOURCES/1384-net-mlx5-ensure-fw-pages-are-always-allocated-on-same-numa.patch create mode 100644 SOURCES/1385-net-mlx5-fix-return-value-when-searching-for-existing-flow-g.patch create mode 100644 SOURCES/1386-net-mlx5-hws-init-mutex-on-the-correct-path.patch create mode 100644 SOURCES/1387-net-mlx5-hws-fix-missing-ip-version-handling-in-definer.patch create mode 100644 SOURCES/1388-net-mlx5-hws-make-sure-the-uplink-is-the-last-destination.patch create mode 100644 SOURCES/1389-net-mlx5e-fix-leak-of-geneve-tlv-option-object.patch create mode 100644 SOURCES/1390-net-mlx5-hws-add-error-checking-to-hws-bwc-rule-complex-hash.patch 
create mode 100644 SOURCES/1391-net-mlx5e-fix-race-between-dim-disable-and-net-dim.patch create mode 100644 SOURCES/1392-net-mlx5e-add-new-prio-for-promiscuous-mode.patch create mode 100644 SOURCES/1393-net-mlx5-correctly-set-gso-size-when-lro-is-used.patch create mode 100644 SOURCES/1394-net-mlx5-fix-memory-leak-in-cmd-exec.patch create mode 100644 SOURCES/1395-net-mlx5-e-switch-fix-peer-miss-rules-to-use-peer-eswitch.patch create mode 100644 SOURCES/1396-rdma-mlx5-convert-timeouts-to-secs-to-jiffies.patch create mode 100644 SOURCES/1397-rdma-mlx5-remove-the-redundant-mlx5-ib-stage-uar-stage.patch create mode 100644 SOURCES/1398-rdma-mlx5-add-support-for-200gbps-per-lane-speeds.patch create mode 100644 SOURCES/1399-rdma-mlx5-avoid-flexible-array-warning.patch create mode 100644 SOURCES/1400-rdma-mlx5-initialize-obj-event-obj-sub-list-before-xa-insert.patch create mode 100644 SOURCES/1401-rdma-mlx5-fix-hw-counters-query-for-non-representor-devices.patch create mode 100644 SOURCES/1402-rdma-mlx5-fix-cc-counters-query-for-mpv.patch create mode 100644 SOURCES/1403-rdma-mlx5-fix-vport-loopback-for-mpv-device.patch create mode 100644 SOURCES/1404-net-mlx5-expose-serial-numbers-in-devlink-info.patch create mode 100644 SOURCES/1405-net-mlx5e-shampo-reorganize-mlx5-rq-shampo-alloc.patch create mode 100644 SOURCES/1406-net-mlx5e-shampo-remove-redundant-params.patch create mode 100644 SOURCES/1407-net-mlx5e-shampo-improve-hw-gro-capability-checking.patch create mode 100644 SOURCES/1408-net-mlx5e-shampo-separate-pool-for-headers.patch create mode 100644 SOURCES/1409-net-mlx5e-implement-queue-mgmt-ops-and-single-channel-swap.patch create mode 100644 SOURCES/1410-net-mlx5e-support-ethtool-tcp-data-split-settings.patch create mode 100644 SOURCES/1411-net-mlx5-fs-add-multiple-prios-to-rdma-transport-steering-do.patch create mode 100644 SOURCES/1412-net-mlx5-small-refactor-for-general-object-capabilities.patch create mode 100644 
SOURCES/1413-net-mlx5-add-ifc-bits-for-pcie-congestion-event-object.patch create mode 100644 SOURCES/1414-rdma-mlx5-allocate-ib-device-with-net-namespace-supplied-fro.patch create mode 100644 SOURCES/1415-net-mlx5e-fix-error-handling-in-rq-memory-model-registration.patch create mode 100644 SOURCES/1416-net-mlx5-fs-fix-rdma-transport-init-cleanup-flow.patch create mode 100644 SOURCES/1417-net-mlx5-check-device-memory-pointer-before-usage.patch create mode 100644 SOURCES/1418-net-mlx5-add-no-op-implementation-for-setting-tc-bw-on-rate-.patch create mode 100644 SOURCES/1419-net-mlx5-add-support-for-setting-tc-bw-on-nodes.patch create mode 100644 SOURCES/1420-net-mlx5-add-traffic-class-scheduling-support-for-vport-qos.patch create mode 100644 SOURCES/1421-net-mlx5-manage-tc-arbiter-nodes-and-implement-full-support-.patch create mode 100644 SOURCES/1422-net-mlx5-hws-remove-unused-create-dest-array-parameter.patch create mode 100644 SOURCES/1423-net-mlx5-hws-remove-incorrect-comment.patch create mode 100644 SOURCES/1424-net-mlx5-hws-export-rule-skip-logic.patch create mode 100644 SOURCES/1425-net-mlx5-hws-refactor-rule-skip-logic.patch create mode 100644 SOURCES/1426-net-mlx5-hws-create-stes-directly-from-matcher.patch create mode 100644 SOURCES/1427-net-mlx5-hws-decouple-matcher-rx-and-tx-sizes.patch create mode 100644 SOURCES/1428-net-mlx5-hws-track-matcher-sizes-individually.patch create mode 100644 SOURCES/1429-net-mlx5-hws-rearrange-to-prevent-forward-declaration.patch create mode 100644 SOURCES/1430-net-mlx5-hws-shrink-empty-matchers.patch create mode 100644 SOURCES/1431-net-mlx5-add-hws-as-secondary-steering-mode.patch create mode 100644 SOURCES/1432-net-mlx5-fix-spelling-mistake-disabliing-disabling.patch create mode 100644 SOURCES/1433-eth-mlx5-migrate-to-the-rxfh-context-ops.patch create mode 100644 SOURCES/1434-net-mlx5e-remove-unused-vlan-insertion-logic-in-tx-path.patch create mode 100644 
SOURCES/1435-net-mlx5e-ct-extract-a-memcmp-from-a-spinlock-section.patch create mode 100644 SOURCES/1436-net-mlx5e-replace-recursive-vlan-push-handling-with-an-itera.patch create mode 100644 SOURCES/1437-net-mlx5-warn-when-write-combining-is-not-supported.patch create mode 100644 SOURCES/1438-net-mlx5e-rx-remove-unnecessary-rqt-redirects.patch create mode 100644 SOURCES/1439-net-mlx5-expose-hca-capability-bits-for-mkey-max-page-size.patch create mode 100644 SOURCES/1440-rdma-mlx5-fix-umr-modifying-of-mkey-page-size.patch create mode 100644 SOURCES/1441-net-mlx5-expose-disciplined-fr-counter-through-hca-capabilit.patch create mode 100644 SOURCES/1442-net-mlx5-ifc-updates-for-disabled-host-pf.patch create mode 100644 SOURCES/1443-net-mlx5e-create-destroy-pcie-congestion-event-object.patch create mode 100644 SOURCES/1444-net-mlx5e-add-device-pcie-congestion-ethtool-stats.patch create mode 100644 SOURCES/1445-net-mlx5-fix-an-is-err-vs-null-bug-in-esw-qos-move-node.patch create mode 100644 SOURCES/1446-net-mlx5-hws-enable-ipsec-hardware-offload-in-legacy-mode.patch create mode 100644 SOURCES/1447-net-mlx5e-fix-kdoc-warning-on-eswitch-h.patch create mode 100644 SOURCES/1448-net-mlx5e-properly-access-rcu-protected-qdisc-sleeping-varia.patch create mode 100644 SOURCES/1449-net-mlx5-add-ifc-bits-to-support-rss-for-ipsec-offload.patch create mode 100644 SOURCES/1450-net-mlx5-add-ifc-bits-and-enums-for-buf-ownership.patch create mode 100644 SOURCES/1451-net-mlx5-expose-cable-length-field-in-pfcc-register.patch create mode 100644 SOURCES/1452-net-mlx5e-shampo-cleanup-reservation-size-formula.patch create mode 100644 SOURCES/1453-net-mlx5e-shampo-remove-mlx5e-shampo-get-log-hd-entry-size.patch create mode 100644 SOURCES/1454-net-mlx5e-remove-duplicate-mkey-from-shampo-header.patch create mode 100644 SOURCES/1455-pci-tph-expose-pcie-tph-get-st-table-size.patch create mode 100644 SOURCES/1456-net-mlx5-expose-ifc-bits-for-tph.patch create mode 100644 
SOURCES/1457-net-mlx5-add-support-for-device-steering-tag.patch create mode 100644 SOURCES/1458-net-mlx5-fix-build-wframe-larger-than-warnings.patch create mode 100644 SOURCES/1459-net-fix-typos.patch create mode 100644 SOURCES/1460-net-mlx5e-clear-read-only-port-buffer-size-in-pbmc-before-up.patch create mode 100644 SOURCES/1461-net-mlx5e-remove-skb-secpath-if-xfrm-state-is-not-found.patch create mode 100644 SOURCES/1462-net-mlx5e-fix-potential-deadlock-by-deferring-rx-timeout-rec.patch create mode 100644 SOURCES/1463-net-mlx5e-support-routed-networks-during-ipsec-macs-initiali.patch create mode 100644 SOURCES/1464-net-mlx5e-expose-tis-via-devlink-tx-reporter-diagnose.patch create mode 100644 SOURCES/1465-net-mlx5-correctly-set-gso-segs-when-lro-is-used.patch create mode 100644 SOURCES/1466-net-mlx5-hws-fix-bad-parameter-in-cq-creation.patch create mode 100644 SOURCES/1467-net-mlx5-hws-fix-simple-rules-rehash-error-flow.patch create mode 100644 SOURCES/1468-net-mlx5-hws-fix-complex-rules-rehash-error-flow.patch create mode 100644 SOURCES/1469-net-mlx5-hws-prevent-rehash-from-filling-up-the-queues.patch create mode 100644 SOURCES/1470-net-mlx5-hws-don-t-rehash-on-every-kind-of-insertion-failure.patch create mode 100644 SOURCES/1471-net-mlx5-hws-fix-table-creation-uid.patch create mode 100644 SOURCES/1472-net-mlx5-ct-use-the-correct-counter-offset.patch create mode 100644 SOURCES/1473-net-mlx5-base-ecvf-devlink-port-attrs-from-0.patch create mode 100644 SOURCES/1474-net-mlx5-remove-default-qos-group-and-attach-vports-directly.patch create mode 100644 SOURCES/1475-net-mlx5e-preserve-tc-bw-during-parent-changes.patch create mode 100644 SOURCES/1476-net-mlx5-destroy-vport-qos-element-when-no-configuration-rem.patch create mode 100644 SOURCES/1477-net-mlx5-fix-qos-reference-leak-in-vport-enable-error-path.patch create mode 100644 SOURCES/1478-net-mlx5-restore-missing-scheduling-node-cleanup-on-vport-en.patch create mode 100644 
SOURCES/1479-net-mlx5e-query-fw-for-buffer-ownership.patch create mode 100644 SOURCES/1480-net-mlx5e-preserve-shared-buffer-capacity-during-headroom-up.patch create mode 100644 SOURCES/1481-net-mlx5-hws-fix-memory-leak-in-hws-pool-buddy-init-error-pa.patch create mode 100644 SOURCES/1482-net-mlx5-hws-fix-memory-leak-in-hws-action-get-shared-stc-ni.patch create mode 100644 SOURCES/1483-net-mlx5-hws-fix-uninitialized-variables-in-mlx5hws-pat-calc.patch create mode 100644 SOURCES/1484-net-mlx5-hws-fix-pattern-destruction-in-mlx5hws-pat-get-patt.patch create mode 100644 SOURCES/1485-net-mlx5-reload-auxiliary-drivers-on-fw-activate.patch create mode 100644 SOURCES/1486-net-mlx5-fix-lockdep-assertion-on-sync-reset-unload-event.patch create mode 100644 SOURCES/1487-net-mlx5-nack-sync-reset-when-sfs-are-present.patch create mode 100644 SOURCES/1488-net-mlx5-prevent-flow-steering-mode-changes-in-switchdev-mod.patch create mode 100644 SOURCES/1489-net-mlx5e-set-local-xoff-after-fw-update.patch create mode 100644 SOURCES/1490-net-mlx5e-harden-uplink-netdev-access-against-device-unbind.patch create mode 100644 SOURCES/1491-net-mlx5e-add-a-miss-level-for-ipsec-crypto-offload.patch create mode 100644 SOURCES/1492-net-mlx5-hws-ignore-flow-level-for-multi-dest-table.patch create mode 100644 SOURCES/1493-net-mlx5e-fix-missing-fec-rs-stats-for-rs-544-514-interleave.patch create mode 100644 SOURCES/1494-rdma-mlx5-support-driver-apis-pre-destroy-cq-and-post-destro.patch create mode 100644 SOURCES/1495-rdma-mlx5-add-multiple-priorities-support-to-rdma-transport-.patch create mode 100644 SOURCES/1496-rdma-mlx5-check-cap-net-raw-in-user-namespace-for-flow-creat.patch create mode 100644 SOURCES/1497-rdma-mlx5-check-cap-net-raw-in-user-namespace-for-anchor-cre.patch create mode 100644 SOURCES/1498-rdma-mlx5-check-cap-net-raw-in-user-namespace-for-devx-creat.patch create mode 100644 SOURCES/1499-rdma-mlx5-align-mkc-page-size-capability-check-to-prm.patch create mode 100644 
SOURCES/1500-rdma-mlx5-optimize-dmabuf-mkey-page-size.patch create mode 100644 SOURCES/1501-rdma-mlx5-remove-redundant-check-on-err-on-return-expression.patch create mode 100644 SOURCES/1502-rdma-mlx5-fix-returned-type-from-mlx5r-umr-zap-mkey.patch create mode 100644 SOURCES/1503-rdma-mlx5-fix-incorrect-mkey-masking.patch create mode 100644 SOURCES/1504-rdma-mlx5-add-dmah-object-support.patch create mode 100644 SOURCES/1505-rdma-mlx5-add-dmah-support-for-reg-user-mr-reg-user-dmabuf-m.patch create mode 100644 SOURCES/1506-rdma-mlx5-refactor-optional-counters-steering-code.patch create mode 100644 SOURCES/1507-ib-mlx5-fix-obj-type-mismatch-for-srq-event-subscriptions.patch create mode 100644 SOURCES/1508-net-mlx5-don-t-use-pk-through-tracepoints.patch create mode 100644 SOURCES/1509-net-mlx5-mlx5-ifc-add-hardware-definitions-needed-for-adjace.patch create mode 100644 SOURCES/1510-net-mlx5-e-switch-cache-vport-vhca-id-on-first-cap-query.patch create mode 100644 SOURCES/1511-net-mlx5-e-switch-set-query-hca-cap-via-vhca-id.patch create mode 100644 SOURCES/1512-rdma-net-mlx5-export-mlx5-vport-get-vhca-id.patch create mode 100644 SOURCES/1513-net-mlx5-query-to-see-if-host-pf-is-disabled.patch create mode 100644 SOURCES/1514-net-mlx5-support-disabling-host-pfs.patch create mode 100644 SOURCES/1515-net-mlx5e-set-default-burst-period-for-tx-and-rx-reporters.patch create mode 100644 SOURCES/1516-eth-mlx5-remove-kconfig-co-dependency-with-vxlan.patch create mode 100644 SOURCES/1517-net-mlx5-fs-convert-vport-acls-root-namespaces-to-xarray.patch create mode 100644 SOURCES/1518-net-mlx5-e-switch-move-vport-acls-root-namespaces-creation-t.patch create mode 100644 SOURCES/1519-net-mlx5-e-switch-add-support-for-adjacent-functions-vports-.patch create mode 100644 SOURCES/1520-net-mlx5-e-switch-create-acls-root-namespace-for-adjacent-vp.patch create mode 100644 SOURCES/1521-net-mlx5-e-switch-register-representors-for-adjacent-vports.patch create mode 100644 
SOURCES/1522-net-mlx5-e-switch-set-representor-attributes-for-adjacent-vf.patch create mode 100644 SOURCES/1523-net-mlx5-dr-hws-use-the-cached-vhca-id-for-this-device.patch create mode 100644 SOURCES/1524-net-mlx5-add-psp-capabilities-structures-and-bits.patch create mode 100644 SOURCES/1525-net-mlx5-extract-mtctr-register-read-logic-into-helper-funct.patch create mode 100644 SOURCES/1526-net-mlx5-support-getcyclesx-and-getcrosscycles.patch create mode 100644 SOURCES/1527-net-mlx5-add-rs-fec-histogram-infrastructure.patch create mode 100644 SOURCES/1528-net-mlx5-implement-cqe-compress-type-via-devlink-params.patch create mode 100644 SOURCES/1529-net-mlx5-implement-devlink-enable-sriov-parameter.patch create mode 100644 SOURCES/1530-net-mlx5-implement-devlink-total-vfs-parameter.patch create mode 100644 SOURCES/1531-net-mlx5e-make-pcie-congestion-event-thresholds-configurable.patch create mode 100644 SOURCES/1532-net-mlx5e-add-stale-counter-for-pcie-congestion-events.patch create mode 100644 SOURCES/1533-net-mlx5-fix-typo-in-pci-irq-c-comment.patch create mode 100644 SOURCES/1534-net-mlx5-refactor-devcom-to-use-match-attributes.patch create mode 100644 SOURCES/1535-net-mlx5-lag-move-devcom-registration-to-lag-layer.patch create mode 100644 SOURCES/1536-net-mlx5-add-net-namespace-support-to-devcom.patch create mode 100644 SOURCES/1537-net-mlx5-lag-add-net-namespace-support.patch create mode 100644 SOURCES/1538-net-mlx5-remove-vlan-insertion-fields-from-wqe-ether-segment.patch create mode 100644 SOURCES/1539-net-mlx5-refactor-macsec-wqe-metadata-shifts.patch create mode 100644 SOURCES/1540-net-mlx5e-prevent-wqe-metadata-conflicts-between-timestampin.patch create mode 100644 SOURCES/1541-net-mlx5-fix-typo-of-mlx5-eq-doorbel-offset.patch create mode 100644 SOURCES/1542-net-mlx5-remove-unused-offset-field-from-mlx5-sq-bfreg.patch create mode 100644 SOURCES/1543-net-mlx5e-remove-unused-xsk-param-of-mlx5e-build-xdpsq-param.patch create mode 100644 
SOURCES/1544-net-mlx5-store-the-global-doorbell-in-mlx5-priv.patch create mode 100644 SOURCES/1545-net-mlx5e-prepare-for-using-multiple-tx-doorbells.patch create mode 100644 SOURCES/1546-net-mlx5e-prepare-for-using-different-cq-doorbells.patch create mode 100644 SOURCES/1547-net-mlx5e-use-multiple-tx-doorbells.patch create mode 100644 SOURCES/1548-net-mlx5e-use-multiple-cq-doorbells.patch create mode 100644 SOURCES/1549-net-mlx5e-use-the-num-doorbells-devlink-param.patch create mode 100644 SOURCES/1550-net-mlx5e-use-unsigned-for-mlx5e-get-max-num-channels.patch create mode 100644 SOURCES/1551-net-mlx5-add-uar-access-and-odp-page-fault-counters.patch create mode 100644 SOURCES/1552-net-mlx5-change-ttc-rules-to-match-on-undecrypted-esp-packet.patch create mode 100644 SOURCES/1553-net-mlx5e-recirculate-decrypted-packets-into-ttc-table.patch create mode 100644 SOURCES/1554-net-mlx5e-add-flow-groups-for-the-packets-decrypted-by-crypt.patch create mode 100644 SOURCES/1555-net-mlx5e-add-flow-rules-for-the-decrypted-esp-packets.patch create mode 100644 SOURCES/1556-net-mlx5-remove-dead-code-from-total-vfs-setter.patch create mode 100644 SOURCES/1557-net-mlx5-use-pe-format-specifier-for-error-pointers.patch create mode 100644 SOURCES/1558-net-mlx5-expose-uar-access-and-odp-page-fault-counters.patch create mode 100644 SOURCES/1559-net-mlx5-add-ifc-bit-for-tir-sq-order-capability.patch create mode 100644 SOURCES/1560-net-mlx5-ifc-add-balance-id-and-lag-per-mp-group-bits.patch create mode 100644 SOURCES/1561-net-mlx5-stop-polling-for-command-response-if-interface-goes.patch create mode 100644 SOURCES/1562-net-mlx5-pagealloc-fix-reclaim-race-during-command-interface.patch create mode 100644 SOURCES/1563-net-mlx5-fw-reset-add-reset-timeout-work.patch create mode 100644 SOURCES/1564-net-mlx5-improve-write-combining-test-reliability-for-arm64-.patch create mode 100644 SOURCES/1565-net-mlx5-hws-generalize-complex-matchers.patch create mode 100644 
SOURCES/1566-net-mlx5e-prevent-entering-switchdev-mode-with-inconsistent-.patch create mode 100644 SOURCES/1567-net-mlx5-improve-qos-error-messages-with-actual-depth-values.patch create mode 100644 SOURCES/1568-net-mlx5e-remove-unused-mdev-param-from-rss-indir-init.patch create mode 100644 SOURCES/1569-net-mlx5e-introduce-mlx5e-rss-init-params.patch create mode 100644 SOURCES/1570-net-mlx5e-introduce-mlx5e-rss-params-for-rss-configuration.patch create mode 100644 SOURCES/1571-net-mlx5e-use-extack-in-set-rxfh-callback.patch create mode 100644 SOURCES/1572-net-mlx5-prevent-tunnel-mode-conflicts-between-fdb-and-nic-i.patch create mode 100644 SOURCES/1573-net-mlx5e-prevent-tunnel-reformat-when-tunnel-mode-not-allow.patch create mode 100644 SOURCES/1574-net-mlx5-fix-pre-2-40-binutils-assembler-error.patch create mode 100644 SOURCES/1575-net-mlx5e-return-1-instead-of-0-in-invalid-case-in-mlx5e-mpw.patch create mode 100644 SOURCES/1576-net-mlx5e-rx-fix-generating-skb-from-non-linear-xdp-buff-for.patch create mode 100644 SOURCES/1577-net-mlx5e-rx-fix-generating-skb-from-non-linear-xdp-buff-for.patch create mode 100644 SOURCES/1578-net-mlx5-add-pphcr-to-pcam-supported-registers-mask.patch create mode 100644 SOURCES/1579-net-mlx5-refactor-devcom-to-return-null-on-failure.patch create mode 100644 SOURCES/1580-net-mlx5-fix-ipsec-cleanup-over-mpv-device.patch create mode 100644 SOURCES/1581-net-mlx5-don-t-zero-user-count-when-destroying-fdb-tables.patch create mode 100644 SOURCES/1582-net-mlx5e-fix-return-value-in-case-of-module-eeprom-read-err.patch create mode 100644 SOURCES/1583-net-mlx5e-fix-missing-error-assignment-in-mlx5e-xfrm-add-sta.patch create mode 100644 SOURCES/1584-net-mlx5e-trim-the-length-of-the-num-doorbell-error.patch create mode 100644 SOURCES/1585-net-mlx5e-fix-maxrate-wraparound-in-threshold-between-units.patch create mode 100644 SOURCES/1586-net-mlx5e-fix-wraparound-in-rate-limiting-for-values-above-2.patch create mode 100644 
SOURCES/1587-net-mlx5e-fix-potentially-misleading-debug-message.patch create mode 100644 SOURCES/1588-mlx5-fix-default-values-in-create-cq.patch create mode 100644 SOURCES/1589-net-mlx5-clean-up-only-new-irq-glue-on-request-irq-failure.patch create mode 100644 SOURCES/1590-net-mlx5e-fix-validation-logic-in-rate-limiting.patch create mode 100644 SOURCES/1591-rdma-mlx5-enable-data-direct-with-relaxed-ordering.patch create mode 100644 SOURCES/1592-rdma-mlx5-better-estimate-max-qp-wr-to-reflect-wqe-count.patch create mode 100644 SOURCES/1593-rdma-mlx5-fix-vport-loopback-forcing-for-mpv-device.patch create mode 100644 SOURCES/1594-rdma-mlx5-fix-page-size-bitmap-calculation-for-ksm-mode.patch create mode 100644 SOURCES/1595-rdma-use-pe-format-specifier-for-error-pointers.patch create mode 100644 SOURCES/1596-rdma-net-mlx5-query-vports-mac-address-from-device.patch create mode 100644 SOURCES/1597-net-mlx5-use-common-mlx5-same-hw-devs-function.patch create mode 100644 SOURCES/1598-net-mlx5-add-software-system-image-guid-infrastructure.patch create mode 100644 SOURCES/1599-net-mlx5-refactor-ptp-clock-devcom-pairing.patch create mode 100644 SOURCES/1600-net-mlx5-refactor-hca-cap-2-setting.patch create mode 100644 SOURCES/1601-net-mlx5-add-balance-id-support-for-lag-multiplane-groups.patch create mode 100644 SOURCES/1602-net-mlx5e-remove-redundant-tstamp-pointer-from-channel-struc.patch create mode 100644 SOURCES/1603-net-mlx5e-remove-unnecessary-tstamp-local-variable-in-mlx5i-.patch create mode 100644 SOURCES/1604-net-mlx5e-rename-hwstamp-functions-to-hwtstamp.patch create mode 100644 SOURCES/1605-net-mlx5e-rename-timestamp-fields-to-hwtstamp-config.patch create mode 100644 SOURCES/1606-net-mlx5e-convert-to-new-hwtstamp-get-set-interface.patch create mode 100644 SOURCES/1607-net-mlx5e-enhance-function-structures-for-self-loopback-prev.patch create mode 100644 SOURCES/1608-net-mlx5e-use-tir-api-in-mlx5e-modify-tirs-lb.patch create mode 100644 
SOURCES/1609-net-mlx5e-allow-setting-self-loopback-prevention-bits-on-tir.patch create mode 100644 SOURCES/1610-net-mlx5-ipoib-set-self-loopback-prevention-in-tir-init.patch create mode 100644 SOURCES/1611-net-mlx5e-do-not-re-apply-tir-loopback-configuration-if-not-.patch create mode 100644 SOURCES/1612-net-mlx5e-pass-old-channels-as-argument-to-mlx5e-switch-priv.patch create mode 100644 SOURCES/1613-net-mlx5e-defer-channels-closure-to-reduce-interface-down-ti.patch create mode 100644 SOURCES/1614-pci-tph-expose-pcie-tph-get-st-table-loc.patch create mode 100644 SOURCES/1615-net-mlx5-add-direct-st-mode-support-for-rdma.patch create mode 100644 SOURCES/1616-net-mlx5-add-other-eswitch-hw-capabilities.patch create mode 100644 SOURCES/1617-net-mlx5-fs-add-other-eswitch-support-for-steering-tables.patch create mode 100644 SOURCES/1618-net-mlx5-fs-set-non-default-device-per-namespace.patch create mode 100644 SOURCES/1619-net-mlx5-mpfs-add-support-for-dynamic-enable-disable.patch create mode 100644 SOURCES/1620-net-mlx5-e-switch-support-eswitch-inactive-mode.patch create mode 100644 SOURCES/1621-net-mlx5-expose-definition-for-1600gbps-link-mode.patch create mode 100644 SOURCES/1622-mlx5-extract-grxrings-from-get-rxnfc.patch create mode 100644 SOURCES/1623-net-mlx5-refactor-eeprom-query-error-handling-to-return-stat.patch create mode 100644 SOURCES/1624-net-mlx5e-recover-sq-on-excessive-ptp-tx-timestamp-delta.patch create mode 100644 SOURCES/1625-net-mlx5-remove-redundant-bw-share-minimal-value-assignment.patch create mode 100644 SOURCES/1626-net-mlx5-abort-new-commands-if-all-command-slots-are-stalled.patch create mode 100644 SOURCES/1627-net-mlx5-use-eopnotsupp-instead-of-enotsupp.patch create mode 100644 SOURCES/1628-net-mlx5-initialize-events-outside-devlink-lock.patch create mode 100644 SOURCES/1629-net-mlx5-move-the-esw-mode-notifier-chain-outside-the-devlin.patch create mode 100644 SOURCES/1630-net-mlx5-move-the-vhca-event-notifier-outside-of-the-devlink.patch 
create mode 100644 SOURCES/1631-net-mlx5-move-the-sf-hw-table-notifier-outside-the-devlink-l.patch create mode 100644 SOURCES/1632-net-mlx5-move-the-sf-table-notifiers-outside-the-devlink-loc.patch create mode 100644 SOURCES/1633-net-mlx5-move-sf-dev-table-notifier-registration-outside-the.patch create mode 100644 SOURCES/1634-net-mlx5e-use-u64-instead-of-u64-in-ieee-setmaxrate.patch create mode 100644 SOURCES/1635-net-mlx5e-rename-upper-limit-mbps-to-upper-limit-100mbps.patch create mode 100644 SOURCES/1636-net-mlx5e-use-u8-max-instead-of-hard-coded-magic-number.patch create mode 100644 SOURCES/1637-net-mlx5e-use-standard-unit-definitions-for-bandwidth-conver.patch create mode 100644 SOURCES/1638-net-mlx5e-update-xdp-features-in-switch-channels.patch create mode 100644 SOURCES/1639-net-mlx5e-support-xdp-target-xmit-with-dummy-program.patch create mode 100644 SOURCES/1640-net-mlx5-make-enable-mpesw-idempotent.patch create mode 100644 SOURCES/1641-net-mlx5-fix-double-unregister-of-hca-ports-component.patch create mode 100644 SOURCES/1642-net-mlx5-fw-reset-clear-reset-requested-on-drain-fw-reset.patch create mode 100644 SOURCES/1643-net-mlx5-drain-firmware-reset-in-shutdown-callback.patch create mode 100644 SOURCES/1644-net-mlx5-fw-tracer-validate-format-string-parameters.patch create mode 100644 SOURCES/1645-net-mlx5-fw-tracer-handle-escaped-percent-properly.patch create mode 100644 SOURCES/1646-net-mlx5-serialize-firmware-reset-with-devlink.patch create mode 100644 SOURCES/1647-net-mlx5e-use-ip6-dst-lookup-instead-of-ipv6-dst-lookup-flow.patch create mode 100644 SOURCES/1648-net-mlx5e-trigger-neighbor-resolution-for-unresolved-destina.patch create mode 100644 SOURCES/1649-net-mlx5e-do-not-update-bql-of-old-txqs-during-channel-recon.patch create mode 100644 SOURCES/1650-net-mlx5-lag-multipath-give-priority-for-routes-with-smaller.patch create mode 100644 SOURCES/1651-net-mlx5e-fix-null-pointer-dereference-in-ioctl-module-eepro.patch create mode 100644 
SOURCES/1652-net-mlx5e-don-t-print-error-message-due-to-invalid-module.patch create mode 100644 SOURCES/1653-net-mlx5e-fix-crash-on-profile-change-rollback-failure.patch create mode 100644 SOURCES/1654-net-mlx5e-don-t-store-mlx5e-priv-in-mlx5e-dev-devlink-priv.patch create mode 100644 SOURCES/1655-net-mlx5e-pass-netdev-to-mlx5e-destroy-netdev-instead-of-pri.patch create mode 100644 SOURCES/1656-net-mlx5e-restore-destroying-state-bit-after-profile-cleanup.patch create mode 100644 SOURCES/1657-net-mlx5-fix-memory-leak-in-esw-acl-ingress-lgcy-setup.patch create mode 100644 SOURCES/1658-net-mlx5-fix-unbinding-uplink-netdev-in-switchdev-mode.patch create mode 100644 SOURCES/1659-net-mlx5e-tc-delete-flows-only-for-existing-peers.patch create mode 100644 SOURCES/1660-net-mlx5e-account-for-netdev-stats-in-ndo-get-stats64.patch create mode 100644 SOURCES/1661-net-mlx5-fix-return-type-mismatch-in-mlx5-esw-vport-vhca-id.patch create mode 100644 SOURCES/1662-net-mlx5-fs-fix-inverted-cap-check-in-tx-flow-table-root-dis.patch create mode 100644 SOURCES/1663-net-mlx5-fix-vhca-id-access-call-trace-use-before-alloc.patch create mode 100644 SOURCES/1664-net-mlx5e-skip-esn-replay-window-setup-for-ipsec-crypto-offl.patch create mode 100644 SOURCES/1665-rdma-mlx5-change-default-device-for-lag-slaves-in-rdma-trans.patch create mode 100644 SOURCES/1666-rdma-mlx5-add-other-eswitch-support-for-devx-destruction.patch create mode 100644 SOURCES/1667-rdma-mlx5-refactor-get-prio-function.patch create mode 100644 SOURCES/1668-rdma-mlx5-add-other-eswitch-support-to-userspace-tables.patch create mode 100644 SOURCES/1669-ib-mlx5-reduce-imr-ksm-size-when-5-level-paging-is-enabled.patch create mode 100644 SOURCES/1670-net-mlx5e-shampo-fix-header-mapping-for-64k-pages.patch create mode 100644 SOURCES/1671-net-mlx5e-shampo-fix-skb-size-check-for-64k-pages.patch create mode 100644 SOURCES/1672-net-mlx5e-shampo-fix-header-formulas-for-higher-mtus-and-64k.patch create mode 100644 
SOURCES/1673-net-mlx5-qos-restrict-rtnl-area-to-avoid-a-lock-cycle.patch create mode 100644 SOURCES/1674-net-mlx5-fix-peer-miss-rules-host-disabled-checks.patch create mode 100644 SOURCES/1675-net-mlx5e-rx-fix-xdp-multi-buf-frag-counting-for-legacy-rq.patch create mode 100644 SOURCES/1676-net-mlx5-fix-crash-when-moving-to-switchdev-mode.patch create mode 100644 SOURCES/1677-net-mlx5-fix-hca-caps-leak-on-notifier-init-failure.patch create mode 100644 SOURCES/1678-net-mlx5e-rx-fix-xdp-multi-buf-frag-counting-for-striding-rq.patch create mode 100644 SOURCES/1679-iavf-fix-vlan-filter-lost-on-add-delete-race.patch create mode 100644 SOURCES/1680-iavf-rename-iavf-vlan-is-new-to-iavf-vlan-adding.patch create mode 100644 SOURCES/1681-iavf-stop-removing-vlan-filters-from-pf-on-interface-down.patch create mode 100644 SOURCES/1682-iavf-wait-for-pf-confirmation-before-removing-vlan-filters.patch create mode 100644 SOURCES/1683-iavf-add-virtchnl-op-add-vlan-to-success-completion-handler.patch create mode 100644 SOURCES/1684-netfilter-skip-recording-stale-or-retransmitted-init.patch create mode 100644 SOURCES/1685-sctp-discard-stale-init-after-handshake-completion.patch create mode 100644 SOURCES/1686-rdma-vmw-pvrdma-fix-double-free-on-pvrdma-alloc-ucontext-err.patch create mode 100644 SOURCES/1687-sched-fair-skip-sched-balance-running-cmpxchg-when-balance-i.patch create mode 100644 SOURCES/1688-sched-fair-have-sd-serialize-affect-newidle-balancing.patch create mode 100644 SOURCES/1689-powerpc-64-force-inlining-of-prevent-user-access-and-set-kua.patch create mode 100644 SOURCES/1690-compiler-gcc-h-remove-ancient-workaround-for-gcc-pr-58670.patch create mode 100644 SOURCES/1691-work-around-gcc-bugs-with-asm-goto-with-outputs.patch create mode 100644 SOURCES/1692-init-kconfig-fix-cc-has-asm-goto-tied-output-test-with-dash.patch create mode 100644 SOURCES/1693-update-workarounds-for-gcc-asm-goto-issue.patch create mode 100644 
SOURCES/1694-init-kconfig-remove-config-gcc-asm-goto-output-workaround.patch create mode 100644 SOURCES/1695-rdma-mlx5-fix-error-path-fall-through-in-mlx5-ib-dev-res-srq.patch diff --git a/SOURCES/1313-netfilter-flowtable-strictly-check-for-maximum-number-of-act.patch b/SOURCES/1313-netfilter-flowtable-strictly-check-for-maximum-number-of-act.patch new file mode 100644 index 000000000..3b186a1e5 --- /dev/null +++ b/SOURCES/1313-netfilter-flowtable-strictly-check-for-maximum-number-of-act.patch @@ -0,0 +1,508 @@ +From 23e3d86443306f1ab3a60ea10e0a8403ecbbdb27 Mon Sep 17 00:00:00 2001 +From: CKI Backport Bot +Date: Fri, 15 May 2026 17:29:34 +0000 +Subject: [PATCH] netfilter: flowtable: strictly check for maximum number of + actions + +JIRA: https://redhat.atlassian.net/browse/RHEL-176922 +CVE: CVE-2026-43329 + +commit 76522fcdbc3a02b568f5d957f7e66fc194abb893 +Author: Pablo Neira Ayuso +Date: Thu Mar 26 00:17:09 2026 +0100 + + netfilter: flowtable: strictly check for maximum number of actions + + The maximum number of flowtable hardware offload actions in IPv6 is: + + * ethernet mangling (4 payload actions, 2 for each ethernet address) + * SNAT (4 payload actions) + * DNAT (4 payload actions) + * Double VLAN (4 vlan actions, 2 for popping vlan, and 2 for pushing) + for QinQ. + * Redirect (1 action) + + Which makes 17, while the maximum is 16. But act_ct supports for tunnels + actions too. Note that payload action operates at 32-bit word level, so + mangling an IPv6 address takes 4 payload actions. + + Update flow_action_entry_next() calls to check for the maximum number of + supported actions. + + While at it, rise the maximum number of actions per flow from 16 to 24 + so this works fine with IPv6 setups. 
+ + Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") + Reported-by: Hyunwoo Kim + Signed-off-by: Pablo Neira Ayuso + +Signed-off-by: CKI Backport Bot + +diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c +index e59fa3be408c..0f3bccc69b57 100644 +--- a/net/netfilter/nf_flow_table_offload.c ++++ b/net/netfilter/nf_flow_table_offload.c +@@ -13,6 +13,8 @@ + #include + #include + ++#define NF_FLOW_RULE_ACTION_MAX 24 ++ + static struct workqueue_struct *nf_flow_offload_add_wq; + static struct workqueue_struct *nf_flow_offload_del_wq; + static struct workqueue_struct *nf_flow_offload_stats_wq; +@@ -208,7 +210,12 @@ static void flow_offload_mangle(struct flow_action_entry *entry, + static inline struct flow_action_entry * + flow_action_entry_next(struct nf_flow_rule *flow_rule) + { +- int i = flow_rule->rule->action.num_entries++; ++ int i; ++ ++ if (unlikely(flow_rule->rule->action.num_entries >= NF_FLOW_RULE_ACTION_MAX)) ++ return NULL; ++ ++ i = flow_rule->rule->action.num_entries++; + + return &flow_rule->rule->action.entries[i]; + } +@@ -226,6 +233,9 @@ static int flow_offload_eth_src(struct net *net, + u32 mask, val; + u16 val16; + ++ if (!entry0 || !entry1) ++ return -E2BIG; ++ + this_tuple = &flow->tuplehash[dir].tuple; + + switch (this_tuple->xmit_type) { +@@ -276,6 +286,9 @@ static int flow_offload_eth_dst(struct net *net, + u8 nud_state; + u16 val16; + ++ if (!entry0 || !entry1) ++ return -E2BIG; ++ + this_tuple = &flow->tuplehash[dir].tuple; + + switch (this_tuple->xmit_type) { +@@ -317,16 +330,19 @@ static int flow_offload_eth_dst(struct net *net, + return 0; + } + +-static void flow_offload_ipv4_snat(struct net *net, +- const struct flow_offload *flow, +- enum flow_offload_tuple_dir dir, +- struct nf_flow_rule *flow_rule) ++static int flow_offload_ipv4_snat(struct net *net, ++ const struct flow_offload *flow, ++ enum flow_offload_tuple_dir dir, ++ struct nf_flow_rule *flow_rule) + { + struct 
flow_action_entry *entry = flow_action_entry_next(flow_rule); + u32 mask = ~htonl(0xffffffff); + __be32 addr; + u32 offset; + ++ if (!entry) ++ return -E2BIG; ++ + switch (dir) { + case FLOW_OFFLOAD_DIR_ORIGINAL: + addr = flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dst_v4.s_addr; +@@ -337,23 +353,27 @@ static void flow_offload_ipv4_snat(struct net *net, + offset = offsetof(struct iphdr, daddr); + break; + default: +- return; ++ return -EOPNOTSUPP; + } + + flow_offload_mangle(entry, FLOW_ACT_MANGLE_HDR_TYPE_IP4, offset, + &addr, &mask); ++ return 0; + } + +-static void flow_offload_ipv4_dnat(struct net *net, +- const struct flow_offload *flow, +- enum flow_offload_tuple_dir dir, +- struct nf_flow_rule *flow_rule) ++static int flow_offload_ipv4_dnat(struct net *net, ++ const struct flow_offload *flow, ++ enum flow_offload_tuple_dir dir, ++ struct nf_flow_rule *flow_rule) + { + struct flow_action_entry *entry = flow_action_entry_next(flow_rule); + u32 mask = ~htonl(0xffffffff); + __be32 addr; + u32 offset; + ++ if (!entry) ++ return -E2BIG; ++ + switch (dir) { + case FLOW_OFFLOAD_DIR_ORIGINAL: + addr = flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.src_v4.s_addr; +@@ -364,14 +384,15 @@ static void flow_offload_ipv4_dnat(struct net *net, + offset = offsetof(struct iphdr, saddr); + break; + default: +- return; ++ return -EOPNOTSUPP; + } + + flow_offload_mangle(entry, FLOW_ACT_MANGLE_HDR_TYPE_IP4, offset, + &addr, &mask); ++ return 0; + } + +-static void flow_offload_ipv6_mangle(struct nf_flow_rule *flow_rule, ++static int flow_offload_ipv6_mangle(struct nf_flow_rule *flow_rule, + unsigned int offset, + const __be32 *addr, const __be32 *mask) + { +@@ -380,15 +401,20 @@ static void flow_offload_ipv6_mangle(struct nf_flow_rule *flow_rule, + + for (i = 0; i < sizeof(struct in6_addr) / sizeof(u32); i++) { + entry = flow_action_entry_next(flow_rule); ++ if (!entry) ++ return -E2BIG; ++ + flow_offload_mangle(entry, FLOW_ACT_MANGLE_HDR_TYPE_IP6, + offset + i * sizeof(u32), 
&addr[i], mask); + } ++ ++ return 0; + } + +-static void flow_offload_ipv6_snat(struct net *net, +- const struct flow_offload *flow, +- enum flow_offload_tuple_dir dir, +- struct nf_flow_rule *flow_rule) ++static int flow_offload_ipv6_snat(struct net *net, ++ const struct flow_offload *flow, ++ enum flow_offload_tuple_dir dir, ++ struct nf_flow_rule *flow_rule) + { + u32 mask = ~htonl(0xffffffff); + const __be32 *addr; +@@ -404,16 +430,16 @@ static void flow_offload_ipv6_snat(struct net *net, + offset = offsetof(struct ipv6hdr, daddr); + break; + default: +- return; ++ return -EOPNOTSUPP; + } + +- flow_offload_ipv6_mangle(flow_rule, offset, addr, &mask); ++ return flow_offload_ipv6_mangle(flow_rule, offset, addr, &mask); + } + +-static void flow_offload_ipv6_dnat(struct net *net, +- const struct flow_offload *flow, +- enum flow_offload_tuple_dir dir, +- struct nf_flow_rule *flow_rule) ++static int flow_offload_ipv6_dnat(struct net *net, ++ const struct flow_offload *flow, ++ enum flow_offload_tuple_dir dir, ++ struct nf_flow_rule *flow_rule) + { + u32 mask = ~htonl(0xffffffff); + const __be32 *addr; +@@ -429,10 +455,10 @@ static void flow_offload_ipv6_dnat(struct net *net, + offset = offsetof(struct ipv6hdr, saddr); + break; + default: +- return; ++ return -EOPNOTSUPP; + } + +- flow_offload_ipv6_mangle(flow_rule, offset, addr, &mask); ++ return flow_offload_ipv6_mangle(flow_rule, offset, addr, &mask); + } + + static int flow_offload_l4proto(const struct flow_offload *flow) +@@ -454,15 +480,18 @@ static int flow_offload_l4proto(const struct flow_offload *flow) + return type; + } + +-static void flow_offload_port_snat(struct net *net, +- const struct flow_offload *flow, +- enum flow_offload_tuple_dir dir, +- struct nf_flow_rule *flow_rule) ++static int flow_offload_port_snat(struct net *net, ++ const struct flow_offload *flow, ++ enum flow_offload_tuple_dir dir, ++ struct nf_flow_rule *flow_rule) + { + struct flow_action_entry *entry = 
flow_action_entry_next(flow_rule); + u32 mask, port; + u32 offset; + ++ if (!entry) ++ return -E2BIG; ++ + switch (dir) { + case FLOW_OFFLOAD_DIR_ORIGINAL: + port = ntohs(flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dst_port); +@@ -477,22 +506,26 @@ static void flow_offload_port_snat(struct net *net, + mask = ~htonl(0xffff); + break; + default: +- return; ++ return -EOPNOTSUPP; + } + + flow_offload_mangle(entry, flow_offload_l4proto(flow), offset, + &port, &mask); ++ return 0; + } + +-static void flow_offload_port_dnat(struct net *net, +- const struct flow_offload *flow, +- enum flow_offload_tuple_dir dir, +- struct nf_flow_rule *flow_rule) ++static int flow_offload_port_dnat(struct net *net, ++ const struct flow_offload *flow, ++ enum flow_offload_tuple_dir dir, ++ struct nf_flow_rule *flow_rule) + { + struct flow_action_entry *entry = flow_action_entry_next(flow_rule); + u32 mask, port; + u32 offset; + ++ if (!entry) ++ return -E2BIG; ++ + switch (dir) { + case FLOW_OFFLOAD_DIR_ORIGINAL: + port = ntohs(flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.src_port); +@@ -507,20 +540,24 @@ static void flow_offload_port_dnat(struct net *net, + mask = ~htonl(0xffff0000); + break; + default: +- return; ++ return -EOPNOTSUPP; + } + + flow_offload_mangle(entry, flow_offload_l4proto(flow), offset, + &port, &mask); ++ return 0; + } + +-static void flow_offload_ipv4_checksum(struct net *net, +- const struct flow_offload *flow, +- struct nf_flow_rule *flow_rule) ++static int flow_offload_ipv4_checksum(struct net *net, ++ const struct flow_offload *flow, ++ struct nf_flow_rule *flow_rule) + { + u8 protonum = flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.l4proto; + struct flow_action_entry *entry = flow_action_entry_next(flow_rule); + ++ if (!entry) ++ return -E2BIG; ++ + entry->id = FLOW_ACTION_CSUM; + entry->csum_flags = TCA_CSUM_UPDATE_FLAG_IPV4HDR; + +@@ -532,12 +569,14 @@ static void flow_offload_ipv4_checksum(struct net *net, + entry->csum_flags |= 
TCA_CSUM_UPDATE_FLAG_UDP; + break; + } ++ ++ return 0; + } + +-static void flow_offload_redirect(struct net *net, +- const struct flow_offload *flow, +- enum flow_offload_tuple_dir dir, +- struct nf_flow_rule *flow_rule) ++static int flow_offload_redirect(struct net *net, ++ const struct flow_offload *flow, ++ enum flow_offload_tuple_dir dir, ++ struct nf_flow_rule *flow_rule) + { + const struct flow_offload_tuple *this_tuple, *other_tuple; + struct flow_action_entry *entry; +@@ -555,21 +594,28 @@ static void flow_offload_redirect(struct net *net, + ifindex = other_tuple->iifidx; + break; + default: +- return; ++ return -EOPNOTSUPP; + } + + dev = dev_get_by_index(net, ifindex); + if (!dev) +- return; ++ return -ENODEV; + + entry = flow_action_entry_next(flow_rule); ++ if (!entry) { ++ dev_put(dev); ++ return -E2BIG; ++ } ++ + entry->id = FLOW_ACTION_REDIRECT; + entry->dev = dev; ++ ++ return 0; + } + +-static void flow_offload_encap_tunnel(const struct flow_offload *flow, +- enum flow_offload_tuple_dir dir, +- struct nf_flow_rule *flow_rule) ++static int flow_offload_encap_tunnel(const struct flow_offload *flow, ++ enum flow_offload_tuple_dir dir, ++ struct nf_flow_rule *flow_rule) + { + const struct flow_offload_tuple *this_tuple; + struct flow_action_entry *entry; +@@ -577,7 +623,7 @@ static void flow_offload_encap_tunnel(const struct flow_offload *flow, + + this_tuple = &flow->tuplehash[dir].tuple; + if (this_tuple->xmit_type == FLOW_OFFLOAD_XMIT_DIRECT) +- return; ++ return 0; + + dst = this_tuple->dst_cache; + if (dst && dst->lwtstate) { +@@ -586,15 +632,19 @@ static void flow_offload_encap_tunnel(const struct flow_offload *flow, + tun_info = lwt_tun_info(dst->lwtstate); + if (tun_info && (tun_info->mode & IP_TUNNEL_INFO_TX)) { + entry = flow_action_entry_next(flow_rule); ++ if (!entry) ++ return -E2BIG; + entry->id = FLOW_ACTION_TUNNEL_ENCAP; + entry->tunnel = tun_info; + } + } ++ ++ return 0; + } + +-static void flow_offload_decap_tunnel(const struct 
flow_offload *flow, +- enum flow_offload_tuple_dir dir, +- struct nf_flow_rule *flow_rule) ++static int flow_offload_decap_tunnel(const struct flow_offload *flow, ++ enum flow_offload_tuple_dir dir, ++ struct nf_flow_rule *flow_rule) + { + const struct flow_offload_tuple *other_tuple; + struct flow_action_entry *entry; +@@ -602,7 +652,7 @@ static void flow_offload_decap_tunnel(const struct flow_offload *flow, + + other_tuple = &flow->tuplehash[!dir].tuple; + if (other_tuple->xmit_type == FLOW_OFFLOAD_XMIT_DIRECT) +- return; ++ return 0; + + dst = other_tuple->dst_cache; + if (dst && dst->lwtstate) { +@@ -611,9 +661,13 @@ static void flow_offload_decap_tunnel(const struct flow_offload *flow, + tun_info = lwt_tun_info(dst->lwtstate); + if (tun_info && (tun_info->mode & IP_TUNNEL_INFO_TX)) { + entry = flow_action_entry_next(flow_rule); ++ if (!entry) ++ return -E2BIG; + entry->id = FLOW_ACTION_TUNNEL_DECAP; + } + } ++ ++ return 0; + } + + static int +@@ -625,8 +679,9 @@ nf_flow_rule_route_common(struct net *net, const struct flow_offload *flow, + const struct flow_offload_tuple *tuple; + int i; + +- flow_offload_decap_tunnel(flow, dir, flow_rule); +- flow_offload_encap_tunnel(flow, dir, flow_rule); ++ if (flow_offload_decap_tunnel(flow, dir, flow_rule) < 0 || ++ flow_offload_encap_tunnel(flow, dir, flow_rule) < 0) ++ return -1; + + if (flow_offload_eth_src(net, flow, dir, flow_rule) < 0 || + flow_offload_eth_dst(net, flow, dir, flow_rule) < 0) +@@ -642,6 +697,8 @@ nf_flow_rule_route_common(struct net *net, const struct flow_offload *flow, + + if (tuple->encap[i].proto == htons(ETH_P_8021Q)) { + entry = flow_action_entry_next(flow_rule); ++ if (!entry) ++ return -1; + entry->id = FLOW_ACTION_VLAN_POP; + } + } +@@ -655,6 +712,8 @@ nf_flow_rule_route_common(struct net *net, const struct flow_offload *flow, + continue; + + entry = flow_action_entry_next(flow_rule); ++ if (!entry) ++ return -1; + + switch (other_tuple->encap[i].proto) { + case htons(ETH_P_PPP_SES): +@@ 
-680,18 +739,22 @@ int nf_flow_rule_route_ipv4(struct net *net, struct flow_offload *flow, + return -1; + + if (test_bit(NF_FLOW_SNAT, &flow->flags)) { +- flow_offload_ipv4_snat(net, flow, dir, flow_rule); +- flow_offload_port_snat(net, flow, dir, flow_rule); ++ if (flow_offload_ipv4_snat(net, flow, dir, flow_rule) < 0 || ++ flow_offload_port_snat(net, flow, dir, flow_rule) < 0) ++ return -1; + } + if (test_bit(NF_FLOW_DNAT, &flow->flags)) { +- flow_offload_ipv4_dnat(net, flow, dir, flow_rule); +- flow_offload_port_dnat(net, flow, dir, flow_rule); ++ if (flow_offload_ipv4_dnat(net, flow, dir, flow_rule) < 0 || ++ flow_offload_port_dnat(net, flow, dir, flow_rule) < 0) ++ return -1; + } + if (test_bit(NF_FLOW_SNAT, &flow->flags) || + test_bit(NF_FLOW_DNAT, &flow->flags)) +- flow_offload_ipv4_checksum(net, flow, flow_rule); ++ if (flow_offload_ipv4_checksum(net, flow, flow_rule) < 0) ++ return -1; + +- flow_offload_redirect(net, flow, dir, flow_rule); ++ if (flow_offload_redirect(net, flow, dir, flow_rule) < 0) ++ return -1; + + return 0; + } +@@ -705,22 +768,23 @@ int nf_flow_rule_route_ipv6(struct net *net, struct flow_offload *flow, + return -1; + + if (test_bit(NF_FLOW_SNAT, &flow->flags)) { +- flow_offload_ipv6_snat(net, flow, dir, flow_rule); +- flow_offload_port_snat(net, flow, dir, flow_rule); ++ if (flow_offload_ipv6_snat(net, flow, dir, flow_rule) < 0 || ++ flow_offload_port_snat(net, flow, dir, flow_rule) < 0) ++ return -1; + } + if (test_bit(NF_FLOW_DNAT, &flow->flags)) { +- flow_offload_ipv6_dnat(net, flow, dir, flow_rule); +- flow_offload_port_dnat(net, flow, dir, flow_rule); ++ if (flow_offload_ipv6_dnat(net, flow, dir, flow_rule) < 0 || ++ flow_offload_port_dnat(net, flow, dir, flow_rule) < 0) ++ return -1; + } + +- flow_offload_redirect(net, flow, dir, flow_rule); ++ if (flow_offload_redirect(net, flow, dir, flow_rule) < 0) ++ return -1; + + return 0; + } + EXPORT_SYMBOL_GPL(nf_flow_rule_route_ipv6); + +-#define NF_FLOW_RULE_ACTION_MAX 16 +- + static 
struct nf_flow_rule * + nf_flow_offload_rule_alloc(struct net *net, + const struct flow_offload_work *offload, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1314-drm-amd-display-do-not-skip-unrelated-mode-changes-in-dsc-va.patch b/SOURCES/1314-drm-amd-display-do-not-skip-unrelated-mode-changes-in-dsc-va.patch new file mode 100644 index 000000000..2672ad9a8 --- /dev/null +++ b/SOURCES/1314-drm-amd-display-do-not-skip-unrelated-mode-changes-in-dsc-va.patch @@ -0,0 +1,112 @@ +From aed3d041ab061ec8a64f50a3edda0f4db7280025 Mon Sep 17 00:00:00 2001 +From: Yussuf Khalil +Date: Fri, 6 Mar 2026 12:06:35 +0000 +Subject: [PATCH] drm/amd/display: Do not skip unrelated mode changes in DSC + validation + +Starting with commit 17ce8a6907f7 ("drm/amd/display: Add dsc pre-validation in +atomic check"), amdgpu resets the CRTC state mode_changed flag to false when +recomputing the DSC configuration results in no timing change for a particular +stream. + +However, this is incorrect in scenarios where a change in MST/DSC configuration +happens in the same KMS commit as another (unrelated) mode change. For example, +the integrated panel of a laptop may be configured differently (e.g., HDR +enabled/disabled) depending on whether external screens are attached. In this +case, plugging in external DP-MST screens may result in the mode_changed flag +being dropped incorrectly for the integrated panel if its DSC configuration +did not change during precomputation in pre_validate_dsc(). + +At this point, however, dm_update_crtc_state() has already created new streams +for CRTCs with DSC-independent mode changes. In turn, +amdgpu_dm_commit_streams() will never release the old stream, resulting in a +memory leak. 
amdgpu_dm_atomic_commit_tail() will never acquire a reference to +the new stream either, which manifests as a use-after-free when the stream gets +disabled later on: + +BUG: KASAN: use-after-free in dc_stream_release+0x25/0x90 [amdgpu] +Write of size 4 at addr ffff88813d836524 by task kworker/9:9/29977 + +Workqueue: events drm_mode_rmfb_work_fn +Call Trace: + + dump_stack_lvl+0x6e/0xa0 + print_address_description.constprop.0+0x88/0x320 + ? dc_stream_release+0x25/0x90 [amdgpu] + print_report+0xfc/0x1ff + ? srso_alias_return_thunk+0x5/0xfbef5 + ? __virt_addr_valid+0x225/0x4e0 + ? dc_stream_release+0x25/0x90 [amdgpu] + kasan_report+0xe1/0x180 + ? dc_stream_release+0x25/0x90 [amdgpu] + kasan_check_range+0x125/0x200 + dc_stream_release+0x25/0x90 [amdgpu] + dc_state_destruct+0x14d/0x5c0 [amdgpu] + dc_state_release.part.0+0x4e/0x130 [amdgpu] + dm_atomic_destroy_state+0x3f/0x70 [amdgpu] + drm_atomic_state_default_clear+0x8ee/0xf30 + ? drm_mode_object_put.part.0+0xb1/0x130 + __drm_atomic_state_free+0x15c/0x2d0 + atomic_remove_fb+0x67e/0x980 + +Since there is no reliable way of figuring out whether a CRTC has unrelated +mode changes pending at the time of DSC validation, remember the value of the +mode_changed flag from before the point where a CRTC was marked as potentially +affected by a change in DSC configuration. Reset the mode_changed flag to this +earlier value instead in pre_validate_dsc(). 
+ +Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5004 +Fixes: 17ce8a6907f7 ("drm/amd/display: Add dsc pre-validation in atomic check") +Signed-off-by: Yussuf Khalil +Reviewed-by: Harry Wentland +Signed-off-by: Alex Deucher +(cherry picked from commit cc7c7121ae082b7b82891baa7280f1ff2608f22b) + +diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +index 085cc98bd875..a9c398b1516b 100644 +--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c ++++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +@@ -12523,6 +12523,11 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev, + } + + if (dc_resource_is_dsc_encoding_supported(dc)) { ++ for_each_oldnew_crtc_in_state(state, crtc, old_crtc_state, new_crtc_state, i) { ++ dm_new_crtc_state = to_dm_crtc_state(new_crtc_state); ++ dm_new_crtc_state->mode_changed_independent_from_dsc = new_crtc_state->mode_changed; ++ } ++ + for_each_oldnew_crtc_in_state(state, crtc, old_crtc_state, new_crtc_state, i) { + if (drm_atomic_crtc_needs_modeset(new_crtc_state)) { + ret = add_affected_mst_dsc_crtcs(state, crtc); +diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h +index 800813671748..d15812d51d72 100644 +--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h ++++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h +@@ -984,6 +984,7 @@ struct dm_crtc_state { + + bool freesync_vrr_info_changed; + ++ bool mode_changed_independent_from_dsc; + bool dsc_force_changed; + bool vrr_supported; + struct mod_freesync_config freesync_config; +diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c +index 7be50e8c0636..5d8c4c7020b1 100644 +--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c ++++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c +@@ -1744,9 +1744,11 @@ int pre_validate_dsc(struct 
drm_atomic_state *state, + int ind = find_crtc_index_in_state_by_stream(state, stream); + + if (ind >= 0) { ++ struct dm_crtc_state *dm_new_crtc_state = to_dm_crtc_state(state->crtcs[ind].new_state); ++ + DRM_INFO_ONCE("%s:%d MST_DSC no mode changed for stream 0x%p\n", + __func__, __LINE__, stream); +- state->crtcs[ind].new_state->mode_changed = 0; ++ dm_new_crtc_state->base.mode_changed = dm_new_crtc_state->mode_changed_independent_from_dsc; + } + } + } +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1315-ipv6-icmp-clear-skb2-cb-in-ip6-err-gen-icmpv6-unreach.patch b/SOURCES/1315-ipv6-icmp-clear-skb2-cb-in-ip6-err-gen-icmpv6-unreach.patch new file mode 100644 index 000000000..8c6c89bdf --- /dev/null +++ b/SOURCES/1315-ipv6-icmp-clear-skb2-cb-in-ip6-err-gen-icmpv6-unreach.patch @@ -0,0 +1,63 @@ +From 0452b6526b2f54b2413b9cb4ff1ea2ac542c99c7 Mon Sep 17 00:00:00 2001 +From: Eric Dumazet +Date: Thu, 26 Mar 2026 20:26:08 +0000 +Subject: [PATCH] ipv6: icmp: clear skb2->cb[] in ip6_err_gen_icmpv6_unreach() + +[ Upstream commit 86ab3e55673a7a49a841838776f1ab18d23a67b5 ] + +Sashiko AI-review observed: + + In ip6_err_gen_icmpv6_unreach(), the skb is an outer IPv4 ICMP error packet + where its cb contains an IPv4 inet_skb_parm. When skb is cloned into skb2 + and passed to icmp6_send(), it uses IP6CB(skb2). + + IP6CB interprets the IPv4 inet_skb_parm as an inet6_skb_parm. The cipso + offset in inet_skb_parm.opt directly overlaps with dsthao in inet6_skb_parm + at offset 18. + + If an attacker sends a forged ICMPv4 error with a CIPSO IP option, dsthao + would be a non-zero offset. Inside icmp6_send(), mip6_addr_swap() is called + and uses ipv6_find_tlv(skb, opt->dsthao, IPV6_TLV_HAO). + + This would scan the inner, attacker-controlled IPv6 packet starting at that + offset, potentially returning a fake TLV without checking if the remaining + packet length can hold the full 18-byte struct ipv6_destopt_hao. 
+ + Could mip6_addr_swap() then perform a 16-byte swap that extends past the end + of the packet data into skb_shared_info? + + Should the cb array also be cleared in ip6_err_gen_icmpv6_unreach() and + ip6ip6_err() to prevent this? + +This patch implements the first suggestion. + +I am not sure if ip6ip6_err() needs to be changed. +A separate patch would be better anyway. + +Fixes: ca15a078bd90 ("sit: generate icmpv6 error when receiving icmpv4 error") +Reported-by: Ido Schimmel +Closes: https://sashiko.dev/#/patchset/20260326155138.2429480-1-edumazet%40google.com +Signed-off-by: Eric Dumazet +Cc: Oskar Kjos +Reviewed-by: Ido Schimmel +Link: https://patch.msgid.link/20260326202608.2976021-1-edumazet@google.com +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin + +diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c +index 8601c76f3cc9..6f053874de74 100644 +--- a/net/ipv6/icmp.c ++++ b/net/ipv6/icmp.c +@@ -674,6 +674,9 @@ int ip6_err_gen_icmpv6_unreach(struct sk_buff *skb, int nhs, int type, + if (!skb2) + return 1; + ++ /* Remove debris left by IPv4 stack. 
*/ ++ memset(IP6CB(skb2), 0, sizeof(*IP6CB(skb2))); ++ + skb_dst_drop(skb2); + skb_pull(skb2, nhs); + skb_reset_network_header(skb2); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1316-alsa-aloop-fix-peer-runtime-uaf-during-format-change-stop.patch b/SOURCES/1316-alsa-aloop-fix-peer-runtime-uaf-during-format-change-stop.patch new file mode 100644 index 000000000..be0a77bc8 --- /dev/null +++ b/SOURCES/1316-alsa-aloop-fix-peer-runtime-uaf-during-format-change-stop.patch @@ -0,0 +1,137 @@ +From dc9c57624e89fab59f90148b663d5171e0fa2416 Mon Sep 17 00:00:00 2001 +From: CKI Backport Bot +Date: Wed, 27 May 2026 17:14:34 +0000 +Subject: [PATCH] ALSA: aloop: Fix peer runtime UAF during format-change stop +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +JIRA: https://redhat.atlassian.net/browse/RHEL-179312 +CVE: CVE-2026-46090 +Backported from tree(s): linux + +commit e5c33cdc6f402eab8abd36ecf436b22c9d3a8aff +Author: Cássio Gabriel +Date: Fri Apr 24 09:48:41 2026 -0300 + + ALSA: aloop: Fix peer runtime UAF during format-change stop + + loopback_check_format() may stop the capture side when playback starts + with parameters that no longer match a running capture stream. Commit + 826af7fa62e3 ("ALSA: aloop: Fix racy access at PCM trigger") moved + the peer lookup under cable->lock, but the actual snd_pcm_stop() still + runs after dropping that lock. + + A concurrent close can clear the capture entry from cable->streams[] and + detach or free its runtime while the playback trigger path still holds a + stale peer substream pointer. + + Keep a per-cable count of in-flight peer stops before dropping + cable->lock, and make free_cable() wait for those stops before + detaching the runtime. This preserves the existing behavior while + making the peer runtime lifetime explicit. 
+ + Reported-by: syzbot+8fa95c41eafbc9d2ff6f@syzkaller.appspotmail.com + Closes: https://syzkaller.appspot.com/bug?extid=8fa95c41eafbc9d2ff6f + Fixes: 597603d615d2 ("ALSA: introduce the snd-aloop module for the PCM loopback") + Cc: stable@vger.kernel.org + Suggested-by: Takashi Iwai + Signed-off-by: Cássio Gabriel + Link: https://patch.msgid.link/20260424-alsa-aloop-peer-stop-uaf-v2-1-94e68101db8a@gmail.com + Signed-off-by: Takashi Iwai + +Signed-off-by: CKI Backport Bot + +diff --git a/sound/drivers/aloop.c b/sound/drivers/aloop.c +index db137222d319..d2b9160a08dd 100644 +--- a/sound/drivers/aloop.c ++++ b/sound/drivers/aloop.c +@@ -99,6 +99,9 @@ struct loopback_ops { + struct loopback_cable { + spinlock_t lock; + struct loopback_pcm *streams[2]; ++ /* in-flight peer stops running outside cable->lock */ ++ atomic_t stop_count; ++ wait_queue_head_t stop_wait; + struct snd_pcm_hardware hw; + /* flags */ + unsigned int valid; +@@ -366,8 +369,11 @@ static int loopback_check_format(struct loopback_cable *cable, int stream) + return 0; + if (stream == SNDRV_PCM_STREAM_CAPTURE) + return -EIO; +- else if (cruntime->state == SNDRV_PCM_STATE_RUNNING) ++ else if (cruntime->state == SNDRV_PCM_STATE_RUNNING) { ++ /* close must not free the peer runtime below */ ++ atomic_inc(&cable->stop_count); + stop_capture = true; ++ } + } + + setup = get_setup(dpcm_play); +@@ -396,8 +402,11 @@ static int loopback_check_format(struct loopback_cable *cable, int stream) + } + } + +- if (stop_capture) ++ if (stop_capture) { + snd_pcm_stop(dpcm_capt->substream, SNDRV_PCM_STATE_DRAINING); ++ if (atomic_dec_and_test(&cable->stop_count)) ++ wake_up(&cable->stop_wait); ++ } + + return 0; + } +@@ -1049,23 +1058,29 @@ static void free_cable(struct snd_pcm_substream *substream) + struct loopback *loopback = substream->private_data; + int dev = get_cable_index(substream); + struct loopback_cable *cable; ++ struct loopback_pcm *dpcm; ++ bool other_alive; + + cable = 
loopback->cables[substream->number][dev]; + if (!cable) + return; +- if (cable->streams[!substream->stream]) { +- /* other stream is still alive */ +- guard(spinlock_irq)(&cable->lock); +- cable->streams[substream->stream] = NULL; +- } else { +- struct loopback_pcm *dpcm = substream->runtime->private_data; + +- if (cable->ops && cable->ops->close_cable && dpcm) +- cable->ops->close_cable(dpcm); +- /* free the cable */ +- loopback->cables[substream->number][dev] = NULL; +- kfree(cable); ++ scoped_guard(spinlock_irq, &cable->lock) { ++ cable->streams[substream->stream] = NULL; ++ other_alive = cable->streams[!substream->stream]; + } ++ ++ /* Pair with the stop_count increment in loopback_check_format(). */ ++ wait_event(cable->stop_wait, !atomic_read(&cable->stop_count)); ++ if (other_alive) ++ return; ++ ++ dpcm = substream->runtime->private_data; ++ if (cable->ops && cable->ops->close_cable && dpcm) ++ cable->ops->close_cable(dpcm); ++ /* free the cable */ ++ loopback->cables[substream->number][dev] = NULL; ++ kfree(cable); + } + + static int loopback_jiffies_timer_open(struct loopback_pcm *dpcm) +@@ -1260,6 +1275,8 @@ static int loopback_open(struct snd_pcm_substream *substream) + goto unlock; + } + spin_lock_init(&cable->lock); ++ atomic_set(&cable->stop_count, 0); ++ init_waitqueue_head(&cable->stop_wait); + cable->hw = loopback_pcm_hardware; + if (loopback->timer_source) + cable->ops = &loopback_snd_timer_ops; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1317-rdma-iwcm-fix-workqueue-list-corruption-by-removing-work-lis.patch b/SOURCES/1317-rdma-iwcm-fix-workqueue-list-corruption-by-removing-work-lis.patch new file mode 100644 index 000000000..addd828d2 --- /dev/null +++ b/SOURCES/1317-rdma-iwcm-fix-workqueue-list-corruption-by-removing-work-lis.patch @@ -0,0 +1,208 @@ +From efd0aa1426972ae0542b15484850fdd73395262f Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Tue, 28 Apr 2026 13:24:18 -0400 +Subject: [PATCH] RDMA/iwcm: Fix workqueue list corruption by 
removing + work_list + +JIRA: https://redhat.atlassian.net/browse/RHEL-163491 + +commit 7874eeacfa42177565c01d5198726671acf7adf2 +Author: Jacob Moroni +Date: Mon Jan 12 02:00:06 2026 +0000 + + RDMA/iwcm: Fix workqueue list corruption by removing work_list + + The commit e1168f0 ("RDMA/iwcm: Simplify cm_event_handler()") + changed the work submission logic to unconditionally call + queue_work() with the expectation that queue_work() would + have no effect if work was already pending. The problem is + that a free list of struct iwcm_work is used (for which + struct work_struct is embedded), so each call to queue_work() + is basically unique and therefore does indeed queue the work. + + This causes a problem in the work handler which walks the work_list + until it's empty to process entries. This means that a single + run of the work handler could process item N+1 and release it + back to the free list while the actual workqueue entry is still + queued. It could then get reused (INIT_WORK...) and lead to + list corruption in the workqueue logic. + + Fix this by just removing the work_list. The workqueue already + does this for us. + + This fixes the following error that was observed when stress + testing with ucmatose on an Intel E830 in iWARP mode: + + [ 151.465780] list_del corruption. next->prev should be ffff9f0915c69c08, but was ffff9f0a1116be08. (next=ffff9f0a15b11c08) + [ 151.466639] ------------[ cut here ]------------ + [ 151.466986] kernel BUG at lib/list_debug.c:67! 
+ [ 151.467349] Oops: invalid opcode: 0000 [#1] SMP NOPTI + [ 151.467753] CPU: 14 UID: 0 PID: 2306 Comm: kworker/u64:18 Not tainted 6.19.0-rc4+ #1 PREEMPT(voluntary) + [ 151.468466] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 + [ 151.469192] Workqueue: 0x0 (iw_cm_wq) + [ 151.469478] RIP: 0010:__list_del_entry_valid_or_report+0xf0/0x100 + [ 151.469942] Code: c7 58 5f 4c b2 e8 10 50 aa ff 0f 0b 48 89 ef e8 36 57 cb ff 48 8b 55 08 48 89 e9 48 89 de 48 c7 c7 a8 5f 4c b2 e8 f0 4f aa ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 90 90 90 90 90 90 + [ 151.471323] RSP: 0000:ffffb15644e7bd68 EFLAGS: 00010046 + [ 151.471712] RAX: 000000000000006d RBX: ffff9f0915c69c08 RCX: 0000000000000027 + [ 151.472243] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9f0a37d9c600 + [ 151.472768] RBP: ffff9f0a15b11c08 R08: 0000000000000000 R09: c0000000ffff7fff + [ 151.473294] R10: 0000000000000001 R11: ffffb15644e7bba8 R12: ffff9f092339ee68 + [ 151.473817] R13: ffff9f0900059c28 R14: ffff9f092339ee78 R15: 0000000000000000 + [ 151.474344] FS: 0000000000000000(0000) GS:ffff9f0a847b5000(0000) knlGS:0000000000000000 + [ 151.474934] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 + [ 151.475362] CR2: 0000559e233a9088 CR3: 000000020296b004 CR4: 0000000000770ef0 + [ 151.475895] PKRU: 55555554 + [ 151.476118] Call Trace: + [ 151.476331] + [ 151.476497] move_linked_works+0x49/0xa0 + [ 151.476792] __pwq_activate_work.isra.46+0x2f/0xa0 + [ 151.477151] pwq_dec_nr_in_flight+0x1e0/0x2f0 + [ 151.477479] process_scheduled_works+0x1c8/0x410 + [ 151.477823] worker_thread+0x125/0x260 + [ 151.478108] ? __pfx_worker_thread+0x10/0x10 + [ 151.478430] kthread+0xfe/0x240 + [ 151.478671] ? __pfx_kthread+0x10/0x10 + [ 151.478955] ? __pfx_kthread+0x10/0x10 + [ 151.479240] ret_from_fork+0x208/0x270 + [ 151.479523] ? 
__pfx_kthread+0x10/0x10 + [ 151.479806] ret_from_fork_asm+0x1a/0x30 + [ 151.480103] + + Fixes: e1168f09b331 ("RDMA/iwcm: Simplify cm_event_handler()") + Signed-off-by: Jacob Moroni + Link: https://patch.msgid.link/20260112020006.1352438-1-jmoroni@google.com + Reviewed-by: Bart Van Assche + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c +index 9419ab4435df..a2cf6135fcde 100644 +--- a/drivers/infiniband/core/iwcm.c ++++ b/drivers/infiniband/core/iwcm.c +@@ -95,7 +95,6 @@ static struct workqueue_struct *iwcm_wq; + struct iwcm_work { + struct work_struct work; + struct iwcm_id_private *cm_id; +- struct list_head list; + struct iw_cm_event event; + struct list_head free_list; + }; +@@ -179,7 +178,6 @@ static int alloc_work_entries(struct iwcm_id_private *cm_id_priv, int count) + return -ENOMEM; + } + work->cm_id = cm_id_priv; +- INIT_LIST_HEAD(&work->list); + put_work(work); + } + return 0; +@@ -214,7 +212,6 @@ static void free_cm_id(struct iwcm_id_private *cm_id_priv) + static bool iwcm_deref_id(struct iwcm_id_private *cm_id_priv) + { + if (refcount_dec_and_test(&cm_id_priv->refcount)) { +- BUG_ON(!list_empty(&cm_id_priv->work_list)); + free_cm_id(cm_id_priv); + return true; + } +@@ -261,7 +258,6 @@ struct iw_cm_id *iw_create_cm_id(struct ib_device *device, + refcount_set(&cm_id_priv->refcount, 1); + init_waitqueue_head(&cm_id_priv->connect_wait); + init_completion(&cm_id_priv->destroy_comp); +- INIT_LIST_HEAD(&cm_id_priv->work_list); + INIT_LIST_HEAD(&cm_id_priv->work_free_list); + + return &cm_id_priv->id; +@@ -1008,13 +1004,13 @@ static int process_event(struct iwcm_id_private *cm_id_priv, + } + + /* +- * Process events on the work_list for the cm_id. If the callback +- * function requests that the cm_id be deleted, a flag is set in the +- * cm_id flags to indicate that when the last reference is +- * removed, the cm_id is to be destroyed. 
This is necessary to +- * distinguish between an object that will be destroyed by the app +- * thread asleep on the destroy_comp list vs. an object destroyed +- * here synchronously when the last reference is removed. ++ * Process events for the cm_id. If the callback function requests ++ * that the cm_id be deleted, a flag is set in the cm_id flags to ++ * indicate that when the last reference is removed, the cm_id is ++ * to be destroyed. This is necessary to distinguish between an ++ * object that will be destroyed by the app thread asleep on the ++ * destroy_comp list vs. an object destroyed here synchronously ++ * when the last reference is removed. + */ + static void cm_work_handler(struct work_struct *_work) + { +@@ -1025,35 +1021,26 @@ static void cm_work_handler(struct work_struct *_work) + int ret = 0; + + spin_lock_irqsave(&cm_id_priv->lock, flags); +- while (!list_empty(&cm_id_priv->work_list)) { +- work = list_first_entry(&cm_id_priv->work_list, +- struct iwcm_work, list); +- list_del_init(&work->list); +- levent = work->event; +- put_work(work); +- spin_unlock_irqrestore(&cm_id_priv->lock, flags); +- +- if (!test_bit(IWCM_F_DROP_EVENTS, &cm_id_priv->flags)) { +- ret = process_event(cm_id_priv, &levent); +- if (ret) { +- destroy_cm_id(&cm_id_priv->id); +- WARN_ON_ONCE(iwcm_deref_id(cm_id_priv)); +- } +- } else +- pr_debug("dropping event %d\n", levent.event); +- if (iwcm_deref_id(cm_id_priv)) +- return; +- spin_lock_irqsave(&cm_id_priv->lock, flags); +- } ++ levent = work->event; ++ put_work(work); + spin_unlock_irqrestore(&cm_id_priv->lock, flags); ++ ++ if (!test_bit(IWCM_F_DROP_EVENTS, &cm_id_priv->flags)) { ++ ret = process_event(cm_id_priv, &levent); ++ if (ret) { ++ destroy_cm_id(&cm_id_priv->id); ++ WARN_ON_ONCE(iwcm_deref_id(cm_id_priv)); ++ } ++ } else ++ pr_debug("dropping event %d\n", levent.event); ++ if (iwcm_deref_id(cm_id_priv)) ++ return; + } + + /* + * This function is called on interrupt context. 
Schedule events on + * the iwcm_wq thread to allow callback functions to downcall into +- * the CM and/or block. Events are queued to a per-CM_ID +- * work_list. If this is the first event on the work_list, the work +- * element is also queued on the iwcm_wq thread. ++ * the CM and/or block. + * + * Each event holds a reference on the cm_id. Until the last posted + * event has been delivered and processed, the cm_id cannot be +@@ -1095,7 +1082,6 @@ static int cm_event_handler(struct iw_cm_id *cm_id, + } + + refcount_inc(&cm_id_priv->refcount); +- list_add_tail(&work->list, &cm_id_priv->work_list); + queue_work(iwcm_wq, &work->work); + out: + spin_unlock_irqrestore(&cm_id_priv->lock, flags); +diff --git a/drivers/infiniband/core/iwcm.h b/drivers/infiniband/core/iwcm.h +index bf74639be128..b56fb12edece 100644 +--- a/drivers/infiniband/core/iwcm.h ++++ b/drivers/infiniband/core/iwcm.h +@@ -50,7 +50,6 @@ struct iwcm_id_private { + struct ib_qp *qp; + struct completion destroy_comp; + wait_queue_head_t connect_wait; +- struct list_head work_list; + spinlock_t lock; + refcount_t refcount; + struct list_head work_free_list; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1318-binder-use-cred-instead-of-task-for-selinux-checks.patch b/SOURCES/1318-binder-use-cred-instead-of-task-for-selinux-checks.patch new file mode 100644 index 000000000..739b88315 --- /dev/null +++ b/SOURCES/1318-binder-use-cred-instead-of-task-for-selinux-checks.patch @@ -0,0 +1,332 @@ +From 3a8d8e4f0e85b288fc43ec65f8ffac858af16239 Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Wed, 3 Jun 2026 15:16:45 +0200 +Subject: [PATCH] binder: use cred instead of task for selinux checks + +JIRA: https://issues.redhat.com/browse/RHEL-179443 +CVE: CVE-2026-46054 + +commit 52f88693378a58094c538662ba652aff0253c4fe +Author: Todd Kjos +Date: Tue Oct 12 09:56:13 2021 -0700 + + binder: use cred instead of task for selinux checks + + Since binder was integrated with selinux, it has passed + 'struct 
task_struct' associated with the binder_proc + to represent the source and target of transactions. + The conversion of task to SID was then done in the hook + implementations. It turns out that there are race conditions + which can result in an incorrect security context being used. + + Fix by using the 'struct cred' saved during binder_open and pass + it to the selinux subsystem. + + Cc: stable@vger.kernel.org # 5.14 (need backport for earlier stables) + Fixes: 79af73079d75 ("Add security hooks to binder and implement the hooks for SELinux.") + Suggested-by: Jann Horn + Signed-off-by: Todd Kjos + Acked-by: Casey Schaufler + Signed-off-by: Paul Moore + +Signed-off-by: Ondrej Mosnacek + +diff --git a/drivers/android/binder.c b/drivers/android/binder.c +index 4ef4e2dc47cb..3a01e1862d9e 100644 +--- a/drivers/android/binder.c ++++ b/drivers/android/binder.c +@@ -2049,7 +2049,7 @@ static int binder_translate_binder(struct flat_binder_object *fp, + ret = -EINVAL; + goto done; + } +- if (security_binder_transfer_binder(proc->tsk, target_proc->tsk)) { ++ if (security_binder_transfer_binder(proc->cred, target_proc->cred)) { + ret = -EPERM; + goto done; + } +@@ -2095,7 +2095,7 @@ static int binder_translate_handle(struct flat_binder_object *fp, + proc->pid, thread->pid, fp->handle); + return -EINVAL; + } +- if (security_binder_transfer_binder(proc->tsk, target_proc->tsk)) { ++ if (security_binder_transfer_binder(proc->cred, target_proc->cred)) { + ret = -EPERM; + goto done; + } +@@ -2183,7 +2183,7 @@ static int binder_translate_fd(u32 fd, binder_size_t fd_offset, + ret = -EBADF; + goto err_fget; + } +- ret = security_binder_transfer_file(proc->tsk, target_proc->tsk, file); ++ ret = security_binder_transfer_file(proc->cred, target_proc->cred, file); + if (ret < 0) { + ret = -EPERM; + goto err_security; +@@ -2588,8 +2588,8 @@ static void binder_transaction(struct binder_proc *proc, + return_error_line = __LINE__; + goto err_invalid_target_handle; + } +- if 
(security_binder_transaction(proc->tsk, +- target_proc->tsk) < 0) { ++ if (security_binder_transaction(proc->cred, ++ target_proc->cred) < 0) { + return_error = BR_FAILED_REPLY; + return_error_param = -EPERM; + return_error_line = __LINE__; +@@ -4554,7 +4554,7 @@ static int binder_ioctl_set_ctx_mgr(struct file *filp, + ret = -EBUSY; + goto out; + } +- ret = security_binder_set_context_mgr(proc->tsk); ++ ret = security_binder_set_context_mgr(proc->cred); + if (ret < 0) + goto out; + if (uid_valid(context->binder_context_mgr_uid)) { +diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h +index fc89fae1ea60..1c2be7057bd9 100644 +--- a/include/linux/lsm_hook_defs.h ++++ b/include/linux/lsm_hook_defs.h +@@ -26,13 +26,13 @@ + * #undef LSM_HOOK + * }; + */ +-LSM_HOOK(int, 0, binder_set_context_mgr, struct task_struct *mgr) +-LSM_HOOK(int, 0, binder_transaction, struct task_struct *from, +- struct task_struct *to) +-LSM_HOOK(int, 0, binder_transfer_binder, struct task_struct *from, +- struct task_struct *to) +-LSM_HOOK(int, 0, binder_transfer_file, struct task_struct *from, +- struct task_struct *to, struct file *file) ++LSM_HOOK(int, 0, binder_set_context_mgr, const struct cred *mgr) ++LSM_HOOK(int, 0, binder_transaction, const struct cred *from, ++ const struct cred *to) ++LSM_HOOK(int, 0, binder_transfer_binder, const struct cred *from, ++ const struct cred *to) ++LSM_HOOK(int, 0, binder_transfer_file, const struct cred *from, ++ const struct cred *to, struct file *file) + LSM_HOOK(int, 0, ptrace_access_check, struct task_struct *child, + unsigned int mode) + LSM_HOOK(int, 0, ptrace_traceme, struct task_struct *parent) +diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h +index 3f04476cc692..7577ecfc79e4 100644 +--- a/include/linux/lsm_hooks.h ++++ b/include/linux/lsm_hooks.h +@@ -1330,22 +1330,22 @@ + * + * @binder_set_context_mgr: + * Check whether @mgr is allowed to be the binder context manager. 
+- * @mgr contains the task_struct for the task being registered. ++ * @mgr contains the struct cred for the current binder process. + * Return 0 if permission is granted. + * @binder_transaction: + * Check whether @from is allowed to invoke a binder transaction call + * to @to. +- * @from contains the task_struct for the sending task. +- * @to contains the task_struct for the receiving task. ++ * @from contains the struct cred for the sending process. ++ * @to contains the struct cred for the receiving process. + * @binder_transfer_binder: + * Check whether @from is allowed to transfer a binder reference to @to. +- * @from contains the task_struct for the sending task. +- * @to contains the task_struct for the receiving task. ++ * @from contains the struct cred for the sending process. ++ * @to contains the struct cred for the receiving process. + * @binder_transfer_file: + * Check whether @from is allowed to transfer @file to @to. +- * @from contains the task_struct for the sending task. ++ * @from contains the struct cred for the sending process. + * @file contains the struct file being transferred. +- * @to contains the task_struct for the receiving task. ++ * @to contains the struct cred for the receiving process. 
+ * + * @ptrace_access_check: + * Check permission before allowing the current process to trace the +diff --git a/include/linux/security.h b/include/linux/security.h +index 16f44e78b7e6..3d216c94fd69 100644 +--- a/include/linux/security.h ++++ b/include/linux/security.h +@@ -263,13 +263,13 @@ extern int security_init(void); + extern int early_security_init(void); + + /* Security operations */ +-int security_binder_set_context_mgr(struct task_struct *mgr); +-int security_binder_transaction(struct task_struct *from, +- struct task_struct *to); +-int security_binder_transfer_binder(struct task_struct *from, +- struct task_struct *to); +-int security_binder_transfer_file(struct task_struct *from, +- struct task_struct *to, struct file *file); ++int security_binder_set_context_mgr(const struct cred *mgr); ++int security_binder_transaction(const struct cred *from, ++ const struct cred *to); ++int security_binder_transfer_binder(const struct cred *from, ++ const struct cred *to); ++int security_binder_transfer_file(const struct cred *from, ++ const struct cred *to, struct file *file); + int security_ptrace_access_check(struct task_struct *child, unsigned int mode); + int security_ptrace_traceme(struct task_struct *parent); + int security_capget(struct task_struct *target, +@@ -520,25 +520,25 @@ static inline int early_security_init(void) + return 0; + } + +-static inline int security_binder_set_context_mgr(struct task_struct *mgr) ++static inline int security_binder_set_context_mgr(const struct cred *mgr) + { + return 0; + } + +-static inline int security_binder_transaction(struct task_struct *from, +- struct task_struct *to) ++static inline int security_binder_transaction(const struct cred *from, ++ const struct cred *to) + { + return 0; + } + +-static inline int security_binder_transfer_binder(struct task_struct *from, +- struct task_struct *to) ++static inline int security_binder_transfer_binder(const struct cred *from, ++ const struct cred *to) + { + return 0; + } + 
+-static inline int security_binder_transfer_file(struct task_struct *from, +- struct task_struct *to, ++static inline int security_binder_transfer_file(const struct cred *from, ++ const struct cred *to, + struct file *file) + { + return 0; +diff --git a/security/security.c b/security/security.c +index 5660bbab9845..2092b657af9f 100644 +--- a/security/security.c ++++ b/security/security.c +@@ -887,25 +887,25 @@ OUT: \ + + /* Security operations */ + +-int security_binder_set_context_mgr(struct task_struct *mgr) ++int security_binder_set_context_mgr(const struct cred *mgr) + { + return call_int_hook(binder_set_context_mgr, mgr); + } + +-int security_binder_transaction(struct task_struct *from, +- struct task_struct *to) ++int security_binder_transaction(const struct cred *from, ++ const struct cred *to) + { + return call_int_hook(binder_transaction, from, to); + } + +-int security_binder_transfer_binder(struct task_struct *from, +- struct task_struct *to) ++int security_binder_transfer_binder(const struct cred *from, ++ const struct cred *to) + { + return call_int_hook(binder_transfer_binder, from, to); + } + +-int security_binder_transfer_file(struct task_struct *from, +- struct task_struct *to, struct file *file) ++int security_binder_transfer_file(const struct cred *from, ++ const struct cred *to, struct file *file) + { + return call_int_hook(binder_transfer_file, from, to, file); + } +diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c +index 9a69a2a4b31d..22173e8e88e2 100644 +--- a/security/selinux/hooks.c ++++ b/security/selinux/hooks.c +@@ -245,29 +245,6 @@ static inline u32 task_sid_obj(const struct task_struct *task) + return sid; + } + +-/* +- * get the security ID of a task for use with binder +- */ +-static inline u32 task_sid_binder(const struct task_struct *task) +-{ +- /* +- * In many case where this function is used we should be using the +- * task's subjective SID, but we can't reliably access the subjective +- * creds of a task other 
than our own so we must use the objective +- * creds/SID, which are safe to access. The downside is that if a task +- * is temporarily overriding it's creds it will not be reflected here; +- * however, it isn't clear that binder would handle that case well +- * anyway. +- * +- * If this ever changes and we can safely reference the subjective +- * creds/SID of another task, this function will make it easier to +- * identify the various places where we make use of the task SIDs in +- * the binder code. It is also likely that we will need to adjust +- * the main drivers/android binder code as well. +- */ +- return task_sid_obj(task); +-} +- + static int inode_doinit_with_dentry(struct inode *inode, struct dentry *opt_dentry); + + /* +@@ -2039,18 +2016,19 @@ static inline u32 open_file_to_av(struct file *file) + + /* Hook functions begin here. */ + +-static int selinux_binder_set_context_mgr(struct task_struct *mgr) ++static int selinux_binder_set_context_mgr(const struct cred *mgr) + { + return avc_has_perm(&selinux_state, +- current_sid(), task_sid_binder(mgr), SECCLASS_BINDER, ++ current_sid(), cred_sid(mgr), SECCLASS_BINDER, + BINDER__SET_CONTEXT_MGR, NULL); + } + +-static int selinux_binder_transaction(struct task_struct *from, +- struct task_struct *to) ++static int selinux_binder_transaction(const struct cred *from, ++ const struct cred *to) + { + u32 mysid = current_sid(); +- u32 fromsid = task_sid_binder(from); ++ u32 fromsid = cred_sid(from); ++ u32 tosid = cred_sid(to); + int rc; + + if (mysid != fromsid) { +@@ -2061,24 +2039,24 @@ static int selinux_binder_transaction(struct task_struct *from, + return rc; + } + +- return avc_has_perm(&selinux_state, fromsid, task_sid_binder(to), ++ return avc_has_perm(&selinux_state, fromsid, tosid, + SECCLASS_BINDER, BINDER__CALL, NULL); + } + +-static int selinux_binder_transfer_binder(struct task_struct *from, +- struct task_struct *to) ++static int selinux_binder_transfer_binder(const struct cred *from, ++ const struct 
cred *to) + { + return avc_has_perm(&selinux_state, +- task_sid_binder(from), task_sid_binder(to), ++ cred_sid(from), cred_sid(to), + SECCLASS_BINDER, BINDER__TRANSFER, + NULL); + } + +-static int selinux_binder_transfer_file(struct task_struct *from, +- struct task_struct *to, ++static int selinux_binder_transfer_file(const struct cred *from, ++ const struct cred *to, + struct file *file) + { +- u32 sid = task_sid_binder(to); ++ u32 sid = cred_sid(to); + struct file_security_struct *fsec = selinux_file(file); + struct dentry *dentry = file->f_path.dentry; + struct inode_security_struct *isec; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1319-locks-fix-toctou-race-when-granting-write-lease.patch b/SOURCES/1319-locks-fix-toctou-race-when-granting-write-lease.patch new file mode 100644 index 000000000..b96d88b13 --- /dev/null +++ b/SOURCES/1319-locks-fix-toctou-race-when-granting-write-lease.patch @@ -0,0 +1,115 @@ +From 16ac6be5ae309b2c31c37898511025294dcadcc8 Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Wed, 3 Jun 2026 10:30:04 +0200 +Subject: [PATCH] locks: fix TOCTOU race when granting write lease + +JIRA: https://issues.redhat.com/browse/RHEL-179443 +CVE: CVE-2026-46054 + +commit d6da19c9cace63290ccfccb1fc35151ffefc0bec +Author: Amir Goldstein +Date: Tue Aug 16 17:53:17 2022 +0300 + + locks: fix TOCTOU race when granting write lease + + Thread A trying to acquire a write lease checks the value of i_readcount + and i_writecount in check_conflicting_open() to verify that its own fd + is the only fd referencing the file. + + Thread B trying to open the file for read will call break_lease() in + do_dentry_open() before incrementing i_readcount, which leaves a small + window where thread A can acquire the write lease and then thread B + completes the open of the file for read without breaking the write lease + that was acquired by thread A. 
+ + Fix this race by incrementing i_readcount before checking for existing + leases, same as the case with i_writecount. + + Use a helper put_file_access() to decrement i_readcount or i_writecount + in do_dentry_open() and __fput(). + + Fixes: 387e3746d01c ("locks: eliminate false positive conflicts for write lease") + Reviewed-by: Jeff Layton + Signed-off-by: Amir Goldstein + Signed-off-by: Al Viro + +Signed-off-by: Ondrej Mosnacek + +diff --git a/fs/file_table.c b/fs/file_table.c +index cdc1dea33154..845c741dc518 100644 +--- a/fs/file_table.c ++++ b/fs/file_table.c +@@ -366,12 +366,7 @@ static void __fput(struct file *file) + } + fops_put(file->f_op); + put_pid(file->f_owner.pid); +- if ((mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) +- i_readcount_dec(inode); +- if (mode & FMODE_WRITER) { +- put_write_access(inode); +- __mnt_drop_write(mnt); +- } ++ put_file_access(file); + dput(dentry); + if (unlikely(mode & FMODE_NEED_UNMOUNT)) + dissolve_on_fput(mnt); +diff --git a/fs/internal.h b/fs/internal.h +index 3e8dbf777ce2..c3701d285c69 100644 +--- a/fs/internal.h ++++ b/fs/internal.h +@@ -96,6 +96,16 @@ extern void chroot_fs_refs(const struct path *, const struct path *); + extern struct file *alloc_empty_file(int, const struct cred *); + extern struct file *alloc_empty_file_noaccount(int, const struct cred *); + ++static inline void put_file_access(struct file *file) ++{ ++ if ((file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) { ++ i_readcount_dec(file->f_inode); ++ } else if (file->f_mode & FMODE_WRITER) { ++ put_write_access(file->f_inode); ++ __mnt_drop_write(file->f_path.mnt); ++ } ++} ++ + /* + * super.c + */ +diff --git a/fs/open.c b/fs/open.c +index 51052202ecdc..a84909d62168 100644 +--- a/fs/open.c ++++ b/fs/open.c +@@ -861,7 +861,9 @@ static int do_dentry_open(struct file *f, + return 0; + } + +- if (f->f_mode & FMODE_WRITE && !special_file(inode->i_mode)) { ++ if ((f->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) { ++ 
i_readcount_inc(inode); ++ } else if (f->f_mode & FMODE_WRITE && !special_file(inode->i_mode)) { + error = get_write_access(inode); + if (unlikely(error)) + goto cleanup_file; +@@ -901,8 +903,6 @@ static int do_dentry_open(struct file *f, + goto cleanup_all; + } + f->f_mode |= FMODE_OPENED; +- if ((f->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) +- i_readcount_inc(inode); + if ((f->f_mode & FMODE_READ) && + likely(f->f_op->read || f->f_op->read_iter)) + f->f_mode |= FMODE_CAN_READ; +@@ -948,10 +948,7 @@ static int do_dentry_open(struct file *f, + if (WARN_ON_ONCE(error > 0)) + error = -EINVAL; + fops_put(f->f_op); +- if (f->f_mode & FMODE_WRITER) { +- put_write_access(inode); +- __mnt_drop_write(f->f_path.mnt); +- } ++ put_file_access(f); + cleanup_file: + path_put(&f->f_path); + f->f_path.mnt = NULL; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1320-fs-use-a-helper-for-opening-kernel-internal-files.patch b/SOURCES/1320-fs-use-a-helper-for-opening-kernel-internal-files.patch new file mode 100644 index 000000000..09ee79bda --- /dev/null +++ b/SOURCES/1320-fs-use-a-helper-for-opening-kernel-internal-files.patch @@ -0,0 +1,122 @@ +From f3196ef468589e50d09ad0edc6d2874c267418e2 Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Wed, 3 Jun 2026 09:46:36 +0200 +Subject: [PATCH] fs: use a helper for opening kernel internal files + +JIRA: https://issues.redhat.com/browse/RHEL-179443 +CVE: CVE-2026-46054 +Conflicts: + - include/linux/fs.h: context fuzz + - fs/overlayfs/util.c: previous backport introduced an + open_with_fake_path() caller that also needs to be renamed here + +commit cbb0b9d4bbcfa96e7872808a63be03202536f1bc +Author: Amir Goldstein +Date: Thu Jun 15 14:22:26 2023 +0300 + + fs: use a helper for opening kernel internal files + + cachefiles uses kernel_open_tmpfile() to open kernel internal tmpfile + without accounting for nr_files. + + cachefiles uses open_with_fake_path() for the same reason without the + need for a fake path. 
+ + Fork open_with_fake_path() to kernel_file_open() which only does the + noaccount part and use it in cachefiles. + + Signed-off-by: Amir Goldstein + Reviewed-by: Christoph Hellwig + Message-Id: <20230615112229.2143178-3-amir73il@gmail.com> + Signed-off-by: Christian Brauner + +Signed-off-by: Ondrej Mosnacek + +diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c +index 6f5c59baec08..bc2bb2001318 100644 +--- a/fs/cachefiles/namei.c ++++ b/fs/cachefiles/namei.c +@@ -560,8 +560,8 @@ static bool cachefiles_open_file(struct cachefiles_object *object, + */ + path.mnt = cache->mnt; + path.dentry = dentry; +- file = open_with_fake_path(&path, O_RDWR | O_LARGEFILE | O_DIRECT, +- d_backing_inode(dentry), cache->cache_cred); ++ file = kernel_file_open(&path, O_RDWR | O_LARGEFILE | O_DIRECT, ++ d_backing_inode(dentry), cache->cache_cred); + if (IS_ERR(file)) { + trace_cachefiles_vfs_error(object, d_backing_inode(dentry), + PTR_ERR(file), +diff --git a/fs/open.c b/fs/open.c +index a84909d62168..3eac96e10eb0 100644 +--- a/fs/open.c ++++ b/fs/open.c +@@ -1089,6 +1089,39 @@ struct file *dentry_create(const struct path *path, int flags, umode_t mode, + } + EXPORT_SYMBOL(dentry_create); + ++/** ++ * kernel_file_open - open a file for kernel internal use ++ * @path: path of the file to open ++ * @flags: open flags ++ * @inode: the inode ++ * @cred: credentials for open ++ * ++ * Open a file for use by in-kernel consumers. The file is not accounted ++ * against nr_files and must not be installed into the file descriptor ++ * table. ++ * ++ * Return: Opened file on success, an error pointer on failure. 
++ */ ++struct file *kernel_file_open(const struct path *path, int flags, ++ struct inode *inode, const struct cred *cred) ++{ ++ struct file *f; ++ int error; ++ ++ f = alloc_empty_file_noaccount(flags, cred); ++ if (IS_ERR(f)) ++ return f; ++ ++ f->f_path = *path; ++ error = do_dentry_open(f, inode, NULL); ++ if (error) { ++ fput(f); ++ f = ERR_PTR(error); ++ } ++ return f; ++} ++EXPORT_SYMBOL_GPL(kernel_file_open); ++ + struct file *open_with_fake_path(const struct path *path, int flags, + struct inode *inode, const struct cred *cred) + { +diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c +index 2cdbb70d2b5d..6b31b6587e4d 100644 +--- a/fs/overlayfs/util.c ++++ b/fs/overlayfs/util.c +@@ -1370,7 +1370,7 @@ int ovl_ensure_verity_loaded(struct path *datapath) + * If this inode was not yet opened, the verity info hasn't been + * loaded yet, so we need to do that here to force it into memory. + */ +- filp = open_with_fake_path(datapath, O_RDONLY, inode, current_cred()); ++ filp = kernel_file_open(datapath, O_RDONLY, inode, current_cred()); + if (IS_ERR(filp)) + return PTR_ERR(filp); + fput(filp); +diff --git a/include/linux/fs.h b/include/linux/fs.h +index be94651061c1..363cdadb04ba 100644 +--- a/include/linux/fs.h ++++ b/include/linux/fs.h +@@ -1833,6 +1833,8 @@ static inline int vfs_whiteout(struct mnt_idmap *idmap, + struct file *vfs_tmpfile_open(struct mnt_idmap *idmap, + const struct path *parentpath, + umode_t mode, int open_flag, const struct cred *cred); ++struct file *kernel_file_open(const struct path *path, int flags, ++ struct inode *inode, const struct cred *cred); + + int vfs_mkobj(struct dentry *, umode_t, + int (*f)(struct dentry *, umode_t, void *), +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1321-fs-move-kmem-cache-zalloc-into-alloc-empty-file-helpers.patch b/SOURCES/1321-fs-move-kmem-cache-zalloc-into-alloc-empty-file-helpers.patch new file mode 100644 index 000000000..5feb059f4 --- /dev/null +++ 
b/SOURCES/1321-fs-move-kmem-cache-zalloc-into-alloc-empty-file-helpers.patch @@ -0,0 +1,124 @@ +From f226635eb71a0c5f680f89a64ec6332e5b2f8ee7 Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Wed, 3 Jun 2026 15:22:00 +0200 +Subject: [PATCH] fs: move kmem_cache_zalloc() into alloc_empty_file*() helpers + +JIRA: https://issues.redhat.com/browse/RHEL-179443 +CVE: CVE-2026-46054 +Conflicts: + - only context fuzz + +commit 8a05a8c31d06c5d0d67b273a4a00f87269adde82 +Author: Amir Goldstein +Date: Thu Jun 15 14:22:27 2023 +0300 + + fs: move kmem_cache_zalloc() into alloc_empty_file*() helpers + + Use a common helper init_file() instead of __alloc_file() for + alloc_empty_file*() helpers and improrve the documentation. + + This is needed for a follow up patch that allocates a backing_file + container. + + Suggested-by: Christoph Hellwig + Signed-off-by: Amir Goldstein + Reviewed-by: Christoph Hellwig + Message-Id: <20230615112229.2143178-4-amir73il@gmail.com> + Signed-off-by: Christian Brauner + +Signed-off-by: Ondrej Mosnacek + +diff --git a/fs/file_table.c b/fs/file_table.c +index 845c741dc518..9fee3de138d6 100644 +--- a/fs/file_table.c ++++ b/fs/file_table.c +@@ -131,20 +131,15 @@ static int __init init_fs_stat_sysctls(void) + fs_initcall(init_fs_stat_sysctls); + #endif + +-static struct file *__alloc_file(int flags, const struct cred *cred) ++static int init_file(struct file *f, int flags, const struct cred *cred) + { +- struct file *f; + int error; + +- f = kmem_cache_zalloc(filp_cachep, GFP_KERNEL); +- if (unlikely(!f)) +- return ERR_PTR(-ENOMEM); +- + f->f_cred = get_cred(cred); + error = security_file_alloc(f); + if (unlikely(error)) { + file_free_rcu(&f->f_u.fu_rcuhead); +- return ERR_PTR(error); ++ return error; + } + + atomic_long_set(&f->f_count, 1); +@@ -155,7 +150,7 @@ static struct file *__alloc_file(int flags, const struct cred *cred) + f->f_mode = OPEN_FMODE(flags); + /* f->f_version: 0 */ + +- return f; ++ return 0; + } + + /* Find an unused file 
structure and return a pointer to it. +@@ -172,6 +167,7 @@ struct file *alloc_empty_file(int flags, const struct cred *cred) + { + static long old_max; + struct file *f; ++ int error; + + /* + * Privileged users can go above max_files +@@ -185,9 +181,15 @@ struct file *alloc_empty_file(int flags, const struct cred *cred) + goto over; + } + +- f = __alloc_file(flags, cred); +- if (!IS_ERR(f)) +- percpu_counter_inc(&nr_files); ++ f = kmem_cache_zalloc(filp_cachep, GFP_KERNEL); ++ if (unlikely(!f)) ++ return ERR_PTR(-ENOMEM); ++ ++ error = init_file(f, flags, cred); ++ if (unlikely(error)) ++ return ERR_PTR(error); ++ ++ percpu_counter_inc(&nr_files); + + return f; + +@@ -203,14 +205,23 @@ struct file *alloc_empty_file(int flags, const struct cred *cred) + /* + * Variant of alloc_empty_file() that doesn't check and modify nr_files. + * +- * Should not be used unless there's a very good reason to do so. ++ * This is only for kernel internal use, and the allocate file must not be ++ * installed into file tables or such. 
+ */ + struct file *alloc_empty_file_noaccount(int flags, const struct cred *cred) + { +- struct file *f = __alloc_file(flags, cred); ++ struct file *f; ++ int error; ++ ++ f = kmem_cache_zalloc(filp_cachep, GFP_KERNEL); ++ if (unlikely(!f)) ++ return ERR_PTR(-ENOMEM); ++ ++ error = init_file(f, flags, cred); ++ if (unlikely(error)) ++ return ERR_PTR(error); + +- if (!IS_ERR(f)) +- f->f_mode |= FMODE_NOACCOUNT; ++ f->f_mode |= FMODE_NOACCOUNT; + + return f; + } +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1322-fs-use-backing-file-container-for-internal-files-with-fake-f.patch b/SOURCES/1322-fs-use-backing-file-container-for-internal-files-with-fake-f.patch new file mode 100644 index 000000000..7832f4463 --- /dev/null +++ b/SOURCES/1322-fs-use-backing-file-container-for-internal-files-with-fake-f.patch @@ -0,0 +1,253 @@ +From 9e62303890ee2ae993e8c78bb442176d2467e927 Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Wed, 3 Jun 2026 09:49:05 +0200 +Subject: [PATCH] fs: use backing_file container for internal files with "fake" + f_path + +JIRA: https://issues.redhat.com/browse/RHEL-179443 +CVE: CVE-2026-46054 + +commit 62d53c4a1dfe347bd87ede46ffad38c9a3870338 +Author: Amir Goldstein +Date: Thu Jun 15 14:22:28 2023 +0300 + + fs: use backing_file container for internal files with "fake" f_path + + Overlayfs uses open_with_fake_path() to allocate internal kernel files, + with a "fake" path - whose f_path is not on the same fs as f_inode. + + Allocate a container struct backing_file for those internal files, that + is used to hold the "fake" ovl path along with the real path. + + backing_file_real_path() can be used to access the stored real path. 
+ + Signed-off-by: Amir Goldstein + Message-Id: <20230615112229.2143178-5-amir73il@gmail.com> + Signed-off-by: Christian Brauner + +Signed-off-by: Ondrej Mosnacek + +diff --git a/fs/file_table.c b/fs/file_table.c +index 9fee3de138d6..3f02d00c1396 100644 +--- a/fs/file_table.c ++++ b/fs/file_table.c +@@ -44,18 +44,40 @@ static struct kmem_cache *filp_cachep __read_mostly; + + static struct percpu_counter nr_files __cacheline_aligned_in_smp; + ++/* Container for backing file with optional real path */ ++struct backing_file { ++ struct file file; ++ struct path real_path; ++}; ++ ++static inline struct backing_file *backing_file(struct file *f) ++{ ++ return container_of(f, struct backing_file, file); ++} ++ ++struct path *backing_file_real_path(struct file *f) ++{ ++ return &backing_file(f)->real_path; ++} ++EXPORT_SYMBOL_GPL(backing_file_real_path); ++ + static void file_free_rcu(struct rcu_head *head) + { + struct file *f = container_of(head, struct file, f_u.fu_rcuhead); + + put_cred(f->f_cred); +- kmem_cache_free(filp_cachep, f); ++ if (unlikely(f->f_mode & FMODE_BACKING)) ++ kfree(backing_file(f)); ++ else ++ kmem_cache_free(filp_cachep, f); + } + + static inline void file_free(struct file *f) + { + security_file_free(f); +- if (!(f->f_mode & FMODE_NOACCOUNT)) ++ if (unlikely(f->f_mode & FMODE_BACKING)) ++ path_put(backing_file_real_path(f)); ++ if (likely(!(f->f_mode & FMODE_NOACCOUNT))) + percpu_counter_dec(&nr_files); + call_rcu(&f->f_u.fu_rcuhead, file_free_rcu); + } +@@ -226,6 +248,30 @@ struct file *alloc_empty_file_noaccount(int flags, const struct cred *cred) + return f; + } + ++/* ++ * Variant of alloc_empty_file() that allocates a backing_file container ++ * and doesn't check and modify nr_files. ++ * ++ * This is only for kernel internal use, and the allocate file must not be ++ * installed into file tables or such. 
++ */ ++struct file *alloc_empty_backing_file(int flags, const struct cred *cred) ++{ ++ struct backing_file *ff; ++ int error; ++ ++ ff = kzalloc(sizeof(struct backing_file), GFP_KERNEL); ++ if (unlikely(!ff)) ++ return ERR_PTR(-ENOMEM); ++ ++ error = init_file(&ff->file, flags, cred); ++ if (unlikely(error)) ++ return ERR_PTR(error); ++ ++ ff->file.f_mode |= FMODE_BACKING | FMODE_NOACCOUNT; ++ return &ff->file; ++} ++ + /** + * file_init_path - initialize a 'struct file' based on path + * +diff --git a/fs/internal.h b/fs/internal.h +index c3701d285c69..62b558fa6395 100644 +--- a/fs/internal.h ++++ b/fs/internal.h +@@ -93,8 +93,9 @@ extern void chroot_fs_refs(const struct path *, const struct path *); + /* + * file_table.c + */ +-extern struct file *alloc_empty_file(int, const struct cred *); +-extern struct file *alloc_empty_file_noaccount(int, const struct cred *); ++struct file *alloc_empty_file(int flags, const struct cred *cred); ++struct file *alloc_empty_file_noaccount(int flags, const struct cred *cred); ++struct file *alloc_empty_backing_file(int flags, const struct cred *cred); + + static inline void put_file_access(struct file *file) + { +diff --git a/fs/open.c b/fs/open.c +index 3eac96e10eb0..e2419242456e 100644 +--- a/fs/open.c ++++ b/fs/open.c +@@ -1122,23 +1122,44 @@ struct file *kernel_file_open(const struct path *path, int flags, + } + EXPORT_SYMBOL_GPL(kernel_file_open); + +-struct file *open_with_fake_path(const struct path *path, int flags, +- struct inode *inode, const struct cred *cred) ++/** ++ * backing_file_open - open a backing file for kernel internal use ++ * @path: path of the file to open ++ * @flags: open flags ++ * @path: path of the backing file ++ * @cred: credentials for open ++ * ++ * Open a backing file for a stackable filesystem (e.g., overlayfs). ++ * @path may be on the stackable filesystem and backing inode on the ++ * underlying filesystem. 
In this case, we want to be able to return ++ * the @real_path of the backing inode. This is done by embedding the ++ * returned file into a container structure that also stores the path of ++ * the backing inode on the underlying filesystem, which can be ++ * retrieved using backing_file_real_path(). ++ */ ++struct file *backing_file_open(const struct path *path, int flags, ++ const struct path *real_path, ++ const struct cred *cred) + { +- struct file *f = alloc_empty_file_noaccount(flags, cred); +- if (!IS_ERR(f)) { +- int error; ++ struct file *f; ++ int error; + +- f->f_path = *path; +- error = do_dentry_open(f, inode, NULL); +- if (error) { +- fput(f); +- f = ERR_PTR(error); +- } ++ f = alloc_empty_backing_file(flags, cred); ++ if (IS_ERR(f)) ++ return f; ++ ++ f->f_path = *path; ++ path_get(real_path); ++ *backing_file_real_path(f) = *real_path; ++ error = do_dentry_open(f, d_inode(real_path->dentry), NULL); ++ if (error) { ++ fput(f); ++ f = ERR_PTR(error); + } ++ + return f; + } +-EXPORT_SYMBOL(open_with_fake_path); ++EXPORT_SYMBOL_GPL(backing_file_open); + + #define WILL_CREATE(flags) (flags & (O_CREAT | __O_TMPFILE)) + #define O_PATH_FLAGS (O_DIRECTORY | O_NOFOLLOW | O_PATH | O_CLOEXEC) +diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c +index 5db89c8de140..99f2ae8e3864 100644 +--- a/fs/overlayfs/file.c ++++ b/fs/overlayfs/file.c +@@ -65,8 +65,8 @@ static struct file *ovl_open_realfile(const struct file *file, + if (!inode_owner_or_capable(real_idmap, realinode)) + flags &= ~O_NOATIME; + +- realfile = open_with_fake_path(&file->f_path, flags, realinode, +- current_cred()); ++ realfile = backing_file_open(&file->f_path, flags, realpath, ++ current_cred()); + } + revert_creds(old_cred); + +diff --git a/include/linux/fs.h b/include/linux/fs.h +index 363cdadb04ba..48ec31b9d230 100644 +--- a/include/linux/fs.h ++++ b/include/linux/fs.h +@@ -167,6 +167,9 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset, + /* Supports IOCB_HAS_METADATA */ + 
#define FMODE_HAS_METADATA ((__force fmode_t)0x800000) + ++/* File is embedded in backing_file object */ ++#define FMODE_BACKING ((__force fmode_t)0x2000000) ++ + /* File was opened by fanotify and shouldn't generate fanotify events */ + #define FMODE_NONOTIFY ((__force fmode_t)0x4000000) + +@@ -2579,11 +2582,31 @@ static inline struct file *file_open_root_mnt(struct vfsmount *mnt, + return file_open_root(&(struct path){.mnt = mnt, .dentry = mnt->mnt_root}, + name, flags, mode); + } +-extern struct file * dentry_open(const struct path *, int, const struct cred *); +-extern struct file *dentry_create(const struct path *path, int flags, +- umode_t mode, const struct cred *cred); +-extern struct file * open_with_fake_path(const struct path *, int, +- struct inode*, const struct cred *); ++struct file *dentry_open(const struct path *path, int flags, ++ const struct cred *creds); ++struct file *dentry_create(const struct path *path, int flags, umode_t mode, ++ const struct cred *cred); ++struct file *backing_file_open(const struct path *path, int flags, ++ const struct path *real_path, ++ const struct cred *cred); ++struct path *backing_file_real_path(struct file *f); ++ ++/* ++ * file_real_path - get the path corresponding to f_inode ++ * ++ * When opening a backing file for a stackable filesystem (e.g., ++ * overlayfs) f_path may be on the stackable filesystem and f_inode on ++ * the underlying filesystem. When the path associated with f_inode is ++ * needed, this helper should be used instead of accessing f_path ++ * directly. 
++*/ ++static inline const struct path *file_real_path(struct file *f) ++{ ++ if (unlikely(f->f_mode & FMODE_BACKING)) ++ return backing_file_real_path(f); ++ return &f->f_path; ++} ++ + static inline struct file *file_clone_open(struct file *file) + { + return dentry_open(&file->f_path, file->f_flags, file->f_cred); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1323-ovl-enable-fsnotify-events-on-underlying-real-files.patch b/SOURCES/1323-ovl-enable-fsnotify-events-on-underlying-real-files.patch new file mode 100644 index 000000000..09ef3ebf4 --- /dev/null +++ b/SOURCES/1323-ovl-enable-fsnotify-events-on-underlying-real-files.patch @@ -0,0 +1,75 @@ +From c030fdbbb0542056bdd257409b781cee8d8b8c39 Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Wed, 3 Jun 2026 09:49:52 +0200 +Subject: [PATCH] ovl: enable fsnotify events on underlying real files + +JIRA: https://issues.redhat.com/browse/RHEL-179443 +CVE: CVE-2026-46054 + +commit bc2473c90fca55bf95b2ab6af1dacee26a4f92f6 +Author: Amir Goldstein +Date: Thu Jun 15 14:22:29 2023 +0300 + + ovl: enable fsnotify events on underlying real files + + Overlayfs creates the real underlying files with fake f_path, whose + f_inode is on the underlying fs and f_path on overlayfs. + + Those real files were open with FMODE_NONOTIFY, because fsnotify code was + not prapared to handle fsnotify hooks on files with fake path correctly + and fanotify would report unexpected event->fd with fake overlayfs path, + when the underlying fs was being watched. + + Teach fsnotify to handle events on the real files, and do not set real + files to FMODE_NONOTIFY to allow operations on real file (e.g. open, + access, modify, close) to generate async and permission events. + + Because fsnotify does not have notifications on address space + operations, we do not need to worry about ->vm_file not reporting + events to a watched overlayfs when users are accessing a mapped + overlayfs file. 
+ + Acked-by: Jan Kara + Signed-off-by: Amir Goldstein + Message-Id: <20230615112229.2143178-6-amir73il@gmail.com> + Signed-off-by: Christian Brauner + +Signed-off-by: Ondrej Mosnacek + +diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c +index 99f2ae8e3864..cd0770bb3020 100644 +--- a/fs/overlayfs/file.c ++++ b/fs/overlayfs/file.c +@@ -38,8 +38,8 @@ static char ovl_whatisit(struct inode *inode, struct inode *realinode) + return 'm'; + } + +-/* No atime modification nor notify on underlying */ +-#define OVL_OPEN_FLAGS (O_NOATIME | FMODE_NONOTIFY) ++/* No atime modification on underlying */ ++#define OVL_OPEN_FLAGS (O_NOATIME) + + static struct file *ovl_open_realfile(const struct file *file, + const struct path *realpath) +diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h +index bb8467cd11ae..ed48e4f1e755 100644 +--- a/include/linux/fsnotify.h ++++ b/include/linux/fsnotify.h +@@ -91,11 +91,13 @@ static inline void fsnotify_dentry(struct dentry *dentry, __u32 mask) + + static inline int fsnotify_file(struct file *file, __u32 mask) + { +- const struct path *path = &file->f_path; ++ const struct path *path; + + if (file->f_mode & FMODE_NONOTIFY) + return 0; + ++ /* Overlayfs internal files have fake f_path */ ++ path = file_real_path(file); + return fsnotify_parent(path->dentry, mask, path, FSNOTIFY_EVENT_PATH); + } + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1324-fs-move-cleanup-from-init-file-into-its-callers.patch b/SOURCES/1324-fs-move-cleanup-from-init-file-into-its-callers.patch new file mode 100644 index 000000000..d79969535 --- /dev/null +++ b/SOURCES/1324-fs-move-cleanup-from-init-file-into-its-callers.patch @@ -0,0 +1,82 @@ +From 119b885de3aa505ccc26df2a8a074555e04be774 Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Wed, 3 Jun 2026 20:28:16 +0200 +Subject: [PATCH] fs: move cleanup from init_file() into its callers + +JIRA: https://issues.redhat.com/browse/RHEL-179443 +Conflicts: + - fs/file_table.c: slightly different 
argument to file_free_rcu() + downstream, replacement kept the same + +commit dff745c1221a402b4921d54f292288373cff500c +Author: Amir Goldstein +Date: Sat Jul 1 20:11:34 2023 +0300 + + fs: move cleanup from init_file() into its callers + + The use of file_free_rcu() in init_file() to free the struct that was + allocated by the caller was hacky and we got what we deserved. + + Let init_file() and its callers take care of cleaning up each after + their own allocated resources on error. + + Fixes: 62d53c4a1dfe ("fs: use backing_file container for internal files with "fake" f_path") # mainline only + Reported-and-tested-by: syzbot+ada42aab05cf51b00e98@syzkaller.appspotmail.com + Signed-off-by: Amir Goldstein + Message-Id: <20230701171134.239409-1-amir73il@gmail.com> + Signed-off-by: Christian Brauner + +Signed-off-by: Ondrej Mosnacek + +diff --git a/fs/file_table.c b/fs/file_table.c +index 3f02d00c1396..b0a8c2608530 100644 +--- a/fs/file_table.c ++++ b/fs/file_table.c +@@ -160,7 +160,7 @@ static int init_file(struct file *f, int flags, const struct cred *cred) + f->f_cred = get_cred(cred); + error = security_file_alloc(f); + if (unlikely(error)) { +- file_free_rcu(&f->f_u.fu_rcuhead); ++ put_cred(f->f_cred); + return error; + } + +@@ -208,8 +208,10 @@ struct file *alloc_empty_file(int flags, const struct cred *cred) + return ERR_PTR(-ENOMEM); + + error = init_file(f, flags, cred); +- if (unlikely(error)) ++ if (unlikely(error)) { ++ kmem_cache_free(filp_cachep, f); + return ERR_PTR(error); ++ } + + percpu_counter_inc(&nr_files); + +@@ -240,8 +242,10 @@ struct file *alloc_empty_file_noaccount(int flags, const struct cred *cred) + return ERR_PTR(-ENOMEM); + + error = init_file(f, flags, cred); +- if (unlikely(error)) ++ if (unlikely(error)) { ++ kmem_cache_free(filp_cachep, f); + return ERR_PTR(error); ++ } + + f->f_mode |= FMODE_NOACCOUNT; + +@@ -265,8 +269,10 @@ struct file *alloc_empty_backing_file(int flags, const struct cred *cred) + return ERR_PTR(-ENOMEM); + + 
error = init_file(&ff->file, flags, cred); +- if (unlikely(error)) ++ if (unlikely(error)) { ++ kfree(ff); + return ERR_PTR(error); ++ } + + ff->file.f_mode |= FMODE_BACKING | FMODE_NOACCOUNT; + return &ff->file; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1325-lsm-constify-the-file-parameter-in-security-binder-transfer-.patch b/SOURCES/1325-lsm-constify-the-file-parameter-in-security-binder-transfer-.patch new file mode 100644 index 000000000..3c99a6743 --- /dev/null +++ b/SOURCES/1325-lsm-constify-the-file-parameter-in-security-binder-transfer-.patch @@ -0,0 +1,119 @@ +From f792642489a3af9cec81ca366edffd79fe1d1359 Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Wed, 3 Jun 2026 15:16:52 +0200 +Subject: [PATCH] lsm: constify the 'file' parameter in + security_binder_transfer_file() + +JIRA: https://issues.redhat.com/browse/RHEL-179443 +CVE: CVE-2026-46054 + +commit 8e4672d6f902d5c4db1e87e8aa9f530149d85bc6 +Author: Khadija Kamran +Date: Sat Aug 12 20:31:08 2023 +0500 + + lsm: constify the 'file' parameter in security_binder_transfer_file() + + SELinux registers the implementation for the "binder_transfer_file" + hook. Looking at the function implementation we observe that the + parameter "file" is not changing. + + Mark the "file" parameter of LSM hook security_binder_transfer_file() as + "const" since it will not be changing in the LSM hook. 
+ + Signed-off-by: Khadija Kamran + [PM: subject line whitespace fix] + Signed-off-by: Paul Moore + +Signed-off-by: Ondrej Mosnacek + +diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h +index 1c2be7057bd9..b6fbb446bab7 100644 +--- a/include/linux/lsm_hook_defs.h ++++ b/include/linux/lsm_hook_defs.h +@@ -32,7 +32,7 @@ LSM_HOOK(int, 0, binder_transaction, const struct cred *from, + LSM_HOOK(int, 0, binder_transfer_binder, const struct cred *from, + const struct cred *to) + LSM_HOOK(int, 0, binder_transfer_file, const struct cred *from, +- const struct cred *to, struct file *file) ++ const struct cred *to, const struct file *file) + LSM_HOOK(int, 0, ptrace_access_check, struct task_struct *child, + unsigned int mode) + LSM_HOOK(int, 0, ptrace_traceme, struct task_struct *parent) +diff --git a/include/linux/security.h b/include/linux/security.h +index 3d216c94fd69..d2888c127859 100644 +--- a/include/linux/security.h ++++ b/include/linux/security.h +@@ -269,7 +269,7 @@ int security_binder_transaction(const struct cred *from, + int security_binder_transfer_binder(const struct cred *from, + const struct cred *to); + int security_binder_transfer_file(const struct cred *from, +- const struct cred *to, struct file *file); ++ const struct cred *to, const struct file *file); + int security_ptrace_access_check(struct task_struct *child, unsigned int mode); + int security_ptrace_traceme(struct task_struct *parent); + int security_capget(struct task_struct *target, +@@ -539,7 +539,7 @@ static inline int security_binder_transfer_binder(const struct cred *from, + + static inline int security_binder_transfer_file(const struct cred *from, + const struct cred *to, +- struct file *file) ++ const struct file *file) + { + return 0; + } +diff --git a/security/security.c b/security/security.c +index 2092b657af9f..b59af216324f 100644 +--- a/security/security.c ++++ b/security/security.c +@@ -905,7 +905,7 @@ int security_binder_transfer_binder(const struct cred 
*from, + } + + int security_binder_transfer_file(const struct cred *from, +- const struct cred *to, struct file *file) ++ const struct cred *to, const struct file *file) + { + return call_int_hook(binder_transfer_file, from, to, file); + } +diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c +index 22173e8e88e2..deacc9a63fae 100644 +--- a/security/selinux/hooks.c ++++ b/security/selinux/hooks.c +@@ -1703,7 +1703,7 @@ static inline int file_path_has_perm(const struct cred *cred, + } + + #ifdef CONFIG_BPF_SYSCALL +-static int bpf_fd_pass(struct file *file, u32 sid); ++static int bpf_fd_pass(const struct file *file, u32 sid); + #endif + + /* Check whether a task can use an open file descriptor to +@@ -1976,7 +1976,7 @@ static inline u32 file_mask_to_av(int mode, int mask) + } + + /* Convert a Linux file to an access vector. */ +-static inline u32 file_to_av(struct file *file) ++static inline u32 file_to_av(const struct file *file) + { + u32 av = 0; + +@@ -2054,7 +2054,7 @@ static int selinux_binder_transfer_binder(const struct cred *from, + + static int selinux_binder_transfer_file(const struct cred *from, + const struct cred *to, +- struct file *file) ++ const struct file *file) + { + u32 sid = cred_sid(to); + struct file_security_struct *fsec = selinux_file(file); +@@ -6885,7 +6885,7 @@ static u32 bpf_map_fmode_to_av(fmode_t fmode) + * access the bpf object and that's why we have to add this additional check in + * selinux_file_receive and selinux_binder_transfer_files. 
+ */ +-static int bpf_fd_pass(struct file *file, u32 sid) ++static int bpf_fd_pass(const struct file *file, u32 sid) + { + struct bpf_security_struct *bpfsec; + struct bpf_prog *prog; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1326-cachefiles-use-kiocb-start-end-write-helpers.patch b/SOURCES/1326-cachefiles-use-kiocb-start-end-write-helpers.patch new file mode 100644 index 000000000..0e1c54759 --- /dev/null +++ b/SOURCES/1326-cachefiles-use-kiocb-start-end-write-helpers.patch @@ -0,0 +1,73 @@ +From 7adb9b7eb6c32cc5b7cea983ed187cf3f4122cf5 Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Wed, 3 Jun 2026 10:49:48 +0200 +Subject: [PATCH] cachefiles: use kiocb_{start,end}_write() helpers + +JIRA: https://issues.redhat.com/browse/RHEL-179443 +CVE: CVE-2026-46054 + +commit e6fa4c728fb671765291cca3a905986612c06b6e +Author: Amir Goldstein +Date: Thu Aug 17 17:13:37 2023 +0300 + + cachefiles: use kiocb_{start,end}_write() helpers + + Use helpers instead of the open coded dance to silence lockdep warnings. 
+ + Suggested-by: Jan Kara + Signed-off-by: Amir Goldstein + Reviewed-by: Jan Kara + Reviewed-by: Jens Axboe + Message-Id: <20230817141337.1025891-8-amir73il@gmail.com> + Signed-off-by: Christian Brauner + +Signed-off-by: Ondrej Mosnacek + +diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c +index 175a25fcade8..009d23cd435b 100644 +--- a/fs/cachefiles/io.c ++++ b/fs/cachefiles/io.c +@@ -259,9 +259,7 @@ static void cachefiles_write_complete(struct kiocb *iocb, long ret) + + _enter("%ld", ret); + +- /* Tell lockdep we inherited freeze protection from submission thread */ +- __sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE); +- __sb_end_write(inode->i_sb, SB_FREEZE_WRITE); ++ kiocb_end_write(iocb); + + if (ret < 0) + trace_cachefiles_io_error(object, inode, ret, +@@ -286,7 +284,6 @@ int __cachefiles_write(struct cachefiles_object *object, + { + struct cachefiles_cache *cache; + struct cachefiles_kiocb *ki; +- struct inode *inode; + unsigned int old_nofs; + ssize_t ret; + size_t len = iov_iter_count(iter); +@@ -322,19 +319,12 @@ int __cachefiles_write(struct cachefiles_object *object, + ki->iocb.ki_complete = cachefiles_write_complete; + atomic_long_add(ki->b_writing, &cache->b_writing); + +- /* Open-code file_start_write here to grab freeze protection, which +- * will be released by another thread in aio_complete_rw(). Fool +- * lockdep by telling it the lock got released so that it doesn't +- * complain about the held lock when we return to userspace. 
+- */ +- inode = file_inode(file); +- __sb_start_write(inode->i_sb, SB_FREEZE_WRITE); +- __sb_writers_release(inode->i_sb, SB_FREEZE_WRITE); ++ kiocb_start_write(&ki->iocb); + + get_file(ki->iocb.ki_filp); + cachefiles_grab_object(object, cachefiles_obj_get_ioreq); + +- trace_cachefiles_write(object, inode, ki->iocb.ki_pos, len); ++ trace_cachefiles_write(object, file_inode(file), ki->iocb.ki_pos, len); + old_nofs = memalloc_nofs_save(); + ret = cachefiles_inject_write_error(); + if (ret == 0) +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1327-fs-fix-kernel-doc-warnings.patch b/SOURCES/1327-fs-fix-kernel-doc-warnings.patch new file mode 100644 index 000000000..333e12091 --- /dev/null +++ b/SOURCES/1327-fs-fix-kernel-doc-warnings.patch @@ -0,0 +1,188 @@ +From fb0111b77bf45d069d7722682a1a1d202fbf5f7d Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Wed, 3 Jun 2026 08:59:20 +0200 +Subject: [PATCH] fs: Fix kernel-doc warnings + +JIRA: https://issues.redhat.com/browse/RHEL-179443 +CVE: CVE-2026-46054 +Conflicts: + - fs/fs_context.c: one docstring fix was already applied + +commit 35931eb3945b8d38c31f8e956aee3cf31c52121b +Author: Matthew Wilcox (Oracle) +Date: Fri Aug 18 21:08:24 2023 +0100 + + fs: Fix kernel-doc warnings + + These have a variety of causes and a corresponding variety of solutions. + + Signed-off-by: "Matthew Wilcox (Oracle)" + Message-Id: <20230818200824.2720007-1-willy@infradead.org> + Signed-off-by: Christian Brauner + +Signed-off-by: Ondrej Mosnacek + +diff --git a/fs/file.c b/fs/file.c +index 236e11a38c18..70962ee8c68b 100644 +--- a/fs/file.c ++++ b/fs/file.c +@@ -679,7 +679,7 @@ EXPORT_SYMBOL(close_fd); /* for ksys_close() */ + + /** + * last_fd - return last valid index into fd table +- * @cur_fds: files struct ++ * @fdt: File descriptor table. + * + * Context: Either rcu read lock or files_lock must be held. 
+ * +@@ -734,6 +734,7 @@ static inline void __range_close(struct files_struct *cur_fds, unsigned int fd, + * + * @fd: starting file descriptor to close + * @max_fd: last file descriptor to close ++ * @flags: CLOSE_RANGE flags. + * + * This closes a range of file descriptors. All file descriptors + * from @fd up to and including @max_fd are closed. +diff --git a/fs/fs_context.c b/fs/fs_context.c +index 648d2ee9e5fc..3473e63e8399 100644 +--- a/fs/fs_context.c ++++ b/fs/fs_context.c +@@ -162,6 +162,10 @@ EXPORT_SYMBOL(vfs_parse_fs_param); + + /** + * vfs_parse_fs_string - Convenience function to just parse a string. ++ * @fc: Filesystem context. ++ * @key: Parameter name. ++ * @value: Default value. ++ * @v_size: Maximum number of bytes in the value. + */ + int vfs_parse_fs_string(struct fs_context *fc, const char *key, + const char *value, size_t v_size) +@@ -357,7 +361,7 @@ void fc_drop_locked(struct fs_context *fc) + static void legacy_fs_context_free(struct fs_context *fc); + + /** +- * vfs_dup_fc_config: Duplicate a filesystem context. ++ * vfs_dup_fs_context - Duplicate a filesystem context. + * @src_fc: The context to copy. + */ + struct fs_context *vfs_dup_fs_context(struct fs_context *src_fc) +@@ -403,7 +407,9 @@ EXPORT_SYMBOL(vfs_dup_fs_context); + + /** + * logfc - Log a message to a filesystem context +- * @fc: The filesystem context to log to. ++ * @log: The filesystem context to log to, or NULL to use printk. ++ * @prefix: A string to prefix the output with, or NULL. ++ * @level: 'w' for a warning, 'e' for an error. Anything else is a notice. + * @fmt: The format of the buffer. + */ + void logfc(struct fc_log *log, const char *prefix, char level, const char *fmt, ...) 
+diff --git a/fs/ioctl.c b/fs/ioctl.c +index 088462ee5a81..64776891120c 100644 +--- a/fs/ioctl.c ++++ b/fs/ioctl.c +@@ -109,9 +109,6 @@ static int ioctl_fibmap(struct file *filp, int __user *p) + * Returns 0 on success, -errno on error, 1 if this was the last + * extent that will fit in user array. + */ +-#define SET_UNKNOWN_FLAGS (FIEMAP_EXTENT_DELALLOC) +-#define SET_NO_UNMOUNTED_IO_FLAGS (FIEMAP_EXTENT_DATA_ENCRYPTED) +-#define SET_NOT_ALIGNED_FLAGS (FIEMAP_EXTENT_DATA_TAIL|FIEMAP_EXTENT_DATA_INLINE) + int fiemap_fill_next_extent(struct fiemap_extent_info *fieinfo, u64 logical, + u64 phys, u64 len, u32 flags) + { +@@ -127,6 +124,10 @@ int fiemap_fill_next_extent(struct fiemap_extent_info *fieinfo, u64 logical, + if (fieinfo->fi_extents_mapped >= fieinfo->fi_extents_max) + return 1; + ++#define SET_UNKNOWN_FLAGS (FIEMAP_EXTENT_DELALLOC) ++#define SET_NO_UNMOUNTED_IO_FLAGS (FIEMAP_EXTENT_DATA_ENCRYPTED) ++#define SET_NOT_ALIGNED_FLAGS (FIEMAP_EXTENT_DATA_TAIL|FIEMAP_EXTENT_DATA_INLINE) ++ + if (flags & SET_UNKNOWN_FLAGS) + flags |= FIEMAP_EXTENT_UNKNOWN; + if (flags & SET_NO_UNMOUNTED_IO_FLAGS) +@@ -913,6 +914,9 @@ SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd, unsigned long, arg) + #ifdef CONFIG_COMPAT + /** + * compat_ptr_ioctl - generic implementation of .compat_ioctl file operation ++ * @file: The file to operate on. ++ * @cmd: The ioctl command number. ++ * @arg: The argument to the ioctl. + * + * This is not normally called as a function, but instead set in struct + * file_operations as +diff --git a/fs/kernel_read_file.c b/fs/kernel_read_file.c +index 5d826274570c..c429c42a6867 100644 +--- a/fs/kernel_read_file.c ++++ b/fs/kernel_read_file.c +@@ -8,16 +8,16 @@ + /** + * kernel_read_file() - read file contents into a kernel buffer + * +- * @file file to read from +- * @offset where to start reading from (see below). 
+- * @buf pointer to a "void *" buffer for reading into (if ++ * @file: file to read from ++ * @offset: where to start reading from (see below). ++ * @buf: pointer to a "void *" buffer for reading into (if + * *@buf is NULL, a buffer will be allocated, and + * @buf_size will be ignored) +- * @buf_size size of buf, if already allocated. If @buf not ++ * @buf_size: size of buf, if already allocated. If @buf not + * allocated, this is the largest size to allocate. +- * @file_size if non-NULL, the full size of @file will be ++ * @file_size: if non-NULL, the full size of @file will be + * written here. +- * @id the kernel_read_file_id identifying the type of ++ * @id: the kernel_read_file_id identifying the type of + * file contents being read (for LSMs to examine) + * + * @offset must be 0 unless both @buf and @file_size are non-NULL +diff --git a/fs/namei.c b/fs/namei.c +index 0a4b15d9a010..23c73afe57d3 100644 +--- a/fs/namei.c ++++ b/fs/namei.c +@@ -644,6 +644,8 @@ static bool nd_alloc_stack(struct nameidata *nd) + + /** + * path_connected - Verify that a dentry is below mnt.mnt_root ++ * @mnt: The mountpoint to check. ++ * @dentry: The dentry to check. + * + * Rename can sometimes move a file or directory outside of a bind + * mount, path_connected allows those cases to be detected. +@@ -1083,6 +1085,7 @@ fs_initcall(init_fs_namei_sysctls); + /** + * may_follow_link - Check symlink following for unsafe situations + * @nd: nameidata pathwalk data ++ * @inode: Used for idmapping. 
+ * + * In the case of the sysctl_protected_symlinks sysctl being enabled, + * CAP_DAC_OVERRIDE needs to be specifically ignored if the symlink is +diff --git a/fs/open.c b/fs/open.c +index e2419242456e..ef2cc51d468c 100644 +--- a/fs/open.c ++++ b/fs/open.c +@@ -1126,7 +1126,7 @@ EXPORT_SYMBOL_GPL(kernel_file_open); + * backing_file_open - open a backing file for kernel internal use + * @path: path of the file to open + * @flags: open flags +- * @path: path of the backing file ++ * @real_path: path of the backing file + * @cred: credentials for open + * + * Open a backing file for a stackable filesystem (e.g., overlayfs). +@@ -1534,7 +1534,7 @@ SYSCALL_DEFINE1(close, unsigned int, fd) + } + + /** +- * close_range() - Close all file descriptors in a given range. ++ * sys_close_range() - Close all file descriptors in a given range. + * + * @fd: starting file descriptor to close + * @max_fd: last file descriptor to close +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1328-fs-rename-mnt-want-drop-write-helpers.patch b/SOURCES/1328-fs-rename-mnt-want-drop-write-helpers.patch new file mode 100644 index 000000000..6b2b43c62 --- /dev/null +++ b/SOURCES/1328-fs-rename-mnt-want-drop-write-helpers.patch @@ -0,0 +1,300 @@ +From 484050dcbd77701cd8a295037aaa6365e0dc2f4e Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Wed, 3 Jun 2026 09:57:52 +0200 +Subject: [PATCH] fs: rename __mnt_{want,drop}_write*() helpers + +JIRA: https://issues.redhat.com/browse/RHEL-179443 +CVE: CVE-2026-46054 +Conflicts: + - fs/inode.c: context fuzz + - fs/internal.h: dropped one hunk that modifies a comment in + sb_start_ro_state_change(), which is not present downstream + - fs/namespace.c: needed to rename also exports because of downstream + commit fb4415394e59 ("fs: export mnt_{get,put}_write_access() to modules") + - fs/overlayfs/util.c: previous backport introduced __mnt_...() + callers that also need to be renamed here + +commit 3e15dcf77b23b8e9b9b7f3c0d4def8fe9c12c534 +Author: Amir 
Goldstein +Date: Fri Sep 8 16:28:59 2023 +0300 + + fs: rename __mnt_{want,drop}_write*() helpers + + Before exporting these helpers to modules, make their names more + meaningful. + + The names mnt_{get,put)_write_access*() were chosen, because they rhyme + with the inode {get,put)_write_access() helpers, which have a very close + meaning for the inode object. + + Suggested-by: Christian Brauner + Link: https://lore.kernel.org/r/20230817-anfechtbar-ruhelosigkeit-8c6cca8443fc@brauner/ + Signed-off-by: Amir Goldstein + Message-Id: <20230908132900.2983519-2-amir73il@gmail.com> + Signed-off-by: Christian Brauner + +Signed-off-by: Ondrej Mosnacek + +diff --git a/fs/inode.c b/fs/inode.c +index 2bc233f8db22..fc484773431b 100644 +--- a/fs/inode.c ++++ b/fs/inode.c +@@ -1980,7 +1980,7 @@ void touch_atime(const struct path *path) + if (!sb_start_write_trylock(inode->i_sb)) + return; + +- if (__mnt_want_write(mnt) != 0) ++ if (mnt_get_write_access(mnt) != 0) + goto skip_update; + /* + * File systems can error out when updating inodes if they need to +@@ -1993,7 +1993,7 @@ void touch_atime(const struct path *path) + */ + now = current_time(inode); + update_time(inode, &now, S_ATIME); +- __mnt_drop_write(mnt); ++ mnt_put_write_access(mnt); + skip_update: + sb_end_write(inode->i_sb); + } +@@ -2110,9 +2110,9 @@ static int __file_update_time(struct file *file, struct timespec64 *now, + struct inode *inode = file_inode(file); + + /* try to update time settings */ +- if (!__mnt_want_write_file(file)) { ++ if (!mnt_get_write_access_file(file)) { + ret = update_time(inode, now, sync_mode); +- __mnt_drop_write_file(file); ++ mnt_put_write_access_file(file); + } + + return ret; +diff --git a/fs/internal.h b/fs/internal.h +index 62b558fa6395..85bf69115cf4 100644 +--- a/fs/internal.h ++++ b/fs/internal.h +@@ -76,8 +76,8 @@ extern int sb_prepare_remount_readonly(struct super_block *); + + extern void __init mnt_init(void); + +-extern int __mnt_want_write_file(struct file *); +-extern void 
__mnt_drop_write_file(struct file *); ++int mnt_get_write_access_file(struct file *file); ++void mnt_put_write_access_file(struct file *file); + + extern void dissolve_on_fput(struct vfsmount *); + +@@ -103,7 +103,7 @@ static inline void put_file_access(struct file *file) + i_readcount_dec(file->f_inode); + } else if (file->f_mode & FMODE_WRITER) { + put_write_access(file->f_inode); +- __mnt_drop_write(file->f_path.mnt); ++ mnt_put_write_access(file->f_path.mnt); + } + } + +diff --git a/fs/namespace.c b/fs/namespace.c +index d032d84d66ed..218f1b77bf56 100644 +--- a/fs/namespace.c ++++ b/fs/namespace.c +@@ -333,16 +333,16 @@ static int mnt_is_readonly(struct vfsmount *mnt) + * can determine when writes are able to occur to a filesystem. + */ + /** +- * __mnt_want_write - get write access to a mount without freeze protection ++ * mnt_get_write_access - get write access to a mount without freeze protection + * @m: the mount on which to take a write + * + * This tells the low-level filesystem that a write is about to be performed to + * it, and makes sure that writes are allowed (mnt it read-write) before + * returning success. This operation does not protect against filesystem being +- * frozen. When the write operation is finished, __mnt_drop_write() must be ++ * frozen. When the write operation is finished, mnt_put_write_access() must be + * called. This is effectively a refcount. 
+ */ +-int __mnt_want_write(struct vfsmount *m) ++int mnt_get_write_access(struct vfsmount *m) + { + struct mount *mnt = real_mount(m); + int ret = 0; +@@ -371,7 +371,7 @@ int __mnt_want_write(struct vfsmount *m) + + return ret; + } +-EXPORT_SYMBOL_GPL(__mnt_want_write); ++EXPORT_SYMBOL_GPL(mnt_get_write_access); + + /** + * mnt_want_write - get write access to a mount +@@ -387,7 +387,7 @@ int mnt_want_write(struct vfsmount *m) + int ret; + + sb_start_write(m->mnt_sb); +- ret = __mnt_want_write(m); ++ ret = mnt_get_write_access(m); + if (ret) + sb_end_write(m->mnt_sb); + return ret; +@@ -395,15 +395,15 @@ int mnt_want_write(struct vfsmount *m) + EXPORT_SYMBOL_GPL(mnt_want_write); + + /** +- * __mnt_want_write_file - get write access to a file's mount ++ * mnt_get_write_access_file - get write access to a file's mount + * @file: the file who's mount on which to take a write + * +- * This is like __mnt_want_write, but if the file is already open for writing it ++ * This is like mnt_get_write_access, but if @file is already open for write it + * skips incrementing mnt_writers (since the open file already has a reference) + * and instead only does the check for emergency r/o remounts. This must be +- * paired with __mnt_drop_write_file. ++ * paired with mnt_put_write_access_file. 
+ */ +-int __mnt_want_write_file(struct file *file) ++int mnt_get_write_access_file(struct file *file) + { + if (file->f_mode & FMODE_WRITER) { + /* +@@ -414,7 +414,7 @@ int __mnt_want_write_file(struct file *file) + return -EROFS; + return 0; + } +- return __mnt_want_write(file->f_path.mnt); ++ return mnt_get_write_access(file->f_path.mnt); + } + + /** +@@ -431,7 +431,7 @@ int mnt_want_write_file(struct file *file) + int ret; + + sb_start_write(file_inode(file)->i_sb); +- ret = __mnt_want_write_file(file); ++ ret = mnt_get_write_access_file(file); + if (ret) + sb_end_write(file_inode(file)->i_sb); + return ret; +@@ -439,20 +439,20 @@ int mnt_want_write_file(struct file *file) + EXPORT_SYMBOL_GPL(mnt_want_write_file); + + /** +- * __mnt_drop_write - give up write access to a mount ++ * mnt_put_write_access - give up write access to a mount + * @mnt: the mount on which to give up write access + * + * Tells the low-level filesystem that we are done + * performing writes to it. Must be matched with +- * __mnt_want_write() call above. ++ * mnt_get_write_access() call above. 
+ */ +-void __mnt_drop_write(struct vfsmount *mnt) ++void mnt_put_write_access(struct vfsmount *mnt) + { + preempt_disable(); + mnt_dec_writers(real_mount(mnt)); + preempt_enable(); + } +-EXPORT_SYMBOL_GPL(__mnt_drop_write); ++EXPORT_SYMBOL_GPL(mnt_put_write_access); + + /** + * mnt_drop_write - give up write access to a mount +@@ -464,20 +464,20 @@ EXPORT_SYMBOL_GPL(__mnt_drop_write); + */ + void mnt_drop_write(struct vfsmount *mnt) + { +- __mnt_drop_write(mnt); ++ mnt_put_write_access(mnt); + sb_end_write(mnt->mnt_sb); + } + EXPORT_SYMBOL_GPL(mnt_drop_write); + +-void __mnt_drop_write_file(struct file *file) ++void mnt_put_write_access_file(struct file *file) + { + if (!(file->f_mode & FMODE_WRITER)) +- __mnt_drop_write(file->f_path.mnt); ++ mnt_put_write_access(file->f_path.mnt); + } + + void mnt_drop_write_file(struct file *file) + { +- __mnt_drop_write_file(file); ++ mnt_put_write_access_file(file); + sb_end_write(file_inode(file)->i_sb); + } + EXPORT_SYMBOL(mnt_drop_write_file); +diff --git a/fs/open.c b/fs/open.c +index ef2cc51d468c..b75b1ab6305b 100644 +--- a/fs/open.c ++++ b/fs/open.c +@@ -867,7 +867,7 @@ static int do_dentry_open(struct file *f, + error = get_write_access(inode); + if (unlikely(error)) + goto cleanup_file; +- error = __mnt_want_write(f->f_path.mnt); ++ error = mnt_get_write_access(f->f_path.mnt); + if (unlikely(error)) { + put_write_access(inode); + goto cleanup_file; +diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c +index 6b31b6587e4d..81ef76c77cab 100644 +--- a/fs/overlayfs/util.c ++++ b/fs/overlayfs/util.c +@@ -21,7 +21,7 @@ + int ovl_get_write_access(struct dentry *dentry) + { + struct ovl_fs *ofs = OVL_FS(dentry->d_sb); +- return __mnt_want_write(ovl_upper_mnt(ofs)); ++ return mnt_get_write_access(ovl_upper_mnt(ofs)); + } + + /* Get write access to upper sb - may block if upper sb is frozen */ +@@ -40,7 +40,7 @@ int ovl_want_write(struct dentry *dentry) + void ovl_put_write_access(struct dentry *dentry) + { + struct ovl_fs 
*ofs = OVL_FS(dentry->d_sb); +- __mnt_drop_write(ovl_upper_mnt(ofs)); ++ mnt_put_write_access(ovl_upper_mnt(ofs)); + } + + void ovl_end_write(struct dentry *dentry) +diff --git a/include/linux/mount.h b/include/linux/mount.h +index 37dc2a161f73..0f214b0a0992 100644 +--- a/include/linux/mount.h ++++ b/include/linux/mount.h +@@ -92,8 +92,8 @@ extern bool __mnt_is_readonly(struct vfsmount *mnt); + extern bool mnt_may_suid(struct vfsmount *mnt); + + extern struct vfsmount *clone_private_mount(const struct path *path); +-extern int __mnt_want_write(struct vfsmount *); +-extern void __mnt_drop_write(struct vfsmount *); ++int mnt_get_write_access(struct vfsmount *mnt); ++void mnt_put_write_access(struct vfsmount *mnt); + + extern struct vfsmount *fc_mount(struct fs_context *fc); + extern struct vfsmount *fc_mount_longterm(struct fs_context *fc); +diff --git a/kernel/acct.c b/kernel/acct.c +index bbea312b9d76..b3e00389d42d 100644 +--- a/kernel/acct.c ++++ b/kernel/acct.c +@@ -235,7 +235,7 @@ static int acct_on(struct filename *pathname) + filp_close(file, NULL); + return PTR_ERR(internal); + } +- err = __mnt_want_write(internal); ++ err = mnt_get_write_access(internal); + if (err) { + mntput(internal); + kfree(acct); +@@ -260,7 +260,7 @@ static int acct_on(struct filename *pathname) + old = xchg(&ns->bacct, &acct->pin); + mutex_unlock(&acct->lock); + pin_kill(old); +- __mnt_drop_write(mnt); ++ mnt_put_write_access(mnt); + mntput(mnt); + return 0; + } +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1329-fs-get-mnt-writers-count-for-an-open-backing-file-s-real-pat.patch b/SOURCES/1329-fs-get-mnt-writers-count-for-an-open-backing-file-s-real-pat.patch new file mode 100644 index 000000000..beba5b7f5 --- /dev/null +++ b/SOURCES/1329-fs-get-mnt-writers-count-for-an-open-backing-file-s-real-pat.patch @@ -0,0 +1,109 @@ +From fed3cc66e15fba2f8b48257cfbe82ab6eac11a39 Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Wed, 3 Jun 2026 10:35:52 +0200 +Subject: [PATCH] fs: get 
mnt_writers count for an open backing file's real + path + +JIRA: https://issues.redhat.com/browse/RHEL-179443 +CVE: CVE-2026-46054 + +commit 83bc1d294130cc471a89ce10770daa281a93fcb0 +Author: Amir Goldstein +Date: Mon Oct 9 18:37:10 2023 +0300 + + fs: get mnt_writers count for an open backing file's real path + + A writeable mapped backing file can perform writes to the real inode. + Therefore, the real path mount must be kept writable so long as the + writable map exists. + + This may not be strictly needed for ovelrayfs private upper mount, + but it is correct to take the mnt_writers count in the vfs helper. + + Signed-off-by: Amir Goldstein + Link: https://lore.kernel.org/r/20231009153712.1566422-2-amir73il@gmail.com + Signed-off-by: Christian Brauner + +Signed-off-by: Ondrej Mosnacek + +diff --git a/fs/internal.h b/fs/internal.h +index 85bf69115cf4..29369382249d 100644 +--- a/fs/internal.h ++++ b/fs/internal.h +@@ -97,13 +97,20 @@ struct file *alloc_empty_file(int flags, const struct cred *cred); + struct file *alloc_empty_file_noaccount(int flags, const struct cred *cred); + struct file *alloc_empty_backing_file(int flags, const struct cred *cred); + ++static inline void file_put_write_access(struct file *file) ++{ ++ put_write_access(file->f_inode); ++ mnt_put_write_access(file->f_path.mnt); ++ if (unlikely(file->f_mode & FMODE_BACKING)) ++ mnt_put_write_access(backing_file_real_path(file)->mnt); ++} ++ + static inline void put_file_access(struct file *file) + { + if ((file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) { + i_readcount_dec(file->f_inode); + } else if (file->f_mode & FMODE_WRITER) { +- put_write_access(file->f_inode); +- mnt_put_write_access(file->f_path.mnt); ++ file_put_write_access(file); + } + } + +diff --git a/fs/open.c b/fs/open.c +index b75b1ab6305b..64e4bbd1f28c 100644 +--- a/fs/open.c ++++ b/fs/open.c +@@ -842,6 +842,30 @@ SYSCALL_DEFINE3(fchown, unsigned int, fd, uid_t, user, gid_t, group) + return ksys_fchown(fd, user, group); 
+ } + ++static inline int file_get_write_access(struct file *f) ++{ ++ int error; ++ ++ error = get_write_access(f->f_inode); ++ if (unlikely(error)) ++ return error; ++ error = mnt_get_write_access(f->f_path.mnt); ++ if (unlikely(error)) ++ goto cleanup_inode; ++ if (unlikely(f->f_mode & FMODE_BACKING)) { ++ error = mnt_get_write_access(backing_file_real_path(f)->mnt); ++ if (unlikely(error)) ++ goto cleanup_mnt; ++ } ++ return 0; ++ ++cleanup_mnt: ++ mnt_put_write_access(f->f_path.mnt); ++cleanup_inode: ++ put_write_access(f->f_inode); ++ return error; ++} ++ + static int do_dentry_open(struct file *f, + struct inode *inode, + int (*open)(struct inode *, struct file *)) +@@ -864,14 +888,9 @@ static int do_dentry_open(struct file *f, + if ((f->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) { + i_readcount_inc(inode); + } else if (f->f_mode & FMODE_WRITE && !special_file(inode->i_mode)) { +- error = get_write_access(inode); ++ error = file_get_write_access(f); + if (unlikely(error)) + goto cleanup_file; +- error = mnt_get_write_access(f->f_path.mnt); +- if (unlikely(error)) { +- put_write_access(inode); +- goto cleanup_file; +- } + f->f_mode |= FMODE_WRITER; + } + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1330-fs-create-helper-file-user-path-for-user-displayed-mapped-fi.patch b/SOURCES/1330-fs-create-helper-file-user-path-for-user-displayed-mapped-fi.patch new file mode 100644 index 000000000..2b21a7f5b --- /dev/null +++ b/SOURCES/1330-fs-create-helper-file-user-path-for-user-displayed-mapped-fi.patch @@ -0,0 +1,153 @@ +From f3d228ade9542a4fac0e0d4d2721e2a94f2d6d98 Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Wed, 3 Jun 2026 10:36:39 +0200 +Subject: [PATCH] fs: create helper file_user_path() for user displayed mapped + file path + +JIRA: https://issues.redhat.com/browse/RHEL-179443 +CVE: CVE-2026-46054 + +commit 08582d678fcf11fc86188f0b92239d3d49667d8e +Author: Amir Goldstein +Date: Mon Oct 9 18:37:11 2023 +0300 + + fs: create helper 
file_user_path() for user displayed mapped file path + + Overlayfs uses backing files with "fake" overlayfs f_path and "real" + underlying f_inode, in order to use underlying inode aops for mapped + files and to display the overlayfs path in /proc/<pid>/maps. + + In preparation for storing the overlayfs "fake" path instead of the + underlying "real" path in struct backing_file, define a noop helper + file_user_path() that returns f_path for now. + + Use the new helper in procfs and kernel logs whenever a path of a + mapped file is displayed to users. + + Signed-off-by: Amir Goldstein + Link: https://lore.kernel.org/r/20231009153712.1566422-3-amir73il@gmail.com + Signed-off-by: Christian Brauner + +Signed-off-by: Ondrej Mosnacek + +diff --git a/arch/arc/kernel/troubleshoot.c b/arch/arc/kernel/troubleshoot.c +index 7654c2e42dc0..134c48374ecd 100644 +--- a/arch/arc/kernel/troubleshoot.c ++++ b/arch/arc/kernel/troubleshoot.c +@@ -90,10 +90,12 @@ static void show_faulting_vma(unsigned long address) + */ + if (vma) { + char buf[ARC_PATH_MAX]; +- char *nm = "?"; ++ char *nm = "anon"; + + if (vma->vm_file) { +- nm = file_path(vma->vm_file, buf, ARC_PATH_MAX-1); ++ /* XXX: can we use %pD below and get rid of buf?
*/ ++ nm = d_path(file_user_path(vma->vm_file), buf, ++ ARC_PATH_MAX-1); + if (IS_ERR(nm)) + nm = "?"; + } +diff --git a/fs/proc/base.c b/fs/proc/base.c +index dbb251465954..cb79b5f1d459 100644 +--- a/fs/proc/base.c ++++ b/fs/proc/base.c +@@ -2190,7 +2190,7 @@ static int map_files_get_link(struct dentry *dentry, struct path *path) + rc = -ENOENT; + vma = find_exact_vma(mm, vm_start, vm_end); + if (vma && vma->vm_file) { +- *path = vma->vm_file->f_path; ++ *path = *file_user_path(vma->vm_file); + path_get(path); + rc = 0; + } +diff --git a/fs/proc/nommu.c b/fs/proc/nommu.c +index 13452b32e2bd..b7e06be41224 100644 +--- a/fs/proc/nommu.c ++++ b/fs/proc/nommu.c +@@ -59,7 +59,7 @@ static int nommu_region_show(struct seq_file *m, struct vm_region *region) + + if (file) { + seq_pad(m, ' '); +- seq_file_path(m, file, ""); ++ seq_path(m, file_user_path(file), ""); + } + + seq_putc(m, '\n'); +diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c +index e396e52ca096..6180dc935136 100644 +--- a/fs/proc/task_mmu.c ++++ b/fs/proc/task_mmu.c +@@ -295,7 +295,7 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma) + if (anon_name) + seq_printf(m, "[anon_shmem:%s]", anon_name->name); + else +- seq_file_path(m, file, "\n"); ++ seq_path(m, file_user_path(file), "\n"); + goto done; + } + +@@ -1952,7 +1952,7 @@ static int show_numa_map(struct seq_file *m, void *v) + + if (file) { + seq_puts(m, " file="); +- seq_file_path(m, file, "\n\t= "); ++ seq_path(m, file_user_path(file), "\n\t= "); + } else if (vma_is_initial_heap(vma)) { + seq_puts(m, " heap"); + } else if (vma_is_initial_stack(vma)) { +diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c +index 4d52623e1bff..a3822c149f12 100644 +--- a/fs/proc/task_nommu.c ++++ b/fs/proc/task_nommu.c +@@ -162,7 +162,7 @@ static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma) + + if (file) { + seq_pad(m, ' '); +- seq_file_path(m, file, ""); ++ seq_path(m, file_user_path(file), ""); + } else if (mm && 
vma_is_initial_stack(vma)) { + seq_pad(m, ' '); + seq_puts(m, "[stack]"); +diff --git a/include/linux/fs.h b/include/linux/fs.h +index 48ec31b9d230..4ee6ed9e2634 100644 +--- a/include/linux/fs.h ++++ b/include/linux/fs.h +@@ -2607,6 +2607,20 @@ static inline const struct path *file_real_path(struct file *f) + return &f->f_path; + } + ++/* ++ * file_user_path - get the path to display for memory mapped file ++ * ++ * When mmapping a file on a stackable filesystem (e.g., overlayfs), the file ++ * stored in ->vm_file is a backing file whose f_inode is on the underlying ++ * filesystem. When the mapped file path is displayed to user (e.g. via ++ * /proc/<pid>/maps), this helper should be used to get the path to display ++ * to the user, which is the path of the fd that user has requested to map. ++ */ ++static inline const struct path *file_user_path(struct file *f) ++{ ++ return &f->f_path; ++} ++ + static inline struct file *file_clone_open(struct file *file) + { + return dentry_open(&file->f_path, file->f_flags, file->f_cred); +diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c +index 0fda3619c425..6d89bc793c96 100644 +--- a/kernel/trace/trace_output.c ++++ b/kernel/trace/trace_output.c +@@ -405,7 +405,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm, + vmstart = vma->vm_start; + } + if (file) { +- ret = trace_seq_path(s, &file->f_path); ++ ret = trace_seq_path(s, file_user_path(file)); + if (ret) + trace_seq_printf(s, "[+0x%lx]", + ip - vmstart); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1331-fs-store-real-path-instead-of-fake-path-in-backing-file-f-pa.patch b/SOURCES/1331-fs-store-real-path-instead-of-fake-path-in-backing-file-f-pa.patch new file mode 100644 index 000000000..829a6cdfa --- /dev/null +++ b/SOURCES/1331-fs-store-real-path-instead-of-fake-path-in-backing-file-f-pa.patch @@ -0,0 +1,240 @@ +From da5c3bc8093871a2c6c90187b645cc29cc0230ef Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Wed, 3 Jun 
2026 10:37:33 +0200 +Subject: [PATCH] fs: store real path instead of fake path in backing file + f_path + +JIRA: https://issues.redhat.com/browse/RHEL-179443 +CVE: CVE-2026-46054 +Conflicts: + - fs/internal.h: context fuzz + +commit def3ae83da02f87005210fa3d448c5dd37ba4105 +Author: Amir Goldstein +Date: Mon Oct 9 18:37:12 2023 +0300 + + fs: store real path instead of fake path in backing file f_path + + A backing file struct stores two path's, one "real" path that is referring + to f_inode and one "fake" path, which should be displayed to users in + /proc/<pid>/maps. + + There is a lot more potential code that needs to know the "real" path, then + code that needs to know the "fake" path. + + Instead of code having to request the "real" path with file_real_path(), + store the "real" path in f_path and require code that needs to know the + "fake" path request it with file_user_path(). + Replace the file_real_path() helper with a simple const accessor f_path(). + + After this change, file_dentry() is not expected to observe any files + with overlayfs f_path and real f_inode, so the call to ->d_real() should + not be needed. Leave the ->d_real() call for now and add an assertion + in ovl_d_real() to catch if we made wrong assumptions. 
+ + Suggested-by: Miklos Szeredi + Link: https://lore.kernel.org/r/CAJfpegtt48eXhhjDFA1ojcHPNKj3Go6joryCPtEFAKpocyBsnw@mail.gmail.com/ + Signed-off-by: Amir Goldstein + Link: https://lore.kernel.org/r/20231009153712.1566422-4-amir73il@gmail.com + Signed-off-by: Christian Brauner + +Signed-off-by: Ondrej Mosnacek + +diff --git a/fs/file_table.c b/fs/file_table.c +index b0a8c2608530..e5c7b9705109 100644 +--- a/fs/file_table.c ++++ b/fs/file_table.c +@@ -44,10 +44,10 @@ static struct kmem_cache *filp_cachep __read_mostly; + + static struct percpu_counter nr_files __cacheline_aligned_in_smp; + +-/* Container for backing file with optional real path */ ++/* Container for backing file with optional user path */ + struct backing_file { + struct file file; +- struct path real_path; ++ struct path user_path; + }; + + static inline struct backing_file *backing_file(struct file *f) +@@ -55,11 +55,11 @@ static inline struct backing_file *backing_file(struct file *f) + return container_of(f, struct backing_file, file); + } + +-struct path *backing_file_real_path(struct file *f) ++struct path *backing_file_user_path(struct file *f) + { +- return &backing_file(f)->real_path; ++ return &backing_file(f)->user_path; + } +-EXPORT_SYMBOL_GPL(backing_file_real_path); ++EXPORT_SYMBOL_GPL(backing_file_user_path); + + static void file_free_rcu(struct rcu_head *head) + { +@@ -76,7 +76,7 @@ static inline void file_free(struct file *f) + { + security_file_free(f); + if (unlikely(f->f_mode & FMODE_BACKING)) +- path_put(backing_file_real_path(f)); ++ path_put(backing_file_user_path(f)); + if (likely(!(f->f_mode & FMODE_NOACCOUNT))) + percpu_counter_dec(&nr_files); + call_rcu(&f->f_u.fu_rcuhead, file_free_rcu); +diff --git a/fs/internal.h b/fs/internal.h +index 29369382249d..bd0934d0521b 100644 +--- a/fs/internal.h ++++ b/fs/internal.h +@@ -102,7 +102,7 @@ static inline void file_put_write_access(struct file *file) + put_write_access(file->f_inode); + mnt_put_write_access(file->f_path.mnt); + 
if (unlikely(file->f_mode & FMODE_BACKING)) +- mnt_put_write_access(backing_file_real_path(file)->mnt); ++ mnt_put_write_access(backing_file_user_path(file)->mnt); + } + + static inline void put_file_access(struct file *file) +diff --git a/fs/open.c b/fs/open.c +index 64e4bbd1f28c..45547548a0e5 100644 +--- a/fs/open.c ++++ b/fs/open.c +@@ -853,7 +853,7 @@ static inline int file_get_write_access(struct file *f) + if (unlikely(error)) + goto cleanup_inode; + if (unlikely(f->f_mode & FMODE_BACKING)) { +- error = mnt_get_write_access(backing_file_real_path(f)->mnt); ++ error = mnt_get_write_access(backing_file_user_path(f)->mnt); + if (unlikely(error)) + goto cleanup_mnt; + } +@@ -1143,20 +1143,19 @@ EXPORT_SYMBOL_GPL(kernel_file_open); + + /** + * backing_file_open - open a backing file for kernel internal use +- * @path: path of the file to open ++ * @user_path: path that the user reuqested to open + * @flags: open flags + * @real_path: path of the backing file + * @cred: credentials for open + * + * Open a backing file for a stackable filesystem (e.g., overlayfs). +- * @path may be on the stackable filesystem and backing inode on the +- * underlying filesystem. In this case, we want to be able to return +- * the @real_path of the backing inode. This is done by embedding the +- * returned file into a container structure that also stores the path of +- * the backing inode on the underlying filesystem, which can be +- * retrieved using backing_file_real_path(). ++ * @user_path may be on the stackable filesystem and @real_path on the ++ * underlying filesystem. In this case, we want to be able to return the ++ * @user_path of the stackable filesystem. This is done by embedding the ++ * returned file into a container structure that also stores the stacked ++ * file's path, which can be retrieved using backing_file_user_path(). 
+ */ +-struct file *backing_file_open(const struct path *path, int flags, ++struct file *backing_file_open(const struct path *user_path, int flags, + const struct path *real_path, + const struct cred *cred) + { +@@ -1167,9 +1166,9 @@ struct file *backing_file_open(const struct path *path, int flags, + if (IS_ERR(f)) + return f; + +- f->f_path = *path; +- path_get(real_path); +- *backing_file_real_path(f) = *real_path; ++ path_get(user_path); ++ *backing_file_user_path(f) = *user_path; ++ f->f_path = *real_path; + error = do_dentry_open(f, d_inode(real_path->dentry), NULL); + if (error) { + fput(f); +diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c +index c49b1e7575d3..0779c8290ec4 100644 +--- a/fs/overlayfs/super.c ++++ b/fs/overlayfs/super.c +@@ -34,14 +34,22 @@ static struct dentry *ovl_d_real(struct dentry *dentry, + struct dentry *real = NULL, *lower; + int err; + +- /* It's an overlay file */ ++ /* ++ * vfs is only expected to call d_real() with NULL from d_real_inode() ++ * and with overlay inode from file_dentry() on an overlay file. ++ * ++ * TODO: remove @inode argument from d_real() API, remove code in this ++ * function that deals with non-NULL @inode and remove d_real() call ++ * from file_dentry(). 
++ */ + if (inode && d_inode(dentry) == inode) + return dentry; ++ else if (inode) ++ goto bug; + + if (!d_is_reg(dentry)) { +- if (!inode || inode == d_inode(dentry)) +- return dentry; +- goto bug; ++ /* d_real_inode() is only relevant for regular files */ ++ return dentry; + } + + real = ovl_dentry_upper(dentry); +diff --git a/include/linux/fs.h b/include/linux/fs.h +index 4ee6ed9e2634..1927fcdad989 100644 +--- a/include/linux/fs.h ++++ b/include/linux/fs.h +@@ -2586,26 +2586,10 @@ struct file *dentry_open(const struct path *path, int flags, + const struct cred *creds); + struct file *dentry_create(const struct path *path, int flags, umode_t mode, + const struct cred *cred); +-struct file *backing_file_open(const struct path *path, int flags, ++struct file *backing_file_open(const struct path *user_path, int flags, + const struct path *real_path, + const struct cred *cred); +-struct path *backing_file_real_path(struct file *f); +- +-/* +- * file_real_path - get the path corresponding to f_inode +- * +- * When opening a backing file for a stackable filesystem (e.g., +- * overlayfs) f_path may be on the stackable filesystem and f_inode on +- * the underlying filesystem. When the path associated with f_inode is +- * needed, this helper should be used instead of accessing f_path +- * directly. 
+-*/ +-static inline const struct path *file_real_path(struct file *f) +-{ +- if (unlikely(f->f_mode & FMODE_BACKING)) +- return backing_file_real_path(f); +- return &f->f_path; +-} ++struct path *backing_file_user_path(struct file *f); + + /* + * file_user_path - get the path to display for memory mapped file +@@ -2618,6 +2602,8 @@ static inline const struct path *file_real_path(struct file *f) + */ + static inline const struct path *file_user_path(struct file *f) + { ++ if (unlikely(f->f_mode & FMODE_BACKING)) ++ return backing_file_user_path(f); + return &f->f_path; + } + +diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h +index ed48e4f1e755..bcb6609b54b3 100644 +--- a/include/linux/fsnotify.h ++++ b/include/linux/fsnotify.h +@@ -96,8 +96,7 @@ static inline int fsnotify_file(struct file *file, __u32 mask) + if (file->f_mode & FMODE_NONOTIFY) + return 0; + +- /* Overlayfs internal files have fake f_path */ +- path = file_real_path(file); ++ path = &file->f_path; + return fsnotify_parent(path->dentry, mask, path, FSNOTIFY_EVENT_PATH); + } + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1332-fs-prepare-for-stackable-filesystems-backing-file-helpers.patch b/SOURCES/1332-fs-prepare-for-stackable-filesystems-backing-file-helpers.patch new file mode 100644 index 000000000..5937137c0 --- /dev/null +++ b/SOURCES/1332-fs-prepare-for-stackable-filesystems-backing-file-helpers.patch @@ -0,0 +1,243 @@ +From ecbb1b105f3616d4342dba0f840dcf9265dfbea0 Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Wed, 3 Jun 2026 10:42:36 +0200 +Subject: [PATCH] fs: prepare for stackable filesystems backing file helpers + +JIRA: https://issues.redhat.com/browse/RHEL-179443 +CVE: CVE-2026-46054 + +commit f91a704f7161c2cf0fcd41fa9fbec4355b813fff +Author: Amir Goldstein +Date: Mon Oct 2 17:19:46 2023 +0300 + + fs: prepare for stackable filesystems backing file helpers + + In preparation for factoring out some backing file io helpers from + overlayfs, move 
backing_file_open() into a new file fs/backing-file.c + and header. + + Add a MAINTAINERS entry for stackable filesystems and add a Kconfig + FS_STACK which stackable filesystems need to select. + + For now, the backing_file struct, the backing_file alloc/free functions + and the backing_file_real_path() accessor remain internal to file_table.c. + We may change that in the future. + + Signed-off-by: Amir Goldstein + +Signed-off-by: Ondrej Mosnacek + +diff --git a/MAINTAINERS b/MAINTAINERS +index f5dcab467670..3a29f2d3a2b1 100644 +--- a/MAINTAINERS ++++ b/MAINTAINERS +@@ -7475,6 +7475,15 @@ F: fs/mnt_idmapping.c + F: include/linux/mnt_idmapping.* + F: tools/testing/selftests/mount_setattr/ + ++FILESYSTEMS [STACKABLE] ++M: Miklos Szeredi ++M: Amir Goldstein ++L: linux-fsdevel@vger.kernel.org ++L: linux-unionfs@vger.kernel.org ++S: Maintained ++F: fs/backing-file.c ++F: include/linux/backing-file.h ++ + FINTEK F75375S HARDWARE MONITOR AND FAN CONTROLLER DRIVER + M: Riku Voipio + L: linux-hwmon@vger.kernel.org +diff --git a/fs/Kconfig b/fs/Kconfig +index 5378e55f87d3..a9c6fa9cff1f 100644 +--- a/fs/Kconfig ++++ b/fs/Kconfig +@@ -18,6 +18,10 @@ config VALIDATE_FS_PARSER + config FS_IOMAP + bool + ++# Stackable filesystems ++config FS_STACK ++ bool ++ + config BUFFER_HEAD + bool + +diff --git a/fs/Makefile b/fs/Makefile +index 0da17ff145c6..716c9fe04dec 100644 +--- a/fs/Makefile ++++ b/fs/Makefile +@@ -41,6 +41,7 @@ obj-$(CONFIG_COMPAT_BINFMT_ELF) += compat_binfmt_elf.o + obj-$(CONFIG_BINFMT_ELF_FDPIC) += binfmt_elf_fdpic.o + obj-$(CONFIG_BINFMT_FLAT) += binfmt_flat.o + ++obj-$(CONFIG_FS_STACK) += backing-file.o + obj-$(CONFIG_FS_MBCACHE) += mbcache.o + obj-$(CONFIG_FS_POSIX_ACL) += posix_acl.o + obj-$(CONFIG_NFS_COMMON) += nfs_common/ +diff --git a/fs/backing-file.c b/fs/backing-file.c +new file mode 100644 +index 000000000000..04b33036f709 +--- /dev/null ++++ b/fs/backing-file.c +@@ -0,0 +1,48 @@ ++// SPDX-License-Identifier: GPL-2.0-only ++/* ++ * Common helpers for 
stackable filesystems and backing files. ++ * ++ * Copyright (C) 2023 CTERA Networks. ++ */ ++ ++#include ++#include ++ ++#include "internal.h" ++ ++/** ++ * backing_file_open - open a backing file for kernel internal use ++ * @user_path: path that the user reuqested to open ++ * @flags: open flags ++ * @real_path: path of the backing file ++ * @cred: credentials for open ++ * ++ * Open a backing file for a stackable filesystem (e.g., overlayfs). ++ * @user_path may be on the stackable filesystem and @real_path on the ++ * underlying filesystem. In this case, we want to be able to return the ++ * @user_path of the stackable filesystem. This is done by embedding the ++ * returned file into a container structure that also stores the stacked ++ * file's path, which can be retrieved using backing_file_user_path(). ++ */ ++struct file *backing_file_open(const struct path *user_path, int flags, ++ const struct path *real_path, ++ const struct cred *cred) ++{ ++ struct file *f; ++ int error; ++ ++ f = alloc_empty_backing_file(flags, cred); ++ if (IS_ERR(f)) ++ return f; ++ ++ path_get(user_path); ++ *backing_file_user_path(f) = *user_path; ++ error = vfs_open(real_path, f); ++ if (error) { ++ fput(f); ++ f = ERR_PTR(error); ++ } ++ ++ return f; ++} ++EXPORT_SYMBOL_GPL(backing_file_open); +diff --git a/fs/open.c b/fs/open.c +index 45547548a0e5..4260d61560d4 100644 +--- a/fs/open.c ++++ b/fs/open.c +@@ -1141,44 +1141,6 @@ struct file *kernel_file_open(const struct path *path, int flags, + } + EXPORT_SYMBOL_GPL(kernel_file_open); + +-/** +- * backing_file_open - open a backing file for kernel internal use +- * @user_path: path that the user reuqested to open +- * @flags: open flags +- * @real_path: path of the backing file +- * @cred: credentials for open +- * +- * Open a backing file for a stackable filesystem (e.g., overlayfs). +- * @user_path may be on the stackable filesystem and @real_path on the +- * underlying filesystem. 
In this case, we want to be able to return the +- * @user_path of the stackable filesystem. This is done by embedding the +- * returned file into a container structure that also stores the stacked +- * file's path, which can be retrieved using backing_file_user_path(). +- */ +-struct file *backing_file_open(const struct path *user_path, int flags, +- const struct path *real_path, +- const struct cred *cred) +-{ +- struct file *f; +- int error; +- +- f = alloc_empty_backing_file(flags, cred); +- if (IS_ERR(f)) +- return f; +- +- path_get(user_path); +- *backing_file_user_path(f) = *user_path; +- f->f_path = *real_path; +- error = do_dentry_open(f, d_inode(real_path->dentry), NULL); +- if (error) { +- fput(f); +- f = ERR_PTR(error); +- } +- +- return f; +-} +-EXPORT_SYMBOL_GPL(backing_file_open); +- + #define WILL_CREATE(flags) (flags & (O_CREAT | __O_TMPFILE)) + #define O_PATH_FLAGS (O_DIRECTORY | O_NOFOLLOW | O_PATH | O_CLOEXEC) + +diff --git a/fs/overlayfs/Kconfig b/fs/overlayfs/Kconfig +index 6708e54b0e30..148d9567b5c3 100644 +--- a/fs/overlayfs/Kconfig ++++ b/fs/overlayfs/Kconfig +@@ -1,6 +1,7 @@ + # SPDX-License-Identifier: GPL-2.0-only + config OVERLAY_FS + tristate "Overlay filesystem support" ++ select FS_STACK + select EXPORTFS + help + An overlay filesystem combines two filesystems - an 'upper' filesystem +diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c +index cd0770bb3020..634a96a65bfd 100644 +--- a/fs/overlayfs/file.c ++++ b/fs/overlayfs/file.c +@@ -13,6 +13,7 @@ + #include + #include + #include ++#include + #include "overlayfs.h" + + #include "../internal.h" /* for sb_init_dio_done_wq */ +diff --git a/include/linux/backing-file.h b/include/linux/backing-file.h +new file mode 100644 +index 000000000000..55c9e804f780 +--- /dev/null ++++ b/include/linux/backing-file.h +@@ -0,0 +1,17 @@ ++/* SPDX-License-Identifier: GPL-2.0-only */ ++/* ++ * Common helpers for stackable filesystems and backing files. ++ * ++ * Copyright (C) 2023 CTERA Networks. 
++ */ ++ ++#ifndef _LINUX_BACKING_FILE_H ++#define _LINUX_BACKING_FILE_H ++ ++#include ++ ++struct file *backing_file_open(const struct path *user_path, int flags, ++ const struct path *real_path, ++ const struct cred *cred); ++ ++#endif /* _LINUX_BACKING_FILE_H */ +diff --git a/include/linux/fs.h b/include/linux/fs.h +index 1927fcdad989..5f3ca25c77e5 100644 +--- a/include/linux/fs.h ++++ b/include/linux/fs.h +@@ -2586,9 +2586,6 @@ struct file *dentry_open(const struct path *path, int flags, + const struct cred *creds); + struct file *dentry_create(const struct path *path, int flags, umode_t mode, + const struct cred *cred); +-struct file *backing_file_open(const struct path *user_path, int flags, +- const struct path *real_path, +- const struct cred *cred); + struct path *backing_file_user_path(struct file *f); + + /* +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1333-fs-factor-out-backing-file-read-write-iter-helpers.patch b/SOURCES/1333-fs-factor-out-backing-file-read-write-iter-helpers.patch new file mode 100644 index 000000000..48da62712 --- /dev/null +++ b/SOURCES/1333-fs-factor-out-backing-file-read-write-iter-helpers.patch @@ -0,0 +1,600 @@ +From 37577a7bb4d8c217f23a2b47e8fbe66640e0588d Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Wed, 3 Jun 2026 10:51:36 +0200 +Subject: [PATCH] fs: factor out backing_file_{read,write}_iter() helpers + +JIRA: https://issues.redhat.com/browse/RHEL-179443 +CVE: CVE-2026-46054 +Conflicts: + - fs/overlayfs/file.c & fs/backing-file.c: carried over downstream + logic of calling *_start_write()/*_end_write() because of not + backported commits: 269aed7014b3 ("fs: move file_start_write() into + vfs_iter_write()") and 6ae654392bb5 ("fs: move kiocb_start_write() + into vfs_iocb_iter_write()") - those commits were skipped in this + MR, because vfs_iter_write() appears in K-A-B-I stable lists and + changing the convention could break partner modules + +commit a6293b3e285cd0d7692141d7981a5f144f0e2f0b +Author: Amir 
Goldstein +Date: Wed Nov 22 17:48:52 2023 +0200 + + fs: factor out backing_file_{read,write}_iter() helpers + + Overlayfs submits files io to backing files on other filesystems. + Factor out some common helpers to perform io to backing files, into + fs/backing-file.c. + + Suggested-by: Miklos Szeredi + Link: https://lore.kernel.org/r/CAJfpeguhmZbjP3JLqtUy0AdWaHOkAPWeP827BBWwRFEAUgnUcQ@mail.gmail.com + Signed-off-by: Amir Goldstein + +Signed-off-by: Ondrej Mosnacek + +diff --git a/fs/backing-file.c b/fs/backing-file.c +index 04b33036f709..6d915a45e288 100644 +--- a/fs/backing-file.c ++++ b/fs/backing-file.c +@@ -2,6 +2,9 @@ + /* + * Common helpers for stackable filesystems and backing files. + * ++ * Forked from fs/overlayfs/file.c. ++ * ++ * Copyright (C) 2017 Red Hat, Inc. + * Copyright (C) 2023 CTERA Networks. + */ + +@@ -46,3 +49,213 @@ struct file *backing_file_open(const struct path *user_path, int flags, + return f; + } + EXPORT_SYMBOL_GPL(backing_file_open); ++ ++struct backing_aio { ++ struct kiocb iocb; ++ refcount_t ref; ++ struct kiocb *orig_iocb; ++ /* used for aio completion */ ++ void (*end_write)(struct file *); ++ struct work_struct work; ++ long res; ++}; ++ ++static struct kmem_cache *backing_aio_cachep; ++ ++#define BACKING_IOCB_MASK \ ++ (IOCB_NOWAIT | IOCB_HIPRI | IOCB_DSYNC | IOCB_SYNC | IOCB_APPEND) ++ ++static rwf_t iocb_to_rw_flags(int flags) ++{ ++ return (__force rwf_t)(flags & BACKING_IOCB_MASK); ++} ++ ++static void backing_aio_put(struct backing_aio *aio) ++{ ++ if (refcount_dec_and_test(&aio->ref)) { ++ fput(aio->iocb.ki_filp); ++ kmem_cache_free(backing_aio_cachep, aio); ++ } ++} ++ ++static void backing_aio_cleanup(struct backing_aio *aio, long res) ++{ ++ struct kiocb *iocb = &aio->iocb; ++ struct kiocb *orig_iocb = aio->orig_iocb; ++ ++ if (iocb->ki_flags & IOCB_WRITE) ++ kiocb_end_write(iocb); ++ ++ if (aio->end_write) ++ aio->end_write(orig_iocb->ki_filp); ++ ++ orig_iocb->ki_pos = iocb->ki_pos; ++ backing_aio_put(aio); ++} ++ 
++static void backing_aio_rw_complete(struct kiocb *iocb, long res) ++{ ++ struct backing_aio *aio = container_of(iocb, struct backing_aio, iocb); ++ struct kiocb *orig_iocb = aio->orig_iocb; ++ ++ backing_aio_cleanup(aio, res); ++ orig_iocb->ki_complete(orig_iocb, res); ++} ++ ++static void backing_aio_complete_work(struct work_struct *work) ++{ ++ struct backing_aio *aio = container_of(work, struct backing_aio, work); ++ ++ backing_aio_rw_complete(&aio->iocb, aio->res); ++} ++ ++static void backing_aio_queue_completion(struct kiocb *iocb, long res) ++{ ++ struct backing_aio *aio = container_of(iocb, struct backing_aio, iocb); ++ ++ /* ++ * Punt to a work queue to serialize updates of mtime/size. ++ */ ++ aio->res = res; ++ INIT_WORK(&aio->work, backing_aio_complete_work); ++ queue_work(file_inode(aio->orig_iocb->ki_filp)->i_sb->s_dio_done_wq, ++ &aio->work); ++} ++ ++static int backing_aio_init_wq(struct kiocb *iocb) ++{ ++ struct super_block *sb = file_inode(iocb->ki_filp)->i_sb; ++ ++ if (sb->s_dio_done_wq) ++ return 0; ++ ++ return sb_init_dio_done_wq(sb); ++} ++ ++ ++ssize_t backing_file_read_iter(struct file *file, struct iov_iter *iter, ++ struct kiocb *iocb, int flags, ++ struct backing_file_ctx *ctx) ++{ ++ struct backing_aio *aio = NULL; ++ const struct cred *old_cred; ++ ssize_t ret; ++ ++ if (WARN_ON_ONCE(!(file->f_mode & FMODE_BACKING))) ++ return -EIO; ++ ++ if (!iov_iter_count(iter)) ++ return 0; ++ ++ if (iocb->ki_flags & IOCB_DIRECT && ++ !(file->f_mode & FMODE_CAN_ODIRECT)) ++ return -EINVAL; ++ ++ old_cred = override_creds(ctx->cred); ++ if (is_sync_kiocb(iocb)) { ++ rwf_t rwf = iocb_to_rw_flags(flags); ++ ++ ret = vfs_iter_read(file, iter, &iocb->ki_pos, rwf); ++ } else { ++ ret = -ENOMEM; ++ aio = kmem_cache_zalloc(backing_aio_cachep, GFP_KERNEL); ++ if (!aio) ++ goto out; ++ ++ aio->orig_iocb = iocb; ++ kiocb_clone(&aio->iocb, iocb, get_file(file)); ++ aio->iocb.ki_complete = backing_aio_rw_complete; ++ refcount_set(&aio->ref, 2); ++ ret = 
vfs_iocb_iter_read(file, &aio->iocb, iter); ++ backing_aio_put(aio); ++ if (ret != -EIOCBQUEUED) ++ backing_aio_cleanup(aio, ret); ++ } ++out: ++ revert_creds(old_cred); ++ ++ if (ctx->accessed) ++ ctx->accessed(ctx->user_file); ++ ++ return ret; ++} ++EXPORT_SYMBOL_GPL(backing_file_read_iter); ++ ++ssize_t backing_file_write_iter(struct file *file, struct iov_iter *iter, ++ struct kiocb *iocb, int flags, ++ struct backing_file_ctx *ctx) ++{ ++ const struct cred *old_cred; ++ ssize_t ret; ++ ++ if (WARN_ON_ONCE(!(file->f_mode & FMODE_BACKING))) ++ return -EIO; ++ ++ if (!iov_iter_count(iter)) ++ return 0; ++ ++ ret = file_remove_privs(ctx->user_file); ++ if (ret) ++ return ret; ++ ++ if (iocb->ki_flags & IOCB_DIRECT && ++ !(file->f_mode & FMODE_CAN_ODIRECT)) ++ return -EINVAL; ++ ++ /* ++ * Stacked filesystems don't support deferred completions, don't copy ++ * this property in case it is set by the issuer. ++ */ ++ flags &= ~IOCB_DIO_CALLER_COMP; ++ ++ old_cred = override_creds(ctx->cred); ++ if (is_sync_kiocb(iocb)) { ++ rwf_t rwf = iocb_to_rw_flags(flags); ++ ++ file_start_write(file); ++ ret = vfs_iter_write(file, iter, &iocb->ki_pos, rwf); ++ file_end_write(file); ++ if (ctx->end_write) ++ ctx->end_write(ctx->user_file); ++ } else { ++ struct backing_aio *aio; ++ ++ ret = backing_aio_init_wq(iocb); ++ if (ret) ++ goto out; ++ ++ ret = -ENOMEM; ++ aio = kmem_cache_zalloc(backing_aio_cachep, GFP_KERNEL); ++ if (!aio) ++ goto out; ++ ++ aio->orig_iocb = iocb; ++ aio->end_write = ctx->end_write; ++ kiocb_clone(&aio->iocb, iocb, get_file(file)); ++ aio->iocb.ki_flags = flags; ++ aio->iocb.ki_complete = backing_aio_queue_completion; ++ refcount_set(&aio->ref, 2); ++ kiocb_start_write(&aio->iocb); ++ ret = vfs_iocb_iter_write(file, &aio->iocb, iter); ++ backing_aio_put(aio); ++ if (ret != -EIOCBQUEUED) ++ backing_aio_cleanup(aio, ret); ++ } ++out: ++ revert_creds(old_cred); ++ ++ return ret; ++} ++EXPORT_SYMBOL_GPL(backing_file_write_iter); ++ ++static int __init 
backing_aio_init(void) ++{ ++ backing_aio_cachep = kmem_cache_create("backing_aio", ++ sizeof(struct backing_aio), ++ 0, SLAB_HWCACHE_ALIGN, NULL); ++ if (!backing_aio_cachep) ++ return -ENOMEM; ++ ++ return 0; ++} ++fs_initcall(backing_aio_init); +diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c +index 634a96a65bfd..3eee9f45971e 100644 +--- a/fs/overlayfs/file.c ++++ b/fs/overlayfs/file.c +@@ -16,19 +16,6 @@ + #include + #include "overlayfs.h" + +-#include "../internal.h" /* for sb_init_dio_done_wq */ +- +-struct ovl_aio_req { +- struct kiocb iocb; +- refcount_t ref; +- struct kiocb *orig_iocb; +- /* used for aio completion */ +- struct work_struct work; +- long res; +-}; +- +-static struct kmem_cache *ovl_aio_request_cachep; +- + static char ovl_whatisit(struct inode *inode, struct inode *realinode) + { + if (realinode != ovl_inode_upper(inode)) +@@ -271,83 +258,16 @@ static void ovl_file_accessed(struct file *file) + touch_atime(&file->f_path); + } + +-#define OVL_IOCB_MASK \ +- (IOCB_NOWAIT | IOCB_HIPRI | IOCB_DSYNC | IOCB_SYNC | IOCB_APPEND) +- +-static rwf_t iocb_to_rw_flags(int flags) +-{ +- return (__force rwf_t)(flags & OVL_IOCB_MASK); +-} +- +-static inline void ovl_aio_put(struct ovl_aio_req *aio_req) +-{ +- if (refcount_dec_and_test(&aio_req->ref)) { +- fput(aio_req->iocb.ki_filp); +- kmem_cache_free(ovl_aio_request_cachep, aio_req); +- } +-} +- +-static void ovl_aio_cleanup_handler(struct ovl_aio_req *aio_req) +-{ +- struct kiocb *iocb = &aio_req->iocb; +- struct kiocb *orig_iocb = aio_req->orig_iocb; +- +- if (iocb->ki_flags & IOCB_WRITE) { +- kiocb_end_write(iocb); +- ovl_file_modified(orig_iocb->ki_filp); +- } +- +- orig_iocb->ki_pos = iocb->ki_pos; +- ovl_aio_put(aio_req); +-} +- +-static void ovl_aio_rw_complete(struct kiocb *iocb, long res) +-{ +- struct ovl_aio_req *aio_req = container_of(iocb, +- struct ovl_aio_req, iocb); +- struct kiocb *orig_iocb = aio_req->orig_iocb; +- +- ovl_aio_cleanup_handler(aio_req); +- 
orig_iocb->ki_complete(orig_iocb, res); +-} +- +-static void ovl_aio_complete_work(struct work_struct *work) +-{ +- struct ovl_aio_req *aio_req = container_of(work, +- struct ovl_aio_req, work); +- +- ovl_aio_rw_complete(&aio_req->iocb, aio_req->res); +-} +- +-static void ovl_aio_queue_completion(struct kiocb *iocb, long res) +-{ +- struct ovl_aio_req *aio_req = container_of(iocb, +- struct ovl_aio_req, iocb); +- struct kiocb *orig_iocb = aio_req->orig_iocb; +- +- /* +- * Punt to a work queue to serialize updates of mtime/size. +- */ +- aio_req->res = res; +- INIT_WORK(&aio_req->work, ovl_aio_complete_work); +- queue_work(file_inode(orig_iocb->ki_filp)->i_sb->s_dio_done_wq, +- &aio_req->work); +-} +- +-static int ovl_init_aio_done_wq(struct super_block *sb) +-{ +- if (sb->s_dio_done_wq) +- return 0; +- +- return sb_init_dio_done_wq(sb); +-} +- + static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter) + { + struct file *file = iocb->ki_filp; + struct fd real; +- const struct cred *old_cred; + ssize_t ret; ++ struct backing_file_ctx ctx = { ++ .cred = ovl_creds(file_inode(file)->i_sb), ++ .user_file = file, ++ .accessed = ovl_file_accessed, ++ }; + + if (!iov_iter_count(iter)) + return 0; +@@ -356,37 +276,8 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter) + if (ret) + return ret; + +- ret = -EINVAL; +- if (iocb->ki_flags & IOCB_DIRECT && +- !(real.file->f_mode & FMODE_CAN_ODIRECT)) +- goto out_fdput; +- +- old_cred = ovl_override_creds(file_inode(file)->i_sb); +- if (is_sync_kiocb(iocb)) { +- rwf_t rwf = iocb_to_rw_flags(iocb->ki_flags); +- +- ret = vfs_iter_read(real.file, iter, &iocb->ki_pos, rwf); +- } else { +- struct ovl_aio_req *aio_req; +- +- ret = -ENOMEM; +- aio_req = kmem_cache_zalloc(ovl_aio_request_cachep, GFP_KERNEL); +- if (!aio_req) +- goto out; +- +- aio_req->orig_iocb = iocb; +- kiocb_clone(&aio_req->iocb, iocb, get_file(real.file)); +- aio_req->iocb.ki_complete = ovl_aio_rw_complete; +- 
refcount_set(&aio_req->ref, 2); +- ret = vfs_iocb_iter_read(real.file, &aio_req->iocb, iter); +- ovl_aio_put(aio_req); +- if (ret != -EIOCBQUEUED) +- ovl_aio_cleanup_handler(aio_req); +- } +-out: +- revert_creds(old_cred); +- ovl_file_accessed(file); +-out_fdput: ++ ret = backing_file_read_iter(real.file, iter, iocb, iocb->ki_flags, ++ &ctx); + fdput(real); + + return ret; +@@ -397,9 +288,13 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter) + struct file *file = iocb->ki_filp; + struct inode *inode = file_inode(file); + struct fd real; +- const struct cred *old_cred; + ssize_t ret; + int ifl = iocb->ki_flags; ++ struct backing_file_ctx ctx = { ++ .cred = ovl_creds(inode->i_sb), ++ .user_file = file, ++ .end_write = ovl_file_modified, ++ }; + + if (!iov_iter_count(iter)) + return 0; +@@ -407,19 +302,11 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter) + inode_lock(inode); + /* Update mode */ + ovl_copyattr(inode); +- ret = file_remove_privs(file); +- if (ret) +- goto out_unlock; + + ret = ovl_real_fdget(file, &real); + if (ret) + goto out_unlock; + +- ret = -EINVAL; +- if (iocb->ki_flags & IOCB_DIRECT && +- !(real.file->f_mode & FMODE_CAN_ODIRECT)) +- goto out_fdput; +- + if (!ovl_should_sync(OVL_FS(inode->i_sb))) + ifl &= ~(IOCB_DSYNC | IOCB_SYNC); + +@@ -428,42 +315,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter) + * this property in case it is set by the issuer. 
+ */ + ifl &= ~IOCB_DIO_CALLER_COMP; +- +- old_cred = ovl_override_creds(file_inode(file)->i_sb); +- if (is_sync_kiocb(iocb)) { +- rwf_t rwf = iocb_to_rw_flags(ifl); +- +- file_start_write(real.file); +- ret = vfs_iter_write(real.file, iter, &iocb->ki_pos, rwf); +- file_end_write(real.file); +- /* Update size */ +- ovl_file_modified(file); +- } else { +- struct ovl_aio_req *aio_req; +- +- ret = ovl_init_aio_done_wq(inode->i_sb); +- if (ret) +- goto out; +- +- ret = -ENOMEM; +- aio_req = kmem_cache_zalloc(ovl_aio_request_cachep, GFP_KERNEL); +- if (!aio_req) +- goto out; +- +- aio_req->orig_iocb = iocb; +- kiocb_clone(&aio_req->iocb, iocb, get_file(real.file)); +- aio_req->iocb.ki_flags = ifl; +- aio_req->iocb.ki_complete = ovl_aio_queue_completion; +- refcount_set(&aio_req->ref, 2); +- kiocb_start_write(&aio_req->iocb); +- ret = vfs_iocb_iter_write(real.file, &aio_req->iocb, iter); +- ovl_aio_put(aio_req); +- if (ret != -EIOCBQUEUED) +- ovl_aio_cleanup_handler(aio_req); +- } +-out: +- revert_creds(old_cred); +-out_fdput: ++ ret = backing_file_write_iter(real.file, iter, iocb, ifl, &ctx); + fdput(real); + + out_unlock: +@@ -775,19 +627,3 @@ const struct file_operations ovl_file_operations = { + .copy_file_range = ovl_copy_file_range, + .remap_file_range = ovl_remap_file_range, + }; +- +-int __init ovl_aio_request_cache_init(void) +-{ +- ovl_aio_request_cachep = kmem_cache_create("ovl_aio_req", +- sizeof(struct ovl_aio_req), +- 0, SLAB_HWCACHE_ALIGN, NULL); +- if (!ovl_aio_request_cachep) +- return -ENOMEM; +- +- return 0; +-} +- +-void ovl_aio_request_cache_destroy(void) +-{ +- kmem_cache_destroy(ovl_aio_request_cachep); +-} +diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h +index a4b94a74b854..8b31bc3ee7a0 100644 +--- a/fs/overlayfs/overlayfs.h ++++ b/fs/overlayfs/overlayfs.h +@@ -417,6 +417,12 @@ int ovl_want_write(struct dentry *dentry); + void ovl_drop_write(struct dentry *dentry); + struct dentry *ovl_workdir(struct dentry *dentry); + const 
struct cred *ovl_override_creds(struct super_block *sb); ++ ++static inline const struct cred *ovl_creds(struct super_block *sb) ++{ ++ return OVL_FS(sb)->creator_cred; ++} ++ + int ovl_can_decode_fh(struct super_block *sb); + struct dentry *ovl_indexdir(struct super_block *sb); + bool ovl_index_all(struct super_block *sb); +@@ -835,8 +841,6 @@ struct dentry *ovl_create_temp(struct ovl_fs *ofs, struct dentry *workdir, + + /* file.c */ + extern const struct file_operations ovl_file_operations; +-int __init ovl_aio_request_cache_init(void); +-void ovl_aio_request_cache_destroy(void); + int ovl_real_fileattr_get(const struct path *realpath, struct fileattr *fa); + int ovl_real_fileattr_set(const struct path *realpath, struct fileattr *fa); + int ovl_fileattr_get(struct dentry *dentry, struct fileattr *fa); +diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c +index 0779c8290ec4..37387fd98e34 100644 +--- a/fs/overlayfs/super.c ++++ b/fs/overlayfs/super.c +@@ -1522,14 +1522,10 @@ static int __init ovl_init(void) + if (ovl_inode_cachep == NULL) + return -ENOMEM; + +- err = ovl_aio_request_cache_init(); +- if (!err) { +- err = register_filesystem(&ovl_fs_type); +- if (!err) +- return 0; ++ err = register_filesystem(&ovl_fs_type); ++ if (!err) ++ return 0; + +- ovl_aio_request_cache_destroy(); +- } + kmem_cache_destroy(ovl_inode_cachep); + + return err; +@@ -1545,7 +1541,6 @@ static void __exit ovl_exit(void) + */ + rcu_barrier(); + kmem_cache_destroy(ovl_inode_cachep); +- ovl_aio_request_cache_destroy(); + } + + module_init(ovl_init); +diff --git a/include/linux/backing-file.h b/include/linux/backing-file.h +index 55c9e804f780..0648d548a418 100644 +--- a/include/linux/backing-file.h ++++ b/include/linux/backing-file.h +@@ -9,9 +9,24 @@ + #define _LINUX_BACKING_FILE_H + + #include ++#include ++#include ++ ++struct backing_file_ctx { ++ const struct cred *cred; ++ struct file *user_file; ++ void (*accessed)(struct file *); ++ void (*end_write)(struct file *); ++}; + + 
struct file *backing_file_open(const struct path *user_path, int flags, + const struct path *real_path, + const struct cred *cred); ++ssize_t backing_file_read_iter(struct file *file, struct iov_iter *iter, ++ struct kiocb *iocb, int flags, ++ struct backing_file_ctx *ctx); ++ssize_t backing_file_write_iter(struct file *file, struct iov_iter *iter, ++ struct kiocb *iocb, int flags, ++ struct backing_file_ctx *ctx); + + #endif /* _LINUX_BACKING_FILE_H */ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1334-fs-factor-out-backing-file-splice-read-write-helpers.patch b/SOURCES/1334-fs-factor-out-backing-file-splice-read-write-helpers.patch new file mode 100644 index 000000000..ed78f51ab --- /dev/null +++ b/SOURCES/1334-fs-factor-out-backing-file-splice-read-write-helpers.patch @@ -0,0 +1,189 @@ +From 5c28f6ec073077ce1239652c7a74555904eb0577 Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Wed, 3 Jun 2026 10:51:41 +0200 +Subject: [PATCH] fs: factor out backing_file_splice_{read,write}() helpers + +JIRA: https://issues.redhat.com/browse/RHEL-179443 +CVE: CVE-2026-46054 + +commit 9b7e9e2f5d5c3d079ec46bc71b114012e362ea6e +Author: Amir Goldstein +Date: Fri Oct 13 12:13:12 2023 +0300 + + fs: factor out backing_file_splice_{read,write}() helpers + + There is not much in those helpers, but it makes sense to have them + logically next to the backing_file_{read,write}_iter() helpers as they + may grow more common logic in the future. 
+ + Signed-off-by: Amir Goldstein + +Signed-off-by: Ondrej Mosnacek + +diff --git a/fs/backing-file.c b/fs/backing-file.c +index 6d915a45e288..5cc411566ce0 100644 +--- a/fs/backing-file.c ++++ b/fs/backing-file.c +@@ -10,6 +10,7 @@ + + #include + #include ++#include + + #include "internal.h" + +@@ -248,6 +249,56 @@ ssize_t backing_file_write_iter(struct file *file, struct iov_iter *iter, + } + EXPORT_SYMBOL_GPL(backing_file_write_iter); + ++ssize_t backing_file_splice_read(struct file *in, loff_t *ppos, ++ struct pipe_inode_info *pipe, size_t len, ++ unsigned int flags, ++ struct backing_file_ctx *ctx) ++{ ++ const struct cred *old_cred; ++ ssize_t ret; ++ ++ if (WARN_ON_ONCE(!(in->f_mode & FMODE_BACKING))) ++ return -EIO; ++ ++ old_cred = override_creds(ctx->cred); ++ ret = vfs_splice_read(in, ppos, pipe, len, flags); ++ revert_creds(old_cred); ++ ++ if (ctx->accessed) ++ ctx->accessed(ctx->user_file); ++ ++ return ret; ++} ++EXPORT_SYMBOL_GPL(backing_file_splice_read); ++ ++ssize_t backing_file_splice_write(struct pipe_inode_info *pipe, ++ struct file *out, loff_t *ppos, size_t len, ++ unsigned int flags, ++ struct backing_file_ctx *ctx) ++{ ++ const struct cred *old_cred; ++ ssize_t ret; ++ ++ if (WARN_ON_ONCE(!(out->f_mode & FMODE_BACKING))) ++ return -EIO; ++ ++ ret = file_remove_privs(ctx->user_file); ++ if (ret) ++ return ret; ++ ++ old_cred = override_creds(ctx->cred); ++ file_start_write(out); ++ ret = iter_file_splice_write(pipe, out, ppos, len, flags); ++ file_end_write(out); ++ revert_creds(old_cred); ++ ++ if (ctx->end_write) ++ ctx->end_write(ctx->user_file); ++ ++ return ret; ++} ++EXPORT_SYMBOL_GPL(backing_file_splice_write); ++ + static int __init backing_aio_init(void) + { + backing_aio_cachep = kmem_cache_create("backing_aio", +diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c +index 3eee9f45971e..165a92b25c0a 100644 +--- a/fs/overlayfs/file.c ++++ b/fs/overlayfs/file.c +@@ -9,7 +9,6 @@ + #include + #include + #include +-#include + #include 
+ #include + #include +@@ -328,20 +327,21 @@ static ssize_t ovl_splice_read(struct file *in, loff_t *ppos, + struct pipe_inode_info *pipe, size_t len, + unsigned int flags) + { +- const struct cred *old_cred; + struct fd real; + ssize_t ret; ++ struct backing_file_ctx ctx = { ++ .cred = ovl_creds(file_inode(in)->i_sb), ++ .user_file = in, ++ .accessed = ovl_file_accessed, ++ }; + + ret = ovl_real_fdget(in, &real); + if (ret) + return ret; + +- old_cred = ovl_override_creds(file_inode(in)->i_sb); +- ret = vfs_splice_read(real.file, ppos, pipe, len, flags); +- revert_creds(old_cred); +- ovl_file_accessed(in); +- ++ ret = backing_file_splice_read(real.file, ppos, pipe, len, flags, &ctx); + fdput(real); ++ + return ret; + } + +@@ -357,30 +357,23 @@ static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out, + loff_t *ppos, size_t len, unsigned int flags) + { + struct fd real; +- const struct cred *old_cred; + struct inode *inode = file_inode(out); + ssize_t ret; ++ struct backing_file_ctx ctx = { ++ .cred = ovl_creds(inode->i_sb), ++ .user_file = out, ++ .end_write = ovl_file_modified, ++ }; + + inode_lock(inode); + /* Update mode */ + ovl_copyattr(inode); +- ret = file_remove_privs(out); +- if (ret) +- goto out_unlock; + + ret = ovl_real_fdget(out, &real); + if (ret) + goto out_unlock; + +- old_cred = ovl_override_creds(inode->i_sb); +- file_start_write(real.file); +- +- ret = iter_file_splice_write(pipe, real.file, ppos, len, flags); +- +- file_end_write(real.file); +- /* Update size */ +- ovl_file_modified(out); +- revert_creds(old_cred); ++ ret = backing_file_splice_write(pipe, real.file, ppos, len, flags, &ctx); + fdput(real); + + out_unlock: +diff --git a/include/linux/backing-file.h b/include/linux/backing-file.h +index 0648d548a418..0546d5b1c9f5 100644 +--- a/include/linux/backing-file.h ++++ b/include/linux/backing-file.h +@@ -28,5 +28,13 @@ ssize_t backing_file_read_iter(struct file *file, struct iov_iter *iter, + ssize_t 
backing_file_write_iter(struct file *file, struct iov_iter *iter, + struct kiocb *iocb, int flags, + struct backing_file_ctx *ctx); ++ssize_t backing_file_splice_read(struct file *in, loff_t *ppos, ++ struct pipe_inode_info *pipe, size_t len, ++ unsigned int flags, ++ struct backing_file_ctx *ctx); ++ssize_t backing_file_splice_write(struct pipe_inode_info *pipe, ++ struct file *out, loff_t *ppos, size_t len, ++ unsigned int flags, ++ struct backing_file_ctx *ctx); + + #endif /* _LINUX_BACKING_FILE_H */ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1335-fs-factor-out-backing-file-mmap-helper.patch b/SOURCES/1335-fs-factor-out-backing-file-mmap-helper.patch new file mode 100644 index 000000000..854df3e6f --- /dev/null +++ b/SOURCES/1335-fs-factor-out-backing-file-mmap-helper.patch @@ -0,0 +1,124 @@ +From 5176a4370a2e9f1ebe16e502bf93897820461b7f Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Wed, 3 Jun 2026 10:51:45 +0200 +Subject: [PATCH] fs: factor out backing_file_mmap() helper + +JIRA: https://issues.redhat.com/browse/RHEL-179443 +CVE: CVE-2026-46054 + +commit f567377e406c032fff0799bde4fdf4a977529b84 +Author: Amir Goldstein +Date: Fri Oct 13 12:49:37 2023 +0300 + + fs: factor out backing_file_mmap() helper + + Assert that the file object is allocated in a backing_file container + so that file_user_path() could be used to display the user path and + not the backing file's path in /proc//maps. 
+ + Signed-off-by: Amir Goldstein + +Signed-off-by: Ondrej Mosnacek + +diff --git a/fs/backing-file.c b/fs/backing-file.c +index 5cc411566ce0..6ea14b6214c1 100644 +--- a/fs/backing-file.c ++++ b/fs/backing-file.c +@@ -11,6 +11,7 @@ + #include + #include + #include ++#include + + #include "internal.h" + +@@ -299,6 +300,32 @@ ssize_t backing_file_splice_write(struct pipe_inode_info *pipe, + } + EXPORT_SYMBOL_GPL(backing_file_splice_write); + ++int backing_file_mmap(struct file *file, struct vm_area_struct *vma, ++ struct backing_file_ctx *ctx) ++{ ++ const struct cred *old_cred; ++ int ret; ++ ++ if (WARN_ON_ONCE(!(file->f_mode & FMODE_BACKING)) || ++ WARN_ON_ONCE(ctx->user_file != vma->vm_file)) ++ return -EIO; ++ ++ if (!file->f_op->mmap) ++ return -ENODEV; ++ ++ vma_set_file(vma, file); ++ ++ old_cred = override_creds(ctx->cred); ++ ret = call_mmap(vma->vm_file, vma); ++ revert_creds(old_cred); ++ ++ if (ctx->accessed) ++ ctx->accessed(ctx->user_file); ++ ++ return ret; ++} ++EXPORT_SYMBOL_GPL(backing_file_mmap); ++ + static int __init backing_aio_init(void) + { + backing_aio_cachep = kmem_cache_create("backing_aio", +diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c +index 165a92b25c0a..d85385f37ba6 100644 +--- a/fs/overlayfs/file.c ++++ b/fs/overlayfs/file.c +@@ -10,7 +10,6 @@ + #include + #include + #include +-#include + #include + #include + #include "overlayfs.h" +@@ -411,23 +410,13 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync) + static int ovl_mmap(struct file *file, struct vm_area_struct *vma) + { + struct file *realfile = file->private_data; +- const struct cred *old_cred; +- int ret; +- +- if (!realfile->f_op->mmap) +- return -ENODEV; +- +- if (WARN_ON(file != vma->vm_file)) +- return -EIO; +- +- vma_set_file(vma, realfile); +- +- old_cred = ovl_override_creds(file_inode(file)->i_sb); +- ret = call_mmap(vma->vm_file, vma); +- revert_creds(old_cred); +- ovl_file_accessed(file); ++ struct backing_file_ctx ctx = { ++ 
.cred = ovl_creds(file_inode(file)->i_sb), ++ .user_file = file, ++ .accessed = ovl_file_accessed, ++ }; + +- return ret; ++ return backing_file_mmap(realfile, vma, &ctx); + } + + static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len) +diff --git a/include/linux/backing-file.h b/include/linux/backing-file.h +index 0546d5b1c9f5..3f1fe1774f1b 100644 +--- a/include/linux/backing-file.h ++++ b/include/linux/backing-file.h +@@ -36,5 +36,7 @@ ssize_t backing_file_splice_write(struct pipe_inode_info *pipe, + struct file *out, loff_t *ppos, size_t len, + unsigned int flags, + struct backing_file_ctx *ctx); ++int backing_file_mmap(struct file *file, struct vm_area_struct *vma, ++ struct backing_file_ctx *ctx); + + #endif /* _LINUX_BACKING_FILE_H */ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1336-lsm-add-helper-for-blob-allocations.patch b/SOURCES/1336-lsm-add-helper-for-blob-allocations.patch new file mode 100644 index 000000000..5a371cdd2 --- /dev/null +++ b/SOURCES/1336-lsm-add-helper-for-blob-allocations.patch @@ -0,0 +1,158 @@ +From f034201051121a87f98c1651368c3883f633182f Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Wed, 3 Jun 2026 11:47:18 +0200 +Subject: [PATCH] lsm: add helper for blob allocations + +JIRA: https://issues.redhat.com/browse/RHEL-179443 +CVE: CVE-2026-46054 +Conflicts: + - security/security.c: context fuzz + dropped hunks changing functions + not present downstream + +commit 09001284eebfc1b684e81d1db0f006787d35f3e1 +Author: Casey Schaufler +Date: Wed Jul 10 14:32:27 2024 -0700 + + lsm: add helper for blob allocations + + Create a helper function lsm_blob_alloc() for general use in the hook + specific functions that allocate LSM blobs. Change the hook specific + functions to use this helper. This reduces the code size by a small + amount and will make adding new instances of infrastructure managed + security blobs easier. 
+ + Signed-off-by: Casey Schaufler + Reviewed-by: John Johansen + [PM: subject tweak] + Signed-off-by: Paul Moore + +Signed-off-by: Ondrej Mosnacek + +diff --git a/security/security.c b/security/security.c +index b59af216324f..1e63f23a504a 100644 +--- a/security/security.c ++++ b/security/security.c +@@ -645,27 +645,42 @@ int unregister_blocking_lsm_notifier(struct notifier_block *nb) + EXPORT_SYMBOL(unregister_blocking_lsm_notifier); + + /** +- * lsm_cred_alloc - allocate a composite cred blob +- * @cred: the cred that needs a blob ++ * lsm_blob_alloc - allocate a composite blob ++ * @dest: the destination for the blob ++ * @size: the size of the blob + * @gfp: allocation type + * +- * Allocate the cred blob for all the modules ++ * Allocate a blob for all the modules + * + * Returns 0, or -ENOMEM if memory can't be allocated. + */ +-static int lsm_cred_alloc(struct cred *cred, gfp_t gfp) ++static int lsm_blob_alloc(void **dest, size_t size, gfp_t gfp) + { +- if (blob_sizes.lbs_cred == 0) { +- cred->security = NULL; ++ if (size == 0) { ++ *dest = NULL; + return 0; + } + +- cred->security = kzalloc(blob_sizes.lbs_cred, gfp); +- if (cred->security == NULL) ++ *dest = kzalloc(size, gfp); ++ if (*dest == NULL) + return -ENOMEM; + return 0; + } + ++/** ++ * lsm_cred_alloc - allocate a composite cred blob ++ * @cred: the cred that needs a blob ++ * @gfp: allocation type ++ * ++ * Allocate the cred blob for all the modules ++ * ++ * Returns 0, or -ENOMEM if memory can't be allocated. 
++ */ ++static int lsm_cred_alloc(struct cred *cred, gfp_t gfp) ++{ ++ return lsm_blob_alloc(&cred->security, blob_sizes.lbs_cred, gfp); ++} ++ + /** + * lsm_early_cred - during initialization allocate a composite cred blob + * @cred: the cred that needs a blob +@@ -732,15 +747,7 @@ static int lsm_inode_alloc(struct inode *inode) + */ + static int lsm_task_alloc(struct task_struct *task) + { +- if (blob_sizes.lbs_task == 0) { +- task->security = NULL; +- return 0; +- } +- +- task->security = kzalloc(blob_sizes.lbs_task, GFP_KERNEL); +- if (task->security == NULL) +- return -ENOMEM; +- return 0; ++ return lsm_blob_alloc(&task->security, blob_sizes.lbs_task, GFP_KERNEL); + } + + /** +@@ -753,15 +760,7 @@ static int lsm_task_alloc(struct task_struct *task) + */ + static int lsm_ipc_alloc(struct kern_ipc_perm *kip) + { +- if (blob_sizes.lbs_ipc == 0) { +- kip->security = NULL; +- return 0; +- } +- +- kip->security = kzalloc(blob_sizes.lbs_ipc, GFP_KERNEL); +- if (kip->security == NULL) +- return -ENOMEM; +- return 0; ++ return lsm_blob_alloc(&kip->security, blob_sizes.lbs_ipc, GFP_KERNEL); + } + + /** +@@ -774,15 +773,8 @@ static int lsm_ipc_alloc(struct kern_ipc_perm *kip) + */ + static int lsm_msg_msg_alloc(struct msg_msg *mp) + { +- if (blob_sizes.lbs_msg_msg == 0) { +- mp->security = NULL; +- return 0; +- } +- +- mp->security = kzalloc(blob_sizes.lbs_msg_msg, GFP_KERNEL); +- if (mp->security == NULL) +- return -ENOMEM; +- return 0; ++ return lsm_blob_alloc(&mp->security, blob_sizes.lbs_msg_msg, ++ GFP_KERNEL); + } + + /** +@@ -809,15 +801,8 @@ static void __init lsm_early_task(struct task_struct *task) + */ + static int lsm_superblock_alloc(struct super_block *sb) + { +- if (blob_sizes.lbs_superblock == 0) { +- sb->s_security = NULL; +- return 0; +- } +- +- sb->s_security = kzalloc(blob_sizes.lbs_superblock, GFP_KERNEL); +- if (sb->s_security == NULL) +- return -ENOMEM; +- return 0; ++ return lsm_blob_alloc(&sb->s_security, blob_sizes.lbs_superblock, ++ 
GFP_KERNEL); + } + + /* +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1337-ovl-fix-nested-backing-file-paths.patch b/SOURCES/1337-ovl-fix-nested-backing-file-paths.patch new file mode 100644 index 000000000..55ac28946 --- /dev/null +++ b/SOURCES/1337-ovl-fix-nested-backing-file-paths.patch @@ -0,0 +1,74 @@ +From c884ff1e458df0e5d801f19b4e847a4673d7471b Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Wed, 3 Jun 2026 11:48:02 +0200 +Subject: [PATCH] ovl: Fix nested backing file paths +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +JIRA: https://issues.redhat.com/browse/RHEL-179443 +CVE: CVE-2026-46054 + +commit 924577e4f6ca473de1528953a0e13505fae61d7b +Author: André Almeida +Date: Tue Apr 29 15:38:50 2025 -0300 + + ovl: Fix nested backing file paths + + When the lowerdir of an overlayfs is a merged directory of another + overlayfs, ovl_open_realfile() will fail to open the real file and point + to a lower dentry copy, without the proper parent path. After this, + d_path() will then display the path incorrectly as if the file is placed + in the root directory. + + This bug can be triggered with the following setup: + + mkdir -p ovl-A/lower ovl-A/upper ovl-A/merge ovl-A/work + mkdir -p ovl-B/upper ovl-B/merge ovl-B/work + + cp /bin/cat ovl-A/lower/ + + mount -t overlay overlay -o \ + lowerdir=ovl-A/lower,upperdir=ovl-A/upper,workdir=ovl-A/work \ + ovl-A/merge + + mount -t overlay overlay -o \ + lowerdir=ovl-A/merge,upperdir=ovl-B/upper,workdir=ovl-B/work \ + ovl-B/merge + + ovl-A/merge/cat /proc/self/maps | grep --color cat + ovl-B/merge/cat /proc/self/maps | grep --color cat + + The first cat will correctly show `/ovl-A/merge/cat`, while the second + one shows just `/cat`. + + To fix that, uses file_user_path() inside of backing_file_open() to get + the correct file path for the dentry. 
+ + Co-developed-by: John Schoenick + Signed-off-by: John Schoenick + Signed-off-by: André Almeida + Fixes: def3ae83da02 ("fs: store real path instead of fake path in backing file f_path") + Cc: # v6.7 + Signed-off-by: Miklos Szeredi + +Signed-off-by: Ondrej Mosnacek + +diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c +index d85385f37ba6..3bf52eace698 100644 +--- a/fs/overlayfs/file.c ++++ b/fs/overlayfs/file.c +@@ -51,8 +51,8 @@ static struct file *ovl_open_realfile(const struct file *file, + if (!inode_owner_or_capable(real_idmap, realinode)) + flags &= ~O_NOATIME; + +- realfile = backing_file_open(&file->f_path, flags, realpath, +- current_cred()); ++ realfile = backing_file_open(file_user_path((struct file *) file), ++ flags, realpath, current_cred()); + } + revert_creds(old_cred); + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1338-fs-constify-file-ptr-in-backing-file-accessor-helpers.patch b/SOURCES/1338-fs-constify-file-ptr-in-backing-file-accessor-helpers.patch new file mode 100644 index 000000000..13c2e42b9 --- /dev/null +++ b/SOURCES/1338-fs-constify-file-ptr-in-backing-file-accessor-helpers.patch @@ -0,0 +1,105 @@ +From 3869325e0bf98aba624155c8355abe6e5db6e674 Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Wed, 3 Jun 2026 11:18:45 +0200 +Subject: [PATCH] fs: constify file ptr in backing_file accessor helpers + +JIRA: https://issues.redhat.com/browse/RHEL-179443 +CVE: CVE-2026-46054 +Conflicts: + - context fuzz and dropped changes to functions not present downstream + +commit 4e301d858af17ae2ce56886296e5458c5a08219a +Author: Amir Goldstein +Date: Sat Jun 7 13:53:03 2025 +0200 + + fs: constify file ptr in backing_file accessor helpers + + Add internal helper backing_file_set_user_path() for the only + two cases that need to modify backing_file fields. 
+ + Signed-off-by: Amir Goldstein + Link: https://lore.kernel.org/20250607115304.2521155-2-amir73il@gmail.com + Signed-off-by: Christian Brauner + +Signed-off-by: Ondrej Mosnacek + +diff --git a/fs/backing-file.c b/fs/backing-file.c +index 6ea14b6214c1..840b45366557 100644 +--- a/fs/backing-file.c ++++ b/fs/backing-file.c +@@ -41,7 +41,7 @@ struct file *backing_file_open(const struct path *user_path, int flags, + return f; + + path_get(user_path); +- *backing_file_user_path(f) = *user_path; ++ backing_file_set_user_path(f, user_path); + error = vfs_open(real_path, f); + if (error) { + fput(f); +diff --git a/fs/file_table.c b/fs/file_table.c +index e5c7b9705109..fa8f4d34efa5 100644 +--- a/fs/file_table.c ++++ b/fs/file_table.c +@@ -50,17 +50,20 @@ struct backing_file { + struct path user_path; + }; + +-static inline struct backing_file *backing_file(struct file *f) +-{ +- return container_of(f, struct backing_file, file); +-} ++#define backing_file(f) container_of(f, struct backing_file, file) + +-struct path *backing_file_user_path(struct file *f) ++struct path *backing_file_user_path(const struct file *f) + { + return &backing_file(f)->user_path; + } + EXPORT_SYMBOL_GPL(backing_file_user_path); + ++void backing_file_set_user_path(struct file *f, const struct path *path) ++{ ++ backing_file(f)->user_path = *path; ++} ++EXPORT_SYMBOL_GPL(backing_file_set_user_path); ++ + static void file_free_rcu(struct rcu_head *head) + { + struct file *f = container_of(head, struct file, f_u.fu_rcuhead); +diff --git a/fs/internal.h b/fs/internal.h +index bd0934d0521b..78fcdad80e53 100644 +--- a/fs/internal.h ++++ b/fs/internal.h +@@ -96,6 +96,7 @@ extern void chroot_fs_refs(const struct path *, const struct path *); + struct file *alloc_empty_file(int flags, const struct cred *cred); + struct file *alloc_empty_file_noaccount(int flags, const struct cred *cred); + struct file *alloc_empty_backing_file(int flags, const struct cred *cred); ++void backing_file_set_user_path(struct 
file *f, const struct path *path); + + static inline void file_put_write_access(struct file *file) + { +diff --git a/include/linux/fs.h b/include/linux/fs.h +index 5f3ca25c77e5..7ed9232f579d 100644 +--- a/include/linux/fs.h ++++ b/include/linux/fs.h +@@ -2586,7 +2586,7 @@ struct file *dentry_open(const struct path *path, int flags, + const struct cred *creds); + struct file *dentry_create(const struct path *path, int flags, umode_t mode, + const struct cred *cred); +-struct path *backing_file_user_path(struct file *f); ++struct path *backing_file_user_path(const struct file *f); + + /* + * file_user_path - get the path to display for memory mapped file +@@ -2597,7 +2597,7 @@ struct path *backing_file_user_path(struct file *f); + * /proc//maps), this helper should be used to get the path to display + * to the user, which is the path of the fd that user has requested to map. + */ +-static inline const struct path *file_user_path(struct file *f) ++static inline const struct path *file_user_path(const struct file *f) + { + if (unlikely(f->f_mode & FMODE_BACKING)) + return backing_file_user_path(f); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1339-ovl-remove-unneeded-non-const-conversion.patch b/SOURCES/1339-ovl-remove-unneeded-non-const-conversion.patch new file mode 100644 index 000000000..be914b9be --- /dev/null +++ b/SOURCES/1339-ovl-remove-unneeded-non-const-conversion.patch @@ -0,0 +1,38 @@ +From c3f8db29db9e7b9bb68b107e932315f147046ac5 Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Wed, 3 Jun 2026 11:48:18 +0200 +Subject: [PATCH] ovl: remove unneeded non-const conversion + +JIRA: https://issues.redhat.com/browse/RHEL-179443 +CVE: CVE-2026-46054 + +commit 3ec2529eca6f175f4e3e87c4534010e044839b38 +Author: Amir Goldstein +Date: Sat Jun 7 13:53:04 2025 +0200 + + ovl: remove unneeded non-const conversion + + file_user_path() now takes a const file ptr. 
+ + Signed-off-by: Amir Goldstein + Link: https://lore.kernel.org/20250607115304.2521155-3-amir73il@gmail.com + Signed-off-by: Christian Brauner + +Signed-off-by: Ondrej Mosnacek + +diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c +index 3bf52eace698..c8e45f503db3 100644 +--- a/fs/overlayfs/file.c ++++ b/fs/overlayfs/file.c +@@ -51,7 +51,7 @@ static struct file *ovl_open_realfile(const struct file *file, + if (!inode_owner_or_capable(real_idmap, realinode)) + flags &= ~O_NOATIME; + +- realfile = backing_file_open(file_user_path((struct file *) file), ++ realfile = backing_file_open(file_user_path(file), + flags, realpath, current_cred()); + } + revert_creds(old_cred); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1340-ovl-remove-redundant-iocb-dio-caller-comp-clearing.patch b/SOURCES/1340-ovl-remove-redundant-iocb-dio-caller-comp-clearing.patch new file mode 100644 index 000000000..c1d20f1b9 --- /dev/null +++ b/SOURCES/1340-ovl-remove-redundant-iocb-dio-caller-comp-clearing.patch @@ -0,0 +1,48 @@ +From 99a9c81094c622efebec7695e551baffdac3f89b Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Wed, 3 Jun 2026 20:29:45 +0200 +Subject: [PATCH] ovl: remove redundant IOCB_DIO_CALLER_COMP clearing + +JIRA: https://issues.redhat.com/browse/RHEL-179443 +Conflicts: + - just context fuzz + +commit 7933a585d70ee496fa341b50b8b0a95b131867ff +Author: Seong-Gwang Heo +Date: Thu Oct 9 13:41:48 2025 +0800 + + ovl: remove redundant IOCB_DIO_CALLER_COMP clearing + + The backing_file_write_iter() function, which is called + immediately after this code, already contains identical + logic to clear the IOCB_DIO_CALLER_COMP flag along with + the same explanatory comment. There is no need to duplicate + this operation in the overlayfs code. 
+ + Signed-off-by: Seong-Gwang Heo + Fixes: a6293b3e285c ("fs: factor out backing_file_{read,write}_iter() helpers") + Acked-by: Miklos Szeredi + Reviewed-by: Amir Goldstein + Signed-off-by: Christian Brauner + +Signed-off-by: Ondrej Mosnacek + +diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c +index c8e45f503db3..3d8539909f74 100644 +--- a/fs/overlayfs/file.c ++++ b/fs/overlayfs/file.c +@@ -308,11 +308,6 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter) + if (!ovl_should_sync(OVL_FS(inode->i_sb))) + ifl &= ~(IOCB_DSYNC | IOCB_SYNC); + +- /* +- * Overlayfs doesn't support deferred completions, don't copy +- * this property in case it is set by the issuer. +- */ +- ifl &= ~IOCB_DIO_CALLER_COMP; + ret = backing_file_write_iter(real.file, iter, iocb, ifl, &ctx); + fdput(real); + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1341-perf-core-fix-mmap-event-path-names-with-backing-files.patch b/SOURCES/1341-perf-core-fix-mmap-event-path-names-with-backing-files.patch new file mode 100644 index 000000000..3e12eb0e3 --- /dev/null +++ b/SOURCES/1341-perf-core-fix-mmap-event-path-names-with-backing-files.patch @@ -0,0 +1,89 @@ +From 627a3d264cb85311131aef67a0ff2397999d5394 Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Wed, 3 Jun 2026 20:29:57 +0200 +Subject: [PATCH] perf/core: Fix MMAP event path names with backing files + +JIRA: https://issues.redhat.com/browse/RHEL-179443 + +commit 8818f507a9391019a3ec7c57b1a32e4b386e48a5 +Author: Adrian Hunter +Date: Mon Oct 13 10:22:43 2025 +0300 + + perf/core: Fix MMAP event path names with backing files + + Some file systems like FUSE-based ones or overlayfs may record the backing + file in struct vm_area_struct vm_file, instead of the user file that the + user mmapped. + + Since commit def3ae83da02f ("fs: store real path instead of fake path in + backing file f_path"), file_path() no longer returns the user file path + when applied to a backing file. 
There is an existing helper + file_user_path() for that situation. + + Use file_user_path() instead of file_path() to get the path for MMAP + and MMAP2 events. + + Example: + + Setup: + + # cd /root + # mkdir test ; cd test ; mkdir lower upper work merged + # cp `which cat` lower + # mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merged + # perf record -e intel_pt//u -- /root/test/merged/cat /proc/self/maps + ... + 55b0ba399000-55b0ba434000 r-xp 00018000 00:1a 3419 /root/test/merged/cat + ... + [ perf record: Woken up 1 times to write data ] + [ perf record: Captured and wrote 0.060 MB perf.data ] + # + + Before: + + File name is wrong (/cat), so decoding fails: + + # perf script --no-itrace --show-mmap-events + cat 367 [016] 100.491492: PERF_RECORD_MMAP2 367/367: [0x55b0ba399000(0x9b000) @ 0x18000 00:02 3419 489959280]: r-xp /cat + ... + # perf script --itrace=e | wc -l + Warning: + 19 instruction trace errors + 19 + # + + After: + + File name is correct (/root/test/merged/cat), so decoding is ok: + + # perf script --no-itrace --show-mmap-events + cat 364 [016] 72.153006: PERF_RECORD_MMAP2 364/364: [0x55ce4003d000(0x9b000) @ 0x18000 00:02 3419 3132534314]: r-xp /root/test/merged/cat + # perf script --itrace=e + # perf script --itrace=e | wc -l + 0 + # + + Fixes: def3ae83da02f ("fs: store real path instead of fake path in backing file f_path") + Signed-off-by: Adrian Hunter + Signed-off-by: Peter Zijlstra (Intel) + Acked-by: Amir Goldstein + Cc: stable@vger.kernel.org + +Signed-off-by: Ondrej Mosnacek + +diff --git a/kernel/events/core.c b/kernel/events/core.c +index 0d3bd850fee7..5065087dd236 100644 +--- a/kernel/events/core.c ++++ b/kernel/events/core.c +@@ -8953,7 +8953,7 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event) + * need to add enough zero bytes after the string to handle + * the 64bit alignment we do later. 
+ */ +- name = file_path(file, buf, PATH_MAX - sizeof(u64)); ++ name = d_path(file_user_path(file), buf, PATH_MAX - sizeof(u64)); + if (IS_ERR(name)) { + name = "//toolong"; + goto cpy_name; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1342-fs-prepare-for-adding-lsm-blob-to-backing-file.patch b/SOURCES/1342-fs-prepare-for-adding-lsm-blob-to-backing-file.patch new file mode 100644 index 000000000..76fc722fa --- /dev/null +++ b/SOURCES/1342-fs-prepare-for-adding-lsm-blob-to-backing-file.patch @@ -0,0 +1,85 @@ +From 644c8e296eb8628ed6ccaff5609bce2c4b591c8a Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Wed, 3 Jun 2026 11:00:17 +0200 +Subject: [PATCH] fs: prepare for adding LSM blob to backing_file + +JIRA: https://issues.redhat.com/browse/RHEL-179443 +CVE: CVE-2026-46054 +Conflicts: + - fs/file_table.c: adjusted the body and call site of + backing_file_free() to downstream state (some intermediate commits + not backported) + +commit 880bd496ec72a6dcb00cb70c430ef752ba242ae7 +Author: Amir Goldstein +Date: Mon Mar 30 10:27:51 2026 +0200 + + fs: prepare for adding LSM blob to backing_file + + In preparation to adding LSM blob to backing_file struct, factor out + helpers init_backing_file() and backing_file_free().
+ + Cc: stable@vger.kernel.org + Cc: linux-fsdevel@vger.kernel.org + Cc: linux-unionfs@vger.kernel.org + Cc: linux-erofs@lists.ozlabs.org + Signed-off-by: Amir Goldstein + Reviewed-by: Serge Hallyn + [PM: use the term "LSM blob", fix comment style to match file] + Signed-off-by: Paul Moore + +Signed-off-by: Ondrej Mosnacek + +diff --git a/fs/file_table.c b/fs/file_table.c +index fa8f4d34efa5..34e3863c95b0 100644 +--- a/fs/file_table.c ++++ b/fs/file_table.c +@@ -75,11 +75,16 @@ static void file_free_rcu(struct rcu_head *head) + kmem_cache_free(filp_cachep, f); + } + ++static inline void backing_file_free(struct backing_file *ff) ++{ ++ path_put(&ff->user_path); ++} ++ + static inline void file_free(struct file *f) + { + security_file_free(f); + if (unlikely(f->f_mode & FMODE_BACKING)) +- path_put(backing_file_user_path(f)); ++ backing_file_free(backing_file(f)); + if (likely(!(f->f_mode & FMODE_NOACCOUNT))) + percpu_counter_dec(&nr_files); + call_rcu(&f->f_u.fu_rcuhead, file_free_rcu); +@@ -255,6 +260,12 @@ struct file *alloc_empty_file_noaccount(int flags, const struct cred *cred) + return f; + } + ++static int init_backing_file(struct backing_file *ff) ++{ ++ memset(&ff->user_path, 0, sizeof(ff->user_path)); ++ return 0; ++} ++ + /* + * Variant of alloc_empty_file() that allocates a backing_file container + * and doesn't check and modify nr_files. +@@ -277,7 +288,14 @@ struct file *alloc_empty_backing_file(int flags, const struct cred *cred) + return ERR_PTR(error); + } + ++ /* The f_mode flags must be set before fput(). 
*/ + ff->file.f_mode |= FMODE_BACKING | FMODE_NOACCOUNT; ++ error = init_backing_file(ff); ++ if (unlikely(error)) { ++ fput(&ff->file); ++ return ERR_PTR(error); ++ } ++ + return &ff->file; + } + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1343-lsm-add-backing-file-lsm-hooks.patch b/SOURCES/1343-lsm-add-backing-file-lsm-hooks.patch new file mode 100644 index 000000000..d0f172424 --- /dev/null +++ b/SOURCES/1343-lsm-add-backing-file-lsm-hooks.patch @@ -0,0 +1,535 @@ +From 807cf6dc47e9871fa1f77369e69fe8666cb0a1d4 Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Wed, 3 Jun 2026 11:54:46 +0200 +Subject: [PATCH] lsm: add backing_file LSM hooks + +JIRA: https://issues.redhat.com/browse/RHEL-179443 +CVE: CVE-2026-46054 +Conflicts: + - fs/backing-file.c: + - adjust backing_file_mmap() to downstream state (missing scoped + guards, ctx->user_file instead of missing user_file variable) + - backing_tmpfile_open() not present downstream + - fs/erofs/ishare.c: hunk dropped, file not present downstream + - fs/file_table.c: context fuzz + put security_backing_file_free() in the + right place + - fs/fuse/passthrough.c: hunk dropped, file not present downstream + - fs/overlayfs/dir.c: hunk dropped, ovl_create_tmpfile() not present downstream + - fs/overlayfs/file.c: adjust to different indentation + - include/linux/backing-file.h: backing_tmpfile_open() not present downstream + - include/linux/lsm_hooks.h: adjust to downstream's definition of struct lsm_blob_sizes + - security/lsm.h: hunk dropped, file not present downstream + - security/lsm_init.c: hunk dropped, file not present downstream + - security/security.c: misc conflicts, port changes to stuff that was + already in lsm.h/lsm_init.c upstream + +commit 6af36aeb147a06dea47c49859cd6ca5659aeb987 +Author: Paul Moore +Date: Fri Dec 19 13:18:22 2025 -0500 + + lsm: add backing_file LSM hooks + + Stacked filesystems such as overlayfs do not currently provide the + necessary mechanisms for LSMs to properly enforce access 
controls on the + mmap() and mprotect() operations. In order to resolve this gap, a LSM + security blob is being added to the backing_file struct and the following + new LSM hooks are being created: + + security_backing_file_alloc() + security_backing_file_free() + security_mmap_backing_file() + + The first two hooks are to manage the lifecycle of the LSM security blob + in the backing_file struct, while the third provides a new mmap() access + control point for the underlying backing file. It is also expected that + LSMs will likely want to update their security_file_mprotect() callback + to address issues with their mprotect() controls, but that does not + require a change to the security_file_mprotect() LSM hook. + + There are three other small changes to support these new LSM hooks: + * Pass the user file associated with a backing file down to + alloc_empty_backing_file() so it can be included in the + security_backing_file_alloc() hook. + * Add getter and setter functions for the backing_file struct LSM blob + as the backing_file struct remains private to fs/file_table.c. + * Constify the file struct field in the LSM common_audit_data struct to + better support LSMs that need to pass a const file struct pointer into + the common LSM audit code. + + Thanks to Arnd Bergmann for identifying the missing EXPORT_SYMBOL_GPL() + and supplying a fixup.
+ + Cc: stable@vger.kernel.org + Cc: linux-fsdevel@vger.kernel.org + Cc: linux-unionfs@vger.kernel.org + Cc: linux-erofs@lists.ozlabs.org + Reviewed-by: Amir Goldstein + Reviewed-by: Serge Hallyn + Reviewed-by: Christian Brauner + Signed-off-by: Paul Moore + +Signed-off-by: Ondrej Mosnacek + +diff --git a/fs/backing-file.c b/fs/backing-file.c +index 840b45366557..e6f4fe27b58b 100644 +--- a/fs/backing-file.c ++++ b/fs/backing-file.c +@@ -12,6 +12,7 @@ + #include + #include + #include ++#include + + #include "internal.h" + +@@ -29,14 +30,15 @@ + * returned file into a container structure that also stores the stacked + * file's path, which can be retrieved using backing_file_user_path(). + */ +-struct file *backing_file_open(const struct path *user_path, int flags, ++struct file *backing_file_open(const struct file *user_file, int flags, + const struct path *real_path, + const struct cred *cred) + { ++ const struct path *user_path = &user_file->f_path; + struct file *f; + int error; + +- f = alloc_empty_backing_file(flags, cred); ++ f = alloc_empty_backing_file(flags, cred, user_file); + if (IS_ERR(f)) + return f; + +@@ -316,6 +318,11 @@ int backing_file_mmap(struct file *file, struct vm_area_struct *vma, + vma_set_file(vma, file); + + old_cred = override_creds(ctx->cred); ++ ret = security_mmap_backing_file(vma, file, ctx->user_file); ++ if (ret) { ++ revert_creds(old_cred); ++ return ret; ++ } + ret = call_mmap(vma->vm_file, vma); + revert_creds(old_cred); + +diff --git a/fs/file_table.c b/fs/file_table.c +index 34e3863c95b0..fc04eb48d550 100644 +--- a/fs/file_table.c ++++ b/fs/file_table.c +@@ -48,6 +48,9 @@ static struct percpu_counter nr_files __cacheline_aligned_in_smp; + struct backing_file { + struct file file; + struct path user_path; ++#ifdef CONFIG_SECURITY ++ void *security; ++#endif + }; + + #define backing_file(f) container_of(f, struct backing_file, file) +@@ -64,6 +67,18 @@ void backing_file_set_user_path(struct file *f, const struct path *path) + } + 
EXPORT_SYMBOL_GPL(backing_file_set_user_path); + ++#ifdef CONFIG_SECURITY ++void *backing_file_security(const struct file *f) ++{ ++ return backing_file(f)->security; ++} ++ ++void backing_file_set_security(struct file *f, void *security) ++{ ++ backing_file(f)->security = security; ++} ++#endif /* CONFIG_SECURITY */ ++ + static void file_free_rcu(struct rcu_head *head) + { + struct file *f = container_of(head, struct file, f_u.fu_rcuhead); +@@ -77,6 +92,7 @@ static void file_free_rcu(struct rcu_head *head) + + static inline void backing_file_free(struct backing_file *ff) + { ++ security_backing_file_free(&ff->file); + path_put(&ff->user_path); + } + +@@ -260,10 +276,12 @@ struct file *alloc_empty_file_noaccount(int flags, const struct cred *cred) + return f; + } + +-static int init_backing_file(struct backing_file *ff) ++static int init_backing_file(struct backing_file *ff, ++ const struct file *user_file) + { + memset(&ff->user_path, 0, sizeof(ff->user_path)); +- return 0; ++ backing_file_set_security(&ff->file, NULL); ++ return security_backing_file_alloc(&ff->file, user_file); + } + + /* +@@ -273,7 +291,8 @@ static int init_backing_file(struct backing_file *ff) + * This is only for kernel internal use, and the allocate file must not be + * installed into file tables or such. + */ +-struct file *alloc_empty_backing_file(int flags, const struct cred *cred) ++struct file *alloc_empty_backing_file(int flags, const struct cred *cred, ++ const struct file *user_file) + { + struct backing_file *ff; + int error; +@@ -290,7 +309,7 @@ struct file *alloc_empty_backing_file(int flags, const struct cred *cred) + + /* The f_mode flags must be set before fput(). 
*/ + ff->file.f_mode |= FMODE_BACKING | FMODE_NOACCOUNT; +- error = init_backing_file(ff); ++ error = init_backing_file(ff, user_file); + if (unlikely(error)) { + fput(&ff->file); + return ERR_PTR(error); +diff --git a/fs/internal.h b/fs/internal.h +index 78fcdad80e53..f48f5fa349c9 100644 +--- a/fs/internal.h ++++ b/fs/internal.h +@@ -95,7 +95,8 @@ extern void chroot_fs_refs(const struct path *, const struct path *); + */ + struct file *alloc_empty_file(int flags, const struct cred *cred); + struct file *alloc_empty_file_noaccount(int flags, const struct cred *cred); +-struct file *alloc_empty_backing_file(int flags, const struct cred *cred); ++struct file *alloc_empty_backing_file(int flags, const struct cred *cred, ++ const struct file *user_file); + void backing_file_set_user_path(struct file *f, const struct path *path); + + static inline void file_put_write_access(struct file *file) +diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c +index 3d8539909f74..a5bc5dfc930f 100644 +--- a/fs/overlayfs/file.c ++++ b/fs/overlayfs/file.c +@@ -51,7 +51,7 @@ static struct file *ovl_open_realfile(const struct file *file, + if (!inode_owner_or_capable(real_idmap, realinode)) + flags &= ~O_NOATIME; + +- realfile = backing_file_open(file_user_path(file), ++ realfile = backing_file_open(file, + flags, realpath, current_cred()); + } + revert_creds(old_cred); +diff --git a/include/linux/backing-file.h b/include/linux/backing-file.h +index 3f1fe1774f1b..103b6992b80a 100644 +--- a/include/linux/backing-file.h ++++ b/include/linux/backing-file.h +@@ -19,7 +19,7 @@ struct backing_file_ctx { + void (*end_write)(struct file *); + }; + +-struct file *backing_file_open(const struct path *user_path, int flags, ++struct file *backing_file_open(const struct file *user_file, int flags, + const struct path *real_path, + const struct cred *cred); + ssize_t backing_file_read_iter(struct file *file, struct iov_iter *iter, +diff --git a/include/linux/fs.h b/include/linux/fs.h +index 
7ed9232f579d..a94f20ba2bf6 100644 +--- a/include/linux/fs.h ++++ b/include/linux/fs.h +@@ -2588,6 +2588,19 @@ struct file *dentry_create(const struct path *path, int flags, umode_t mode, + const struct cred *cred); + struct path *backing_file_user_path(const struct file *f); + ++#ifdef CONFIG_SECURITY ++void *backing_file_security(const struct file *f); ++void backing_file_set_security(struct file *f, void *security); ++#else ++static inline void *backing_file_security(const struct file *f) ++{ ++ return NULL; ++} ++static inline void backing_file_set_security(struct file *f, void *security) ++{ ++} ++#endif /* CONFIG_SECURITY */ ++ + /* + * file_user_path - get the path to display for memory mapped file + * +diff --git a/include/linux/lsm_audit.h b/include/linux/lsm_audit.h +index 97a8b21eb033..c0a2839253fa 100644 +--- a/include/linux/lsm_audit.h ++++ b/include/linux/lsm_audit.h +@@ -93,7 +93,7 @@ struct common_audit_data { + #endif + char *kmod_name; + struct lsm_ioctlop_audit *op; +- struct file *file; ++ const struct file *file; + struct lsm_ibpkey_audit *ibpkey; + struct lsm_ibendport_audit *ibendport; + int reason; +diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h +index b6fbb446bab7..304da2a90ba7 100644 +--- a/include/linux/lsm_hook_defs.h ++++ b/include/linux/lsm_hook_defs.h +@@ -168,6 +168,9 @@ LSM_HOOK(int, 0, kernfs_init_security, struct kernfs_node *kn_dir, + LSM_HOOK(int, 0, file_permission, struct file *file, int mask) + LSM_HOOK(int, 0, file_alloc_security, struct file *file) + LSM_HOOK(void, LSM_RET_VOID, file_free_security, struct file *file) ++LSM_HOOK(int, 0, backing_file_alloc, struct file *backing_file, ++ const struct file *user_file) ++LSM_HOOK(void, LSM_RET_VOID, backing_file_free, struct file *backing_file) + LSM_HOOK(int, 0, file_ioctl, struct file *file, unsigned int cmd, + unsigned long arg) + LSM_HOOK(int, 0, file_ioctl_compat, struct file *file, unsigned int cmd, +@@ -175,6 +178,8 @@ LSM_HOOK(int, 0, 
file_ioctl_compat, struct file *file, unsigned int cmd, + LSM_HOOK(int, 0, mmap_addr, unsigned long addr) + LSM_HOOK(int, 0, mmap_file, struct file *file, unsigned long reqprot, + unsigned long prot, unsigned long flags) ++LSM_HOOK(int, 0, mmap_backing_file, struct vm_area_struct *vma, ++ struct file *backing_file, struct file *user_file) + LSM_HOOK(int, 0, file_mprotect, struct vm_area_struct *vma, + unsigned long reqprot, unsigned long prot) + LSM_HOOK(int, 0, file_lock, struct file *file, unsigned int cmd) +diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h +index 7577ecfc79e4..a16571929f7b 100644 +--- a/include/linux/lsm_hooks.h ++++ b/include/linux/lsm_hooks.h +@@ -1637,6 +1637,7 @@ struct security_hook_list { + struct lsm_blob_sizes { + int lbs_cred; + int lbs_file; ++ int lbs_backing_file; + int lbs_inode; + int lbs_superblock; + int lbs_ipc; +diff --git a/include/linux/security.h b/include/linux/security.h +index d2888c127859..db02db9f623a 100644 +--- a/include/linux/security.h ++++ b/include/linux/security.h +@@ -387,11 +387,17 @@ int security_kernfs_init_security(struct kernfs_node *kn_dir, + int security_file_permission(struct file *file, int mask); + int security_file_alloc(struct file *file); + void security_file_free(struct file *file); ++int security_backing_file_alloc(struct file *backing_file, ++ const struct file *user_file); ++void security_backing_file_free(struct file *backing_file); + int security_file_ioctl(struct file *file, unsigned int cmd, unsigned long arg); + int security_file_ioctl_compat(struct file *file, unsigned int cmd, + unsigned long arg); + int security_mmap_file(struct file *file, unsigned long prot, + unsigned long flags); ++int security_mmap_backing_file(struct vm_area_struct *vma, ++ struct file *backing_file, ++ struct file *user_file); + int security_mmap_addr(unsigned long addr); + int security_file_mprotect(struct vm_area_struct *vma, unsigned long reqprot, + unsigned long prot); +@@ -976,6 +982,15 @@ 
static inline int security_file_alloc(struct file *file) + static inline void security_file_free(struct file *file) + { } + ++static inline int security_backing_file_alloc(struct file *backing_file, ++ const struct file *user_file) ++{ ++ return 0; ++} ++ ++static inline void security_backing_file_free(struct file *backing_file) ++{ } ++ + static inline int security_file_ioctl(struct file *file, unsigned int cmd, + unsigned long arg) + { +@@ -995,6 +1010,13 @@ static inline int security_mmap_file(struct file *file, unsigned long prot, + return 0; + } + ++static inline int security_mmap_backing_file(struct vm_area_struct *vma, ++ struct file *backing_file, ++ struct file *user_file) ++{ ++ return 0; ++} ++ + static inline int security_mmap_addr(unsigned long addr) + { + return cap_mmap_addr(addr); +diff --git a/security/security.c b/security/security.c +index 1e63f23a504a..27a309ab0b97 100644 +--- a/security/security.c ++++ b/security/security.c +@@ -89,6 +89,7 @@ const char *const lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1] = { + static BLOCKING_NOTIFIER_HEAD(blocking_lsm_notifier_chain); + + static struct kmem_cache *lsm_file_cache; ++static struct kmem_cache *lsm_backing_file_cache; + static struct kmem_cache *lsm_inode_cache; + + char *lsm_names; +@@ -260,6 +261,8 @@ static void __init lsm_set_blob_sizes(struct lsm_blob_sizes *needed) + + lsm_set_blob_size(&needed->lbs_cred, &blob_sizes.lbs_cred); + lsm_set_blob_size(&needed->lbs_file, &blob_sizes.lbs_file); ++ lsm_set_blob_size(&needed->lbs_backing_file, ++ &blob_sizes.lbs_backing_file); + /* + * The inode blob gets an rcu_head in addition to + * what the modules might need. 
+@@ -447,14 +450,15 @@ static void __init ordered_lsm_init(void) + + report_lsm_order(); + +- init_debug("cred blob size = %d\n", blob_sizes.lbs_cred); +- init_debug("file blob size = %d\n", blob_sizes.lbs_file); +- init_debug("inode blob size = %d\n", blob_sizes.lbs_inode); +- init_debug("ipc blob size = %d\n", blob_sizes.lbs_ipc); +- init_debug("msg_msg blob size = %d\n", blob_sizes.lbs_msg_msg); +- init_debug("superblock blob size = %d\n", blob_sizes.lbs_superblock); +- init_debug("task blob size = %d\n", blob_sizes.lbs_task); +- init_debug("xattr slots = %d\n", blob_sizes.lbs_xattr_count); ++ init_debug("cred blob size = %d\n", blob_sizes.lbs_cred); ++ init_debug("file blob size = %d\n", blob_sizes.lbs_file); ++ init_debug("backing_file blob size = %d\n", blob_sizes.lbs_backing_file); ++ init_debug("inode blob size = %d\n", blob_sizes.lbs_inode); ++ init_debug("ipc blob size = %d\n", blob_sizes.lbs_ipc); ++ init_debug("msg_msg blob size = %d\n", blob_sizes.lbs_msg_msg); ++ init_debug("superblock blob size = %d\n", blob_sizes.lbs_superblock); ++ init_debug("task blob size = %d\n", blob_sizes.lbs_task); ++ init_debug("xattr slots = %d\n", blob_sizes.lbs_xattr_count); + + /* + * Create any kmem_caches needed for blobs +@@ -463,6 +467,11 @@ static void __init ordered_lsm_init(void) + lsm_file_cache = kmem_cache_create("lsm_file_cache", + blob_sizes.lbs_file, 0, + SLAB_PANIC, NULL); ++ if (blob_sizes.lbs_backing_file) ++ lsm_backing_file_cache = kmem_cache_create( ++ "lsm_backing_file_cache", ++ blob_sizes.lbs_backing_file, ++ 0, SLAB_PANIC, NULL); + if (blob_sizes.lbs_inode) + lsm_inode_cache = kmem_cache_create("lsm_inode_cache", + blob_sizes.lbs_inode, 0, +@@ -644,6 +653,30 @@ int unregister_blocking_lsm_notifier(struct notifier_block *nb) + } + EXPORT_SYMBOL(unregister_blocking_lsm_notifier); + ++/** ++ * lsm_backing_file_alloc - allocate a composite backing file blob ++ * @backing_file: the backing file ++ * ++ * Allocate the backing file blob for all the 
modules. ++ * ++ * Returns 0, or -ENOMEM if memory can't be allocated. ++ */ ++static int lsm_backing_file_alloc(struct file *backing_file) ++{ ++ void *blob; ++ ++ if (!lsm_backing_file_cache) { ++ backing_file_set_security(backing_file, NULL); ++ return 0; ++ } ++ ++ blob = kmem_cache_zalloc(lsm_backing_file_cache, GFP_KERNEL); ++ backing_file_set_security(backing_file, blob); ++ if (!blob) ++ return -ENOMEM; ++ return 0; ++} ++ + /** + * lsm_blob_alloc - allocate a composite blob + * @dest: the destination for the blob +@@ -1689,6 +1722,57 @@ void security_file_free(struct file *file) + } + } + ++/** ++ * security_backing_file_alloc() - Allocate and setup a backing file blob ++ * @backing_file: the backing file ++ * @user_file: the associated user visible file ++ * ++ * Allocate a backing file LSM blob and perform any necessary initialization of ++ * the LSM blob. There will be some operations where the LSM will not have ++ * access to @user_file after this point, so any important state associated ++ * with @user_file that is important to the LSM should be captured in the ++ * backing file's LSM blob. ++ * ++ * LSMs should avoid taking a reference to @user_file in this hook as it will ++ * result in problems later when the system attempts to drop/put the file ++ * references due to a circular dependency. ++ * ++ * Return: Return 0 if the hook is successful, negative values otherwise. ++ */ ++int security_backing_file_alloc(struct file *backing_file, ++ const struct file *user_file) ++{ ++ int rc; ++ ++ rc = lsm_backing_file_alloc(backing_file); ++ if (rc) ++ return rc; ++ rc = call_int_hook(backing_file_alloc, backing_file, user_file); ++ if (unlikely(rc)) ++ security_backing_file_free(backing_file); ++ ++ return rc; ++} ++ ++/** ++ * security_backing_file_free() - Free a backing file blob ++ * @backing_file: the backing file ++ * ++ * Free any LSM state associated with a backing file's LSM blob, including the ++ * blob itself.
++ */ ++void security_backing_file_free(struct file *backing_file) ++{ ++ void *blob = backing_file_security(backing_file); ++ ++ call_void_hook(backing_file_free, backing_file); ++ ++ if (blob) { ++ backing_file_set_security(backing_file, NULL); ++ kmem_cache_free(lsm_backing_file_cache, blob); ++ } ++} ++ + int security_file_ioctl(struct file *file, unsigned int cmd, unsigned long arg) + { + return call_int_hook(file_ioctl, file, cmd, arg); +@@ -1757,6 +1841,32 @@ int security_mmap_file(struct file *file, unsigned long prot, + return ima_file_mmap(file, prot); + } + ++/** ++ * security_mmap_backing_file - Check if mmap'ing a backing file is allowed ++ * @vma: the vm_area_struct for the mmap'd region ++ * @backing_file: the backing file being mmap'd ++ * @user_file: the user file being mmap'd ++ * ++ * Check permissions for a mmap operation on a stacked filesystem. This hook ++ * is called after the security_mmap_file() and is responsible for authorizing ++ * the mmap on @backing_file. It is important to note that the mmap operation ++ * on @user_file has already been authorized and the @vma->vm_file has been ++ * set to @backing_file. ++ * ++ * Return: Returns 0 if permission is granted. 
++ */ ++int security_mmap_backing_file(struct vm_area_struct *vma, ++ struct file *backing_file, ++ struct file *user_file) ++{ ++ /* recommended by the stackable filesystem devs */ ++ if (WARN_ON_ONCE(!(backing_file->f_mode & FMODE_BACKING))) ++ return -EIO; ++ ++ return call_int_hook(mmap_backing_file, vma, backing_file, user_file); ++} ++EXPORT_SYMBOL_GPL(security_mmap_backing_file); ++ + int security_mmap_addr(unsigned long addr) + { + return call_int_hook(mmap_addr, addr); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1344-selinux-fix-overlayfs-mmap-and-mprotect-access-checks.patch b/SOURCES/1344-selinux-fix-overlayfs-mmap-and-mprotect-access-checks.patch new file mode 100644 index 000000000..3d19053cb --- /dev/null +++ b/SOURCES/1344-selinux-fix-overlayfs-mmap-and-mprotect-access-checks.patch @@ -0,0 +1,444 @@ +From df2d263e4dcf6739be4c4b51d7e1f6c1d3316200 Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Wed, 3 Jun 2026 13:01:38 +0200 +Subject: [PATCH] selinux: fix overlayfs mmap() and mprotect() access checks + +JIRA: https://issues.redhat.com/browse/RHEL-179443 +CVE: CVE-2026-46054 +Conflicts: + - security/selinux/hooks.c: + - context fuzz + - preserve passing &selinux_state to avc_has_perm() + - preserve honoring the checkreqprot setting (in case of mmap + backing file check it is ignored, but that's the best we can do - + at worst some access would be denied on overlayfs in extremely + exotic use cases) + - security/selinux/include/objsec.h: context fuzz + +commit 82544d36b1729153c8aeb179e84750f0c085d3b1 +Author: Paul Moore +Date: Thu Jan 1 17:19:18 2026 -0500 + + selinux: fix overlayfs mmap() and mprotect() access checks + + The existing SELinux security model for overlayfs is to allow access if + the current task is able to access the top level file (the "user" file) + and the mounter's credentials are sufficient to access the lower + level file (the "backing" file). 
Unfortunately, the current code does + not properly enforce these access controls for both mmap() and mprotect() + operations on overlayfs filesystems. + + This patch makes use of the newly created security_mmap_backing_file() + LSM hook to provide the missing backing file enforcement for mmap() + operations, and leverages the backing file API and new LSM blob to + provide the necessary information to properly enforce the mprotect() + access controls. + + Cc: stable@vger.kernel.org + Acked-by: Amir Goldstein + Signed-off-by: Paul Moore + +Signed-off-by: Ondrej Mosnacek + +diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c +index deacc9a63fae..fb5b9cb027d0 100644 +--- a/security/selinux/hooks.c ++++ b/security/selinux/hooks.c +@@ -1706,50 +1706,76 @@ static inline int file_path_has_perm(const struct cred *cred, + static int bpf_fd_pass(const struct file *file, u32 sid); + #endif + +-/* Check whether a task can use an open file descriptor to +- access an inode in a given way. Check access to the +- descriptor itself, and then use dentry_has_perm to +- check a particular permission to the file. +- Access to the descriptor is implicitly granted if it +- has the same SID as the process. If av is zero, then +- access to the file is not checked, e.g. for cases +- where only the descriptor is affected like seek. 
*/ +-static int file_has_perm(const struct cred *cred, +- struct file *file, +- u32 av) ++static int __file_has_perm(const struct cred *cred, const struct file *file, ++ u32 av, bool bf_user_file) ++ + { +- struct file_security_struct *fsec = selinux_file(file); +- struct inode *inode = file_inode(file); + struct common_audit_data ad; +- u32 sid = cred_sid(cred); ++ struct inode *inode; ++ u32 ssid = cred_sid(cred); ++ u32 tsid_fd; + int rc; + +- ad.type = LSM_AUDIT_DATA_FILE; +- ad.u.file = file; ++ if (bf_user_file) { ++ struct backing_file_security_struct *bfsec; ++ const struct path *path; + +- if (sid != fsec->sid) { ++ if (WARN_ON(!(file->f_mode & FMODE_BACKING))) ++ return -EIO; ++ ++ bfsec = selinux_backing_file(file); ++ path = backing_file_user_path(file); ++ tsid_fd = bfsec->uf_sid; ++ inode = d_inode(path->dentry); ++ ++ ad.type = LSM_AUDIT_DATA_PATH; ++ ad.u.path = *path; ++ } else { ++ struct file_security_struct *fsec = selinux_file(file); ++ ++ tsid_fd = fsec->sid; ++ inode = file_inode(file); ++ ++ ad.type = LSM_AUDIT_DATA_FILE; ++ ad.u.file = file; ++ } ++ ++ if (ssid != tsid_fd) { + rc = avc_has_perm(&selinux_state, +- sid, fsec->sid, ++ ssid, tsid_fd, + SECCLASS_FD, + FD__USE, + &ad); + if (rc) +- goto out; ++ return rc; + } + + #ifdef CONFIG_BPF_SYSCALL +- rc = bpf_fd_pass(file, cred_sid(cred)); ++ /* regardless of backing vs user file, use the underlying file here */ ++ rc = bpf_fd_pass(file, ssid); + if (rc) + return rc; + #endif + + /* av is zero if only checking access to the descriptor. */ +- rc = 0; + if (av) +- rc = inode_has_perm(cred, inode, av, &ad); ++ return inode_has_perm(cred, inode, av, &ad); + +-out: +- return rc; ++ return 0; ++} ++ ++/* Check whether a task can use an open file descriptor to ++ access an inode in a given way. Check access to the ++ descriptor itself, and then use dentry_has_perm to ++ check a particular permission to the file. 
++ Access to the descriptor is implicitly granted if it ++ has the same SID as the process. If av is zero, then ++ access to the file is not checked, e.g. for cases ++ where only the descriptor is affected like seek. */ ++static inline int file_has_perm(const struct cred *cred, ++ const struct file *file, u32 av) ++{ ++ return __file_has_perm(cred, file, av, false); + } + + /* +@@ -3646,6 +3672,17 @@ static int selinux_file_alloc_security(struct file *file) + return 0; + } + ++static int selinux_backing_file_alloc(struct file *backing_file, ++ const struct file *user_file) ++{ ++ struct backing_file_security_struct *bfsec; ++ ++ bfsec = selinux_backing_file(backing_file); ++ bfsec->uf_sid = selinux_file(user_file)->sid; ++ ++ return 0; ++} ++ + /* + * Check whether a task has the ioctl permission and cmd + * operation to an inode. +@@ -3759,43 +3796,56 @@ static int selinux_file_ioctl_compat(struct file *file, unsigned int cmd, + + static int default_noexec __ro_after_init; + +-static int file_map_prot_check(struct file *file, unsigned long prot, int shared) ++static int __file_map_prot_check(const struct cred *cred, ++ const struct file *file, unsigned long prot, ++ bool shared, bool bf_user_file) + { +- const struct cred *cred = current_cred(); +- u32 sid = cred_sid(cred); +- int rc = 0; ++ struct inode *inode = NULL; ++ bool prot_exec = prot & PROT_EXEC; ++ bool prot_write = prot & PROT_WRITE; ++ ++ if (file) { ++ if (bf_user_file) ++ inode = d_inode(backing_file_user_path(file)->dentry); ++ else ++ inode = file_inode(file); ++ } ++ ++ if (default_noexec && prot_exec && ++ (!file || IS_PRIVATE(inode) || (!shared && prot_write))) { ++ int rc; ++ u32 sid = cred_sid(cred); + +- if (default_noexec && +- (prot & PROT_EXEC) && (!file || IS_PRIVATE(file_inode(file)) || +- (!shared && (prot & PROT_WRITE)))) { + /* +- * We are making executable an anonymous mapping or a +- * private file mapping that will also be writable. +- * This has an additional check. 
++ * We are making executable an anonymous mapping or a private ++ * file mapping that will also be writable. + */ + rc = avc_has_perm(&selinux_state, +- sid, sid, SECCLASS_PROCESS, +- PROCESS__EXECMEM, NULL); ++ sid, sid, SECCLASS_PROCESS, PROCESS__EXECMEM, ++ NULL); + if (rc) +- goto error; ++ return rc; + } + + if (file) { +- /* read access is always possible with a mapping */ ++ /* "read" always possible, "write" only if shared */ + u32 av = FILE__READ; +- +- /* write access only matters if the mapping is shared */ +- if (shared && (prot & PROT_WRITE)) ++ if (shared && prot_write) + av |= FILE__WRITE; +- +- if (prot & PROT_EXEC) ++ if (prot_exec) + av |= FILE__EXECUTE; + +- return file_has_perm(cred, file, av); ++ return __file_has_perm(cred, file, av, bf_user_file); + } + +-error: +- return rc; ++ return 0; ++} ++ ++static inline int file_map_prot_check(const struct cred *cred, ++ const struct file *file, ++ unsigned long prot, bool shared) ++{ ++ return __file_map_prot_check(cred, file, prot, shared, false); + } + + static int selinux_mmap_addr(unsigned long addr) +@@ -3812,17 +3862,17 @@ static int selinux_mmap_addr(unsigned long addr) + return rc; + } + +-static int selinux_mmap_file(struct file *file, unsigned long reqprot, +- unsigned long prot, unsigned long flags) ++static int selinux_mmap_file_common(const struct cred *cred, struct file *file, ++ unsigned long reqprot, unsigned long prot, ++ bool shared) + { +- struct common_audit_data ad; +- int rc; +- + if (file) { ++ int rc; ++ struct common_audit_data ad; ++ + ad.type = LSM_AUDIT_DATA_FILE; + ad.u.file = file; +- rc = inode_has_perm(current_cred(), file_inode(file), +- FILE__MAP, &ad); ++ rc = inode_has_perm(cred, file_inode(file), FILE__MAP, &ad); + if (rc) + return rc; + } +@@ -3830,23 +3880,68 @@ static int selinux_mmap_file(struct file *file, unsigned long reqprot, + if (checkreqprot_get(&selinux_state)) + prot = reqprot; + +- return file_map_prot_check(file, prot, +- (flags & MAP_TYPE) == 
MAP_SHARED); ++ return file_map_prot_check(cred, file, prot, shared); ++} ++ ++static int selinux_mmap_file(struct file *file, unsigned long reqprot, ++ unsigned long prot, unsigned long flags) ++{ ++ return selinux_mmap_file_common(current_cred(), file, reqprot, prot, ++ (flags & MAP_TYPE) == MAP_SHARED); ++} ++ ++/** ++ * selinux_mmap_backing_file - Check mmap permissions on a backing file ++ * @vma: memory region ++ * @backing_file: stacked filesystem backing file ++ * @user_file: user visible file ++ * ++ * This is called after selinux_mmap_file() on stacked filesystems, and it ++ * is this function's responsibility to verify access to @backing_file and ++ * setup the SELinux state for possible later use in the mprotect() code path. ++ * ++ * By the time this function is called, mmap() access to @user_file has already ++ * been authorized and @vma->vm_file has been set to point to @backing_file. ++ * ++ * Return zero on success, negative values otherwise. ++ */ ++static int selinux_mmap_backing_file(struct vm_area_struct *vma, ++ struct file *backing_file, ++ struct file *user_file __always_unused) ++{ ++ unsigned long prot = 0; ++ ++ /* translate vma->vm_flags perms into PROT perms */ ++ if (vma->vm_flags & VM_READ) ++ prot |= PROT_READ; ++ if (vma->vm_flags & VM_WRITE) ++ prot |= PROT_WRITE; ++ if (vma->vm_flags & VM_EXEC) ++ prot |= PROT_EXEC; ++ ++ return selinux_mmap_file_common(backing_file->f_cred, backing_file, ++ prot, prot, vma->vm_flags & VM_SHARED); + } + + static int selinux_file_mprotect(struct vm_area_struct *vma, + unsigned long reqprot, + unsigned long prot) + { ++ int rc; + const struct cred *cred = current_cred(); + u32 sid = cred_sid(cred); ++ const struct file *file = vma->vm_file; ++ bool backing_file; ++ bool shared = vma->vm_flags & VM_SHARED; ++ ++ /* check if we need to trigger the "backing files are awful" mode */ ++ backing_file = file && (file->f_mode & FMODE_BACKING); + + if (checkreqprot_get(&selinux_state)) + prot = reqprot; + + 
if (default_noexec && + (prot & PROT_EXEC) && !(vma->vm_flags & VM_EXEC)) { +- int rc = 0; + /* + * We don't use the vma_is_initial_heap() helper as it has + * a history of problems and is currently broken on systems +@@ -3861,12 +3956,16 @@ static int selinux_file_mprotect(struct vm_area_struct *vma, + rc = avc_has_perm(&selinux_state, + sid, sid, SECCLASS_PROCESS, + PROCESS__EXECHEAP, NULL); +- } else if (!vma->vm_file && (vma_is_initial_stack(vma) || ++ if (rc) ++ return rc; ++ } else if (!file && (vma_is_initial_stack(vma) || + vma_is_stack_for_current(vma))) { + rc = avc_has_perm(&selinux_state, + sid, sid, SECCLASS_PROCESS, + PROCESS__EXECSTACK, NULL); +- } else if (vma->vm_file && vma->anon_vma) { ++ if (rc) ++ return rc; ++ } else if (file && vma->anon_vma) { + /* + * We are making executable a file mapping that has + * had some COW done. Since pages might have been +@@ -3874,13 +3973,29 @@ static int selinux_file_mprotect(struct vm_area_struct *vma, + * modified content. This typically should only + * occur for text relocations. 
+ */ +- rc = file_has_perm(cred, vma->vm_file, FILE__EXECMOD); ++ rc = __file_has_perm(cred, file, FILE__EXECMOD, ++ backing_file); ++ if (rc) ++ return rc; ++ if (backing_file) { ++ rc = file_has_perm(file->f_cred, file, ++ FILE__EXECMOD); ++ if (rc) ++ return rc; ++ } + } ++ } ++ ++ rc = __file_map_prot_check(cred, file, prot, shared, backing_file); ++ if (rc) ++ return rc; ++ if (backing_file) { ++ rc = file_map_prot_check(file->f_cred, file, prot, shared); + if (rc) + return rc; + } + +- return file_map_prot_check(vma->vm_file, prot, vma->vm_flags&VM_SHARED); ++ return 0; + } + + static int selinux_file_lock(struct file *file, unsigned int cmd) +@@ -7007,6 +7122,7 @@ static void selinux_bpf_token_free(struct bpf_token *token) + struct lsm_blob_sizes selinux_blob_sizes __ro_after_init = { + .lbs_cred = sizeof(struct task_security_struct), + .lbs_file = sizeof(struct file_security_struct), ++ .lbs_backing_file = sizeof(struct backing_file_security_struct), + .lbs_inode = sizeof(struct inode_security_struct), + .lbs_ipc = sizeof(struct ipc_security_struct), + .lbs_msg_msg = sizeof(struct msg_security_struct), +@@ -7216,9 +7332,11 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = { + + LSM_HOOK_INIT(file_permission, selinux_file_permission), + LSM_HOOK_INIT(file_alloc_security, selinux_file_alloc_security), ++ LSM_HOOK_INIT(backing_file_alloc, selinux_backing_file_alloc), + LSM_HOOK_INIT(file_ioctl, selinux_file_ioctl), + LSM_HOOK_INIT(file_ioctl_compat, selinux_file_ioctl_compat), + LSM_HOOK_INIT(mmap_file, selinux_mmap_file), ++ LSM_HOOK_INIT(mmap_backing_file, selinux_mmap_backing_file), + LSM_HOOK_INIT(mmap_addr, selinux_mmap_addr), + LSM_HOOK_INIT(file_mprotect, selinux_file_mprotect), + LSM_HOOK_INIT(file_lock, selinux_file_lock), +diff --git a/security/selinux/include/objsec.h b/security/selinux/include/objsec.h +index 2953132408bf..b1c5a2877f7e 100644 +--- a/security/selinux/include/objsec.h ++++ b/security/selinux/include/objsec.h +@@ 
-60,6 +60,10 @@ struct file_security_struct { + u32 pseqno; /* Policy seqno at the time of file open */ + }; + ++struct backing_file_security_struct { ++ u32 uf_sid; /* associated user file fsec->sid */ ++}; ++ + struct superblock_security_struct { + u32 sid; /* SID of file system superblock */ + u32 def_sid; /* default SID for labeling */ +@@ -158,6 +162,13 @@ static inline struct file_security_struct *selinux_file(const struct file *file) + return file->f_security + selinux_blob_sizes.lbs_file; + } + ++static inline struct backing_file_security_struct * ++selinux_backing_file(const struct file *backing_file) ++{ ++ void *blob = backing_file_security(backing_file); ++ return blob + selinux_blob_sizes.lbs_backing_file; ++} ++ + static inline struct inode_security_struct *selinux_inode( + const struct inode *inode) + { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1345-selinux-rhel-only-hotfix-for-execmem-regression.patch b/SOURCES/1345-selinux-rhel-only-hotfix-for-execmem-regression.patch new file mode 100644 index 000000000..7b9fe21c0 --- /dev/null +++ b/SOURCES/1345-selinux-rhel-only-hotfix-for-execmem-regression.patch @@ -0,0 +1,130 @@ +From ec2a2e4b876c7faed3de5e85406180810cc8539a Mon Sep 17 00:00:00 2001 +From: Ondrej Mosnacek +Date: Tue, 16 Jun 2026 10:06:13 +0200 +Subject: [PATCH] selinux: RHEL-only hotfix for execmem regression +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +JIRA: https://redhat.atlassian.net/browse/RHEL-185118 +Upstream Status: RHEL9-only + +As discovered by image-mode/CoreOS testing, the fixes for CVE-2026-46054 +caused a regression that results in unexpected execmem denials in +specific scenarios involving overlayfs (or another stacked filesystem). + +Specifically in case of image mode / CoreOS there is often (always?) 
an +overlayfs filesystem mounted during early boot (before SELinux policy is +loaded), which means that overlayfs captures the kernel’s SELinux +context as part of the mounter credentials, which are later used by +overlayfs+SELinux to verify that file accesses through the overlay mount +don’t give the mounter a way to access underlying files it otherwise +wouldn’t have access to. This verification would normally pass, as the +policy grants the kernel context almost unrestricted access to the +filesystem. However, the new checks added to fix CVE-2026-46054 +erroneously include the execmem check for the mounter and in the policy +kernel_t doesn’t have the execmem permission, so mmapping an overlay +file with MAP_PRIVATE and PROT_WRITE|PROT_EXEC would now result in a +SELinux denial. + +Fix this by passing a boolean through the helper functions that allows +to distinguish the direct permission check from the mounter check and +skipping the execmem check in the mounter case. + +This is a transient RHEL-only fix to allow the CVE fix to go through +without breaking image mode/CoreOS deployments. Once an optimal solution +is figured out and applied upstream, this commit will be reverted and +replaced with the upstream fix (at least in Y-streams). I expect the +upstream solution to be functionally equivalent, though probably +cosmetically different. 
+ +Signed-off-by: Ondrej Mosnacek + +diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c +index fb5b9cb027d0..b31b06d4440b 100644 +--- a/security/selinux/hooks.c ++++ b/security/selinux/hooks.c +@@ -3798,7 +3798,7 @@ static int default_noexec __ro_after_init; + + static int __file_map_prot_check(const struct cred *cred, + const struct file *file, unsigned long prot, +- bool shared, bool bf_user_file) ++ bool shared, bool mounter, bool bf_user_file) + { + struct inode *inode = NULL; + bool prot_exec = prot & PROT_EXEC; +@@ -3812,7 +3812,7 @@ static int __file_map_prot_check(const struct cred *cred, + } + + if (default_noexec && prot_exec && +- (!file || IS_PRIVATE(inode) || (!shared && prot_write))) { ++ (!file || IS_PRIVATE(inode) || (!shared && prot_write)) && !mounter) { + int rc; + u32 sid = cred_sid(cred); + +@@ -3843,9 +3843,9 @@ static int __file_map_prot_check(const struct cred *cred, + + static inline int file_map_prot_check(const struct cred *cred, + const struct file *file, +- unsigned long prot, bool shared) ++ unsigned long prot, bool shared, bool mounter) + { +- return __file_map_prot_check(cred, file, prot, shared, false); ++ return __file_map_prot_check(cred, file, prot, shared, mounter, false); + } + + static int selinux_mmap_addr(unsigned long addr) +@@ -3864,7 +3864,7 @@ static int selinux_mmap_addr(unsigned long addr) + + static int selinux_mmap_file_common(const struct cred *cred, struct file *file, + unsigned long reqprot, unsigned long prot, +- bool shared) ++ bool shared, bool mounter) + { + if (file) { + int rc; +@@ -3880,14 +3880,15 @@ static int selinux_mmap_file_common(const struct cred *cred, struct file *file, + if (checkreqprot_get(&selinux_state)) + prot = reqprot; + +- return file_map_prot_check(cred, file, prot, shared); ++ return file_map_prot_check(cred, file, prot, shared, mounter); + } + + static int selinux_mmap_file(struct file *file, unsigned long reqprot, + unsigned long prot, unsigned long flags) + { + return 
selinux_mmap_file_common(current_cred(), file, reqprot, prot, +- (flags & MAP_TYPE) == MAP_SHARED); ++ (flags & MAP_TYPE) == MAP_SHARED, ++ false); + } + + /** +@@ -3920,7 +3921,8 @@ static int selinux_mmap_backing_file(struct vm_area_struct *vma, + prot |= PROT_EXEC; + + return selinux_mmap_file_common(backing_file->f_cred, backing_file, +- prot, prot, vma->vm_flags & VM_SHARED); ++ prot, prot, vma->vm_flags & VM_SHARED, ++ true); + } + + static int selinux_file_mprotect(struct vm_area_struct *vma, +@@ -3986,11 +3988,11 @@ static int selinux_file_mprotect(struct vm_area_struct *vma, + } + } + +- rc = __file_map_prot_check(cred, file, prot, shared, backing_file); ++ rc = __file_map_prot_check(cred, file, prot, shared, false, backing_file); + if (rc) + return rc; + if (backing_file) { +- rc = file_map_prot_check(file->f_cred, file, prot, shared); ++ rc = file_map_prot_check(file->f_cred, file, prot, shared, true); + if (rc) + return rc; + } +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1346-net-mlx5-hws-fix-matcher-action-template-attach.patch b/SOURCES/1346-net-mlx5-hws-fix-matcher-action-template-attach.patch new file mode 100644 index 000000000..9225e114c --- /dev/null +++ b/SOURCES/1346-net-mlx5-hws-fix-matcher-action-template-attach.patch @@ -0,0 +1,323 @@ +From 0ba00ffd79bdb243b0067aa95fe5846b6c1ecbe7 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:41:58 -0400 +Subject: [PATCH] net/mlx5: HWS, Fix matcher action template attach + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 36ef2575e78d1a3c699dc3f1c9dee9be742c9bdd +Author: Vlad Dogaru +Date: Thu Apr 10 22:17:31 2025 +0300 + + net/mlx5: HWS, Fix matcher action template attach + + The procedure of attaching an action template to an existing matcher had + a few issues: + + 1. Attaching accidentally overran the `at` array in bwc_matcher, which + would result in memory corruption. 
This bug wasn't triggered, but it + is possible to trigger it by attaching action templates beyond the + initial buffer size of 8. Fix this by converting to a dynamically + sized buffer and reallocating if needed. + + 2. Similarly, the `at` array inside the native matcher was never + reallocated. Fix this the same as above. + + 3. The bwc layer treated any error in action template attach as a signal + that the matcher should be rehashed to account for a larger number of + action STEs. In reality, there are other unrelated errors that can + arise and they should be propagated upstack. Fix this by adding a + `need_rehash` output parameter that's orthogonal to error codes. + + Fixes: 2111bb970c78 ("net/mlx5: HWS, added backward-compatible API handling") + Signed-off-by: Vlad Dogaru + Reviewed-by: Yevgeny Kliteynik + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Reviewed-by: Michal Kubiak + Link: https://patch.msgid.link/1744312662-356571-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +index 19dce1ba512d..32de8bfc7644 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +@@ -90,13 +90,19 @@ int mlx5hws_bwc_matcher_create_simple(struct mlx5hws_bwc_matcher *bwc_matcher, + bwc_matcher->priority = priority; + bwc_matcher->size_log = MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG; + ++ bwc_matcher->size_of_at_array = MLX5HWS_BWC_MATCHER_ATTACH_AT_NUM; ++ bwc_matcher->at = kcalloc(bwc_matcher->size_of_at_array, ++ sizeof(*bwc_matcher->at), GFP_KERNEL); ++ if (!bwc_matcher->at) ++ goto free_bwc_matcher_rules; ++ + /* create dummy action template */ + bwc_matcher->at[0] = + mlx5hws_action_template_create(action_types ? 
+ action_types : init_action_types); + if (!bwc_matcher->at[0]) { + mlx5hws_err(table->ctx, "BWC matcher: failed creating action template\n"); +- goto free_bwc_matcher_rules; ++ goto free_bwc_matcher_at_array; + } + + bwc_matcher->num_of_at = 1; +@@ -126,6 +132,8 @@ int mlx5hws_bwc_matcher_create_simple(struct mlx5hws_bwc_matcher *bwc_matcher, + mlx5hws_match_template_destroy(bwc_matcher->mt); + free_at: + mlx5hws_action_template_destroy(bwc_matcher->at[0]); ++free_bwc_matcher_at_array: ++ kfree(bwc_matcher->at); + free_bwc_matcher_rules: + kfree(bwc_matcher->rules); + err: +@@ -192,6 +200,7 @@ int mlx5hws_bwc_matcher_destroy_simple(struct mlx5hws_bwc_matcher *bwc_matcher) + + for (i = 0; i < bwc_matcher->num_of_at; i++) + mlx5hws_action_template_destroy(bwc_matcher->at[i]); ++ kfree(bwc_matcher->at); + + mlx5hws_match_template_destroy(bwc_matcher->mt); + kfree(bwc_matcher->rules); +@@ -520,6 +529,23 @@ hws_bwc_matcher_extend_at(struct mlx5hws_bwc_matcher *bwc_matcher, + struct mlx5hws_rule_action rule_actions[]) + { + enum mlx5hws_action_type action_types[MLX5HWS_BWC_MAX_ACTS]; ++ void *p; ++ ++ if (unlikely(bwc_matcher->num_of_at >= bwc_matcher->size_of_at_array)) { ++ if (bwc_matcher->size_of_at_array >= MLX5HWS_MATCHER_MAX_AT) ++ return -ENOMEM; ++ bwc_matcher->size_of_at_array *= 2; ++ p = krealloc(bwc_matcher->at, ++ bwc_matcher->size_of_at_array * ++ sizeof(*bwc_matcher->at), ++ __GFP_ZERO | GFP_KERNEL); ++ if (!p) { ++ bwc_matcher->size_of_at_array /= 2; ++ return -ENOMEM; ++ } ++ ++ bwc_matcher->at = p; ++ } + + hws_bwc_rule_actions_to_action_types(rule_actions, action_types); + +@@ -777,6 +803,7 @@ int mlx5hws_bwc_rule_create_simple(struct mlx5hws_bwc_rule *bwc_rule, + struct mlx5hws_rule_attr rule_attr; + struct mutex *queue_lock; /* Protect the queue */ + u32 num_of_rules; ++ bool need_rehash; + int ret = 0; + int at_idx; + +@@ -803,10 +830,14 @@ int mlx5hws_bwc_rule_create_simple(struct mlx5hws_bwc_rule *bwc_rule, + at_idx = bwc_matcher->num_of_at - 1; 
+ + ret = mlx5hws_matcher_attach_at(bwc_matcher->matcher, +- bwc_matcher->at[at_idx]); ++ bwc_matcher->at[at_idx], ++ &need_rehash); + if (unlikely(ret)) { +- /* Action template attach failed, possibly due to +- * requiring more action STEs. ++ hws_bwc_unlock_all_queues(ctx); ++ return ret; ++ } ++ if (unlikely(need_rehash)) { ++ /* The new action template requires more action STEs. + * Need to attempt creating new matcher with all + * the action templates, including the new one. + */ +@@ -942,6 +973,7 @@ hws_bwc_rule_action_update(struct mlx5hws_bwc_rule *bwc_rule, + struct mlx5hws_context *ctx = bwc_matcher->matcher->tbl->ctx; + struct mlx5hws_rule_attr rule_attr; + struct mutex *queue_lock; /* Protect the queue */ ++ bool need_rehash; + int at_idx, ret; + u16 idx; + +@@ -973,12 +1005,17 @@ hws_bwc_rule_action_update(struct mlx5hws_bwc_rule *bwc_rule, + at_idx = bwc_matcher->num_of_at - 1; + + ret = mlx5hws_matcher_attach_at(bwc_matcher->matcher, +- bwc_matcher->at[at_idx]); ++ bwc_matcher->at[at_idx], ++ &need_rehash); + if (unlikely(ret)) { +- /* Action template attach failed, possibly due to +- * requiring more action STEs. +- * Need to attempt creating new matcher with all +- * the action templates, including the new one. ++ hws_bwc_unlock_all_queues(ctx); ++ return ret; ++ } ++ if (unlikely(need_rehash)) { ++ /* The new action template requires more action ++ * STEs. Need to attempt creating new matcher ++ * with all the action templates, including the ++ * new one. 
+ */ + ret = hws_bwc_matcher_rehash_at(bwc_matcher); + if (unlikely(ret)) { +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h +index 47f7ed141553..bb0cf4b922ce 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h +@@ -10,9 +10,7 @@ + #define MLX5HWS_BWC_MATCHER_REHASH_BURST_TH 32 + + /* Max number of AT attach operations for the same matcher. +- * When the limit is reached, next attempt to attach new AT +- * will result in creation of a new matcher and moving all +- * the rules to this matcher. ++ * When the limit is reached, a larger buffer is allocated for the ATs. + */ + #define MLX5HWS_BWC_MATCHER_ATTACH_AT_NUM 8 + +@@ -23,10 +21,11 @@ + struct mlx5hws_bwc_matcher { + struct mlx5hws_matcher *matcher; + struct mlx5hws_match_template *mt; +- struct mlx5hws_action_template *at[MLX5HWS_BWC_MATCHER_ATTACH_AT_NUM]; +- u32 priority; ++ struct mlx5hws_action_template **at; + u8 num_of_at; ++ u8 size_of_at_array; + u8 size_log; ++ u32 priority; + atomic_t num_of_rules; + struct list_head *rules; + }; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c +index b61864b32053..37a4497048a6 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c +@@ -905,18 +905,48 @@ static int hws_matcher_uninit(struct mlx5hws_matcher *matcher) + return 0; + } + ++static int hws_matcher_grow_at_array(struct mlx5hws_matcher *matcher) ++{ ++ void *p; ++ ++ if (matcher->size_of_at_array >= MLX5HWS_MATCHER_MAX_AT) ++ return -ENOMEM; ++ ++ matcher->size_of_at_array *= 2; ++ p = krealloc(matcher->at, ++ matcher->size_of_at_array * sizeof(*matcher->at), ++ __GFP_ZERO | GFP_KERNEL); ++ if (!p) { ++ matcher->size_of_at_array /= 2; ++ return -ENOMEM; ++ } 
++ ++ matcher->at = p; ++ ++ return 0; ++} ++ + int mlx5hws_matcher_attach_at(struct mlx5hws_matcher *matcher, +- struct mlx5hws_action_template *at) ++ struct mlx5hws_action_template *at, ++ bool *need_rehash) + { + bool is_jumbo = mlx5hws_matcher_mt_is_jumbo(matcher->mt); + struct mlx5hws_context *ctx = matcher->tbl->ctx; + u32 required_stes; + int ret; + +- if (!matcher->attr.max_num_of_at_attach) { +- mlx5hws_dbg(ctx, "Num of current at (%d) exceed allowed value\n", +- matcher->num_of_at); +- return -EOPNOTSUPP; ++ *need_rehash = false; ++ ++ if (unlikely(matcher->num_of_at >= matcher->size_of_at_array)) { ++ ret = hws_matcher_grow_at_array(matcher); ++ if (ret) ++ return ret; ++ ++ if (matcher->col_matcher) { ++ ret = hws_matcher_grow_at_array(matcher->col_matcher); ++ if (ret) ++ return ret; ++ } + } + + ret = hws_matcher_check_and_process_at(matcher, at); +@@ -927,12 +957,11 @@ int mlx5hws_matcher_attach_at(struct mlx5hws_matcher *matcher, + if (matcher->action_ste.max_stes < required_stes) { + mlx5hws_dbg(ctx, "Required STEs [%d] exceeds initial action template STE [%d]\n", + required_stes, matcher->action_ste.max_stes); +- return -ENOMEM; ++ *need_rehash = true; + } + + matcher->at[matcher->num_of_at] = *at; + matcher->num_of_at += 1; +- matcher->attr.max_num_of_at_attach -= 1; + + if (matcher->col_matcher) + matcher->col_matcher->num_of_at = matcher->num_of_at; +@@ -960,8 +989,9 @@ hws_matcher_set_templates(struct mlx5hws_matcher *matcher, + if (!matcher->mt) + return -ENOMEM; + +- matcher->at = kvcalloc(num_of_at + matcher->attr.max_num_of_at_attach, +- sizeof(*matcher->at), ++ matcher->size_of_at_array = ++ num_of_at + matcher->attr.max_num_of_at_attach; ++ matcher->at = kvcalloc(matcher->size_of_at_array, sizeof(*matcher->at), + GFP_KERNEL); + if (!matcher->at) { + mlx5hws_err(ctx, "Failed to allocate action template array\n"); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h 
b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h +index 020de70270c5..20b32012c418 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h +@@ -23,6 +23,9 @@ + */ + #define MLX5HWS_MATCHER_ACTION_RTC_UPDATE_MULT 1 + ++/* Maximum number of action templates that can be attached to a matcher. */ ++#define MLX5HWS_MATCHER_MAX_AT 128 ++ + enum mlx5hws_matcher_offset { + MLX5HWS_MATCHER_OFFSET_TAG_DW1 = 12, + MLX5HWS_MATCHER_OFFSET_TAG_DW0 = 13, +@@ -72,6 +75,7 @@ struct mlx5hws_matcher { + struct mlx5hws_match_template *mt; + struct mlx5hws_action_template *at; + u8 num_of_at; ++ u8 size_of_at_array; + u8 num_of_mt; + /* enum mlx5hws_matcher_flags */ + u8 flags; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h +index 5121951f2778..8ed8a715a2eb 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h +@@ -399,11 +399,14 @@ int mlx5hws_matcher_destroy(struct mlx5hws_matcher *matcher); + * + * @matcher: Matcher to attach the action template to. + * @at: Action template to be attached to the matcher. ++ * @need_rehash: Output parameter that tells callers if the matcher needs to be ++ * rehashed. + * + * Return: Zero on success, non-zero otherwise. + */ + int mlx5hws_matcher_attach_at(struct mlx5hws_matcher *matcher, +- struct mlx5hws_action_template *at); ++ struct mlx5hws_action_template *at, ++ bool *need_rehash); + + /** + * mlx5hws_matcher_resize_set_target - Link two matchers and enable moving rules. 
+-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1347-net-mlx5-hws-remove-unused-element-array.patch b/SOURCES/1347-net-mlx5-hws-remove-unused-element-array.patch new file mode 100644 index 000000000..f286da431 --- /dev/null +++ b/SOURCES/1347-net-mlx5-hws-remove-unused-element-array.patch @@ -0,0 +1,178 @@ +From f6b3ae9e3b84ce0d94c00565a0321eb0a1502cee Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:41:58 -0400 +Subject: [PATCH] net/mlx5: HWS, Remove unused element array + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit b2ae16214ffeda3e1c25223eebe19f85b0876181 +Author: Vlad Dogaru +Date: Thu Apr 10 22:17:32 2025 +0300 + + net/mlx5: HWS, Remove unused element array + + Remove the array of elements wrapped in a struct because in reality only + the first element was ever used. + + Signed-off-by: Vlad Dogaru + Reviewed-by: Yevgeny Kliteynik + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Reviewed-by: Michal Kubiak + Link: https://patch.msgid.link/1744312662-356571-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.c +index 50a81d360bb2..35ed9bee06a6 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.c +@@ -293,7 +293,7 @@ static int hws_pool_create_resource_on_index(struct mlx5hws_pool *pool, + } + + static struct mlx5hws_pool_elements * +-hws_pool_element_create_new_elem(struct mlx5hws_pool *pool, u32 order, int idx) ++hws_pool_element_create_new_elem(struct mlx5hws_pool *pool, u32 order) + { + struct mlx5hws_pool_elements *elem; + u32 alloc_size; +@@ -311,21 +311,21 @@ hws_pool_element_create_new_elem(struct mlx5hws_pool *pool, u32 order, int idx) + elem->bitmap = hws_pool_create_and_init_bitmap(alloc_size - order); + if (!elem->bitmap) { + 
mlx5hws_err(pool->ctx, +- "Failed to create bitmap type: %d: size %d index: %d\n", +- pool->type, alloc_size, idx); ++ "Failed to create bitmap type: %d: size %d\n", ++ pool->type, alloc_size); + goto free_elem; + } + + elem->log_size = alloc_size - order; + } + +- if (hws_pool_create_resource_on_index(pool, alloc_size, idx)) { +- mlx5hws_err(pool->ctx, "Failed to create resource type: %d: size %d index: %d\n", +- pool->type, alloc_size, idx); ++ if (hws_pool_create_resource_on_index(pool, alloc_size, 0)) { ++ mlx5hws_err(pool->ctx, "Failed to create resource type: %d: size %d\n", ++ pool->type, alloc_size); + goto free_db; + } + +- pool->db.element_manager->elements[idx] = elem; ++ pool->db.element = elem; + + return elem; + +@@ -359,9 +359,9 @@ hws_pool_onesize_element_get_mem_chunk(struct mlx5hws_pool *pool, u32 order, + { + struct mlx5hws_pool_elements *elem; + +- elem = pool->db.element_manager->elements[0]; ++ elem = pool->db.element; + if (!elem) +- elem = hws_pool_element_create_new_elem(pool, order, 0); ++ elem = hws_pool_element_create_new_elem(pool, order); + if (!elem) + goto err_no_elem; + +@@ -451,16 +451,14 @@ static int hws_pool_general_element_db_init(struct mlx5hws_pool *pool) + return 0; + } + +-static void hws_onesize_element_db_destroy_element(struct mlx5hws_pool *pool, +- struct mlx5hws_pool_elements *elem, +- struct mlx5hws_pool_chunk *chunk) ++static void ++hws_onesize_element_db_destroy_element(struct mlx5hws_pool *pool, ++ struct mlx5hws_pool_elements *elem) + { +- if (unlikely(!pool->resource[chunk->resource_idx])) +- pr_warn("HWS: invalid resource with index %d\n", chunk->resource_idx); +- +- hws_pool_resource_free(pool, chunk->resource_idx); ++ hws_pool_resource_free(pool, 0); ++ bitmap_free(elem->bitmap); + kfree(elem); +- pool->db.element_manager->elements[chunk->resource_idx] = NULL; ++ pool->db.element = NULL; + } + + static void hws_onesize_element_db_put_chunk(struct mlx5hws_pool *pool, +@@ -471,7 +469,7 @@ static void 
hws_onesize_element_db_put_chunk(struct mlx5hws_pool *pool, + if (unlikely(chunk->resource_idx)) + pr_warn("HWS: invalid resource with index %d\n", chunk->resource_idx); + +- elem = pool->db.element_manager->elements[chunk->resource_idx]; ++ elem = pool->db.element; + if (!elem) { + mlx5hws_err(pool->ctx, "No such element (%d)\n", chunk->resource_idx); + return; +@@ -483,7 +481,7 @@ static void hws_onesize_element_db_put_chunk(struct mlx5hws_pool *pool, + + if (pool->flags & MLX5HWS_POOL_FLAGS_RELEASE_FREE_RESOURCE && + !elem->num_of_elements) +- hws_onesize_element_db_destroy_element(pool, elem, chunk); ++ hws_onesize_element_db_destroy_element(pool, elem); + } + + static int hws_onesize_element_db_get_chunk(struct mlx5hws_pool *pool, +@@ -504,18 +502,13 @@ static int hws_onesize_element_db_get_chunk(struct mlx5hws_pool *pool, + + static void hws_onesize_element_db_uninit(struct mlx5hws_pool *pool) + { +- struct mlx5hws_pool_elements *elem; +- int i; ++ struct mlx5hws_pool_elements *elem = pool->db.element; + +- for (i = 0; i < MLX5HWS_POOL_RESOURCE_ARR_SZ; i++) { +- elem = pool->db.element_manager->elements[i]; +- if (elem) { +- bitmap_free(elem->bitmap); +- kfree(elem); +- pool->db.element_manager->elements[i] = NULL; +- } ++ if (elem) { ++ bitmap_free(elem->bitmap); ++ kfree(elem); ++ pool->db.element = NULL; + } +- kfree(pool->db.element_manager); + } + + /* This memory management works as the following: +@@ -526,10 +519,6 @@ static void hws_onesize_element_db_uninit(struct mlx5hws_pool *pool) + */ + static int hws_pool_onesize_element_db_init(struct mlx5hws_pool *pool) + { +- pool->db.element_manager = kzalloc(sizeof(*pool->db.element_manager), GFP_KERNEL); +- if (!pool->db.element_manager) +- return -ENOMEM; +- + pool->p_db_uninit = &hws_onesize_element_db_uninit; + pool->p_get_chunk = &hws_onesize_element_db_get_chunk; + pool->p_put_chunk = &hws_onesize_element_db_put_chunk; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.h 
b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.h +index 621298b352b2..f4258f83fdbf 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.h +@@ -87,14 +87,10 @@ struct mlx5hws_pool_elements { + bool is_full; + }; + +-struct mlx5hws_element_manager { +- struct mlx5hws_pool_elements *elements[MLX5HWS_POOL_RESOURCE_ARR_SZ]; +-}; +- + struct mlx5hws_pool_db { + enum mlx5hws_db_type type; + union { +- struct mlx5hws_element_manager *element_manager; ++ struct mlx5hws_pool_elements *element; + struct mlx5hws_buddy_manager *buddy_manager; + }; + }; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1348-net-mlx5-hws-make-pool-single-resource.patch b/SOURCES/1348-net-mlx5-hws-make-pool-single-resource.patch new file mode 100644 index 000000000..9ccdbc21c --- /dev/null +++ b/SOURCES/1348-net-mlx5-hws-make-pool-single-resource.patch @@ -0,0 +1,700 @@ +From 33eec33671fbe0444075b87f15b803b77059993f Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:41:58 -0400 +Subject: [PATCH] net/mlx5: HWS, Make pool single resource + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 38956bea7349ce75c1519b57c27cd97580b4c822 +Author: Vlad Dogaru +Date: Thu Apr 10 22:17:33 2025 +0300 + + net/mlx5: HWS, Make pool single resource + + The pool implementation claimed to support multiple resources, but this + does not really make sense in context. Callers always allocate a single + STC or STE chunk of exactly the size provided. + + The code that handled multiple resources was unused (and likely buggy) + due to the combination of flags passed by callers. + + Simplify the pool by having it handle a single resource. As a result of + this simplification, chunks no longer contain a resource offset (there + is now only one resource per pool), and the get_base_id functions no + longer take a chunk parameter. 
+ + Signed-off-by: Vlad Dogaru + Reviewed-by: Yevgeny Kliteynik + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Reviewed-by: Michal Kubiak + Link: https://patch.msgid.link/1744312662-356571-4-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c +index b5332c54d4fb..781ba8c4f733 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c +@@ -238,6 +238,7 @@ hws_action_fixup_stc_attr(struct mlx5hws_context *ctx, + enum mlx5hws_table_type table_type, + bool is_mirror) + { ++ struct mlx5hws_pool *pool; + bool use_fixup = false; + u32 fw_tbl_type; + u32 base_id; +@@ -253,13 +254,11 @@ hws_action_fixup_stc_attr(struct mlx5hws_context *ctx, + use_fixup = true; + break; + } ++ pool = stc_attr->ste_table.ste_pool; + if (!is_mirror) +- base_id = mlx5hws_pool_chunk_get_base_id(stc_attr->ste_table.ste_pool, +- &stc_attr->ste_table.ste); ++ base_id = mlx5hws_pool_get_base_id(pool); + else +- base_id = +- mlx5hws_pool_chunk_get_base_mirror_id(stc_attr->ste_table.ste_pool, +- &stc_attr->ste_table.ste); ++ base_id = mlx5hws_pool_get_base_mirror_id(pool); + + *fixup_stc_attr = *stc_attr; + fixup_stc_attr->ste_table.ste_obj_id = base_id; +@@ -337,7 +336,7 @@ __must_hold(&ctx->ctrl_lock) + if (!mlx5hws_context_cap_dynamic_reparse(ctx)) + stc_attr->reparse_mode = MLX5_IFC_STC_REPARSE_IGNORE; + +- obj_0_id = mlx5hws_pool_chunk_get_base_id(stc_pool, stc); ++ obj_0_id = mlx5hws_pool_get_base_id(stc_pool); + + /* According to table/action limitation change the stc_attr */ + use_fixup = hws_action_fixup_stc_attr(ctx, stc_attr, &fixup_stc_attr, table_type, false); +@@ -353,7 +352,7 @@ __must_hold(&ctx->ctrl_lock) + if (table_type == MLX5HWS_TABLE_TYPE_FDB) { + u32 obj_1_id; + +- obj_1_id = 
mlx5hws_pool_chunk_get_base_mirror_id(stc_pool, stc); ++ obj_1_id = mlx5hws_pool_get_base_mirror_id(stc_pool); + + use_fixup = hws_action_fixup_stc_attr(ctx, stc_attr, + &fixup_stc_attr, +@@ -393,11 +392,11 @@ __must_hold(&ctx->ctrl_lock) + stc_attr.action_type = MLX5_IFC_STC_ACTION_TYPE_DROP; + stc_attr.action_offset = MLX5HWS_ACTION_OFFSET_HIT; + stc_attr.stc_offset = stc->offset; +- obj_id = mlx5hws_pool_chunk_get_base_id(stc_pool, stc); ++ obj_id = mlx5hws_pool_get_base_id(stc_pool); + mlx5hws_cmd_stc_modify(ctx->mdev, obj_id, &stc_attr); + + if (table_type == MLX5HWS_TABLE_TYPE_FDB) { +- obj_id = mlx5hws_pool_chunk_get_base_mirror_id(stc_pool, stc); ++ obj_id = mlx5hws_pool_get_base_mirror_id(stc_pool); + mlx5hws_cmd_stc_modify(ctx->mdev, obj_id, &stc_attr); + } + +@@ -1581,7 +1580,6 @@ hws_action_create_dest_match_range_table(struct mlx5hws_context *ctx, + u32 miss_ft_id) + { + struct mlx5hws_cmd_rtc_create_attr rtc_attr = {0}; +- struct mlx5hws_action_default_stc *default_stc; + struct mlx5hws_matcher_action_ste *table_ste; + struct mlx5hws_pool_attr pool_attr = {0}; + struct mlx5hws_pool *ste_pool, *stc_pool; +@@ -1629,7 +1627,7 @@ hws_action_create_dest_match_range_table(struct mlx5hws_context *ctx, + rtc_attr.fw_gen_wqe = true; + rtc_attr.is_scnd_range = true; + +- obj_id = mlx5hws_pool_chunk_get_base_id(ste_pool, ste); ++ obj_id = mlx5hws_pool_get_base_id(ste_pool); + + rtc_attr.pd = ctx->pd_num; + rtc_attr.ste_base = obj_id; +@@ -1639,8 +1637,7 @@ hws_action_create_dest_match_range_table(struct mlx5hws_context *ctx, + + /* STC is a single resource (obj_id), use any STC for the ID */ + stc_pool = ctx->stc_pool; +- default_stc = ctx->common_res.default_stc; +- obj_id = mlx5hws_pool_chunk_get_base_id(stc_pool, &default_stc->default_hit); ++ obj_id = mlx5hws_pool_get_base_id(stc_pool); + rtc_attr.stc_base = obj_id; + + ret = mlx5hws_cmd_rtc_create(ctx->mdev, &rtc_attr, rtc_0_id); +@@ -1650,11 +1647,11 @@ hws_action_create_dest_match_range_table(struct 
mlx5hws_context *ctx, + } + + /* Create mirror RTC */ +- obj_id = mlx5hws_pool_chunk_get_base_mirror_id(ste_pool, ste); ++ obj_id = mlx5hws_pool_get_base_mirror_id(ste_pool); + rtc_attr.ste_base = obj_id; + rtc_attr.table_type = mlx5hws_table_get_res_fw_ft_type(MLX5HWS_TABLE_TYPE_FDB, true); + +- obj_id = mlx5hws_pool_chunk_get_base_mirror_id(stc_pool, &default_stc->default_hit); ++ obj_id = mlx5hws_pool_get_base_mirror_id(stc_pool); + rtc_attr.stc_base = obj_id; + + ret = mlx5hws_cmd_rtc_create(ctx->mdev, &rtc_attr, rtc_1_id); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c +index 696275fd0ce2..3491408c5d84 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c +@@ -118,7 +118,6 @@ static int hws_debug_dump_matcher(struct seq_file *f, struct mlx5hws_matcher *ma + { + enum mlx5hws_table_type tbl_type = matcher->tbl->type; + struct mlx5hws_cmd_ft_query_attr ft_attr = {0}; +- struct mlx5hws_pool_chunk *ste; + struct mlx5hws_pool *ste_pool; + u64 icm_addr_0 = 0; + u64 icm_addr_1 = 0; +@@ -134,12 +133,11 @@ static int hws_debug_dump_matcher(struct seq_file *f, struct mlx5hws_matcher *ma + matcher->end_ft_id, + matcher->col_matcher ? 
HWS_PTR_TO_ID(matcher->col_matcher) : 0); + +- ste = &matcher->match_ste.ste; + ste_pool = matcher->match_ste.pool; + if (ste_pool) { +- ste_0_id = mlx5hws_pool_chunk_get_base_id(ste_pool, ste); ++ ste_0_id = mlx5hws_pool_get_base_id(ste_pool); + if (tbl_type == MLX5HWS_TABLE_TYPE_FDB) +- ste_1_id = mlx5hws_pool_chunk_get_base_mirror_id(ste_pool, ste); ++ ste_1_id = mlx5hws_pool_get_base_mirror_id(ste_pool); + } + + seq_printf(f, ",%d,%d,%d,%d", +@@ -148,12 +146,11 @@ static int hws_debug_dump_matcher(struct seq_file *f, struct mlx5hws_matcher *ma + matcher->match_ste.rtc_1_id, + (int)ste_1_id); + +- ste = &matcher->action_ste.ste; + ste_pool = matcher->action_ste.pool; + if (ste_pool) { +- ste_0_id = mlx5hws_pool_chunk_get_base_id(ste_pool, ste); ++ ste_0_id = mlx5hws_pool_get_base_id(ste_pool); + if (tbl_type == MLX5HWS_TABLE_TYPE_FDB) +- ste_1_id = mlx5hws_pool_chunk_get_base_mirror_id(ste_pool, ste); ++ ste_1_id = mlx5hws_pool_get_base_mirror_id(ste_pool); + else + ste_1_id = -1; + } else { +@@ -387,14 +384,17 @@ static int hws_debug_dump_context_stc(struct seq_file *f, struct mlx5hws_context + if (!stc_pool) + return 0; + +- if (stc_pool->resource[0]) { +- ret = hws_debug_dump_context_stc_resource(f, ctx, stc_pool->resource[0]); ++ if (stc_pool->resource) { ++ ret = hws_debug_dump_context_stc_resource(f, ctx, ++ stc_pool->resource); + if (ret) + return ret; + } + +- if (stc_pool->mirror_resource[0]) { +- ret = hws_debug_dump_context_stc_resource(f, ctx, stc_pool->mirror_resource[0]); ++ if (stc_pool->mirror_resource) { ++ struct mlx5hws_pool_resource *res = stc_pool->mirror_resource; ++ ++ ret = hws_debug_dump_context_stc_resource(f, ctx, res); + if (ret) + return ret; + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c +index 37a4497048a6..59b14db427b4 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c ++++ 
b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c +@@ -223,7 +223,6 @@ static int hws_matcher_create_rtc(struct mlx5hws_matcher *matcher, + struct mlx5hws_cmd_rtc_create_attr rtc_attr = {0}; + struct mlx5hws_match_template *mt = matcher->mt; + struct mlx5hws_context *ctx = matcher->tbl->ctx; +- struct mlx5hws_action_default_stc *default_stc; + struct mlx5hws_matcher_action_ste *action_ste; + struct mlx5hws_table *tbl = matcher->tbl; + struct mlx5hws_pool *ste_pool, *stc_pool; +@@ -305,7 +304,7 @@ static int hws_matcher_create_rtc(struct mlx5hws_matcher *matcher, + return -EINVAL; + } + +- obj_id = mlx5hws_pool_chunk_get_base_id(ste_pool, ste); ++ obj_id = mlx5hws_pool_get_base_id(ste_pool); + + rtc_attr.pd = ctx->pd_num; + rtc_attr.ste_base = obj_id; +@@ -316,8 +315,7 @@ static int hws_matcher_create_rtc(struct mlx5hws_matcher *matcher, + + /* STC is a single resource (obj_id), use any STC for the ID */ + stc_pool = ctx->stc_pool; +- default_stc = ctx->common_res.default_stc; +- obj_id = mlx5hws_pool_chunk_get_base_id(stc_pool, &default_stc->default_hit); ++ obj_id = mlx5hws_pool_get_base_id(stc_pool); + rtc_attr.stc_base = obj_id; + + ret = mlx5hws_cmd_rtc_create(ctx->mdev, &rtc_attr, rtc_0_id); +@@ -328,11 +326,11 @@ static int hws_matcher_create_rtc(struct mlx5hws_matcher *matcher, + } + + if (tbl->type == MLX5HWS_TABLE_TYPE_FDB) { +- obj_id = mlx5hws_pool_chunk_get_base_mirror_id(ste_pool, ste); ++ obj_id = mlx5hws_pool_get_base_mirror_id(ste_pool); + rtc_attr.ste_base = obj_id; + rtc_attr.table_type = mlx5hws_table_get_res_fw_ft_type(tbl->type, true); + +- obj_id = mlx5hws_pool_chunk_get_base_mirror_id(stc_pool, &default_stc->default_hit); ++ obj_id = mlx5hws_pool_get_base_mirror_id(stc_pool); + rtc_attr.stc_base = obj_id; + hws_matcher_set_rtc_attr_sz(matcher, &rtc_attr, rtc_type, true); + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.c +index 
35ed9bee06a6..0de03e17624c 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.c +@@ -20,15 +20,14 @@ static void hws_pool_free_one_resource(struct mlx5hws_pool_resource *resource) + kfree(resource); + } + +-static void hws_pool_resource_free(struct mlx5hws_pool *pool, +- int resource_idx) ++static void hws_pool_resource_free(struct mlx5hws_pool *pool) + { +- hws_pool_free_one_resource(pool->resource[resource_idx]); +- pool->resource[resource_idx] = NULL; ++ hws_pool_free_one_resource(pool->resource); ++ pool->resource = NULL; + + if (pool->tbl_type == MLX5HWS_TABLE_TYPE_FDB) { +- hws_pool_free_one_resource(pool->mirror_resource[resource_idx]); +- pool->mirror_resource[resource_idx] = NULL; ++ hws_pool_free_one_resource(pool->mirror_resource); ++ pool->mirror_resource = NULL; + } + } + +@@ -78,7 +77,7 @@ hws_pool_create_one_resource(struct mlx5hws_pool *pool, u32 log_range, + } + + static int +-hws_pool_resource_alloc(struct mlx5hws_pool *pool, u32 log_range, int idx) ++hws_pool_resource_alloc(struct mlx5hws_pool *pool, u32 log_range) + { + struct mlx5hws_pool_resource *resource; + u32 fw_ft_type, opt_log_range; +@@ -91,7 +90,7 @@ hws_pool_resource_alloc(struct mlx5hws_pool *pool, u32 log_range, int idx) + return -EINVAL; + } + +- pool->resource[idx] = resource; ++ pool->resource = resource; + + if (pool->tbl_type == MLX5HWS_TABLE_TYPE_FDB) { + struct mlx5hws_pool_resource *mirror_resource; +@@ -102,10 +101,10 @@ hws_pool_resource_alloc(struct mlx5hws_pool *pool, u32 log_range, int idx) + if (!mirror_resource) { + mlx5hws_err(pool->ctx, "Failed allocating mirrored resource\n"); + hws_pool_free_one_resource(resource); +- pool->resource[idx] = NULL; ++ pool->resource = NULL; + return -EINVAL; + } +- pool->mirror_resource[idx] = mirror_resource; ++ pool->mirror_resource = mirror_resource; + } + + return 0; +@@ -129,9 +128,9 @@ static void hws_pool_buddy_db_put_chunk(struct 
mlx5hws_pool *pool, + { + struct mlx5hws_buddy_mem *buddy; + +- buddy = pool->db.buddy_manager->buddies[chunk->resource_idx]; ++ buddy = pool->db.buddy; + if (!buddy) { +- mlx5hws_err(pool->ctx, "No such buddy (%d)\n", chunk->resource_idx); ++ mlx5hws_err(pool->ctx, "Bad buddy state\n"); + return; + } + +@@ -139,86 +138,50 @@ static void hws_pool_buddy_db_put_chunk(struct mlx5hws_pool *pool, + } + + static struct mlx5hws_buddy_mem * +-hws_pool_buddy_get_next_buddy(struct mlx5hws_pool *pool, int idx, +- u32 order, bool *is_new_buddy) ++hws_pool_buddy_get_buddy(struct mlx5hws_pool *pool, u32 order) + { + static struct mlx5hws_buddy_mem *buddy; + u32 new_buddy_size; + +- buddy = pool->db.buddy_manager->buddies[idx]; ++ buddy = pool->db.buddy; + if (buddy) + return buddy; + + new_buddy_size = max(pool->alloc_log_sz, order); +- *is_new_buddy = true; + buddy = mlx5hws_buddy_create(new_buddy_size); + if (!buddy) { +- mlx5hws_err(pool->ctx, "Failed to create buddy order: %d index: %d\n", +- new_buddy_size, idx); ++ mlx5hws_err(pool->ctx, "Failed to create buddy order: %d\n", ++ new_buddy_size); + return NULL; + } + +- if (hws_pool_resource_alloc(pool, new_buddy_size, idx) != 0) { +- mlx5hws_err(pool->ctx, "Failed to create resource type: %d: size %d index: %d\n", +- pool->type, new_buddy_size, idx); ++ if (hws_pool_resource_alloc(pool, new_buddy_size) != 0) { ++ mlx5hws_err(pool->ctx, "Failed to create resource type: %d: size %d\n", ++ pool->type, new_buddy_size); + mlx5hws_buddy_cleanup(buddy); + return NULL; + } + +- pool->db.buddy_manager->buddies[idx] = buddy; ++ pool->db.buddy = buddy; + + return buddy; + } + + static int hws_pool_buddy_get_mem_chunk(struct mlx5hws_pool *pool, + int order, +- u32 *buddy_idx, + int *seg) + { + struct mlx5hws_buddy_mem *buddy; +- bool new_mem = false; +- int ret = 0; +- int i; +- +- *seg = -1; +- +- /* Find the next free place from the buddy array */ +- while (*seg < 0) { +- for (i = 0; i < MLX5HWS_POOL_RESOURCE_ARR_SZ; i++) { +- buddy 
= hws_pool_buddy_get_next_buddy(pool, i, +- order, +- &new_mem); +- if (!buddy) { +- ret = -ENOMEM; +- goto out; +- } +- +- *seg = mlx5hws_buddy_alloc_mem(buddy, order); +- if (*seg >= 0) +- goto found; +- +- if (pool->flags & MLX5HWS_POOL_FLAGS_ONE_RESOURCE) { +- mlx5hws_err(pool->ctx, +- "Fail to allocate seg for one resource pool\n"); +- ret = -ENOMEM; +- goto out; +- } +- +- if (new_mem) { +- /* We have new memory pool, should be place for us */ +- mlx5hws_err(pool->ctx, +- "No memory for order: %d with buddy no: %d\n", +- order, i); +- ret = -ENOMEM; +- goto out; +- } +- } +- } + +-found: +- *buddy_idx = i; +-out: +- return ret; ++ buddy = hws_pool_buddy_get_buddy(pool, order); ++ if (!buddy) ++ return -ENOMEM; ++ ++ *seg = mlx5hws_buddy_alloc_mem(buddy, order); ++ if (*seg >= 0) ++ return 0; ++ ++ return -ENOMEM; + } + + static int hws_pool_buddy_db_get_chunk(struct mlx5hws_pool *pool, +@@ -226,9 +189,7 @@ static int hws_pool_buddy_db_get_chunk(struct mlx5hws_pool *pool, + { + int ret = 0; + +- /* Go over the buddies and find next free slot */ + ret = hws_pool_buddy_get_mem_chunk(pool, chunk->order, +- &chunk->resource_idx, + &chunk->offset); + if (ret) + mlx5hws_err(pool->ctx, "Failed to get free slot for chunk with order: %d\n", +@@ -240,33 +201,21 @@ static int hws_pool_buddy_db_get_chunk(struct mlx5hws_pool *pool, + static void hws_pool_buddy_db_uninit(struct mlx5hws_pool *pool) + { + struct mlx5hws_buddy_mem *buddy; +- int i; +- +- for (i = 0; i < MLX5HWS_POOL_RESOURCE_ARR_SZ; i++) { +- buddy = pool->db.buddy_manager->buddies[i]; +- if (buddy) { +- mlx5hws_buddy_cleanup(buddy); +- kfree(buddy); +- pool->db.buddy_manager->buddies[i] = NULL; +- } +- } + +- kfree(pool->db.buddy_manager); ++ buddy = pool->db.buddy; ++ if (buddy) { ++ mlx5hws_buddy_cleanup(buddy); ++ kfree(buddy); ++ pool->db.buddy = NULL; ++ } + } + + static int hws_pool_buddy_db_init(struct mlx5hws_pool *pool, u32 log_range) + { +- pool->db.buddy_manager = 
kzalloc(sizeof(*pool->db.buddy_manager), GFP_KERNEL); +- if (!pool->db.buddy_manager) +- return -ENOMEM; +- + if (pool->flags & MLX5HWS_POOL_FLAGS_ALLOC_MEM_ON_CREATE) { +- bool new_buddy; +- +- if (!hws_pool_buddy_get_next_buddy(pool, 0, log_range, &new_buddy)) { ++ if (!hws_pool_buddy_get_buddy(pool, log_range)) { + mlx5hws_err(pool->ctx, + "Failed allocating memory on create log_sz: %d\n", log_range); +- kfree(pool->db.buddy_manager); + return -ENOMEM; + } + } +@@ -278,14 +227,13 @@ static int hws_pool_buddy_db_init(struct mlx5hws_pool *pool, u32 log_range) + return 0; + } + +-static int hws_pool_create_resource_on_index(struct mlx5hws_pool *pool, +- u32 alloc_size, int idx) ++static int hws_pool_create_resource(struct mlx5hws_pool *pool, u32 alloc_size) + { +- int ret = hws_pool_resource_alloc(pool, alloc_size, idx); ++ int ret = hws_pool_resource_alloc(pool, alloc_size); + + if (ret) { +- mlx5hws_err(pool->ctx, "Failed to create resource type: %d: size %d index: %d\n", +- pool->type, alloc_size, idx); ++ mlx5hws_err(pool->ctx, "Failed to create resource type: %d: size %d\n", ++ pool->type, alloc_size); + return ret; + } + +@@ -319,7 +267,7 @@ hws_pool_element_create_new_elem(struct mlx5hws_pool *pool, u32 order) + elem->log_size = alloc_size - order; + } + +- if (hws_pool_create_resource_on_index(pool, alloc_size, 0)) { ++ if (hws_pool_create_resource(pool, alloc_size)) { + mlx5hws_err(pool->ctx, "Failed to create resource type: %d: size %d\n", + pool->type, alloc_size); + goto free_db; +@@ -355,7 +303,7 @@ static int hws_pool_element_find_seg(struct mlx5hws_pool_elements *elem, int *se + + static int + hws_pool_onesize_element_get_mem_chunk(struct mlx5hws_pool *pool, u32 order, +- u32 *idx, int *seg) ++ int *seg) + { + struct mlx5hws_pool_elements *elem; + +@@ -370,7 +318,6 @@ hws_pool_onesize_element_get_mem_chunk(struct mlx5hws_pool *pool, u32 order, + return -ENOMEM; + } + +- *idx = 0; + elem->num_of_elements++; + return 0; + +@@ -379,21 +326,17 @@ 
hws_pool_onesize_element_get_mem_chunk(struct mlx5hws_pool *pool, u32 order, + return -ENOMEM; + } + +-static int +-hws_pool_general_element_get_mem_chunk(struct mlx5hws_pool *pool, u32 order, +- u32 *idx, int *seg) ++static int hws_pool_general_element_get_mem_chunk(struct mlx5hws_pool *pool, ++ u32 order, int *seg) + { +- int ret, i; +- +- for (i = 0; i < MLX5HWS_POOL_RESOURCE_ARR_SZ; i++) { +- if (!pool->resource[i]) { +- ret = hws_pool_create_resource_on_index(pool, order, i); +- if (ret) +- goto err_no_res; +- *idx = i; +- *seg = 0; /* One memory slot in that element */ +- return 0; +- } ++ int ret; ++ ++ if (!pool->resource) { ++ ret = hws_pool_create_resource(pool, order); ++ if (ret) ++ goto err_no_res; ++ *seg = 0; /* One memory slot in that element */ ++ return 0; + } + + mlx5hws_err(pool->ctx, "No more resources (last request order: %d)\n", order); +@@ -409,9 +352,7 @@ static int hws_pool_general_element_db_get_chunk(struct mlx5hws_pool *pool, + { + int ret; + +- /* Go over all memory elements and find/allocate free slot */ + ret = hws_pool_general_element_get_mem_chunk(pool, chunk->order, +- &chunk->resource_idx, + &chunk->offset); + if (ret) + mlx5hws_err(pool->ctx, "Failed to get free slot for chunk with order: %d\n", +@@ -423,11 +364,8 @@ static int hws_pool_general_element_db_get_chunk(struct mlx5hws_pool *pool, + static void hws_pool_general_element_db_put_chunk(struct mlx5hws_pool *pool, + struct mlx5hws_pool_chunk *chunk) + { +- if (unlikely(!pool->resource[chunk->resource_idx])) +- pr_warn("HWS: invalid resource with index %d\n", chunk->resource_idx); +- + if (pool->flags & MLX5HWS_POOL_FLAGS_RELEASE_FREE_RESOURCE) +- hws_pool_resource_free(pool, chunk->resource_idx); ++ hws_pool_resource_free(pool); + } + + static void hws_pool_general_element_db_uninit(struct mlx5hws_pool *pool) +@@ -455,7 +393,7 @@ static void + hws_onesize_element_db_destroy_element(struct mlx5hws_pool *pool, + struct mlx5hws_pool_elements *elem) + { +- 
hws_pool_resource_free(pool, 0); ++ hws_pool_resource_free(pool); + bitmap_free(elem->bitmap); + kfree(elem); + pool->db.element = NULL; +@@ -466,12 +404,9 @@ static void hws_onesize_element_db_put_chunk(struct mlx5hws_pool *pool, + { + struct mlx5hws_pool_elements *elem; + +- if (unlikely(chunk->resource_idx)) +- pr_warn("HWS: invalid resource with index %d\n", chunk->resource_idx); +- + elem = pool->db.element; + if (!elem) { +- mlx5hws_err(pool->ctx, "No such element (%d)\n", chunk->resource_idx); ++ mlx5hws_err(pool->ctx, "Pool element was not allocated\n"); + return; + } + +@@ -489,9 +424,7 @@ static int hws_onesize_element_db_get_chunk(struct mlx5hws_pool *pool, + { + int ret = 0; + +- /* Go over all memory elements and find/allocate free slot */ + ret = hws_pool_onesize_element_get_mem_chunk(pool, chunk->order, +- &chunk->resource_idx, + &chunk->offset); + if (ret) + mlx5hws_err(pool->ctx, "Failed to get free slot for chunk with order: %d\n", +@@ -614,13 +547,10 @@ mlx5hws_pool_create(struct mlx5hws_context *ctx, struct mlx5hws_pool_attr *pool_ + + int mlx5hws_pool_destroy(struct mlx5hws_pool *pool) + { +- int i; +- + mutex_destroy(&pool->lock); + +- for (i = 0; i < MLX5HWS_POOL_RESOURCE_ARR_SZ; i++) +- if (pool->resource[i]) +- hws_pool_resource_free(pool, i); ++ if (pool->resource) ++ hws_pool_resource_free(pool); + + hws_pool_db_unint(pool); + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.h +index f4258f83fdbf..112a61cd2997 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.h +@@ -6,16 +6,12 @@ + + #define MLX5HWS_POOL_STC_LOG_SZ 15 + +-#define MLX5HWS_POOL_RESOURCE_ARR_SZ 100 +- + enum mlx5hws_pool_type { + MLX5HWS_POOL_TYPE_STE, + MLX5HWS_POOL_TYPE_STC, + }; + + struct mlx5hws_pool_chunk { +- u32 resource_idx; +- /* Internal offset, relative to base index */ + int offset; + int 
order; + }; +@@ -72,14 +68,10 @@ enum mlx5hws_db_type { + MLX5HWS_POOL_DB_TYPE_GENERAL_SIZE, + /* One resource only, all the elements are with same one size */ + MLX5HWS_POOL_DB_TYPE_ONE_SIZE_RESOURCE, +- /* Many resources, the memory allocated with buddy mechanism */ ++ /* Entries are managed using a buddy mechanism. */ + MLX5HWS_POOL_DB_TYPE_BUDDY, + }; + +-struct mlx5hws_buddy_manager { +- struct mlx5hws_buddy_mem *buddies[MLX5HWS_POOL_RESOURCE_ARR_SZ]; +-}; +- + struct mlx5hws_pool_elements { + u32 num_of_elements; + unsigned long *bitmap; +@@ -91,7 +83,7 @@ struct mlx5hws_pool_db { + enum mlx5hws_db_type type; + union { + struct mlx5hws_pool_elements *element; +- struct mlx5hws_buddy_manager *buddy_manager; ++ struct mlx5hws_buddy_mem *buddy; + }; + }; + +@@ -109,8 +101,8 @@ struct mlx5hws_pool { + size_t alloc_log_sz; + enum mlx5hws_table_type tbl_type; + enum mlx5hws_pool_optimize opt_type; +- struct mlx5hws_pool_resource *resource[MLX5HWS_POOL_RESOURCE_ARR_SZ]; +- struct mlx5hws_pool_resource *mirror_resource[MLX5HWS_POOL_RESOURCE_ARR_SZ]; ++ struct mlx5hws_pool_resource *resource; ++ struct mlx5hws_pool_resource *mirror_resource; + /* DB */ + struct mlx5hws_pool_db db; + /* Functions */ +@@ -131,17 +123,13 @@ int mlx5hws_pool_chunk_alloc(struct mlx5hws_pool *pool, + void mlx5hws_pool_chunk_free(struct mlx5hws_pool *pool, + struct mlx5hws_pool_chunk *chunk); + +-static inline u32 +-mlx5hws_pool_chunk_get_base_id(struct mlx5hws_pool *pool, +- struct mlx5hws_pool_chunk *chunk) ++static inline u32 mlx5hws_pool_get_base_id(struct mlx5hws_pool *pool) + { +- return pool->resource[chunk->resource_idx]->base_id; ++ return pool->resource->base_id; + } + +-static inline u32 +-mlx5hws_pool_chunk_get_base_mirror_id(struct mlx5hws_pool *pool, +- struct mlx5hws_pool_chunk *chunk) ++static inline u32 mlx5hws_pool_get_base_mirror_id(struct mlx5hws_pool *pool) + { +- return pool->mirror_resource[chunk->resource_idx]->base_id; ++ return pool->mirror_resource->base_id; + } + 
#endif /* MLX5HWS_POOL_H_ */ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1349-net-mlx5-hws-refactor-pool-implementation.patch b/SOURCES/1349-net-mlx5-hws-refactor-pool-implementation.patch new file mode 100644 index 000000000..853be9ec6 --- /dev/null +++ b/SOURCES/1349-net-mlx5-hws-refactor-pool-implementation.patch @@ -0,0 +1,760 @@ +From ac067697d1ab53a4dceeec03736f9b8bf2363665 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:41:59 -0400 +Subject: [PATCH] net/mlx5: HWS, Refactor pool implementation + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit d171ce3d988868bed9dc3c9eeb8428f87dd9ac85 +Author: Vlad Dogaru +Date: Thu Apr 10 22:17:34 2025 +0300 + + net/mlx5: HWS, Refactor pool implementation + + Refactor the pool implementation to remove unused flags and clarify its + usage. A pool represents a single range of STEs or STCs which are + allocated at pool creation time. + + Pools are used under three patterns: + + 1. STCs are allocated one at a time from a global pool using a bitmap + based implementation. + + 2. Action STEs are allocated in power-of-two blocks using a buddy + algorithm. + + 3. Match STEs do not use allocation, since insertion into these tables + is based on hashes or direct addressing. In such cases we use a pool + only to create the STE range. 
+ + Signed-off-by: Vlad Dogaru + Reviewed-by: Yevgeny Kliteynik + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Reviewed-by: Michal Kubiak + Link: https://patch.msgid.link/1744312662-356571-5-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c +index 781ba8c4f733..39904b337b81 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c +@@ -1602,7 +1602,6 @@ hws_action_create_dest_match_range_table(struct mlx5hws_context *ctx, + + pool_attr.table_type = MLX5HWS_TABLE_TYPE_FDB; + pool_attr.pool_type = MLX5HWS_POOL_TYPE_STE; +- pool_attr.flags = MLX5HWS_POOL_FLAGS_FOR_STE_ACTION_POOL; + pool_attr.alloc_log_sz = 1; + table_ste->pool = mlx5hws_pool_create(ctx, &pool_attr); + if (!table_ste->pool) { +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/context.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/context.c +index 9cda2774fd64..b7cb736b74d7 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/context.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/context.c +@@ -34,7 +34,6 @@ static int hws_context_pools_init(struct mlx5hws_context *ctx) + + /* Create an STC pool per FT type */ + pool_attr.pool_type = MLX5HWS_POOL_TYPE_STC; +- pool_attr.flags = MLX5HWS_POOL_FLAGS_FOR_STC_POOL; + max_log_sz = min(MLX5HWS_POOL_STC_LOG_SZ, ctx->caps->stc_alloc_log_max); + pool_attr.alloc_log_sz = max(max_log_sz, ctx->caps->stc_alloc_log_gran); + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c +index 59b14db427b4..95d31fd6c976 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c ++++ 
b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c +@@ -265,14 +265,6 @@ static int hws_matcher_create_rtc(struct mlx5hws_matcher *matcher, + rtc_attr.match_definer_0 = ctx->caps->linear_match_definer; + } + } +- +- /* Match pool requires implicit allocation */ +- ret = mlx5hws_pool_chunk_alloc(ste_pool, ste); +- if (ret) { +- mlx5hws_err(ctx, "Failed to allocate STE for %s RTC", +- hws_matcher_rtc_type_to_str(rtc_type)); +- return ret; +- } + break; + + case HWS_MATCHER_RTC_TYPE_STE_ARRAY: +@@ -357,23 +349,17 @@ static void hws_matcher_destroy_rtc(struct mlx5hws_matcher *matcher, + { + struct mlx5hws_matcher_action_ste *action_ste; + struct mlx5hws_table *tbl = matcher->tbl; +- struct mlx5hws_pool_chunk *ste; +- struct mlx5hws_pool *ste_pool; + u32 rtc_0_id, rtc_1_id; + + switch (rtc_type) { + case HWS_MATCHER_RTC_TYPE_MATCH: + rtc_0_id = matcher->match_ste.rtc_0_id; + rtc_1_id = matcher->match_ste.rtc_1_id; +- ste_pool = matcher->match_ste.pool; +- ste = &matcher->match_ste.ste; + break; + case HWS_MATCHER_RTC_TYPE_STE_ARRAY: + action_ste = &matcher->action_ste; + rtc_0_id = action_ste->rtc_0_id; + rtc_1_id = action_ste->rtc_1_id; +- ste_pool = action_ste->pool; +- ste = &action_ste->ste; + break; + default: + return; +@@ -383,8 +369,6 @@ static void hws_matcher_destroy_rtc(struct mlx5hws_matcher *matcher, + mlx5hws_cmd_rtc_destroy(matcher->tbl->ctx->mdev, rtc_1_id); + + mlx5hws_cmd_rtc_destroy(matcher->tbl->ctx->mdev, rtc_0_id); +- if (rtc_type == HWS_MATCHER_RTC_TYPE_MATCH) +- mlx5hws_pool_chunk_free(ste_pool, ste); + } + + static int +@@ -557,7 +541,7 @@ static int hws_matcher_bind_at(struct mlx5hws_matcher *matcher) + /* Allocate action STE mempool */ + pool_attr.table_type = tbl->type; + pool_attr.pool_type = MLX5HWS_POOL_TYPE_STE; +- pool_attr.flags = MLX5HWS_POOL_FLAGS_FOR_STE_ACTION_POOL; ++ pool_attr.flags = MLX5HWS_POOL_FLAG_BUDDY; + /* Pool size is similar to action RTC size */ + pool_attr.alloc_log_sz = 
ilog2(roundup_pow_of_two(action_ste->max_stes)) + + matcher->attr.table.sz_row_log + +@@ -636,7 +620,6 @@ static int hws_matcher_bind_mt(struct mlx5hws_matcher *matcher) + /* Create an STE pool per matcher*/ + pool_attr.table_type = matcher->tbl->type; + pool_attr.pool_type = MLX5HWS_POOL_TYPE_STE; +- pool_attr.flags = MLX5HWS_POOL_FLAGS_FOR_MATCHER_STE_POOL; + pool_attr.alloc_log_sz = matcher->attr.table.sz_col_log + + matcher->attr.table.sz_row_log; + hws_matcher_set_pool_attr(&pool_attr, matcher); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.c +index 0de03e17624c..270b333faab3 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.c +@@ -60,10 +60,8 @@ hws_pool_create_one_resource(struct mlx5hws_pool *pool, u32 log_range, + ret = -EINVAL; + } + +- if (ret) { +- mlx5hws_err(pool->ctx, "Failed to allocate resource objects\n"); ++ if (ret) + goto free_resource; +- } + + resource->pool = pool; + resource->range = 1 << log_range; +@@ -76,17 +74,17 @@ hws_pool_create_one_resource(struct mlx5hws_pool *pool, u32 log_range, + return NULL; + } + +-static int +-hws_pool_resource_alloc(struct mlx5hws_pool *pool, u32 log_range) ++static int hws_pool_resource_alloc(struct mlx5hws_pool *pool) + { + struct mlx5hws_pool_resource *resource; + u32 fw_ft_type, opt_log_range; + + fw_ft_type = mlx5hws_table_get_res_fw_ft_type(pool->tbl_type, false); +- opt_log_range = pool->opt_type == MLX5HWS_POOL_OPTIMIZE_ORIG ? 0 : log_range; ++ opt_log_range = pool->opt_type == MLX5HWS_POOL_OPTIMIZE_ORIG ? 
++ 0 : pool->alloc_log_sz; + resource = hws_pool_create_one_resource(pool, opt_log_range, fw_ft_type); + if (!resource) { +- mlx5hws_err(pool->ctx, "Failed allocating resource\n"); ++ mlx5hws_err(pool->ctx, "Failed to allocate resource\n"); + return -EINVAL; + } + +@@ -96,10 +94,11 @@ hws_pool_resource_alloc(struct mlx5hws_pool *pool, u32 log_range) + struct mlx5hws_pool_resource *mirror_resource; + + fw_ft_type = mlx5hws_table_get_res_fw_ft_type(pool->tbl_type, true); +- opt_log_range = pool->opt_type == MLX5HWS_POOL_OPTIMIZE_MIRROR ? 0 : log_range; ++ opt_log_range = pool->opt_type == MLX5HWS_POOL_OPTIMIZE_MIRROR ? ++ 0 : pool->alloc_log_sz; + mirror_resource = hws_pool_create_one_resource(pool, opt_log_range, fw_ft_type); + if (!mirror_resource) { +- mlx5hws_err(pool->ctx, "Failed allocating mirrored resource\n"); ++ mlx5hws_err(pool->ctx, "Failed to allocate mirrored resource\n"); + hws_pool_free_one_resource(resource); + pool->resource = NULL; + return -EINVAL; +@@ -110,92 +109,58 @@ hws_pool_resource_alloc(struct mlx5hws_pool *pool, u32 log_range) + return 0; + } + +-static unsigned long *hws_pool_create_and_init_bitmap(u32 log_range) +-{ +- unsigned long *cur_bmp; +- +- cur_bmp = bitmap_zalloc(1 << log_range, GFP_KERNEL); +- if (!cur_bmp) +- return NULL; +- +- bitmap_fill(cur_bmp, 1 << log_range); +- +- return cur_bmp; +-} +- +-static void hws_pool_buddy_db_put_chunk(struct mlx5hws_pool *pool, +- struct mlx5hws_pool_chunk *chunk) ++static int hws_pool_buddy_init(struct mlx5hws_pool *pool) + { + struct mlx5hws_buddy_mem *buddy; + +- buddy = pool->db.buddy; ++ buddy = mlx5hws_buddy_create(pool->alloc_log_sz); + if (!buddy) { +- mlx5hws_err(pool->ctx, "Bad buddy state\n"); +- return; +- } +- +- mlx5hws_buddy_free_mem(buddy, chunk->offset, chunk->order); +-} +- +-static struct mlx5hws_buddy_mem * +-hws_pool_buddy_get_buddy(struct mlx5hws_pool *pool, u32 order) +-{ +- static struct mlx5hws_buddy_mem *buddy; +- u32 new_buddy_size; +- +- buddy = pool->db.buddy; +- 
if (buddy) +- return buddy; +- +- new_buddy_size = max(pool->alloc_log_sz, order); +- buddy = mlx5hws_buddy_create(new_buddy_size); +- if (!buddy) { +- mlx5hws_err(pool->ctx, "Failed to create buddy order: %d\n", +- new_buddy_size); +- return NULL; ++ mlx5hws_err(pool->ctx, "Failed to create buddy order: %zu\n", ++ pool->alloc_log_sz); ++ return -ENOMEM; + } + +- if (hws_pool_resource_alloc(pool, new_buddy_size) != 0) { +- mlx5hws_err(pool->ctx, "Failed to create resource type: %d: size %d\n", +- pool->type, new_buddy_size); ++ if (hws_pool_resource_alloc(pool) != 0) { ++ mlx5hws_err(pool->ctx, "Failed to create resource type: %d size %zu\n", ++ pool->type, pool->alloc_log_sz); + mlx5hws_buddy_cleanup(buddy); +- return NULL; ++ return -ENOMEM; + } + + pool->db.buddy = buddy; + +- return buddy; ++ return 0; + } + +-static int hws_pool_buddy_get_mem_chunk(struct mlx5hws_pool *pool, +- int order, +- int *seg) ++static int hws_pool_buddy_db_get_chunk(struct mlx5hws_pool *pool, ++ struct mlx5hws_pool_chunk *chunk) + { +- struct mlx5hws_buddy_mem *buddy; ++ struct mlx5hws_buddy_mem *buddy = pool->db.buddy; + +- buddy = hws_pool_buddy_get_buddy(pool, order); +- if (!buddy) +- return -ENOMEM; ++ if (!buddy) { ++ mlx5hws_err(pool->ctx, "Bad buddy state\n"); ++ return -EINVAL; ++ } + +- *seg = mlx5hws_buddy_alloc_mem(buddy, order); +- if (*seg >= 0) ++ chunk->offset = mlx5hws_buddy_alloc_mem(buddy, chunk->order); ++ if (chunk->offset >= 0) + return 0; + + return -ENOMEM; + } + +-static int hws_pool_buddy_db_get_chunk(struct mlx5hws_pool *pool, +- struct mlx5hws_pool_chunk *chunk) ++static void hws_pool_buddy_db_put_chunk(struct mlx5hws_pool *pool, ++ struct mlx5hws_pool_chunk *chunk) + { +- int ret = 0; ++ struct mlx5hws_buddy_mem *buddy; + +- ret = hws_pool_buddy_get_mem_chunk(pool, chunk->order, +- &chunk->offset); +- if (ret) +- mlx5hws_err(pool->ctx, "Failed to get free slot for chunk with order: %d\n", +- chunk->order); ++ buddy = pool->db.buddy; ++ if (!buddy) { ++ 
mlx5hws_err(pool->ctx, "Bad buddy state\n"); ++ return; ++ } + +- return ret; ++ mlx5hws_buddy_free_mem(buddy, chunk->offset, chunk->order); + } + + static void hws_pool_buddy_db_uninit(struct mlx5hws_pool *pool) +@@ -210,15 +175,13 @@ static void hws_pool_buddy_db_uninit(struct mlx5hws_pool *pool) + } + } + +-static int hws_pool_buddy_db_init(struct mlx5hws_pool *pool, u32 log_range) ++static int hws_pool_buddy_db_init(struct mlx5hws_pool *pool) + { +- if (pool->flags & MLX5HWS_POOL_FLAGS_ALLOC_MEM_ON_CREATE) { +- if (!hws_pool_buddy_get_buddy(pool, log_range)) { +- mlx5hws_err(pool->ctx, +- "Failed allocating memory on create log_sz: %d\n", log_range); +- return -ENOMEM; +- } +- } ++ int ret; ++ ++ ret = hws_pool_buddy_init(pool); ++ if (ret) ++ return ret; + + pool->p_db_uninit = &hws_pool_buddy_db_uninit; + pool->p_get_chunk = &hws_pool_buddy_db_get_chunk; +@@ -227,234 +190,105 @@ static int hws_pool_buddy_db_init(struct mlx5hws_pool *pool, u32 log_range) + return 0; + } + +-static int hws_pool_create_resource(struct mlx5hws_pool *pool, u32 alloc_size) +-{ +- int ret = hws_pool_resource_alloc(pool, alloc_size); +- +- if (ret) { +- mlx5hws_err(pool->ctx, "Failed to create resource type: %d: size %d\n", +- pool->type, alloc_size); +- return ret; +- } +- +- return 0; +-} +- +-static struct mlx5hws_pool_elements * +-hws_pool_element_create_new_elem(struct mlx5hws_pool *pool, u32 order) ++static unsigned long *hws_pool_create_and_init_bitmap(u32 log_range) + { +- struct mlx5hws_pool_elements *elem; +- u32 alloc_size; +- +- alloc_size = pool->alloc_log_sz; ++ unsigned long *bitmap; + +- elem = kzalloc(sizeof(*elem), GFP_KERNEL); +- if (!elem) ++ bitmap = bitmap_zalloc(1 << log_range, GFP_KERNEL); ++ if (!bitmap) + return NULL; + +- /* Sharing the same resource, also means that all the elements are with size 1 */ +- if ((pool->flags & MLX5HWS_POOL_FLAGS_FIXED_SIZE_OBJECTS) && +- !(pool->flags & MLX5HWS_POOL_FLAGS_RESOURCE_PER_CHUNK)) { +- /* Currently all chunks in 
size 1 */ +- elem->bitmap = hws_pool_create_and_init_bitmap(alloc_size - order); +- if (!elem->bitmap) { +- mlx5hws_err(pool->ctx, +- "Failed to create bitmap type: %d: size %d\n", +- pool->type, alloc_size); +- goto free_elem; +- } +- +- elem->log_size = alloc_size - order; +- } +- +- if (hws_pool_create_resource(pool, alloc_size)) { +- mlx5hws_err(pool->ctx, "Failed to create resource type: %d: size %d\n", +- pool->type, alloc_size); +- goto free_db; +- } +- +- pool->db.element = elem; ++ bitmap_fill(bitmap, 1 << log_range); + +- return elem; +- +-free_db: +- bitmap_free(elem->bitmap); +-free_elem: +- kfree(elem); +- return NULL; ++ return bitmap; + } + +-static int hws_pool_element_find_seg(struct mlx5hws_pool_elements *elem, int *seg) ++static int hws_pool_bitmap_init(struct mlx5hws_pool *pool) + { +- unsigned int segment, size; ++ unsigned long *bitmap; + +- size = 1 << elem->log_size; +- +- segment = find_first_bit(elem->bitmap, size); +- if (segment >= size) { +- elem->is_full = true; ++ bitmap = hws_pool_create_and_init_bitmap(pool->alloc_log_sz); ++ if (!bitmap) { ++ mlx5hws_err(pool->ctx, "Failed to create bitmap order: %zu\n", ++ pool->alloc_log_sz); + return -ENOMEM; + } + +- bitmap_clear(elem->bitmap, segment, 1); +- *seg = segment; +- return 0; +-} +- +-static int +-hws_pool_onesize_element_get_mem_chunk(struct mlx5hws_pool *pool, u32 order, +- int *seg) +-{ +- struct mlx5hws_pool_elements *elem; +- +- elem = pool->db.element; +- if (!elem) +- elem = hws_pool_element_create_new_elem(pool, order); +- if (!elem) +- goto err_no_elem; +- +- if (hws_pool_element_find_seg(elem, seg) != 0) { +- mlx5hws_err(pool->ctx, "No more resources (last request order: %d)\n", order); ++ if (hws_pool_resource_alloc(pool) != 0) { ++ mlx5hws_err(pool->ctx, "Failed to create resource type: %d: size %zu\n", ++ pool->type, pool->alloc_log_sz); ++ bitmap_free(bitmap); + return -ENOMEM; + } + +- elem->num_of_elements++; +- return 0; ++ pool->db.bitmap = bitmap; + +-err_no_elem: 
+- mlx5hws_err(pool->ctx, "Failed to allocate element for order: %d\n", order); +- return -ENOMEM; ++ return 0; + } + +-static int hws_pool_general_element_get_mem_chunk(struct mlx5hws_pool *pool, +- u32 order, int *seg) ++static int hws_pool_bitmap_db_get_chunk(struct mlx5hws_pool *pool, ++ struct mlx5hws_pool_chunk *chunk) + { +- int ret; ++ unsigned long *bitmap, size; + +- if (!pool->resource) { +- ret = hws_pool_create_resource(pool, order); +- if (ret) +- goto err_no_res; +- *seg = 0; /* One memory slot in that element */ +- return 0; ++ if (chunk->order != 0) { ++ mlx5hws_err(pool->ctx, "Pool only supports order 0 allocs\n"); ++ return -EINVAL; + } + +- mlx5hws_err(pool->ctx, "No more resources (last request order: %d)\n", order); +- return -ENOMEM; +- +-err_no_res: +- mlx5hws_err(pool->ctx, "Failed to allocate element for order: %d\n", order); +- return -ENOMEM; +-} +- +-static int hws_pool_general_element_db_get_chunk(struct mlx5hws_pool *pool, +- struct mlx5hws_pool_chunk *chunk) +-{ +- int ret; +- +- ret = hws_pool_general_element_get_mem_chunk(pool, chunk->order, +- &chunk->offset); +- if (ret) +- mlx5hws_err(pool->ctx, "Failed to get free slot for chunk with order: %d\n", +- chunk->order); +- +- return ret; +-} ++ bitmap = pool->db.bitmap; ++ if (!bitmap) { ++ mlx5hws_err(pool->ctx, "Bad bitmap state\n"); ++ return -EINVAL; ++ } + +-static void hws_pool_general_element_db_put_chunk(struct mlx5hws_pool *pool, +- struct mlx5hws_pool_chunk *chunk) +-{ +- if (pool->flags & MLX5HWS_POOL_FLAGS_RELEASE_FREE_RESOURCE) +- hws_pool_resource_free(pool); +-} ++ size = 1 << pool->alloc_log_sz; + +-static void hws_pool_general_element_db_uninit(struct mlx5hws_pool *pool) +-{ +- (void)pool; +-} ++ chunk->offset = find_first_bit(bitmap, size); ++ if (chunk->offset >= size) ++ return -ENOMEM; + +-/* This memory management works as the following: +- * - At start doesn't allocate no mem at all. 
+- * - When new request for chunk arrived: +- * allocate resource and give it. +- * - When free that chunk: +- * the resource is freed. +- */ +-static int hws_pool_general_element_db_init(struct mlx5hws_pool *pool) +-{ +- pool->p_db_uninit = &hws_pool_general_element_db_uninit; +- pool->p_get_chunk = &hws_pool_general_element_db_get_chunk; +- pool->p_put_chunk = &hws_pool_general_element_db_put_chunk; ++ bitmap_clear(bitmap, chunk->offset, 1); + + return 0; + } + +-static void +-hws_onesize_element_db_destroy_element(struct mlx5hws_pool *pool, +- struct mlx5hws_pool_elements *elem) +-{ +- hws_pool_resource_free(pool); +- bitmap_free(elem->bitmap); +- kfree(elem); +- pool->db.element = NULL; +-} +- +-static void hws_onesize_element_db_put_chunk(struct mlx5hws_pool *pool, +- struct mlx5hws_pool_chunk *chunk) ++static void hws_pool_bitmap_db_put_chunk(struct mlx5hws_pool *pool, ++ struct mlx5hws_pool_chunk *chunk) + { +- struct mlx5hws_pool_elements *elem; ++ unsigned long *bitmap; + +- elem = pool->db.element; +- if (!elem) { +- mlx5hws_err(pool->ctx, "Pool element was not allocated\n"); ++ bitmap = pool->db.bitmap; ++ if (!bitmap) { ++ mlx5hws_err(pool->ctx, "Bad bitmap state\n"); + return; + } + +- bitmap_set(elem->bitmap, chunk->offset, 1); +- elem->is_full = false; +- elem->num_of_elements--; +- +- if (pool->flags & MLX5HWS_POOL_FLAGS_RELEASE_FREE_RESOURCE && +- !elem->num_of_elements) +- hws_onesize_element_db_destroy_element(pool, elem); ++ bitmap_set(bitmap, chunk->offset, 1); + } + +-static int hws_onesize_element_db_get_chunk(struct mlx5hws_pool *pool, +- struct mlx5hws_pool_chunk *chunk) ++static void hws_pool_bitmap_db_uninit(struct mlx5hws_pool *pool) + { +- int ret = 0; +- +- ret = hws_pool_onesize_element_get_mem_chunk(pool, chunk->order, +- &chunk->offset); +- if (ret) +- mlx5hws_err(pool->ctx, "Failed to get free slot for chunk with order: %d\n", +- chunk->order); ++ unsigned long *bitmap; + +- return ret; ++ bitmap = pool->db.bitmap; ++ if (bitmap) { 
++ bitmap_free(bitmap); ++ pool->db.bitmap = NULL; ++ } + } + +-static void hws_onesize_element_db_uninit(struct mlx5hws_pool *pool) ++static int hws_pool_bitmap_db_init(struct mlx5hws_pool *pool) + { +- struct mlx5hws_pool_elements *elem = pool->db.element; ++ int ret; + +- if (elem) { +- bitmap_free(elem->bitmap); +- kfree(elem); +- pool->db.element = NULL; +- } +-} ++ ret = hws_pool_bitmap_init(pool); ++ if (ret) ++ return ret; + +-/* This memory management works as the following: +- * - At start doesn't allocate no mem at all. +- * - When new request for chunk arrived: +- * aloocate the first and only slot of memory/resource +- * when it ended return error. +- */ +-static int hws_pool_onesize_element_db_init(struct mlx5hws_pool *pool) +-{ +- pool->p_db_uninit = &hws_onesize_element_db_uninit; +- pool->p_get_chunk = &hws_onesize_element_db_get_chunk; +- pool->p_put_chunk = &hws_onesize_element_db_put_chunk; ++ pool->p_db_uninit = &hws_pool_bitmap_db_uninit; ++ pool->p_get_chunk = &hws_pool_bitmap_db_get_chunk; ++ pool->p_put_chunk = &hws_pool_bitmap_db_put_chunk; + + return 0; + } +@@ -464,15 +298,14 @@ static int hws_pool_db_init(struct mlx5hws_pool *pool, + { + int ret; + +- if (db_type == MLX5HWS_POOL_DB_TYPE_GENERAL_SIZE) +- ret = hws_pool_general_element_db_init(pool); +- else if (db_type == MLX5HWS_POOL_DB_TYPE_ONE_SIZE_RESOURCE) +- ret = hws_pool_onesize_element_db_init(pool); ++ if (db_type == MLX5HWS_POOL_DB_TYPE_BITMAP) ++ ret = hws_pool_bitmap_db_init(pool); + else +- ret = hws_pool_buddy_db_init(pool, pool->alloc_log_sz); ++ ret = hws_pool_buddy_db_init(pool); + + if (ret) { +- mlx5hws_err(pool->ctx, "Failed to init general db : %d (ret: %d)\n", db_type, ret); ++ mlx5hws_err(pool->ctx, "Failed to init pool type: %d (ret: %d)\n", ++ db_type, ret); + return ret; + } + +@@ -521,15 +354,10 @@ mlx5hws_pool_create(struct mlx5hws_context *ctx, struct mlx5hws_pool_attr *pool_ + pool->tbl_type = pool_attr->table_type; + pool->opt_type = pool_attr->opt_type; + 
+- /* Support general db */ +- if (pool->flags == (MLX5HWS_POOL_FLAGS_RELEASE_FREE_RESOURCE | +- MLX5HWS_POOL_FLAGS_RESOURCE_PER_CHUNK)) +- res_db_type = MLX5HWS_POOL_DB_TYPE_GENERAL_SIZE; +- else if (pool->flags == (MLX5HWS_POOL_FLAGS_ONE_RESOURCE | +- MLX5HWS_POOL_FLAGS_FIXED_SIZE_OBJECTS)) +- res_db_type = MLX5HWS_POOL_DB_TYPE_ONE_SIZE_RESOURCE; +- else ++ if (pool->flags & MLX5HWS_POOL_FLAG_BUDDY) + res_db_type = MLX5HWS_POOL_DB_TYPE_BUDDY; ++ else ++ res_db_type = MLX5HWS_POOL_DB_TYPE_BITMAP; + + pool->alloc_log_sz = pool_attr->alloc_log_sz; + +@@ -545,7 +373,7 @@ mlx5hws_pool_create(struct mlx5hws_context *ctx, struct mlx5hws_pool_attr *pool_ + return NULL; + } + +-int mlx5hws_pool_destroy(struct mlx5hws_pool *pool) ++void mlx5hws_pool_destroy(struct mlx5hws_pool *pool) + { + mutex_destroy(&pool->lock); + +@@ -555,5 +383,4 @@ int mlx5hws_pool_destroy(struct mlx5hws_pool *pool) + hws_pool_db_unint(pool); + + kfree(pool); +- return 0; + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.h +index 112a61cd2997..9a781a87f097 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.h +@@ -23,29 +23,10 @@ struct mlx5hws_pool_resource { + }; + + enum mlx5hws_pool_flags { +- /* Only a one resource in that pool */ +- MLX5HWS_POOL_FLAGS_ONE_RESOURCE = 1 << 0, +- MLX5HWS_POOL_FLAGS_RELEASE_FREE_RESOURCE = 1 << 1, +- /* No sharing resources between chunks */ +- MLX5HWS_POOL_FLAGS_RESOURCE_PER_CHUNK = 1 << 2, +- /* All objects are in the same size */ +- MLX5HWS_POOL_FLAGS_FIXED_SIZE_OBJECTS = 1 << 3, +- /* Managed by buddy allocator */ +- MLX5HWS_POOL_FLAGS_BUDDY_MANAGED = 1 << 4, +- /* Allocate pool_type memory on pool creation */ +- MLX5HWS_POOL_FLAGS_ALLOC_MEM_ON_CREATE = 1 << 5, +- +- /* These values should be used by the caller */ +- MLX5HWS_POOL_FLAGS_FOR_STC_POOL = +- MLX5HWS_POOL_FLAGS_ONE_RESOURCE | 
+- MLX5HWS_POOL_FLAGS_FIXED_SIZE_OBJECTS, +- MLX5HWS_POOL_FLAGS_FOR_MATCHER_STE_POOL = +- MLX5HWS_POOL_FLAGS_RELEASE_FREE_RESOURCE | +- MLX5HWS_POOL_FLAGS_RESOURCE_PER_CHUNK, +- MLX5HWS_POOL_FLAGS_FOR_STE_ACTION_POOL = +- MLX5HWS_POOL_FLAGS_ONE_RESOURCE | +- MLX5HWS_POOL_FLAGS_BUDDY_MANAGED | +- MLX5HWS_POOL_FLAGS_ALLOC_MEM_ON_CREATE, ++ /* Managed by a buddy allocator. If this is not set only allocations of ++ * order 0 are supported. ++ */ ++ MLX5HWS_POOL_FLAG_BUDDY = BIT(0), + }; + + enum mlx5hws_pool_optimize { +@@ -64,25 +45,16 @@ struct mlx5hws_pool_attr { + }; + + enum mlx5hws_db_type { +- /* Uses for allocating chunk of big memory, each element has its own resource in the FW*/ +- MLX5HWS_POOL_DB_TYPE_GENERAL_SIZE, +- /* One resource only, all the elements are with same one size */ +- MLX5HWS_POOL_DB_TYPE_ONE_SIZE_RESOURCE, ++ /* Uses a bitmap, supports only allocations of order 0. */ ++ MLX5HWS_POOL_DB_TYPE_BITMAP, + /* Entries are managed using a buddy mechanism. */ + MLX5HWS_POOL_DB_TYPE_BUDDY, + }; + +-struct mlx5hws_pool_elements { +- u32 num_of_elements; +- unsigned long *bitmap; +- u32 log_size; +- bool is_full; +-}; +- + struct mlx5hws_pool_db { + enum mlx5hws_db_type type; + union { +- struct mlx5hws_pool_elements *element; ++ unsigned long *bitmap; + struct mlx5hws_buddy_mem *buddy; + }; + }; +@@ -103,7 +75,6 @@ struct mlx5hws_pool { + enum mlx5hws_pool_optimize opt_type; + struct mlx5hws_pool_resource *resource; + struct mlx5hws_pool_resource *mirror_resource; +- /* DB */ + struct mlx5hws_pool_db db; + /* Functions */ + mlx5hws_pool_unint_db p_db_uninit; +@@ -115,7 +86,7 @@ struct mlx5hws_pool * + mlx5hws_pool_create(struct mlx5hws_context *ctx, + struct mlx5hws_pool_attr *pool_attr); + +-int mlx5hws_pool_destroy(struct mlx5hws_pool *pool); ++void mlx5hws_pool_destroy(struct mlx5hws_pool *pool); + + int mlx5hws_pool_chunk_alloc(struct mlx5hws_pool *pool, + struct mlx5hws_pool_chunk *chunk); +-- +2.50.1 (Apple Git-155) + diff --git 
a/SOURCES/1350-net-mlx5-hws-cleanup-after-pool-refactoring.patch b/SOURCES/1350-net-mlx5-hws-cleanup-after-pool-refactoring.patch new file mode 100644 index 000000000..acaf23e50 --- /dev/null +++ b/SOURCES/1350-net-mlx5-hws-cleanup-after-pool-refactoring.patch @@ -0,0 +1,265 @@ +From 333144760a660c248b241cf555a88ed2447c29b1 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:41:59 -0400 +Subject: [PATCH] net/mlx5: HWS, Cleanup after pool refactoring + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 43a2038c6d8a810e8e70f0e7fcb965f431c92bfb +Author: Vlad Dogaru +Date: Thu Apr 10 22:17:35 2025 +0300 + + net/mlx5: HWS, Cleanup after pool refactoring + + Remove members which are now no longer used. In fact, many of the + `struct mlx5hws_pool_chunk` were not even written to beyond being + initialized, but they were used in various internals. + + Also cleanup some local variables which made more sense when the API was + thicker. + + Signed-off-by: Vlad Dogaru + Reviewed-by: Yevgeny Kliteynik + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Reviewed-by: Michal Kubiak + Link: https://patch.msgid.link/1744312662-356571-6-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c +index 39904b337b81..161ad720b339 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c +@@ -1583,7 +1583,6 @@ hws_action_create_dest_match_range_table(struct mlx5hws_context *ctx, + struct mlx5hws_matcher_action_ste *table_ste; + struct mlx5hws_pool_attr pool_attr = {0}; + struct mlx5hws_pool *ste_pool, *stc_pool; +- struct mlx5hws_pool_chunk *ste; + u32 *rtc_0_id, *rtc_1_id; + u32 obj_id; + int ret; +@@ -1613,8 +1612,6 @@ hws_action_create_dest_match_range_table(struct mlx5hws_context *ctx, + 
rtc_0_id = &table_ste->rtc_0_id; + rtc_1_id = &table_ste->rtc_1_id; + ste_pool = table_ste->pool; +- ste = &table_ste->ste; +- ste->order = 1; + + rtc_attr.log_size = 0; + rtc_attr.log_depth = 0; +@@ -1630,7 +1627,6 @@ hws_action_create_dest_match_range_table(struct mlx5hws_context *ctx, + + rtc_attr.pd = ctx->pd_num; + rtc_attr.ste_base = obj_id; +- rtc_attr.ste_offset = ste->offset; + rtc_attr.reparse_mode = mlx5hws_context_get_reparse_mode(ctx); + rtc_attr.table_type = mlx5hws_table_get_res_fw_ft_type(MLX5HWS_TABLE_TYPE_FDB, false); + +@@ -1833,7 +1829,6 @@ mlx5hws_action_create_dest_match_range(struct mlx5hws_context *ctx, + stc_attr.action_offset = MLX5HWS_ACTION_OFFSET_HIT; + stc_attr.action_type = MLX5_IFC_STC_ACTION_TYPE_JUMP_TO_STE_TABLE; + stc_attr.reparse_mode = MLX5_IFC_STC_REPARSE_IGNORE; +- stc_attr.ste_table.ste = table_ste->ste; + stc_attr.ste_table.ste_pool = table_ste->pool; + stc_attr.ste_table.match_definer_id = ctx->caps->trivial_match_definer; + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.c +index e8f98c109b99..9c83753e4592 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.c +@@ -406,7 +406,6 @@ int mlx5hws_cmd_rtc_create(struct mlx5_core_dev *mdev, + MLX5_SET(rtc, attr, match_definer_1, rtc_attr->match_definer_1); + MLX5_SET(rtc, attr, stc_id, rtc_attr->stc_base); + MLX5_SET(rtc, attr, ste_table_base_id, rtc_attr->ste_base); +- MLX5_SET(rtc, attr, ste_table_offset, rtc_attr->ste_offset); + MLX5_SET(rtc, attr, miss_flow_table_id, rtc_attr->miss_ft_id); + MLX5_SET(rtc, attr, reparse_mode, rtc_attr->reparse_mode); + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.h +index 51d9e0291ac1..fa6bff210266 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.h ++++ 
b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.h +@@ -70,7 +70,6 @@ struct mlx5hws_cmd_rtc_create_attr { + u32 pd; + u32 stc_base; + u32 ste_base; +- u32 ste_offset; + u32 miss_ft_id; + bool fw_gen_wqe; + u8 update_index_mode; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c +index 95d31fd6c976..3028e0387e3f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c +@@ -197,22 +197,15 @@ static int hws_matcher_disconnect(struct mlx5hws_matcher *matcher) + + static void hws_matcher_set_rtc_attr_sz(struct mlx5hws_matcher *matcher, + struct mlx5hws_cmd_rtc_create_attr *rtc_attr, +- enum mlx5hws_matcher_rtc_type rtc_type, + bool is_mirror) + { +- struct mlx5hws_pool_chunk *ste = &matcher->action_ste.ste; + enum mlx5hws_matcher_flow_src flow_src = matcher->attr.optimize_flow_src; +- bool is_match_rtc = rtc_type == HWS_MATCHER_RTC_TYPE_MATCH; + + if ((flow_src == MLX5HWS_MATCHER_FLOW_SRC_VPORT && !is_mirror) || + (flow_src == MLX5HWS_MATCHER_FLOW_SRC_WIRE && is_mirror)) { + /* Optimize FDB RTC */ + rtc_attr->log_size = 0; + rtc_attr->log_depth = 0; +- } else { +- /* Keep original values */ +- rtc_attr->log_size = is_match_rtc ? matcher->attr.table.sz_row_log : ste->order; +- rtc_attr->log_depth = is_match_rtc ? 
matcher->attr.table.sz_col_log : 0; + } + } + +@@ -225,8 +218,7 @@ static int hws_matcher_create_rtc(struct mlx5hws_matcher *matcher, + struct mlx5hws_context *ctx = matcher->tbl->ctx; + struct mlx5hws_matcher_action_ste *action_ste; + struct mlx5hws_table *tbl = matcher->tbl; +- struct mlx5hws_pool *ste_pool, *stc_pool; +- struct mlx5hws_pool_chunk *ste; ++ struct mlx5hws_pool *ste_pool; + u32 *rtc_0_id, *rtc_1_id; + u32 obj_id; + int ret; +@@ -236,8 +228,6 @@ static int hws_matcher_create_rtc(struct mlx5hws_matcher *matcher, + rtc_0_id = &matcher->match_ste.rtc_0_id; + rtc_1_id = &matcher->match_ste.rtc_1_id; + ste_pool = matcher->match_ste.pool; +- ste = &matcher->match_ste.ste; +- ste->order = attr->table.sz_col_log + attr->table.sz_row_log; + + rtc_attr.log_size = attr->table.sz_row_log; + rtc_attr.log_depth = attr->table.sz_col_log; +@@ -273,16 +263,15 @@ static int hws_matcher_create_rtc(struct mlx5hws_matcher *matcher, + rtc_0_id = &action_ste->rtc_0_id; + rtc_1_id = &action_ste->rtc_1_id; + ste_pool = action_ste->pool; +- ste = &action_ste->ste; + /* Action RTC size calculation: + * log((max number of rules in matcher) * + * (max number of action STEs per rule) * + * (2 to support writing new STEs for update rule)) + */ +- ste->order = ilog2(roundup_pow_of_two(action_ste->max_stes)) + +- attr->table.sz_row_log + +- MLX5HWS_MATCHER_ACTION_RTC_UPDATE_MULT; +- rtc_attr.log_size = ste->order; ++ rtc_attr.log_size = ++ ilog2(roundup_pow_of_two(action_ste->max_stes)) + ++ attr->table.sz_row_log + ++ MLX5HWS_MATCHER_ACTION_RTC_UPDATE_MULT; + rtc_attr.log_depth = 0; + rtc_attr.update_index_mode = MLX5_IFC_RTC_STE_UPDATE_MODE_BY_OFFSET; + /* The action STEs use the default always hit definer */ +@@ -300,21 +289,19 @@ static int hws_matcher_create_rtc(struct mlx5hws_matcher *matcher, + + rtc_attr.pd = ctx->pd_num; + rtc_attr.ste_base = obj_id; +- rtc_attr.ste_offset = ste->offset; + rtc_attr.reparse_mode = mlx5hws_context_get_reparse_mode(ctx); + rtc_attr.table_type 
= mlx5hws_table_get_res_fw_ft_type(tbl->type, false); +- hws_matcher_set_rtc_attr_sz(matcher, &rtc_attr, rtc_type, false); ++ hws_matcher_set_rtc_attr_sz(matcher, &rtc_attr, false); + + /* STC is a single resource (obj_id), use any STC for the ID */ +- stc_pool = ctx->stc_pool; +- obj_id = mlx5hws_pool_get_base_id(stc_pool); ++ obj_id = mlx5hws_pool_get_base_id(ctx->stc_pool); + rtc_attr.stc_base = obj_id; + + ret = mlx5hws_cmd_rtc_create(ctx->mdev, &rtc_attr, rtc_0_id); + if (ret) { + mlx5hws_err(ctx, "Failed to create matcher RTC of type %s", + hws_matcher_rtc_type_to_str(rtc_type)); +- goto free_ste; ++ return ret; + } + + if (tbl->type == MLX5HWS_TABLE_TYPE_FDB) { +@@ -322,9 +309,9 @@ static int hws_matcher_create_rtc(struct mlx5hws_matcher *matcher, + rtc_attr.ste_base = obj_id; + rtc_attr.table_type = mlx5hws_table_get_res_fw_ft_type(tbl->type, true); + +- obj_id = mlx5hws_pool_get_base_mirror_id(stc_pool); ++ obj_id = mlx5hws_pool_get_base_mirror_id(ctx->stc_pool); + rtc_attr.stc_base = obj_id; +- hws_matcher_set_rtc_attr_sz(matcher, &rtc_attr, rtc_type, true); ++ hws_matcher_set_rtc_attr_sz(matcher, &rtc_attr, true); + + ret = mlx5hws_cmd_rtc_create(ctx->mdev, &rtc_attr, rtc_1_id); + if (ret) { +@@ -338,16 +325,12 @@ static int hws_matcher_create_rtc(struct mlx5hws_matcher *matcher, + + destroy_rtc_0: + mlx5hws_cmd_rtc_destroy(ctx->mdev, *rtc_0_id); +-free_ste: +- if (rtc_type == HWS_MATCHER_RTC_TYPE_MATCH) +- mlx5hws_pool_chunk_free(ste_pool, ste); + return ret; + } + + static void hws_matcher_destroy_rtc(struct mlx5hws_matcher *matcher, + enum mlx5hws_matcher_rtc_type rtc_type) + { +- struct mlx5hws_matcher_action_ste *action_ste; + struct mlx5hws_table *tbl = matcher->tbl; + u32 rtc_0_id, rtc_1_id; + +@@ -357,18 +340,17 @@ static void hws_matcher_destroy_rtc(struct mlx5hws_matcher *matcher, + rtc_1_id = matcher->match_ste.rtc_1_id; + break; + case HWS_MATCHER_RTC_TYPE_STE_ARRAY: +- action_ste = &matcher->action_ste; +- rtc_0_id = action_ste->rtc_0_id; +- 
rtc_1_id = action_ste->rtc_1_id; ++ rtc_0_id = matcher->action_ste.rtc_0_id; ++ rtc_1_id = matcher->action_ste.rtc_1_id; + break; + default: + return; + } + + if (tbl->type == MLX5HWS_TABLE_TYPE_FDB) +- mlx5hws_cmd_rtc_destroy(matcher->tbl->ctx->mdev, rtc_1_id); ++ mlx5hws_cmd_rtc_destroy(tbl->ctx->mdev, rtc_1_id); + +- mlx5hws_cmd_rtc_destroy(matcher->tbl->ctx->mdev, rtc_0_id); ++ mlx5hws_cmd_rtc_destroy(tbl->ctx->mdev, rtc_0_id); + } + + static int +@@ -564,7 +546,6 @@ static int hws_matcher_bind_at(struct mlx5hws_matcher *matcher) + stc_attr.action_offset = MLX5HWS_ACTION_OFFSET_HIT; + stc_attr.action_type = MLX5_IFC_STC_ACTION_TYPE_JUMP_TO_STE_TABLE; + stc_attr.reparse_mode = MLX5_IFC_STC_REPARSE_IGNORE; +- stc_attr.ste_table.ste = action_ste->ste; + stc_attr.ste_table.ste_pool = action_ste->pool; + stc_attr.ste_table.match_definer_id = ctx->caps->trivial_match_definer; + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h +index 20b32012c418..0450b6175ac9 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h +@@ -45,14 +45,12 @@ struct mlx5hws_match_template { + }; + + struct mlx5hws_matcher_match_ste { +- struct mlx5hws_pool_chunk ste; + u32 rtc_0_id; + u32 rtc_1_id; + struct mlx5hws_pool *pool; + }; + + struct mlx5hws_matcher_action_ste { +- struct mlx5hws_pool_chunk ste; + struct mlx5hws_pool_chunk stc; + u32 rtc_0_id; + u32 rtc_1_id; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1351-net-mlx5-hws-add-fullness-tracking-to-pool.patch b/SOURCES/1351-net-mlx5-hws-add-fullness-tracking-to-pool.patch new file mode 100644 index 000000000..60dc621c0 --- /dev/null +++ b/SOURCES/1351-net-mlx5-hws-add-fullness-tracking-to-pool.patch @@ -0,0 +1,108 @@ +From 2cd06eab502130ff9491b0f14378269d658826c8 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:41:59 -0400 
+Subject: [PATCH] net/mlx5: HWS, Add fullness tracking to pool + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 04562694766514f00e7086d3d4884db5f3a22d4e +Author: Vlad Dogaru +Date: Thu Apr 10 22:17:36 2025 +0300 + + net/mlx5: HWS, Add fullness tracking to pool + + Future users will need to query whether a pool is empty. + + Signed-off-by: Vlad Dogaru + Reviewed-by: Yevgeny Kliteynik + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Reviewed-by: Michal Kubiak + Link: https://patch.msgid.link/1744312662-356571-7-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.c +index 270b333faab3..26d85fe3c417 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.c +@@ -324,6 +324,8 @@ int mlx5hws_pool_chunk_alloc(struct mlx5hws_pool *pool, + + mutex_lock(&pool->lock); + ret = pool->p_get_chunk(pool, chunk); ++ if (ret == 0) ++ pool->available_elems -= 1 << chunk->order; + mutex_unlock(&pool->lock); + + return ret; +@@ -334,6 +336,7 @@ void mlx5hws_pool_chunk_free(struct mlx5hws_pool *pool, + { + mutex_lock(&pool->lock); + pool->p_put_chunk(pool, chunk); ++ pool->available_elems += 1 << chunk->order; + mutex_unlock(&pool->lock); + } + +@@ -360,6 +363,7 @@ mlx5hws_pool_create(struct mlx5hws_context *ctx, struct mlx5hws_pool_attr *pool_ + res_db_type = MLX5HWS_POOL_DB_TYPE_BITMAP; + + pool->alloc_log_sz = pool_attr->alloc_log_sz; ++ pool->available_elems = 1 << pool_attr->alloc_log_sz; + + if (hws_pool_db_init(pool, res_db_type)) + goto free_pool; +@@ -377,6 +381,9 @@ void mlx5hws_pool_destroy(struct mlx5hws_pool *pool) + { + mutex_destroy(&pool->lock); + ++ if (pool->available_elems != 1 << pool->alloc_log_sz) ++ mlx5hws_err(pool->ctx, "Attempting to destroy non-empty pool\n"); ++ + if 
(pool->resource) + hws_pool_resource_free(pool); + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.h +index 9a781a87f097..c82760d53e1a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.h +@@ -71,6 +71,7 @@ struct mlx5hws_pool { + enum mlx5hws_pool_flags flags; + struct mutex lock; /* protect the pool */ + size_t alloc_log_sz; ++ size_t available_elems; + enum mlx5hws_table_type tbl_type; + enum mlx5hws_pool_optimize opt_type; + struct mlx5hws_pool_resource *resource; +@@ -103,4 +104,28 @@ static inline u32 mlx5hws_pool_get_base_mirror_id(struct mlx5hws_pool *pool) + { + return pool->mirror_resource->base_id; + } ++ ++static inline bool ++mlx5hws_pool_empty(struct mlx5hws_pool *pool) ++{ ++ bool ret; ++ ++ mutex_lock(&pool->lock); ++ ret = pool->available_elems == 0; ++ mutex_unlock(&pool->lock); ++ ++ return ret; ++} ++ ++static inline bool ++mlx5hws_pool_full(struct mlx5hws_pool *pool) ++{ ++ bool ret; ++ ++ mutex_lock(&pool->lock); ++ ret = pool->available_elems == (1 << pool->alloc_log_sz); ++ mutex_unlock(&pool->lock); ++ ++ return ret; ++} + #endif /* MLX5HWS_POOL_H_ */ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1352-net-mlx5-hws-fix-pool-size-optimization.patch b/SOURCES/1352-net-mlx5-hws-fix-pool-size-optimization.patch new file mode 100644 index 000000000..e8a3f555e --- /dev/null +++ b/SOURCES/1352-net-mlx5-hws-fix-pool-size-optimization.patch @@ -0,0 +1,53 @@ +From 83591e87d75f1fbe1bad278c3b590cc83e85c276 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:41:59 -0400 +Subject: [PATCH] net/mlx5: HWS, Fix pool size optimization + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit a68334f9750f41fc36990840090ef9dbee1e2c7e +Author: Vlad Dogaru +Date: Thu Apr 10 22:17:37 2025 +0300 + + net/mlx5: HWS, Fix pool size optimization + + The optimization to 
create a size-one STE range for the unused direction + was broken. The hardware prevents us from creating RTCs over unallocated + STE space, so the only reason this has worked so far is because the + optimization was never used. + + Signed-off-by: Vlad Dogaru + Reviewed-by: Yevgeny Kliteynik + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Reviewed-by: Michal Kubiak + Link: https://patch.msgid.link/1744312662-356571-8-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.c +index 26d85fe3c417..7e37d6e9eb83 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.c +@@ -80,7 +80,7 @@ static int hws_pool_resource_alloc(struct mlx5hws_pool *pool) + u32 fw_ft_type, opt_log_range; + + fw_ft_type = mlx5hws_table_get_res_fw_ft_type(pool->tbl_type, false); +- opt_log_range = pool->opt_type == MLX5HWS_POOL_OPTIMIZE_ORIG ? ++ opt_log_range = pool->opt_type == MLX5HWS_POOL_OPTIMIZE_MIRROR ? + 0 : pool->alloc_log_sz; + resource = hws_pool_create_one_resource(pool, opt_log_range, fw_ft_type); + if (!resource) { +@@ -94,7 +94,7 @@ static int hws_pool_resource_alloc(struct mlx5hws_pool *pool) + struct mlx5hws_pool_resource *mirror_resource; + + fw_ft_type = mlx5hws_table_get_res_fw_ft_type(pool->tbl_type, true); +- opt_log_range = pool->opt_type == MLX5HWS_POOL_OPTIMIZE_MIRROR ? ++ opt_log_range = pool->opt_type == MLX5HWS_POOL_OPTIMIZE_ORIG ? 
+ 0 : pool->alloc_log_sz; + mirror_resource = hws_pool_create_one_resource(pool, opt_log_range, fw_ft_type); + if (!mirror_resource) { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1353-net-mlx5-hws-implement-action-ste-pool.patch b/SOURCES/1353-net-mlx5-hws-implement-action-ste-pool.patch new file mode 100644 index 000000000..a6ee81e99 --- /dev/null +++ b/SOURCES/1353-net-mlx5-hws-implement-action-ste-pool.patch @@ -0,0 +1,585 @@ +From be613df0e750dc94c718ad4c944fa5542870a95c Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:00 -0400 +Subject: [PATCH] net/mlx5: HWS, Implement action STE pool + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 983d01b2ce0ac688bb42489f33a29a02274366d5 +Author: Vlad Dogaru +Date: Thu Apr 10 22:17:38 2025 +0300 + + net/mlx5: HWS, Implement action STE pool + + Implement a per-queue pool of action STEs that match STEs can link to, + regardless of matcher. + + The code relies on hints to optimize whether a given rule is added to + rx-only, tx-only or both. Correspondingly, action STEs need to be added + to different RTC for ingress or egress paths. For rx-and-tx rules, the + current rule implementation dictates that the offsets for a given rule + must be the same in both RTCs. + + To avoid wasting STEs, each action STE pool element holds 3 pools: + rx-only, tx-only, and rx-and-tx, corresponding to the possible values of + the pool optimization enum. The implementation then chooses at rule + creation / update which of these elements to allocate from. + + Each element holds multiple action STE tables, which wrap an RTC, an STE + range, the logic to buddy-allocate offsets from the range, and an STC + that allows match STEs to point to this table. When allocating offsets + from an element, we iterate through available action STE tables and, if + needed, create a new table. + + Similar to the previous implementation, this iteration does not free any + resources. 
This is implemented in a subsequent patch. + + Signed-off-by: Vlad Dogaru + Reviewed-by: Yevgeny Kliteynik + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Reviewed-by: Michal Kubiak + Link: https://patch.msgid.link/1744312662-356571-9-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile +index 568bbe5f83f5..d292e6a9e22c 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile ++++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile +@@ -154,7 +154,8 @@ mlx5_core-$(CONFIG_MLX5_HW_STEERING) += steering/hws/cmd.o \ + steering/hws/vport.o \ + steering/hws/bwc_complex.o \ + steering/hws/fs_hws_pools.o \ +- steering/hws/fs_hws.o ++ steering/hws/fs_hws.o \ ++ steering/hws/action_ste_pool.o + + # + # SF device +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action_ste_pool.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action_ste_pool.c +new file mode 100644 +index 000000000000..cb6ad8411631 +--- /dev/null ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action_ste_pool.c +@@ -0,0 +1,387 @@ ++// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB ++/* Copyright (c) 2025 NVIDIA Corporation & Affiliates */ ++ ++#include "internal.h" ++ ++static const char * ++hws_pool_opt_to_str(enum mlx5hws_pool_optimize opt) ++{ ++ switch (opt) { ++ case MLX5HWS_POOL_OPTIMIZE_NONE: ++ return "rx-and-tx"; ++ case MLX5HWS_POOL_OPTIMIZE_ORIG: ++ return "rx-only"; ++ case MLX5HWS_POOL_OPTIMIZE_MIRROR: ++ return "tx-only"; ++ default: ++ return "unknown"; ++ } ++} ++ ++static int ++hws_action_ste_table_create_pool(struct mlx5hws_context *ctx, ++ struct mlx5hws_action_ste_table *action_tbl, ++ enum mlx5hws_pool_optimize opt, size_t log_sz) ++{ ++ struct mlx5hws_pool_attr pool_attr = { 0 }; ++ ++ pool_attr.pool_type = MLX5HWS_POOL_TYPE_STE; ++ pool_attr.table_type = MLX5HWS_TABLE_TYPE_FDB; 
++ pool_attr.flags = MLX5HWS_POOL_FLAG_BUDDY; ++ pool_attr.opt_type = opt; ++ pool_attr.alloc_log_sz = log_sz; ++ ++ action_tbl->pool = mlx5hws_pool_create(ctx, &pool_attr); ++ if (!action_tbl->pool) { ++ mlx5hws_err(ctx, "Failed to allocate STE pool\n"); ++ return -EINVAL; ++ } ++ ++ return 0; ++} ++ ++static int hws_action_ste_table_create_single_rtc( ++ struct mlx5hws_context *ctx, ++ struct mlx5hws_action_ste_table *action_tbl, ++ enum mlx5hws_pool_optimize opt, size_t log_sz, bool tx) ++{ ++ struct mlx5hws_cmd_rtc_create_attr rtc_attr = { 0 }; ++ u32 *rtc_id; ++ ++ rtc_attr.log_depth = 0; ++ rtc_attr.update_index_mode = MLX5_IFC_RTC_STE_UPDATE_MODE_BY_OFFSET; ++ /* Action STEs use the default always hit definer. */ ++ rtc_attr.match_definer_0 = ctx->caps->trivial_match_definer; ++ rtc_attr.is_frst_jumbo = false; ++ rtc_attr.miss_ft_id = 0; ++ rtc_attr.pd = ctx->pd_num; ++ rtc_attr.reparse_mode = mlx5hws_context_get_reparse_mode(ctx); ++ ++ if (tx) { ++ rtc_attr.table_type = FS_FT_FDB_TX; ++ rtc_attr.ste_base = ++ mlx5hws_pool_get_base_mirror_id(action_tbl->pool); ++ rtc_attr.stc_base = ++ mlx5hws_pool_get_base_mirror_id(ctx->stc_pool); ++ rtc_attr.log_size = ++ opt == MLX5HWS_POOL_OPTIMIZE_ORIG ? 0 : log_sz; ++ rtc_id = &action_tbl->rtc_1_id; ++ } else { ++ rtc_attr.table_type = FS_FT_FDB_RX; ++ rtc_attr.ste_base = mlx5hws_pool_get_base_id(action_tbl->pool); ++ rtc_attr.stc_base = mlx5hws_pool_get_base_id(ctx->stc_pool); ++ rtc_attr.log_size = ++ opt == MLX5HWS_POOL_OPTIMIZE_MIRROR ? 
0 : log_sz; ++ rtc_id = &action_tbl->rtc_0_id; ++ } ++ ++ return mlx5hws_cmd_rtc_create(ctx->mdev, &rtc_attr, rtc_id); ++} ++ ++static int ++hws_action_ste_table_create_rtcs(struct mlx5hws_context *ctx, ++ struct mlx5hws_action_ste_table *action_tbl, ++ enum mlx5hws_pool_optimize opt, size_t log_sz) ++{ ++ int err; ++ ++ err = hws_action_ste_table_create_single_rtc(ctx, action_tbl, opt, ++ log_sz, false); ++ if (err) ++ return err; ++ ++ err = hws_action_ste_table_create_single_rtc(ctx, action_tbl, opt, ++ log_sz, true); ++ if (err) { ++ mlx5hws_cmd_rtc_destroy(ctx->mdev, action_tbl->rtc_0_id); ++ return err; ++ } ++ ++ return 0; ++} ++ ++static void ++hws_action_ste_table_destroy_rtcs(struct mlx5hws_action_ste_table *action_tbl) ++{ ++ mlx5hws_cmd_rtc_destroy(action_tbl->pool->ctx->mdev, ++ action_tbl->rtc_1_id); ++ mlx5hws_cmd_rtc_destroy(action_tbl->pool->ctx->mdev, ++ action_tbl->rtc_0_id); ++} ++ ++static int ++hws_action_ste_table_create_stc(struct mlx5hws_context *ctx, ++ struct mlx5hws_action_ste_table *action_tbl) ++{ ++ struct mlx5hws_cmd_stc_modify_attr stc_attr = { 0 }; ++ ++ stc_attr.action_offset = MLX5HWS_ACTION_OFFSET_HIT; ++ stc_attr.action_type = MLX5_IFC_STC_ACTION_TYPE_JUMP_TO_STE_TABLE; ++ stc_attr.reparse_mode = MLX5_IFC_STC_REPARSE_IGNORE; ++ stc_attr.ste_table.ste_pool = action_tbl->pool; ++ stc_attr.ste_table.match_definer_id = ctx->caps->trivial_match_definer; ++ ++ return mlx5hws_action_alloc_single_stc(ctx, &stc_attr, ++ MLX5HWS_TABLE_TYPE_FDB, ++ &action_tbl->stc); ++} ++ ++static struct mlx5hws_action_ste_table * ++hws_action_ste_table_alloc(struct mlx5hws_action_ste_pool_element *parent_elem) ++{ ++ enum mlx5hws_pool_optimize opt = parent_elem->opt; ++ struct mlx5hws_context *ctx = parent_elem->ctx; ++ struct mlx5hws_action_ste_table *action_tbl; ++ size_t log_sz; ++ int err; ++ ++ log_sz = min(parent_elem->log_sz ? 
++ parent_elem->log_sz + ++ MLX5HWS_ACTION_STE_TABLE_STEP_LOG_SZ : ++ MLX5HWS_ACTION_STE_TABLE_INIT_LOG_SZ, ++ MLX5HWS_ACTION_STE_TABLE_MAX_LOG_SZ); ++ ++ action_tbl = kzalloc(sizeof(*action_tbl), GFP_KERNEL); ++ if (!action_tbl) ++ return ERR_PTR(-ENOMEM); ++ ++ err = hws_action_ste_table_create_pool(ctx, action_tbl, opt, log_sz); ++ if (err) ++ goto free_tbl; ++ ++ err = hws_action_ste_table_create_rtcs(ctx, action_tbl, opt, log_sz); ++ if (err) ++ goto destroy_pool; ++ ++ err = hws_action_ste_table_create_stc(ctx, action_tbl); ++ if (err) ++ goto destroy_rtcs; ++ ++ action_tbl->parent_elem = parent_elem; ++ INIT_LIST_HEAD(&action_tbl->list_node); ++ list_add(&action_tbl->list_node, &parent_elem->available); ++ parent_elem->log_sz = log_sz; ++ ++ mlx5hws_dbg(ctx, ++ "Allocated %s action STE table log_sz %zu; STEs (%d, %d); RTCs (%d, %d); STC %d\n", ++ hws_pool_opt_to_str(opt), log_sz, ++ mlx5hws_pool_get_base_id(action_tbl->pool), ++ mlx5hws_pool_get_base_mirror_id(action_tbl->pool), ++ action_tbl->rtc_0_id, action_tbl->rtc_1_id, ++ action_tbl->stc.offset); ++ ++ return action_tbl; ++ ++destroy_rtcs: ++ hws_action_ste_table_destroy_rtcs(action_tbl); ++destroy_pool: ++ mlx5hws_pool_destroy(action_tbl->pool); ++free_tbl: ++ kfree(action_tbl); ++ ++ return ERR_PTR(err); ++} ++ ++static void ++hws_action_ste_table_destroy(struct mlx5hws_action_ste_table *action_tbl) ++{ ++ struct mlx5hws_context *ctx = action_tbl->parent_elem->ctx; ++ ++ mlx5hws_dbg(ctx, ++ "Destroying %s action STE table: STEs (%d, %d); RTCs (%d, %d); STC %d\n", ++ hws_pool_opt_to_str(action_tbl->parent_elem->opt), ++ mlx5hws_pool_get_base_id(action_tbl->pool), ++ mlx5hws_pool_get_base_mirror_id(action_tbl->pool), ++ action_tbl->rtc_0_id, action_tbl->rtc_1_id, ++ action_tbl->stc.offset); ++ ++ mlx5hws_action_free_single_stc(ctx, MLX5HWS_TABLE_TYPE_FDB, ++ &action_tbl->stc); ++ hws_action_ste_table_destroy_rtcs(action_tbl); ++ mlx5hws_pool_destroy(action_tbl->pool); ++ ++ 
list_del(&action_tbl->list_node); ++ kfree(action_tbl); ++} ++ ++static int ++hws_action_ste_pool_element_init(struct mlx5hws_context *ctx, ++ struct mlx5hws_action_ste_pool_element *elem, ++ enum mlx5hws_pool_optimize opt) ++{ ++ elem->ctx = ctx; ++ elem->opt = opt; ++ INIT_LIST_HEAD(&elem->available); ++ INIT_LIST_HEAD(&elem->full); ++ ++ return 0; ++} ++ ++static void hws_action_ste_pool_element_destroy( ++ struct mlx5hws_action_ste_pool_element *elem) ++{ ++ struct mlx5hws_action_ste_table *action_tbl, *p; ++ ++ /* This should be empty, but attempt to free its elements anyway. */ ++ list_for_each_entry_safe(action_tbl, p, &elem->full, list_node) ++ hws_action_ste_table_destroy(action_tbl); ++ ++ list_for_each_entry_safe(action_tbl, p, &elem->available, list_node) ++ hws_action_ste_table_destroy(action_tbl); ++} ++ ++static int hws_action_ste_pool_init(struct mlx5hws_context *ctx, ++ struct mlx5hws_action_ste_pool *pool) ++{ ++ enum mlx5hws_pool_optimize opt; ++ int err; ++ ++ /* Rules which are added for both RX and TX must use the same action STE ++ * indices for both. If we were to use a single table, then RX-only and ++ * TX-only rules would waste the unused entries. Thus, we use separate ++ * table sets for the three cases. 
++ */ ++ for (opt = MLX5HWS_POOL_OPTIMIZE_NONE; opt < MLX5HWS_POOL_OPTIMIZE_MAX; ++ opt++) { ++ err = hws_action_ste_pool_element_init(ctx, &pool->elems[opt], ++ opt); ++ if (err) ++ goto destroy_elems; ++ } ++ ++ return 0; ++ ++destroy_elems: ++ while (opt-- > MLX5HWS_POOL_OPTIMIZE_NONE) ++ hws_action_ste_pool_element_destroy(&pool->elems[opt]); ++ ++ return err; ++} ++ ++static void hws_action_ste_pool_destroy(struct mlx5hws_action_ste_pool *pool) ++{ ++ int opt; ++ ++ for (opt = MLX5HWS_POOL_OPTIMIZE_MAX - 1; ++ opt >= MLX5HWS_POOL_OPTIMIZE_NONE; opt--) ++ hws_action_ste_pool_element_destroy(&pool->elems[opt]); ++} ++ ++int mlx5hws_action_ste_pool_init(struct mlx5hws_context *ctx) ++{ ++ struct mlx5hws_action_ste_pool *pool; ++ size_t queues = ctx->queues; ++ int i, err; ++ ++ pool = kcalloc(queues, sizeof(*pool), GFP_KERNEL); ++ if (!pool) ++ return -ENOMEM; ++ ++ for (i = 0; i < queues; i++) { ++ err = hws_action_ste_pool_init(ctx, &pool[i]); ++ if (err) ++ goto free_pool; ++ } ++ ++ ctx->action_ste_pool = pool; ++ ++ return 0; ++ ++free_pool: ++ while (i--) ++ hws_action_ste_pool_destroy(&pool[i]); ++ kfree(pool); ++ ++ return err; ++} ++ ++void mlx5hws_action_ste_pool_uninit(struct mlx5hws_context *ctx) ++{ ++ size_t queues = ctx->queues; ++ int i; ++ ++ for (i = 0; i < queues; i++) ++ hws_action_ste_pool_destroy(&ctx->action_ste_pool[i]); ++ ++ kfree(ctx->action_ste_pool); ++} ++ ++static struct mlx5hws_action_ste_pool_element * ++hws_action_ste_choose_elem(struct mlx5hws_action_ste_pool *pool, ++ bool skip_rx, bool skip_tx) ++{ ++ if (skip_rx) ++ return &pool->elems[MLX5HWS_POOL_OPTIMIZE_MIRROR]; ++ ++ if (skip_tx) ++ return &pool->elems[MLX5HWS_POOL_OPTIMIZE_ORIG]; ++ ++ return &pool->elems[MLX5HWS_POOL_OPTIMIZE_NONE]; ++} ++ ++static int ++hws_action_ste_table_chunk_alloc(struct mlx5hws_action_ste_table *action_tbl, ++ struct mlx5hws_action_ste_chunk *chunk) ++{ ++ int err; ++ ++ err = mlx5hws_pool_chunk_alloc(action_tbl->pool, &chunk->ste); ++ if (err) 
++ return err; ++ ++ chunk->action_tbl = action_tbl; ++ ++ return 0; ++} ++ ++int mlx5hws_action_ste_chunk_alloc(struct mlx5hws_action_ste_pool *pool, ++ bool skip_rx, bool skip_tx, ++ struct mlx5hws_action_ste_chunk *chunk) ++{ ++ struct mlx5hws_action_ste_pool_element *elem; ++ struct mlx5hws_action_ste_table *action_tbl; ++ bool found; ++ int err; ++ ++ if (skip_rx && skip_tx) ++ return -EINVAL; ++ ++ elem = hws_action_ste_choose_elem(pool, skip_rx, skip_tx); ++ ++ mlx5hws_dbg(elem->ctx, ++ "Allocating action STEs skip_rx %d skip_tx %d order %d\n", ++ skip_rx, skip_tx, chunk->ste.order); ++ ++ found = false; ++ list_for_each_entry(action_tbl, &elem->available, list_node) { ++ if (!hws_action_ste_table_chunk_alloc(action_tbl, chunk)) { ++ found = true; ++ break; ++ } ++ } ++ ++ if (!found) { ++ action_tbl = hws_action_ste_table_alloc(elem); ++ if (IS_ERR(action_tbl)) ++ return PTR_ERR(action_tbl); ++ ++ err = hws_action_ste_table_chunk_alloc(action_tbl, chunk); ++ if (err) ++ return err; ++ } ++ ++ if (mlx5hws_pool_empty(action_tbl->pool)) ++ list_move(&action_tbl->list_node, &elem->full); ++ ++ return 0; ++} ++ ++void mlx5hws_action_ste_chunk_free(struct mlx5hws_action_ste_chunk *chunk) ++{ ++ mlx5hws_dbg(chunk->action_tbl->pool->ctx, ++ "Freeing action STEs offset %d order %d\n", ++ chunk->ste.offset, chunk->ste.order); ++ mlx5hws_pool_chunk_free(chunk->action_tbl->pool, &chunk->ste); ++ list_move(&chunk->action_tbl->list_node, ++ &chunk->action_tbl->parent_elem->available); ++} +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action_ste_pool.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action_ste_pool.h +new file mode 100644 +index 000000000000..2de660a63223 +--- /dev/null ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action_ste_pool.h +@@ -0,0 +1,58 @@ ++/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */ ++/* Copyright (c) 2025 NVIDIA Corporation & Affiliates */ ++ ++#ifndef ACTION_STE_POOL_H_ ++#define 
ACTION_STE_POOL_H_ ++ ++#define MLX5HWS_ACTION_STE_TABLE_INIT_LOG_SZ 10 ++#define MLX5HWS_ACTION_STE_TABLE_STEP_LOG_SZ 1 ++#define MLX5HWS_ACTION_STE_TABLE_MAX_LOG_SZ 20 ++ ++struct mlx5hws_action_ste_pool_element; ++ ++struct mlx5hws_action_ste_table { ++ struct mlx5hws_action_ste_pool_element *parent_elem; ++ /* Wraps the RTC and STE range for this given action. */ ++ struct mlx5hws_pool *pool; ++ /* Match STEs use this STC to jump to this pool's RTC. */ ++ struct mlx5hws_pool_chunk stc; ++ u32 rtc_0_id; ++ u32 rtc_1_id; ++ struct list_head list_node; ++}; ++ ++struct mlx5hws_action_ste_pool_element { ++ struct mlx5hws_context *ctx; ++ size_t log_sz; /* Size of the largest table so far. */ ++ enum mlx5hws_pool_optimize opt; ++ struct list_head available; ++ struct list_head full; ++}; ++ ++/* Central repository of action STEs. The context contains one of these pools ++ * per queue. ++ */ ++struct mlx5hws_action_ste_pool { ++ struct mlx5hws_action_ste_pool_element elems[MLX5HWS_POOL_OPTIMIZE_MAX]; ++}; ++ ++/* A chunk of STEs and the table it was allocated from. Used by rules. */ ++struct mlx5hws_action_ste_chunk { ++ struct mlx5hws_action_ste_table *action_tbl; ++ struct mlx5hws_pool_chunk ste; ++}; ++ ++int mlx5hws_action_ste_pool_init(struct mlx5hws_context *ctx); ++ ++void mlx5hws_action_ste_pool_uninit(struct mlx5hws_context *ctx); ++ ++/* Callers are expected to fill chunk->ste.order. On success, this function ++ * populates chunk->tbl and chunk->ste.offset. 
++ */ ++int mlx5hws_action_ste_chunk_alloc(struct mlx5hws_action_ste_pool *pool, ++ bool skip_rx, bool skip_tx, ++ struct mlx5hws_action_ste_chunk *chunk); ++ ++void mlx5hws_action_ste_chunk_free(struct mlx5hws_action_ste_chunk *chunk); ++ ++#endif /* ACTION_STE_POOL_H_ */ +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/context.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/context.c +index b7cb736b74d7..428dae869706 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/context.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/context.c +@@ -158,10 +158,16 @@ static int hws_context_init_hws(struct mlx5hws_context *ctx, + if (ret) + goto pools_uninit; + ++ ret = mlx5hws_action_ste_pool_init(ctx); ++ if (ret) ++ goto close_queues; ++ + INIT_LIST_HEAD(&ctx->tbl_list); + + return 0; + ++close_queues: ++ mlx5hws_send_queues_close(ctx); + pools_uninit: + hws_context_pools_uninit(ctx); + uninit_pd: +@@ -174,6 +180,7 @@ static void hws_context_uninit_hws(struct mlx5hws_context *ctx) + if (!(ctx->flags & MLX5HWS_CONTEXT_FLAG_HWS_SUPPORT)) + return; + ++ mlx5hws_action_ste_pool_uninit(ctx); + mlx5hws_send_queues_close(ctx); + hws_context_pools_uninit(ctx); + hws_context_uninit_pd(ctx); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/context.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/context.h +index 38c3647444ad..e987e93bbc6e 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/context.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/context.h +@@ -39,6 +39,7 @@ struct mlx5hws_context { + struct mlx5hws_cmd_query_caps *caps; + u32 pd_num; + struct mlx5hws_pool *stc_pool; ++ struct mlx5hws_action_ste_pool *action_ste_pool; /* One per queue */ + struct mlx5hws_context_common_res common_res; + struct mlx5hws_pattern_cache *pattern_cache; + struct mlx5hws_definer_cache *definer_cache; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/internal.h 
b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/internal.h +index 30ccd635b505..21279d503117 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/internal.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/internal.h +@@ -17,6 +17,7 @@ + #include "context.h" + #include "table.h" + #include "send.h" ++#include "action_ste_pool.h" + #include "rule.h" + #include "cmd.h" + #include "action.h" +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.h +index c82760d53e1a..33e33d5f1fb3 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.h +@@ -33,6 +33,7 @@ enum mlx5hws_pool_optimize { + MLX5HWS_POOL_OPTIMIZE_NONE = 0x0, + MLX5HWS_POOL_OPTIMIZE_ORIG = 0x1, + MLX5HWS_POOL_OPTIMIZE_MIRROR = 0x2, ++ MLX5HWS_POOL_OPTIMIZE_MAX = 0x3, + }; + + struct mlx5hws_pool_attr { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1354-net-mlx5-hws-use-the-new-action-ste-pool.patch b/SOURCES/1354-net-mlx5-hws-use-the-new-action-ste-pool.patch new file mode 100644 index 000000000..59e15911b --- /dev/null +++ b/SOURCES/1354-net-mlx5-hws-use-the-new-action-ste-pool.patch @@ -0,0 +1,190 @@ +From 95ecc26ff257b1b713163c5a13570543b03678f2 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:00 -0400 +Subject: [PATCH] net/mlx5: HWS, Use the new action STE pool + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 593a9470a8565a59a07b577d6bcb3c199f232d4a +Author: Vlad Dogaru +Date: Thu Apr 10 22:17:39 2025 +0300 + + net/mlx5: HWS, Use the new action STE pool + + Use the central action STE pool when creating / updating rules. 
+ + Signed-off-by: Vlad Dogaru + Reviewed-by: Yevgeny Kliteynik + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Reviewed-by: Michal Kubiak + Link: https://patch.msgid.link/1744312662-356571-10-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.c +index a27a2d5ffc7b..5b758467ed03 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.c +@@ -195,44 +195,30 @@ hws_rule_load_delete_info(struct mlx5hws_rule *rule, + } + } + +-static int hws_rule_alloc_action_ste(struct mlx5hws_rule *rule) ++static int mlx5hws_rule_alloc_action_ste(struct mlx5hws_rule *rule, ++ u16 queue_id, bool skip_rx, ++ bool skip_tx) + { + struct mlx5hws_matcher *matcher = rule->matcher; +- struct mlx5hws_matcher_action_ste *action_ste; +- struct mlx5hws_pool_chunk ste = {0}; +- int ret; +- +- action_ste = &matcher->action_ste; +- ste.order = ilog2(roundup_pow_of_two(action_ste->max_stes)); +- ret = mlx5hws_pool_chunk_alloc(action_ste->pool, &ste); +- if (unlikely(ret)) { +- mlx5hws_err(matcher->tbl->ctx, +- "Failed to allocate STE for rule actions"); +- return ret; +- } +- +- rule->action_ste.pool = matcher->action_ste.pool; +- rule->action_ste.num_stes = matcher->action_ste.max_stes; +- rule->action_ste.index = ste.offset; ++ struct mlx5hws_context *ctx = matcher->tbl->ctx; + +- return 0; ++ rule->action_ste.ste.order = ++ ilog2(roundup_pow_of_two(matcher->action_ste.max_stes)); ++ return mlx5hws_action_ste_chunk_alloc(&ctx->action_ste_pool[queue_id], ++ skip_rx, skip_tx, ++ &rule->action_ste); + } + +-void mlx5hws_rule_free_action_ste(struct mlx5hws_rule_action_ste_info *action_ste) ++void mlx5hws_rule_free_action_ste(struct mlx5hws_action_ste_chunk *action_ste) + { +- struct mlx5hws_pool_chunk ste = {0}; +- +- if 
(!action_ste->num_stes) ++ if (!action_ste->action_tbl) + return; + +- ste.order = ilog2(roundup_pow_of_two(action_ste->num_stes)); +- ste.offset = action_ste->index; +- + /* This release is safe only when the rule match STE was deleted + * (when the rule is being deleted) or replaced with the new STE that + * isn't pointing to old action STEs (when the rule is being updated). + */ +- mlx5hws_pool_chunk_free(action_ste->pool, &ste); ++ mlx5hws_action_ste_chunk_free(action_ste); + } + + static void hws_rule_create_init(struct mlx5hws_rule *rule, +@@ -250,22 +236,15 @@ static void hws_rule_create_init(struct mlx5hws_rule *rule, + rule->rtc_0 = 0; + rule->rtc_1 = 0; + +- rule->action_ste.pool = NULL; +- rule->action_ste.num_stes = 0; +- rule->action_ste.index = -1; +- + rule->status = MLX5HWS_RULE_STATUS_CREATING; + } else { + rule->status = MLX5HWS_RULE_STATUS_UPDATING; ++ /* Save the old action STE info so we can free it after writing ++ * new action STEs and a corresponding match STE. ++ */ ++ rule->old_action_ste = rule->action_ste; + } + +- /* Initialize the old action STE info - shallow-copy action_ste. +- * In create flow this will set old_action_ste fields to initial values. +- * In update flow this will save the existing action STE info, +- * so that we will later use it to free old STEs. 
+- */ +- rule->old_action_ste = rule->action_ste; +- + rule->pending_wqes = 0; + + /* Init default send STE attributes */ +@@ -277,7 +256,6 @@ static void hws_rule_create_init(struct mlx5hws_rule *rule, + /* Init default action apply */ + apply->tbl_type = tbl->type; + apply->common_res = &ctx->common_res; +- apply->jump_to_action_stc = matcher->action_ste.stc.offset; + apply->require_dep = 0; + } + +@@ -353,17 +331,24 @@ static int hws_rule_create_hws(struct mlx5hws_rule *rule, + + if (action_stes) { + /* Allocate action STEs for rules that need more than match STE */ +- ret = hws_rule_alloc_action_ste(rule); ++ ret = mlx5hws_rule_alloc_action_ste(rule, attr->queue_id, ++ !!ste_attr.rtc_0, ++ !!ste_attr.rtc_1); + if (ret) { + mlx5hws_err(ctx, "Failed to allocate action memory %d", ret); + mlx5hws_send_abort_new_dep_wqe(queue); + return ret; + } ++ apply.jump_to_action_stc = ++ rule->action_ste.action_tbl->stc.offset; + /* Skip RX/TX based on the dep_wqe init */ +- ste_attr.rtc_0 = dep_wqe->rtc_0 ? matcher->action_ste.rtc_0_id : 0; +- ste_attr.rtc_1 = dep_wqe->rtc_1 ? matcher->action_ste.rtc_1_id : 0; ++ ste_attr.rtc_0 = dep_wqe->rtc_0 ? ++ rule->action_ste.action_tbl->rtc_0_id : 0; ++ ste_attr.rtc_1 = dep_wqe->rtc_1 ? 
++ rule->action_ste.action_tbl->rtc_1_id : 0; + /* Action STEs are written to a specific index last to first */ +- ste_attr.direct_index = rule->action_ste.index + action_stes; ++ ste_attr.direct_index = ++ rule->action_ste.ste.offset + action_stes; + apply.next_direct_idx = ste_attr.direct_index; + } else { + apply.next_direct_idx = 0; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.h +index b5ee94ac449b..1c47a9c11572 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.h +@@ -43,12 +43,6 @@ struct mlx5hws_rule_match_tag { + }; + }; + +-struct mlx5hws_rule_action_ste_info { +- struct mlx5hws_pool *pool; +- int index; /* STE array index */ +- u8 num_stes; +-}; +- + struct mlx5hws_rule_resize_info { + u32 rtc_0; + u32 rtc_1; +@@ -64,8 +58,8 @@ struct mlx5hws_rule { + struct mlx5hws_rule_match_tag tag; + struct mlx5hws_rule_resize_info *resize_info; + }; +- struct mlx5hws_rule_action_ste_info action_ste; +- struct mlx5hws_rule_action_ste_info old_action_ste; ++ struct mlx5hws_action_ste_chunk action_ste; ++ struct mlx5hws_action_ste_chunk old_action_ste; + u32 rtc_0; /* The RTC into which the STE was inserted */ + u32 rtc_1; /* The RTC into which the STE was inserted */ + u8 status; /* enum mlx5hws_rule_status */ +@@ -75,7 +69,7 @@ struct mlx5hws_rule { + */ + }; + +-void mlx5hws_rule_free_action_ste(struct mlx5hws_rule_action_ste_info *action_ste); ++void mlx5hws_rule_free_action_ste(struct mlx5hws_action_ste_chunk *action_ste); + + int mlx5hws_rule_move_hws_remove(struct mlx5hws_rule *rule, + void *queue, void *user_data); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1355-net-mlx5-hws-cleanup-matcher-action-ste-table.patch b/SOURCES/1355-net-mlx5-hws-cleanup-matcher-action-ste-table.patch new file mode 100644 index 000000000..5923edc31 --- /dev/null +++ 
b/SOURCES/1355-net-mlx5-hws-cleanup-matcher-action-ste-table.patch @@ -0,0 +1,875 @@ +From 0e63b341ab3882d9bf6aacf824f70c2b41ef65e7 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:00 -0400 +Subject: [PATCH] net/mlx5: HWS, Cleanup matcher action STE table + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 22174f16f1218fc98e374b3653decae54aa481f8 +Author: Vlad Dogaru +Date: Thu Apr 10 22:17:40 2025 +0300 + + net/mlx5: HWS, Cleanup matcher action STE table + + Remove the matcher action STE implementation now that the code uses + per-queue action STE pools. This also allows simplifying matcher code + because it is now only handling a single type of RTC/STE. + + The matcher resize data is also going away. Matchers were saving old + action STE data because the rules still used it, but now that data lives + in the action STE pool and is no longer coupled to a matcher. + + Furthermore, matchers no longer need to rehash due to action template + addition. If a new action template needs more action STEs, we simply + update the matcher's num_of_action_stes and future rules will allocate + the correct number. Existing rules are unaffected by such an operation + and can continue to use their existing action STEs. + + The range action was using the matcher action STE implementation, but + there was no reason to do this other than the container fitting the + purpose. Extract that information to a separate structure. + + Finally, stop dumping per-matcher information about action RTCs, + because they no longer exist. A later patch in this series will add + support for dumping action STE pools.
+ + Signed-off-by: Vlad Dogaru + Reviewed-by: Yevgeny Kliteynik + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Reviewed-by: Michal Kubiak + Link: https://patch.msgid.link/1744312662-356571-11-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c +index 161ad720b339..bef4d25c1a2a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c +@@ -1574,13 +1574,13 @@ hws_action_create_dest_match_range_definer(struct mlx5hws_context *ctx) + return definer; + } + +-static struct mlx5hws_matcher_action_ste * ++static struct mlx5hws_range_action_table * + hws_action_create_dest_match_range_table(struct mlx5hws_context *ctx, + struct mlx5hws_definer *definer, + u32 miss_ft_id) + { + struct mlx5hws_cmd_rtc_create_attr rtc_attr = {0}; +- struct mlx5hws_matcher_action_ste *table_ste; ++ struct mlx5hws_range_action_table *table_ste; + struct mlx5hws_pool_attr pool_attr = {0}; + struct mlx5hws_pool *ste_pool, *stc_pool; + u32 *rtc_0_id, *rtc_1_id; +@@ -1669,9 +1669,9 @@ hws_action_create_dest_match_range_table(struct mlx5hws_context *ctx, + return NULL; + } + +-static void +-hws_action_destroy_dest_match_range_table(struct mlx5hws_context *ctx, +- struct mlx5hws_matcher_action_ste *table_ste) ++static void hws_action_destroy_dest_match_range_table( ++ struct mlx5hws_context *ctx, ++ struct mlx5hws_range_action_table *table_ste) + { + mutex_lock(&ctx->ctrl_lock); + +@@ -1683,12 +1683,11 @@ hws_action_destroy_dest_match_range_table(struct mlx5hws_context *ctx, + mutex_unlock(&ctx->ctrl_lock); + } + +-static int +-hws_action_create_dest_match_range_fill_table(struct mlx5hws_context *ctx, +- struct mlx5hws_matcher_action_ste *table_ste, +- struct mlx5hws_action *hit_ft_action, +- struct mlx5hws_definer 
*range_definer, +- u32 min, u32 max) ++static int hws_action_create_dest_match_range_fill_table( ++ struct mlx5hws_context *ctx, ++ struct mlx5hws_range_action_table *table_ste, ++ struct mlx5hws_action *hit_ft_action, ++ struct mlx5hws_definer *range_definer, u32 min, u32 max) + { + struct mlx5hws_wqe_gta_data_seg_ste match_wqe_data = {0}; + struct mlx5hws_wqe_gta_data_seg_ste range_wqe_data = {0}; +@@ -1784,7 +1783,7 @@ mlx5hws_action_create_dest_match_range(struct mlx5hws_context *ctx, + u32 min, u32 max, u32 flags) + { + struct mlx5hws_cmd_stc_modify_attr stc_attr = {0}; +- struct mlx5hws_matcher_action_ste *table_ste; ++ struct mlx5hws_range_action_table *table_ste; + struct mlx5hws_action *hit_ft_action; + struct mlx5hws_definer *definer; + struct mlx5hws_action *action; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.h +index 64b76075f7f8..25fa0d4c9221 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.h +@@ -118,6 +118,12 @@ struct mlx5hws_action_template { + u8 only_term; + }; + ++struct mlx5hws_range_action_table { ++ struct mlx5hws_pool *pool; ++ u32 rtc_0_id; ++ u32 rtc_1_id; ++}; ++ + struct mlx5hws_action { + u8 type; + u8 flags; +@@ -186,7 +192,7 @@ struct mlx5hws_action { + size_t size; + } remove_header; + struct { +- struct mlx5hws_matcher_action_ste *table_ste; ++ struct mlx5hws_range_action_table *table_ste; + struct mlx5hws_action *hit_ft_action; + struct mlx5hws_definer *definer; + } range; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +index 32de8bfc7644..510bfbbe5991 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +@@ -478,21 +478,9 @@ hws_bwc_matcher_size_maxed_out(struct 
mlx5hws_bwc_matcher *bwc_matcher) + struct mlx5hws_cmd_query_caps *caps = bwc_matcher->matcher->tbl->ctx->caps; + + /* check the match RTC size */ +- if ((bwc_matcher->size_log + +- MLX5HWS_MATCHER_ASSURED_MAIN_TBL_DEPTH + +- MLX5HWS_BWC_MATCHER_SIZE_LOG_STEP) > +- (caps->ste_alloc_log_max - 1)) +- return true; +- +- /* check the action RTC size */ +- if ((bwc_matcher->size_log + +- MLX5HWS_BWC_MATCHER_SIZE_LOG_STEP + +- ilog2(roundup_pow_of_two(bwc_matcher->matcher->action_ste.max_stes)) + +- MLX5HWS_MATCHER_ACTION_RTC_UPDATE_MULT) > +- (caps->ste_alloc_log_max - 1)) +- return true; +- +- return false; ++ return (bwc_matcher->size_log + MLX5HWS_MATCHER_ASSURED_MAIN_TBL_DEPTH + ++ MLX5HWS_BWC_MATCHER_SIZE_LOG_STEP) > ++ (caps->ste_alloc_log_max - 1); + } + + static bool +@@ -779,19 +767,6 @@ hws_bwc_matcher_rehash_size(struct mlx5hws_bwc_matcher *bwc_matcher) + return hws_bwc_matcher_move(bwc_matcher); + } + +-static int +-hws_bwc_matcher_rehash_at(struct mlx5hws_bwc_matcher *bwc_matcher) +-{ +- /* Rehash by action template doesn't require any additional checking. +- * The bwc_matcher already contains the new action template. 
+- * Just do the usual rehash: +- * - create new matcher +- * - move all the rules to the new matcher +- * - destroy the old matcher +- */ +- return hws_bwc_matcher_move(bwc_matcher); +-} +- + int mlx5hws_bwc_rule_create_simple(struct mlx5hws_bwc_rule *bwc_rule, + u32 *match_param, + struct mlx5hws_rule_action rule_actions[], +@@ -803,7 +778,6 @@ int mlx5hws_bwc_rule_create_simple(struct mlx5hws_bwc_rule *bwc_rule, + struct mlx5hws_rule_attr rule_attr; + struct mutex *queue_lock; /* Protect the queue */ + u32 num_of_rules; +- bool need_rehash; + int ret = 0; + int at_idx; + +@@ -830,30 +804,11 @@ int mlx5hws_bwc_rule_create_simple(struct mlx5hws_bwc_rule *bwc_rule, + at_idx = bwc_matcher->num_of_at - 1; + + ret = mlx5hws_matcher_attach_at(bwc_matcher->matcher, +- bwc_matcher->at[at_idx], +- &need_rehash); ++ bwc_matcher->at[at_idx]); + if (unlikely(ret)) { + hws_bwc_unlock_all_queues(ctx); + return ret; + } +- if (unlikely(need_rehash)) { +- /* The new action template requires more action STEs. +- * Need to attempt creating new matcher with all +- * the action templates, including the new one. 
+- */ +- ret = hws_bwc_matcher_rehash_at(bwc_matcher); +- if (unlikely(ret)) { +- mlx5hws_action_template_destroy(bwc_matcher->at[at_idx]); +- bwc_matcher->at[at_idx] = NULL; +- bwc_matcher->num_of_at--; +- +- hws_bwc_unlock_all_queues(ctx); +- +- mlx5hws_err(ctx, +- "BWC rule insertion: rehash AT failed (%d)\n", ret); +- return ret; +- } +- } + + hws_bwc_unlock_all_queues(ctx); + mutex_lock(queue_lock); +@@ -973,7 +928,6 @@ hws_bwc_rule_action_update(struct mlx5hws_bwc_rule *bwc_rule, + struct mlx5hws_context *ctx = bwc_matcher->matcher->tbl->ctx; + struct mlx5hws_rule_attr rule_attr; + struct mutex *queue_lock; /* Protect the queue */ +- bool need_rehash; + int at_idx, ret; + u16 idx; + +@@ -1005,32 +959,11 @@ hws_bwc_rule_action_update(struct mlx5hws_bwc_rule *bwc_rule, + at_idx = bwc_matcher->num_of_at - 1; + + ret = mlx5hws_matcher_attach_at(bwc_matcher->matcher, +- bwc_matcher->at[at_idx], +- &need_rehash); ++ bwc_matcher->at[at_idx]); + if (unlikely(ret)) { + hws_bwc_unlock_all_queues(ctx); + return ret; + } +- if (unlikely(need_rehash)) { +- /* The new action template requires more action +- * STEs. Need to attempt creating new matcher +- * with all the action templates, including the +- * new one. 
+- */ +- ret = hws_bwc_matcher_rehash_at(bwc_matcher); +- if (unlikely(ret)) { +- mlx5hws_action_template_destroy(bwc_matcher->at[at_idx]); +- bwc_matcher->at[at_idx] = NULL; +- bwc_matcher->num_of_at--; +- +- hws_bwc_unlock_all_queues(ctx); +- +- mlx5hws_err(ctx, +- "BWC rule update: rehash AT failed (%d)\n", +- ret); +- return ret; +- } +- } + } + + hws_bwc_unlock_all_queues(ctx); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c +index 3491408c5d84..38f75dec9cfc 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c +@@ -146,18 +146,6 @@ static int hws_debug_dump_matcher(struct seq_file *f, struct mlx5hws_matcher *ma + matcher->match_ste.rtc_1_id, + (int)ste_1_id); + +- ste_pool = matcher->action_ste.pool; +- if (ste_pool) { +- ste_0_id = mlx5hws_pool_get_base_id(ste_pool); +- if (tbl_type == MLX5HWS_TABLE_TYPE_FDB) +- ste_1_id = mlx5hws_pool_get_base_mirror_id(ste_pool); +- else +- ste_1_id = -1; +- } else { +- ste_0_id = -1; +- ste_1_id = -1; +- } +- + ft_attr.type = matcher->tbl->fw_ft_type; + ret = mlx5hws_cmd_flow_table_query(matcher->tbl->ctx->mdev, + matcher->end_ft_id, +@@ -167,10 +155,7 @@ static int hws_debug_dump_matcher(struct seq_file *f, struct mlx5hws_matcher *ma + if (ret) + return ret; + +- seq_printf(f, ",%d,%d,%d,%d,%d,0x%llx,0x%llx\n", +- matcher->action_ste.rtc_0_id, (int)ste_0_id, +- matcher->action_ste.rtc_1_id, (int)ste_1_id, +- 0, ++ seq_printf(f, ",-1,-1,-1,-1,0,0x%llx,0x%llx\n", + mlx5hws_debug_icm_to_idx(icm_addr_0), + mlx5hws_debug_icm_to_idx(icm_addr_1)); + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c +index 3028e0387e3f..716502732d3d 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c ++++ 
b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c +@@ -3,25 +3,6 @@ + + #include "internal.h" + +-enum mlx5hws_matcher_rtc_type { +- HWS_MATCHER_RTC_TYPE_MATCH, +- HWS_MATCHER_RTC_TYPE_STE_ARRAY, +- HWS_MATCHER_RTC_TYPE_MAX, +-}; +- +-static const char * const mlx5hws_matcher_rtc_type_str[] = { +- [HWS_MATCHER_RTC_TYPE_MATCH] = "MATCH", +- [HWS_MATCHER_RTC_TYPE_STE_ARRAY] = "STE_ARRAY", +- [HWS_MATCHER_RTC_TYPE_MAX] = "UNKNOWN", +-}; +- +-static const char *hws_matcher_rtc_type_to_str(enum mlx5hws_matcher_rtc_type rtc_type) +-{ +- if (rtc_type > HWS_MATCHER_RTC_TYPE_MAX) +- rtc_type = HWS_MATCHER_RTC_TYPE_MAX; +- return mlx5hws_matcher_rtc_type_str[rtc_type]; +-} +- + static bool hws_matcher_requires_col_tbl(u8 log_num_of_rules) + { + /* Collision table concatenation is done only for large rule tables */ +@@ -209,83 +190,52 @@ static void hws_matcher_set_rtc_attr_sz(struct mlx5hws_matcher *matcher, + } + } + +-static int hws_matcher_create_rtc(struct mlx5hws_matcher *matcher, +- enum mlx5hws_matcher_rtc_type rtc_type) ++static int hws_matcher_create_rtc(struct mlx5hws_matcher *matcher) + { + struct mlx5hws_matcher_attr *attr = &matcher->attr; + struct mlx5hws_cmd_rtc_create_attr rtc_attr = {0}; + struct mlx5hws_match_template *mt = matcher->mt; + struct mlx5hws_context *ctx = matcher->tbl->ctx; +- struct mlx5hws_matcher_action_ste *action_ste; + struct mlx5hws_table *tbl = matcher->tbl; +- struct mlx5hws_pool *ste_pool; +- u32 *rtc_0_id, *rtc_1_id; + u32 obj_id; + int ret; + +- switch (rtc_type) { +- case HWS_MATCHER_RTC_TYPE_MATCH: +- rtc_0_id = &matcher->match_ste.rtc_0_id; +- rtc_1_id = &matcher->match_ste.rtc_1_id; +- ste_pool = matcher->match_ste.pool; +- +- rtc_attr.log_size = attr->table.sz_row_log; +- rtc_attr.log_depth = attr->table.sz_col_log; +- rtc_attr.is_frst_jumbo = mlx5hws_matcher_mt_is_jumbo(mt); +- rtc_attr.is_scnd_range = 0; +- rtc_attr.miss_ft_id = matcher->end_ft_id; +- +- if (attr->insert_mode == MLX5HWS_MATCHER_INSERT_BY_HASH) 
{ +- /* The usual Hash Table */ +- rtc_attr.update_index_mode = MLX5_IFC_RTC_STE_UPDATE_MODE_BY_HASH; +- +- /* The first mt is used since all share the same definer */ +- rtc_attr.match_definer_0 = mlx5hws_definer_get_id(mt->definer); +- } else if (attr->insert_mode == MLX5HWS_MATCHER_INSERT_BY_INDEX) { +- rtc_attr.update_index_mode = MLX5_IFC_RTC_STE_UPDATE_MODE_BY_OFFSET; +- rtc_attr.num_hash_definer = 1; +- +- if (attr->distribute_mode == MLX5HWS_MATCHER_DISTRIBUTE_BY_HASH) { +- /* Hash Split Table */ +- rtc_attr.access_index_mode = MLX5_IFC_RTC_STE_ACCESS_MODE_BY_HASH; +- rtc_attr.match_definer_0 = mlx5hws_definer_get_id(mt->definer); +- } else if (attr->distribute_mode == MLX5HWS_MATCHER_DISTRIBUTE_BY_LINEAR) { +- /* Linear Lookup Table */ +- rtc_attr.access_index_mode = MLX5_IFC_RTC_STE_ACCESS_MODE_LINEAR; +- rtc_attr.match_definer_0 = ctx->caps->linear_match_definer; +- } ++ rtc_attr.log_size = attr->table.sz_row_log; ++ rtc_attr.log_depth = attr->table.sz_col_log; ++ rtc_attr.is_frst_jumbo = mlx5hws_matcher_mt_is_jumbo(mt); ++ rtc_attr.is_scnd_range = 0; ++ rtc_attr.miss_ft_id = matcher->end_ft_id; ++ ++ if (attr->insert_mode == MLX5HWS_MATCHER_INSERT_BY_HASH) { ++ /* The usual Hash Table */ ++ rtc_attr.update_index_mode = ++ MLX5_IFC_RTC_STE_UPDATE_MODE_BY_HASH; ++ ++ /* The first mt is used since all share the same definer */ ++ rtc_attr.match_definer_0 = mlx5hws_definer_get_id(mt->definer); ++ } else if (attr->insert_mode == MLX5HWS_MATCHER_INSERT_BY_INDEX) { ++ rtc_attr.update_index_mode = ++ MLX5_IFC_RTC_STE_UPDATE_MODE_BY_OFFSET; ++ rtc_attr.num_hash_definer = 1; ++ ++ if (attr->distribute_mode == ++ MLX5HWS_MATCHER_DISTRIBUTE_BY_HASH) { ++ /* Hash Split Table */ ++ rtc_attr.access_index_mode = ++ MLX5_IFC_RTC_STE_ACCESS_MODE_BY_HASH; ++ rtc_attr.match_definer_0 = ++ mlx5hws_definer_get_id(mt->definer); ++ } else if (attr->distribute_mode == ++ MLX5HWS_MATCHER_DISTRIBUTE_BY_LINEAR) { ++ /* Linear Lookup Table */ ++ rtc_attr.access_index_mode = ++ 
MLX5_IFC_RTC_STE_ACCESS_MODE_LINEAR; ++ rtc_attr.match_definer_0 = ++ ctx->caps->linear_match_definer; + } +- break; +- +- case HWS_MATCHER_RTC_TYPE_STE_ARRAY: +- action_ste = &matcher->action_ste; +- +- rtc_0_id = &action_ste->rtc_0_id; +- rtc_1_id = &action_ste->rtc_1_id; +- ste_pool = action_ste->pool; +- /* Action RTC size calculation: +- * log((max number of rules in matcher) * +- * (max number of action STEs per rule) * +- * (2 to support writing new STEs for update rule)) +- */ +- rtc_attr.log_size = +- ilog2(roundup_pow_of_two(action_ste->max_stes)) + +- attr->table.sz_row_log + +- MLX5HWS_MATCHER_ACTION_RTC_UPDATE_MULT; +- rtc_attr.log_depth = 0; +- rtc_attr.update_index_mode = MLX5_IFC_RTC_STE_UPDATE_MODE_BY_OFFSET; +- /* The action STEs use the default always hit definer */ +- rtc_attr.match_definer_0 = ctx->caps->trivial_match_definer; +- rtc_attr.is_frst_jumbo = false; +- rtc_attr.miss_ft_id = 0; +- break; +- +- default: +- mlx5hws_err(ctx, "HWS Invalid RTC type\n"); +- return -EINVAL; + } + +- obj_id = mlx5hws_pool_get_base_id(ste_pool); ++ obj_id = mlx5hws_pool_get_base_id(matcher->match_ste.pool); + + rtc_attr.pd = ctx->pd_num; + rtc_attr.ste_base = obj_id; +@@ -297,15 +247,16 @@ static int hws_matcher_create_rtc(struct mlx5hws_matcher *matcher, + obj_id = mlx5hws_pool_get_base_id(ctx->stc_pool); + rtc_attr.stc_base = obj_id; + +- ret = mlx5hws_cmd_rtc_create(ctx->mdev, &rtc_attr, rtc_0_id); ++ ret = mlx5hws_cmd_rtc_create(ctx->mdev, &rtc_attr, ++ &matcher->match_ste.rtc_0_id); + if (ret) { +- mlx5hws_err(ctx, "Failed to create matcher RTC of type %s", +- hws_matcher_rtc_type_to_str(rtc_type)); ++ mlx5hws_err(ctx, "Failed to create matcher RTC\n"); + return ret; + } + + if (tbl->type == MLX5HWS_TABLE_TYPE_FDB) { +- obj_id = mlx5hws_pool_get_base_mirror_id(ste_pool); ++ obj_id = mlx5hws_pool_get_base_mirror_id( ++ matcher->match_ste.pool); + rtc_attr.ste_base = obj_id; + rtc_attr.table_type = mlx5hws_table_get_res_fw_ft_type(tbl->type, true); + +@@ 
-313,10 +264,10 @@ static int hws_matcher_create_rtc(struct mlx5hws_matcher *matcher, + rtc_attr.stc_base = obj_id; + hws_matcher_set_rtc_attr_sz(matcher, &rtc_attr, true); + +- ret = mlx5hws_cmd_rtc_create(ctx->mdev, &rtc_attr, rtc_1_id); ++ ret = mlx5hws_cmd_rtc_create(ctx->mdev, &rtc_attr, ++ &matcher->match_ste.rtc_1_id); + if (ret) { +- mlx5hws_err(ctx, "Failed to create peer matcher RTC of type %s", +- hws_matcher_rtc_type_to_str(rtc_type)); ++ mlx5hws_err(ctx, "Failed to create mirror matcher RTC\n"); + goto destroy_rtc_0; + } + } +@@ -324,33 +275,18 @@ static int hws_matcher_create_rtc(struct mlx5hws_matcher *matcher, + return 0; + + destroy_rtc_0: +- mlx5hws_cmd_rtc_destroy(ctx->mdev, *rtc_0_id); ++ mlx5hws_cmd_rtc_destroy(ctx->mdev, matcher->match_ste.rtc_0_id); + return ret; + } + +-static void hws_matcher_destroy_rtc(struct mlx5hws_matcher *matcher, +- enum mlx5hws_matcher_rtc_type rtc_type) ++static void hws_matcher_destroy_rtc(struct mlx5hws_matcher *matcher) + { +- struct mlx5hws_table *tbl = matcher->tbl; +- u32 rtc_0_id, rtc_1_id; +- +- switch (rtc_type) { +- case HWS_MATCHER_RTC_TYPE_MATCH: +- rtc_0_id = matcher->match_ste.rtc_0_id; +- rtc_1_id = matcher->match_ste.rtc_1_id; +- break; +- case HWS_MATCHER_RTC_TYPE_STE_ARRAY: +- rtc_0_id = matcher->action_ste.rtc_0_id; +- rtc_1_id = matcher->action_ste.rtc_1_id; +- break; +- default: +- return; +- } ++ struct mlx5_core_dev *mdev = matcher->tbl->ctx->mdev; + +- if (tbl->type == MLX5HWS_TABLE_TYPE_FDB) +- mlx5hws_cmd_rtc_destroy(tbl->ctx->mdev, rtc_1_id); ++ if (matcher->tbl->type == MLX5HWS_TABLE_TYPE_FDB) ++ mlx5hws_cmd_rtc_destroy(mdev, matcher->match_ste.rtc_1_id); + +- mlx5hws_cmd_rtc_destroy(tbl->ctx->mdev, rtc_0_id); ++ mlx5hws_cmd_rtc_destroy(mdev, matcher->match_ste.rtc_0_id); + } + + static int +@@ -418,85 +354,17 @@ static int hws_matcher_check_and_process_at(struct mlx5hws_matcher *matcher, + return 0; + } + +-static int hws_matcher_resize_init(struct mlx5hws_matcher *src_matcher) +-{ +- 
struct mlx5hws_matcher_resize_data *resize_data; +- +- resize_data = kzalloc(sizeof(*resize_data), GFP_KERNEL); +- if (!resize_data) +- return -ENOMEM; +- +- resize_data->max_stes = src_matcher->action_ste.max_stes; +- +- resize_data->stc = src_matcher->action_ste.stc; +- resize_data->rtc_0_id = src_matcher->action_ste.rtc_0_id; +- resize_data->rtc_1_id = src_matcher->action_ste.rtc_1_id; +- resize_data->pool = src_matcher->action_ste.max_stes ? +- src_matcher->action_ste.pool : NULL; +- +- /* Place the new resized matcher on the dst matcher's list */ +- list_add(&resize_data->list_node, &src_matcher->resize_dst->resize_data); +- +- /* Move all the previous resized matchers to the dst matcher's list */ +- while (!list_empty(&src_matcher->resize_data)) { +- resize_data = list_first_entry(&src_matcher->resize_data, +- struct mlx5hws_matcher_resize_data, +- list_node); +- list_del_init(&resize_data->list_node); +- list_add(&resize_data->list_node, &src_matcher->resize_dst->resize_data); +- } +- +- return 0; +-} +- +-static void hws_matcher_resize_uninit(struct mlx5hws_matcher *matcher) +-{ +- struct mlx5hws_matcher_resize_data *resize_data; +- +- if (!mlx5hws_matcher_is_resizable(matcher)) +- return; +- +- while (!list_empty(&matcher->resize_data)) { +- resize_data = list_first_entry(&matcher->resize_data, +- struct mlx5hws_matcher_resize_data, +- list_node); +- list_del_init(&resize_data->list_node); +- +- if (resize_data->max_stes) { +- mlx5hws_action_free_single_stc(matcher->tbl->ctx, +- matcher->tbl->type, +- &resize_data->stc); +- +- if (matcher->tbl->type == MLX5HWS_TABLE_TYPE_FDB) +- mlx5hws_cmd_rtc_destroy(matcher->tbl->ctx->mdev, +- resize_data->rtc_1_id); +- +- mlx5hws_cmd_rtc_destroy(matcher->tbl->ctx->mdev, +- resize_data->rtc_0_id); +- +- if (resize_data->pool) +- mlx5hws_pool_destroy(resize_data->pool); +- } +- +- kfree(resize_data); +- } +-} +- + static int hws_matcher_bind_at(struct mlx5hws_matcher *matcher) + { + bool is_jumbo = 
mlx5hws_matcher_mt_is_jumbo(matcher->mt); +- struct mlx5hws_cmd_stc_modify_attr stc_attr = {0}; +- struct mlx5hws_matcher_action_ste *action_ste; +- struct mlx5hws_table *tbl = matcher->tbl; +- struct mlx5hws_pool_attr pool_attr = {0}; +- struct mlx5hws_context *ctx = tbl->ctx; +- u32 required_stes; +- u8 max_stes = 0; ++ struct mlx5hws_context *ctx = matcher->tbl->ctx; ++ u8 required_stes, max_stes; + int i, ret; + + if (matcher->flags & MLX5HWS_MATCHER_FLAGS_COLLISION) + return 0; + ++ max_stes = 0; + for (i = 0; i < matcher->num_of_at; i++) { + struct mlx5hws_action_template *at = &matcher->at[i]; + +@@ -512,74 +380,9 @@ static int hws_matcher_bind_at(struct mlx5hws_matcher *matcher) + /* Future: Optimize reparse */ + } + +- /* There are no additional STEs required for matcher */ +- if (!max_stes) +- return 0; +- +- matcher->action_ste.max_stes = max_stes; +- +- action_ste = &matcher->action_ste; +- +- /* Allocate action STE mempool */ +- pool_attr.table_type = tbl->type; +- pool_attr.pool_type = MLX5HWS_POOL_TYPE_STE; +- pool_attr.flags = MLX5HWS_POOL_FLAG_BUDDY; +- /* Pool size is similar to action RTC size */ +- pool_attr.alloc_log_sz = ilog2(roundup_pow_of_two(action_ste->max_stes)) + +- matcher->attr.table.sz_row_log + +- MLX5HWS_MATCHER_ACTION_RTC_UPDATE_MULT; +- hws_matcher_set_pool_attr(&pool_attr, matcher); +- action_ste->pool = mlx5hws_pool_create(ctx, &pool_attr); +- if (!action_ste->pool) { +- mlx5hws_err(ctx, "Failed to create action ste pool\n"); +- return -EINVAL; +- } +- +- /* Allocate action RTC */ +- ret = hws_matcher_create_rtc(matcher, HWS_MATCHER_RTC_TYPE_STE_ARRAY); +- if (ret) { +- mlx5hws_err(ctx, "Failed to create action RTC\n"); +- goto free_ste_pool; +- } +- +- /* Allocate STC for jumps to STE */ +- stc_attr.action_offset = MLX5HWS_ACTION_OFFSET_HIT; +- stc_attr.action_type = MLX5_IFC_STC_ACTION_TYPE_JUMP_TO_STE_TABLE; +- stc_attr.reparse_mode = MLX5_IFC_STC_REPARSE_IGNORE; +- stc_attr.ste_table.ste_pool = action_ste->pool; +- 
stc_attr.ste_table.match_definer_id = ctx->caps->trivial_match_definer; +- +- ret = mlx5hws_action_alloc_single_stc(ctx, &stc_attr, tbl->type, +- &action_ste->stc); +- if (ret) { +- mlx5hws_err(ctx, "Failed to create action jump to table STC\n"); +- goto free_rtc; +- } ++ matcher->num_of_action_stes = max_stes; + + return 0; +- +-free_rtc: +- hws_matcher_destroy_rtc(matcher, HWS_MATCHER_RTC_TYPE_STE_ARRAY); +-free_ste_pool: +- mlx5hws_pool_destroy(action_ste->pool); +- return ret; +-} +- +-static void hws_matcher_unbind_at(struct mlx5hws_matcher *matcher) +-{ +- struct mlx5hws_matcher_action_ste *action_ste; +- struct mlx5hws_table *tbl = matcher->tbl; +- +- action_ste = &matcher->action_ste; +- +- if (!action_ste->max_stes || +- matcher->flags & MLX5HWS_MATCHER_FLAGS_COLLISION || +- mlx5hws_matcher_is_in_resize(matcher)) +- return; +- +- mlx5hws_action_free_single_stc(tbl->ctx, tbl->type, &action_ste->stc); +- hws_matcher_destroy_rtc(matcher, HWS_MATCHER_RTC_TYPE_STE_ARRAY); +- mlx5hws_pool_destroy(action_ste->pool); + } + + static int hws_matcher_bind_mt(struct mlx5hws_matcher *matcher) +@@ -723,10 +526,10 @@ static int hws_matcher_create_and_connect(struct mlx5hws_matcher *matcher) + /* Create matcher end flow table anchor */ + ret = hws_matcher_create_end_ft(matcher); + if (ret) +- goto unbind_at; ++ goto unbind_mt; + + /* Allocate the RTC for the new matcher */ +- ret = hws_matcher_create_rtc(matcher, HWS_MATCHER_RTC_TYPE_MATCH); ++ ret = hws_matcher_create_rtc(matcher); + if (ret) + goto destroy_end_ft; + +@@ -738,11 +541,9 @@ static int hws_matcher_create_and_connect(struct mlx5hws_matcher *matcher) + return 0; + + destroy_rtc: +- hws_matcher_destroy_rtc(matcher, HWS_MATCHER_RTC_TYPE_MATCH); ++ hws_matcher_destroy_rtc(matcher); + destroy_end_ft: + hws_matcher_destroy_end_ft(matcher); +-unbind_at: +- hws_matcher_unbind_at(matcher); + unbind_mt: + hws_matcher_unbind_mt(matcher); + return ret; +@@ -750,11 +551,9 @@ static int 
hws_matcher_create_and_connect(struct mlx5hws_matcher *matcher) + + static void hws_matcher_destroy_and_disconnect(struct mlx5hws_matcher *matcher) + { +- hws_matcher_resize_uninit(matcher); + hws_matcher_disconnect(matcher); +- hws_matcher_destroy_rtc(matcher, HWS_MATCHER_RTC_TYPE_MATCH); ++ hws_matcher_destroy_rtc(matcher); + hws_matcher_destroy_end_ft(matcher); +- hws_matcher_unbind_at(matcher); + hws_matcher_unbind_mt(matcher); + } + +@@ -776,8 +575,6 @@ hws_matcher_create_col_matcher(struct mlx5hws_matcher *matcher) + if (!col_matcher) + return -ENOMEM; + +- INIT_LIST_HEAD(&col_matcher->resize_data); +- + col_matcher->tbl = matcher->tbl; + col_matcher->mt = matcher->mt; + col_matcher->at = matcher->at; +@@ -831,8 +628,6 @@ static int hws_matcher_init(struct mlx5hws_matcher *matcher) + struct mlx5hws_context *ctx = matcher->tbl->ctx; + int ret; + +- INIT_LIST_HEAD(&matcher->resize_data); +- + mutex_lock(&ctx->ctrl_lock); + + /* Allocate matcher resource and connect to the packet pipe */ +@@ -889,16 +684,12 @@ static int hws_matcher_grow_at_array(struct mlx5hws_matcher *matcher) + } + + int mlx5hws_matcher_attach_at(struct mlx5hws_matcher *matcher, +- struct mlx5hws_action_template *at, +- bool *need_rehash) ++ struct mlx5hws_action_template *at) + { + bool is_jumbo = mlx5hws_matcher_mt_is_jumbo(matcher->mt); +- struct mlx5hws_context *ctx = matcher->tbl->ctx; + u32 required_stes; + int ret; + +- *need_rehash = false; +- + if (unlikely(matcher->num_of_at >= matcher->size_of_at_array)) { + ret = hws_matcher_grow_at_array(matcher); + if (ret) +@@ -916,11 +707,8 @@ int mlx5hws_matcher_attach_at(struct mlx5hws_matcher *matcher, + return ret; + + required_stes = at->num_of_action_stes - (!is_jumbo || at->only_term); +- if (matcher->action_ste.max_stes < required_stes) { +- mlx5hws_dbg(ctx, "Required STEs [%d] exceeds initial action template STE [%d]\n", +- required_stes, matcher->action_ste.max_stes); +- *need_rehash = true; +- } ++ if (matcher->num_of_action_stes < 
required_stes) ++ matcher->num_of_action_stes = required_stes; + + matcher->at[matcher->num_of_at] = *at; + matcher->num_of_at += 1; +@@ -1102,7 +890,7 @@ static int hws_matcher_resize_precheck(struct mlx5hws_matcher *src_matcher, + return -EINVAL; + } + +- if (src_matcher->action_ste.max_stes > dst_matcher->action_ste.max_stes) { ++ if (src_matcher->num_of_action_stes > dst_matcher->num_of_action_stes) { + mlx5hws_err(ctx, "Src/dst matcher max STEs mismatch\n"); + return -EINVAL; + } +@@ -1131,10 +919,6 @@ int mlx5hws_matcher_resize_set_target(struct mlx5hws_matcher *src_matcher, + + src_matcher->resize_dst = dst_matcher; + +- ret = hws_matcher_resize_init(src_matcher); +- if (ret) +- src_matcher->resize_dst = NULL; +- + out: + mutex_unlock(&src_matcher->tbl->ctx->ctrl_lock); + return ret; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h +index 0450b6175ac9..bad1fa8f77fd 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h +@@ -50,23 +50,6 @@ struct mlx5hws_matcher_match_ste { + struct mlx5hws_pool *pool; + }; + +-struct mlx5hws_matcher_action_ste { +- struct mlx5hws_pool_chunk stc; +- u32 rtc_0_id; +- u32 rtc_1_id; +- struct mlx5hws_pool *pool; +- u8 max_stes; +-}; +- +-struct mlx5hws_matcher_resize_data { +- struct mlx5hws_pool_chunk stc; +- u32 rtc_0_id; +- u32 rtc_1_id; +- struct mlx5hws_pool *pool; +- u8 max_stes; +- struct list_head list_node; +-}; +- + struct mlx5hws_matcher { + struct mlx5hws_table *tbl; + struct mlx5hws_matcher_attr attr; +@@ -75,15 +58,14 @@ struct mlx5hws_matcher { + u8 num_of_at; + u8 size_of_at_array; + u8 num_of_mt; ++ u8 num_of_action_stes; + /* enum mlx5hws_matcher_flags */ + u8 flags; + u32 end_ft_id; + struct mlx5hws_matcher *col_matcher; + struct mlx5hws_matcher *resize_dst; + struct mlx5hws_matcher_match_ste match_ste; +- struct 
mlx5hws_matcher_action_ste action_ste; + struct list_head list_node; +- struct list_head resize_data; + }; + + static inline bool +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h +index 8ed8a715a2eb..5121951f2778 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h +@@ -399,14 +399,11 @@ int mlx5hws_matcher_destroy(struct mlx5hws_matcher *matcher); + * + * @matcher: Matcher to attach the action template to. + * @at: Action template to be attached to the matcher. +- * @need_rehash: Output parameter that tells callers if the matcher needs to be +- * rehashed. + * + * Return: Zero on success, non-zero otherwise. + */ + int mlx5hws_matcher_attach_at(struct mlx5hws_matcher *matcher, +- struct mlx5hws_action_template *at, +- bool *need_rehash); ++ struct mlx5hws_action_template *at); + + /** + * mlx5hws_matcher_resize_set_target - Link two matchers and enable moving rules. 
+diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.c +index 5b758467ed03..9e6f35d68445 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.c +@@ -203,7 +203,7 @@ static int mlx5hws_rule_alloc_action_ste(struct mlx5hws_rule *rule, + struct mlx5hws_context *ctx = matcher->tbl->ctx; + + rule->action_ste.ste.order = +- ilog2(roundup_pow_of_two(matcher->action_ste.max_stes)); ++ ilog2(roundup_pow_of_two(matcher->num_of_action_stes)); + return mlx5hws_action_ste_chunk_alloc(&ctx->action_ste_pool[queue_id], + skip_rx, skip_tx, + &rule->action_ste); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1356-net-mlx5-hws-free-unused-action-ste-tables.patch b/SOURCES/1356-net-mlx5-hws-free-unused-action-ste-tables.patch new file mode 100644 index 000000000..2765fcc4c --- /dev/null +++ b/SOURCES/1356-net-mlx5-hws-free-unused-action-ste-tables.patch @@ -0,0 +1,254 @@ +From b2936ce02b8545dd8b6b4bc1a135ba7d19d63488 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:00 -0400 +Subject: [PATCH] net/mlx5: HWS, Free unused action STE tables + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 864531ca2072c55ff00ba9dfd8c15cf0f576051b +Author: Vlad Dogaru +Date: Thu Apr 10 22:17:41 2025 +0300 + + net/mlx5: HWS, Free unused action STE tables + + Periodically check for unused action STE tables and free their + associated resources. In order to do this safely, add a per-queue lock + to synchronize the garbage collect work with regular operations on + steering rules. 
+ + Signed-off-by: Vlad Dogaru + Reviewed-by: Yevgeny Kliteynik + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Reviewed-by: Michal Kubiak + Link: https://patch.msgid.link/1744312662-356571-12-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action_ste_pool.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action_ste_pool.c +index cb6ad8411631..5766a9c82f96 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action_ste_pool.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action_ste_pool.c +@@ -159,6 +159,7 @@ hws_action_ste_table_alloc(struct mlx5hws_action_ste_pool_element *parent_elem) + + action_tbl->parent_elem = parent_elem; + INIT_LIST_HEAD(&action_tbl->list_node); ++ action_tbl->last_used = jiffies; + list_add(&action_tbl->list_node, &parent_elem->available); + parent_elem->log_sz = log_sz; + +@@ -236,6 +237,8 @@ static int hws_action_ste_pool_init(struct mlx5hws_context *ctx, + enum mlx5hws_pool_optimize opt; + int err; + ++ mutex_init(&pool->lock); ++ + /* Rules which are added for both RX and TX must use the same action STE + * indices for both. If we were to use a single table, then RX-only and + * TX-only rules would waste the unused entries. 
Thus, we use separate +@@ -247,6 +250,7 @@ static int hws_action_ste_pool_init(struct mlx5hws_context *ctx, + opt); + if (err) + goto destroy_elems; ++ pool->elems[opt].parent_pool = pool; + } + + return 0; +@@ -267,6 +271,58 @@ static void hws_action_ste_pool_destroy(struct mlx5hws_action_ste_pool *pool) + hws_action_ste_pool_element_destroy(&pool->elems[opt]); + } + ++static void hws_action_ste_pool_element_collect_stale( ++ struct mlx5hws_action_ste_pool_element *elem, struct list_head *cleanup) ++{ ++ struct mlx5hws_action_ste_table *action_tbl, *p; ++ unsigned long expire_time, now; ++ ++ expire_time = secs_to_jiffies(MLX5HWS_ACTION_STE_POOL_EXPIRE_SECONDS); ++ now = jiffies; ++ ++ list_for_each_entry_safe(action_tbl, p, &elem->available, list_node) { ++ if (mlx5hws_pool_full(action_tbl->pool) && ++ time_before(action_tbl->last_used + expire_time, now)) ++ list_move(&action_tbl->list_node, cleanup); ++ } ++} ++ ++static void hws_action_ste_table_cleanup_list(struct list_head *cleanup) ++{ ++ struct mlx5hws_action_ste_table *action_tbl, *p; ++ ++ list_for_each_entry_safe(action_tbl, p, cleanup, list_node) ++ hws_action_ste_table_destroy(action_tbl); ++} ++ ++static void hws_action_ste_pool_cleanup(struct work_struct *work) ++{ ++ enum mlx5hws_pool_optimize opt; ++ struct mlx5hws_context *ctx; ++ LIST_HEAD(cleanup); ++ int i; ++ ++ ctx = container_of(work, struct mlx5hws_context, ++ action_ste_cleanup.work); ++ ++ for (i = 0; i < ctx->queues; i++) { ++ struct mlx5hws_action_ste_pool *p = &ctx->action_ste_pool[i]; ++ ++ mutex_lock(&p->lock); ++ for (opt = MLX5HWS_POOL_OPTIMIZE_NONE; ++ opt < MLX5HWS_POOL_OPTIMIZE_MAX; opt++) ++ hws_action_ste_pool_element_collect_stale( ++ &p->elems[opt], &cleanup); ++ mutex_unlock(&p->lock); ++ } ++ ++ hws_action_ste_table_cleanup_list(&cleanup); ++ ++ schedule_delayed_work(&ctx->action_ste_cleanup, ++ secs_to_jiffies( ++ MLX5HWS_ACTION_STE_POOL_CLEANUP_SECONDS)); ++} ++ + int mlx5hws_action_ste_pool_init(struct mlx5hws_context 
*ctx) + { + struct mlx5hws_action_ste_pool *pool; +@@ -285,6 +341,12 @@ int mlx5hws_action_ste_pool_init(struct mlx5hws_context *ctx) + + ctx->action_ste_pool = pool; + ++ INIT_DELAYED_WORK(&ctx->action_ste_cleanup, ++ hws_action_ste_pool_cleanup); ++ schedule_delayed_work( ++ &ctx->action_ste_cleanup, ++ secs_to_jiffies(MLX5HWS_ACTION_STE_POOL_CLEANUP_SECONDS)); ++ + return 0; + + free_pool: +@@ -300,6 +362,8 @@ void mlx5hws_action_ste_pool_uninit(struct mlx5hws_context *ctx) + size_t queues = ctx->queues; + int i; + ++ cancel_delayed_work_sync(&ctx->action_ste_cleanup); ++ + for (i = 0; i < queues; i++) + hws_action_ste_pool_destroy(&ctx->action_ste_pool[i]); + +@@ -330,6 +394,7 @@ hws_action_ste_table_chunk_alloc(struct mlx5hws_action_ste_table *action_tbl, + return err; + + chunk->action_tbl = action_tbl; ++ action_tbl->last_used = jiffies; + + return 0; + } +@@ -346,6 +411,8 @@ int mlx5hws_action_ste_chunk_alloc(struct mlx5hws_action_ste_pool *pool, + if (skip_rx && skip_tx) + return -EINVAL; + ++ mutex_lock(&pool->lock); ++ + elem = hws_action_ste_choose_elem(pool, skip_rx, skip_tx); + + mlx5hws_dbg(elem->ctx, +@@ -362,26 +429,39 @@ int mlx5hws_action_ste_chunk_alloc(struct mlx5hws_action_ste_pool *pool, + + if (!found) { + action_tbl = hws_action_ste_table_alloc(elem); +- if (IS_ERR(action_tbl)) +- return PTR_ERR(action_tbl); ++ if (IS_ERR(action_tbl)) { ++ err = PTR_ERR(action_tbl); ++ goto out; ++ } + + err = hws_action_ste_table_chunk_alloc(action_tbl, chunk); + if (err) +- return err; ++ goto out; + } + + if (mlx5hws_pool_empty(action_tbl->pool)) + list_move(&action_tbl->list_node, &elem->full); + +- return 0; ++ err = 0; ++ ++out: ++ mutex_unlock(&pool->lock); ++ ++ return err; + } + + void mlx5hws_action_ste_chunk_free(struct mlx5hws_action_ste_chunk *chunk) + { ++ struct mutex *lock = &chunk->action_tbl->parent_elem->parent_pool->lock; ++ + mlx5hws_dbg(chunk->action_tbl->pool->ctx, + "Freeing action STEs offset %d order %d\n", + chunk->ste.offset, 
chunk->ste.order); ++ ++ mutex_lock(lock); + mlx5hws_pool_chunk_free(chunk->action_tbl->pool, &chunk->ste); ++ chunk->action_tbl->last_used = jiffies; + list_move(&chunk->action_tbl->list_node, + &chunk->action_tbl->parent_elem->available); ++ mutex_unlock(lock); + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action_ste_pool.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action_ste_pool.h +index 2de660a63223..a8ba97359e31 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action_ste_pool.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action_ste_pool.h +@@ -8,6 +8,9 @@ + #define MLX5HWS_ACTION_STE_TABLE_STEP_LOG_SZ 1 + #define MLX5HWS_ACTION_STE_TABLE_MAX_LOG_SZ 20 + ++#define MLX5HWS_ACTION_STE_POOL_CLEANUP_SECONDS 300 ++#define MLX5HWS_ACTION_STE_POOL_EXPIRE_SECONDS 300 ++ + struct mlx5hws_action_ste_pool_element; + + struct mlx5hws_action_ste_table { +@@ -19,10 +22,12 @@ struct mlx5hws_action_ste_table { + u32 rtc_0_id; + u32 rtc_1_id; + struct list_head list_node; ++ unsigned long last_used; + }; + + struct mlx5hws_action_ste_pool_element { + struct mlx5hws_context *ctx; ++ struct mlx5hws_action_ste_pool *parent_pool; + size_t log_sz; /* Size of the largest table so far. */ + enum mlx5hws_pool_optimize opt; + struct list_head available; +@@ -33,6 +38,12 @@ struct mlx5hws_action_ste_pool_element { + * per queue. + */ + struct mlx5hws_action_ste_pool { ++ /* Protects the entire pool. We have one pool per queue and only one ++ * operation can be active per rule at a given time. Thus this lock ++ * protects solely against concurrent garbage collection and we expect ++ * very little contention. 
++ */ ++ struct mutex lock; + struct mlx5hws_action_ste_pool_element elems[MLX5HWS_POOL_OPTIMIZE_MAX]; + }; + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/context.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/context.h +index e987e93bbc6e..3f8938c73dc0 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/context.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/context.h +@@ -40,6 +40,7 @@ struct mlx5hws_context { + u32 pd_num; + struct mlx5hws_pool *stc_pool; + struct mlx5hws_action_ste_pool *action_ste_pool; /* One per queue */ ++ struct delayed_work action_ste_cleanup; + struct mlx5hws_context_common_res common_res; + struct mlx5hws_pattern_cache *pattern_cache; + struct mlx5hws_definer_cache *definer_cache; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1357-net-mlx5-hws-export-action-ste-tables-to-debugfs.patch b/SOURCES/1357-net-mlx5-hws-export-action-ste-tables-to-debugfs.patch new file mode 100644 index 000000000..33798c861 --- /dev/null +++ b/SOURCES/1357-net-mlx5-hws-export-action-ste-tables-to-debugfs.patch @@ -0,0 +1,99 @@ +From d1985c5a5885ee6fa478036997488534d181983c Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:01 -0400 +Subject: [PATCH] net/mlx5: HWS, Export action STE tables to debugfs + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 3db55f8cc8d329a97e06fb44347b64a0ca44e780 +Author: Vlad Dogaru +Date: Thu Apr 10 22:17:42 2025 +0300 + + net/mlx5: HWS, Export action STE tables to debugfs + + Introduce a new type of dump object and dump all action STE tables, + along with information on their RTCs and STEs. 
+ + Signed-off-by: Vlad Dogaru + Reviewed-by: Hamdan Agbariya + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Reviewed-by: Michal Kubiak + Link: https://patch.msgid.link/1744312662-356571-13-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c +index 38f75dec9cfc..91568d6c1dac 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c +@@ -387,10 +387,41 @@ static int hws_debug_dump_context_stc(struct seq_file *f, struct mlx5hws_context + return 0; + } + ++static void ++hws_debug_dump_action_ste_table(struct seq_file *f, ++ struct mlx5hws_action_ste_table *action_tbl) ++{ ++ int ste_0_id = mlx5hws_pool_get_base_id(action_tbl->pool); ++ int ste_1_id = mlx5hws_pool_get_base_mirror_id(action_tbl->pool); ++ ++ seq_printf(f, "%d,0x%llx,%d,%d,%d,%d\n", ++ MLX5HWS_DEBUG_RES_TYPE_ACTION_STE_TABLE, ++ HWS_PTR_TO_ID(action_tbl), ++ action_tbl->rtc_0_id, ste_0_id, ++ action_tbl->rtc_1_id, ste_1_id); ++} ++ ++static void hws_debug_dump_action_ste_pool(struct seq_file *f, ++ struct mlx5hws_action_ste_pool *pool) ++{ ++ struct mlx5hws_action_ste_table *action_tbl; ++ enum mlx5hws_pool_optimize opt; ++ ++ mutex_lock(&pool->lock); ++ for (opt = MLX5HWS_POOL_OPTIMIZE_NONE; opt < MLX5HWS_POOL_OPTIMIZE_MAX; ++ opt++) { ++ list_for_each_entry(action_tbl, &pool->elems[opt].available, ++ list_node) { ++ hws_debug_dump_action_ste_table(f, action_tbl); ++ } ++ } ++ mutex_unlock(&pool->lock); ++} ++ + static int hws_debug_dump_context(struct seq_file *f, struct mlx5hws_context *ctx) + { + struct mlx5hws_table *tbl; +- int ret; ++ int ret, i; + + ret = hws_debug_dump_context_info(f, ctx); + if (ret) +@@ -410,6 +441,9 @@ static int hws_debug_dump_context(struct seq_file *f, struct mlx5hws_context *ct + return ret; + } + 
++ for (i = 0; i < ctx->queues; i++) ++ hws_debug_dump_action_ste_pool(f, &ctx->action_ste_pool[i]); ++ + return 0; + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.h +index e44e7ae28f93..89c396f9f266 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.h +@@ -26,6 +26,8 @@ enum mlx5hws_debug_res_type { + MLX5HWS_DEBUG_RES_TYPE_MATCHER_TEMPLATE_HASH_DEFINER = 4205, + MLX5HWS_DEBUG_RES_TYPE_MATCHER_TEMPLATE_RANGE_DEFINER = 4206, + MLX5HWS_DEBUG_RES_TYPE_MATCHER_TEMPLATE_COMPARE_MATCH_DEFINER = 4207, ++ ++ MLX5HWS_DEBUG_RES_TYPE_ACTION_STE_TABLE = 4300, + }; + + static inline u64 +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1358-net-mlx5e-ethtool-fix-formatting-of-ptp-rq0-csum-complete-ta.patch b/SOURCES/1358-net-mlx5e-ethtool-fix-formatting-of-ptp-rq0-csum-complete-ta.patch new file mode 100644 index 000000000..9cc47baaf --- /dev/null +++ b/SOURCES/1358-net-mlx5e-ethtool-fix-formatting-of-ptp-rq0-csum-complete-ta.patch @@ -0,0 +1,100 @@ +From e0fb28731ba130ec0f45f724aa5c27a9d49f363d Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:01 -0400 +Subject: [PATCH] net/mlx5e: ethtool: Fix formatting of + ptp_rq0_csum_complete_tail_slow + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit cfba1d1b61ae3f32e4bc06e9860711a4488d98b7 +Author: Kees Cook +Date: Tue Apr 15 19:01:14 2025 -0700 + + net/mlx5e: ethtool: Fix formatting of ptp_rq0_csum_complete_tail_slow + + The new GCC 15 warning -Wunterminated-string-initialization reports: + + In file included from drivers/net/ethernet/mellanox/mlx5/core/en.h:55, + from drivers/net/ethernet/mellanox/mlx5/core/en_stats.c:34: + drivers/net/ethernet/mellanox/mlx5/core/en_stats.h:57:46: warning: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (33 chars into 32 
available) [-Wunterminated-string-initialization] + 57 | #define MLX5E_DECLARE_PTP_RQ_STAT(type, fld) "ptp_rq%d_"#fld, offsetof(type, fld) + | ^~~~~~~~~~~ + drivers/net/ethernet/mellanox/mlx5/core/en_stats.c:2279:11: note: in expansion of macro 'MLX5E_DECLARE_PTP_RQ_STAT' + 2279 | { MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, csum_complete_tail_slow) }, + | ^~~~~~~~~~~~~~~~~~~~~~~~~ + + This stat string is being used in ethtool_sprintf(), so it must be a + valid NUL-terminated string. Currently the string lacks the final NUL + byte (as GCC warns), but by absolute luck, the next byte in memory is a + space (decimal 32) followed by a NUL. "format" is immediately followed + by little-endian size_t: + + struct counter_desc { + char format[32]; /* 0 32 */ + size_t offset; /* 32 8 */ + }; + + The "offset" member is populated by the stats member offset: + + #define MLX5E_DECLARE_PTP_RQ_STAT(type, fld) "ptp_rq%d_"#fld, offsetof(type, fld) + + which for this struct mlx5e_rq_stats member, csum_complete_tail_slow, is + 32, or space, and then the rest of the "offset" bytes are NULs. + + struct mlx5e_rq_stats { + ... + u64 csum_complete_tail_slow; /* 32 8 */ + + The use of vsnprintf(), within ethtool_sprintf(), reads past the end of + "format" and sees the format string as "ptp_rq%d_csum_complete_tail_slow ", + with %d getting resolved by MLX5E_PTP_CHANNEL_IX (value 0): + + ethtool_sprintf(data, ptp_rq_stats_desc[i].format, + MLX5E_PTP_CHANNEL_IX); + + With an output result of "ptp_rq0_csum_complete_tail_slow", which gets + precisely truncated to 31 characters with a trailing NUL. + + So, instead of accidentally getting this correct due to the NUL bytes + at the end of the size_t that happens to follow the format string, just + make the string initializer 1 byte shorter by replacing "%d" with "0", + since MLX5E_PTP_CHANNEL_IX is already hard-coded. This results in no + initializer truncation and no need to call sprintf(). 
+ + Signed-off-by: Kees Cook + Reviewed-by: Dragos Tatulea + Link: https://patch.msgid.link/20250416020109.work.297-kees@kernel.org + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c +index 1c121b435016..19664fa7f217 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c +@@ -2424,8 +2424,7 @@ static MLX5E_DECLARE_STATS_GRP_OP_FILL_STRS(ptp) + } + if (priv->rx_ptp_opened) { + for (i = 0; i < NUM_PTP_RQ_STATS; i++) +- ethtool_sprintf(data, ptp_rq_stats_desc[i].format, +- MLX5E_PTP_CHANNEL_IX); ++ ethtool_puts(data, ptp_rq_stats_desc[i].format); + } + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h +index 8de6fcbd3a03..def5dea1463d 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h +@@ -54,7 +54,7 @@ + #define MLX5E_DECLARE_PTP_TX_STAT(type, fld) "ptp_tx%d_"#fld, offsetof(type, fld) + #define MLX5E_DECLARE_PTP_CH_STAT(type, fld) "ptp_ch_"#fld, offsetof(type, fld) + #define MLX5E_DECLARE_PTP_CQ_STAT(type, fld) "ptp_cq%d_"#fld, offsetof(type, fld) +-#define MLX5E_DECLARE_PTP_RQ_STAT(type, fld) "ptp_rq%d_"#fld, offsetof(type, fld) ++#define MLX5E_DECLARE_PTP_RQ_STAT(type, fld) "ptp_rq0_"#fld, offsetof(type, fld) + + #define MLX5E_DECLARE_QOS_TX_STAT(type, fld) "qos_tx%d_"#fld, offsetof(type, fld) + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1359-net-mlx5-fix-spelling-mistakes-in-mlx5-core-dbg-message-and-.patch b/SOURCES/1359-net-mlx5-fix-spelling-mistakes-in-mlx5-core-dbg-message-and-.patch new file mode 100644 index 000000000..16142c908 --- /dev/null +++ b/SOURCES/1359-net-mlx5-fix-spelling-mistakes-in-mlx5-core-dbg-message-and-.patch @@ -0,0 +1,58 @@ +From 3a54d4daa0cfe0a44c2f8cdc68d5b0c8b277a990 Mon Sep 17 00:00:00 2001 
+From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:01 -0400 +Subject: [PATCH] net/mlx5: Fix spelling mistakes in mlx5_core_dbg message and + comments + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 1e36473215297708dbe144c65b9f242c6e604520 +Author: Colin Ian King +Date: Fri Apr 18 14:57:03 2025 +0100 + + net/mlx5: Fix spelling mistakes in mlx5_core_dbg message and comments + + There is a spelling mistake in a mlx5_core_dbg and two spelling mistakes + in comment blocks. Fix them. + + Signed-off-by: Colin Ian King + Acked-by: Mark Bloch + Link: https://patch.msgid.link/20250418135703.542722-1-colin.i.king@gmail.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c +index 2c5f850c31f6..40024cfa3099 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c +@@ -148,7 +148,7 @@ int mlx5_set_msix_vec_count(struct mlx5_core_dev *dev, int function_id, + * Free the IRQ and other resources such as rmap from the system. + * BUT doesn't free or remove reference from mlx5. + * This function is very important for the shutdown flow, where we need to +- * cleanup system resoruces but keep mlx5 objects alive, ++ * cleanup system resources but keep mlx5 objects alive, + * see mlx5_irq_table_free_irqs(). + */ + static void mlx5_system_free_irq(struct mlx5_irq *irq) +@@ -588,7 +588,7 @@ static void irq_pool_free(struct mlx5_irq_pool *pool) + struct mlx5_irq *irq; + unsigned long index; + +- /* There are cases in which we are destrying the irq_table before ++ /* There are cases in which we are destroying the irq_table before + * freeing all the IRQs, fast teardown for example. Hence, free the irqs + * which might not have been freed. 
+ */ +@@ -617,7 +617,7 @@ static int irq_pools_init(struct mlx5_core_dev *dev, int sf_vec, int pcif_vec, + if (!mlx5_sf_max_functions(dev)) + return 0; + if (sf_vec < MLX5_IRQ_VEC_COMP_BASE_SF) { +- mlx5_core_dbg(dev, "Not enught IRQs for SFs. SF may run at lower performance\n"); ++ mlx5_core_dbg(dev, "Not enough IRQs for SFs. SF may run at lower performance\n"); + return 0; + } + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1360-net-mlx5-hws-fix-ip-version-decision.patch b/SOURCES/1360-net-mlx5-hws-fix-ip-version-decision.patch new file mode 100644 index 000000000..90bcbf77b --- /dev/null +++ b/SOURCES/1360-net-mlx5-hws-fix-ip-version-decision.patch @@ -0,0 +1,136 @@ +From 988625f598a4722c53b34743d5ddef5d48a46a20 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:02 -0400 +Subject: [PATCH] net/mlx5: HWS, Fix IP version decision + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 5f2f8d8b6800e4fc760c2eccec9b2bd2cacf80cf +Author: Vlad Dogaru +Date: Tue Apr 22 12:25:38 2025 +0300 + + net/mlx5: HWS, Fix IP version decision + + Unify the check for IP version when creating a definer. A given matcher + is deemed to match on IPv6 if any of the higher order (>31) bits of + source or destination address mask are set. + + A single packet cannot mix IP versions between source and destination + addresses, so it makes no sense that they would be decided on + independently. 
+ + Signed-off-by: Vlad Dogaru + Reviewed-by: Yevgeny Kliteynik + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250422092540.182091-2-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c +index c8cc0c8115f5..5257e706dde2 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c +@@ -509,9 +509,9 @@ static int + hws_definer_conv_outer(struct mlx5hws_definer_conv_data *cd, + u32 *match_param) + { +- bool is_s_ipv6, is_d_ipv6, smac_set, dmac_set; + struct mlx5hws_definer_fc *fc = cd->fc; + struct mlx5hws_definer_fc *curr_fc; ++ bool is_ipv6, smac_set, dmac_set; + u32 *s_ipv6, *d_ipv6; + + if (HWS_IS_FLD_SET_SZ(match_param, outer_headers.l4_type, 0x2) || +@@ -570,10 +570,10 @@ hws_definer_conv_outer(struct mlx5hws_definer_conv_data *cd, + outer_headers.dst_ipv4_dst_ipv6.ipv6_layout); + + /* Assume IPv6 is used if ipv6 bits are set */ +- is_s_ipv6 = s_ipv6[0] || s_ipv6[1] || s_ipv6[2]; +- is_d_ipv6 = d_ipv6[0] || d_ipv6[1] || d_ipv6[2]; ++ is_ipv6 = s_ipv6[0] || s_ipv6[1] || s_ipv6[2] || ++ d_ipv6[0] || d_ipv6[1] || d_ipv6[2]; + +- if (is_s_ipv6) { ++ if (is_ipv6) { + /* Handle IPv6 source address */ + HWS_SET_HDR(fc, match_param, IPV6_SRC_127_96_O, + outer_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_127_96, +@@ -587,13 +587,6 @@ hws_definer_conv_outer(struct mlx5hws_definer_conv_data *cd, + HWS_SET_HDR(fc, match_param, IPV6_SRC_31_0_O, + outer_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_31_0, + ipv6_src_outer.ipv6_address_31_0); +- } else { +- /* Handle IPv4 source address */ +- HWS_SET_HDR(fc, match_param, IPV4_SRC_O, +- outer_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_31_0, +- ipv4_src_dest_outer.source_address); +- } +- if (is_d_ipv6) { + /* Handle IPv6 destination address */ + 
HWS_SET_HDR(fc, match_param, IPV6_DST_127_96_O, + outer_headers.dst_ipv4_dst_ipv6.ipv6_simple_layout.ipv6_127_96, +@@ -608,6 +601,10 @@ hws_definer_conv_outer(struct mlx5hws_definer_conv_data *cd, + outer_headers.dst_ipv4_dst_ipv6.ipv6_simple_layout.ipv6_31_0, + ipv6_dst_outer.ipv6_address_31_0); + } else { ++ /* Handle IPv4 source address */ ++ HWS_SET_HDR(fc, match_param, IPV4_SRC_O, ++ outer_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_31_0, ++ ipv4_src_dest_outer.source_address); + /* Handle IPv4 destination address */ + HWS_SET_HDR(fc, match_param, IPV4_DST_O, + outer_headers.dst_ipv4_dst_ipv6.ipv6_simple_layout.ipv6_31_0, +@@ -665,9 +662,9 @@ static int + hws_definer_conv_inner(struct mlx5hws_definer_conv_data *cd, + u32 *match_param) + { +- bool is_s_ipv6, is_d_ipv6, smac_set, dmac_set; + struct mlx5hws_definer_fc *fc = cd->fc; + struct mlx5hws_definer_fc *curr_fc; ++ bool is_ipv6, smac_set, dmac_set; + u32 *s_ipv6, *d_ipv6; + + if (HWS_IS_FLD_SET_SZ(match_param, inner_headers.l4_type, 0x2) || +@@ -728,10 +725,10 @@ hws_definer_conv_inner(struct mlx5hws_definer_conv_data *cd, + inner_headers.dst_ipv4_dst_ipv6.ipv6_layout); + + /* Assume IPv6 is used if ipv6 bits are set */ +- is_s_ipv6 = s_ipv6[0] || s_ipv6[1] || s_ipv6[2]; +- is_d_ipv6 = d_ipv6[0] || d_ipv6[1] || d_ipv6[2]; ++ is_ipv6 = s_ipv6[0] || s_ipv6[1] || s_ipv6[2] || ++ d_ipv6[0] || d_ipv6[1] || d_ipv6[2]; + +- if (is_s_ipv6) { ++ if (is_ipv6) { + /* Handle IPv6 source address */ + HWS_SET_HDR(fc, match_param, IPV6_SRC_127_96_I, + inner_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_127_96, +@@ -745,13 +742,6 @@ hws_definer_conv_inner(struct mlx5hws_definer_conv_data *cd, + HWS_SET_HDR(fc, match_param, IPV6_SRC_31_0_I, + inner_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_31_0, + ipv6_src_inner.ipv6_address_31_0); +- } else { +- /* Handle IPv4 source address */ +- HWS_SET_HDR(fc, match_param, IPV4_SRC_I, +- inner_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_31_0, +- 
ipv4_src_dest_inner.source_address); +- } +- if (is_d_ipv6) { + /* Handle IPv6 destination address */ + HWS_SET_HDR(fc, match_param, IPV6_DST_127_96_I, + inner_headers.dst_ipv4_dst_ipv6.ipv6_simple_layout.ipv6_127_96, +@@ -766,6 +756,10 @@ hws_definer_conv_inner(struct mlx5hws_definer_conv_data *cd, + inner_headers.dst_ipv4_dst_ipv6.ipv6_simple_layout.ipv6_31_0, + ipv6_dst_inner.ipv6_address_31_0); + } else { ++ /* Handle IPv4 source address */ ++ HWS_SET_HDR(fc, match_param, IPV4_SRC_I, ++ inner_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_31_0, ++ ipv4_src_dest_inner.source_address); + /* Handle IPv4 destination address */ + HWS_SET_HDR(fc, match_param, IPV4_DST_I, + inner_headers.dst_ipv4_dst_ipv6.ipv6_simple_layout.ipv6_31_0, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1361-net-mlx5-hws-harden-ip-version-definer-checks.patch b/SOURCES/1361-net-mlx5-hws-harden-ip-version-definer-checks.patch new file mode 100644 index 000000000..2ca5aecb8 --- /dev/null +++ b/SOURCES/1361-net-mlx5-hws-harden-ip-version-definer-checks.patch @@ -0,0 +1,127 @@ +From 024b08ee6e9f4a7d00dcdbde0e76f34bebc32c27 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:02 -0400 +Subject: [PATCH] net/mlx5: HWS, Harden IP version definer checks + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 6991a975e416154576b0f5f06256aec13e23b0a7 +Author: Vlad Dogaru +Date: Tue Apr 22 12:25:39 2025 +0300 + + net/mlx5: HWS, Harden IP version definer checks + + Replicate some sanity checks that firmware does, since hardware steering + does not go through firmware. + + When creating a definer, disallow matching on IP addresses without also + matching on IP version. The latter can be satisfied by matching either + on the version field in the IP header, or on the ethertype field. + + Also refuse to match IPv4 IHL alongside IPv6. 
+ + Signed-off-by: Vlad Dogaru + Reviewed-by: Yevgeny Kliteynik + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250422092540.182091-3-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c +index 5257e706dde2..1061a46811ac 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c +@@ -509,9 +509,9 @@ static int + hws_definer_conv_outer(struct mlx5hws_definer_conv_data *cd, + u32 *match_param) + { ++ bool is_ipv6, smac_set, dmac_set, ip_addr_set, ip_ver_set; + struct mlx5hws_definer_fc *fc = cd->fc; + struct mlx5hws_definer_fc *curr_fc; +- bool is_ipv6, smac_set, dmac_set; + u32 *s_ipv6, *d_ipv6; + + if (HWS_IS_FLD_SET_SZ(match_param, outer_headers.l4_type, 0x2) || +@@ -521,6 +521,20 @@ hws_definer_conv_outer(struct mlx5hws_definer_conv_data *cd, + return -EINVAL; + } + ++ ip_addr_set = HWS_IS_FLD_SET_SZ(match_param, ++ outer_headers.src_ipv4_src_ipv6, ++ 0x80) || ++ HWS_IS_FLD_SET_SZ(match_param, ++ outer_headers.dst_ipv4_dst_ipv6, 0x80); ++ ip_ver_set = HWS_IS_FLD_SET(match_param, outer_headers.ip_version) || ++ HWS_IS_FLD_SET(match_param, outer_headers.ethertype); ++ ++ if (ip_addr_set && !ip_ver_set) { ++ mlx5hws_err(cd->ctx, ++ "Unsupported match on IP address without version or ethertype\n"); ++ return -EINVAL; ++ } ++ + /* L2 Check ethertype */ + HWS_SET_HDR(fc, match_param, ETH_TYPE_O, + outer_headers.ethertype, +@@ -573,6 +587,12 @@ hws_definer_conv_outer(struct mlx5hws_definer_conv_data *cd, + is_ipv6 = s_ipv6[0] || s_ipv6[1] || s_ipv6[2] || + d_ipv6[0] || d_ipv6[1] || d_ipv6[2]; + ++ /* IHL is an IPv4-specific field. 
*/ ++ if (is_ipv6 && HWS_IS_FLD_SET(match_param, outer_headers.ipv4_ihl)) { ++ mlx5hws_err(cd->ctx, "Unsupported match on IPv6 address and IPv4 IHL\n"); ++ return -EINVAL; ++ } ++ + if (is_ipv6) { + /* Handle IPv6 source address */ + HWS_SET_HDR(fc, match_param, IPV6_SRC_127_96_O, +@@ -662,9 +682,9 @@ static int + hws_definer_conv_inner(struct mlx5hws_definer_conv_data *cd, + u32 *match_param) + { ++ bool is_ipv6, smac_set, dmac_set, ip_addr_set, ip_ver_set; + struct mlx5hws_definer_fc *fc = cd->fc; + struct mlx5hws_definer_fc *curr_fc; +- bool is_ipv6, smac_set, dmac_set; + u32 *s_ipv6, *d_ipv6; + + if (HWS_IS_FLD_SET_SZ(match_param, inner_headers.l4_type, 0x2) || +@@ -674,6 +694,20 @@ hws_definer_conv_inner(struct mlx5hws_definer_conv_data *cd, + return -EINVAL; + } + ++ ip_addr_set = HWS_IS_FLD_SET_SZ(match_param, ++ inner_headers.src_ipv4_src_ipv6, ++ 0x80) || ++ HWS_IS_FLD_SET_SZ(match_param, ++ inner_headers.dst_ipv4_dst_ipv6, 0x80); ++ ip_ver_set = HWS_IS_FLD_SET(match_param, inner_headers.ip_version) || ++ HWS_IS_FLD_SET(match_param, inner_headers.ethertype); ++ ++ if (ip_addr_set && !ip_ver_set) { ++ mlx5hws_err(cd->ctx, ++ "Unsupported match on IP address without version or ethertype\n"); ++ return -EINVAL; ++ } ++ + /* L2 Check ethertype */ + HWS_SET_HDR(fc, match_param, ETH_TYPE_I, + inner_headers.ethertype, +@@ -728,6 +762,12 @@ hws_definer_conv_inner(struct mlx5hws_definer_conv_data *cd, + is_ipv6 = s_ipv6[0] || s_ipv6[1] || s_ipv6[2] || + d_ipv6[0] || d_ipv6[1] || d_ipv6[2]; + ++ /* IHL is an IPv4-specific field. 
*/ ++ if (is_ipv6 && HWS_IS_FLD_SET(match_param, inner_headers.ipv4_ihl)) { ++ mlx5hws_err(cd->ctx, "Unsupported match on IPv6 address and IPv4 IHL\n"); ++ return -EINVAL; ++ } ++ + if (is_ipv6) { + /* Handle IPv6 source address */ + HWS_SET_HDR(fc, match_param, IPV6_SRC_127_96_I, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1362-net-mlx5-hws-disallow-matcher-ip-version-mixing.patch b/SOURCES/1362-net-mlx5-hws-disallow-matcher-ip-version-mixing.patch new file mode 100644 index 000000000..99e189a7c --- /dev/null +++ b/SOURCES/1362-net-mlx5-hws-disallow-matcher-ip-version-mixing.patch @@ -0,0 +1,256 @@ +From c0c4826b9a633587cd14595358b62c40a3672204 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:02 -0400 +Subject: [PATCH] net/mlx5: HWS, Disallow matcher IP version mixing + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit f41f3edf0b15d7ce0b0f71c00a6125e8d7ca735f +Author: Vlad Dogaru +Date: Tue Apr 22 12:25:40 2025 +0300 + + net/mlx5: HWS, Disallow matcher IP version mixing + + Signal clearly to the user, via an error, that mixing IPv4 and IPv6 + rules in the same matcher is not supported. Previously such cases + silently failed by adding a rule that did not work correctly. + + Rules can specify an IP version by one of two fields: IP version or + ethertype. At matcher creation, store whether the template matches on + any of these two fields. If yes, inspect each rule for its corresponding + match value and store the IP version inside the matcher to guard against + inconsistencies with subsequent rules. + + Furthermore, also check rules for internal consistency, i.e. verify that + the ethertype and IP version match values do not contradict each other. + + The logic applies to inner and outer headers independently, to account + for tunneling. + + Rules that do not match on IP addresses are not affected. 
+ + Signed-off-by: Vlad Dogaru + Reviewed-by: Yevgeny Kliteynik + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250422092540.182091-4-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c +index 716502732d3d..5b0c1623499b 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c +@@ -385,6 +385,30 @@ static int hws_matcher_bind_at(struct mlx5hws_matcher *matcher) + return 0; + } + ++static void hws_matcher_set_ip_version_match(struct mlx5hws_matcher *matcher) ++{ ++ int i; ++ ++ for (i = 0; i < matcher->mt->fc_sz; i++) { ++ switch (matcher->mt->fc[i].fname) { ++ case MLX5HWS_DEFINER_FNAME_ETH_TYPE_O: ++ matcher->matches_outer_ethertype = 1; ++ break; ++ case MLX5HWS_DEFINER_FNAME_ETH_L3_TYPE_O: ++ matcher->matches_outer_ip_version = 1; ++ break; ++ case MLX5HWS_DEFINER_FNAME_ETH_TYPE_I: ++ matcher->matches_inner_ethertype = 1; ++ break; ++ case MLX5HWS_DEFINER_FNAME_ETH_L3_TYPE_I: ++ matcher->matches_inner_ip_version = 1; ++ break; ++ default: ++ break; ++ } ++ } ++} ++ + static int hws_matcher_bind_mt(struct mlx5hws_matcher *matcher) + { + struct mlx5hws_context *ctx = matcher->tbl->ctx; +@@ -401,6 +425,8 @@ static int hws_matcher_bind_mt(struct mlx5hws_matcher *matcher) + } + } + ++ hws_matcher_set_ip_version_match(matcher); ++ + /* Create an STE pool per matcher*/ + pool_attr.table_type = matcher->tbl->type; + pool_attr.pool_type = MLX5HWS_POOL_TYPE_STE; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h +index bad1fa8f77fd..8e95158a66b5 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h +@@ -50,6 +50,12 @@ 
struct mlx5hws_matcher_match_ste { + struct mlx5hws_pool *pool; + }; + ++enum { ++ MLX5HWS_MATCHER_IPV_UNSET = 0, ++ MLX5HWS_MATCHER_IPV_4 = 1, ++ MLX5HWS_MATCHER_IPV_6 = 2, ++}; ++ + struct mlx5hws_matcher { + struct mlx5hws_table *tbl; + struct mlx5hws_matcher_attr attr; +@@ -61,6 +67,12 @@ struct mlx5hws_matcher { + u8 num_of_action_stes; + /* enum mlx5hws_matcher_flags */ + u8 flags; ++ u8 matches_outer_ethertype:1; ++ u8 matches_outer_ip_version:1; ++ u8 matches_inner_ethertype:1; ++ u8 matches_inner_ip_version:1; ++ u8 outer_ip_version:2; ++ u8 inner_ip_version:2; + u32 end_ft_id; + struct mlx5hws_matcher *col_matcher; + struct mlx5hws_matcher *resize_dst; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.c +index 9e6f35d68445..5342a4cc7194 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.c +@@ -655,6 +655,124 @@ int mlx5hws_rule_move_hws_add(struct mlx5hws_rule *rule, + return 0; + } + ++static u8 hws_rule_ethertype_to_matcher_ipv(u32 ethertype) ++{ ++ switch (ethertype) { ++ case ETH_P_IP: ++ return MLX5HWS_MATCHER_IPV_4; ++ case ETH_P_IPV6: ++ return MLX5HWS_MATCHER_IPV_6; ++ default: ++ return MLX5HWS_MATCHER_IPV_UNSET; ++ } ++} ++ ++static u8 hws_rule_ip_version_to_matcher_ipv(u32 ip_version) ++{ ++ switch (ip_version) { ++ case 4: ++ return MLX5HWS_MATCHER_IPV_4; ++ case 6: ++ return MLX5HWS_MATCHER_IPV_6; ++ default: ++ return MLX5HWS_MATCHER_IPV_UNSET; ++ } ++} ++ ++static int hws_rule_check_outer_ip_version(struct mlx5hws_matcher *matcher, ++ u32 *match_param) ++{ ++ struct mlx5hws_context *ctx = matcher->tbl->ctx; ++ u8 outer_ipv_ether = MLX5HWS_MATCHER_IPV_UNSET; ++ u8 outer_ipv_ip = MLX5HWS_MATCHER_IPV_UNSET; ++ u8 outer_ipv, ver; ++ ++ if (matcher->matches_outer_ethertype) { ++ ver = MLX5_GET(fte_match_param, match_param, ++ outer_headers.ethertype); ++ outer_ipv_ether = 
hws_rule_ethertype_to_matcher_ipv(ver); ++ } ++ if (matcher->matches_outer_ip_version) { ++ ver = MLX5_GET(fte_match_param, match_param, ++ outer_headers.ip_version); ++ outer_ipv_ip = hws_rule_ip_version_to_matcher_ipv(ver); ++ } ++ ++ if (outer_ipv_ether != MLX5HWS_MATCHER_IPV_UNSET && ++ outer_ipv_ip != MLX5HWS_MATCHER_IPV_UNSET && ++ outer_ipv_ether != outer_ipv_ip) { ++ mlx5hws_err(ctx, "Rule matches on inconsistent outer ethertype and ip version\n"); ++ return -EINVAL; ++ } ++ ++ outer_ipv = outer_ipv_ether != MLX5HWS_MATCHER_IPV_UNSET ? ++ outer_ipv_ether : outer_ipv_ip; ++ if (outer_ipv != MLX5HWS_MATCHER_IPV_UNSET && ++ matcher->outer_ip_version != MLX5HWS_MATCHER_IPV_UNSET && ++ outer_ipv != matcher->outer_ip_version) { ++ mlx5hws_err(ctx, "Matcher and rule disagree on outer IP version\n"); ++ return -EINVAL; ++ } ++ matcher->outer_ip_version = outer_ipv; ++ ++ return 0; ++} ++ ++static int hws_rule_check_inner_ip_version(struct mlx5hws_matcher *matcher, ++ u32 *match_param) ++{ ++ struct mlx5hws_context *ctx = matcher->tbl->ctx; ++ u8 inner_ipv_ether = MLX5HWS_MATCHER_IPV_UNSET; ++ u8 inner_ipv_ip = MLX5HWS_MATCHER_IPV_UNSET; ++ u8 inner_ipv, ver; ++ ++ if (matcher->matches_inner_ethertype) { ++ ver = MLX5_GET(fte_match_param, match_param, ++ inner_headers.ethertype); ++ inner_ipv_ether = hws_rule_ethertype_to_matcher_ipv(ver); ++ } ++ if (matcher->matches_inner_ip_version) { ++ ver = MLX5_GET(fte_match_param, match_param, ++ inner_headers.ip_version); ++ inner_ipv_ip = hws_rule_ip_version_to_matcher_ipv(ver); ++ } ++ ++ if (inner_ipv_ether != MLX5HWS_MATCHER_IPV_UNSET && ++ inner_ipv_ip != MLX5HWS_MATCHER_IPV_UNSET && ++ inner_ipv_ether != inner_ipv_ip) { ++ mlx5hws_err(ctx, "Rule matches on inconsistent inner ethertype and ip version\n"); ++ return -EINVAL; ++ } ++ ++ inner_ipv = inner_ipv_ether != MLX5HWS_MATCHER_IPV_UNSET ? 
++ inner_ipv_ether : inner_ipv_ip; ++ if (inner_ipv != MLX5HWS_MATCHER_IPV_UNSET && ++ matcher->inner_ip_version != MLX5HWS_MATCHER_IPV_UNSET && ++ inner_ipv != matcher->inner_ip_version) { ++ mlx5hws_err(ctx, "Matcher and rule disagree on inner IP version\n"); ++ return -EINVAL; ++ } ++ matcher->inner_ip_version = inner_ipv; ++ ++ return 0; ++} ++ ++static int hws_rule_check_ip_version(struct mlx5hws_matcher *matcher, ++ u32 *match_param) ++{ ++ int ret; ++ ++ ret = hws_rule_check_outer_ip_version(matcher, match_param); ++ if (unlikely(ret)) ++ return ret; ++ ++ ret = hws_rule_check_inner_ip_version(matcher, match_param); ++ if (unlikely(ret)) ++ return ret; ++ ++ return 0; ++} ++ + int mlx5hws_rule_create(struct mlx5hws_matcher *matcher, + u8 mt_idx, + u32 *match_param, +@@ -665,6 +783,10 @@ int mlx5hws_rule_create(struct mlx5hws_matcher *matcher, + { + int ret; + ++ ret = hws_rule_check_ip_version(matcher, match_param); ++ if (unlikely(ret)) ++ return ret; ++ + rule_handle->matcher = matcher; + + ret = hws_rule_enqueue_precheck_create(rule_handle, attr); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1363-rdma-mlx5-fix-error-flow-upon-firmware-failure-for-rq-destru.patch b/SOURCES/1363-rdma-mlx5-fix-error-flow-upon-firmware-failure-for-rq-destru.patch new file mode 100644 index 000000000..8d49a185b --- /dev/null +++ b/SOURCES/1363-rdma-mlx5-fix-error-flow-upon-firmware-failure-for-rq-destru.patch @@ -0,0 +1,142 @@ +From 85e0a9a7588dbdecc6ff8e2facde4a75b8ff4299 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:02 -0400 +Subject: [PATCH] RDMA/mlx5: Fix error flow upon firmware failure for RQ + destruction + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 5d2ea5aebbb2f3ebde4403f9c55b2b057e5dd2d6 +Author: Patrisious Haddad +Date: Mon Apr 28 14:34:07 2025 +0300 + + RDMA/mlx5: Fix error flow upon firmware failure for RQ destruction + + Upon RQ destruction if the firmware command fails which is the + last resource to be 
destroyed some SW resources were already cleaned + regardless of the failure. + + Now properly rollback the object to its original state upon such failure. + + In order to avoid a use-after free in case someone tries to destroy the + object again, which results in the following kernel trace: + refcount_t: underflow; use-after-free. + WARNING: CPU: 0 PID: 37589 at lib/refcount.c:28 refcount_warn_saturate+0xf4/0x148 + Modules linked in: rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) rfkill mlx5_core(OE) mlxdevm(OE) ib_uverbs(OE) ib_core(OE) psample mlxfw(OE) mlx_compat(OE) macsec tls pci_hyperv_intf sunrpc vfat fat virtio_net net_failover failover fuse loop nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vmw_vmci vsock xfs crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce virtio_console virtio_gpu virtio_blk virtio_dma_buf virtio_mmio dm_mirror dm_region_hash dm_log dm_mod xpmem(OE) + CPU: 0 UID: 0 PID: 37589 Comm: python3 Kdump: loaded Tainted: G OE ------- --- 6.12.0-54.el10.aarch64 #1 + Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE + Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 + pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) + pc : refcount_warn_saturate+0xf4/0x148 + lr : refcount_warn_saturate+0xf4/0x148 + sp : ffff80008b81b7e0 + x29: ffff80008b81b7e0 x28: ffff000133d51600 x27: 0000000000000001 + x26: 0000000000000000 x25: 00000000ffffffea x24: ffff00010ae80f00 + x23: ffff00010ae80f80 x22: ffff0000c66e5d08 x21: 0000000000000000 + x20: ffff0000c66e0000 x19: ffff00010ae80340 x18: 0000000000000006 + x17: 0000000000000000 x16: 0000000000000020 x15: ffff80008b81b37f + x14: 0000000000000000 x13: 2e656572662d7265 x12: ffff80008283ef78 + x11: ffff80008257efd0 x10: ffff80008283efd0 x9 : ffff80008021ed90 + x8 : 0000000000000001 x7 : 00000000000bffe8 x6 : c0000000ffff7fff + x5 : ffff0001fb8e3408 x4 : 0000000000000000 x3 : ffff800179993000 + x2 : 0000000000000000 x1 : 
0000000000000000 x0 : ffff000133d51600 + Call trace: + refcount_warn_saturate+0xf4/0x148 + mlx5_core_put_rsc+0x88/0xa0 [mlx5_ib] + mlx5_core_destroy_rq_tracked+0x64/0x98 [mlx5_ib] + mlx5_ib_destroy_wq+0x34/0x80 [mlx5_ib] + ib_destroy_wq_user+0x30/0xc0 [ib_core] + uverbs_free_wq+0x28/0x58 [ib_uverbs] + destroy_hw_idr_uobject+0x34/0x78 [ib_uverbs] + uverbs_destroy_uobject+0x48/0x240 [ib_uverbs] + __uverbs_cleanup_ufile+0xd4/0x1a8 [ib_uverbs] + uverbs_destroy_ufile_hw+0x48/0x120 [ib_uverbs] + ib_uverbs_close+0x2c/0x100 [ib_uverbs] + __fput+0xd8/0x2f0 + __fput_sync+0x50/0x70 + __arm64_sys_close+0x40/0x90 + invoke_syscall.constprop.0+0x74/0xd0 + do_el0_svc+0x48/0xe8 + el0_svc+0x44/0x1d0 + el0t_64_sync_handler+0x120/0x130 + el0t_64_sync+0x1a4/0x1a8 + + Fixes: e2013b212f9f ("net/mlx5_core: Add RQ and SQ event handling") + Signed-off-by: Patrisious Haddad + Link: https://patch.msgid.link/3181433ccdd695c63560eeeb3f0c990961732101.1745839855.git.leon@kernel.org + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/qpc.c b/drivers/infiniband/hw/mlx5/qpc.c +index d3dcc272200a..146d03ae40bd 100644 +--- a/drivers/infiniband/hw/mlx5/qpc.c ++++ b/drivers/infiniband/hw/mlx5/qpc.c +@@ -21,8 +21,10 @@ mlx5_get_rsc(struct mlx5_qp_table *table, u32 rsn) + spin_lock_irqsave(&table->lock, flags); + + common = radix_tree_lookup(&table->tree, rsn); +- if (common) ++ if (common && !common->invalid) + refcount_inc(&common->refcount); ++ else ++ common = NULL; + + spin_unlock_irqrestore(&table->lock, flags); + +@@ -178,6 +180,18 @@ static int create_resource_common(struct mlx5_ib_dev *dev, + return 0; + } + ++static void modify_resource_common_state(struct mlx5_ib_dev *dev, ++ struct mlx5_core_qp *qp, ++ bool invalid) ++{ ++ struct mlx5_qp_table *table = &dev->qp_table; ++ unsigned long flags; ++ ++ spin_lock_irqsave(&table->lock, flags); ++ qp->common.invalid = invalid; ++ spin_unlock_irqrestore(&table->lock, flags); ++} ++ + static void 
destroy_resource_common(struct mlx5_ib_dev *dev, + struct mlx5_core_qp *qp) + { +@@ -609,8 +623,20 @@ int mlx5_core_create_rq_tracked(struct mlx5_ib_dev *dev, u32 *in, int inlen, + int mlx5_core_destroy_rq_tracked(struct mlx5_ib_dev *dev, + struct mlx5_core_qp *rq) + { ++ int ret; ++ ++ /* The rq destruction can be called again in case it fails, hence we ++ * mark the common resource as invalid and only once FW destruction ++ * is completed successfully we actually destroy the resources. ++ */ ++ modify_resource_common_state(dev, rq, true); ++ ret = destroy_rq_tracked(dev, rq->qpn, rq->uid); ++ if (ret) { ++ modify_resource_common_state(dev, rq, false); ++ return ret; ++ } + destroy_resource_common(dev, rq); +- return destroy_rq_tracked(dev, rq->qpn, rq->uid); ++ return 0; + } + + static void destroy_sq_tracked(struct mlx5_ib_dev *dev, u32 sqn, u16 uid) +diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h +index 04705078dfab..df76aece6be9 100644 +--- a/include/linux/mlx5/driver.h ++++ b/include/linux/mlx5/driver.h +@@ -398,6 +398,7 @@ struct mlx5_core_rsc_common { + enum mlx5_res_type res; + refcount_t refcount; + struct completion free; ++ bool invalid; + }; + + struct mlx5_uars_page { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1364-net-mlx5-support-software-tx-timestamp.patch b/SOURCES/1364-net-mlx5-support-software-tx-timestamp.patch new file mode 100644 index 000000000..797cdb438 --- /dev/null +++ b/SOURCES/1364-net-mlx5-support-software-tx-timestamp.patch @@ -0,0 +1,78 @@ +From caaa5c0c5b3a539eefe31c4fe578b881ba6512bf Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:03 -0400 +Subject: [PATCH] net/mlx5: support software TX timestamp + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 2451d3fb388f29d87d1abd3d2952d5ce36109816 +Author: Stanislav Fomichev +Date: Thu May 8 16:51:09 2025 -0700 + + net/mlx5: support software TX timestamp + + Having a software timestamp (along with existing hardware one) 
is + useful to trace how the packets flow through the stack. + mlx5e_tx_skb_update_hwts_flags is called from tx paths + to setup HW timestamp; extend it to add software one as well. + + Reviewed-by: Jason Xing + Signed-off-by: Stanislav Fomichev + Reviewed-by: Vadim Fedorenko + Acked-by: Martin KaFai Lau + Reviewed-by: Tariq Toukan + Link: https://patch.msgid.link/20250508235109.585096-1-stfomichev@gmail.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +index 8578f03783bc..e6c9338ddae8 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +@@ -1686,6 +1686,7 @@ int mlx5e_ethtool_get_ts_info(struct mlx5e_priv *priv, + return 0; + + info->so_timestamping = SOF_TIMESTAMPING_TX_HARDWARE | ++ SOF_TIMESTAMPING_TX_SOFTWARE | + SOF_TIMESTAMPING_RX_HARDWARE | + SOF_TIMESTAMPING_RAW_HARDWARE; + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c +index 4fd853d19e31..55a8629f0792 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c +@@ -337,10 +337,11 @@ static void mlx5e_sq_calc_wqe_attr(struct sk_buff *skb, const struct mlx5e_tx_at + }; + } + +-static void mlx5e_tx_skb_update_hwts_flags(struct sk_buff *skb) ++static void mlx5e_tx_skb_update_ts_flags(struct sk_buff *skb) + { + if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)) + skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS; ++ skb_tx_timestamp(skb); + } + + static void mlx5e_tx_check_stop(struct mlx5e_txqsq *sq) +@@ -392,7 +393,7 @@ mlx5e_txwqe_complete(struct mlx5e_txqsq *sq, struct sk_buff *skb, + cseg->opmod_idx_opcode = cpu_to_be32((sq->pc << 8) | attr->opcode); + cseg->qpn_ds = cpu_to_be32((sq->sqn << 8) | wqe_attr->ds_cnt); + +- mlx5e_tx_skb_update_hwts_flags(skb); ++ 
mlx5e_tx_skb_update_ts_flags(skb); + + sq->pc += wi->num_wqebbs; + +@@ -625,7 +626,7 @@ mlx5e_sq_xmit_mpwqe(struct mlx5e_txqsq *sq, struct sk_buff *skb, + mlx5e_dma_push(sq, txd.dma_addr, txd.len, MLX5E_DMA_MAP_SINGLE); + mlx5e_skb_fifo_push(&sq->db.skb_fifo, skb); + mlx5e_tx_mpwqe_add_dseg(sq, &txd); +- mlx5e_tx_skb_update_hwts_flags(skb); ++ mlx5e_tx_skb_update_ts_flags(skb); + + if (unlikely(mlx5e_tx_mpwqe_is_full(&sq->mpwqe))) { + /* Might stop the queue and affect the retval of __netdev_tx_sent_queue. */ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1365-net-mlx5-hws-expose-function-mlx5hws-table-ft-set-next-ft-in.patch b/SOURCES/1365-net-mlx5-hws-expose-function-mlx5hws-table-ft-set-next-ft-in.patch new file mode 100644 index 000000000..0a06f2b0c --- /dev/null +++ b/SOURCES/1365-net-mlx5-hws-expose-function-mlx5hws-table-ft-set-next-ft-in.patch @@ -0,0 +1,77 @@ +From 61618987cfeb590d9694abd9fbcdb68f8845d29b Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:03 -0400 +Subject: [PATCH] net/mlx5: HWS, expose function mlx5hws_table_ft_set_next_ft + in header + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit d2338a27fcee9158d0378d759152b8e0a5933c88 +Author: Yevgeny Kliteynik +Date: Sun May 11 22:38:01 2025 +0300 + + net/mlx5: HWS, expose function mlx5hws_table_ft_set_next_ft in header + + In preparation for complex matcher support, make function + mlx5hws_table_ft_set_next_ft() non-static and expose it in header. 
+ + Signed-off-by: Yevgeny Kliteynik + Reviewed-by: Vlad Dogaru + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1746992290-568936-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/table.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/table.c +index ab1297531232..568f691733f3 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/table.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/table.c +@@ -342,10 +342,10 @@ int mlx5hws_table_ft_set_next_rtc(struct mlx5hws_context *ctx, + return mlx5hws_cmd_flow_table_modify(ctx->mdev, &ft_attr, ft_id); + } + +-static int hws_table_ft_set_next_ft(struct mlx5hws_context *ctx, +- u32 ft_id, +- u32 fw_ft_type, +- u32 next_ft_id) ++int mlx5hws_table_ft_set_next_ft(struct mlx5hws_context *ctx, ++ u32 ft_id, ++ u32 fw_ft_type, ++ u32 next_ft_id) + { + struct mlx5hws_cmd_ft_modify_attr ft_attr = {0}; + +@@ -389,10 +389,10 @@ int mlx5hws_table_connect_to_miss_table(struct mlx5hws_table *src_tbl, + if (dst_tbl) { + if (list_empty(&dst_tbl->matchers_list)) { + /* Connect src_tbl last_ft to dst_tbl start anchor */ +- ret = hws_table_ft_set_next_ft(src_tbl->ctx, +- last_ft_id, +- src_tbl->fw_ft_type, +- dst_tbl->ft_id); ++ ret = mlx5hws_table_ft_set_next_ft(src_tbl->ctx, ++ last_ft_id, ++ src_tbl->fw_ft_type, ++ dst_tbl->ft_id); + if (ret) + return ret; + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/table.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/table.h +index dd50420eec9e..0400cce0c317 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/table.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/table.h +@@ -65,4 +65,9 @@ int mlx5hws_table_ft_set_next_rtc(struct mlx5hws_context *ctx, + u32 rtc_0_id, + u32 rtc_1_id); + ++int mlx5hws_table_ft_set_next_ft(struct mlx5hws_context *ctx, ++ u32 
ft_id, ++ u32 fw_ft_type, ++ u32 next_ft_id); ++ + #endif /* MLX5HWS_TABLE_H_ */ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1366-net-mlx5-hws-add-definer-function-to-get-field-name-str.patch b/SOURCES/1366-net-mlx5-hws-add-definer-function-to-get-field-name-str.patch new file mode 100644 index 000000000..613a39578 --- /dev/null +++ b/SOURCES/1366-net-mlx5-hws-add-definer-function-to-get-field-name-str.patch @@ -0,0 +1,263 @@ +From f8db8d6e3362f5fac65193f3ece0fffb4ad20588 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:03 -0400 +Subject: [PATCH] net/mlx5: HWS, add definer function to get field name str + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit fed5f4831281593a4bda2f8ef6912fdbcad6e670 +Author: Yevgeny Kliteynik +Date: Sun May 11 22:38:02 2025 +0300 + + net/mlx5: HWS, add definer function to get field name str + + In preparation for complex matcher support, add function for + converting definer fname to str, which will be used in following + patches. 
+ + Signed-off-by: Yevgeny Kliteynik + Reviewed-by: Vlad Dogaru + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1746992290-568936-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c +index 1061a46811ac..5cc0dc002ac1 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c +@@ -158,6 +158,218 @@ struct mlx5hws_definer_conv_data { + u32 match_flags; + }; + ++#define HWS_DEFINER_ENTRY(name)[MLX5HWS_DEFINER_FNAME_##name] = #name ++ ++static const char * const hws_definer_fname_to_str[] = { ++ HWS_DEFINER_ENTRY(ETH_SMAC_47_16_O), ++ HWS_DEFINER_ENTRY(ETH_SMAC_47_16_I), ++ HWS_DEFINER_ENTRY(ETH_SMAC_15_0_O), ++ HWS_DEFINER_ENTRY(ETH_SMAC_15_0_I), ++ HWS_DEFINER_ENTRY(ETH_DMAC_47_16_O), ++ HWS_DEFINER_ENTRY(ETH_DMAC_47_16_I), ++ HWS_DEFINER_ENTRY(ETH_DMAC_15_0_O), ++ HWS_DEFINER_ENTRY(ETH_DMAC_15_0_I), ++ HWS_DEFINER_ENTRY(ETH_TYPE_O), ++ HWS_DEFINER_ENTRY(ETH_TYPE_I), ++ HWS_DEFINER_ENTRY(ETH_L3_TYPE_O), ++ HWS_DEFINER_ENTRY(ETH_L3_TYPE_I), ++ HWS_DEFINER_ENTRY(VLAN_TYPE_O), ++ HWS_DEFINER_ENTRY(VLAN_TYPE_I), ++ HWS_DEFINER_ENTRY(VLAN_FIRST_PRIO_O), ++ HWS_DEFINER_ENTRY(VLAN_FIRST_PRIO_I), ++ HWS_DEFINER_ENTRY(VLAN_CFI_O), ++ HWS_DEFINER_ENTRY(VLAN_CFI_I), ++ HWS_DEFINER_ENTRY(VLAN_ID_O), ++ HWS_DEFINER_ENTRY(VLAN_ID_I), ++ HWS_DEFINER_ENTRY(VLAN_SECOND_TYPE_O), ++ HWS_DEFINER_ENTRY(VLAN_SECOND_TYPE_I), ++ HWS_DEFINER_ENTRY(VLAN_SECOND_PRIO_O), ++ HWS_DEFINER_ENTRY(VLAN_SECOND_PRIO_I), ++ HWS_DEFINER_ENTRY(VLAN_SECOND_CFI_O), ++ HWS_DEFINER_ENTRY(VLAN_SECOND_CFI_I), ++ HWS_DEFINER_ENTRY(VLAN_SECOND_ID_O), ++ HWS_DEFINER_ENTRY(VLAN_SECOND_ID_I), ++ HWS_DEFINER_ENTRY(IPV4_IHL_O), ++ HWS_DEFINER_ENTRY(IPV4_IHL_I), ++ HWS_DEFINER_ENTRY(IP_DSCP_O), ++ 
HWS_DEFINER_ENTRY(IP_DSCP_I), ++ HWS_DEFINER_ENTRY(IP_ECN_O), ++ HWS_DEFINER_ENTRY(IP_ECN_I), ++ HWS_DEFINER_ENTRY(IP_TTL_O), ++ HWS_DEFINER_ENTRY(IP_TTL_I), ++ HWS_DEFINER_ENTRY(IPV4_DST_O), ++ HWS_DEFINER_ENTRY(IPV4_DST_I), ++ HWS_DEFINER_ENTRY(IPV4_SRC_O), ++ HWS_DEFINER_ENTRY(IPV4_SRC_I), ++ HWS_DEFINER_ENTRY(IP_VERSION_O), ++ HWS_DEFINER_ENTRY(IP_VERSION_I), ++ HWS_DEFINER_ENTRY(IP_FRAG_O), ++ HWS_DEFINER_ENTRY(IP_FRAG_I), ++ HWS_DEFINER_ENTRY(IP_LEN_O), ++ HWS_DEFINER_ENTRY(IP_LEN_I), ++ HWS_DEFINER_ENTRY(IP_TOS_O), ++ HWS_DEFINER_ENTRY(IP_TOS_I), ++ HWS_DEFINER_ENTRY(IPV6_FLOW_LABEL_O), ++ HWS_DEFINER_ENTRY(IPV6_FLOW_LABEL_I), ++ HWS_DEFINER_ENTRY(IPV6_DST_127_96_O), ++ HWS_DEFINER_ENTRY(IPV6_DST_95_64_O), ++ HWS_DEFINER_ENTRY(IPV6_DST_63_32_O), ++ HWS_DEFINER_ENTRY(IPV6_DST_31_0_O), ++ HWS_DEFINER_ENTRY(IPV6_DST_127_96_I), ++ HWS_DEFINER_ENTRY(IPV6_DST_95_64_I), ++ HWS_DEFINER_ENTRY(IPV6_DST_63_32_I), ++ HWS_DEFINER_ENTRY(IPV6_DST_31_0_I), ++ HWS_DEFINER_ENTRY(IPV6_SRC_127_96_O), ++ HWS_DEFINER_ENTRY(IPV6_SRC_95_64_O), ++ HWS_DEFINER_ENTRY(IPV6_SRC_63_32_O), ++ HWS_DEFINER_ENTRY(IPV6_SRC_31_0_O), ++ HWS_DEFINER_ENTRY(IPV6_SRC_127_96_I), ++ HWS_DEFINER_ENTRY(IPV6_SRC_95_64_I), ++ HWS_DEFINER_ENTRY(IPV6_SRC_63_32_I), ++ HWS_DEFINER_ENTRY(IPV6_SRC_31_0_I), ++ HWS_DEFINER_ENTRY(IP_PROTOCOL_O), ++ HWS_DEFINER_ENTRY(IP_PROTOCOL_I), ++ HWS_DEFINER_ENTRY(L4_SPORT_O), ++ HWS_DEFINER_ENTRY(L4_SPORT_I), ++ HWS_DEFINER_ENTRY(L4_DPORT_O), ++ HWS_DEFINER_ENTRY(L4_DPORT_I), ++ HWS_DEFINER_ENTRY(TCP_FLAGS_I), ++ HWS_DEFINER_ENTRY(TCP_FLAGS_O), ++ HWS_DEFINER_ENTRY(TCP_SEQ_NUM), ++ HWS_DEFINER_ENTRY(TCP_ACK_NUM), ++ HWS_DEFINER_ENTRY(GTP_TEID), ++ HWS_DEFINER_ENTRY(GTP_MSG_TYPE), ++ HWS_DEFINER_ENTRY(GTP_EXT_FLAG), ++ HWS_DEFINER_ENTRY(GTP_NEXT_EXT_HDR), ++ HWS_DEFINER_ENTRY(GTP_EXT_HDR_PDU), ++ HWS_DEFINER_ENTRY(GTP_EXT_HDR_QFI), ++ HWS_DEFINER_ENTRY(GTPU_DW0), ++ HWS_DEFINER_ENTRY(GTPU_FIRST_EXT_DW0), ++ HWS_DEFINER_ENTRY(GTPU_DW2), ++ HWS_DEFINER_ENTRY(FLEX_PARSER_0), ++ 
HWS_DEFINER_ENTRY(FLEX_PARSER_1), ++ HWS_DEFINER_ENTRY(FLEX_PARSER_2), ++ HWS_DEFINER_ENTRY(FLEX_PARSER_3), ++ HWS_DEFINER_ENTRY(FLEX_PARSER_4), ++ HWS_DEFINER_ENTRY(FLEX_PARSER_5), ++ HWS_DEFINER_ENTRY(FLEX_PARSER_6), ++ HWS_DEFINER_ENTRY(FLEX_PARSER_7), ++ HWS_DEFINER_ENTRY(VPORT_REG_C_0), ++ HWS_DEFINER_ENTRY(VXLAN_FLAGS), ++ HWS_DEFINER_ENTRY(VXLAN_VNI), ++ HWS_DEFINER_ENTRY(VXLAN_GPE_FLAGS), ++ HWS_DEFINER_ENTRY(VXLAN_GPE_RSVD0), ++ HWS_DEFINER_ENTRY(VXLAN_GPE_PROTO), ++ HWS_DEFINER_ENTRY(VXLAN_GPE_VNI), ++ HWS_DEFINER_ENTRY(VXLAN_GPE_RSVD1), ++ HWS_DEFINER_ENTRY(GENEVE_OPT_LEN), ++ HWS_DEFINER_ENTRY(GENEVE_OAM), ++ HWS_DEFINER_ENTRY(GENEVE_PROTO), ++ HWS_DEFINER_ENTRY(GENEVE_VNI), ++ HWS_DEFINER_ENTRY(SOURCE_QP), ++ HWS_DEFINER_ENTRY(SOURCE_GVMI), ++ HWS_DEFINER_ENTRY(REG_0), ++ HWS_DEFINER_ENTRY(REG_1), ++ HWS_DEFINER_ENTRY(REG_2), ++ HWS_DEFINER_ENTRY(REG_3), ++ HWS_DEFINER_ENTRY(REG_4), ++ HWS_DEFINER_ENTRY(REG_5), ++ HWS_DEFINER_ENTRY(REG_6), ++ HWS_DEFINER_ENTRY(REG_7), ++ HWS_DEFINER_ENTRY(REG_8), ++ HWS_DEFINER_ENTRY(REG_9), ++ HWS_DEFINER_ENTRY(REG_10), ++ HWS_DEFINER_ENTRY(REG_11), ++ HWS_DEFINER_ENTRY(REG_A), ++ HWS_DEFINER_ENTRY(REG_B), ++ HWS_DEFINER_ENTRY(GRE_KEY_PRESENT), ++ HWS_DEFINER_ENTRY(GRE_C), ++ HWS_DEFINER_ENTRY(GRE_K), ++ HWS_DEFINER_ENTRY(GRE_S), ++ HWS_DEFINER_ENTRY(GRE_PROTOCOL), ++ HWS_DEFINER_ENTRY(GRE_OPT_KEY), ++ HWS_DEFINER_ENTRY(GRE_OPT_SEQ), ++ HWS_DEFINER_ENTRY(GRE_OPT_CHECKSUM), ++ HWS_DEFINER_ENTRY(INTEGRITY_O), ++ HWS_DEFINER_ENTRY(INTEGRITY_I), ++ HWS_DEFINER_ENTRY(ICMP_DW1), ++ HWS_DEFINER_ENTRY(ICMP_DW2), ++ HWS_DEFINER_ENTRY(ICMP_DW3), ++ HWS_DEFINER_ENTRY(IPSEC_SPI), ++ HWS_DEFINER_ENTRY(IPSEC_SEQUENCE_NUMBER), ++ HWS_DEFINER_ENTRY(IPSEC_SYNDROME), ++ HWS_DEFINER_ENTRY(MPLS0_O), ++ HWS_DEFINER_ENTRY(MPLS1_O), ++ HWS_DEFINER_ENTRY(MPLS2_O), ++ HWS_DEFINER_ENTRY(MPLS3_O), ++ HWS_DEFINER_ENTRY(MPLS4_O), ++ HWS_DEFINER_ENTRY(MPLS0_I), ++ HWS_DEFINER_ENTRY(MPLS1_I), ++ HWS_DEFINER_ENTRY(MPLS2_I), ++ 
HWS_DEFINER_ENTRY(MPLS3_I), ++ HWS_DEFINER_ENTRY(MPLS4_I), ++ HWS_DEFINER_ENTRY(FLEX_PARSER0_OK), ++ HWS_DEFINER_ENTRY(FLEX_PARSER1_OK), ++ HWS_DEFINER_ENTRY(FLEX_PARSER2_OK), ++ HWS_DEFINER_ENTRY(FLEX_PARSER3_OK), ++ HWS_DEFINER_ENTRY(FLEX_PARSER4_OK), ++ HWS_DEFINER_ENTRY(FLEX_PARSER5_OK), ++ HWS_DEFINER_ENTRY(FLEX_PARSER6_OK), ++ HWS_DEFINER_ENTRY(FLEX_PARSER7_OK), ++ HWS_DEFINER_ENTRY(OKS2_MPLS0_O), ++ HWS_DEFINER_ENTRY(OKS2_MPLS1_O), ++ HWS_DEFINER_ENTRY(OKS2_MPLS2_O), ++ HWS_DEFINER_ENTRY(OKS2_MPLS3_O), ++ HWS_DEFINER_ENTRY(OKS2_MPLS4_O), ++ HWS_DEFINER_ENTRY(OKS2_MPLS0_I), ++ HWS_DEFINER_ENTRY(OKS2_MPLS1_I), ++ HWS_DEFINER_ENTRY(OKS2_MPLS2_I), ++ HWS_DEFINER_ENTRY(OKS2_MPLS3_I), ++ HWS_DEFINER_ENTRY(OKS2_MPLS4_I), ++ HWS_DEFINER_ENTRY(GENEVE_OPT_OK_0), ++ HWS_DEFINER_ENTRY(GENEVE_OPT_OK_1), ++ HWS_DEFINER_ENTRY(GENEVE_OPT_OK_2), ++ HWS_DEFINER_ENTRY(GENEVE_OPT_OK_3), ++ HWS_DEFINER_ENTRY(GENEVE_OPT_OK_4), ++ HWS_DEFINER_ENTRY(GENEVE_OPT_OK_5), ++ HWS_DEFINER_ENTRY(GENEVE_OPT_OK_6), ++ HWS_DEFINER_ENTRY(GENEVE_OPT_OK_7), ++ HWS_DEFINER_ENTRY(GENEVE_OPT_DW_0), ++ HWS_DEFINER_ENTRY(GENEVE_OPT_DW_1), ++ HWS_DEFINER_ENTRY(GENEVE_OPT_DW_2), ++ HWS_DEFINER_ENTRY(GENEVE_OPT_DW_3), ++ HWS_DEFINER_ENTRY(GENEVE_OPT_DW_4), ++ HWS_DEFINER_ENTRY(GENEVE_OPT_DW_5), ++ HWS_DEFINER_ENTRY(GENEVE_OPT_DW_6), ++ HWS_DEFINER_ENTRY(GENEVE_OPT_DW_7), ++ HWS_DEFINER_ENTRY(IB_L4_OPCODE), ++ HWS_DEFINER_ENTRY(IB_L4_QPN), ++ HWS_DEFINER_ENTRY(IB_L4_A), ++ HWS_DEFINER_ENTRY(RANDOM_NUM), ++ HWS_DEFINER_ENTRY(PTYPE_L2_O), ++ HWS_DEFINER_ENTRY(PTYPE_L2_I), ++ HWS_DEFINER_ENTRY(PTYPE_L3_O), ++ HWS_DEFINER_ENTRY(PTYPE_L3_I), ++ HWS_DEFINER_ENTRY(PTYPE_L4_O), ++ HWS_DEFINER_ENTRY(PTYPE_L4_I), ++ HWS_DEFINER_ENTRY(PTYPE_L4_EXT_O), ++ HWS_DEFINER_ENTRY(PTYPE_L4_EXT_I), ++ HWS_DEFINER_ENTRY(PTYPE_FRAG_O), ++ HWS_DEFINER_ENTRY(PTYPE_FRAG_I), ++ HWS_DEFINER_ENTRY(TNL_HDR_0), ++ HWS_DEFINER_ENTRY(TNL_HDR_1), ++ HWS_DEFINER_ENTRY(TNL_HDR_2), ++ HWS_DEFINER_ENTRY(TNL_HDR_3), ++ 
[MLX5HWS_DEFINER_FNAME_MAX] = "DEFINER_FNAME_UNKNOWN", ++}; ++ ++const char *mlx5hws_definer_fname_to_str(enum mlx5hws_definer_fname fname) ++{ ++ if (fname > MLX5HWS_DEFINER_FNAME_MAX) ++ fname = MLX5HWS_DEFINER_FNAME_MAX; ++ return hws_definer_fname_to_str[fname]; ++} ++ + static void + hws_definer_ones_set(struct mlx5hws_definer_fc *fc, + void *match_param, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.h +index 5c1a2086efba..62da55389331 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.h +@@ -831,4 +831,6 @@ mlx5hws_definer_conv_match_params_to_compressed_fc(struct mlx5hws_context *ctx, + u32 *match_param, + int *fc_sz); + ++const char *mlx5hws_definer_fname_to_str(enum mlx5hws_definer_fname fname); ++ + #endif /* HWS_DEFINER_H_ */ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1367-net-mlx5-hws-expose-polling-function-in-header-file.patch b/SOURCES/1367-net-mlx5-hws-expose-polling-function-in-header-file.patch new file mode 100644 index 000000000..9045630c2 --- /dev/null +++ b/SOURCES/1367-net-mlx5-hws-expose-polling-function-in-header-file.patch @@ -0,0 +1,120 @@ +From 7287dfe961f39c52d8e7fbc0a719036457df2b41 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:04 -0400 +Subject: [PATCH] net/mlx5: HWS, expose polling function in header file + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 3c739d1624e3c3186a0a0248e91851a085f6e45b +Author: Yevgeny Kliteynik +Date: Sun May 11 22:38:03 2025 +0300 + + net/mlx5: HWS, expose polling function in header file + + In preparation for complex matcher, expose the function that is + polling queue for completion (mlx5hws_bwc_queue_poll) in header + file, so that it will be used by complex matcher code. 
+ + Signed-off-by: Yevgeny Kliteynik + Reviewed-by: Vlad Dogaru + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1746992290-568936-4-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +index 510bfbbe5991..27b6420678d8 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +@@ -223,10 +223,10 @@ int mlx5hws_bwc_matcher_destroy(struct mlx5hws_bwc_matcher *bwc_matcher) + return 0; + } + +-static int hws_bwc_queue_poll(struct mlx5hws_context *ctx, +- u16 queue_id, +- u32 *pending_rules, +- bool drain) ++int mlx5hws_bwc_queue_poll(struct mlx5hws_context *ctx, ++ u16 queue_id, ++ u32 *pending_rules, ++ bool drain) + { + unsigned long timeout = jiffies + + secs_to_jiffies(MLX5HWS_BWC_POLLING_TIMEOUT); +@@ -361,7 +361,8 @@ hws_bwc_rule_destroy_hws_sync(struct mlx5hws_bwc_rule *bwc_rule, + if (unlikely(ret)) + return ret; + +- ret = hws_bwc_queue_poll(ctx, rule_attr->queue_id, &expected_completions, true); ++ ret = mlx5hws_bwc_queue_poll(ctx, rule_attr->queue_id, ++ &expected_completions, true); + if (unlikely(ret)) + return ret; + +@@ -442,9 +443,8 @@ hws_bwc_rule_create_sync(struct mlx5hws_bwc_rule *bwc_rule, + if (unlikely(ret)) + return ret; + +- ret = hws_bwc_queue_poll(ctx, rule_attr->queue_id, &expected_completions, true); +- +- return ret; ++ return mlx5hws_bwc_queue_poll(ctx, rule_attr->queue_id, ++ &expected_completions, true); + } + + static int +@@ -465,7 +465,8 @@ hws_bwc_rule_update_sync(struct mlx5hws_bwc_rule *bwc_rule, + if (unlikely(ret)) + return ret; + +- ret = hws_bwc_queue_poll(ctx, rule_attr->queue_id, &expected_completions, true); ++ ret = mlx5hws_bwc_queue_poll(ctx, rule_attr->queue_id, ++ &expected_completions, true); + if (unlikely(ret)) + 
mlx5hws_err(ctx, "Failed updating BWC rule (%d)\n", ret); + +@@ -651,8 +652,10 @@ static int hws_bwc_matcher_move_all_simple(struct mlx5hws_bwc_matcher *bwc_match + &bwc_matcher->rules[i]) ? + NULL : list_next_entry(bwc_rules[i], list_node); + +- ret = hws_bwc_queue_poll(ctx, rule_attr.queue_id, +- &pending_rules[i], false); ++ ret = mlx5hws_bwc_queue_poll(ctx, ++ rule_attr.queue_id, ++ &pending_rules[i], ++ false); + if (unlikely(ret)) { + mlx5hws_err(ctx, + "Moving BWC rule failed during rehash (%d)\n", +@@ -669,8 +672,8 @@ static int hws_bwc_matcher_move_all_simple(struct mlx5hws_bwc_matcher *bwc_match + u16 queue_id = mlx5hws_bwc_get_queue_id(ctx, i); + + mlx5hws_send_engine_flush_queue(&ctx->send_queue[queue_id]); +- ret = hws_bwc_queue_poll(ctx, queue_id, +- &pending_rules[i], true); ++ ret = mlx5hws_bwc_queue_poll(ctx, queue_id, ++ &pending_rules[i], true); + if (unlikely(ret)) { + mlx5hws_err(ctx, + "Moving BWC rule failed during rehash (%d)\n", ret); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h +index bb0cf4b922ce..a2aa2d5da694 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h +@@ -64,6 +64,11 @@ void mlx5hws_bwc_rule_fill_attr(struct mlx5hws_bwc_matcher *bwc_matcher, + u32 flow_source, + struct mlx5hws_rule_attr *rule_attr); + ++int mlx5hws_bwc_queue_poll(struct mlx5hws_context *ctx, ++ u16 queue_id, ++ u32 *pending_rules, ++ bool drain); ++ + static inline u16 mlx5hws_bwc_queues(struct mlx5hws_context *ctx) + { + /* Besides the control queue, half of the queues are +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1368-net-mlx5-hws-introduce-isolated-matchers.patch b/SOURCES/1368-net-mlx5-hws-introduce-isolated-matchers.patch new file mode 100644 index 000000000..12261d7d8 --- /dev/null +++ b/SOURCES/1368-net-mlx5-hws-introduce-isolated-matchers.patch @@ -0,0 +1,414 @@ +From 
e52f06951f1709e7bd3b78b3c4932d5fc69d10eb Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:04 -0400 +Subject: [PATCH] net/mlx5: HWS, introduce isolated matchers + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit b816743a182f532faaeaa9aaed147ff09513e375 +Author: Yevgeny Kliteynik +Date: Sun May 11 22:38:04 2025 +0300 + + net/mlx5: HWS, introduce isolated matchers + + In preparation for complex matcher support, introduce the isolated + matcher. + + Isolated matcher is a matcher that has its own isolated table. + It is used as the second half of the complex matcher: when the rule + is split into two parts (complex rule), then matching on the first + part will send the packet to the isolated matcher that will try to + match on the second part. In case of miss, the packet goes back to + the matcher's end flow table. + + Signed-off-by: Yevgeny Kliteynik + Reviewed-by: Vlad Dogaru + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1746992290-568936-5-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c +index 5b0c1623499b..ce28ee1c0e41 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c +@@ -23,19 +23,199 @@ static void hws_matcher_destroy_end_ft(struct mlx5hws_matcher *matcher) + mlx5hws_table_destroy_default_ft(matcher->tbl, matcher->end_ft_id); + } + ++int mlx5hws_matcher_update_end_ft_isolated(struct mlx5hws_table *tbl, ++ u32 miss_ft_id) ++{ ++ struct mlx5hws_matcher *tmp_matcher; ++ ++ if (list_empty(&tbl->matchers_list)) ++ return -EINVAL; ++ ++ /* Update isolated_matcher_end_ft_id attribute for all ++ * the matchers in isolated table. 
++ */ ++ list_for_each_entry(tmp_matcher, &tbl->matchers_list, list_node) ++ tmp_matcher->attr.isolated_matcher_end_ft_id = miss_ft_id; ++ ++ tmp_matcher = list_last_entry(&tbl->matchers_list, ++ struct mlx5hws_matcher, ++ list_node); ++ ++ return mlx5hws_table_ft_set_next_ft(tbl->ctx, ++ tmp_matcher->end_ft_id, ++ tbl->fw_ft_type, ++ miss_ft_id); ++} ++ ++static int hws_matcher_connect_end_ft_isolated(struct mlx5hws_matcher *matcher) ++{ ++ struct mlx5hws_table *tbl = matcher->tbl; ++ u32 end_ft_id; ++ int ret; ++ ++ /* Reset end_ft next RTCs */ ++ ret = mlx5hws_table_ft_set_next_rtc(tbl->ctx, ++ matcher->end_ft_id, ++ matcher->tbl->fw_ft_type, ++ 0, 0); ++ if (ret) { ++ mlx5hws_err(tbl->ctx, "Isolated matcher: failed to reset FT's next RTCs\n"); ++ return ret; ++ } ++ ++ /* Connect isolated matcher's end_ft to the complex matcher's end FT */ ++ end_ft_id = matcher->attr.isolated_matcher_end_ft_id; ++ ret = mlx5hws_table_ft_set_next_ft(tbl->ctx, ++ matcher->end_ft_id, ++ matcher->tbl->fw_ft_type, ++ end_ft_id); ++ ++ if (ret) { ++ mlx5hws_err(tbl->ctx, "Isolated matcher: failed to set FT's miss_ft_id\n"); ++ return ret; ++ } ++ ++ return 0; ++} ++ ++static int hws_matcher_create_end_ft_isolated(struct mlx5hws_matcher *matcher) ++{ ++ struct mlx5hws_table *tbl = matcher->tbl; ++ int ret; ++ ++ ret = mlx5hws_table_create_default_ft(tbl->ctx->mdev, ++ tbl, ++ &matcher->end_ft_id); ++ if (ret) { ++ mlx5hws_err(tbl->ctx, "Isolated matcher: failed to create end flow table\n"); ++ return ret; ++ } ++ ++ ret = hws_matcher_connect_end_ft_isolated(matcher); ++ if (ret) { ++ mlx5hws_err(tbl->ctx, "Isolated matcher: failed to connect end FT\n"); ++ goto destroy_default_ft; ++ } ++ ++ return 0; ++ ++destroy_default_ft: ++ mlx5hws_table_destroy_default_ft(tbl, matcher->end_ft_id); ++ return ret; ++} ++ + static int hws_matcher_create_end_ft(struct mlx5hws_matcher *matcher) + { + struct mlx5hws_table *tbl = matcher->tbl; + int ret; + +- ret = 
mlx5hws_table_create_default_ft(tbl->ctx->mdev, tbl, &matcher->end_ft_id); ++ if (mlx5hws_matcher_is_isolated(matcher)) ++ ret = hws_matcher_create_end_ft_isolated(matcher); ++ else ++ ret = mlx5hws_table_create_default_ft(tbl->ctx->mdev, tbl, ++ &matcher->end_ft_id); ++ + if (ret) { + mlx5hws_err(tbl->ctx, "Failed to create matcher end flow table\n"); + return ret; + } ++ ++ return 0; ++} ++ ++static int hws_matcher_connect_isolated_first(struct mlx5hws_matcher *matcher) ++{ ++ struct mlx5hws_table *tbl = matcher->tbl; ++ struct mlx5hws_context *ctx = tbl->ctx; ++ int ret; ++ ++ /* Isolated matcher's end_ft is already pointing to the end_ft ++ * of the complex matcher - it was set at creation of end_ft, ++ * so no need to connect it. ++ * We still need to connect the isolated table's start FT to ++ * this matcher's RTC. ++ */ ++ ret = mlx5hws_table_ft_set_next_rtc(ctx, ++ tbl->ft_id, ++ tbl->fw_ft_type, ++ matcher->match_ste.rtc_0_id, ++ matcher->match_ste.rtc_1_id); ++ if (ret) { ++ mlx5hws_err(ctx, "Isolated matcher: failed to connect start FT to match RTC\n"); ++ return ret; ++ } ++ ++ /* Reset table's FT default miss (drop refcount) */ ++ ret = mlx5hws_table_ft_set_default_next_ft(tbl, tbl->ft_id); ++ if (ret) { ++ mlx5hws_err(ctx, "Isolated matcher: failed to reset table ft default miss\n"); ++ return ret; ++ } ++ ++ list_add(&matcher->list_node, &tbl->matchers_list); ++ ++ return ret; ++} ++ ++static int hws_matcher_connect_isolated_last(struct mlx5hws_matcher *matcher) ++{ ++ struct mlx5hws_table *tbl = matcher->tbl; ++ struct mlx5hws_context *ctx = tbl->ctx; ++ struct mlx5hws_matcher *last; ++ int ret; ++ ++ last = list_last_entry(&tbl->matchers_list, ++ struct mlx5hws_matcher, ++ list_node); ++ ++ /* New matcher's end_ft is already pointing to the end_ft of ++ * the complex matcher. ++ * Connect previous matcher's end_ft to this new matcher RTC. 
++ */ ++ ret = mlx5hws_table_ft_set_next_rtc(ctx, ++ last->end_ft_id, ++ tbl->fw_ft_type, ++ matcher->match_ste.rtc_0_id, ++ matcher->match_ste.rtc_1_id); ++ if (ret) { ++ mlx5hws_err(ctx, ++ "Isolated matcher: failed to connect matcher end_ft to new match RTC\n"); ++ return ret; ++ } ++ ++ /* Reset prev matcher FT default miss (drop refcount) */ ++ ret = mlx5hws_table_ft_set_default_next_ft(tbl, last->end_ft_id); ++ if (ret) { ++ mlx5hws_err(ctx, "Isolated matcher: failed to reset matcher ft default miss\n"); ++ return ret; ++ } ++ ++ /* Insert after the last matcher */ ++ list_add(&matcher->list_node, &last->list_node); ++ + return 0; + } + ++static int hws_matcher_connect_isolated(struct mlx5hws_matcher *matcher) ++{ ++ /* Isolated matcher is expected to be the only one in its table. ++ * However, it can have a collision matcher, and it can go through ++ * rehash process, in which case we will temporary have both old and ++ * new matchers in the isolated table. ++ * Check if this is the first matcher in the isolated table. ++ */ ++ if (list_empty(&matcher->tbl->matchers_list)) ++ return hws_matcher_connect_isolated_first(matcher); ++ ++ /* If this wasn't the first matcher, then we have 3 possible cases: ++ * - this is a collision matcher for the first matcher ++ * - this is a new rehash dest matcher ++ * - this is a collision matcher for the new rehash dest matcher ++ * The logic to add new matcher is the same for all these cases. 
++ */ ++ return hws_matcher_connect_isolated_last(matcher); ++} ++ + static int hws_matcher_connect(struct mlx5hws_matcher *matcher) + { + struct mlx5hws_table *tbl = matcher->tbl; +@@ -45,6 +225,9 @@ static int hws_matcher_connect(struct mlx5hws_matcher *matcher) + struct mlx5hws_matcher *tmp_matcher; + int ret; + ++ if (mlx5hws_matcher_is_isolated(matcher)) ++ return hws_matcher_connect_isolated(matcher); ++ + /* Find location in matcher list */ + if (list_empty(&tbl->matchers_list)) { + list_add(&matcher->list_node, &tbl->matchers_list); +@@ -121,6 +304,92 @@ static int hws_matcher_connect(struct mlx5hws_matcher *matcher) + return ret; + } + ++static int hws_matcher_disconnect_isolated(struct mlx5hws_matcher *matcher) ++{ ++ struct mlx5hws_matcher *first, *last, *prev, *next; ++ struct mlx5hws_table *tbl = matcher->tbl; ++ struct mlx5hws_context *ctx = tbl->ctx; ++ u32 end_ft_id; ++ int ret; ++ ++ first = list_first_entry(&tbl->matchers_list, ++ struct mlx5hws_matcher, ++ list_node); ++ last = list_last_entry(&tbl->matchers_list, ++ struct mlx5hws_matcher, ++ list_node); ++ prev = list_prev_entry(matcher, list_node); ++ next = list_next_entry(matcher, list_node); ++ ++ list_del_init(&matcher->list_node); ++ ++ if (first == last) { ++ /* This was the only matcher in the list. ++ * Reset isolated table FT next RTCs and connect it ++ * to the whole complex matcher end FT instead. 
++ */ ++ ret = mlx5hws_table_ft_set_next_rtc(ctx, ++ tbl->ft_id, ++ tbl->fw_ft_type, ++ 0, 0); ++ if (ret) { ++ mlx5hws_err(tbl->ctx, "Isolated matcher: failed to reset FT's next RTCs\n"); ++ return ret; ++ } ++ ++ end_ft_id = matcher->attr.isolated_matcher_end_ft_id; ++ ret = mlx5hws_table_ft_set_next_ft(tbl->ctx, ++ tbl->ft_id, ++ tbl->fw_ft_type, ++ end_ft_id); ++ if (ret) { ++ mlx5hws_err(tbl->ctx, "Isolated matcher: failed to set FT's miss_ft_id\n"); ++ return ret; ++ } ++ ++ return 0; ++ } ++ ++ /* At this point we know that there are more matchers in the list */ ++ ++ if (matcher == first) { ++ /* We've disconnected the first matcher. ++ * Now update isolated table default FT. ++ */ ++ if (!next) ++ return -EINVAL; ++ return mlx5hws_table_ft_set_next_rtc(ctx, ++ tbl->ft_id, ++ tbl->fw_ft_type, ++ next->match_ste.rtc_0_id, ++ next->match_ste.rtc_1_id); ++ } ++ ++ if (matcher == last) { ++ /* If we've disconnected the last matcher - update prev ++ * matcher's end_ft to point to the complex matcher end_ft. ++ */ ++ if (!prev) ++ return -EINVAL; ++ return hws_matcher_connect_end_ft_isolated(prev); ++ } ++ ++ /* This wasn't the first or the last matcher, which means that it has ++ * both prev and next matchers. Note that this only happens if we're ++ * disconnecting collision matcher of the old matcher during rehash. 
++ */ ++ if (!prev || !next || ++ !(matcher->flags & MLX5HWS_MATCHER_FLAGS_COLLISION)) ++ return -EINVAL; ++ ++ /* Update prev end FT to point to next match RTC */ ++ return mlx5hws_table_ft_set_next_rtc(ctx, ++ prev->end_ft_id, ++ tbl->fw_ft_type, ++ next->match_ste.rtc_0_id, ++ next->match_ste.rtc_1_id); ++} ++ + static int hws_matcher_disconnect(struct mlx5hws_matcher *matcher) + { + struct mlx5hws_matcher *next = NULL, *prev = NULL; +@@ -128,6 +397,9 @@ static int hws_matcher_disconnect(struct mlx5hws_matcher *matcher) + u32 prev_ft_id = tbl->ft_id; + int ret; + ++ if (mlx5hws_matcher_is_isolated(matcher)) ++ return hws_matcher_disconnect_isolated(matcher); ++ + if (!list_is_first(&matcher->list_node, &tbl->matchers_list)) { + prev = list_prev_entry(matcher, list_node); + prev_ft_id = prev->end_ft_id; +@@ -531,6 +803,8 @@ hws_matcher_process_attr(struct mlx5hws_cmd_query_caps *caps, + attr->table.sz_col_log = hws_matcher_rules_to_tbl_depth(attr->rule.num_log); + + matcher->flags |= attr->resizable ? MLX5HWS_MATCHER_FLAGS_RESIZABLE : 0; ++ matcher->flags |= attr->isolated_matcher_end_ft_id ? 
++ MLX5HWS_MATCHER_FLAGS_ISOLATED : 0; + + return hws_matcher_check_attr_sz(caps, matcher); + } +@@ -617,6 +891,8 @@ hws_matcher_create_col_matcher(struct mlx5hws_matcher *matcher) + col_matcher->attr.table.sz_row_log -= MLX5HWS_MATCHER_ASSURED_ROW_RATIO; + + col_matcher->attr.max_num_of_at_attach = matcher->attr.max_num_of_at_attach; ++ col_matcher->attr.isolated_matcher_end_ft_id = ++ matcher->attr.isolated_matcher_end_ft_id; + + ret = hws_matcher_process_attr(ctx->caps, col_matcher); + if (ret) +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h +index 8e95158a66b5..32e83cddcd60 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h +@@ -34,6 +34,7 @@ enum mlx5hws_matcher_offset { + enum mlx5hws_matcher_flags { + MLX5HWS_MATCHER_FLAGS_COLLISION = 1 << 2, + MLX5HWS_MATCHER_FLAGS_RESIZABLE = 1 << 3, ++ MLX5HWS_MATCHER_FLAGS_ISOLATED = 1 << 4, + }; + + struct mlx5hws_match_template { +@@ -96,9 +97,17 @@ static inline bool mlx5hws_matcher_is_in_resize(struct mlx5hws_matcher *matcher) + return !!matcher->resize_dst; + } + ++static inline bool mlx5hws_matcher_is_isolated(struct mlx5hws_matcher *matcher) ++{ ++ return !!(matcher->flags & MLX5HWS_MATCHER_FLAGS_ISOLATED); ++} ++ + static inline bool mlx5hws_matcher_is_insert_by_idx(struct mlx5hws_matcher *matcher) + { + return matcher->attr.insert_mode == MLX5HWS_MATCHER_INSERT_BY_INDEX; + } + ++int mlx5hws_matcher_update_end_ft_isolated(struct mlx5hws_table *tbl, ++ u32 miss_ft_id); ++ + #endif /* HWS_MATCHER_H_ */ +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h +index 5121951f2778..fbd63369da10 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h +@@ 
-119,6 +119,8 @@ struct mlx5hws_matcher_attr { + }; + /* Optional AT attach configuration - Max number of additional AT */ + u8 max_num_of_at_attach; ++ /* Optional end FT (miss FT ID) for match RTC (for isolated matcher) */ ++ u32 isolated_matcher_end_ft_id; + }; + + struct mlx5hws_rule_attr { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1369-net-mlx5-hws-support-complex-matchers.patch b/SOURCES/1369-net-mlx5-hws-support-complex-matchers.patch new file mode 100644 index 000000000..3c42b15a3 --- /dev/null +++ b/SOURCES/1369-net-mlx5-hws-support-complex-matchers.patch @@ -0,0 +1,1740 @@ +From fb9de5c73627ea7ac0cba7da4183aa9db9bd832b Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:04 -0400 +Subject: [PATCH] net/mlx5: HWS, support complex matchers +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 17e0accac577fd6ea2090934d71a8c6f36702a26 +Author: Yevgeny Kliteynik +Date: Sun May 11 22:38:05 2025 +0300 + + net/mlx5: HWS, support complex matchers + + This patch adds support for Complex Matchers/Rules + + Overview: + -------- + + A matcher can match on a certain set of match parameters. However, the + number and size of match params for a single matcher are limited: all + the parameters must fit within a single definer. + + A common example of this limitation is IPv6 address matching, where + matching both source and destination IPs requires more bits than a + single definer can support. + + SW Steering addresses this limitation by chaining multiple Steering + Table Entries (STEs) within the same matcher, where each STE matches + on a subset of the parameters. + + In HW Steering, such chaining is not possible — the matcher's STEs + are managed in a hash table, and a single definer is used to calculate + the hash index for STEs. 
+ + To address this limitation in HW Steering, we introduce Complex + Matchers, which consist of two chained matchers. This allows matching + on twice as many parameters. Complex Matchers are filled with Complex + Rules — rules that are split into two parts and inserted into their + respective matchers. + + The first half of the Complex Matcher is a regular matcher and points + to the second half, which is an Isolated Matcher. An Isolated Matcher + has its own isolated table and is accessible only by traffic coming + from the first half of the Complex Matcher. + + This splitting of matchers/rules into multiple parts is transparent to + users. It is hidden under the BWC HWS API. It becomes visible only when + dumping steering debug information, where the Complex Matcher appears + as two separate matchers: one in the user-created table and another + in its isolated table. + + Some implementation details: + --------------------------- + + All user actions are performed on the second part of the rules only. + The first part handles matching and applies two actions: modify header + (set metadata, see details below) and go-to-table (directing traffic to + the isolated table containing the isolated matcher). + + Rule updates (updating rule actions) are applied to the second part of + the rule since user-provided actions are not executed in the first + matcher. + + We use REG_C_6 metadata register to set and match on unique per-rule + tag (see details below). + + Splitting rules into two parts introduces new challenges: + + 1. Invalid Combinations + + Consider two rules with different matching values: + - Rule 1: A+B + - Rule 2: C+D + + Let's split the rules into two parts as follows: + + |---| |---| + | A | | B | + |---| --> |---| + | C | | D | + |---| |---| + + Splitting these rules results in invalid combinations like A+D + and C+B. 
+ + To resolve this, we assign unique tags to each rule on the first + matcher and match these tags on the second matcher (the tag is + implemented through modify_hdr action that sets value to metadata + register REG_C_6): + + |----------| |---------| + | A | | B, TagA | + | action: | | | + | set TagA | | | + |----------| --> |---------| + | C | | D, TagB | + | action: | | | + | set TagB | | | + |----------| |---------| + + 2. Duplicated Entries: + + Consider two rules with overlapping values: + - Rule 1: A+B + - Rule 2: A+D + + Let's split the rules into two parts as follows: + + |---| |---| + | A | | B | + |---| --> |---| + | | | D | + |---| |---| + + This leads to the duplicated entries on the first matcher, which HWS + doesn't allow: subsequent delete of either of the rules will delete + the only entry in the first matcher, leaving the remaining rule + broken. + + To address this, we use a reference count for entries in the first + matcher and delete STEs only when their refcount reaches zero. + + Both challenges are resolved by having a per-matcher data structure + (implemented with rhashtable) that manages refcounts for the first part + of the rules and holds unique tags (managed via IDA) for these rules to + set and to match on the second matcher. + + Limitations: + ----------- + + We utilize metadata register REG_C_6 in this implementation, so its + usage anywhere along the steering of the flow that might include the + need for Complex Matcher is prohibited. + + The number and size of match parameters remain limited — now it is + constrained by what can be represented by two definers instead of one. + This architectural limitation arises from the structure of Complex + Matchers. If future requirements demand more parameters, + Complex Matchers can be extended beyond two matchers. + + Additionally, there is an implementation limit of 32 match parameters + per rule (disregarding parameter size). This limit can be lifted if + needed. 
+ + Signed-off-by: Yevgeny Kliteynik + Reviewed-by: Vlad Dogaru + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1746992290-568936-6-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +index 27b6420678d8..d70db6948dbb 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +@@ -46,10 +46,14 @@ static void hws_bwc_unlock_all_queues(struct mlx5hws_context *ctx) + } + } + +-static void hws_bwc_matcher_init_attr(struct mlx5hws_matcher_attr *attr, ++static void hws_bwc_matcher_init_attr(struct mlx5hws_bwc_matcher *bwc_matcher, + u32 priority, +- u8 size_log) ++ u8 size_log, ++ struct mlx5hws_matcher_attr *attr) + { ++ struct mlx5hws_bwc_matcher *first_matcher = ++ bwc_matcher->complex_first_bwc_matcher; ++ + memset(attr, 0, sizeof(*attr)); + + attr->priority = priority; +@@ -61,6 +65,9 @@ static void hws_bwc_matcher_init_attr(struct mlx5hws_matcher_attr *attr, + attr->rule.num_log = size_log; + attr->resizable = true; + attr->max_num_of_at_attach = MLX5HWS_BWC_MATCHER_ATTACH_AT_NUM; ++ ++ attr->isolated_matcher_end_ft_id = ++ first_matcher ? 
first_matcher->matcher->end_ft_id : 0; + } + + int mlx5hws_bwc_matcher_create_simple(struct mlx5hws_bwc_matcher *bwc_matcher, +@@ -83,9 +90,10 @@ int mlx5hws_bwc_matcher_create_simple(struct mlx5hws_bwc_matcher *bwc_matcher, + for (i = 0; i < bwc_queues; i++) + INIT_LIST_HEAD(&bwc_matcher->rules[i]); + +- hws_bwc_matcher_init_attr(&attr, ++ hws_bwc_matcher_init_attr(bwc_matcher, + priority, +- MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG); ++ MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG, ++ &attr); + + bwc_matcher->priority = priority; + bwc_matcher->size_log = MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG; +@@ -217,7 +225,10 @@ int mlx5hws_bwc_matcher_destroy(struct mlx5hws_bwc_matcher *bwc_matcher) + "BWC matcher destroy: matcher still has %d rules\n", + num_of_rules); + +- mlx5hws_bwc_matcher_destroy_simple(bwc_matcher); ++ if (bwc_matcher->complex) ++ mlx5hws_bwc_matcher_destroy_complex(bwc_matcher); ++ else ++ mlx5hws_bwc_matcher_destroy_simple(bwc_matcher); + + kfree(bwc_matcher); + return 0; +@@ -401,9 +412,13 @@ int mlx5hws_bwc_rule_destroy_simple(struct mlx5hws_bwc_rule *bwc_rule) + + int mlx5hws_bwc_rule_destroy(struct mlx5hws_bwc_rule *bwc_rule) + { +- int ret; ++ bool is_complex = !!bwc_rule->bwc_matcher->complex; ++ int ret = 0; + +- ret = mlx5hws_bwc_rule_destroy_simple(bwc_rule); ++ if (is_complex) ++ ret = mlx5hws_bwc_rule_destroy_complex(bwc_rule); ++ else ++ ret = mlx5hws_bwc_rule_destroy_simple(bwc_rule); + + mlx5hws_bwc_rule_free(bwc_rule); + return ret; +@@ -692,7 +707,10 @@ static int hws_bwc_matcher_move_all_simple(struct mlx5hws_bwc_matcher *bwc_match + + static int hws_bwc_matcher_move_all(struct mlx5hws_bwc_matcher *bwc_matcher) + { +- return hws_bwc_matcher_move_all_simple(bwc_matcher); ++ if (!bwc_matcher->complex) ++ return hws_bwc_matcher_move_all_simple(bwc_matcher); ++ ++ return mlx5hws_bwc_matcher_move_all_complex(bwc_matcher); + } + + static int hws_bwc_matcher_move(struct mlx5hws_bwc_matcher *bwc_matcher) +@@ -703,9 +721,10 @@ static int 
hws_bwc_matcher_move(struct mlx5hws_bwc_matcher *bwc_matcher) + struct mlx5hws_matcher *new_matcher; + int ret; + +- hws_bwc_matcher_init_attr(&matcher_attr, ++ hws_bwc_matcher_init_attr(bwc_matcher, + bwc_matcher->priority, +- bwc_matcher->size_log); ++ bwc_matcher->size_log, ++ &matcher_attr); + + old_matcher = bwc_matcher->matcher; + new_matcher = mlx5hws_matcher_create(old_matcher->tbl, +@@ -910,11 +929,18 @@ mlx5hws_bwc_rule_create(struct mlx5hws_bwc_matcher *bwc_matcher, + + bwc_queue_idx = hws_bwc_gen_queue_idx(ctx); + +- ret = mlx5hws_bwc_rule_create_simple(bwc_rule, +- params->match_buf, +- rule_actions, +- flow_source, +- bwc_queue_idx); ++ if (bwc_matcher->complex) ++ ret = mlx5hws_bwc_rule_create_complex(bwc_rule, ++ params, ++ flow_source, ++ rule_actions, ++ bwc_queue_idx); ++ else ++ ret = mlx5hws_bwc_rule_create_simple(bwc_rule, ++ params->match_buf, ++ rule_actions, ++ flow_source, ++ bwc_queue_idx); + if (unlikely(ret)) { + mlx5hws_bwc_rule_free(bwc_rule); + return NULL; +@@ -996,5 +1022,10 @@ int mlx5hws_bwc_rule_action_update(struct mlx5hws_bwc_rule *bwc_rule, + return -EINVAL; + } + +- return hws_bwc_rule_action_update(bwc_rule, rule_actions); ++ /* For complex rule, the update should happen on the second matcher */ ++ if (bwc_rule->isolated_bwc_rule) ++ return hws_bwc_rule_action_update(bwc_rule->isolated_bwc_rule, ++ rule_actions); ++ else ++ return hws_bwc_rule_action_update(bwc_rule, rule_actions); + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h +index a2aa2d5da694..cf2b65146317 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h +@@ -18,10 +18,13 @@ + + #define MLX5HWS_BWC_POLLING_TIMEOUT 60 + ++struct mlx5hws_bwc_matcher_complex_data; + struct mlx5hws_bwc_matcher { + struct mlx5hws_matcher *matcher; + struct mlx5hws_match_template *mt; + struct 
mlx5hws_action_template **at; ++ struct mlx5hws_bwc_matcher_complex_data *complex; ++ struct mlx5hws_bwc_matcher *complex_first_bwc_matcher; + u8 num_of_at; + u8 size_of_at_array; + u8 size_log; +@@ -33,6 +36,8 @@ struct mlx5hws_bwc_matcher { + struct mlx5hws_bwc_rule { + struct mlx5hws_bwc_matcher *bwc_matcher; + struct mlx5hws_rule *rule; ++ struct mlx5hws_bwc_rule *isolated_bwc_rule; ++ struct mlx5hws_bwc_complex_rule_hash_node *complex_hash_node; + u16 bwc_queue_idx; + struct list_head list_node; + }; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.c +index 9fb059a6511f..5d30c5b094fc 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.c +@@ -3,6 +3,22 @@ + + #include "internal.h" + ++#define HWS_CLEAR_MATCH_PARAM(mask, field) \ ++ MLX5_SET(fte_match_param, (mask)->match_buf, field, 0) ++ ++#define HWS_SZ_MATCH_PARAM (MLX5_ST_SZ_DW_MATCH_PARAM * 4) ++ ++static const struct rhashtable_params hws_refcount_hash = { ++ .key_len = sizeof_field(struct mlx5hws_bwc_complex_rule_hash_node, ++ match_buf), ++ .key_offset = offsetof(struct mlx5hws_bwc_complex_rule_hash_node, ++ match_buf), ++ .head_offset = offsetof(struct mlx5hws_bwc_complex_rule_hash_node, ++ hash_node), ++ .automatic_shrinking = true, ++ .min_size = 1, ++}; ++ + bool mlx5hws_bwc_match_params_is_complex(struct mlx5hws_context *ctx, + u8 match_criteria_enable, + struct mlx5hws_match_parameters *mask) +@@ -48,20 +64,1078 @@ bool mlx5hws_bwc_match_params_is_complex(struct mlx5hws_context *ctx, + return is_complex; + } + ++static void ++hws_bwc_matcher_complex_params_clear_fld(struct mlx5hws_context *ctx, ++ enum mlx5hws_definer_fname fname, ++ struct mlx5hws_match_parameters *mask) ++{ ++ struct mlx5hws_cmd_query_caps *caps = ctx->caps; ++ ++ switch (fname) { ++ case MLX5HWS_DEFINER_FNAME_ETH_TYPE_O: ++ case 
MLX5HWS_DEFINER_FNAME_ETH_TYPE_I: ++ case MLX5HWS_DEFINER_FNAME_ETH_L3_TYPE_O: ++ case MLX5HWS_DEFINER_FNAME_ETH_L3_TYPE_I: ++ case MLX5HWS_DEFINER_FNAME_IP_VERSION_O: ++ case MLX5HWS_DEFINER_FNAME_IP_VERSION_I: ++ /* Because of the strict requirements for IP address matching ++ * that require ethtype/ip_version matching as well, don't clear ++ * these fields - have them in both parts of the complex matcher ++ */ ++ break; ++ case MLX5HWS_DEFINER_FNAME_ETH_SMAC_47_16_O: ++ HWS_CLEAR_MATCH_PARAM(mask, outer_headers.smac_47_16); ++ break; ++ case MLX5HWS_DEFINER_FNAME_ETH_SMAC_47_16_I: ++ HWS_CLEAR_MATCH_PARAM(mask, inner_headers.smac_47_16); ++ break; ++ case MLX5HWS_DEFINER_FNAME_ETH_SMAC_15_0_O: ++ HWS_CLEAR_MATCH_PARAM(mask, outer_headers.smac_15_0); ++ break; ++ case MLX5HWS_DEFINER_FNAME_ETH_SMAC_15_0_I: ++ HWS_CLEAR_MATCH_PARAM(mask, inner_headers.smac_15_0); ++ break; ++ case MLX5HWS_DEFINER_FNAME_ETH_DMAC_47_16_O: ++ HWS_CLEAR_MATCH_PARAM(mask, outer_headers.dmac_47_16); ++ break; ++ case MLX5HWS_DEFINER_FNAME_ETH_DMAC_47_16_I: ++ HWS_CLEAR_MATCH_PARAM(mask, inner_headers.dmac_47_16); ++ break; ++ case MLX5HWS_DEFINER_FNAME_ETH_DMAC_15_0_O: ++ HWS_CLEAR_MATCH_PARAM(mask, outer_headers.dmac_15_0); ++ break; ++ case MLX5HWS_DEFINER_FNAME_ETH_DMAC_15_0_I: ++ HWS_CLEAR_MATCH_PARAM(mask, inner_headers.dmac_15_0); ++ break; ++ case MLX5HWS_DEFINER_FNAME_VLAN_TYPE_O: ++ HWS_CLEAR_MATCH_PARAM(mask, outer_headers.cvlan_tag); ++ HWS_CLEAR_MATCH_PARAM(mask, outer_headers.svlan_tag); ++ break; ++ case MLX5HWS_DEFINER_FNAME_VLAN_TYPE_I: ++ HWS_CLEAR_MATCH_PARAM(mask, inner_headers.cvlan_tag); ++ HWS_CLEAR_MATCH_PARAM(mask, inner_headers.svlan_tag); ++ break; ++ case MLX5HWS_DEFINER_FNAME_VLAN_FIRST_PRIO_O: ++ HWS_CLEAR_MATCH_PARAM(mask, outer_headers.first_prio); ++ break; ++ case MLX5HWS_DEFINER_FNAME_VLAN_FIRST_PRIO_I: ++ HWS_CLEAR_MATCH_PARAM(mask, inner_headers.first_prio); ++ break; ++ case MLX5HWS_DEFINER_FNAME_VLAN_CFI_O: ++ HWS_CLEAR_MATCH_PARAM(mask, 
outer_headers.first_cfi); ++ break; ++ case MLX5HWS_DEFINER_FNAME_VLAN_CFI_I: ++ HWS_CLEAR_MATCH_PARAM(mask, inner_headers.first_cfi); ++ break; ++ case MLX5HWS_DEFINER_FNAME_VLAN_ID_O: ++ HWS_CLEAR_MATCH_PARAM(mask, outer_headers.first_vid); ++ break; ++ case MLX5HWS_DEFINER_FNAME_VLAN_ID_I: ++ HWS_CLEAR_MATCH_PARAM(mask, inner_headers.first_vid); ++ break; ++ case MLX5HWS_DEFINER_FNAME_VLAN_SECOND_TYPE_O: ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters.outer_second_cvlan_tag); ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters.outer_second_svlan_tag); ++ break; ++ case MLX5HWS_DEFINER_FNAME_VLAN_SECOND_TYPE_I: ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters.inner_second_cvlan_tag); ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters.inner_second_svlan_tag); ++ break; ++ case MLX5HWS_DEFINER_FNAME_VLAN_SECOND_PRIO_O: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.outer_second_prio); ++ break; ++ case MLX5HWS_DEFINER_FNAME_VLAN_SECOND_PRIO_I: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.inner_second_prio); ++ break; ++ case MLX5HWS_DEFINER_FNAME_VLAN_SECOND_CFI_O: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.outer_second_cfi); ++ break; ++ case MLX5HWS_DEFINER_FNAME_VLAN_SECOND_CFI_I: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.inner_second_cfi); ++ break; ++ case MLX5HWS_DEFINER_FNAME_VLAN_SECOND_ID_O: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.outer_second_vid); ++ break; ++ case MLX5HWS_DEFINER_FNAME_VLAN_SECOND_ID_I: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.inner_second_vid); ++ break; ++ case MLX5HWS_DEFINER_FNAME_IPV4_IHL_O: ++ HWS_CLEAR_MATCH_PARAM(mask, outer_headers.ipv4_ihl); ++ break; ++ case MLX5HWS_DEFINER_FNAME_IPV4_IHL_I: ++ HWS_CLEAR_MATCH_PARAM(mask, inner_headers.ipv4_ihl); ++ break; ++ case MLX5HWS_DEFINER_FNAME_IP_DSCP_O: ++ HWS_CLEAR_MATCH_PARAM(mask, outer_headers.ip_dscp); ++ break; ++ case MLX5HWS_DEFINER_FNAME_IP_DSCP_I: ++ HWS_CLEAR_MATCH_PARAM(mask, inner_headers.ip_dscp); ++ break; ++ case 
MLX5HWS_DEFINER_FNAME_IP_ECN_O: ++ HWS_CLEAR_MATCH_PARAM(mask, outer_headers.ip_ecn); ++ break; ++ case MLX5HWS_DEFINER_FNAME_IP_ECN_I: ++ HWS_CLEAR_MATCH_PARAM(mask, inner_headers.ip_ecn); ++ break; ++ case MLX5HWS_DEFINER_FNAME_IP_TTL_O: ++ HWS_CLEAR_MATCH_PARAM(mask, outer_headers.ttl_hoplimit); ++ break; ++ case MLX5HWS_DEFINER_FNAME_IP_TTL_I: ++ HWS_CLEAR_MATCH_PARAM(mask, inner_headers.ttl_hoplimit); ++ break; ++ case MLX5HWS_DEFINER_FNAME_IPV4_DST_O: ++ HWS_CLEAR_MATCH_PARAM(mask, ++ outer_headers.dst_ipv4_dst_ipv6.ipv6_simple_layout.ipv6_31_0); ++ break; ++ case MLX5HWS_DEFINER_FNAME_IPV4_SRC_O: ++ HWS_CLEAR_MATCH_PARAM(mask, ++ outer_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_31_0); ++ break; ++ case MLX5HWS_DEFINER_FNAME_IPV4_DST_I: ++ HWS_CLEAR_MATCH_PARAM(mask, ++ inner_headers.dst_ipv4_dst_ipv6.ipv6_simple_layout.ipv6_31_0); ++ break; ++ case MLX5HWS_DEFINER_FNAME_IPV4_SRC_I: ++ HWS_CLEAR_MATCH_PARAM(mask, ++ inner_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_31_0); ++ break; ++ case MLX5HWS_DEFINER_FNAME_IP_FRAG_O: ++ HWS_CLEAR_MATCH_PARAM(mask, outer_headers.frag); ++ break; ++ case MLX5HWS_DEFINER_FNAME_IP_FRAG_I: ++ HWS_CLEAR_MATCH_PARAM(mask, inner_headers.frag); ++ break; ++ case MLX5HWS_DEFINER_FNAME_IPV6_FLOW_LABEL_O: ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters.outer_ipv6_flow_label); ++ break; ++ case MLX5HWS_DEFINER_FNAME_IPV6_FLOW_LABEL_I: ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters.inner_ipv6_flow_label); ++ break; ++ case MLX5HWS_DEFINER_FNAME_IPV6_DST_127_96_O: ++ case MLX5HWS_DEFINER_FNAME_IPV6_DST_95_64_O: ++ case MLX5HWS_DEFINER_FNAME_IPV6_DST_63_32_O: ++ case MLX5HWS_DEFINER_FNAME_IPV6_DST_31_0_O: ++ HWS_CLEAR_MATCH_PARAM(mask, ++ outer_headers.dst_ipv4_dst_ipv6.ipv6_simple_layout.ipv6_127_96); ++ HWS_CLEAR_MATCH_PARAM(mask, ++ outer_headers.dst_ipv4_dst_ipv6.ipv6_simple_layout.ipv6_95_64); ++ HWS_CLEAR_MATCH_PARAM(mask, ++ outer_headers.dst_ipv4_dst_ipv6.ipv6_simple_layout.ipv6_63_32); ++ 
HWS_CLEAR_MATCH_PARAM(mask, ++ outer_headers.dst_ipv4_dst_ipv6.ipv6_simple_layout.ipv6_31_0); ++ break; ++ case MLX5HWS_DEFINER_FNAME_IPV6_SRC_127_96_O: ++ case MLX5HWS_DEFINER_FNAME_IPV6_SRC_95_64_O: ++ case MLX5HWS_DEFINER_FNAME_IPV6_SRC_63_32_O: ++ case MLX5HWS_DEFINER_FNAME_IPV6_SRC_31_0_O: ++ HWS_CLEAR_MATCH_PARAM(mask, ++ outer_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_127_96); ++ HWS_CLEAR_MATCH_PARAM(mask, ++ outer_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_95_64); ++ HWS_CLEAR_MATCH_PARAM(mask, ++ outer_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_63_32); ++ HWS_CLEAR_MATCH_PARAM(mask, ++ outer_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_31_0); ++ break; ++ case MLX5HWS_DEFINER_FNAME_IPV6_DST_127_96_I: ++ case MLX5HWS_DEFINER_FNAME_IPV6_DST_95_64_I: ++ case MLX5HWS_DEFINER_FNAME_IPV6_DST_63_32_I: ++ case MLX5HWS_DEFINER_FNAME_IPV6_DST_31_0_I: ++ HWS_CLEAR_MATCH_PARAM(mask, ++ inner_headers.dst_ipv4_dst_ipv6.ipv6_simple_layout.ipv6_127_96); ++ HWS_CLEAR_MATCH_PARAM(mask, ++ inner_headers.dst_ipv4_dst_ipv6.ipv6_simple_layout.ipv6_95_64); ++ HWS_CLEAR_MATCH_PARAM(mask, ++ inner_headers.dst_ipv4_dst_ipv6.ipv6_simple_layout.ipv6_63_32); ++ HWS_CLEAR_MATCH_PARAM(mask, ++ inner_headers.dst_ipv4_dst_ipv6.ipv6_simple_layout.ipv6_31_0); ++ break; ++ case MLX5HWS_DEFINER_FNAME_IPV6_SRC_127_96_I: ++ case MLX5HWS_DEFINER_FNAME_IPV6_SRC_95_64_I: ++ case MLX5HWS_DEFINER_FNAME_IPV6_SRC_63_32_I: ++ case MLX5HWS_DEFINER_FNAME_IPV6_SRC_31_0_I: ++ HWS_CLEAR_MATCH_PARAM(mask, ++ inner_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_127_96); ++ HWS_CLEAR_MATCH_PARAM(mask, ++ inner_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_95_64); ++ HWS_CLEAR_MATCH_PARAM(mask, ++ inner_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_63_32); ++ HWS_CLEAR_MATCH_PARAM(mask, ++ inner_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_31_0); ++ break; ++ case MLX5HWS_DEFINER_FNAME_IP_PROTOCOL_O: ++ HWS_CLEAR_MATCH_PARAM(mask, outer_headers.ip_protocol); ++ break; 
++ case MLX5HWS_DEFINER_FNAME_IP_PROTOCOL_I: ++ HWS_CLEAR_MATCH_PARAM(mask, inner_headers.ip_protocol); ++ break; ++ case MLX5HWS_DEFINER_FNAME_L4_SPORT_O: ++ HWS_CLEAR_MATCH_PARAM(mask, outer_headers.tcp_sport); ++ HWS_CLEAR_MATCH_PARAM(mask, outer_headers.udp_sport); ++ break; ++ case MLX5HWS_DEFINER_FNAME_L4_SPORT_I: ++ HWS_CLEAR_MATCH_PARAM(mask, inner_headers.tcp_sport); ++ HWS_CLEAR_MATCH_PARAM(mask, inner_headers.udp_sport); ++ break; ++ case MLX5HWS_DEFINER_FNAME_L4_DPORT_O: ++ HWS_CLEAR_MATCH_PARAM(mask, outer_headers.tcp_dport); ++ HWS_CLEAR_MATCH_PARAM(mask, outer_headers.udp_dport); ++ break; ++ case MLX5HWS_DEFINER_FNAME_L4_DPORT_I: ++ HWS_CLEAR_MATCH_PARAM(mask, inner_headers.tcp_dport); ++ HWS_CLEAR_MATCH_PARAM(mask, inner_headers.udp_dport); ++ break; ++ case MLX5HWS_DEFINER_FNAME_TCP_FLAGS_O: ++ HWS_CLEAR_MATCH_PARAM(mask, outer_headers.tcp_flags); ++ break; ++ case MLX5HWS_DEFINER_FNAME_TCP_ACK_NUM: ++ case MLX5HWS_DEFINER_FNAME_TCP_SEQ_NUM: ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters_3.outer_tcp_seq_num); ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters_3.outer_tcp_ack_num); ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters_3.inner_tcp_seq_num); ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters_3.inner_tcp_ack_num); ++ break; ++ case MLX5HWS_DEFINER_FNAME_GTP_TEID: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_3.gtpu_teid); ++ break; ++ case MLX5HWS_DEFINER_FNAME_GTP_MSG_TYPE: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_3.gtpu_msg_type); ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_3.gtpu_msg_flags); ++ break; ++ case MLX5HWS_DEFINER_FNAME_GTPU_FIRST_EXT_DW0: ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters_3.gtpu_first_ext_dw_0); ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_3.gtpu_dw_0); ++ break; ++ case MLX5HWS_DEFINER_FNAME_GTPU_DW2: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_3.gtpu_dw_2); ++ break; ++ case MLX5HWS_DEFINER_FNAME_FLEX_PARSER_0: ++ case MLX5HWS_DEFINER_FNAME_FLEX_PARSER_1: ++ case
MLX5HWS_DEFINER_FNAME_FLEX_PARSER_2: ++ case MLX5HWS_DEFINER_FNAME_FLEX_PARSER_3: ++ case MLX5HWS_DEFINER_FNAME_FLEX_PARSER_4: ++ case MLX5HWS_DEFINER_FNAME_FLEX_PARSER_5: ++ case MLX5HWS_DEFINER_FNAME_FLEX_PARSER_6: ++ case MLX5HWS_DEFINER_FNAME_FLEX_PARSER_7: ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters_2.outer_first_mpls_over_gre); ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters_2.outer_first_mpls_over_udp); ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters_3.geneve_tlv_option_0_data); ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters_4.prog_sample_field_id_0); ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters_4.prog_sample_field_value_0); ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters_4.prog_sample_field_value_1); ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters_4.prog_sample_field_id_2); ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters_4.prog_sample_field_value_2); ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters_4.prog_sample_field_id_3); ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters_4.prog_sample_field_value_3); ++ break; ++ case MLX5HWS_DEFINER_FNAME_VXLAN_VNI: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.vxlan_vni); ++ break; ++ case MLX5HWS_DEFINER_FNAME_VXLAN_GPE_FLAGS: ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters_3.outer_vxlan_gpe_flags); ++ break; ++ case MLX5HWS_DEFINER_FNAME_VXLAN_GPE_RSVD0: ++ break; ++ case MLX5HWS_DEFINER_FNAME_VXLAN_GPE_PROTO: ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters_3.outer_vxlan_gpe_next_protocol); ++ break; ++ case MLX5HWS_DEFINER_FNAME_VXLAN_GPE_VNI: ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters_3.outer_vxlan_gpe_vni); ++ break; ++ case MLX5HWS_DEFINER_FNAME_GENEVE_OPT_LEN: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.geneve_opt_len); ++ break; ++ case MLX5HWS_DEFINER_FNAME_GENEVE_OAM: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.geneve_oam); ++ break; ++ case MLX5HWS_DEFINER_FNAME_GENEVE_PROTO: ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters.geneve_protocol_type); ++ break; ++ case 
MLX5HWS_DEFINER_FNAME_GENEVE_VNI: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.geneve_vni); ++ break; ++ case MLX5HWS_DEFINER_FNAME_SOURCE_QP: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.source_sqn); ++ break; ++ case MLX5HWS_DEFINER_FNAME_SOURCE_GVMI: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.source_port); ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters.source_eswitch_owner_vhca_id); ++ break; ++ case MLX5HWS_DEFINER_FNAME_REG_0: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_2.metadata_reg_c_0); ++ break; ++ case MLX5HWS_DEFINER_FNAME_REG_1: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_2.metadata_reg_c_1); ++ break; ++ case MLX5HWS_DEFINER_FNAME_REG_2: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_2.metadata_reg_c_2); ++ break; ++ case MLX5HWS_DEFINER_FNAME_REG_3: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_2.metadata_reg_c_3); ++ break; ++ case MLX5HWS_DEFINER_FNAME_REG_4: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_2.metadata_reg_c_4); ++ break; ++ case MLX5HWS_DEFINER_FNAME_REG_5: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_2.metadata_reg_c_5); ++ break; ++ case MLX5HWS_DEFINER_FNAME_REG_7: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_2.metadata_reg_c_7); ++ break; ++ case MLX5HWS_DEFINER_FNAME_REG_A: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_2.metadata_reg_a); ++ break; ++ case MLX5HWS_DEFINER_FNAME_GRE_C: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.gre_c_present); ++ break; ++ case MLX5HWS_DEFINER_FNAME_GRE_K: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.gre_k_present); ++ break; ++ case MLX5HWS_DEFINER_FNAME_GRE_S: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.gre_s_present); ++ break; ++ case MLX5HWS_DEFINER_FNAME_GRE_PROTOCOL: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.gre_protocol); ++ break; ++ case MLX5HWS_DEFINER_FNAME_GRE_OPT_KEY: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.gre_key.key); ++ break; ++ case MLX5HWS_DEFINER_FNAME_ICMP_DW1: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_3.icmp_header_data); ++ 
HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_3.icmp_type); ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_3.icmp_code); ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters_3.icmpv6_header_data); ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_3.icmpv6_type); ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_3.icmpv6_code); ++ break; ++ case MLX5HWS_DEFINER_FNAME_MPLS0_O: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_2.outer_first_mpls); ++ break; ++ case MLX5HWS_DEFINER_FNAME_MPLS0_I: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_2.inner_first_mpls); ++ break; ++ case MLX5HWS_DEFINER_FNAME_TNL_HDR_0: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_5.tunnel_header_0); ++ break; ++ case MLX5HWS_DEFINER_FNAME_TNL_HDR_1: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_5.tunnel_header_1); ++ break; ++ case MLX5HWS_DEFINER_FNAME_TNL_HDR_2: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_5.tunnel_header_2); ++ break; ++ case MLX5HWS_DEFINER_FNAME_TNL_HDR_3: ++ HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_5.tunnel_header_3); ++ break; ++ case MLX5HWS_DEFINER_FNAME_FLEX_PARSER0_OK: ++ case MLX5HWS_DEFINER_FNAME_FLEX_PARSER1_OK: ++ case MLX5HWS_DEFINER_FNAME_FLEX_PARSER2_OK: ++ case MLX5HWS_DEFINER_FNAME_FLEX_PARSER3_OK: ++ case MLX5HWS_DEFINER_FNAME_FLEX_PARSER4_OK: ++ case MLX5HWS_DEFINER_FNAME_FLEX_PARSER5_OK: ++ case MLX5HWS_DEFINER_FNAME_FLEX_PARSER6_OK: ++ case MLX5HWS_DEFINER_FNAME_FLEX_PARSER7_OK: ++ /* assuming this is flex parser for geneve option */ ++ if ((fname == MLX5HWS_DEFINER_FNAME_FLEX_PARSER0_OK && ++ ctx->caps->flex_parser_id_geneve_tlv_option_0 != 0) || ++ (fname == MLX5HWS_DEFINER_FNAME_FLEX_PARSER1_OK && ++ ctx->caps->flex_parser_id_geneve_tlv_option_0 != 1) || ++ (fname == MLX5HWS_DEFINER_FNAME_FLEX_PARSER2_OK && ++ ctx->caps->flex_parser_id_geneve_tlv_option_0 != 2) || ++ (fname == MLX5HWS_DEFINER_FNAME_FLEX_PARSER3_OK && ++ ctx->caps->flex_parser_id_geneve_tlv_option_0 != 3) || ++ (fname == MLX5HWS_DEFINER_FNAME_FLEX_PARSER4_OK && ++ 
ctx->caps->flex_parser_id_geneve_tlv_option_0 != 4) || ++ (fname == MLX5HWS_DEFINER_FNAME_FLEX_PARSER5_OK && ++ ctx->caps->flex_parser_id_geneve_tlv_option_0 != 5) || ++ (fname == MLX5HWS_DEFINER_FNAME_FLEX_PARSER6_OK && ++ ctx->caps->flex_parser_id_geneve_tlv_option_0 != 6) || ++ (fname == MLX5HWS_DEFINER_FNAME_FLEX_PARSER7_OK && ++ ctx->caps->flex_parser_id_geneve_tlv_option_0 != 7)) { ++ mlx5hws_err(ctx, ++ "Complex params: unsupported field %s (%d), flex parser ID for geneve is %d\n", ++ mlx5hws_definer_fname_to_str(fname), fname, ++ caps->flex_parser_id_geneve_tlv_option_0); ++ break; ++ } ++ HWS_CLEAR_MATCH_PARAM(mask, ++ misc_parameters.geneve_tlv_option_0_exist); ++ break; ++ case MLX5HWS_DEFINER_FNAME_REG_6: ++ default: ++ mlx5hws_err(ctx, "Complex params: unsupported field %s (%d)\n", ++ mlx5hws_definer_fname_to_str(fname), fname); ++ break; ++ } ++} ++ ++static bool ++hws_bwc_matcher_complex_params_comb_is_valid(struct mlx5hws_definer_fc *fc, ++ int fc_sz, ++ u32 combination_num) ++{ ++ bool m1[MLX5HWS_DEFINER_FNAME_MAX] = {0}; ++ bool m2[MLX5HWS_DEFINER_FNAME_MAX] = {0}; ++ bool is_first_matcher; ++ int i; ++ ++ for (i = 0; i < fc_sz; i++) { ++ is_first_matcher = !(combination_num & BIT(i)); ++ if (is_first_matcher) ++ m1[fc[i].fname] = true; ++ else ++ m2[fc[i].fname] = true; ++ } ++ ++ /* Not all the fields can be split into separate matchers. ++ * Some should be together on the same matcher. ++ * For example, IPv6 parts - the whole IPv6 address should be on the ++ * same matcher in order for us to deduce if it's IPv6 or IPv4 address. 
++ */ ++ if (m1[MLX5HWS_DEFINER_FNAME_IP_FRAG_O] && ++ (m2[MLX5HWS_DEFINER_FNAME_ETH_SMAC_15_0_O] || ++ m2[MLX5HWS_DEFINER_FNAME_ETH_SMAC_47_16_O] || ++ m2[MLX5HWS_DEFINER_FNAME_ETH_DMAC_15_0_O] || ++ m2[MLX5HWS_DEFINER_FNAME_ETH_DMAC_47_16_O])) ++ return false; ++ ++ if (m2[MLX5HWS_DEFINER_FNAME_IP_FRAG_O] && ++ (m1[MLX5HWS_DEFINER_FNAME_ETH_SMAC_15_0_O] || ++ m1[MLX5HWS_DEFINER_FNAME_ETH_SMAC_47_16_O] || ++ m1[MLX5HWS_DEFINER_FNAME_ETH_DMAC_15_0_O] || ++ m1[MLX5HWS_DEFINER_FNAME_ETH_DMAC_47_16_O])) ++ return false; ++ ++ if (m1[MLX5HWS_DEFINER_FNAME_IP_FRAG_I] && ++ (m2[MLX5HWS_DEFINER_FNAME_ETH_SMAC_47_16_I] || ++ m2[MLX5HWS_DEFINER_FNAME_ETH_SMAC_15_0_I] || ++ m2[MLX5HWS_DEFINER_FNAME_ETH_DMAC_47_16_I] || ++ m2[MLX5HWS_DEFINER_FNAME_ETH_DMAC_15_0_I])) ++ return false; ++ ++ if (m2[MLX5HWS_DEFINER_FNAME_IP_FRAG_I] && ++ (m1[MLX5HWS_DEFINER_FNAME_ETH_SMAC_47_16_I] || ++ m1[MLX5HWS_DEFINER_FNAME_ETH_SMAC_15_0_I] || ++ m1[MLX5HWS_DEFINER_FNAME_ETH_DMAC_47_16_I] || ++ m1[MLX5HWS_DEFINER_FNAME_ETH_DMAC_15_0_I])) ++ return false; ++ ++ /* Don't split outer IPv6 dest address. */ ++ if ((m1[MLX5HWS_DEFINER_FNAME_IPV6_DST_127_96_O] || ++ m1[MLX5HWS_DEFINER_FNAME_IPV6_DST_95_64_O] || ++ m1[MLX5HWS_DEFINER_FNAME_IPV6_DST_63_32_O] || ++ m1[MLX5HWS_DEFINER_FNAME_IPV6_DST_31_0_O]) && ++ (m2[MLX5HWS_DEFINER_FNAME_IPV6_DST_127_96_O] || ++ m2[MLX5HWS_DEFINER_FNAME_IPV6_DST_95_64_O] || ++ m2[MLX5HWS_DEFINER_FNAME_IPV6_DST_63_32_O] || ++ m2[MLX5HWS_DEFINER_FNAME_IPV6_DST_31_0_O])) ++ return false; ++ ++ /* Don't split outer IPv6 source address. 
*/ ++ if ((m1[MLX5HWS_DEFINER_FNAME_IPV6_SRC_127_96_O] || ++ m1[MLX5HWS_DEFINER_FNAME_IPV6_SRC_95_64_O] || ++ m1[MLX5HWS_DEFINER_FNAME_IPV6_SRC_63_32_O] || ++ m1[MLX5HWS_DEFINER_FNAME_IPV6_SRC_31_0_O]) && ++ (m2[MLX5HWS_DEFINER_FNAME_IPV6_SRC_127_96_O] || ++ m2[MLX5HWS_DEFINER_FNAME_IPV6_SRC_95_64_O] || ++ m2[MLX5HWS_DEFINER_FNAME_IPV6_SRC_63_32_O] || ++ m2[MLX5HWS_DEFINER_FNAME_IPV6_SRC_31_0_O])) ++ return false; ++ ++ /* Don't split inner IPv6 dest address. */ ++ if ((m1[MLX5HWS_DEFINER_FNAME_IPV6_DST_127_96_I] || ++ m1[MLX5HWS_DEFINER_FNAME_IPV6_DST_95_64_I] || ++ m1[MLX5HWS_DEFINER_FNAME_IPV6_DST_63_32_I] || ++ m1[MLX5HWS_DEFINER_FNAME_IPV6_DST_31_0_I]) && ++ (m2[MLX5HWS_DEFINER_FNAME_IPV6_DST_127_96_I] || ++ m2[MLX5HWS_DEFINER_FNAME_IPV6_DST_95_64_I] || ++ m2[MLX5HWS_DEFINER_FNAME_IPV6_DST_63_32_I] || ++ m2[MLX5HWS_DEFINER_FNAME_IPV6_DST_31_0_I])) ++ return false; ++ ++ /* Don't split inner IPv6 source address. */ ++ if ((m1[MLX5HWS_DEFINER_FNAME_IPV6_SRC_127_96_I] || ++ m1[MLX5HWS_DEFINER_FNAME_IPV6_SRC_95_64_I] || ++ m1[MLX5HWS_DEFINER_FNAME_IPV6_SRC_63_32_I] || ++ m1[MLX5HWS_DEFINER_FNAME_IPV6_SRC_31_0_I]) && ++ (m2[MLX5HWS_DEFINER_FNAME_IPV6_SRC_127_96_I] || ++ m2[MLX5HWS_DEFINER_FNAME_IPV6_SRC_95_64_I] || ++ m2[MLX5HWS_DEFINER_FNAME_IPV6_SRC_63_32_I] || ++ m2[MLX5HWS_DEFINER_FNAME_IPV6_SRC_31_0_I])) ++ return false; ++ ++ /* Don't split GRE parameters. */ ++ if ((m1[MLX5HWS_DEFINER_FNAME_GRE_C] || ++ m1[MLX5HWS_DEFINER_FNAME_GRE_K] || ++ m1[MLX5HWS_DEFINER_FNAME_GRE_S] || ++ m1[MLX5HWS_DEFINER_FNAME_GRE_PROTOCOL]) && ++ (m2[MLX5HWS_DEFINER_FNAME_GRE_C] || ++ m2[MLX5HWS_DEFINER_FNAME_GRE_K] || ++ m2[MLX5HWS_DEFINER_FNAME_GRE_S] || ++ m2[MLX5HWS_DEFINER_FNAME_GRE_PROTOCOL])) ++ return false; ++ ++ /* Don't split TCP ack/seq numbers. 
*/ ++ if ((m1[MLX5HWS_DEFINER_FNAME_TCP_ACK_NUM] || ++ m1[MLX5HWS_DEFINER_FNAME_TCP_SEQ_NUM]) && ++ (m2[MLX5HWS_DEFINER_FNAME_TCP_ACK_NUM] || ++ m2[MLX5HWS_DEFINER_FNAME_TCP_SEQ_NUM])) ++ return false; ++ ++ /* Don't split flex parser. */ ++ if ((m1[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_0] || ++ m1[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_1] || ++ m1[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_2] || ++ m1[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_3] || ++ m1[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_4] || ++ m1[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_5] || ++ m1[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_6] || ++ m1[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_7]) && ++ (m2[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_0] || ++ m2[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_1] || ++ m2[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_2] || ++ m2[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_3] || ++ m2[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_4] || ++ m2[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_5] || ++ m2[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_6] || ++ m2[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_7])) ++ return false; ++ ++ return true; ++} ++ ++static void ++hws_bwc_matcher_complex_params_comb_create(struct mlx5hws_context *ctx, ++ struct mlx5hws_match_parameters *m, ++ struct mlx5hws_match_parameters *m1, ++ struct mlx5hws_match_parameters *m2, ++ struct mlx5hws_definer_fc *fc, ++ int fc_sz, ++ u32 combination_num) ++{ ++ bool is_first_matcher; ++ int i; ++ ++ memcpy(m1->match_buf, m->match_buf, m->match_sz); ++ memcpy(m2->match_buf, m->match_buf, m->match_sz); ++ ++ for (i = 0; i < fc_sz; i++) { ++ is_first_matcher = !(combination_num & BIT(i)); ++ hws_bwc_matcher_complex_params_clear_fld(ctx, ++ fc[i].fname, ++ is_first_matcher ? 
++ m2 : m1); ++ } ++ ++ MLX5_SET(fte_match_param, m2->match_buf, ++ misc_parameters_2.metadata_reg_c_6, -1); ++} ++ ++static void ++hws_bwc_matcher_complex_params_destroy(struct mlx5hws_match_parameters *mask_1, ++ struct mlx5hws_match_parameters *mask_2) ++{ ++ kfree(mask_1->match_buf); ++ kfree(mask_2->match_buf); ++} ++ ++static int ++hws_bwc_matcher_complex_params_create(struct mlx5hws_context *ctx, ++ u8 match_criteria, ++ struct mlx5hws_match_parameters *mask, ++ struct mlx5hws_match_parameters *mask_1, ++ struct mlx5hws_match_parameters *mask_2) ++{ ++ struct mlx5hws_definer_fc *fc; ++ u32 num_of_combinations; ++ int fc_sz = 0; ++ int res = 0; ++ u32 i; ++ ++ if (MLX5_GET(fte_match_param, mask->match_buf, ++ misc_parameters_2.metadata_reg_c_6)) { ++ mlx5hws_err(ctx, "Complex matcher: REG_C_6 matching is reserved\n"); ++ res = -EINVAL; ++ goto out; ++ } ++ ++ mask_1->match_buf = kzalloc(MLX5_ST_SZ_BYTES(fte_match_param), ++ GFP_KERNEL); ++ mask_2->match_buf = kzalloc(MLX5_ST_SZ_BYTES(fte_match_param), ++ GFP_KERNEL); ++ if (!mask_1->match_buf || !mask_2->match_buf) { ++ mlx5hws_err(ctx, "Complex matcher: failed to allocate match_param\n"); ++ res = -ENOMEM; ++ goto free_params; ++ } ++ ++ mask_1->match_sz = mask->match_sz; ++ mask_2->match_sz = mask->match_sz; ++ ++ fc = mlx5hws_definer_conv_match_params_to_compressed_fc(ctx, ++ match_criteria, ++ mask->match_buf, ++ &fc_sz); ++ if (!fc) { ++ res = -ENOMEM; ++ goto free_params; ++ } ++ ++ if (fc_sz >= sizeof(num_of_combinations) * BITS_PER_BYTE) { ++ mlx5hws_err(ctx, ++ "Complex matcher: too many match parameters (%d)\n", ++ fc_sz); ++ res = -EINVAL; ++ goto free_fc; ++ } ++ ++ /* We have list of all the match fields from the match parameter. ++ * Now try all the possibilities of splitting them into two match ++ * buffers and look for the supported combination. 
++ */ ++ num_of_combinations = 1 << fc_sz; ++ ++ /* Start from combination at index 1 - we know that 0 is unsupported */ ++ for (i = 1; i < num_of_combinations; i++) { ++ if (!hws_bwc_matcher_complex_params_comb_is_valid(fc, fc_sz, i)) ++ continue; ++ ++ hws_bwc_matcher_complex_params_comb_create(ctx, ++ mask, mask_1, mask_2, ++ fc, fc_sz, i); ++ /* We now have two separate sets of match params. ++ * Check if each of them can be used in its own matcher. ++ */ ++ if (!mlx5hws_bwc_match_params_is_complex(ctx, ++ match_criteria, ++ mask_1) && ++ !mlx5hws_bwc_match_params_is_complex(ctx, ++ match_criteria, ++ mask_2)) ++ break; ++ } ++ ++ if (i == num_of_combinations) { ++ /* We've scanned all the combinations, but to no avail */ ++ mlx5hws_err(ctx, "Complex matcher: couldn't find match params combination\n"); ++ res = -EINVAL; ++ goto free_fc; ++ } ++ ++ kfree(fc); ++ return 0; ++ ++free_fc: ++ kfree(fc); ++free_params: ++ hws_bwc_matcher_complex_params_destroy(mask_1, mask_2); ++out: ++ return res; ++} ++ ++static int ++hws_bwc_isolated_table_create(struct mlx5hws_bwc_matcher *bwc_matcher, ++ struct mlx5hws_table *table) ++{ ++ struct mlx5hws_cmd_ft_modify_attr ft_attr = {0}; ++ struct mlx5hws_context *ctx = table->ctx; ++ struct mlx5hws_table_attr tbl_attr = {0}; ++ struct mlx5hws_table *isolated_tbl; ++ int ret = 0; ++ ++ tbl_attr.type = table->type; ++ tbl_attr.level = table->level; ++ ++ bwc_matcher->complex->isolated_tbl = ++ mlx5hws_table_create(ctx, &tbl_attr); ++ isolated_tbl = bwc_matcher->complex->isolated_tbl; ++ if (!isolated_tbl) ++ return -EINVAL; ++ ++ /* Set the default miss of the isolated table to ++ * point to the end anchor of the original matcher. 
++ */ ++ mlx5hws_cmd_set_attr_connect_miss_tbl(ctx, ++ isolated_tbl->fw_ft_type, ++ isolated_tbl->type, ++ &ft_attr); ++ ft_attr.table_miss_id = bwc_matcher->matcher->end_ft_id; ++ ++ ret = mlx5hws_cmd_flow_table_modify(ctx->mdev, ++ &ft_attr, ++ isolated_tbl->ft_id); ++ if (ret) { ++ mlx5hws_err(ctx, "Failed setting isolated tbl default miss\n"); ++ goto destroy_tbl; ++ } ++ ++ return 0; ++ ++destroy_tbl: ++ mlx5hws_table_destroy(isolated_tbl); ++ return ret; ++} ++ ++static void hws_bwc_isolated_table_destroy(struct mlx5hws_table *isolated_tbl) ++{ ++ /* This table is isolated - no table is pointing to it, no need to ++ * disconnect it from anywhere, it won't affect any other table's miss. ++ */ ++ mlx5hws_table_destroy(isolated_tbl); ++} ++ ++static int ++hws_bwc_isolated_matcher_create(struct mlx5hws_bwc_matcher *bwc_matcher, ++ struct mlx5hws_table *table, ++ u8 match_criteria_enable, ++ struct mlx5hws_match_parameters *mask) ++{ ++ struct mlx5hws_table *isolated_tbl = bwc_matcher->complex->isolated_tbl; ++ struct mlx5hws_bwc_matcher *isolated_bwc_matcher; ++ struct mlx5hws_context *ctx = table->ctx; ++ int ret; ++ ++ isolated_bwc_matcher = kzalloc(sizeof(*bwc_matcher), GFP_KERNEL); ++ if (!isolated_bwc_matcher) ++ return -ENOMEM; ++ ++ bwc_matcher->complex->isolated_bwc_matcher = isolated_bwc_matcher; ++ ++ /* Isolated BWC matcher needs access to the first BWC matcher */ ++ isolated_bwc_matcher->complex_first_bwc_matcher = bwc_matcher; ++ ++ /* Isolated matcher needs to match on REG_C_6, ++ * so make sure its criteria bit is on. 
++ */ ++ match_criteria_enable |= MLX5HWS_DEFINER_MATCH_CRITERIA_MISC2; ++ ++ ret = mlx5hws_bwc_matcher_create_simple(isolated_bwc_matcher, ++ isolated_tbl, ++ 0, ++ match_criteria_enable, ++ mask, ++ NULL); ++ if (ret) { ++ mlx5hws_err(ctx, "Complex matcher: failed creating isolated BWC matcher\n"); ++ goto free_matcher; ++ } ++ ++ return 0; ++ ++free_matcher: ++ kfree(bwc_matcher->complex->isolated_bwc_matcher); ++ return ret; ++} ++ ++static void ++hws_bwc_isolated_matcher_destroy(struct mlx5hws_bwc_matcher *bwc_matcher) ++{ ++ mlx5hws_bwc_matcher_destroy_simple(bwc_matcher); ++ kfree(bwc_matcher); ++} ++ ++static int ++hws_bwc_isolated_actions_create(struct mlx5hws_bwc_matcher *bwc_matcher, ++ struct mlx5hws_table *table) ++{ ++ struct mlx5hws_table *isolated_tbl = bwc_matcher->complex->isolated_tbl; ++ u8 modify_hdr_action[MLX5_ST_SZ_BYTES(set_action_in)] = {0}; ++ struct mlx5hws_context *ctx = table->ctx; ++ struct mlx5hws_action_mh_pattern ptrn; ++ int ret = 0; ++ ++ /* Create action to jump to isolated table */ ++ ++ bwc_matcher->complex->action_go_to_tbl = ++ mlx5hws_action_create_dest_table(ctx, ++ isolated_tbl, ++ MLX5HWS_ACTION_FLAG_HWS_FDB); ++ if (!bwc_matcher->complex->action_go_to_tbl) { ++ mlx5hws_err(ctx, "Complex matcher: failed to create go-to-tbl action\n"); ++ return -EINVAL; ++ } ++ ++ /* Create modify header action to set REG_C_6 */ ++ ++ MLX5_SET(set_action_in, modify_hdr_action, ++ action_type, MLX5_MODIFICATION_TYPE_SET); ++ MLX5_SET(set_action_in, modify_hdr_action, ++ field, MLX5_MODI_META_REG_C_6); ++ MLX5_SET(set_action_in, modify_hdr_action, ++ length, 0); /* zero means length of 32 */ ++ MLX5_SET(set_action_in, modify_hdr_action, offset, 0); ++ MLX5_SET(set_action_in, modify_hdr_action, data, 0); ++ ++ ptrn.data = (void *)modify_hdr_action; ++ ptrn.sz = MLX5HWS_ACTION_DOUBLE_SIZE; ++ ++ bwc_matcher->complex->action_metadata = ++ mlx5hws_action_create_modify_header(ctx, 1, &ptrn, 0, ++ MLX5HWS_ACTION_FLAG_HWS_FDB); ++ if 
(!bwc_matcher->complex->action_metadata) { ++ ret = -EINVAL; ++ goto destroy_action_go_to_tbl; ++ } ++ ++ /* Create last action */ ++ ++ bwc_matcher->complex->action_last = ++ mlx5hws_action_create_last(ctx, MLX5HWS_ACTION_FLAG_HWS_FDB); ++ if (!bwc_matcher->complex->action_last) { ++ mlx5hws_err(ctx, "Complex matcher: failed to create last action\n"); ++ ret = -EINVAL; ++ goto destroy_action_metadata; ++ } ++ ++ return 0; ++ ++destroy_action_metadata: ++ mlx5hws_action_destroy(bwc_matcher->complex->action_metadata); ++destroy_action_go_to_tbl: ++ mlx5hws_action_destroy(bwc_matcher->complex->action_go_to_tbl); ++ return ret; ++} ++ ++static void ++hws_bwc_isolated_actions_destroy(struct mlx5hws_bwc_matcher *bwc_matcher) ++{ ++ mlx5hws_action_destroy(bwc_matcher->complex->action_last); ++ mlx5hws_action_destroy(bwc_matcher->complex->action_metadata); ++ mlx5hws_action_destroy(bwc_matcher->complex->action_go_to_tbl); ++} ++ + int mlx5hws_bwc_matcher_create_complex(struct mlx5hws_bwc_matcher *bwc_matcher, + struct mlx5hws_table *table, + u32 priority, + u8 match_criteria_enable, + struct mlx5hws_match_parameters *mask) + { +- mlx5hws_err(table->ctx, "Complex matcher is not supported yet\n"); +- return -EOPNOTSUPP; ++ enum mlx5hws_action_type complex_init_action_types[3]; ++ struct mlx5hws_bwc_matcher *isolated_bwc_matcher; ++ struct mlx5hws_match_parameters mask_1 = {0}; ++ struct mlx5hws_match_parameters mask_2 = {0}; ++ struct mlx5hws_context *ctx = table->ctx; ++ int ret; ++ ++ ret = hws_bwc_matcher_complex_params_create(table->ctx, ++ match_criteria_enable, ++ mask, &mask_1, &mask_2); ++ if (ret) ++ goto err; ++ ++ bwc_matcher->complex = ++ kzalloc(sizeof(*bwc_matcher->complex), GFP_KERNEL); ++ if (!bwc_matcher->complex) { ++ ret = -ENOMEM; ++ goto free_masks; ++ } ++ ++ ret = rhashtable_init(&bwc_matcher->complex->refcount_hash, ++ &hws_refcount_hash); ++ if (ret) { ++ mlx5hws_err(ctx, "Complex matcher: failed to initialize rhashtable\n"); ++ goto free_complex; 
++ } ++ ++ mutex_init(&bwc_matcher->complex->hash_lock); ++ ida_init(&bwc_matcher->complex->metadata_ida); ++ ++ /* Create initial action template for the first matcher. ++ * Usually the initial AT is just dummy, but in case of complex ++ * matcher we know exactly which actions should it have. ++ */ ++ ++ complex_init_action_types[0] = MLX5HWS_ACTION_TYP_MODIFY_HDR; ++ complex_init_action_types[1] = MLX5HWS_ACTION_TYP_TBL; ++ complex_init_action_types[2] = MLX5HWS_ACTION_TYP_LAST; ++ ++ /* Create the first matcher */ ++ ++ ret = mlx5hws_bwc_matcher_create_simple(bwc_matcher, ++ table, ++ priority, ++ match_criteria_enable, ++ &mask_1, ++ complex_init_action_types); ++ if (ret) ++ goto destroy_ida; ++ ++ /* Create isolated table to hold the second isolated matcher */ ++ ++ ret = hws_bwc_isolated_table_create(bwc_matcher, table); ++ if (ret) { ++ mlx5hws_err(ctx, "Complex matcher: failed creating isolated table\n"); ++ goto destroy_first_matcher; ++ } ++ ++ /* Now create the second BWC matcher - the isolated one */ ++ ++ ret = hws_bwc_isolated_matcher_create(bwc_matcher, table, ++ match_criteria_enable, &mask_2); ++ if (ret) { ++ mlx5hws_err(ctx, "Complex matcher: failed creating isolated matcher\n"); ++ goto destroy_isolated_tbl; ++ } ++ ++ /* Create action for isolated matcher's rules */ ++ ++ ret = hws_bwc_isolated_actions_create(bwc_matcher, table); ++ if (ret) { ++ mlx5hws_err(ctx, "Complex matcher: failed creating isolated actions\n"); ++ goto destroy_isolated_matcher; ++ } ++ ++ hws_bwc_matcher_complex_params_destroy(&mask_1, &mask_2); ++ return 0; ++ ++destroy_isolated_matcher: ++ isolated_bwc_matcher = bwc_matcher->complex->isolated_bwc_matcher; ++ hws_bwc_isolated_matcher_destroy(isolated_bwc_matcher); ++destroy_isolated_tbl: ++ hws_bwc_isolated_table_destroy(bwc_matcher->complex->isolated_tbl); ++destroy_first_matcher: ++ mlx5hws_bwc_matcher_destroy_simple(bwc_matcher); ++destroy_ida: ++ ida_destroy(&bwc_matcher->complex->metadata_ida); ++ 
mutex_destroy(&bwc_matcher->complex->hash_lock); ++ rhashtable_destroy(&bwc_matcher->complex->refcount_hash); ++free_complex: ++ kfree(bwc_matcher->complex); ++ bwc_matcher->complex = NULL; ++free_masks: ++ hws_bwc_matcher_complex_params_destroy(&mask_1, &mask_2); ++err: ++ return ret; + } + + void + mlx5hws_bwc_matcher_destroy_complex(struct mlx5hws_bwc_matcher *bwc_matcher) + { +- /* nothing to do here */ ++ struct mlx5hws_bwc_matcher *isolated_bwc_matcher = ++ bwc_matcher->complex->isolated_bwc_matcher; ++ ++ hws_bwc_isolated_actions_destroy(bwc_matcher); ++ hws_bwc_isolated_matcher_destroy(isolated_bwc_matcher); ++ hws_bwc_isolated_table_destroy(bwc_matcher->complex->isolated_tbl); ++ mlx5hws_bwc_matcher_destroy_simple(bwc_matcher); ++ ida_destroy(&bwc_matcher->complex->metadata_ida); ++ mutex_destroy(&bwc_matcher->complex->hash_lock); ++ rhashtable_destroy(&bwc_matcher->complex->refcount_hash); ++ kfree(bwc_matcher->complex); ++ bwc_matcher->complex = NULL; ++} ++ ++static void ++hws_bwc_matcher_complex_hash_lock(struct mlx5hws_bwc_matcher *bwc_matcher) ++{ ++ mutex_lock(&bwc_matcher->complex->hash_lock); ++} ++ ++static void ++hws_bwc_matcher_complex_hash_unlock(struct mlx5hws_bwc_matcher *bwc_matcher) ++{ ++ mutex_unlock(&bwc_matcher->complex->hash_lock); ++} ++ ++static int ++hws_bwc_rule_complex_hash_node_get(struct mlx5hws_bwc_rule *bwc_rule, ++ struct mlx5hws_match_parameters *params) ++{ ++ struct mlx5hws_bwc_matcher *bwc_matcher = bwc_rule->bwc_matcher; ++ struct mlx5hws_bwc_complex_rule_hash_node *node, *old_node; ++ struct rhashtable *refcount_hash; ++ int i; ++ ++ bwc_rule->complex_hash_node = NULL; ++ ++ node = kzalloc(sizeof(*node), GFP_KERNEL); ++ if (unlikely(!node)) ++ return -ENOMEM; ++ ++ node->tag = ida_alloc(&bwc_matcher->complex->metadata_ida, GFP_KERNEL); ++ refcount_set(&node->refcount, 1); ++ ++ /* Clear match buffer - turn off all the unrelated fields ++ * in accordance with the match params mask for the first ++ * matcher out of the 
two parts of the complex matcher. ++ * The resulting mask is the key for the hash. ++ */ ++ for (i = 0; i < MLX5_ST_SZ_DW_MATCH_PARAM; i++) ++ node->match_buf[i] = params->match_buf[i] & ++ bwc_matcher->mt->match_param[i]; ++ ++ refcount_hash = &bwc_matcher->complex->refcount_hash; ++ old_node = rhashtable_lookup_get_insert_fast(refcount_hash, ++ &node->hash_node, ++ hws_refcount_hash); ++ if (old_node) { ++ /* Rule with the same tag already exists - update refcount */ ++ refcount_inc(&old_node->refcount); ++ /* Let the new rule use the same tag as the existing rule. ++ * Note that we don't have any indication for the rule creation ++ * process that a rule with similar matching params already ++ * exists - no harm done when this rule is be overwritten by ++ * the same STE. ++ * There's some performance advantage in skipping such cases, ++ * so this is left for future optimizations. ++ */ ++ ida_free(&bwc_matcher->complex->metadata_ida, node->tag); ++ kfree(node); ++ node = old_node; ++ } ++ ++ bwc_rule->complex_hash_node = node; ++ return 0; ++} ++ ++static void ++hws_bwc_rule_complex_hash_node_put(struct mlx5hws_bwc_rule *bwc_rule, ++ bool *is_last_rule) ++{ ++ struct mlx5hws_bwc_matcher *bwc_matcher = bwc_rule->bwc_matcher; ++ struct mlx5hws_bwc_complex_rule_hash_node *node; ++ ++ if (is_last_rule) ++ *is_last_rule = false; ++ ++ node = bwc_rule->complex_hash_node; ++ if (refcount_dec_and_test(&node->refcount)) { ++ rhashtable_remove_fast(&bwc_matcher->complex->refcount_hash, ++ &node->hash_node, ++ hws_refcount_hash); ++ ida_free(&bwc_matcher->complex->metadata_ida, node->tag); ++ kfree(node); ++ if (is_last_rule) ++ *is_last_rule = true; ++ } ++ ++ bwc_rule->complex_hash_node = NULL; + } + + int mlx5hws_bwc_rule_create_complex(struct mlx5hws_bwc_rule *bwc_rule, +@@ -70,19 +1144,271 @@ int mlx5hws_bwc_rule_create_complex(struct mlx5hws_bwc_rule *bwc_rule, + struct mlx5hws_rule_action rule_actions[], + u16 bwc_queue_idx) + { +- 
mlx5hws_err(bwc_rule->bwc_matcher->matcher->tbl->ctx, +- "Complex rule is not supported yet\n"); +- return -EOPNOTSUPP; ++ struct mlx5hws_bwc_matcher *bwc_matcher = bwc_rule->bwc_matcher; ++ struct mlx5hws_context *ctx = bwc_matcher->matcher->tbl->ctx; ++ u8 modify_hdr_action[MLX5_ST_SZ_BYTES(set_action_in)] = {0}; ++ struct mlx5hws_rule_action rule_actions_1[3] = {0}; ++ struct mlx5hws_bwc_matcher *isolated_bwc_matcher; ++ u32 *match_buf_2; ++ u32 metadata_val; ++ int ret = 0; ++ ++ isolated_bwc_matcher = bwc_matcher->complex->isolated_bwc_matcher; ++ bwc_rule->isolated_bwc_rule = ++ mlx5hws_bwc_rule_alloc(isolated_bwc_matcher); ++ if (unlikely(!bwc_rule->isolated_bwc_rule)) ++ return -ENOMEM; ++ ++ hws_bwc_matcher_complex_hash_lock(bwc_matcher); ++ ++ /* Get a new hash node for this complex rule. ++ * If this is a unique set of match params for the first matcher, ++ * we will get a new hash node with newly allocated IDA. ++ * Otherwise we will get an existing node with IDA and updated refcount. ++ */ ++ ret = hws_bwc_rule_complex_hash_node_get(bwc_rule, params); ++ if (unlikely(ret)) { ++ mlx5hws_err(ctx, "Complex rule: failed getting RHT node for this rule\n"); ++ goto free_isolated_rule; ++ } ++ ++ /* No need to clear match buffer's fields in accordance to what ++ * will actually be matched on first and second matchers. ++ * Both matchers were created with the appropriate masks ++ * and each of them holds the appropriate field copy array, ++ * so rule creation will use only the fields that will be copied ++ * in accordance with setters in field copy array. ++ * We do, however, need to temporary allocate match buffer ++ * for the second (isolated) rule in order to not modify ++ * user's match params buffer. 
++ */ ++ ++ match_buf_2 = kmemdup(params->match_buf, ++ MLX5_ST_SZ_BYTES(fte_match_param), ++ GFP_KERNEL); ++ if (unlikely(!match_buf_2)) { ++ mlx5hws_err(ctx, "Complex rule: failed allocating match_buf\n"); ++ ret = ENOMEM; ++ goto hash_node_put; ++ } ++ ++ /* On 2nd matcher, use unique 32-bit ID as a matching tag */ ++ metadata_val = bwc_rule->complex_hash_node->tag; ++ MLX5_SET(fte_match_param, match_buf_2, ++ misc_parameters_2.metadata_reg_c_6, metadata_val); ++ ++ /* Isolated rule's rule_actions contain all the original actions */ ++ ret = mlx5hws_bwc_rule_create_simple(bwc_rule->isolated_bwc_rule, ++ match_buf_2, ++ rule_actions, ++ flow_source, ++ bwc_queue_idx); ++ kfree(match_buf_2); ++ if (unlikely(ret)) { ++ mlx5hws_err(ctx, ++ "Complex rule: failed creating isolated BWC rule (%d)\n", ++ ret); ++ goto hash_node_put; ++ } ++ ++ /* First rule's rule_actions contain setting metadata and ++ * jump to isolated table that contains the second matcher. ++ * Set metadata value to a unique value for this rule. 
++ */ ++ ++ MLX5_SET(set_action_in, modify_hdr_action, ++ action_type, MLX5_MODIFICATION_TYPE_SET); ++ MLX5_SET(set_action_in, modify_hdr_action, ++ field, MLX5_MODI_META_REG_C_6); ++ MLX5_SET(set_action_in, modify_hdr_action, ++ length, 0); /* zero means length of 32 */ ++ MLX5_SET(set_action_in, modify_hdr_action, ++ offset, 0); ++ MLX5_SET(set_action_in, modify_hdr_action, ++ data, metadata_val); ++ ++ rule_actions_1[0].action = bwc_matcher->complex->action_metadata; ++ rule_actions_1[0].modify_header.offset = 0; ++ rule_actions_1[0].modify_header.data = modify_hdr_action; ++ ++ rule_actions_1[1].action = bwc_matcher->complex->action_go_to_tbl; ++ rule_actions_1[2].action = bwc_matcher->complex->action_last; ++ ++ ret = mlx5hws_bwc_rule_create_simple(bwc_rule, ++ params->match_buf, ++ rule_actions_1, ++ flow_source, ++ bwc_queue_idx); ++ ++ if (unlikely(ret)) { ++ mlx5hws_err(ctx, ++ "Complex rule: failed creating first BWC rule (%d)\n", ++ ret); ++ goto destroy_isolated_rule; ++ } ++ ++ hws_bwc_matcher_complex_hash_unlock(bwc_matcher); ++ ++ return 0; ++ ++destroy_isolated_rule: ++ mlx5hws_bwc_rule_destroy_simple(bwc_rule->isolated_bwc_rule); ++hash_node_put: ++ hws_bwc_rule_complex_hash_node_put(bwc_rule, NULL); ++free_isolated_rule: ++ hws_bwc_matcher_complex_hash_unlock(bwc_matcher); ++ mlx5hws_bwc_rule_free(bwc_rule->isolated_bwc_rule); ++ return ret; + } + + int mlx5hws_bwc_rule_destroy_complex(struct mlx5hws_bwc_rule *bwc_rule) + { +- return 0; ++ struct mlx5hws_context *ctx = bwc_rule->bwc_matcher->matcher->tbl->ctx; ++ struct mlx5hws_bwc_rule *isolated_bwc_rule; ++ int ret_isolated, ret; ++ bool is_last_rule; ++ ++ hws_bwc_matcher_complex_hash_lock(bwc_rule->bwc_matcher); ++ ++ hws_bwc_rule_complex_hash_node_put(bwc_rule, &is_last_rule); ++ bwc_rule->rule->skip_delete = !is_last_rule; ++ ++ ret = mlx5hws_bwc_rule_destroy_simple(bwc_rule); ++ if (unlikely(ret)) ++ mlx5hws_err(ctx, "BWC complex rule: failed destroying first rule\n"); ++ ++ 
isolated_bwc_rule = bwc_rule->isolated_bwc_rule; ++ ret_isolated = mlx5hws_bwc_rule_destroy_simple(isolated_bwc_rule); ++ if (unlikely(ret_isolated)) ++ mlx5hws_err(ctx, "BWC complex rule: failed destroying second (isolated) rule\n"); ++ ++ hws_bwc_matcher_complex_hash_unlock(bwc_rule->bwc_matcher); ++ ++ mlx5hws_bwc_rule_free(isolated_bwc_rule); ++ ++ return ret || ret_isolated; ++} ++ ++static void ++hws_bwc_matcher_clear_hash_rtcs(struct mlx5hws_bwc_matcher *bwc_matcher) ++{ ++ struct mlx5hws_bwc_complex_rule_hash_node *node; ++ struct rhashtable_iter iter; ++ ++ rhashtable_walk_enter(&bwc_matcher->complex->refcount_hash, &iter); ++ rhashtable_walk_start(&iter); ++ ++ while ((node = rhashtable_walk_next(&iter)) != NULL) { ++ if (IS_ERR(node)) ++ continue; ++ node->rtc_valid = false; ++ } ++ ++ rhashtable_walk_stop(&iter); ++ rhashtable_walk_exit(&iter); + } + +-int mlx5hws_bwc_matcher_move_all_complex(struct mlx5hws_bwc_matcher *bwc_matcher) ++int ++mlx5hws_bwc_matcher_move_all_complex(struct mlx5hws_bwc_matcher *bwc_matcher) + { +- mlx5hws_err(bwc_matcher->matcher->tbl->ctx, +- "Moving complex rule is not supported yet\n"); +- return -EOPNOTSUPP; ++ struct mlx5hws_context *ctx = bwc_matcher->matcher->tbl->ctx; ++ struct mlx5hws_matcher *matcher = bwc_matcher->matcher; ++ bool move_error = false, poll_error = false; ++ u16 bwc_queues = mlx5hws_bwc_queues(ctx); ++ struct mlx5hws_bwc_rule *tmp_bwc_rule; ++ struct mlx5hws_rule_attr rule_attr; ++ struct mlx5hws_table *isolated_tbl; ++ struct mlx5hws_rule *tmp_rule; ++ struct list_head *rules_list; ++ u32 expected_completions = 1; ++ u32 end_ft_id; ++ int i, ret; ++ ++ /* We are rehashing the matcher that is the first part of the complex ++ * matcher. Need to update the isolated matcher to point to the end_ft ++ * of this new matcher. This needs to be done before moving any rules ++ * to prevent possible steering loops. 
++ */ ++ isolated_tbl = bwc_matcher->complex->isolated_tbl; ++ end_ft_id = bwc_matcher->matcher->resize_dst->end_ft_id; ++ ret = mlx5hws_matcher_update_end_ft_isolated(isolated_tbl, end_ft_id); ++ if (ret) { ++ mlx5hws_err(ctx, ++ "Failed updating end_ft of isolated matcher (%d)\n", ++ ret); ++ return ret; ++ } ++ ++ hws_bwc_matcher_clear_hash_rtcs(bwc_matcher); ++ ++ mlx5hws_bwc_rule_fill_attr(bwc_matcher, 0, 0, &rule_attr); ++ ++ for (i = 0; i < bwc_queues; i++) { ++ rules_list = &bwc_matcher->rules[i]; ++ if (list_empty(rules_list)) ++ continue; ++ ++ rule_attr.queue_id = mlx5hws_bwc_get_queue_id(ctx, i); ++ ++ list_for_each_entry(tmp_bwc_rule, rules_list, list_node) { ++ /* Check if a rule with similar tag has already ++ * been moved. ++ */ ++ if (tmp_bwc_rule->complex_hash_node->rtc_valid) { ++ /* This rule is a duplicate of rule with similar ++ * tag that has already been moved earlier. ++ * Just update this rule's RTCs. ++ */ ++ tmp_bwc_rule->rule->rtc_0 = ++ tmp_bwc_rule->complex_hash_node->rtc_0; ++ tmp_bwc_rule->rule->rtc_1 = ++ tmp_bwc_rule->complex_hash_node->rtc_1; ++ tmp_bwc_rule->rule->matcher = ++ tmp_bwc_rule->rule->matcher->resize_dst; ++ continue; ++ } ++ ++ /* First time we're moving rule with this tag. ++ * Move it for real. 
++ */ ++ tmp_rule = tmp_bwc_rule->rule; ++ tmp_rule->skip_delete = false; ++ ret = mlx5hws_matcher_resize_rule_move(matcher, ++ tmp_rule, ++ &rule_attr); ++ if (unlikely(ret && !move_error)) { ++ mlx5hws_err(ctx, ++ "Moving complex BWC rule failed (%d), attempting to move rest of the rules\n", ++ ret); ++ move_error = true; ++ } ++ ++ expected_completions = 1; ++ ret = mlx5hws_bwc_queue_poll(ctx, ++ rule_attr.queue_id, ++ &expected_completions, ++ true); ++ if (unlikely(ret && !poll_error)) { ++ mlx5hws_err(ctx, ++ "Moving complex BWC rule: poll failed (%d), attempting to move rest of the rules\n", ++ ret); ++ poll_error = true; ++ } ++ ++ /* Done moving the rule to the new matcher, ++ * now update RTCs for all the duplicated rules. ++ */ ++ tmp_bwc_rule->complex_hash_node->rtc_0 = ++ tmp_bwc_rule->rule->rtc_0; ++ tmp_bwc_rule->complex_hash_node->rtc_1 = ++ tmp_bwc_rule->rule->rtc_1; ++ ++ tmp_bwc_rule->complex_hash_node->rtc_valid = true; ++ } ++ } ++ ++ if (move_error || poll_error) ++ ret = -EINVAL; ++ ++ return ret; + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.h +index 340f0688e394..a6887c7e39d5 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.h +@@ -4,6 +4,27 @@ + #ifndef HWS_BWC_COMPLEX_H_ + #define HWS_BWC_COMPLEX_H_ + ++struct mlx5hws_bwc_complex_rule_hash_node { ++ u32 match_buf[MLX5_ST_SZ_DW_MATCH_PARAM]; ++ u32 tag; ++ refcount_t refcount; ++ bool rtc_valid; ++ u32 rtc_0; ++ u32 rtc_1; ++ struct rhash_head hash_node; ++}; ++ ++struct mlx5hws_bwc_matcher_complex_data { ++ struct mlx5hws_table *isolated_tbl; ++ struct mlx5hws_bwc_matcher *isolated_bwc_matcher; ++ struct mlx5hws_action *action_metadata; ++ struct mlx5hws_action *action_go_to_tbl; ++ struct mlx5hws_action *action_last; ++ struct rhashtable refcount_hash; ++ struct mutex hash_lock; /* 
Protect the refcount rhashtable */ ++ struct ida metadata_ida; ++}; ++ + bool mlx5hws_bwc_match_params_is_complex(struct mlx5hws_context *ctx, + u8 match_criteria_enable, + struct mlx5hws_match_parameters *mask); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1370-net-mlx5-hws-force-rehash-when-rule-insertion-failed.patch b/SOURCES/1370-net-mlx5-hws-force-rehash-when-rule-insertion-failed.patch new file mode 100644 index 000000000..166cd18eb --- /dev/null +++ b/SOURCES/1370-net-mlx5-hws-force-rehash-when-rule-insertion-failed.patch @@ -0,0 +1,93 @@ +From bf97c3c319b7e24f5e432eae209ee48da86864e3 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:04 -0400 +Subject: [PATCH] net/mlx5: HWS, force rehash when rule insertion failed + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 9d4024edce1063b616fa8bf7b2363290503cc322 +Author: Yevgeny Kliteynik +Date: Sun May 11 22:38:06 2025 +0300 + + net/mlx5: HWS, force rehash when rule insertion failed + + Rules are inserted into hash table in accordance with their hash index. + When a certain number of rules is reached, the table is rehashed: + a bigger new table is allocated and all the rules are moved there. + But sometimes a new rule can't be inserted into the hash table + because its index is full, even though the number of rules in the + table is well below the threshold. The hash function is not perfect, + so such cases are not rare. When that happens, we want to do the same + rehash, in order to increase the table size and lower the probability + for such cases. + + This patch fixes the usecase where rule insertion was failing, but + rehash couldn't be initiated due to low number of rules: it adds flag + that denotes that rehash is required, even if the number of rules in + the table is below the rehash threshold. 
+ + Signed-off-by: Yevgeny Kliteynik + Reviewed-by: Vlad Dogaru + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1746992290-568936-7-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +index d70db6948dbb..dce2605fc99b 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +@@ -169,6 +169,7 @@ mlx5hws_bwc_matcher_create(struct mlx5hws_table *table, + return NULL; + + atomic_set(&bwc_matcher->num_of_rules, 0); ++ atomic_set(&bwc_matcher->rehash_required, false); + + /* Check if the required match params can be all matched + * in single STE, otherwise complex matcher is needed. +@@ -769,9 +770,9 @@ hws_bwc_matcher_rehash_size(struct mlx5hws_bwc_matcher *bwc_matcher) + + /* It is possible that other rule has already performed rehash. + * Need to check again if we really need rehash. +- * If the reason for rehash was size, but not any more - skip rehash. + */ +- if (!hws_bwc_matcher_rehash_size_needed(bwc_matcher, ++ if (!atomic_read(&bwc_matcher->rehash_required) && ++ !hws_bwc_matcher_rehash_size_needed(bwc_matcher, + atomic_read(&bwc_matcher->num_of_rules))) + return 0; + +@@ -782,6 +783,8 @@ hws_bwc_matcher_rehash_size(struct mlx5hws_bwc_matcher *bwc_matcher) + * - destroy the old matcher + */ + ++ atomic_set(&bwc_matcher->rehash_required, false); ++ + ret = hws_bwc_matcher_extend_size(bwc_matcher); + if (ret) + return ret; +@@ -875,6 +878,7 @@ int mlx5hws_bwc_rule_create_simple(struct mlx5hws_bwc_rule *bwc_rule, + * Try rehash by size and insert rule again - last chance. 
+ */ + ++ atomic_set(&bwc_matcher->rehash_required, true); + mutex_unlock(queue_lock); + + hws_bwc_lock_all_queues(ctx); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h +index cf2b65146317..d21fc247a510 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h +@@ -30,6 +30,7 @@ struct mlx5hws_bwc_matcher { + u8 size_log; + u32 priority; + atomic_t num_of_rules; ++ atomic_t rehash_required; + struct list_head *rules; + }; + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1371-net-mlx5-hws-fix-counting-of-rules-in-the-matcher.patch b/SOURCES/1371-net-mlx5-hws-fix-counting-of-rules-in-the-matcher.patch new file mode 100644 index 000000000..3319e20cd --- /dev/null +++ b/SOURCES/1371-net-mlx5-hws-fix-counting-of-rules-in-the-matcher.patch @@ -0,0 +1,99 @@ +From cfa1d3ecb42da569e4f38ae06c426e91a98f92ea Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:05 -0400 +Subject: [PATCH] net/mlx5: HWS, fix counting of rules in the matcher + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 4c56b5cbc323a10ebb6595500fb78fd8a4762efd +Author: Yevgeny Kliteynik +Date: Sun May 11 22:38:07 2025 +0300 + + net/mlx5: HWS, fix counting of rules in the matcher + + Currently the counter that counts number of rules in a matcher is + increased only when rule insertion is completed. In a multi-threaded + usecase this can lead to a scenario that many rules can be in process + of insertion in the same matcher, while none of them has completed + the insertion and the rule counter is not updated. This results in + a rule insertion failure for many of them at first attempt, which + leads to all of them requiring rehash and requiring locking of all + the queue locks. 
+ + This patch fixes the case by increasing the rule counter in the + beginning of insertion process and decreasing in case of any failure. + + Signed-off-by: Vlad Dogaru + Signed-off-by: Yevgeny Kliteynik + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1746992290-568936-8-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +index dce2605fc99b..7d991a61eeb3 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +@@ -341,16 +341,12 @@ static void hws_bwc_rule_list_add(struct mlx5hws_bwc_rule *bwc_rule, u16 idx) + { + struct mlx5hws_bwc_matcher *bwc_matcher = bwc_rule->bwc_matcher; + +- atomic_inc(&bwc_matcher->num_of_rules); + bwc_rule->bwc_queue_idx = idx; + list_add(&bwc_rule->list_node, &bwc_matcher->rules[idx]); + } + + static void hws_bwc_rule_list_remove(struct mlx5hws_bwc_rule *bwc_rule) + { +- struct mlx5hws_bwc_matcher *bwc_matcher = bwc_rule->bwc_matcher; +- +- atomic_dec(&bwc_matcher->num_of_rules); + list_del_init(&bwc_rule->list_node); + } + +@@ -404,6 +400,7 @@ int mlx5hws_bwc_rule_destroy_simple(struct mlx5hws_bwc_rule *bwc_rule) + mutex_lock(queue_lock); + + ret = hws_bwc_rule_destroy_hws_sync(bwc_rule, &attr); ++ atomic_dec(&bwc_matcher->num_of_rules); + hws_bwc_rule_list_remove(bwc_rule); + + mutex_unlock(queue_lock); +@@ -840,7 +837,7 @@ int mlx5hws_bwc_rule_create_simple(struct mlx5hws_bwc_rule *bwc_rule, + } + + /* check if number of rules require rehash */ +- num_of_rules = atomic_read(&bwc_matcher->num_of_rules); ++ num_of_rules = atomic_inc_return(&bwc_matcher->num_of_rules); + + if (unlikely(hws_bwc_matcher_rehash_size_needed(bwc_matcher, num_of_rules))) { + mutex_unlock(queue_lock); +@@ -854,6 +851,7 @@ int 
mlx5hws_bwc_rule_create_simple(struct mlx5hws_bwc_rule *bwc_rule, + bwc_matcher->size_log - MLX5HWS_BWC_MATCHER_SIZE_LOG_STEP, + bwc_matcher->size_log, + ret); ++ atomic_dec(&bwc_matcher->num_of_rules); + return ret; + } + +@@ -887,6 +885,7 @@ int mlx5hws_bwc_rule_create_simple(struct mlx5hws_bwc_rule *bwc_rule, + + if (ret) { + mlx5hws_err(ctx, "BWC rule insertion: rehash failed (%d)\n", ret); ++ atomic_dec(&bwc_matcher->num_of_rules); + return ret; + } + +@@ -902,6 +901,7 @@ int mlx5hws_bwc_rule_create_simple(struct mlx5hws_bwc_rule *bwc_rule, + if (unlikely(ret)) { + mutex_unlock(queue_lock); + mlx5hws_err(ctx, "BWC rule insertion failed (%d)\n", ret); ++ atomic_dec(&bwc_matcher->num_of_rules); + return ret; + } + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1372-net-mlx5-hws-fix-redundant-extension-of-action-templates.patch b/SOURCES/1372-net-mlx5-hws-fix-redundant-extension-of-action-templates.patch new file mode 100644 index 000000000..e9382cd40 --- /dev/null +++ b/SOURCES/1372-net-mlx5-hws-fix-redundant-extension-of-action-templates.patch @@ -0,0 +1,171 @@ +From cb05a4cab576c3226584ec674529506113f984f5 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:05 -0400 +Subject: [PATCH] net/mlx5: HWS, fix redundant extension of action templates + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 041861b40f599311214a52075140db8be29fd27f +Author: Yevgeny Kliteynik +Date: Sun May 11 22:38:08 2025 +0300 + + net/mlx5: HWS, fix redundant extension of action templates + + When a rule is inserted into a matcher, we search for the suitable + action template. If such template is not found, action template array + is extended with the new template. However, when several threads are + performing this in parallel, there is a race - we can end up with + extending the action templates array with the same template. 
+ + This patch is doing the following: + - refactor the code to find action template index in rule create and + update, have the common code in an auxiliary function + - after locking all the queues, check again if the action template + array still needs to be extended + + Signed-off-by: Vlad Dogaru + Signed-off-by: Yevgeny Kliteynik + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1746992290-568936-9-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +index 7d991a61eeb3..456fac895f5e 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +@@ -789,6 +789,53 @@ hws_bwc_matcher_rehash_size(struct mlx5hws_bwc_matcher *bwc_matcher) + return hws_bwc_matcher_move(bwc_matcher); + } + ++static int hws_bwc_rule_get_at_idx(struct mlx5hws_bwc_rule *bwc_rule, ++ struct mlx5hws_rule_action rule_actions[], ++ u16 bwc_queue_idx) ++{ ++ struct mlx5hws_bwc_matcher *bwc_matcher = bwc_rule->bwc_matcher; ++ struct mlx5hws_context *ctx = bwc_matcher->matcher->tbl->ctx; ++ struct mutex *queue_lock; /* Protect the queue */ ++ int at_idx, ret; ++ ++ /* check if rehash needed due to missing action template */ ++ at_idx = hws_bwc_matcher_find_at(bwc_matcher, rule_actions); ++ if (likely(at_idx >= 0)) ++ return at_idx; ++ ++ /* we need to extend BWC matcher action templates array */ ++ queue_lock = hws_bwc_get_queue_lock(ctx, bwc_queue_idx); ++ mutex_unlock(queue_lock); ++ hws_bwc_lock_all_queues(ctx); ++ ++ /* check again - perhaps other thread already did extend_at */ ++ at_idx = hws_bwc_matcher_find_at(bwc_matcher, rule_actions); ++ if (at_idx >= 0) ++ goto out; ++ ++ ret = hws_bwc_matcher_extend_at(bwc_matcher, rule_actions); ++ if (unlikely(ret)) { ++ mlx5hws_err(ctx, "BWC rule: failed 
extending AT (%d)", ret); ++ at_idx = -EINVAL; ++ goto out; ++ } ++ ++ /* action templates array was extended, we need the last idx */ ++ at_idx = bwc_matcher->num_of_at - 1; ++ ret = mlx5hws_matcher_attach_at(bwc_matcher->matcher, ++ bwc_matcher->at[at_idx]); ++ if (unlikely(ret)) { ++ mlx5hws_err(ctx, "BWC rule: failed attaching new AT (%d)", ret); ++ at_idx = -EINVAL; ++ goto out; ++ } ++ ++out: ++ hws_bwc_unlock_all_queues(ctx); ++ mutex_lock(queue_lock); ++ return at_idx; ++} ++ + int mlx5hws_bwc_rule_create_simple(struct mlx5hws_bwc_rule *bwc_rule, + u32 *match_param, + struct mlx5hws_rule_action rule_actions[], +@@ -809,31 +856,12 @@ int mlx5hws_bwc_rule_create_simple(struct mlx5hws_bwc_rule *bwc_rule, + + mutex_lock(queue_lock); + +- /* check if rehash needed due to missing action template */ +- at_idx = hws_bwc_matcher_find_at(bwc_matcher, rule_actions); ++ at_idx = hws_bwc_rule_get_at_idx(bwc_rule, rule_actions, bwc_queue_idx); + if (unlikely(at_idx < 0)) { +- /* we need to extend BWC matcher action templates array */ + mutex_unlock(queue_lock); +- hws_bwc_lock_all_queues(ctx); +- +- ret = hws_bwc_matcher_extend_at(bwc_matcher, rule_actions); +- if (unlikely(ret)) { +- hws_bwc_unlock_all_queues(ctx); +- return ret; +- } +- +- /* action templates array was extended, we need the last idx */ +- at_idx = bwc_matcher->num_of_at - 1; +- +- ret = mlx5hws_matcher_attach_at(bwc_matcher->matcher, +- bwc_matcher->at[at_idx]); +- if (unlikely(ret)) { +- hws_bwc_unlock_all_queues(ctx); +- return ret; +- } +- +- hws_bwc_unlock_all_queues(ctx); +- mutex_lock(queue_lock); ++ mlx5hws_err(ctx, "BWC rule create: failed getting AT (%d)", ++ ret); ++ return -EINVAL; + } + + /* check if number of rules require rehash */ +@@ -971,36 +999,11 @@ hws_bwc_rule_action_update(struct mlx5hws_bwc_rule *bwc_rule, + + mutex_lock(queue_lock); + +- /* check if rehash needed due to missing action template */ +- at_idx = hws_bwc_matcher_find_at(bwc_matcher, rule_actions); ++ at_idx = 
hws_bwc_rule_get_at_idx(bwc_rule, rule_actions, idx); + if (unlikely(at_idx < 0)) { +- /* we need to extend BWC matcher action templates array */ + mutex_unlock(queue_lock); +- hws_bwc_lock_all_queues(ctx); +- +- /* check again - perhaps other thread already did extend_at */ +- at_idx = hws_bwc_matcher_find_at(bwc_matcher, rule_actions); +- if (likely(at_idx < 0)) { +- ret = hws_bwc_matcher_extend_at(bwc_matcher, rule_actions); +- if (unlikely(ret)) { +- hws_bwc_unlock_all_queues(ctx); +- mlx5hws_err(ctx, "BWC rule update: failed extending AT (%d)", ret); +- return -EINVAL; +- } +- +- /* action templates array was extended, we need the last idx */ +- at_idx = bwc_matcher->num_of_at - 1; +- +- ret = mlx5hws_matcher_attach_at(bwc_matcher->matcher, +- bwc_matcher->at[at_idx]); +- if (unlikely(ret)) { +- hws_bwc_unlock_all_queues(ctx); +- return ret; +- } +- } +- +- hws_bwc_unlock_all_queues(ctx); +- mutex_lock(queue_lock); ++ mlx5hws_err(ctx, "BWC rule update: failed getting AT\n"); ++ return -EINVAL; + } + + ret = hws_bwc_rule_update_sync(bwc_rule, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1373-net-mlx5-hws-rework-rehash-loop.patch b/SOURCES/1373-net-mlx5-hws-rework-rehash-loop.patch new file mode 100644 index 000000000..c3732cad3 --- /dev/null +++ b/SOURCES/1373-net-mlx5-hws-rework-rehash-loop.patch @@ -0,0 +1,209 @@ +From a380ec59fea7f1801a94b324bd7f688f2d3be0dd Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:05 -0400 +Subject: [PATCH] net/mlx5: HWS, rework rehash loop + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit ef94799a87415790d4297cf06075f99b70c420cd +Author: Yevgeny Kliteynik +Date: Sun May 11 22:38:09 2025 +0300 + + net/mlx5: HWS, rework rehash loop + + Reworking the rehash loop - simplifying the code and making it less + error prone: + - Instead of doing round-robin on all the queues with batch of rules in + each cycle, just go over all the queues and move all the rules that + belong to this queue. 
+ - If at some stage of moving the rule we get a failure (which should + not happen), this can't be rolled back. So instead of aborting + rehash and leaving the matcher in a broken state, allow the loop + to continue: attempt to move the rest of the rules and delete the + old matcher. A rule that failed to move to a new matcher will loose + its match STE once the rehash is completed and the old matcher is + deleted, so the rule won't match any traffic any more. This rule's + packets will fall back to the steering pipeline w/o HW offload. + Rehash procedure will return an error, which will cause the rule + insertion to fail for the rule that started this whole rehash. + + Signed-off-by: Yevgeny Kliteynik + Reviewed-by: Vlad Dogaru + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1746992290-568936-10-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +index 456fac895f5e..9e057f808ea5 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +@@ -610,95 +610,69 @@ hws_bwc_matcher_find_at(struct mlx5hws_bwc_matcher *bwc_matcher, + + static int hws_bwc_matcher_move_all_simple(struct mlx5hws_bwc_matcher *bwc_matcher) + { ++ bool move_error = false, poll_error = false, drain_error = false; + struct mlx5hws_context *ctx = bwc_matcher->matcher->tbl->ctx; ++ struct mlx5hws_matcher *matcher = bwc_matcher->matcher; + u16 bwc_queues = mlx5hws_bwc_queues(ctx); +- struct mlx5hws_bwc_rule **bwc_rules; + struct mlx5hws_rule_attr rule_attr; +- u32 *pending_rules; +- int i, j, ret = 0; +- bool all_done; +- u16 burst_th; ++ struct mlx5hws_bwc_rule *bwc_rule; ++ struct mlx5hws_send_engine *queue; ++ struct list_head *rules_list; ++ u32 pending_rules; ++ int i, ret = 0; + + 
mlx5hws_bwc_rule_fill_attr(bwc_matcher, 0, 0, &rule_attr); + +- pending_rules = kcalloc(bwc_queues, sizeof(*pending_rules), GFP_KERNEL); +- if (!pending_rules) +- return -ENOMEM; +- +- bwc_rules = kcalloc(bwc_queues, sizeof(*bwc_rules), GFP_KERNEL); +- if (!bwc_rules) { +- ret = -ENOMEM; +- goto free_pending_rules; +- } +- + for (i = 0; i < bwc_queues; i++) { + if (list_empty(&bwc_matcher->rules[i])) +- bwc_rules[i] = NULL; +- else +- bwc_rules[i] = list_first_entry(&bwc_matcher->rules[i], +- struct mlx5hws_bwc_rule, +- list_node); +- } ++ continue; + +- do { +- all_done = true; ++ pending_rules = 0; ++ rule_attr.queue_id = mlx5hws_bwc_get_queue_id(ctx, i); ++ rules_list = &bwc_matcher->rules[i]; + +- for (i = 0; i < bwc_queues; i++) { +- rule_attr.queue_id = mlx5hws_bwc_get_queue_id(ctx, i); +- burst_th = hws_bwc_get_burst_th(ctx, rule_attr.queue_id); +- +- for (j = 0; j < burst_th && bwc_rules[i]; j++) { +- rule_attr.burst = !!((j + 1) % burst_th); +- ret = mlx5hws_matcher_resize_rule_move(bwc_matcher->matcher, +- bwc_rules[i]->rule, +- &rule_attr); +- if (unlikely(ret)) { +- mlx5hws_err(ctx, +- "Moving BWC rule failed during rehash (%d)\n", +- ret); +- goto free_bwc_rules; +- } ++ list_for_each_entry(bwc_rule, rules_list, list_node) { ++ ret = mlx5hws_matcher_resize_rule_move(matcher, ++ bwc_rule->rule, ++ &rule_attr); ++ if (unlikely(ret && !move_error)) { ++ mlx5hws_err(ctx, ++ "Moving BWC rule: move failed (%d), attempting to move rest of the rules\n", ++ ret); ++ move_error = true; ++ } + +- all_done = false; +- pending_rules[i]++; +- bwc_rules[i] = list_is_last(&bwc_rules[i]->list_node, +- &bwc_matcher->rules[i]) ? 
+- NULL : list_next_entry(bwc_rules[i], list_node); +- +- ret = mlx5hws_bwc_queue_poll(ctx, +- rule_attr.queue_id, +- &pending_rules[i], +- false); +- if (unlikely(ret)) { +- mlx5hws_err(ctx, +- "Moving BWC rule failed during rehash (%d)\n", +- ret); +- goto free_bwc_rules; +- } ++ pending_rules++; ++ ret = mlx5hws_bwc_queue_poll(ctx, ++ rule_attr.queue_id, ++ &pending_rules, ++ false); ++ if (unlikely(ret && !poll_error)) { ++ mlx5hws_err(ctx, ++ "Moving BWC rule: poll failed (%d), attempting to move rest of the rules\n", ++ ret); ++ poll_error = true; + } + } +- } while (!all_done); +- +- /* drain all the bwc queues */ +- for (i = 0; i < bwc_queues; i++) { +- if (pending_rules[i]) { +- u16 queue_id = mlx5hws_bwc_get_queue_id(ctx, i); + +- mlx5hws_send_engine_flush_queue(&ctx->send_queue[queue_id]); +- ret = mlx5hws_bwc_queue_poll(ctx, queue_id, +- &pending_rules[i], true); +- if (unlikely(ret)) { ++ if (pending_rules) { ++ queue = &ctx->send_queue[rule_attr.queue_id]; ++ mlx5hws_send_engine_flush_queue(queue); ++ ret = mlx5hws_bwc_queue_poll(ctx, ++ rule_attr.queue_id, ++ &pending_rules, ++ true); ++ if (unlikely(ret && !drain_error)) { + mlx5hws_err(ctx, +- "Moving BWC rule failed during rehash (%d)\n", ret); +- goto free_bwc_rules; ++ "Moving BWC rule: drain failed (%d), attempting to move rest of the rules\n", ++ ret); ++ drain_error = true; + } + } + } + +-free_bwc_rules: +- kfree(bwc_rules); +-free_pending_rules: +- kfree(pending_rules); ++ if (move_error || poll_error || drain_error) ++ ret = -EINVAL; + + return ret; + } +@@ -742,15 +716,18 @@ static int hws_bwc_matcher_move(struct mlx5hws_bwc_matcher *bwc_matcher) + } + + ret = hws_bwc_matcher_move_all(bwc_matcher); +- if (ret) { +- mlx5hws_err(ctx, "Rehash error: moving rules failed\n"); +- return -ENOMEM; +- } ++ if (ret) ++ mlx5hws_err(ctx, "Rehash error: moving rules failed, attempting to remove the old matcher\n"); ++ ++ /* Error during rehash can't be rolled back. 
++ * The best option here is to allow the rehash to complete and remove ++ * the old matcher - can't leave the matcher in the 'in_resize' state. ++ */ + + bwc_matcher->matcher = new_matcher; + mlx5hws_matcher_destroy(old_matcher); + +- return 0; ++ return ret; + } + + static int +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1374-net-mlx5-hws-dump-bad-completion-details.patch b/SOURCES/1374-net-mlx5-hws-dump-bad-completion-details.patch new file mode 100644 index 000000000..16fb21381 --- /dev/null +++ b/SOURCES/1374-net-mlx5-hws-dump-bad-completion-details.patch @@ -0,0 +1,191 @@ +From 965036aa649e8bc7524cb8c517e5701242864867 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:05 -0400 +Subject: [PATCH] net/mlx5: HWS, dump bad completion details + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 578b856b5e72b7b8cd2390a0e525e240d3e80c92 +Author: Yevgeny Kliteynik +Date: Sun May 11 22:38:10 2025 +0300 + + net/mlx5: HWS, dump bad completion details + + Failing to insert/delete a rule should not happen. If it does happen, + it would be good to know at which stage it happened and what was the + failure. This patch adds printing of bad CQE details. 
+ + Signed-off-by: Yevgeny Kliteynik + Reviewed-by: Vlad Dogaru + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1746992290-568936-11-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/send.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/send.c +index cb6abc4ab7df..c4b22be19a9b 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/send.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/send.c +@@ -344,18 +344,133 @@ hws_send_engine_update_rule_resize(struct mlx5hws_send_engine *queue, + } + } + ++static void hws_send_engine_dump_error_cqe(struct mlx5hws_send_engine *queue, ++ struct mlx5hws_send_ring_priv *priv, ++ struct mlx5_cqe64 *cqe) ++{ ++ u8 wqe_opcode = cqe ? be32_to_cpu(cqe->sop_drop_qpn) >> 24 : 0; ++ struct mlx5hws_context *ctx = priv->rule->matcher->tbl->ctx; ++ u32 opcode = cqe ? get_cqe_opcode(cqe) : 0; ++ struct mlx5hws_rule *rule = priv->rule; ++ ++ /* If something bad happens and lots of rules are failing, we don't ++ * want to pollute dmesg. Print only the first bad cqe per engine, ++ * the one that started the avalanche. ++ */ ++ if (queue->error_cqe_printed) ++ return; ++ ++ queue->error_cqe_printed = true; ++ ++ if (mlx5hws_rule_move_in_progress(rule)) ++ mlx5hws_err(ctx, ++ "--- rule 0x%08llx: error completion moving rule: phase %s, wqes left %d\n", ++ HWS_PTR_TO_ID(rule), ++ rule->resize_info->state == ++ MLX5HWS_RULE_RESIZE_STATE_WRITING ? "WRITING" : ++ rule->resize_info->state == ++ MLX5HWS_RULE_RESIZE_STATE_DELETING ? "DELETING" : ++ "UNKNOWN", ++ rule->pending_wqes); ++ else ++ mlx5hws_err(ctx, ++ "--- rule 0x%08llx: error completion %s (%d), wqes left %d\n", ++ HWS_PTR_TO_ID(rule), ++ rule->status == ++ MLX5HWS_RULE_STATUS_CREATING ? "CREATING" : ++ rule->status == ++ MLX5HWS_RULE_STATUS_DELETING ? 
"DELETING" : ++ rule->status == ++ MLX5HWS_RULE_STATUS_FAILING ? "FAILING" : ++ rule->status == ++ MLX5HWS_RULE_STATUS_UPDATING ? "UPDATING" : "NA", ++ rule->status, ++ rule->pending_wqes); ++ ++ mlx5hws_err(ctx, " rule 0x%08llx: matcher 0x%llx %s\n", ++ HWS_PTR_TO_ID(rule), ++ HWS_PTR_TO_ID(rule->matcher), ++ (rule->matcher->flags & MLX5HWS_MATCHER_FLAGS_ISOLATED) ? ++ "(isolated)" : ""); ++ ++ if (!cqe) { ++ mlx5hws_err(ctx, " rule 0x%08llx: no CQE\n", ++ HWS_PTR_TO_ID(rule)); ++ return; ++ } ++ ++ mlx5hws_err(ctx, " rule 0x%08llx: cqe->opcode = %d %s\n", ++ HWS_PTR_TO_ID(rule), opcode, ++ opcode == MLX5_CQE_REQ ? "(MLX5_CQE_REQ)" : ++ opcode == MLX5_CQE_REQ_ERR ? "(MLX5_CQE_REQ_ERR)" : " "); ++ ++ if (opcode == MLX5_CQE_REQ_ERR) { ++ struct mlx5_err_cqe *err_cqe = (struct mlx5_err_cqe *)cqe; ++ ++ mlx5hws_err(ctx, ++ " rule 0x%08llx: |--- hw_error_syndrome = 0x%x\n", ++ HWS_PTR_TO_ID(rule), ++ err_cqe->rsvd1[16]); ++ mlx5hws_err(ctx, ++ " rule 0x%08llx: |--- hw_syndrome_type = 0x%x\n", ++ HWS_PTR_TO_ID(rule), ++ err_cqe->rsvd1[17] >> 4); ++ mlx5hws_err(ctx, ++ " rule 0x%08llx: |--- vendor_err_synd = 0x%x\n", ++ HWS_PTR_TO_ID(rule), ++ err_cqe->vendor_err_synd); ++ mlx5hws_err(ctx, ++ " rule 0x%08llx: |--- syndrome = 0x%x\n", ++ HWS_PTR_TO_ID(rule), ++ err_cqe->syndrome); ++ } ++ ++ mlx5hws_err(ctx, ++ " rule 0x%08llx: cqe->byte_cnt = 0x%08x\n", ++ HWS_PTR_TO_ID(rule), be32_to_cpu(cqe->byte_cnt)); ++ mlx5hws_err(ctx, ++ " rule 0x%08llx: |-- UPDATE STATUS = %s\n", ++ HWS_PTR_TO_ID(rule), ++ (be32_to_cpu(cqe->byte_cnt) & 0x80000000) ? ++ "FAILURE" : "SUCCESS"); ++ mlx5hws_err(ctx, ++ " rule 0x%08llx: |------- SYNDROME = %s\n", ++ HWS_PTR_TO_ID(rule), ++ ((be32_to_cpu(cqe->byte_cnt) & 0x00000003) == 1) ? ++ "SET_FLOW_FAIL" : ++ ((be32_to_cpu(cqe->byte_cnt) & 0x00000003) == 2) ? 
++ "DISABLE_FLOW_FAIL" : "UNKNOWN"); ++ mlx5hws_err(ctx, ++ " rule 0x%08llx: cqe->sop_drop_qpn = 0x%08x\n", ++ HWS_PTR_TO_ID(rule), be32_to_cpu(cqe->sop_drop_qpn)); ++ mlx5hws_err(ctx, ++ " rule 0x%08llx: |-send wqe opcode = 0x%02x %s\n", ++ HWS_PTR_TO_ID(rule), wqe_opcode, ++ wqe_opcode == MLX5HWS_WQE_OPCODE_TBL_ACCESS ? ++ "(MLX5HWS_WQE_OPCODE_TBL_ACCESS)" : "(UNKNOWN)"); ++ mlx5hws_err(ctx, ++ " rule 0x%08llx: |------------ qpn = 0x%06x\n", ++ HWS_PTR_TO_ID(rule), ++ be32_to_cpu(cqe->sop_drop_qpn) & 0xffffff); ++} ++ + static void hws_send_engine_update_rule(struct mlx5hws_send_engine *queue, + struct mlx5hws_send_ring_priv *priv, + u16 wqe_cnt, +- enum mlx5hws_flow_op_status *status) ++ enum mlx5hws_flow_op_status *status, ++ struct mlx5_cqe64 *cqe) + { + priv->rule->pending_wqes--; + +- if (*status == MLX5HWS_FLOW_OP_ERROR) { ++ if (unlikely(*status == MLX5HWS_FLOW_OP_ERROR)) { + if (priv->retry_id) { ++ /* If there is a retry_id, then it's not an error yet, ++ * retry to insert this rule in the collision RTC. 
++ */ + hws_send_engine_retry_post_send(queue, priv, wqe_cnt); + return; + } ++ hws_send_engine_dump_error_cqe(queue, priv, cqe); + /* Some part of the rule failed */ + priv->rule->status = MLX5HWS_RULE_STATUS_FAILING; + *priv->used_id = 0; +@@ -420,7 +535,8 @@ static void hws_send_engine_update(struct mlx5hws_send_engine *queue, + + if (priv->user_data) { + if (priv->rule) { +- hws_send_engine_update_rule(queue, priv, wqe_cnt, &status); ++ hws_send_engine_update_rule(queue, priv, wqe_cnt, ++ &status, cqe); + /* Completion is provided on the last rule WQE */ + if (priv->rule->pending_wqes) + return; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/send.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/send.h +index f833092235c1..3fb8e99309b2 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/send.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/send.h +@@ -140,6 +140,7 @@ struct mlx5hws_send_engine { + u16 used_entries; + u16 num_entries; + bool err; ++ bool error_cqe_printed; + struct mutex lock; /* Protects the send engine */ + }; + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1375-net-mlx5-use-to-delayed-work.patch b/SOURCES/1375-net-mlx5-use-to-delayed-work.patch new file mode 100644 index 000000000..68373c1a5 --- /dev/null +++ b/SOURCES/1375-net-mlx5-use-to-delayed-work.patch @@ -0,0 +1,40 @@ +From 5447a8c66fd43b005c3f33fe8b63145af1ce5893 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:06 -0400 +Subject: [PATCH] net/mlx5: Use to_delayed_work() + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit ee39bae6c141876f5b4c001f6b12b4f8ffb4cd08 +Author: Chen Ni +Date: Wed May 14 15:24:19 2025 +0800 + + net/mlx5: Use to_delayed_work() + + Use to_delayed_work() instead of open-coding it. 
+ + Signed-off-by: Chen Ni + Acked-by: Mark Bloch + Reviewed-by: Tariq Toukan + Link: https://patch.msgid.link/20250514072419.2707578-1-nichen@iscas.ac.cn + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +index e53dbdc0a7a1..b1aeea7c4a91 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +@@ -927,8 +927,7 @@ static void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec, bool force + + static void cb_timeout_handler(struct work_struct *work) + { +- struct delayed_work *dwork = container_of(work, struct delayed_work, +- work); ++ struct delayed_work *dwork = to_delayed_work(work); + struct mlx5_cmd_work_ent *ent = container_of(dwork, + struct mlx5_cmd_work_ent, + cb_timeout_work); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1376-net-mlx5-sws-fix-reformat-id-error-handling.patch b/SOURCES/1376-net-mlx5-sws-fix-reformat-id-error-handling.patch new file mode 100644 index 000000000..f902c0c0d --- /dev/null +++ b/SOURCES/1376-net-mlx5-sws-fix-reformat-id-error-handling.patch @@ -0,0 +1,196 @@ +From d9f5ece10ab6345c8de1a61e674bcdef45d0ce56 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:06 -0400 +Subject: [PATCH] net/mlx5: SWS, fix reformat id error handling + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit ca7690dae1269f454572c163ed5271feed060af5 +Author: Vlad Dogaru +Date: Tue May 20 21:46:39 2025 +0300 + + net/mlx5: SWS, fix reformat id error handling + + The firmware reformat id is a u32 and can't safely be returned as an + int. Because the functions also need a way to signal error, prefer to + return the id as an output parameter and keep the return code only for + success/error. + + While we're at it, also extract some duplicate code to fetch the + reformat id from a more generic struct pkt_reformat. 
+ + Signed-off-by: Vlad Dogaru + Signed-off-by: Yevgeny Kliteynik + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1747766802-958178-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c +index a47c29571f64..1af76da8b132 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c +@@ -527,7 +527,7 @@ static int mlx5_cmd_set_fte(struct mlx5_core_dev *dev, + struct mlx5_flow_rule *dst; + void *in_flow_context, *vlan; + void *in_match_value; +- int reformat_id = 0; ++ u32 reformat_id = 0; + unsigned int inlen; + int dst_cnt_size; + u32 *in, action; +@@ -580,23 +580,21 @@ static int mlx5_cmd_set_fte(struct mlx5_core_dev *dev, + MLX5_SET(flow_context, in_flow_context, action, action); + + if (!extended_dest && fte->act_dests.action.pkt_reformat) { +- struct mlx5_pkt_reformat *pkt_reformat = fte->act_dests.action.pkt_reformat; +- +- if (pkt_reformat->owner == MLX5_FLOW_RESOURCE_OWNER_SW) { +- reformat_id = mlx5_fs_dr_action_get_pkt_reformat_id(pkt_reformat); +- if (reformat_id < 0) { +- mlx5_core_err(dev, +- "Unsupported SW-owned pkt_reformat type (%d) in FW-owned table\n", +- pkt_reformat->reformat_type); +- err = reformat_id; +- goto err_out; +- } +- } else { +- reformat_id = fte->act_dests.action.pkt_reformat->id; ++ struct mlx5_pkt_reformat *pkt_reformat = ++ fte->act_dests.action.pkt_reformat; ++ ++ err = mlx5_fs_get_packet_reformat_id(pkt_reformat, ++ &reformat_id); ++ if (err) { ++ mlx5_core_err(dev, ++ "Unsupported pkt_reformat type (%d)\n", ++ pkt_reformat->reformat_type); ++ goto err_out; + } + } + +- MLX5_SET(flow_context, in_flow_context, packet_reformat_id, (u32)reformat_id); ++ MLX5_SET(flow_context, in_flow_context, packet_reformat_id, ++ reformat_id); + + if (fte->act_dests.action.modify_hdr) { + if 
(fte->act_dests.action.modify_hdr->owner == MLX5_FLOW_RESOURCE_OWNER_SW) { +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +index a22ecf141518..c7ce9fc797c4 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +@@ -1830,14 +1830,33 @@ static int create_auto_flow_group(struct mlx5_flow_table *ft, + return err; + } + ++int mlx5_fs_get_packet_reformat_id(struct mlx5_pkt_reformat *pkt_reformat, ++ u32 *id) ++{ ++ switch (pkt_reformat->owner) { ++ case MLX5_FLOW_RESOURCE_OWNER_FW: ++ *id = pkt_reformat->id; ++ return 0; ++ case MLX5_FLOW_RESOURCE_OWNER_SW: ++ return mlx5_fs_dr_action_get_pkt_reformat_id(pkt_reformat, id); ++ default: ++ return -EINVAL; ++ } ++} ++ + static bool mlx5_pkt_reformat_cmp(struct mlx5_pkt_reformat *p1, + struct mlx5_pkt_reformat *p2) + { +- return p1->owner == p2->owner && +- (p1->owner == MLX5_FLOW_RESOURCE_OWNER_FW ? +- p1->id == p2->id : +- mlx5_fs_dr_action_get_pkt_reformat_id(p1) == +- mlx5_fs_dr_action_get_pkt_reformat_id(p2)); ++ int err1, err2; ++ u32 id1, id2; ++ ++ if (p1->owner != p2->owner) ++ return false; ++ ++ err1 = mlx5_fs_get_packet_reformat_id(p1, &id1); ++ err2 = mlx5_fs_get_packet_reformat_id(p2, &id2); ++ ++ return !err1 && !err2 && id1 == id2; + } + + static bool mlx5_flow_dests_cmp(struct mlx5_flow_destination *d1, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h +index 1f523fb761f6..a41d3491d2af 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h +@@ -387,6 +387,9 @@ u32 mlx5_fs_get_capabilities(struct mlx5_core_dev *dev, enum mlx5_flow_namespace + + struct mlx5_flow_root_namespace *find_root(struct fs_node *node); + ++int mlx5_fs_get_packet_reformat_id(struct mlx5_pkt_reformat *pkt_reformat, ++ u32 *id); ++ + #define fs_get_obj(v, _node) {v = 
container_of((_node), typeof(*v), node); } + + #define fs_list_for_each_entry(pos, root) \ +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/fs_dr.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/fs_dr.c +index 8007d3f523c9..f367997ab61e 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/fs_dr.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/fs_dr.c +@@ -833,15 +833,21 @@ static u32 mlx5_cmd_dr_get_capabilities(struct mlx5_flow_root_namespace *ns, + return steering_caps; + } + +-int mlx5_fs_dr_action_get_pkt_reformat_id(struct mlx5_pkt_reformat *pkt_reformat) ++int ++mlx5_fs_dr_action_get_pkt_reformat_id(struct mlx5_pkt_reformat *pkt_reformat, ++ u32 *reformat_id) + { ++ struct mlx5dr_action *dr_action; ++ + switch (pkt_reformat->reformat_type) { + case MLX5_REFORMAT_TYPE_L2_TO_VXLAN: + case MLX5_REFORMAT_TYPE_L2_TO_NVGRE: + case MLX5_REFORMAT_TYPE_L2_TO_L2_TUNNEL: + case MLX5_REFORMAT_TYPE_L2_TO_L3_TUNNEL: + case MLX5_REFORMAT_TYPE_INSERT_HDR: +- return mlx5dr_action_get_pkt_reformat_id(pkt_reformat->fs_dr_action.dr_action); ++ dr_action = pkt_reformat->fs_dr_action.dr_action; ++ *reformat_id = mlx5dr_action_get_pkt_reformat_id(dr_action); ++ return 0; + } + return -EOPNOTSUPP; + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/fs_dr.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/fs_dr.h +index 99a3b2eff6b8..f869f2daefbf 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/fs_dr.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/fs_dr.h +@@ -38,7 +38,9 @@ struct mlx5_fs_dr_table { + + bool mlx5_fs_dr_is_supported(struct mlx5_core_dev *dev); + +-int mlx5_fs_dr_action_get_pkt_reformat_id(struct mlx5_pkt_reformat *pkt_reformat); ++int ++mlx5_fs_dr_action_get_pkt_reformat_id(struct mlx5_pkt_reformat *pkt_reformat, ++ u32 *reformat_id); + + const struct mlx5_flow_cmds *mlx5_fs_cmd_get_dr_cmds(void); + +@@ -49,9 +51,11 @@ static inline const struct mlx5_flow_cmds 
*mlx5_fs_cmd_get_dr_cmds(void) + return NULL; + } + +-static inline u32 mlx5_fs_dr_action_get_pkt_reformat_id(struct mlx5_pkt_reformat *pkt_reformat) ++static inline int ++mlx5_fs_dr_action_get_pkt_reformat_id(struct mlx5_pkt_reformat *pkt_reformat, ++ u32 *reformat_id) + { +- return 0; ++ return -EOPNOTSUPP; + } + + static inline bool mlx5_fs_dr_is_supported(struct mlx5_core_dev *dev) +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1377-net-mlx5-hws-register-reformat-actions-with-fw.patch b/SOURCES/1377-net-mlx5-hws-register-reformat-actions-with-fw.patch new file mode 100644 index 000000000..815a51a5c --- /dev/null +++ b/SOURCES/1377-net-mlx5-hws-register-reformat-actions-with-fw.patch @@ -0,0 +1,246 @@ +From a2158eb7d1e077c0cf3a4b1bad863e6bd081d31d Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:06 -0400 +Subject: [PATCH] net/mlx5: HWS, register reformat actions with fw + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit b206d9ec19dfc2db706883ff6b46b259831a033d +Author: Vlad Dogaru +Date: Tue May 20 21:46:40 2025 +0300 + + net/mlx5: HWS, register reformat actions with fw + + Hardware steering handles actions differently from firmware, but for + termination rules that use encapsulation the firmware needs to be aware + of the action. + + Fix this by registering reformat actions with the firmware the first + time this is needed. To do this, add a third possible owner for an + action, and also a lock to protect against registration of the same + action from different threads. 
+ + Signed-off-by: Vlad Dogaru + Signed-off-by: Yevgeny Kliteynik + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1747766802-958178-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +index c7ce9fc797c4..c330b64a506b 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +@@ -1839,6 +1839,8 @@ int mlx5_fs_get_packet_reformat_id(struct mlx5_pkt_reformat *pkt_reformat, + return 0; + case MLX5_FLOW_RESOURCE_OWNER_SW: + return mlx5_fs_dr_action_get_pkt_reformat_id(pkt_reformat, id); ++ case MLX5_FLOW_RESOURCE_OWNER_HWS: ++ return mlx5_fs_hws_action_get_pkt_reformat_id(pkt_reformat, id); + default: + return -EINVAL; + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h +index a41d3491d2af..e6a95b310b55 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h +@@ -58,6 +58,7 @@ struct mlx5_flow_definer { + enum mlx5_flow_resource_owner { + MLX5_FLOW_RESOURCE_OWNER_FW, + MLX5_FLOW_RESOURCE_OWNER_SW, ++ MLX5_FLOW_RESOURCE_OWNER_HWS, + }; + + struct mlx5_modify_hdr { +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c +index bef4d25c1a2a..aa47a7af6f50 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c +@@ -72,6 +72,11 @@ enum mlx5hws_action_type mlx5hws_action_get_type(struct mlx5hws_action *action) + return action->type; + } + ++struct mlx5_core_dev *mlx5hws_action_get_dev(struct mlx5hws_action *action) ++{ ++ return action->ctx->mdev; ++} ++ + static int hws_action_get_shared_stc_nic(struct 
mlx5hws_context *ctx, + enum mlx5hws_context_shared_stc_type stc_type, + u8 tbl_type) +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c +index 1b787cd66e6f..9d1c0e4b224a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c +@@ -1081,13 +1081,8 @@ static int mlx5_cmd_hws_create_fte(struct mlx5_flow_root_namespace *ns, + struct mlx5hws_bwc_rule *rule; + int err = 0; + +- if (mlx5_fs_cmd_is_fw_term_table(ft)) { +- /* Packet reformat on terminamtion table not supported yet */ +- if (fte->act_dests.action.action & +- MLX5_FLOW_CONTEXT_ACTION_PACKET_REFORMAT) +- return -EOPNOTSUPP; ++ if (mlx5_fs_cmd_is_fw_term_table(ft)) + return mlx5_fs_cmd_get_fw_cmds()->create_fte(ns, ft, group, fte); +- } + + err = mlx5_fs_fte_get_hws_actions(ns, ft, group, fte, &ractions); + if (err) +@@ -1362,7 +1357,7 @@ mlx5_cmd_hws_packet_reformat_alloc(struct mlx5_flow_root_namespace *ns, + pkt_reformat->fs_hws_action.pr_data = pr_data; + } + +- pkt_reformat->owner = MLX5_FLOW_RESOURCE_OWNER_SW; ++ pkt_reformat->owner = MLX5_FLOW_RESOURCE_OWNER_HWS; + pkt_reformat->fs_hws_action.hws_action = hws_action; + return 0; + +@@ -1380,6 +1375,15 @@ static void mlx5_cmd_hws_packet_reformat_dealloc(struct mlx5_flow_root_namespace + struct mlx5_fs_hws_pr *pr_data; + struct mlx5_fs_pool *pr_pool; + ++ if (pkt_reformat->fs_hws_action.fw_reformat_id != 0) { ++ struct mlx5_pkt_reformat fw_pkt_reformat = { 0 }; ++ ++ fw_pkt_reformat.id = pkt_reformat->fs_hws_action.fw_reformat_id; ++ mlx5_fs_cmd_get_fw_cmds()-> ++ packet_reformat_dealloc(ns, &fw_pkt_reformat); ++ pkt_reformat->fs_hws_action.fw_reformat_id = 0; ++ } ++ + if (pkt_reformat->reformat_type == MLX5_REFORMAT_TYPE_REMOVE_HDR) + return; + +@@ -1499,6 +1503,7 @@ static int mlx5_cmd_hws_modify_header_alloc(struct mlx5_flow_root_namespace *ns, + err = -ENOMEM; + goto 
release_mh; + } ++ mutex_init(&modify_hdr->fs_hws_action.lock); + modify_hdr->fs_hws_action.mh_data = mh_data; + modify_hdr->fs_hws_action.fs_pool = pool; + modify_hdr->owner = MLX5_FLOW_RESOURCE_OWNER_SW; +@@ -1532,6 +1537,58 @@ static void mlx5_cmd_hws_modify_header_dealloc(struct mlx5_flow_root_namespace * + modify_hdr->fs_hws_action.mh_data = NULL; + } + ++int ++mlx5_fs_hws_action_get_pkt_reformat_id(struct mlx5_pkt_reformat *pkt_reformat, ++ u32 *reformat_id) ++{ ++ enum mlx5_flow_namespace_type ns_type = pkt_reformat->ns_type; ++ struct mutex *lock = &pkt_reformat->fs_hws_action.lock; ++ u32 *id = &pkt_reformat->fs_hws_action.fw_reformat_id; ++ struct mlx5_pkt_reformat fw_pkt_reformat = { 0 }; ++ struct mlx5_pkt_reformat_params params = { 0 }; ++ struct mlx5_flow_root_namespace *ns; ++ struct mlx5_core_dev *dev; ++ int ret; ++ ++ mutex_lock(lock); ++ ++ if (*id != 0) { ++ *reformat_id = *id; ++ ret = 0; ++ goto unlock; ++ } ++ ++ dev = mlx5hws_action_get_dev(pkt_reformat->fs_hws_action.hws_action); ++ if (!dev) { ++ ret = -EINVAL; ++ goto unlock; ++ } ++ ++ ns = mlx5_get_root_namespace(dev, ns_type); ++ if (!ns) { ++ ret = -EINVAL; ++ goto unlock; ++ } ++ ++ params.type = pkt_reformat->reformat_type; ++ params.size = pkt_reformat->fs_hws_action.pr_data->data_size; ++ params.data = pkt_reformat->fs_hws_action.pr_data->data; ++ ++ ret = mlx5_fs_cmd_get_fw_cmds()-> ++ packet_reformat_alloc(ns, ¶ms, ns_type, &fw_pkt_reformat); ++ if (ret) ++ goto unlock; ++ ++ *id = fw_pkt_reformat.id; ++ *reformat_id = *id; ++ ret = 0; ++ ++unlock: ++ mutex_unlock(lock); ++ ++ return ret; ++} ++ + static int mlx5_cmd_hws_create_match_definer(struct mlx5_flow_root_namespace *ns, + u16 format_id, u32 *match_mask) + { +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.h +index 8b56298288da..b92d55b2d147 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.h ++++ 
b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.h +@@ -41,6 +41,11 @@ struct mlx5_fs_hws_action { + struct mlx5_fs_pool *fs_pool; + struct mlx5_fs_hws_pr *pr_data; + struct mlx5_fs_hws_mh *mh_data; ++ u32 fw_reformat_id; ++ /* Protect `fw_reformat_id` against being initialized from multiple ++ * threads. ++ */ ++ struct mutex lock; + }; + + struct mlx5_fs_hws_matcher { +@@ -84,12 +89,23 @@ void mlx5_fs_put_hws_action(struct mlx5_fs_hws_data *fs_hws_data); + + #ifdef CONFIG_MLX5_HW_STEERING + ++int ++mlx5_fs_hws_action_get_pkt_reformat_id(struct mlx5_pkt_reformat *pkt_reformat, ++ u32 *reformat_id); ++ + bool mlx5_fs_hws_is_supported(struct mlx5_core_dev *dev); + + const struct mlx5_flow_cmds *mlx5_fs_cmd_get_hws_cmds(void); + + #else + ++static inline int ++mlx5_fs_hws_action_get_pkt_reformat_id(struct mlx5_pkt_reformat *pkt_reformat, ++ u32 *reformat_id) ++{ ++ return -EOPNOTSUPP; ++} ++ + static inline bool mlx5_fs_hws_is_supported(struct mlx5_core_dev *dev) + { + return false; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h +index fbd63369da10..9bbadc4d8a0b 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h +@@ -503,6 +503,15 @@ int mlx5hws_rule_action_update(struct mlx5hws_rule *rule, + enum mlx5hws_action_type + mlx5hws_action_get_type(struct mlx5hws_action *action); + ++/** ++ * mlx5hws_action_get_dev - Get mlx5 core device. ++ * ++ * @action: The action to get the device from. ++ * ++ * Return: mlx5 core device. ++ */ ++struct mlx5_core_dev *mlx5hws_action_get_dev(struct mlx5hws_action *action); ++ + /** + * mlx5hws_action_create_dest_drop - Create a direct rule drop action. 
+ * +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1378-net-mlx5-hws-fix-typo-nope-to-nop.patch b/SOURCES/1378-net-mlx5-hws-fix-typo-nope-to-nop.patch new file mode 100644 index 000000000..b3857668c --- /dev/null +++ b/SOURCES/1378-net-mlx5-hws-fix-typo-nope-to-nop.patch @@ -0,0 +1,220 @@ +From 80492ad30af1ed97b996b34f6e6daac2f98ef98d Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:07 -0400 +Subject: [PATCH] net/mlx5: HWS, fix typo - 'nope' to 'nop' + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 0b6e452caf03da63aeb2e84475771d6fb6d6cd99 +Author: Yevgeny Kliteynik +Date: Tue May 20 21:46:41 2025 +0300 + + net/mlx5: HWS, fix typo - 'nope' to 'nop' + + Fix typo - rename 'nope_locations' to 'nop_locations', which describes + the locations of 'nop' actions. To shorten the lines, this renaming + also required some refactoring. + + Signed-off-by: Yevgeny Kliteynik + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1747766802-958178-4-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c +index aa47a7af6f50..64d115feef2c 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c +@@ -1207,16 +1207,16 @@ hws_action_create_modify_header_hws(struct mlx5hws_action *action, + for (i = 0; i < num_of_patterns; i++) { + size_t new_num_actions; + size_t cur_num_actions; +- u32 nope_location; ++ u32 nop_locations; + + cur_num_actions = pattern[i].sz / MLX5HWS_MODIFY_ACTION_SIZE; + +- mlx5hws_pat_calc_nope(pattern[i].data, cur_num_actions, +- pat_max_sz / MLX5HWS_MODIFY_ACTION_SIZE, +- &new_num_actions, &nope_location, +- &new_pattern[i * pat_max_sz]); ++ mlx5hws_pat_calc_nop(pattern[i].data, cur_num_actions, ++ pat_max_sz / 
MLX5HWS_MODIFY_ACTION_SIZE, ++ &new_num_actions, &nop_locations, ++ &new_pattern[i * pat_max_sz]); + +- action[i].modify_header.nope_locations = nope_location; ++ action[i].modify_header.nop_locations = nop_locations; + action[i].modify_header.num_of_actions = new_num_actions; + + max_mh_actions = max(max_mh_actions, new_num_actions); +@@ -1263,7 +1263,7 @@ hws_action_create_modify_header_hws(struct mlx5hws_action *action, + MLX5_GET(set_action_in, pattern[i].data, action_type); + } else { + /* Multiple modify actions require a pattern */ +- if (unlikely(action[i].modify_header.nope_locations)) { ++ if (unlikely(action[i].modify_header.nop_locations)) { + size_t pattern_sz; + + pattern_sz = action[i].modify_header.num_of_actions * +@@ -2105,12 +2105,12 @@ static void hws_action_modify_write(struct mlx5hws_send_engine *queue, + u32 arg_idx, + u8 *arg_data, + u16 num_of_actions, +- u32 nope_locations) ++ u32 nop_locations) + { + u8 *new_arg_data = NULL; + int i, j; + +- if (unlikely(nope_locations)) { ++ if (unlikely(nop_locations)) { + new_arg_data = kcalloc(num_of_actions, + MLX5HWS_MODIFY_ACTION_SIZE, GFP_KERNEL); + if (unlikely(!new_arg_data)) +@@ -2118,7 +2118,7 @@ static void hws_action_modify_write(struct mlx5hws_send_engine *queue, + + for (i = 0, j = 0; i < num_of_actions; i++, j++) { + memcpy(&new_arg_data[j], arg_data, MLX5HWS_MODIFY_ACTION_SIZE); +- if (BIT(i) & nope_locations) ++ if (BIT(i) & nop_locations) + j++; + } + } +@@ -2215,6 +2215,7 @@ hws_action_setter_modify_header(struct mlx5hws_actions_apply_data *apply, + struct mlx5hws_action *action; + u32 arg_sz, arg_idx; + u8 *single_action; ++ u8 max_actions; + __be32 stc_idx; + + rule_action = &apply->rule_action[setter->idx_double]; +@@ -2242,21 +2243,23 @@ hws_action_setter_modify_header(struct mlx5hws_actions_apply_data *apply, + + apply->wqe_data[MLX5HWS_ACTION_OFFSET_DW7] = + *(__be32 *)MLX5_ADDR_OF(set_action_in, single_action, data); +- } else { +- /* Argument offset multiple with number of 
args per these actions */ +- arg_sz = mlx5hws_arg_get_arg_size(action->modify_header.max_num_of_actions); +- arg_idx = rule_action->modify_header.offset * arg_sz; +- +- apply->wqe_data[MLX5HWS_ACTION_OFFSET_DW7] = htonl(arg_idx); +- +- if (!(action->flags & MLX5HWS_ACTION_FLAG_SHARED)) { +- apply->require_dep = 1; +- hws_action_modify_write(apply->queue, +- action->modify_header.arg_id + arg_idx, +- rule_action->modify_header.data, +- action->modify_header.num_of_actions, +- action->modify_header.nope_locations); +- } ++ return; ++ } ++ ++ /* Argument offset multiple with number of args per these actions */ ++ max_actions = action->modify_header.max_num_of_actions; ++ arg_sz = mlx5hws_arg_get_arg_size(max_actions); ++ arg_idx = rule_action->modify_header.offset * arg_sz; ++ ++ apply->wqe_data[MLX5HWS_ACTION_OFFSET_DW7] = htonl(arg_idx); ++ ++ if (!(action->flags & MLX5HWS_ACTION_FLAG_SHARED)) { ++ apply->require_dep = 1; ++ hws_action_modify_write(apply->queue, ++ action->modify_header.arg_id + arg_idx, ++ rule_action->modify_header.data, ++ action->modify_header.num_of_actions, ++ action->modify_header.nop_locations); + } + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.h +index 25fa0d4c9221..55a079fdd08f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.h +@@ -136,7 +136,7 @@ struct mlx5hws_action { + u32 pat_id; + u32 arg_id; + __be64 single_action; +- u32 nope_locations; ++ u32 nop_locations; + u8 num_of_patterns; + u8 single_action_type; + u8 num_of_actions; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pat_arg.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pat_arg.c +index f51ed24526b9..78de19c074a7 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pat_arg.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pat_arg.c 
+@@ -522,9 +522,9 @@ bool mlx5hws_pat_verify_actions(struct mlx5hws_context *ctx, __be64 pattern[], s + return true; + } + +-void mlx5hws_pat_calc_nope(__be64 *pattern, size_t num_actions, +- size_t max_actions, size_t *new_size, +- u32 *nope_location, __be64 *new_pat) ++void mlx5hws_pat_calc_nop(__be64 *pattern, size_t num_actions, ++ size_t max_actions, size_t *new_size, ++ u32 *nop_locations, __be64 *new_pat) + { + u16 prev_src_field = 0, prev_dst_field = 0; + u16 src_field, dst_field; +@@ -532,7 +532,7 @@ void mlx5hws_pat_calc_nope(__be64 *pattern, size_t num_actions, + size_t i, j; + + *new_size = num_actions; +- *nope_location = 0; ++ *nop_locations = 0; + + if (num_actions == 1) + return; +@@ -546,18 +546,18 @@ void mlx5hws_pat_calc_nope(__be64 *pattern, size_t num_actions, + if (action_type == MLX5_ACTION_TYPE_COPY && + (prev_src_field == src_field || + prev_dst_field == dst_field)) { +- /* need Nope */ ++ /* need Nop */ + *new_size += 1; +- *nope_location |= BIT(i); ++ *nop_locations |= BIT(i); + memset(&new_pat[j], 0, MLX5HWS_MODIFY_ACTION_SIZE); + MLX5_SET(set_action_in, &new_pat[j], + action_type, + MLX5_MODIFICATION_TYPE_NOP); + j++; + } else if (prev_src_field == src_field) { +- /* need Nope*/ ++ /* need Nop */ + *new_size += 1; +- *nope_location |= BIT(i); ++ *nop_locations |= BIT(i); + MLX5_SET(set_action_in, &new_pat[j], + action_type, + MLX5_MODIFICATION_TYPE_NOP); +@@ -568,7 +568,7 @@ void mlx5hws_pat_calc_nope(__be64 *pattern, size_t num_actions, + /* check if no more space */ + if (j > max_actions) { + *new_size = num_actions; +- *nope_location = 0; ++ *nop_locations = 0; + return; + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pat_arg.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pat_arg.h +index 8ddb51980044..91bd2572a341 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pat_arg.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pat_arg.h +@@ -96,6 +96,7 @@ int 
mlx5hws_arg_write_inline_arg_data(struct mlx5hws_context *ctx, + u8 *arg_data, + size_t data_size); + +-void mlx5hws_pat_calc_nope(__be64 *pattern, size_t num_actions, size_t max_actions, +- size_t *new_size, u32 *nope_location, __be64 *new_pat); ++void mlx5hws_pat_calc_nop(__be64 *pattern, size_t num_actions, ++ size_t max_actions, size_t *new_size, ++ u32 *nop_locations, __be64 *new_pat); + #endif /* MLX5HWS_PAT_ARG_H_ */ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1379-net-mlx5-hws-handle-modify-header-actions-dependency.patch b/SOURCES/1379-net-mlx5-hws-handle-modify-header-actions-dependency.patch new file mode 100644 index 000000000..7fc53d5a3 --- /dev/null +++ b/SOURCES/1379-net-mlx5-hws-handle-modify-header-actions-dependency.patch @@ -0,0 +1,228 @@ +From 49dd65e6ee03b3423cd69b885fdab5ae5ec32303 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:07 -0400 +Subject: [PATCH] net/mlx5: HWS, handle modify header actions dependency + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 01e035fd0380b285d72725adb5a45f1d73549db8 +Author: Yevgeny Kliteynik +Date: Tue May 20 21:46:42 2025 +0300 + + net/mlx5: HWS, handle modify header actions dependency + + Having adjacent accelerated modify header actions (so-called + pattern-argument actions) may result in inconsistent outcome. + These inconsistencies can take the form of writes to the same + field or a read coupled with a write to the same field. The + solution is to detect such dependencies and insert nops between + the offending actions. + + The existing implementation had a few issues, which pretty much + required a complete rewrite of the code that handles these + dependencies. + + In the new implementation we're doing the following: + + * Checking any two adjacent actions for conflicts (not just + odd-even pairs). + * Marking 'set' and 'add' action fields as destination, rather + than source, for the purposes of checking for conflicts. 
+ * Checking all types of actions ('add', 'set', 'copy') for + dependencies. + * Managing offsets of the args in the buffer - copy the action + args to the right place in the buffer. + * Checking that after inserting nops we're still within the number + of supported actions - return an error otherwise. + + Signed-off-by: Vlad Dogaru + Signed-off-by: Yevgeny Kliteynik + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1747766802-958178-5-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c +index 64d115feef2c..fb62f3bc4bd4 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c +@@ -1190,14 +1190,15 @@ hws_action_create_modify_header_hws(struct mlx5hws_action *action, + struct mlx5hws_action_mh_pattern *pattern, + u32 log_bulk_size) + { ++ u16 num_actions, max_mh_actions = 0, hw_max_actions; + struct mlx5hws_context *ctx = action->ctx; +- u16 num_actions, max_mh_actions = 0; + int i, ret, size_in_bytes; + u32 pat_id, arg_id = 0; + __be64 *new_pattern; + size_t pat_max_sz; + + pat_max_sz = MLX5HWS_ARG_CHUNK_SIZE_MAX * MLX5HWS_ARG_DATA_SIZE; ++ hw_max_actions = pat_max_sz / MLX5HWS_MODIFY_ACTION_SIZE; + size_in_bytes = pat_max_sz * sizeof(__be64); + new_pattern = kcalloc(num_of_patterns, size_in_bytes, GFP_KERNEL); + if (!new_pattern) +@@ -1211,10 +1212,14 @@ hws_action_create_modify_header_hws(struct mlx5hws_action *action, + + cur_num_actions = pattern[i].sz / MLX5HWS_MODIFY_ACTION_SIZE; + +- mlx5hws_pat_calc_nop(pattern[i].data, cur_num_actions, +- pat_max_sz / MLX5HWS_MODIFY_ACTION_SIZE, +- &new_num_actions, &nop_locations, +- &new_pattern[i * pat_max_sz]); ++ ret = mlx5hws_pat_calc_nop(pattern[i].data, cur_num_actions, ++ hw_max_actions, &new_num_actions, ++ 
&nop_locations, ++ &new_pattern[i * pat_max_sz]); ++ if (ret) { ++ mlx5hws_err(ctx, "Too many actions after nop insertion\n"); ++ goto free_new_pat; ++ } + + action[i].modify_header.nop_locations = nop_locations; + action[i].modify_header.num_of_actions = new_num_actions; +@@ -2116,10 +2121,12 @@ static void hws_action_modify_write(struct mlx5hws_send_engine *queue, + if (unlikely(!new_arg_data)) + return; + +- for (i = 0, j = 0; i < num_of_actions; i++, j++) { +- memcpy(&new_arg_data[j], arg_data, MLX5HWS_MODIFY_ACTION_SIZE); ++ for (i = 0, j = 0; j < num_of_actions; i++, j++) { + if (BIT(i) & nop_locations) + j++; ++ memcpy(&new_arg_data[j * MLX5HWS_MODIFY_ACTION_SIZE], ++ &arg_data[i * MLX5HWS_MODIFY_ACTION_SIZE], ++ MLX5HWS_MODIFY_ACTION_SIZE); + } + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pat_arg.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pat_arg.c +index 78de19c074a7..51e4c551e0ef 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pat_arg.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pat_arg.c +@@ -490,8 +490,8 @@ hws_action_modify_get_target_fields(u8 action_type, __be64 *pattern, + switch (action_type) { + case MLX5_ACTION_TYPE_SET: + case MLX5_ACTION_TYPE_ADD: +- *src_field = MLX5_GET(set_action_in, pattern, field); +- *dst_field = INVALID_FIELD; ++ *src_field = INVALID_FIELD; ++ *dst_field = MLX5_GET(set_action_in, pattern, field); + break; + case MLX5_ACTION_TYPE_COPY: + *src_field = MLX5_GET(copy_action_in, pattern, src_field); +@@ -522,57 +522,59 @@ bool mlx5hws_pat_verify_actions(struct mlx5hws_context *ctx, __be64 pattern[], s + return true; + } + +-void mlx5hws_pat_calc_nop(__be64 *pattern, size_t num_actions, +- size_t max_actions, size_t *new_size, +- u32 *nop_locations, __be64 *new_pat) ++int mlx5hws_pat_calc_nop(__be64 *pattern, size_t num_actions, ++ size_t max_actions, size_t *new_size, ++ u32 *nop_locations, __be64 *new_pat) + { +- u16 prev_src_field = 0, 
prev_dst_field = 0; ++ u16 prev_src_field = INVALID_FIELD, prev_dst_field = INVALID_FIELD; + u16 src_field, dst_field; + u8 action_type; ++ bool dependent; + size_t i, j; + + *new_size = num_actions; + *nop_locations = 0; + + if (num_actions == 1) +- return; ++ return 0; + + for (i = 0, j = 0; i < num_actions; i++, j++) { +- action_type = MLX5_GET(set_action_in, &pattern[i], action_type); ++ if (j >= max_actions) ++ return -EINVAL; + ++ action_type = MLX5_GET(set_action_in, &pattern[i], action_type); + hws_action_modify_get_target_fields(action_type, &pattern[i], + &src_field, &dst_field); +- if (i % 2) { +- if (action_type == MLX5_ACTION_TYPE_COPY && +- (prev_src_field == src_field || +- prev_dst_field == dst_field)) { +- /* need Nop */ +- *new_size += 1; +- *nop_locations |= BIT(i); +- memset(&new_pat[j], 0, MLX5HWS_MODIFY_ACTION_SIZE); +- MLX5_SET(set_action_in, &new_pat[j], +- action_type, +- MLX5_MODIFICATION_TYPE_NOP); +- j++; +- } else if (prev_src_field == src_field) { +- /* need Nop */ +- *new_size += 1; +- *nop_locations |= BIT(i); +- MLX5_SET(set_action_in, &new_pat[j], +- action_type, +- MLX5_MODIFICATION_TYPE_NOP); +- j++; +- } +- } +- memcpy(&new_pat[j], &pattern[i], MLX5HWS_MODIFY_ACTION_SIZE); +- /* check if no more space */ +- if (j > max_actions) { +- *new_size = num_actions; +- *nop_locations = 0; +- return; ++ ++ /* For every action, look at it and the previous one. 
The two ++ * actions are dependent if: ++ */ ++ dependent = ++ (i > 0) && ++ /* At least one of the actions is a write and */ ++ (dst_field != INVALID_FIELD || ++ prev_dst_field != INVALID_FIELD) && ++ /* One reads from the other's source */ ++ (dst_field == prev_src_field || ++ src_field == prev_dst_field || ++ /* Or both write to the same destination */ ++ dst_field == prev_dst_field); ++ ++ if (dependent) { ++ *new_size += 1; ++ *nop_locations |= BIT(i); ++ memset(&new_pat[j], 0, MLX5HWS_MODIFY_ACTION_SIZE); ++ MLX5_SET(set_action_in, &new_pat[j], action_type, ++ MLX5_MODIFICATION_TYPE_NOP); ++ j++; ++ if (j >= max_actions) ++ return -EINVAL; + } + ++ memcpy(&new_pat[j], &pattern[i], MLX5HWS_MODIFY_ACTION_SIZE); + prev_src_field = src_field; + prev_dst_field = dst_field; + } ++ ++ return 0; + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pat_arg.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pat_arg.h +index 91bd2572a341..7fbd8dc7aa18 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pat_arg.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pat_arg.h +@@ -96,7 +96,7 @@ int mlx5hws_arg_write_inline_arg_data(struct mlx5hws_context *ctx, + u8 *arg_data, + size_t data_size); + +-void mlx5hws_pat_calc_nop(__be64 *pattern, size_t num_actions, +- size_t max_actions, size_t *new_size, +- u32 *nop_locations, __be64 *new_pat); ++int mlx5hws_pat_calc_nop(__be64 *pattern, size_t num_actions, ++ size_t max_actions, size_t *new_size, ++ u32 *nop_locations, __be64 *new_pat); + #endif /* MLX5HWS_PAT_ARG_H_ */ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1380-net-mlx5-core-add-error-handling-inmlx5-query-nic-vport-qkey.patch b/SOURCES/1380-net-mlx5-core-add-error-handling-inmlx5-query-nic-vport-qkey.patch new file mode 100644 index 000000000..56723ff72 --- /dev/null +++ b/SOURCES/1380-net-mlx5-core-add-error-handling-inmlx5-query-nic-vport-qkey.patch @@ -0,0 +1,64 @@ +From 
07d72a58ed409c2ba3a0099febcaf5e9186274e4 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:07 -0400 +Subject: [PATCH] net/mlx5_core: Add error handling + inmlx5_query_nic_vport_qkey_viol_cntr() + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit f0b50730bdd8f2734e548de541e845c0d40dceb6 +Author: Wentao Liang +Date: Wed May 21 21:36:20 2025 +0800 + + net/mlx5_core: Add error handling inmlx5_query_nic_vport_qkey_viol_cntr() + + The function mlx5_query_nic_vport_qkey_viol_cntr() calls the function + mlx5_query_nic_vport_context() but does not check its return value. This + could lead to undefined behavior if the query fails. A proper + implementation can be found in mlx5_nic_vport_query_local_lb(). + + Add error handling for mlx5_query_nic_vport_context(). If it fails, free + the out buffer via kvfree() and return error code. + + Fixes: 9efa75254593 ("net/mlx5_core: Introduce access functions to query vport RoCE fields") + Cc: stable@vger.kernel.org # v4.5 + Signed-off-by: Wentao Liang + Reviewed-by: Tariq Toukan + Link: https://patch.msgid.link/20250521133620.912-1-vulab@iscas.ac.cn + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c +index d10d4c396040..a3c57bb8b521 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c +@@ -519,19 +519,22 @@ int mlx5_query_nic_vport_qkey_viol_cntr(struct mlx5_core_dev *mdev, + { + u32 *out; + int outlen = MLX5_ST_SZ_BYTES(query_nic_vport_context_out); ++ int err; + + out = kvzalloc(outlen, GFP_KERNEL); + if (!out) + return -ENOMEM; + +- mlx5_query_nic_vport_context(mdev, 0, out); ++ err = mlx5_query_nic_vport_context(mdev, 0, out); ++ if (err) ++ goto out; + + *qkey_viol_cntr = MLX5_GET(query_nic_vport_context_out, out, + nic_vport_context.qkey_violation_counter); +- ++out: + kvfree(out); + +- return 0; ++ return 
err; + } + EXPORT_SYMBOL_GPL(mlx5_query_nic_vport_qkey_viol_cntr); + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1381-net-mlx5e-allow-setting-mac-address-of-representors.patch b/SOURCES/1381-net-mlx5e-allow-setting-mac-address-of-representors.patch new file mode 100644 index 000000000..e9a3d4d64 --- /dev/null +++ b/SOURCES/1381-net-mlx5e-allow-setting-mac-address-of-representors.patch @@ -0,0 +1,41 @@ +From 8c0cb6c3c0865ac0421987596b2b3d90ec2f2510 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:08 -0400 +Subject: [PATCH] net/mlx5e: Allow setting MAC address of representors + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit f95633adc177416ac21f16db9ce1e75c74db805a +Author: Mark Bloch +Date: Thu May 22 10:13:56 2025 +0300 + + net/mlx5e: Allow setting MAC address of representors + + A representor netdev does not correspond to real hardware that needs to + be updated when setting the MAC address. The default eth_mac_addr() is + sufficient for simply updating the netdev's MAC address with validation. 
+ + Signed-off-by: Mark Bloch + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/1747898036-1121904-1-git-send-email-tariqt@nvidia.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +index 58cd153ccc61..2640cace0f76 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +@@ -803,6 +803,7 @@ static const struct net_device_ops mlx5e_netdev_ops_rep = { + .ndo_stop = mlx5e_rep_close, + .ndo_start_xmit = mlx5e_xmit, + .ndo_setup_tc = mlx5e_rep_setup_tc, ++ .ndo_set_mac_address = eth_mac_addr, + .ndo_get_stats64 = mlx5e_rep_get_stats, + .ndo_has_offload_stats = mlx5e_rep_has_offload_stats, + .ndo_get_offload_stats = mlx5e_rep_get_offload_stats, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1382-net-mlx5-add-error-handling-in-mlx5-query-nic-vport-node-gui.patch b/SOURCES/1382-net-mlx5-add-error-handling-in-mlx5-query-nic-vport-node-gui.patch new file mode 100644 index 000000000..22f4c58ec --- /dev/null +++ b/SOURCES/1382-net-mlx5-add-error-handling-in-mlx5-query-nic-vport-node-gui.patch @@ -0,0 +1,63 @@ +From dd1d14ed8fcc14c857f5bcd97ad842a291d390e6 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:08 -0400 +Subject: [PATCH] net/mlx5: Add error handling in + mlx5_query_nic_vport_node_guid() + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit c6bb8a21cdad8c975a3a646b9e5c8df01ad29783 +Author: Wentao Liang +Date: Sun May 25 00:34:25 2025 +0800 + + net/mlx5: Add error handling in mlx5_query_nic_vport_node_guid() + + The function mlx5_query_nic_vport_node_guid() calls the function + mlx5_query_nic_vport_context() but does not check its return value. + A proper implementation can be found in mlx5_nic_vport_query_local_lb(). 
+ + Add error handling for mlx5_query_nic_vport_context(). If it fails, free + the out buffer via kvfree() and return error code. + + Fixes: 9efa75254593 ("net/mlx5_core: Introduce access functions to query vport RoCE fields") + Cc: stable@vger.kernel.org # v4.5 + Signed-off-by: Wentao Liang + Reviewed-by: Tariq Toukan + Link: https://patch.msgid.link/20250524163425.1695-1-vulab@iscas.ac.cn + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c +index a3c57bb8b521..da5c24fc7b30 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c +@@ -465,19 +465,22 @@ int mlx5_query_nic_vport_node_guid(struct mlx5_core_dev *mdev, u64 *node_guid) + { + u32 *out; + int outlen = MLX5_ST_SZ_BYTES(query_nic_vport_context_out); ++ int err; + + out = kvzalloc(outlen, GFP_KERNEL); + if (!out) + return -ENOMEM; + +- mlx5_query_nic_vport_context(mdev, 0, out); ++ err = mlx5_query_nic_vport_context(mdev, 0, out); ++ if (err) ++ goto out; + + *node_guid = MLX5_GET64(query_nic_vport_context_out, out, + nic_vport_context.node_guid); +- ++out: + kvfree(out); + +- return 0; ++ return err; + } + EXPORT_SYMBOL_GPL(mlx5_query_nic_vport_node_guid); + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1383-net-mlx5-hws-fix-an-error-code-in-mlx5hws-bwc-rule-create-co.patch b/SOURCES/1383-net-mlx5-hws-fix-an-error-code-in-mlx5hws-bwc-rule-create-co.patch new file mode 100644 index 000000000..440f0e3ca --- /dev/null +++ b/SOURCES/1383-net-mlx5-hws-fix-an-error-code-in-mlx5hws-bwc-rule-create-co.patch @@ -0,0 +1,43 @@ +From e2f7050c9a027152fc3001519b825cfd333320c0 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:08 -0400 +Subject: [PATCH] net/mlx5: HWS, Fix an error code in + mlx5hws_bwc_rule_create_complex() + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 
a540ee75945a96f606c6ac955bfed5410d318f7d +Author: Dan Carpenter +Date: Fri May 23 19:00:12 2025 +0300 + + net/mlx5: HWS, Fix an error code in mlx5hws_bwc_rule_create_complex() + + This was intended to be negative -ENOMEM but the '-' character was left + off accidentally. This typo doesn't affect runtime because the caller + treats all non-zero returns the same. + + Fixes: 17e0accac577 ("net/mlx5: HWS, support complex matchers") + Signed-off-by: Dan Carpenter + Reviewed-by: Yevgeny Kliteynik + Reviewed-by: Tariq Toukan + Link: https://patch.msgid.link/aDCbjNcquNC68Hyj@stanley.mountain + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.c +index 5d30c5b094fc..70768953a4f6 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.c +@@ -1188,7 +1188,7 @@ int mlx5hws_bwc_rule_create_complex(struct mlx5hws_bwc_rule *bwc_rule, + GFP_KERNEL); + if (unlikely(!match_buf_2)) { + mlx5hws_err(ctx, "Complex rule: failed allocating match_buf\n"); +- ret = ENOMEM; ++ ret = -ENOMEM; + goto hash_node_put; + } + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1384-net-mlx5-ensure-fw-pages-are-always-allocated-on-same-numa.patch b/SOURCES/1384-net-mlx5-ensure-fw-pages-are-always-allocated-on-same-numa.patch new file mode 100644 index 000000000..bfad78e2b --- /dev/null +++ b/SOURCES/1384-net-mlx5-ensure-fw-pages-are-always-allocated-on-same-numa.patch @@ -0,0 +1,45 @@ +From 15be556a20263870bb8b7e0974d3452f4d4b3616 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:08 -0400 +Subject: [PATCH] net/mlx5: Ensure fw pages are always allocated on same NUMA + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit f37258133c1e95e61db532e14067e28b4881bf24 +Author: Moshe Shemesh +Date: Tue Jun 10 18:15:06 2025 
+0300 + + net/mlx5: Ensure fw pages are always allocated on same NUMA + + When firmware asks the driver to allocate more pages, using event of + give_pages, the driver should always allocate it from same NUMA, the + original device NUMA. Current code uses dev_to_node() which can result + in different NUMA as it is changed by other driver flows, such as + mlx5_dma_zalloc_coherent_node(). Instead, use saved numa node for + allocating firmware pages. + + Fixes: 311c7c71c9bb ("net/mlx5e: Allocate DMA coherent memory on reader NUMA node") + Signed-off-by: Moshe Shemesh + Reviewed-by: Tariq Toukan + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250610151514.1094735-2-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c +index 972e8e9df585..9bc9bd83c232 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c +@@ -291,7 +291,7 @@ static void free_4k(struct mlx5_core_dev *dev, u64 addr, u32 function) + static int alloc_system_page(struct mlx5_core_dev *dev, u32 function) + { + struct device *device = mlx5_core_dma_dev(dev); +- int nid = dev_to_node(device); ++ int nid = dev->priv.numa_node; + struct page *page; + u64 zero_addr = 1; + u64 addr; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1385-net-mlx5-fix-return-value-when-searching-for-existing-flow-g.patch b/SOURCES/1385-net-mlx5-fix-return-value-when-searching-for-existing-flow-g.patch new file mode 100644 index 000000000..92e4328e8 --- /dev/null +++ b/SOURCES/1385-net-mlx5-fix-return-value-when-searching-for-existing-flow-g.patch @@ -0,0 +1,64 @@ +From 92a46cdb282adb03ef6c38e4e163528e9a19f3d7 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:09 -0400 +Subject: [PATCH] net/mlx5: Fix return value when searching for existing flow + group + +JIRA: 
https://redhat.atlassian.net/browse/RHEL-169055 + +commit 8ec40e3f1f72bf8f8accf18020d487caa99f46a4 +Author: Patrisious Haddad +Date: Tue Jun 10 18:15:08 2025 +0300 + + net/mlx5: Fix return value when searching for existing flow group + + When attempting to add a rule to an existing flow group, if a matching + flow group exists but is not active, the error code returned should be + EAGAIN, so that the rule can be added to the matching flow group once + it is active, rather than ENOENT, which indicates that no matching + flow group was found. + + Fixes: bd71b08ec2ee ("net/mlx5: Support multiple updates of steering rules in parallel") + Signed-off-by: Gavi Teitz + Signed-off-by: Roi Dayan + Signed-off-by: Patrisious Haddad + Reviewed-by: Tariq Toukan + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250610151514.1094735-4-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +index c330b64a506b..5f0f546fa126 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +@@ -2228,6 +2228,7 @@ try_add_to_existing_fg(struct mlx5_flow_table *ft, + struct mlx5_flow_handle *rule; + struct match_list *iter; + bool take_write = false; ++ bool try_again = false; + struct fs_fte *fte; + u64 version = 0; + int err; +@@ -2292,6 +2293,7 @@ try_add_to_existing_fg(struct mlx5_flow_table *ft, + nested_down_write_ref_node(&g->node, FS_LOCK_PARENT); + + if (!g->node.active) { ++ try_again = true; + up_write_ref_node(&g->node, false); + continue; + } +@@ -2313,7 +2315,8 @@ try_add_to_existing_fg(struct mlx5_flow_table *ft, + tree_put_node(&fte->node, false); + return rule; + } +- rule = ERR_PTR(-ENOENT); ++ err = try_again ? 
-EAGAIN : -ENOENT; ++ rule = ERR_PTR(err); + out: + kmem_cache_free(steering->ftes_cache, fte); + return rule; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1386-net-mlx5-hws-init-mutex-on-the-correct-path.patch b/SOURCES/1386-net-mlx5-hws-init-mutex-on-the-correct-path.patch new file mode 100644 index 000000000..c6f7e023d --- /dev/null +++ b/SOURCES/1386-net-mlx5-hws-init-mutex-on-the-correct-path.patch @@ -0,0 +1,51 @@ +From 3d6361f488b417481996a88647f6d958c09986d9 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:09 -0400 +Subject: [PATCH] net/mlx5: HWS, Init mutex on the correct path + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit a002602676cdae0c9996adb75b9310559b718a93 +Author: Vlad Dogaru +Date: Tue Jun 10 18:15:09 2025 +0300 + + net/mlx5: HWS, Init mutex on the correct path + + The newly introduced mutex is only used for reformat actions, but it was + initialized for modify header instead. + + The struct that contains the mutex is zero-initialized and an all-zero + mutex is valid, so the issue only shows up with CONFIG_DEBUG_MUTEXES. 
+ + Fixes: b206d9ec19df ("net/mlx5: HWS, register reformat actions with fw") + Signed-off-by: Vlad Dogaru + Reviewed-by: Yevgeny Kliteynik + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250610151514.1094735-5-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c +index 9d1c0e4b224a..372e2be90706 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c +@@ -1357,6 +1357,7 @@ mlx5_cmd_hws_packet_reformat_alloc(struct mlx5_flow_root_namespace *ns, + pkt_reformat->fs_hws_action.pr_data = pr_data; + } + ++ mutex_init(&pkt_reformat->fs_hws_action.lock); + pkt_reformat->owner = MLX5_FLOW_RESOURCE_OWNER_HWS; + pkt_reformat->fs_hws_action.hws_action = hws_action; + return 0; +@@ -1503,7 +1504,6 @@ static int mlx5_cmd_hws_modify_header_alloc(struct mlx5_flow_root_namespace *ns, + err = -ENOMEM; + goto release_mh; + } +- mutex_init(&modify_hdr->fs_hws_action.lock); + modify_hdr->fs_hws_action.mh_data = mh_data; + modify_hdr->fs_hws_action.fs_pool = pool; + modify_hdr->owner = MLX5_FLOW_RESOURCE_OWNER_SW; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1387-net-mlx5-hws-fix-missing-ip-version-handling-in-definer.patch b/SOURCES/1387-net-mlx5-hws-fix-missing-ip-version-handling-in-definer.patch new file mode 100644 index 000000000..088cabf7c --- /dev/null +++ b/SOURCES/1387-net-mlx5-hws-fix-missing-ip-version-handling-in-definer.patch @@ -0,0 +1,40 @@ +From ba815e2e82f00d7211164de9c0409a7e53173ba8 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:10 -0400 +Subject: [PATCH] net/mlx5: HWS, fix missing ip_version handling in definer + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit b5e3c76f35ee7e814c2469c73406c5bbf110d89c +Author: Yevgeny Kliteynik +Date: Tue Jun 10 
18:15:10 2025 +0300 + + net/mlx5: HWS, fix missing ip_version handling in definer + + Fix missing field handling in definer - outer IP version. + + Fixes: 74a778b4a63f ("net/mlx5: HWS, added definers handling") + Signed-off-by: Yevgeny Kliteynik + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250610151514.1094735-6-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c +index 5cc0dc002ac1..d45e1145d197 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c +@@ -785,6 +785,9 @@ hws_definer_conv_outer(struct mlx5hws_definer_conv_data *cd, + HWS_SET_HDR(fc, match_param, IP_PROTOCOL_O, + outer_headers.ip_protocol, + eth_l3_outer.protocol_next_header); ++ HWS_SET_HDR(fc, match_param, IP_VERSION_O, ++ outer_headers.ip_version, ++ eth_l3_outer.ip_version); + HWS_SET_HDR(fc, match_param, IP_TTL_O, + outer_headers.ttl_hoplimit, + eth_l3_outer.time_to_live_hop_limit); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1388-net-mlx5-hws-make-sure-the-uplink-is-the-last-destination.patch b/SOURCES/1388-net-mlx5-hws-make-sure-the-uplink-is-the-last-destination.patch new file mode 100644 index 000000000..c81117b64 --- /dev/null +++ b/SOURCES/1388-net-mlx5-hws-make-sure-the-uplink-is-the-last-destination.patch @@ -0,0 +1,98 @@ +From cb2e3bb95bee469a17238ac30011a055ff897886 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:10 -0400 +Subject: [PATCH] net/mlx5: HWS, make sure the uplink is the last destination + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit b8335829518ec5988294280e37d735799209d70d +Author: Vlad Dogaru +Date: Tue Jun 10 18:15:11 2025 +0300 + + net/mlx5: HWS, make sure the uplink is the last destination + + When there are more than one destinations, we 
create a FW flow + table and provide it with all the destinations. FW requires to + have wire as the last destination in the list (if it exists), + otherwise the operation fails with FW syndrome. + + This patch fixes the destination array action creation: if it + contains a wire destination, it is moved to the end. + + Fixes: 504e536d9010 ("net/mlx5: HWS, added actions handling") + Signed-off-by: Vlad Dogaru + Reviewed-by: Yevgeny Kliteynik + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250610151514.1094735-7-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c +index fb62f3bc4bd4..447ea3f8722c 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c +@@ -1370,8 +1370,8 @@ mlx5hws_action_create_dest_array(struct mlx5hws_context *ctx, + struct mlx5hws_cmd_set_fte_attr fte_attr = {0}; + struct mlx5hws_cmd_forward_tbl *fw_island; + struct mlx5hws_action *action; +- u32 i /*, packet_reformat_id*/; +- int ret; ++ int ret, last_dest_idx = -1; ++ u32 i; + + if (num_dest <= 1) { + mlx5hws_err(ctx, "Action must have multiple dests\n"); +@@ -1401,11 +1401,8 @@ mlx5hws_action_create_dest_array(struct mlx5hws_context *ctx, + dest_list[i].destination_id = dests[i].dest->dest_obj.obj_id; + fte_attr.action_flags |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST; + fte_attr.ignore_flow_level = ignore_flow_level; +- /* ToDo: In SW steering we have a handling of 'go to WIRE' +- * destination here by upper layer setting 'is_wire_ft' flag +- * if the destination is wire. +- * This is because uplink should be last dest in the list. 
+- */ ++ if (dests[i].is_wire_ft) ++ last_dest_idx = i; + break; + case MLX5HWS_ACTION_TYP_VPORT: + dest_list[i].destination_type = MLX5_FLOW_DESTINATION_TYPE_VPORT; +@@ -1429,6 +1426,9 @@ mlx5hws_action_create_dest_array(struct mlx5hws_context *ctx, + } + } + ++ if (last_dest_idx != -1) ++ swap(dest_list[last_dest_idx], dest_list[num_dest - 1]); ++ + fte_attr.dests_num = num_dest; + fte_attr.dests = dest_list; + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c +index 372e2be90706..bf4643d0ce17 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c +@@ -966,6 +966,9 @@ static int mlx5_fs_fte_get_hws_actions(struct mlx5_flow_root_namespace *ns, + switch (attr->type) { + case MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE: + dest_action = mlx5_fs_get_dest_action_ft(fs_ctx, dst); ++ if (dst->dest_attr.ft->flags & ++ MLX5_FLOW_TABLE_UPLINK_VPORT) ++ dest_actions[num_dest_actions].is_wire_ft = true; + break; + case MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE_NUM: + dest_action = mlx5_fs_get_dest_action_table_num(fs_ctx, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h +index 9bbadc4d8a0b..d8ac6c196211 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h +@@ -213,6 +213,7 @@ struct mlx5hws_action_dest_attr { + struct mlx5hws_action *dest; + /* Optional reformat action */ + struct mlx5hws_action *reformat; ++ bool is_wire_ft; + }; + + /** +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1389-net-mlx5e-fix-leak-of-geneve-tlv-option-object.patch b/SOURCES/1389-net-mlx5e-fix-leak-of-geneve-tlv-option-object.patch new file mode 100644 index 000000000..fd9cd7cce --- /dev/null +++ 
b/SOURCES/1389-net-mlx5e-fix-leak-of-geneve-tlv-option-object.patch @@ -0,0 +1,81 @@ +From 7af56f3f1bdf8cc9ba3e4a85456cedb7a69a8b86 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:10 -0400 +Subject: [PATCH] net/mlx5e: Fix leak of Geneve TLV option object + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit aa9c44b842096c553871bc68a8cebc7861fa192b +Author: Jianbo Liu +Date: Tue Jun 10 18:15:13 2025 +0300 + + net/mlx5e: Fix leak of Geneve TLV option object + + Previously, a unique tunnel id was added for the matching on TC + non-zero chains, to support inner header rewrite with goto action. + Later, it was used to support VF tunnel offload for vxlan, then for + Geneve and GRE. To support VF tunnel, a temporary mlx5_flow_spec is + used to parse tunnel options. For Geneve, if there is TLV option, a + object is created, or refcnt is added if already exists. But the + temporary mlx5_flow_spec is directly freed after parsing, which causes + the leak because no information regarding the object is saved in + flow's mlx5_flow_spec, which is used to free the object when deleting + the flow. + + To fix the leak, call mlx5_geneve_tlv_option_del() before free the + temporary spec if it has TLV object. 
+ + Fixes: 521933cdc4aa ("net/mlx5e: Support Geneve and GRE with VF tunnel offload") + Signed-off-by: Jianbo Liu + Reviewed-by: Tariq Toukan + Reviewed-by: Alex Lazar + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250610151514.1094735-9-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +index f1d908f61134..fef418e1ed1a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +@@ -2028,9 +2028,8 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv, + return err; + } + +-static bool mlx5_flow_has_geneve_opt(struct mlx5e_tc_flow *flow) ++static bool mlx5_flow_has_geneve_opt(struct mlx5_flow_spec *spec) + { +- struct mlx5_flow_spec *spec = &flow->attr->parse_attr->spec; + void *headers_v = MLX5_ADDR_OF(fte_match_param, + spec->match_value, + misc_parameters_3); +@@ -2069,7 +2068,7 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv, + } + complete_all(&flow->del_hw_done); + +- if (mlx5_flow_has_geneve_opt(flow)) ++ if (mlx5_flow_has_geneve_opt(&attr->parse_attr->spec)) + mlx5_geneve_tlv_option_del(priv->mdev->geneve); + + if (flow->decap_route) +@@ -2574,12 +2573,13 @@ static int parse_tunnel_attr(struct mlx5e_priv *priv, + + err = mlx5e_tc_tun_parse(filter_dev, priv, tmp_spec, f, match_level); + if (err) { +- kvfree(tmp_spec); + NL_SET_ERR_MSG_MOD(extack, "Failed to parse tunnel attributes"); + netdev_warn(priv->netdev, "Failed to parse tunnel attributes"); +- return err; ++ } else { ++ err = mlx5e_tc_set_attr_rx_tun(flow, tmp_spec); + } +- err = mlx5e_tc_set_attr_rx_tun(flow, tmp_spec); ++ if (mlx5_flow_has_geneve_opt(tmp_spec)) ++ mlx5_geneve_tlv_option_del(priv->mdev->geneve); + kvfree(tmp_spec); + if (err) + return err; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1390-net-mlx5-hws-add-error-checking-to-hws-bwc-rule-complex-hash.patch 
b/SOURCES/1390-net-mlx5-hws-add-error-checking-to-hws-bwc-rule-complex-hash.patch new file mode 100644 index 000000000..b62748478 --- /dev/null +++ b/SOURCES/1390-net-mlx5-hws-add-error-checking-to-hws-bwc-rule-complex-hash.patch @@ -0,0 +1,78 @@ +From c4f4fb210193420862f49e61b6a6865512ee9636 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:10 -0400 +Subject: [PATCH] net/mlx5: HWS, Add error checking to + hws_bwc_rule_complex_hash_node_get() + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 1619bdf4389c829f16af5c7d5b4fa5f1673614d7 +Author: Dan Carpenter +Date: Wed Jun 11 16:14:32 2025 +0300 + + net/mlx5: HWS, Add error checking to hws_bwc_rule_complex_hash_node_get() + + Check for if ida_alloc() or rhashtable_lookup_get_insert_fast() fails. + + Fixes: 17e0accac577 ("net/mlx5: HWS, support complex matchers") + Signed-off-by: Dan Carpenter + Reviewed-by: Yevgeny Kliteynik + Link: https://patch.msgid.link/aEmBONjyiF6z5yCV@stanley.mountain + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.c +index 70768953a4f6..ca7501c57468 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.c +@@ -1070,7 +1070,7 @@ hws_bwc_rule_complex_hash_node_get(struct mlx5hws_bwc_rule *bwc_rule, + struct mlx5hws_bwc_matcher *bwc_matcher = bwc_rule->bwc_matcher; + struct mlx5hws_bwc_complex_rule_hash_node *node, *old_node; + struct rhashtable *refcount_hash; +- int i; ++ int ret, i; + + bwc_rule->complex_hash_node = NULL; + +@@ -1078,7 +1078,11 @@ hws_bwc_rule_complex_hash_node_get(struct mlx5hws_bwc_rule *bwc_rule, + if (unlikely(!node)) + return -ENOMEM; + +- node->tag = ida_alloc(&bwc_matcher->complex->metadata_ida, GFP_KERNEL); ++ ret = ida_alloc(&bwc_matcher->complex->metadata_ida, GFP_KERNEL); 
++ if (ret < 0) ++ goto err_free_node; ++ node->tag = ret; ++ + refcount_set(&node->refcount, 1); + + /* Clear match buffer - turn off all the unrelated fields +@@ -1094,6 +1098,11 @@ hws_bwc_rule_complex_hash_node_get(struct mlx5hws_bwc_rule *bwc_rule, + old_node = rhashtable_lookup_get_insert_fast(refcount_hash, + &node->hash_node, + hws_refcount_hash); ++ if (IS_ERR(old_node)) { ++ ret = PTR_ERR(old_node); ++ goto err_free_ida; ++ } ++ + if (old_node) { + /* Rule with the same tag already exists - update refcount */ + refcount_inc(&old_node->refcount); +@@ -1112,6 +1121,12 @@ hws_bwc_rule_complex_hash_node_get(struct mlx5hws_bwc_rule *bwc_rule, + + bwc_rule->complex_hash_node = node; + return 0; ++ ++err_free_ida: ++ ida_free(&bwc_matcher->complex->metadata_ida, node->tag); ++err_free_node: ++ kfree(node); ++ return ret; + } + + static void +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1391-net-mlx5e-fix-race-between-dim-disable-and-net-dim.patch b/SOURCES/1391-net-mlx5e-fix-race-between-dim-disable-and-net-dim.patch new file mode 100644 index 000000000..129ad9337 --- /dev/null +++ b/SOURCES/1391-net-mlx5e-fix-race-between-dim-disable-and-net-dim.patch @@ -0,0 +1,101 @@ +From ad54bc890c4892fa93557d437d96dd787a30b98b Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:11 -0400 +Subject: [PATCH] net/mlx5e: Fix race between DIM disable and net_dim() + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit eb41a264a3a576dc040ee37c3d9d6b7e2d9be968 +Author: Carolina Jubran +Date: Thu Jul 10 16:53:43 2025 +0300 + + net/mlx5e: Fix race between DIM disable and net_dim() + + There's a race between disabling DIM and NAPI callbacks using the dim + pointer on the RQ or SQ. + + If NAPI checks the DIM state bit and sees it still set, it assumes + `rq->dim` or `sq->dim` is valid. But if DIM gets disabled right after + that check, the pointer might already be set to NULL, leading to a NULL + pointer dereference in net_dim(). 
+ + Fix this by calling `synchronize_net()` before freeing the DIM context. + This ensures all in-progress NAPI callbacks are finished before the + pointer is cleared. + + Kernel log: + + BUG: kernel NULL pointer dereference, address: 0000000000000000 + ... + RIP: 0010:net_dim+0x23/0x190 + ... + Call Trace: + + ? __die+0x20/0x60 + ? page_fault_oops+0x150/0x3e0 + ? common_interrupt+0xf/0xa0 + ? sysvec_call_function_single+0xb/0x90 + ? exc_page_fault+0x74/0x130 + ? asm_exc_page_fault+0x22/0x30 + ? net_dim+0x23/0x190 + ? mlx5e_poll_ico_cq+0x41/0x6f0 [mlx5_core] + ? sysvec_apic_timer_interrupt+0xb/0x90 + mlx5e_handle_rx_dim+0x92/0xd0 [mlx5_core] + mlx5e_napi_poll+0x2cd/0xac0 [mlx5_core] + ? mlx5e_poll_ico_cq+0xe5/0x6f0 [mlx5_core] + busy_poll_stop+0xa2/0x200 + ? mlx5e_napi_poll+0x1d9/0xac0 [mlx5_core] + ? mlx5e_trigger_irq+0x130/0x130 [mlx5_core] + __napi_busy_loop+0x345/0x3b0 + ? sysvec_call_function_single+0xb/0x90 + ? asm_sysvec_call_function_single+0x16/0x20 + ? sysvec_apic_timer_interrupt+0xb/0x90 + ? pcpu_free_area+0x1e4/0x2e0 + napi_busy_loop+0x11/0x20 + xsk_recvmsg+0x10c/0x130 + sock_recvmsg+0x44/0x70 + __sys_recvfrom+0xbc/0x130 + ? __schedule+0x398/0x890 + __x64_sys_recvfrom+0x20/0x30 + do_syscall_64+0x4c/0x100 + entry_SYSCALL_64_after_hwframe+0x4b/0x53 + ... + ---[ end trace 0000000000000000 ]--- + ... 
+ ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- + + Fixes: 445a25f6e1a2 ("net/mlx5e: Support updating coalescing configuration without resetting channels") + Signed-off-by: Carolina Jubran + Reviewed-by: Cosmin Ratiu + Signed-off-by: Tariq Toukan + Reviewed-by: Jacob Keller + Link: https://patch.msgid.link/1752155624-24095-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c +index 298bb74ec5e9..d1d629697e28 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c +@@ -113,7 +113,7 @@ int mlx5e_dim_rx_change(struct mlx5e_rq *rq, bool enable) + __set_bit(MLX5E_RQ_STATE_DIM, &rq->state); + } else { + __clear_bit(MLX5E_RQ_STATE_DIM, &rq->state); +- ++ synchronize_net(); + mlx5e_dim_disable(rq->dim); + rq->dim = NULL; + } +@@ -140,7 +140,7 @@ int mlx5e_dim_tx_change(struct mlx5e_txqsq *sq, bool enable) + __set_bit(MLX5E_SQ_STATE_DIM, &sq->state); + } else { + __clear_bit(MLX5E_SQ_STATE_DIM, &sq->state); +- ++ synchronize_net(); + mlx5e_dim_disable(sq->dim); + sq->dim = NULL; + } +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1392-net-mlx5e-add-new-prio-for-promiscuous-mode.patch b/SOURCES/1392-net-mlx5e-add-new-prio-for-promiscuous-mode.patch new file mode 100644 index 000000000..65eb21511 --- /dev/null +++ b/SOURCES/1392-net-mlx5e-add-new-prio-for-promiscuous-mode.patch @@ -0,0 +1,116 @@ +From bd88819cc929d1193d943400f4977a8f573b5d15 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:11 -0400 +Subject: [PATCH] net/mlx5e: Add new prio for promiscuous mode + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 4c9fce56fa702059bbc5ab737265b68f79cbaac4 +Author: Jianbo Liu +Date: Thu Jul 10 16:53:44 2025 +0300 + + net/mlx5e: Add new prio for promiscuous mode + + An optimization for promiscuous 
mode adds a high-priority steering + table with a single catch-all rule to steer all traffic directly to + the TTC table. + + However, a gap exists between the creation of this table and the + insertion of the catch-all rule. Packets arriving in this brief window + would miss as no rule was inserted yet, unnecessarily incrementing the + 'rx_steer_missed_packets' counter and dropped. + + This patch resolves the issue by introducing a new prio for this + table, placing it between MLX5E_TC_PRIO and MLX5E_NIC_PRIO. By doing + so, packets arriving during the window now fall through to the next + prio (at MLX5E_NIC_PRIO) instead of being dropped. + + Fixes: 1c46d7409f30 ("net/mlx5e: Optimize promiscuous mode") + Signed-off-by: Jianbo Liu + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Reviewed-by: Jacob Keller + Link: https://patch.msgid.link/1752155624-24095-4-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h +index b5c3a2a9d2a5..9560fcba643f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h +@@ -18,7 +18,8 @@ enum { + + enum { + MLX5E_TC_PRIO = 0, +- MLX5E_NIC_PRIO ++ MLX5E_PROMISC_PRIO, ++ MLX5E_NIC_PRIO, + }; + + struct mlx5e_flow_table { +@@ -68,9 +69,13 @@ struct mlx5e_l2_table { + MLX5_HASH_FIELD_SEL_DST_IP |\ + MLX5_HASH_FIELD_SEL_IPSEC_SPI) + +-/* NIC prio FTS */ ++/* NIC promisc FT level */ + enum { + MLX5E_PROMISC_FT_LEVEL, ++}; ++ ++/* NIC prio FTS */ ++enum { + MLX5E_VLAN_FT_LEVEL, + MLX5E_L2_FT_LEVEL, + MLX5E_TTC_FT_LEVEL, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c +index 05058710d2c7..537e732085b2 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c +@@ -776,7 +776,7 @@ static int 
mlx5e_create_promisc_table(struct mlx5e_flow_steering *fs) + ft_attr.max_fte = MLX5E_PROMISC_TABLE_SIZE; + ft_attr.autogroup.max_num_groups = 1; + ft_attr.level = MLX5E_PROMISC_FT_LEVEL; +- ft_attr.prio = MLX5E_NIC_PRIO; ++ ft_attr.prio = MLX5E_PROMISC_PRIO; + + ft->t = mlx5_create_auto_grouped_flow_table(fs->ns, &ft_attr); + if (IS_ERR(ft->t)) { +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +index 5f0f546fa126..b29e67466701 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +@@ -113,13 +113,16 @@ + #define ETHTOOL_PRIO_NUM_LEVELS 1 + #define ETHTOOL_NUM_PRIOS 11 + #define ETHTOOL_MIN_LEVEL (KERNEL_MIN_LEVEL + ETHTOOL_NUM_PRIOS) +-/* Promiscuous, Vlan, mac, ttc, inner ttc, {UDP/ANY/aRFS/accel/{esp, esp_err}}, IPsec policy, ++/* Vlan, mac, ttc, inner ttc, {UDP/ANY/aRFS/accel/{esp, esp_err}}, IPsec policy, + * {IPsec RoCE MPV,Alias table},IPsec RoCE policy + */ +-#define KERNEL_NIC_PRIO_NUM_LEVELS 11 ++#define KERNEL_NIC_PRIO_NUM_LEVELS 10 + #define KERNEL_NIC_NUM_PRIOS 1 +-/* One more level for tc */ +-#define KERNEL_MIN_LEVEL (KERNEL_NIC_PRIO_NUM_LEVELS + 1) ++/* One more level for tc, and one more for promisc */ ++#define KERNEL_MIN_LEVEL (KERNEL_NIC_PRIO_NUM_LEVELS + 2) ++ ++#define KERNEL_NIC_PROMISC_NUM_PRIOS 1 ++#define KERNEL_NIC_PROMISC_NUM_LEVELS 1 + + #define KERNEL_NIC_TC_NUM_PRIOS 1 + #define KERNEL_NIC_TC_NUM_LEVELS 3 +@@ -187,6 +190,8 @@ static struct init_tree_node { + ADD_NS(MLX5_FLOW_TABLE_MISS_ACTION_DEF, + ADD_MULTIPLE_PRIO(KERNEL_NIC_TC_NUM_PRIOS, + KERNEL_NIC_TC_NUM_LEVELS), ++ ADD_MULTIPLE_PRIO(KERNEL_NIC_PROMISC_NUM_PRIOS, ++ KERNEL_NIC_PROMISC_NUM_LEVELS), + ADD_MULTIPLE_PRIO(KERNEL_NIC_NUM_PRIOS, + KERNEL_NIC_PRIO_NUM_LEVELS))), + ADD_PRIO(0, BY_PASS_MIN_LEVEL, 0, FS_CHAINING_CAPS, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1393-net-mlx5-correctly-set-gso-size-when-lro-is-used.patch 
b/SOURCES/1393-net-mlx5-correctly-set-gso-size-when-lro-is-used.patch new file mode 100644 index 000000000..1a3a2c4e0 --- /dev/null +++ b/SOURCES/1393-net-mlx5-correctly-set-gso-size-when-lro-is-used.patch @@ -0,0 +1,86 @@ +From 07084a10593c81a6071d6f045927f1c2f52ac5b3 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:11 -0400 +Subject: [PATCH] net/mlx5: Correctly set gso_size when LRO is used + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 531d0d32de3e1b6b77a87bd37de0c2c6e17b496a +Author: Christoph Paasch +Date: Tue Jul 15 13:20:53 2025 -0700 + + net/mlx5: Correctly set gso_size when LRO is used + + gso_size is expected by the networking stack to be the size of the + payload (thus, not including ethernet/IP/TCP-headers). However, cqe_bcnt + is the full sized frame (including the headers). Dividing cqe_bcnt by + lro_num_seg will then give incorrect results. + + For example, running a bpftrace higher up in the TCP-stack + (tcp_event_data_recv), we commonly have gso_size set to 1450 or 1451 even + though in reality the payload was only 1448 bytes. + + This can have unintended consequences: + - In tcp_measure_rcv_mss() len will be for example 1450, but. rcv_mss + will be 1448 (because tp->advmss is 1448). Thus, we will always + recompute scaling_ratio each time an LRO-packet is received. + - In tcp_gro_receive(), it will interfere with the decision whether or + not to flush and thus potentially result in less gro'ed packets. + + So, we need to discount the protocol headers from cqe_bcnt so we can + actually divide the payload by lro_num_seg to get the real gso_size. 
+ + v2: + - Use "(unsigned char *)tcp + tcp->doff * 4 - skb->data)" to compute header-len + (Tariq Toukan ) + - Improve commit-message (Gal Pressman ) + + Fixes: e586b3b0baee ("net/mlx5: Ethernet Datapath files") + Signed-off-by: Christoph Paasch + Reviewed-by: Tariq Toukan + Reviewed-by: Gal Pressman + Link: https://patch.msgid.link/20250715-cpaasch-pf-925-investigate-incorrect-gso_size-on-cx-7-nic-v2-1-e06c3475f3ac@openai.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +index 12ca0a3e8514..382679838113 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +@@ -1156,8 +1156,9 @@ static void mlx5e_lro_update_tcp_hdr(struct mlx5_cqe64 *cqe, struct tcphdr *tcp) + } + } + +-static void mlx5e_lro_update_hdr(struct sk_buff *skb, struct mlx5_cqe64 *cqe, +- u32 cqe_bcnt) ++static unsigned int mlx5e_lro_update_hdr(struct sk_buff *skb, ++ struct mlx5_cqe64 *cqe, ++ u32 cqe_bcnt) + { + struct ethhdr *eth = (struct ethhdr *)(skb->data); + struct tcphdr *tcp; +@@ -1207,6 +1208,8 @@ static void mlx5e_lro_update_hdr(struct sk_buff *skb, struct mlx5_cqe64 *cqe, + tcp->check = tcp_v6_check(payload_len, &ipv6->saddr, + &ipv6->daddr, check); + } ++ ++ return (unsigned int)((unsigned char *)tcp + tcp->doff * 4 - skb->data); + } + + static void *mlx5e_shampo_get_packet_hd(struct mlx5e_rq *rq, u16 header_index) +@@ -1563,8 +1566,9 @@ static inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe, + mlx5e_macsec_offload_handle_rx_skb(netdev, skb, cqe); + + if (lro_num_seg > 1) { +- mlx5e_lro_update_hdr(skb, cqe, cqe_bcnt); +- skb_shinfo(skb)->gso_size = DIV_ROUND_UP(cqe_bcnt, lro_num_seg); ++ unsigned int hdrlen = mlx5e_lro_update_hdr(skb, cqe, cqe_bcnt); ++ ++ skb_shinfo(skb)->gso_size = DIV_ROUND_UP(cqe_bcnt - hdrlen, lro_num_seg); + /* Subtract one since we already counted this as one + * 
"regular" packet in mlx5e_complete_rx_cqe() + */ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1394-net-mlx5-fix-memory-leak-in-cmd-exec.patch b/SOURCES/1394-net-mlx5-fix-memory-leak-in-cmd-exec.patch new file mode 100644 index 000000000..165f31566 --- /dev/null +++ b/SOURCES/1394-net-mlx5-fix-memory-leak-in-cmd-exec.patch @@ -0,0 +1,49 @@ +From 0e6b8d1b57695231dc16f426ca329c7fc415ff43 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:12 -0400 +Subject: [PATCH] net/mlx5: Fix memory leak in cmd_exec() + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 3afa3ae3db52e3c216d77bd5907a5a86833806cc +Author: Chiara Meiohas +Date: Thu Jul 17 15:06:09 2025 +0300 + + net/mlx5: Fix memory leak in cmd_exec() + + If cmd_exec() is called with callback and mlx5_cmd_invoke() returns an + error, resources allocated in cmd_exec() will not be freed. + + Fix the code to release the resources if mlx5_cmd_invoke() returns an + error. + + Fixes: f086470122d5 ("net/mlx5: cmdif, Return value improvements") + Reported-by: Alex Tereshkin + Signed-off-by: Chiara Meiohas + Reviewed-by: Moshe Shemesh + Signed-off-by: Vlad Dumitrescu + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/1752753970-261832-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +index b1aeea7c4a91..e395ef5f356e 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +@@ -1947,8 +1947,8 @@ static int cmd_exec(struct mlx5_core_dev *dev, void *in, int in_size, void *out, + + err = mlx5_cmd_invoke(dev, inb, outb, out, out_size, callback, context, + pages_queue, token, force_polling); +- if (callback) +- return err; ++ if (callback && !err) ++ return 0; + + if (err > 0) /* Failed in FW, command didn't execute */ + err = 
deliv_status_to_err(err); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1395-net-mlx5-e-switch-fix-peer-miss-rules-to-use-peer-eswitch.patch b/SOURCES/1395-net-mlx5-e-switch-fix-peer-miss-rules-to-use-peer-eswitch.patch new file mode 100644 index 000000000..f806b906a --- /dev/null +++ b/SOURCES/1395-net-mlx5-e-switch-fix-peer-miss-rules-to-use-peer-eswitch.patch @@ -0,0 +1,247 @@ +From 52b9ece6153bacd7511923119dae3562495686df Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:42:12 -0400 +Subject: [PATCH] net/mlx5: E-Switch, Fix peer miss rules to use peer eswitch + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 5b4c56ad4da0aa00b258ab50b1f5775b7d3108c7 +Author: Shahar Shitrit +Date: Thu Jul 17 15:06:10 2025 +0300 + + net/mlx5: E-Switch, Fix peer miss rules to use peer eswitch + + In the original design, it is assumed local and peer eswitches have the + same number of vfs. However, in new firmware, local and peer eswitches + can have different number of vfs configured by mlxconfig. In such + configuration, it is incorrect to derive the number of vfs from the + local device's eswitch. + + Fix this by updating the peer miss rules add and delete functions to use + the peer device's eswitch and vf count instead of the local device's + information, ensuring correct behavior regardless of vf configuration + differences. 
+ + Fixes: ac004b832128 ("net/mlx5e: E-Switch, Add peer miss rules") + Signed-off-by: Shahar Shitrit + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/1752753970-261832-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +index 0e3a977d5332..bee906661282 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +@@ -1182,19 +1182,19 @@ static void esw_set_peer_miss_rule_source_port(struct mlx5_eswitch *esw, + static int esw_add_fdb_peer_miss_rules(struct mlx5_eswitch *esw, + struct mlx5_core_dev *peer_dev) + { ++ struct mlx5_eswitch *peer_esw = peer_dev->priv.eswitch; + struct mlx5_flow_destination dest = {}; + struct mlx5_flow_act flow_act = {0}; + struct mlx5_flow_handle **flows; +- /* total vports is the same for both e-switches */ +- int nvports = esw->total_vports; + struct mlx5_flow_handle *flow; ++ struct mlx5_vport *peer_vport; + struct mlx5_flow_spec *spec; +- struct mlx5_vport *vport; + int err, pfindex; + unsigned long i; + void *misc; + +- if (!MLX5_VPORT_MANAGER(esw->dev) && !mlx5_core_is_ecpf_esw_manager(esw->dev)) ++ if (!MLX5_VPORT_MANAGER(peer_dev) && ++ !mlx5_core_is_ecpf_esw_manager(peer_dev)) + return 0; + + spec = kvzalloc(sizeof(*spec), GFP_KERNEL); +@@ -1203,7 +1203,7 @@ static int esw_add_fdb_peer_miss_rules(struct mlx5_eswitch *esw, + + peer_miss_rules_setup(esw, peer_dev, spec, &dest); + +- flows = kvcalloc(nvports, sizeof(*flows), GFP_KERNEL); ++ flows = kvcalloc(peer_esw->total_vports, sizeof(*flows), GFP_KERNEL); + if (!flows) { + err = -ENOMEM; + goto alloc_flows_err; +@@ -1213,10 +1213,10 @@ static int esw_add_fdb_peer_miss_rules(struct mlx5_eswitch *esw, + misc = MLX5_ADDR_OF(fte_match_param, spec->match_value, 
+ misc_parameters); + +- if (mlx5_core_is_ecpf_esw_manager(esw->dev)) { +- vport = mlx5_eswitch_get_vport(esw, MLX5_VPORT_PF); +- esw_set_peer_miss_rule_source_port(esw, peer_dev->priv.eswitch, +- spec, MLX5_VPORT_PF); ++ if (mlx5_core_is_ecpf_esw_manager(peer_dev)) { ++ peer_vport = mlx5_eswitch_get_vport(peer_esw, MLX5_VPORT_PF); ++ esw_set_peer_miss_rule_source_port(esw, peer_esw, spec, ++ MLX5_VPORT_PF); + + flow = mlx5_add_flow_rules(mlx5_eswitch_get_slow_fdb(esw), + spec, &flow_act, &dest, 1); +@@ -1224,11 +1224,11 @@ static int esw_add_fdb_peer_miss_rules(struct mlx5_eswitch *esw, + err = PTR_ERR(flow); + goto add_pf_flow_err; + } +- flows[vport->index] = flow; ++ flows[peer_vport->index] = flow; + } + +- if (mlx5_ecpf_vport_exists(esw->dev)) { +- vport = mlx5_eswitch_get_vport(esw, MLX5_VPORT_ECPF); ++ if (mlx5_ecpf_vport_exists(peer_dev)) { ++ peer_vport = mlx5_eswitch_get_vport(peer_esw, MLX5_VPORT_ECPF); + MLX5_SET(fte_match_set_misc, misc, source_port, MLX5_VPORT_ECPF); + flow = mlx5_add_flow_rules(mlx5_eswitch_get_slow_fdb(esw), + spec, &flow_act, &dest, 1); +@@ -1236,13 +1236,14 @@ static int esw_add_fdb_peer_miss_rules(struct mlx5_eswitch *esw, + err = PTR_ERR(flow); + goto add_ecpf_flow_err; + } +- flows[vport->index] = flow; ++ flows[peer_vport->index] = flow; + } + +- mlx5_esw_for_each_vf_vport(esw, i, vport, mlx5_core_max_vfs(esw->dev)) { ++ mlx5_esw_for_each_vf_vport(peer_esw, i, peer_vport, ++ mlx5_core_max_vfs(peer_dev)) { + esw_set_peer_miss_rule_source_port(esw, +- peer_dev->priv.eswitch, +- spec, vport->vport); ++ peer_esw, ++ spec, peer_vport->vport); + + flow = mlx5_add_flow_rules(mlx5_eswitch_get_slow_fdb(esw), + spec, &flow_act, &dest, 1); +@@ -1250,22 +1251,22 @@ static int esw_add_fdb_peer_miss_rules(struct mlx5_eswitch *esw, + err = PTR_ERR(flow); + goto add_vf_flow_err; + } +- flows[vport->index] = flow; ++ flows[peer_vport->index] = flow; + } + +- if (mlx5_core_ec_sriov_enabled(esw->dev)) { +- mlx5_esw_for_each_ec_vf_vport(esw, i, 
vport, mlx5_core_max_ec_vfs(esw->dev)) { +- if (i >= mlx5_core_max_ec_vfs(peer_dev)) +- break; +- esw_set_peer_miss_rule_source_port(esw, peer_dev->priv.eswitch, +- spec, vport->vport); ++ if (mlx5_core_ec_sriov_enabled(peer_dev)) { ++ mlx5_esw_for_each_ec_vf_vport(peer_esw, i, peer_vport, ++ mlx5_core_max_ec_vfs(peer_dev)) { ++ esw_set_peer_miss_rule_source_port(esw, peer_esw, ++ spec, ++ peer_vport->vport); + flow = mlx5_add_flow_rules(esw->fdb_table.offloads.slow_fdb, + spec, &flow_act, &dest, 1); + if (IS_ERR(flow)) { + err = PTR_ERR(flow); + goto add_ec_vf_flow_err; + } +- flows[vport->index] = flow; ++ flows[peer_vport->index] = flow; + } + } + +@@ -1282,25 +1283,27 @@ static int esw_add_fdb_peer_miss_rules(struct mlx5_eswitch *esw, + return 0; + + add_ec_vf_flow_err: +- mlx5_esw_for_each_ec_vf_vport(esw, i, vport, mlx5_core_max_ec_vfs(esw->dev)) { +- if (!flows[vport->index]) ++ mlx5_esw_for_each_ec_vf_vport(peer_esw, i, peer_vport, ++ mlx5_core_max_ec_vfs(peer_dev)) { ++ if (!flows[peer_vport->index]) + continue; +- mlx5_del_flow_rules(flows[vport->index]); ++ mlx5_del_flow_rules(flows[peer_vport->index]); + } + add_vf_flow_err: +- mlx5_esw_for_each_vf_vport(esw, i, vport, mlx5_core_max_vfs(esw->dev)) { +- if (!flows[vport->index]) ++ mlx5_esw_for_each_vf_vport(peer_esw, i, peer_vport, ++ mlx5_core_max_vfs(peer_dev)) { ++ if (!flows[peer_vport->index]) + continue; +- mlx5_del_flow_rules(flows[vport->index]); ++ mlx5_del_flow_rules(flows[peer_vport->index]); + } +- if (mlx5_ecpf_vport_exists(esw->dev)) { +- vport = mlx5_eswitch_get_vport(esw, MLX5_VPORT_ECPF); +- mlx5_del_flow_rules(flows[vport->index]); ++ if (mlx5_ecpf_vport_exists(peer_dev)) { ++ peer_vport = mlx5_eswitch_get_vport(peer_esw, MLX5_VPORT_ECPF); ++ mlx5_del_flow_rules(flows[peer_vport->index]); + } + add_ecpf_flow_err: +- if (mlx5_core_is_ecpf_esw_manager(esw->dev)) { +- vport = mlx5_eswitch_get_vport(esw, MLX5_VPORT_PF); +- mlx5_del_flow_rules(flows[vport->index]); ++ if 
(mlx5_core_is_ecpf_esw_manager(peer_dev)) { ++ peer_vport = mlx5_eswitch_get_vport(peer_esw, MLX5_VPORT_PF); ++ mlx5_del_flow_rules(flows[peer_vport->index]); + } + add_pf_flow_err: + esw_warn(esw->dev, "FDB: Failed to add peer miss flow rule err %d\n", err); +@@ -1313,37 +1316,34 @@ static int esw_add_fdb_peer_miss_rules(struct mlx5_eswitch *esw, + static void esw_del_fdb_peer_miss_rules(struct mlx5_eswitch *esw, + struct mlx5_core_dev *peer_dev) + { ++ struct mlx5_eswitch *peer_esw = peer_dev->priv.eswitch; + u16 peer_index = mlx5_get_dev_index(peer_dev); + struct mlx5_flow_handle **flows; +- struct mlx5_vport *vport; ++ struct mlx5_vport *peer_vport; + unsigned long i; + + flows = esw->fdb_table.offloads.peer_miss_rules[peer_index]; + if (!flows) + return; + +- if (mlx5_core_ec_sriov_enabled(esw->dev)) { +- mlx5_esw_for_each_ec_vf_vport(esw, i, vport, mlx5_core_max_ec_vfs(esw->dev)) { +- /* The flow for a particular vport could be NULL if the other ECPF +- * has fewer or no VFs enabled +- */ +- if (!flows[vport->index]) +- continue; +- mlx5_del_flow_rules(flows[vport->index]); +- } ++ if (mlx5_core_ec_sriov_enabled(peer_dev)) { ++ mlx5_esw_for_each_ec_vf_vport(peer_esw, i, peer_vport, ++ mlx5_core_max_ec_vfs(peer_dev)) ++ mlx5_del_flow_rules(flows[peer_vport->index]); + } + +- mlx5_esw_for_each_vf_vport(esw, i, vport, mlx5_core_max_vfs(esw->dev)) +- mlx5_del_flow_rules(flows[vport->index]); ++ mlx5_esw_for_each_vf_vport(peer_esw, i, peer_vport, ++ mlx5_core_max_vfs(peer_dev)) ++ mlx5_del_flow_rules(flows[peer_vport->index]); + +- if (mlx5_ecpf_vport_exists(esw->dev)) { +- vport = mlx5_eswitch_get_vport(esw, MLX5_VPORT_ECPF); +- mlx5_del_flow_rules(flows[vport->index]); ++ if (mlx5_ecpf_vport_exists(peer_dev)) { ++ peer_vport = mlx5_eswitch_get_vport(peer_esw, MLX5_VPORT_ECPF); ++ mlx5_del_flow_rules(flows[peer_vport->index]); + } + +- if (mlx5_core_is_ecpf_esw_manager(esw->dev)) { +- vport = mlx5_eswitch_get_vport(esw, MLX5_VPORT_PF); +- 
mlx5_del_flow_rules(flows[vport->index]); ++ if (mlx5_core_is_ecpf_esw_manager(peer_dev)) { ++ peer_vport = mlx5_eswitch_get_vport(peer_esw, MLX5_VPORT_PF); ++ mlx5_del_flow_rules(flows[peer_vport->index]); + } + + kvfree(flows); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1396-rdma-mlx5-convert-timeouts-to-secs-to-jiffies.patch b/SOURCES/1396-rdma-mlx5-convert-timeouts-to-secs-to-jiffies.patch new file mode 100644 index 000000000..614d34f5f --- /dev/null +++ b/SOURCES/1396-rdma-mlx5-convert-timeouts-to-secs-to-jiffies.patch @@ -0,0 +1,70 @@ +From dc192079bc2276b1a7c8163940448e003bf856c7 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:43:25 -0400 +Subject: [PATCH] RDMA/mlx5: convert timeouts to secs_to_jiffies() + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 16b82367aa28bd31795e720548421b58824108e1 +Author: Easwar Hariharan +Date: Wed Feb 19 21:36:40 2025 +0000 + + RDMA/mlx5: convert timeouts to secs_to_jiffies() + + Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced + secs_to_jiffies(). As the value here is a multiple of 1000, use + secs_to_jiffies() instead of msecs_to_jiffies to avoid the multiplication. 
+ + This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with + the following Coccinelle rules: + + @depends on patch@ + expression E; + @@ + + -msecs_to_jiffies(E * 1000) + +secs_to_jiffies(E) + + -msecs_to_jiffies(E * MSEC_PER_SEC) + +secs_to_jiffies(E) + + Link: https://patch.msgid.link/r/20250219-rdma-secs-to-jiffies-v1-2-b506746561a9@linux.microsoft.com + Signed-off-by: Easwar Hariharan + Signed-off-by: Jason Gunthorpe + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c +index 247f7248a0c0..5a7b234bdfd9 100644 +--- a/drivers/infiniband/hw/mlx5/mr.c ++++ b/drivers/infiniband/hw/mlx5/mr.c +@@ -525,7 +525,7 @@ static void queue_adjust_cache_locked(struct mlx5_cache_ent *ent) + ent->fill_to_high_water = false; + if (ent->pending) + queue_delayed_work(ent->dev->cache.wq, &ent->dwork, +- msecs_to_jiffies(1000)); ++ secs_to_jiffies(1)); + else + mod_delayed_work(ent->dev->cache.wq, &ent->dwork, 0); + } +@@ -576,7 +576,7 @@ static void __cache_work_func(struct mlx5_cache_ent *ent) + "add keys command failed, err %d\n", + err); + queue_delayed_work(cache->wq, &ent->dwork, +- msecs_to_jiffies(1000)); ++ secs_to_jiffies(1)); + } + } + } else if (ent->mkeys_queue.ci > 2 * ent->limit) { +@@ -2080,7 +2080,7 @@ static int mlx5r_handle_mkey_cleanup(struct mlx5_ib_mr *mr) + ent->in_use--; + if (ent->is_tmp && !ent->tmp_cleanup_scheduled) { + mod_delayed_work(ent->dev->cache.wq, &ent->dwork, +- msecs_to_jiffies(30 * 1000)); ++ secs_to_jiffies(30)); + ent->tmp_cleanup_scheduled = true; + } + spin_unlock_irq(&ent->mkeys_queue.lock); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1397-rdma-mlx5-remove-the-redundant-mlx5-ib-stage-uar-stage.patch b/SOURCES/1397-rdma-mlx5-remove-the-redundant-mlx5-ib-stage-uar-stage.patch new file mode 100644 index 000000000..bef25b745 --- /dev/null +++ b/SOURCES/1397-rdma-mlx5-remove-the-redundant-mlx5-ib-stage-uar-stage.patch @@ -0,0 +1,103 @@ +From 
a146abefb34ccd688e9923429cd51b9b7b1a3442 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:43:26 -0400 +Subject: [PATCH] RDMA/mlx5: Remove the redundant MLX5_IB_STAGE_UAR stage + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 972db388d40ded1a5ef8ce09d92ef1e2b9e40f07 +Author: Yishai Hadas +Date: Tue May 13 14:02:40 2025 +0300 + + RDMA/mlx5: Remove the redundant MLX5_IB_STAGE_UAR stage + + The MLX5_IB_STAGE_UAR stage in the RDMA driver is redundant and should + be removed. + + Responsibility for initializing the device's UAR pointer + (mdev->priv.uar) lies with mlx5_core, which already sets it during the + mlx5_load() process. + + At present, the RDMA UAR stage overwrites this pointer, which was + correctly initialized by mlx5_core, creating the risk of inconsistency. + + Ownership and management of the UAR pointer should remain exclusively + within mlx5_core. + + In the current upstream code, we luckily receive the same pointer, since + mlx5_get_uars_page() still finds available BF registers for that UAR, + allowing it to be shared. + + However, future changes in mlx5_core may expose this flaw. + For instance, if mlx5_alloc_bfreg() is invoked twice before the RDMA UAR + stage runs, the RDMA driver may overwrite the UAR allocated by + mlx5_core. + + This could lead to real bugs. For example, if mlx5_ib is unloaded + (rmmod), it might free the UAR, leaving mlx5_core with a dangling + reference to an invalid UAR. 
+ + Signed-off-by: Yishai Hadas + Reviewed-by: Fan Li + Link: https://patch.msgid.link/feaa84ec6f20468b4935c439923e9266122a93d0.1747134130.git.leon@kernel.org + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c +index 64f1e0fafd46..6a75c5a2f6c8 100644 +--- a/drivers/infiniband/hw/mlx5/main.c ++++ b/drivers/infiniband/hw/mlx5/main.c +@@ -4422,17 +4422,6 @@ static void mlx5_ib_stage_cong_debugfs_cleanup(struct mlx5_ib_dev *dev) + mlx5_core_native_port_num(dev->mdev) - 1); + } + +-static int mlx5_ib_stage_uar_init(struct mlx5_ib_dev *dev) +-{ +- dev->mdev->priv.uar = mlx5_get_uars_page(dev->mdev); +- return PTR_ERR_OR_ZERO(dev->mdev->priv.uar); +-} +- +-static void mlx5_ib_stage_uar_cleanup(struct mlx5_ib_dev *dev) +-{ +- mlx5_put_uars_page(dev->mdev, dev->mdev->priv.uar); +-} +- + static int mlx5_ib_stage_bfrag_init(struct mlx5_ib_dev *dev) + { + int err; +@@ -4661,9 +4650,6 @@ static const struct mlx5_ib_profile pf_profile = { + STAGE_CREATE(MLX5_IB_STAGE_CONG_DEBUGFS, + mlx5_ib_stage_cong_debugfs_init, + mlx5_ib_stage_cong_debugfs_cleanup), +- STAGE_CREATE(MLX5_IB_STAGE_UAR, +- mlx5_ib_stage_uar_init, +- mlx5_ib_stage_uar_cleanup), + STAGE_CREATE(MLX5_IB_STAGE_BFREG, + mlx5_ib_stage_bfrag_init, + mlx5_ib_stage_bfrag_cleanup), +@@ -4721,9 +4707,6 @@ const struct mlx5_ib_profile raw_eth_profile = { + STAGE_CREATE(MLX5_IB_STAGE_CONG_DEBUGFS, + mlx5_ib_stage_cong_debugfs_init, + mlx5_ib_stage_cong_debugfs_cleanup), +- STAGE_CREATE(MLX5_IB_STAGE_UAR, +- mlx5_ib_stage_uar_init, +- mlx5_ib_stage_uar_cleanup), + STAGE_CREATE(MLX5_IB_STAGE_BFREG, + mlx5_ib_stage_bfrag_init, + mlx5_ib_stage_bfrag_cleanup), +diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h +index c84ef94bb9fc..54ca6e010bd4 100644 +--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h ++++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h +@@ -1002,7 +1002,6 @@ enum mlx5_ib_stages { + 
MLX5_IB_STAGE_ODP, + MLX5_IB_STAGE_COUNTERS, + MLX5_IB_STAGE_CONG_DEBUGFS, +- MLX5_IB_STAGE_UAR, + MLX5_IB_STAGE_BFREG, + MLX5_IB_STAGE_PRE_IB_REG_UMR, + MLX5_IB_STAGE_WHITELIST_UID, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1398-rdma-mlx5-add-support-for-200gbps-per-lane-speeds.patch b/SOURCES/1398-rdma-mlx5-add-support-for-200gbps-per-lane-speeds.patch new file mode 100644 index 000000000..1c05337e3 --- /dev/null +++ b/SOURCES/1398-rdma-mlx5-add-support-for-200gbps-per-lane-speeds.patch @@ -0,0 +1,60 @@ +From 1e3839f66a3dc069dc1f0b26277180792852653c Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:43:26 -0400 +Subject: [PATCH] RDMA/mlx5: Add support for 200Gbps per lane speeds + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit d00d16bcbc2553a3ac9acccf2d6444cda5502adf +Author: Patrisious Haddad +Date: Tue May 13 14:03:41 2025 +0300 + + RDMA/mlx5: Add support for 200Gbps per lane speeds + + Add support for 200Gbps per lane speeds speed when querying PTYS and + report it back correctly when needed. 
+ + Signed-off-by: Patrisious Haddad + Reviewed-by: Maor Gottlieb + Link: https://patch.msgid.link/b842d2f523e9b82e221378c444ebd5860d612959.1747134197.git.leon@kernel.org + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c +index 6a75c5a2f6c8..d5f44c83b667 100644 +--- a/drivers/infiniband/hw/mlx5/main.c ++++ b/drivers/infiniband/hw/mlx5/main.c +@@ -485,6 +485,10 @@ static int translate_eth_ext_proto_oper(u32 eth_proto_oper, u16 *active_speed, + *active_width = IB_WIDTH_2X; + *active_speed = IB_SPEED_NDR; + break; ++ case MLX5E_PROT_MASK(MLX5E_200GAUI_1_200GBASE_CR1_KR1): ++ *active_width = IB_WIDTH_1X; ++ *active_speed = IB_SPEED_XDR; ++ break; + case MLX5E_PROT_MASK(MLX5E_400GAUI_8_400GBASE_CR8): + *active_width = IB_WIDTH_8X; + *active_speed = IB_SPEED_HDR; +@@ -493,10 +497,18 @@ static int translate_eth_ext_proto_oper(u32 eth_proto_oper, u16 *active_speed, + *active_width = IB_WIDTH_4X; + *active_speed = IB_SPEED_NDR; + break; ++ case MLX5E_PROT_MASK(MLX5E_400GAUI_2_400GBASE_CR2_KR2): ++ *active_width = IB_WIDTH_2X; ++ *active_speed = IB_SPEED_XDR; ++ break; + case MLX5E_PROT_MASK(MLX5E_800GAUI_8_800GBASE_CR8_KR8): + *active_width = IB_WIDTH_8X; + *active_speed = IB_SPEED_NDR; + break; ++ case MLX5E_PROT_MASK(MLX5E_800GAUI_4_800GBASE_CR4_KR4): ++ *active_width = IB_WIDTH_4X; ++ *active_speed = IB_SPEED_XDR; ++ break; + default: + return -EINVAL; + } +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1399-rdma-mlx5-avoid-flexible-array-warning.patch b/SOURCES/1399-rdma-mlx5-avoid-flexible-array-warning.patch new file mode 100644 index 000000000..16aece30b --- /dev/null +++ b/SOURCES/1399-rdma-mlx5-avoid-flexible-array-warning.patch @@ -0,0 +1,109 @@ +From 08a08451012e72442d1813506229d85d3f785535 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:43:26 -0400 +Subject: [PATCH] RDMA/mlx5: Avoid flexible array warning + +JIRA: 
https://redhat.atlassian.net/browse/RHEL-169055 + +commit e91fb8b9d0edec86a1ef26490bc80af96210863d +Author: Leon Romanovsky +Date: Wed May 21 14:34:58 2025 +0300 + + RDMA/mlx5: Avoid flexible array warning + + The following warning is reported by sparse tool: + drivers/infiniband/hw/mlx5/fs.c:1664:26: warning: array of flexible + structures + + Avoid it by simply splitting array into two separate structs. + + Link: https://patch.msgid.link/7b891b96a9fc053d01284c184d25ae98d35db2d4.1747827041.git.leon@kernel.org + Reviewed-by: Zhu Yanjun + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/fs.c b/drivers/infiniband/hw/mlx5/fs.c +index 0ff9f18a71e8..680627f1de33 100644 +--- a/drivers/infiniband/hw/mlx5/fs.c ++++ b/drivers/infiniband/hw/mlx5/fs.c +@@ -1645,11 +1645,6 @@ static struct mlx5_ib_flow_handler *create_flow_rule(struct mlx5_ib_dev *dev, + return _create_flow_rule(dev, ft_prio, flow_attr, dst, 0, NULL); + } + +-enum { +- LEFTOVERS_MC, +- LEFTOVERS_UC, +-}; +- + static struct mlx5_ib_flow_handler *create_leftovers_rule(struct mlx5_ib_dev *dev, + struct mlx5_ib_flow_prio *ft_prio, + struct ib_flow_attr *flow_attr, +@@ -1659,43 +1654,32 @@ static struct mlx5_ib_flow_handler *create_leftovers_rule(struct mlx5_ib_dev *de + struct mlx5_ib_flow_handler *handler = NULL; + + static struct { +- struct ib_flow_attr flow_attr; + struct ib_flow_spec_eth eth_flow; +- } leftovers_specs[] = { +- [LEFTOVERS_MC] = { +- .flow_attr = { +- .num_of_specs = 1, +- .size = sizeof(leftovers_specs[0]) +- }, +- .eth_flow = { +- .type = IB_FLOW_SPEC_ETH, +- .size = sizeof(struct ib_flow_spec_eth), +- .mask = {.dst_mac = {0x1} }, +- .val = {.dst_mac = {0x1} } +- } +- }, +- [LEFTOVERS_UC] = { +- .flow_attr = { +- .num_of_specs = 1, +- .size = sizeof(leftovers_specs[0]) +- }, +- .eth_flow = { +- .type = IB_FLOW_SPEC_ETH, +- .size = sizeof(struct ib_flow_spec_eth), +- .mask = {.dst_mac = {0x1} }, +- .val = {.dst_mac = {} } +- } +- } +- }; ++ 
struct ib_flow_attr flow_attr; ++ } leftovers_wc = { .flow_attr = { .num_of_specs = 1, ++ .size = sizeof(leftovers_wc) }, ++ .eth_flow = { ++ .type = IB_FLOW_SPEC_ETH, ++ .size = sizeof(struct ib_flow_spec_eth), ++ .mask = { .dst_mac = { 0x1 } }, ++ .val = { .dst_mac = { 0x1 } } } }; + +- handler = create_flow_rule(dev, ft_prio, +- &leftovers_specs[LEFTOVERS_MC].flow_attr, +- dst); ++ static struct { ++ struct ib_flow_spec_eth eth_flow; ++ struct ib_flow_attr flow_attr; ++ } leftovers_uc = { .flow_attr = { .num_of_specs = 1, ++ .size = sizeof(leftovers_uc) }, ++ .eth_flow = { ++ .type = IB_FLOW_SPEC_ETH, ++ .size = sizeof(struct ib_flow_spec_eth), ++ .mask = { .dst_mac = { 0x1 } }, ++ .val = { .dst_mac = {} } } }; ++ ++ handler = create_flow_rule(dev, ft_prio, &leftovers_wc.flow_attr, dst); + if (!IS_ERR(handler) && + flow_attr->type == IB_FLOW_ATTR_ALL_DEFAULT) { + handler_ucast = create_flow_rule(dev, ft_prio, +- &leftovers_specs[LEFTOVERS_UC].flow_attr, +- dst); ++ &leftovers_uc.flow_attr, dst); + if (IS_ERR(handler_ucast)) { + mlx5_del_flow_rules(handler->rule); + ft_prio->refcount--; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1400-rdma-mlx5-initialize-obj-event-obj-sub-list-before-xa-insert.patch b/SOURCES/1400-rdma-mlx5-initialize-obj-event-obj-sub-list-before-xa-insert.patch new file mode 100644 index 000000000..c04b0f934 --- /dev/null +++ b/SOURCES/1400-rdma-mlx5-initialize-obj-event-obj-sub-list-before-xa-insert.patch @@ -0,0 +1,103 @@ +From 06a22c7cc0922e8dbc79ec9dad2cec1493943213 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:43:26 -0400 +Subject: [PATCH] RDMA/mlx5: Initialize obj_event->obj_sub_list before + xa_insert + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 8edab8a72d67742f87e9dc2e2b0cdfddda5dc29a +Author: Mark Zhang +Date: Tue Jun 17 11:13:55 2025 +0300 + + RDMA/mlx5: Initialize obj_event->obj_sub_list before xa_insert + + The obj_event may be loaded immediately after inserted, then if the + 
list_head is not initialized then we may get a poisonous pointer. This + fixes the crash below: + + mlx5_core 0000:03:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced) + mlx5_core.sf mlx5_core.sf.4: firmware version: 32.38.3056 + mlx5_core 0000:03:00.0 en3f0pf0sf2002: renamed from eth0 + mlx5_core.sf mlx5_core.sf.4: Rate limit: 127 rates are supported, range: 0Mbps to 195312Mbps + IPv6: ADDRCONF(NETDEV_CHANGE): en3f0pf0sf2002: link becomes ready + Unable to handle kernel NULL pointer dereference at virtual address 0000000000000060 + Mem abort info: + ESR = 0x96000006 + EC = 0x25: DABT (current EL), IL = 32 bits + SET = 0, FnV = 0 + EA = 0, S1PTW = 0 + Data abort info: + ISV = 0, ISS = 0x00000006 + CM = 0, WnR = 0 + user pgtable: 4k pages, 48-bit VAs, pgdp=00000007760fb000 + [0000000000000060] pgd=000000076f6d7003, p4d=000000076f6d7003, pud=0000000777841003, pmd=0000000000000000 + Internal error: Oops: 96000006 [#1] SMP + Modules linked in: ipmb_host(OE) act_mirred(E) cls_flower(E) sch_ingress(E) mptcp_diag(E) udp_diag(E) raw_diag(E) unix_diag(E) tcp_diag(E) inet_diag(E) binfmt_misc(E) bonding(OE) rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) isofs(E) cdrom(E) mst_pciconf(OE) ib_umad(OE) mlx5_ib(OE) ipmb_dev_int(OE) mlx5_core(OE) kpatch_15237886(OEK) mlxdevm(OE) auxiliary(OE) ib_uverbs(OE) ib_core(OE) psample(E) mlxfw(OE) tls(E) sunrpc(E) vfat(E) fat(E) crct10dif_ce(E) ghash_ce(E) sha1_ce(E) sbsa_gwdt(E) virtio_console(E) ext4(E) mbcache(E) jbd2(E) xfs(E) libcrc32c(E) mmc_block(E) virtio_net(E) net_failover(E) failover(E) sha2_ce(E) sha256_arm64(E) nvme(OE) nvme_core(OE) gpio_mlxbf3(OE) mlx_compat(OE) mlxbf_pmc(OE) i2c_mlxbf(OE) sdhci_of_dwcmshc(OE) pinctrl_mlxbf3(OE) mlxbf_pka(OE) gpio_generic(E) i2c_core(E) mmc_core(E) mlxbf_gige(OE) vitesse(E) pwr_mlxbf(OE) mlxbf_tmfifo(OE) micrel(E) mlxbf_bootctl(OE) virtio_ring(E) virtio(E) ipmi_devintf(E) ipmi_msghandler(E) + [last unloaded: mst_pci] + CPU: 11 PID: 20913 Comm: rte-worker-11 Kdump: 
loaded Tainted: G OE K 5.10.134-13.1.an8.aarch64 #1 + Hardware name: https://www.mellanox.com BlueField-3 SmartNIC Main Card/BlueField-3 SmartNIC Main Card, BIOS 4.2.2.12968 Oct 26 2023 + pstate: a0400089 (NzCv daIf +PAN -UAO -TCO BTYPE=--) + pc : dispatch_event_fd+0x68/0x300 [mlx5_ib] + lr : devx_event_notifier+0xcc/0x228 [mlx5_ib] + sp : ffff80001005bcf0 + x29: ffff80001005bcf0 x28: 0000000000000001 + x27: ffff244e0740a1d8 x26: ffff244e0740a1d0 + x25: ffffda56beff5ae0 x24: ffffda56bf911618 + x23: ffff244e0596a480 x22: ffff244e0596a480 + x21: ffff244d8312ad90 x20: ffff244e0596a480 + x19: fffffffffffffff0 x18: 0000000000000000 + x17: 0000000000000000 x16: ffffda56be66d620 + x15: 0000000000000000 x14: 0000000000000000 + x13: 0000000000000000 x12: 0000000000000000 + x11: 0000000000000040 x10: ffffda56bfcafb50 + x9 : ffffda5655c25f2c x8 : 0000000000000010 + x7 : 0000000000000000 x6 : ffff24545a2e24b8 + x5 : 0000000000000003 x4 : ffff80001005bd28 + x3 : 0000000000000000 x2 : 0000000000000000 + x1 : ffff244e0596a480 x0 : ffff244d8312ad90 + Call trace: + dispatch_event_fd+0x68/0x300 [mlx5_ib] + devx_event_notifier+0xcc/0x228 [mlx5_ib] + atomic_notifier_call_chain+0x58/0x80 + mlx5_eq_async_int+0x148/0x2b0 [mlx5_core] + atomic_notifier_call_chain+0x58/0x80 + irq_int_handler+0x20/0x30 [mlx5_core] + __handle_irq_event_percpu+0x60/0x220 + handle_irq_event_percpu+0x3c/0x90 + handle_irq_event+0x58/0x158 + handle_fasteoi_irq+0xfc/0x188 + generic_handle_irq+0x34/0x48 + ... 
+ + Fixes: 759738537142 ("IB/mlx5: Enable subscription for device events over DEVX") + Link: https://patch.msgid.link/r/3ce7f20e0d1a03dc7de6e57494ec4b8eaf1f05c2.1750147949.git.leon@kernel.org + Signed-off-by: Mark Zhang + Signed-off-by: Leon Romanovsky + Signed-off-by: Jason Gunthorpe + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c +index 6485ce3208ce..fae11535b1a0 100644 +--- a/drivers/infiniband/hw/mlx5/devx.c ++++ b/drivers/infiniband/hw/mlx5/devx.c +@@ -1958,6 +1958,7 @@ subscribe_event_xa_alloc(struct mlx5_devx_event_table *devx_event_table, + /* Level1 is valid for future use, no need to free */ + return -ENOMEM; + ++ INIT_LIST_HEAD(&obj_event->obj_sub_list); + err = xa_insert(&event->object_ids, + key_level2, + obj_event, +@@ -1966,7 +1967,6 @@ subscribe_event_xa_alloc(struct mlx5_devx_event_table *devx_event_table, + kfree(obj_event); + return err; + } +- INIT_LIST_HEAD(&obj_event->obj_sub_list); + } + + return 0; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1401-rdma-mlx5-fix-hw-counters-query-for-non-representor-devices.patch b/SOURCES/1401-rdma-mlx5-fix-hw-counters-query-for-non-representor-devices.patch new file mode 100644 index 000000000..707cc630a --- /dev/null +++ b/SOURCES/1401-rdma-mlx5-fix-hw-counters-query-for-non-representor-devices.patch @@ -0,0 +1,48 @@ +From 1607e340f1d5df6e422bdd53e34444bf73986864 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:43:26 -0400 +Subject: [PATCH] RDMA/mlx5: Fix HW counters query for non-representor devices + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 3cc1dbfddf88dc5ecce0a75185061403b1f7352d +Author: Patrisious Haddad +Date: Mon Jun 16 12:14:52 2025 +0300 + + RDMA/mlx5: Fix HW counters query for non-representor devices + + To get the device HW counters, a non-representor switchdev device + should use the mlx5_ib_query_q_counters() function and query all of + the available counters. 
While a representor device in switchdev mode + should use the mlx5_ib_query_q_counters_vport() function and query only + the Q_Counters without the PPCNT counters and congestion control counters, + since they aren't relevant for a representor device. + + Currently a non-representor switchdev device skips querying the PPCNT + counters and congestion control counters, leaving them unupdated. + Fix that by properly querying those counters for non-representor devices. + + Fixes: d22467a71ebe ("RDMA/mlx5: Expand switchdev Q-counters to expose representor statistics") + Signed-off-by: Patrisious Haddad + Reviewed-by: Maher Sanalla + Link: https://patch.msgid.link/56bf8af4ca8c58e3fb9f7e47b1dca2009eeeed81.1750064969.git.leon@kernel.org + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/counters.c b/drivers/infiniband/hw/mlx5/counters.c +index b847084dcd99..943e9eb2ad20 100644 +--- a/drivers/infiniband/hw/mlx5/counters.c ++++ b/drivers/infiniband/hw/mlx5/counters.c +@@ -398,7 +398,7 @@ static int do_get_hw_stats(struct ib_device *ibdev, + return ret; + + /* We don't expose device counters over Vports */ +- if (is_mdev_switchdev_mode(dev->mdev) && port_num != 0) ++ if (is_mdev_switchdev_mode(dev->mdev) && dev->is_rep && port_num != 0) + goto done; + + if (MLX5_CAP_PCAM_FEATURE(dev->mdev, rx_icrc_encapsulated_counter)) { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1402-rdma-mlx5-fix-cc-counters-query-for-mpv.patch b/SOURCES/1402-rdma-mlx5-fix-cc-counters-query-for-mpv.patch new file mode 100644 index 000000000..0463791c4 --- /dev/null +++ b/SOURCES/1402-rdma-mlx5-fix-cc-counters-query-for-mpv.patch @@ -0,0 +1,40 @@ +From daaee9898eeb1ee187247fd1ed5452440eba9edc Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:43:26 -0400 +Subject: [PATCH] RDMA/mlx5: Fix CC counters query for MPV + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit acd245b1e33fc4b9d0f2e3372021d632f7ee0652 +Author: 
Patrisious Haddad +Date: Mon Jun 16 12:14:53 2025 +0300 + + RDMA/mlx5: Fix CC counters query for MPV + + In case, CC counters are querying for the second port use the correct + core device for the query instead of always using the master core device. + + Fixes: aac4492ef23a ("IB/mlx5: Update counter implementation for dual port RoCE") + Signed-off-by: Patrisious Haddad + Reviewed-by: Michael Guralnik + Link: https://patch.msgid.link/9cace74dcf106116118bebfa9146d40d4166c6b0.1750064969.git.leon@kernel.org + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/counters.c b/drivers/infiniband/hw/mlx5/counters.c +index 943e9eb2ad20..a506fafd2b15 100644 +--- a/drivers/infiniband/hw/mlx5/counters.c ++++ b/drivers/infiniband/hw/mlx5/counters.c +@@ -418,7 +418,7 @@ static int do_get_hw_stats(struct ib_device *ibdev, + */ + goto done; + } +- ret = mlx5_lag_query_cong_counters(dev->mdev, ++ ret = mlx5_lag_query_cong_counters(mdev, + stats->value + + cnts->num_q_counters, + cnts->num_cong_counters, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1403-rdma-mlx5-fix-vport-loopback-for-mpv-device.patch b/SOURCES/1403-rdma-mlx5-fix-vport-loopback-for-mpv-device.patch new file mode 100644 index 000000000..3e8314cbb --- /dev/null +++ b/SOURCES/1403-rdma-mlx5-fix-vport-loopback-for-mpv-device.patch @@ -0,0 +1,89 @@ +From b58e4be62a28810484d1d1203db45cc311c6d48f Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 17:43:26 -0400 +Subject: [PATCH] RDMA/mlx5: Fix vport loopback for MPV device + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit a9a9e68954f29b1e197663f76289db4879fd51bb +Author: Patrisious Haddad +Date: Mon Jun 16 12:14:54 2025 +0300 + + RDMA/mlx5: Fix vport loopback for MPV device + + Always enable vport loopback for both MPV devices on driver start. + + Previously in some cases related to MPV RoCE, packets weren't correctly + executing loopback check at vport in FW, since it was disabled. 
+ Due to complexity of identifying such cases for MPV always enable vport + loopback for both GVMIs when binding the slave to the master port. + + Fixes: 0042f9e458a5 ("RDMA/mlx5: Enable vport loopback when user context or QP mandate") + Signed-off-by: Patrisious Haddad + Reviewed-by: Mark Bloch + Link: https://patch.msgid.link/d4298f5ebb2197459e9e7221c51ecd6a34699847.1750064969.git.leon@kernel.org + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c +index d5f44c83b667..f463b8e7cfca 100644 +--- a/drivers/infiniband/hw/mlx5/main.c ++++ b/drivers/infiniband/hw/mlx5/main.c +@@ -1791,6 +1791,33 @@ static void deallocate_uars(struct mlx5_ib_dev *dev, + context->devx_uid); + } + ++static int mlx5_ib_enable_lb_mp(struct mlx5_core_dev *master, ++ struct mlx5_core_dev *slave) ++{ ++ int err; ++ ++ err = mlx5_nic_vport_update_local_lb(master, true); ++ if (err) ++ return err; ++ ++ err = mlx5_nic_vport_update_local_lb(slave, true); ++ if (err) ++ goto out; ++ ++ return 0; ++ ++out: ++ mlx5_nic_vport_update_local_lb(master, false); ++ return err; ++} ++ ++static void mlx5_ib_disable_lb_mp(struct mlx5_core_dev *master, ++ struct mlx5_core_dev *slave) ++{ ++ mlx5_nic_vport_update_local_lb(slave, false); ++ mlx5_nic_vport_update_local_lb(master, false); ++} ++ + int mlx5_ib_enable_lb(struct mlx5_ib_dev *dev, bool td, bool qp) + { + int err = 0; +@@ -3495,6 +3522,8 @@ static void mlx5_ib_unbind_slave_port(struct mlx5_ib_dev *ibdev, + + lockdep_assert_held(&mlx5_ib_multiport_mutex); + ++ mlx5_ib_disable_lb_mp(ibdev->mdev, mpi->mdev); ++ + mlx5_core_mp_event_replay(ibdev->mdev, + MLX5_DRIVER_EVENT_AFFILIATION_REMOVED, + NULL); +@@ -3590,6 +3619,10 @@ static bool mlx5_ib_bind_slave_port(struct mlx5_ib_dev *ibdev, + MLX5_DRIVER_EVENT_AFFILIATION_DONE, + &key); + ++ err = mlx5_ib_enable_lb_mp(ibdev->mdev, mpi->mdev); ++ if (err) ++ goto unbind; ++ + return true; + + unbind: +-- +2.50.1 
(Apple Git-155) + diff --git a/SOURCES/1404-net-mlx5-expose-serial-numbers-in-devlink-info.patch b/SOURCES/1404-net-mlx5-expose-serial-numbers-in-devlink-info.patch new file mode 100644 index 000000000..6f0025843 --- /dev/null +++ b/SOURCES/1404-net-mlx5-expose-serial-numbers-in-devlink-info.patch @@ -0,0 +1,132 @@ +From c49aebd76637b284358e3705c4642286f996b607 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:17 -0400 +Subject: [PATCH] net/mlx5: Expose serial numbers in devlink info + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 18667214b955ef89f208d451820c39a5dfd77f27 +Author: Jiri Pirko +Date: Tue Jun 10 04:51:28 2025 +0200 + + net/mlx5: Expose serial numbers in devlink info + + Devlink info allows to expose serial number and board serial number + Get the values from PCI VPD and expose it. + + $ devlink dev info + pci/0000:08:00.0: + driver mlx5_core + serial_number e4397f872caeed218000846daa7d2f49 + board.serial_number MT2314XZ00YA + versions: + fixed: + fw.psid MT_0000000894 + running: + fw.version 28.41.1000 + fw 28.41.1000 + stored: + fw.version 28.41.1000 + fw 28.41.1000 + auxiliary/mlx5_core.eth.0: + driver mlx5_core.eth + pci/0000:08:00.1: + driver mlx5_core + serial_number e4397f872caeed218000846daa7d2f49 + board.serial_number MT2314XZ00YA + versions: + fixed: + fw.psid MT_0000000894 + running: + fw.version 28.41.1000 + fw 28.41.1000 + stored: + fw.version 28.41.1000 + fw 28.41.1000 + auxiliary/mlx5_core.eth.1: + driver mlx5_core.eth + + Signed-off-by: Jiri Pirko + Reviewed-by: Parav Pandit + Reviewed-by: Simon Horman + Reviewed-by: Kalesh AP + Acked-by: Tariq Toukan + Link: https://patch.msgid.link/20250610025128.109232-1-jiri@resnulli.us + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +index 3b27da79ba94..4b536b384fc0 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c 
++++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +@@ -35,6 +35,55 @@ static u16 mlx5_fw_ver_subminor(u32 version) + return version & 0xffff; + } + ++static int mlx5_devlink_serial_numbers_put(struct mlx5_core_dev *dev, ++ struct devlink_info_req *req, ++ struct netlink_ext_ack *extack) ++{ ++ struct pci_dev *pdev = dev->pdev; ++ unsigned int vpd_size, kw_len; ++ char *str, *end; ++ u8 *vpd_data; ++ int err = 0; ++ int start; ++ ++ vpd_data = pci_vpd_alloc(pdev, &vpd_size); ++ if (IS_ERR(vpd_data)) ++ return 0; ++ ++ start = pci_vpd_find_ro_info_keyword(vpd_data, vpd_size, ++ PCI_VPD_RO_KEYWORD_SERIALNO, &kw_len); ++ if (start >= 0) { ++ str = kstrndup(vpd_data + start, kw_len, GFP_KERNEL); ++ if (!str) { ++ err = -ENOMEM; ++ goto end; ++ } ++ end = strchrnul(str, ' '); ++ *end = '\0'; ++ err = devlink_info_board_serial_number_put(req, str); ++ kfree(str); ++ if (err) ++ goto end; ++ } ++ ++ start = pci_vpd_find_ro_info_keyword(vpd_data, vpd_size, "V3", &kw_len); ++ if (start >= 0) { ++ str = kstrndup(vpd_data + start, kw_len, GFP_KERNEL); ++ if (!str) { ++ err = -ENOMEM; ++ goto end; ++ } ++ err = devlink_info_serial_number_put(req, str); ++ kfree(str); ++ if (err) ++ goto end; ++ } ++ ++end: ++ kfree(vpd_data); ++ return err; ++} ++ + #define DEVLINK_FW_STRING_LEN 32 + + static int +@@ -49,6 +98,10 @@ mlx5_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req, + if (!mlx5_core_is_pf(dev)) + return 0; + ++ err = mlx5_devlink_serial_numbers_put(dev, req, extack); ++ if (err) ++ return err; ++ + err = devlink_info_version_fixed_put(req, "fw.psid", dev->board_id); + if (err) + return err; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1405-net-mlx5e-shampo-reorganize-mlx5-rq-shampo-alloc.patch b/SOURCES/1405-net-mlx5e-shampo-reorganize-mlx5-rq-shampo-alloc.patch new file mode 100644 index 000000000..18a17bdbf --- /dev/null +++ b/SOURCES/1405-net-mlx5e-shampo-reorganize-mlx5-rq-shampo-alloc.patch @@ -0,0 +1,242 @@ +From 
9440e62748742ac2e252b1559c6575de37da4e90 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:17 -0400 +Subject: [PATCH] net/mlx5e: SHAMPO: Reorganize mlx5_rq_shampo_alloc + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit af4312c4c9c11da84b13b3aa8f472ab287cf1f0b +Author: Saeed Mahameed +Date: Mon Jun 16 17:14:33 2025 +0300 + + net/mlx5e: SHAMPO: Reorganize mlx5_rq_shampo_alloc + + Drop redundant SHAMPO structure alloc/free functions. + + Gather together function calls pertaining to header split info, pass + header per WQE (hd_per_wqe) as parameter to those function to avoid use + before initialization future mistakes. + + Allocate HW GRO related info outside of the header related info scope. + + Signed-off-by: Saeed Mahameed + Reviewed-by: Dragos Tatulea + Signed-off-by: Cosmin Ratiu + Reviewed-by: Tariq Toukan + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250616141441.1243044-5-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h +index 5b0d03b3efe8..211ea429ea89 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h +@@ -638,7 +638,6 @@ struct mlx5e_shampo_hd { + struct mlx5e_frag_page *pages; + u32 hd_per_wq; + u16 hd_per_wqe; +- u16 pages_per_wq; + unsigned long *bitmap; + u16 pi; + u16 ci; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index 9bd166f489e7..a074f1eac3f4 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -330,47 +330,6 @@ static inline void mlx5e_build_umr_wqe(struct mlx5e_rq *rq, + ucseg->mkey_mask = cpu_to_be64(MLX5_MKEY_MASK_FREE); + } + +-static int mlx5e_rq_shampo_hd_alloc(struct mlx5e_rq *rq, int node) +-{ +- rq->mpwqe.shampo = 
kvzalloc_node(sizeof(*rq->mpwqe.shampo), +- GFP_KERNEL, node); +- if (!rq->mpwqe.shampo) +- return -ENOMEM; +- return 0; +-} +- +-static void mlx5e_rq_shampo_hd_free(struct mlx5e_rq *rq) +-{ +- kvfree(rq->mpwqe.shampo); +-} +- +-static int mlx5e_rq_shampo_hd_info_alloc(struct mlx5e_rq *rq, int node) +-{ +- struct mlx5e_shampo_hd *shampo = rq->mpwqe.shampo; +- +- shampo->bitmap = bitmap_zalloc_node(shampo->hd_per_wq, GFP_KERNEL, +- node); +- shampo->pages = kvzalloc_node(array_size(shampo->hd_per_wq, +- sizeof(*shampo->pages)), +- GFP_KERNEL, node); +- if (!shampo->bitmap || !shampo->pages) +- goto err_nomem; +- +- return 0; +- +-err_nomem: +- bitmap_free(shampo->bitmap); +- kvfree(shampo->pages); +- +- return -ENOMEM; +-} +- +-static void mlx5e_rq_shampo_hd_info_free(struct mlx5e_rq *rq) +-{ +- bitmap_free(rq->mpwqe.shampo->bitmap); +- kvfree(rq->mpwqe.shampo->pages); +-} +- + static int mlx5e_rq_alloc_mpwqe_info(struct mlx5e_rq *rq, int node) + { + int wq_sz = mlx5_wq_ll_get_size(&rq->mpwqe.wq); +@@ -583,19 +542,18 @@ static int mlx5e_create_rq_umr_mkey(struct mlx5_core_dev *mdev, struct mlx5e_rq + } + + static int mlx5e_create_rq_hd_umr_mkey(struct mlx5_core_dev *mdev, +- struct mlx5e_rq *rq) ++ u16 hd_per_wq, u32 *umr_mkey) + { + u32 max_ksm_size = BIT(MLX5_CAP_GEN(mdev, log_max_klm_list_size)); + +- if (max_ksm_size < rq->mpwqe.shampo->hd_per_wq) { ++ if (max_ksm_size < hd_per_wq) { + mlx5_core_err(mdev, "max ksm list size 0x%x is smaller than shampo header buffer list size 0x%x\n", +- max_ksm_size, rq->mpwqe.shampo->hd_per_wq); ++ max_ksm_size, hd_per_wq); + return -EINVAL; + } +- +- return mlx5e_create_umr_ksm_mkey(mdev, rq->mpwqe.shampo->hd_per_wq, ++ return mlx5e_create_umr_ksm_mkey(mdev, hd_per_wq, + MLX5E_SHAMPO_LOG_HEADER_ENTRY_SIZE, +- &rq->mpwqe.shampo->mkey); ++ umr_mkey); + } + + static void mlx5e_init_frags_partition(struct mlx5e_rq *rq) +@@ -757,6 +715,35 @@ static int mlx5e_init_rxq_rq(struct mlx5e_channel *c, struct mlx5e_params *param + 
xdp_frag_size); + } + ++static int mlx5e_rq_shampo_hd_info_alloc(struct mlx5e_rq *rq, u16 hd_per_wq, ++ int node) ++{ ++ struct mlx5e_shampo_hd *shampo = rq->mpwqe.shampo; ++ ++ shampo->hd_per_wq = hd_per_wq; ++ ++ shampo->bitmap = bitmap_zalloc_node(hd_per_wq, GFP_KERNEL, node); ++ shampo->pages = kvzalloc_node(array_size(hd_per_wq, ++ sizeof(*shampo->pages)), ++ GFP_KERNEL, node); ++ if (!shampo->bitmap || !shampo->pages) ++ goto err_nomem; ++ ++ return 0; ++ ++err_nomem: ++ kvfree(shampo->pages); ++ bitmap_free(shampo->bitmap); ++ ++ return -ENOMEM; ++} ++ ++static void mlx5e_rq_shampo_hd_info_free(struct mlx5e_rq *rq) ++{ ++ kvfree(rq->mpwqe.shampo->pages); ++ bitmap_free(rq->mpwqe.shampo->bitmap); ++} ++ + static int mlx5_rq_shampo_alloc(struct mlx5_core_dev *mdev, + struct mlx5e_params *params, + struct mlx5e_rq_param *rqp, +@@ -764,42 +751,52 @@ static int mlx5_rq_shampo_alloc(struct mlx5_core_dev *mdev, + u32 *pool_size, + int node) + { ++ void *wqc = MLX5_ADDR_OF(rqc, rqp->rqc, wq); ++ u16 hd_per_wq; ++ int wq_size; + int err; + + if (!test_bit(MLX5E_RQ_STATE_SHAMPO, &rq->state)) + return 0; +- err = mlx5e_rq_shampo_hd_alloc(rq, node); +- if (err) +- goto out; +- rq->mpwqe.shampo->hd_per_wq = +- mlx5e_shampo_hd_per_wq(mdev, params, rqp); +- err = mlx5e_create_rq_hd_umr_mkey(mdev, rq); ++ ++ rq->mpwqe.shampo = kvzalloc_node(sizeof(*rq->mpwqe.shampo), ++ GFP_KERNEL, node); ++ if (!rq->mpwqe.shampo) ++ return -ENOMEM; ++ ++ /* split headers data structures */ ++ hd_per_wq = mlx5e_shampo_hd_per_wq(mdev, params, rqp); ++ err = mlx5e_rq_shampo_hd_info_alloc(rq, hd_per_wq, node); + if (err) +- goto err_shampo_hd; +- err = mlx5e_rq_shampo_hd_info_alloc(rq, node); ++ goto err_shampo_hd_info_alloc; ++ ++ err = mlx5e_create_rq_hd_umr_mkey(mdev, hd_per_wq, ++ &rq->mpwqe.shampo->mkey); + if (err) +- goto err_shampo_info; ++ goto err_umr_mkey; ++ ++ rq->mpwqe.shampo->key = cpu_to_be32(rq->mpwqe.shampo->mkey); ++ rq->mpwqe.shampo->hd_per_wqe = ++ 
mlx5e_shampo_hd_per_wqe(mdev, params, rqp); ++ wq_size = BIT(MLX5_GET(wq, wqc, log_wq_sz)); ++ *pool_size += (rq->mpwqe.shampo->hd_per_wqe * wq_size) / ++ MLX5E_SHAMPO_WQ_HEADER_PER_PAGE; ++ ++ /* gro only data structures */ + rq->hw_gro_data = kvzalloc_node(sizeof(*rq->hw_gro_data), GFP_KERNEL, node); + if (!rq->hw_gro_data) { + err = -ENOMEM; + goto err_hw_gro_data; + } +- rq->mpwqe.shampo->key = +- cpu_to_be32(rq->mpwqe.shampo->mkey); +- rq->mpwqe.shampo->hd_per_wqe = +- mlx5e_shampo_hd_per_wqe(mdev, params, rqp); +- rq->mpwqe.shampo->pages_per_wq = +- rq->mpwqe.shampo->hd_per_wq / MLX5E_SHAMPO_WQ_HEADER_PER_PAGE; +- *pool_size += rq->mpwqe.shampo->pages_per_wq; ++ + return 0; + + err_hw_gro_data: +- mlx5e_rq_shampo_hd_info_free(rq); +-err_shampo_info: + mlx5_core_destroy_mkey(mdev, rq->mpwqe.shampo->mkey); +-err_shampo_hd: +- mlx5e_rq_shampo_hd_free(rq); +-out: ++err_umr_mkey: ++ mlx5e_rq_shampo_hd_info_free(rq); ++err_shampo_hd_info_alloc: ++ kvfree(rq->mpwqe.shampo); + return err; + } + +@@ -811,7 +808,7 @@ static void mlx5e_rq_free_shampo(struct mlx5e_rq *rq) + kvfree(rq->hw_gro_data); + mlx5e_rq_shampo_hd_info_free(rq); + mlx5_core_destroy_mkey(rq->mdev, rq->mpwqe.shampo->mkey); +- mlx5e_rq_shampo_hd_free(rq); ++ kvfree(rq->mpwqe.shampo); + } + + static int mlx5e_alloc_rq(struct mlx5e_params *params, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1406-net-mlx5e-shampo-remove-redundant-params.patch b/SOURCES/1406-net-mlx5e-shampo-remove-redundant-params.patch new file mode 100644 index 000000000..4eb8f63f9 --- /dev/null +++ b/SOURCES/1406-net-mlx5e-shampo-remove-redundant-params.patch @@ -0,0 +1,113 @@ +From 675c166094cf502d5b037ae4a692e7c5c6f6f9fa Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:18 -0400 +Subject: [PATCH] net/mlx5e: SHAMPO: Remove redundant params + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 16142defd304d5a8e591781efe24da498ccfa51f +Author: Saeed Mahameed +Date: Mon Jun 16 17:14:34 2025 
+0300 + + net/mlx5e: SHAMPO: Remove redundant params + + Two SHAMPO params are static and always the same, remove them from the + global mlx5e_params struct. + + Signed-off-by: Saeed Mahameed + Reviewed-by: Dragos Tatulea + Signed-off-by: Cosmin Ratiu + Signed-off-by: Tariq Toukan + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250616141441.1243044-6-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h +index 211ea429ea89..581eef34f512 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h +@@ -278,10 +278,6 @@ enum packet_merge { + struct mlx5e_packet_merge_param { + enum packet_merge type; + u32 timeout; +- struct { +- u8 match_criteria_type; +- u8 alignment_granularity; +- } shampo; + }; + + struct mlx5e_params { +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c +index 58ec5e44aa7a..fc945bce933a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c +@@ -901,6 +901,7 @@ int mlx5e_build_rq_param(struct mlx5_core_dev *mdev, + { + void *rqc = param->rqc; + void *wq = MLX5_ADDR_OF(rqc, rqc, wq); ++ u32 lro_timeout; + int ndsegs = 1; + int err; + +@@ -926,22 +927,25 @@ int mlx5e_build_rq_param(struct mlx5_core_dev *mdev, + MLX5_SET(wq, wq, log_wqe_stride_size, + log_wqe_stride_size - MLX5_MPWQE_LOG_STRIDE_SZ_BASE); + MLX5_SET(wq, wq, log_wq_sz, mlx5e_mpwqe_get_log_rq_size(mdev, params, xsk)); +- if (params->packet_merge.type == MLX5E_PACKET_MERGE_SHAMPO) { +- MLX5_SET(wq, wq, shampo_enable, true); +- MLX5_SET(wq, wq, log_reservation_size, +- mlx5e_shampo_get_log_rsrv_size(mdev, params)); +- MLX5_SET(wq, wq, +- log_max_num_of_packets_per_reservation, +- mlx5e_shampo_get_log_pkt_per_rsrv(mdev, params)); +- MLX5_SET(wq, wq, 
log_headers_entry_size, +- mlx5e_shampo_get_log_hd_entry_size(mdev, params)); +- MLX5_SET(rqc, rqc, reservation_timeout, +- mlx5e_choose_lro_timeout(mdev, MLX5E_DEFAULT_SHAMPO_TIMEOUT)); +- MLX5_SET(rqc, rqc, shampo_match_criteria_type, +- params->packet_merge.shampo.match_criteria_type); +- MLX5_SET(rqc, rqc, shampo_no_match_alignment_granularity, +- params->packet_merge.shampo.alignment_granularity); +- } ++ if (params->packet_merge.type != MLX5E_PACKET_MERGE_SHAMPO) ++ break; ++ ++ MLX5_SET(wq, wq, shampo_enable, true); ++ MLX5_SET(wq, wq, log_reservation_size, ++ mlx5e_shampo_get_log_rsrv_size(mdev, params)); ++ MLX5_SET(wq, wq, ++ log_max_num_of_packets_per_reservation, ++ mlx5e_shampo_get_log_pkt_per_rsrv(mdev, params)); ++ MLX5_SET(wq, wq, log_headers_entry_size, ++ mlx5e_shampo_get_log_hd_entry_size(mdev, params)); ++ lro_timeout = ++ mlx5e_choose_lro_timeout(mdev, ++ MLX5E_DEFAULT_SHAMPO_TIMEOUT); ++ MLX5_SET(rqc, rqc, reservation_timeout, lro_timeout); ++ MLX5_SET(rqc, rqc, shampo_match_criteria_type, ++ MLX5_RQC_SHAMPO_MATCH_CRITERIA_TYPE_EXTENDED); ++ MLX5_SET(rqc, rqc, shampo_no_match_alignment_granularity, ++ MLX5_RQC_SHAMPO_NO_MATCH_ALIGNMENT_GRANULARITY_STRIDE); + break; + } + default: /* MLX5_WQ_TYPE_CYCLIC */ +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index a074f1eac3f4..4809fc9e3522 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -4026,10 +4026,6 @@ static int set_feature_hw_gro(struct net_device *netdev, bool enable) + + if (enable) { + new_params.packet_merge.type = MLX5E_PACKET_MERGE_SHAMPO; +- new_params.packet_merge.shampo.match_criteria_type = +- MLX5_RQC_SHAMPO_MATCH_CRITERIA_TYPE_EXTENDED; +- new_params.packet_merge.shampo.alignment_granularity = +- MLX5_RQC_SHAMPO_NO_MATCH_ALIGNMENT_GRANULARITY_STRIDE; + } else if (new_params.packet_merge.type == MLX5E_PACKET_MERGE_SHAMPO) { + 
new_params.packet_merge.type = MLX5E_PACKET_MERGE_NONE; + } else { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1407-net-mlx5e-shampo-improve-hw-gro-capability-checking.patch b/SOURCES/1407-net-mlx5e-shampo-improve-hw-gro-capability-checking.patch new file mode 100644 index 000000000..e470644ee --- /dev/null +++ b/SOURCES/1407-net-mlx5e-shampo-improve-hw-gro-capability-checking.patch @@ -0,0 +1,68 @@ +From ce0ae8e829ab57ae2acd173f5053cf63c391a4e1 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:18 -0400 +Subject: [PATCH] net/mlx5e: SHAMPO: Improve hw gro capability checking + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit d2760abdedde635b055a214b3a45dce3e4ecbfce +Author: Saeed Mahameed +Date: Mon Jun 16 17:14:35 2025 +0300 + + net/mlx5e: SHAMPO: Improve hw gro capability checking + + Add missing HW capabilities, declare the feature in + netdev->vlan_features, similar to other features in mlx5e_build_nic_netdev. + No functional change here as all by default disabled features are + explicitly disabled at the bottom of the function. 
+ + Signed-off-by: Saeed Mahameed + Reviewed-by: Dragos Tatulea + Signed-off-by: Cosmin Ratiu + Signed-off-by: Tariq Toukan + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250616141441.1243044-7-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index 4809fc9e3522..e552dcf8f13a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -77,7 +77,8 @@ + + static bool mlx5e_hw_gro_supported(struct mlx5_core_dev *mdev) + { +- if (!MLX5_CAP_GEN(mdev, shampo)) ++ if (!MLX5_CAP_GEN(mdev, shampo) || ++ !MLX5_CAP_SHAMPO(mdev, shampo_header_split_data_merge)) + return false; + + /* Our HW-GRO implementation relies on "KSM Mkey" for +@@ -5489,17 +5490,17 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev) + MLX5E_MPWRQ_UMR_MODE_ALIGNED)) + netdev->vlan_features |= NETIF_F_LRO; + ++ if (mlx5e_hw_gro_supported(mdev) && ++ mlx5e_check_fragmented_striding_rq_cap(mdev, PAGE_SHIFT, ++ MLX5E_MPWRQ_UMR_MODE_ALIGNED)) ++ netdev->vlan_features |= NETIF_F_GRO_HW; ++ + netdev->hw_features = netdev->vlan_features; + netdev->hw_features |= NETIF_F_HW_VLAN_CTAG_TX; + netdev->hw_features |= NETIF_F_HW_VLAN_CTAG_RX; + netdev->hw_features |= NETIF_F_HW_VLAN_CTAG_FILTER; + netdev->hw_features |= NETIF_F_HW_VLAN_STAG_TX; + +- if (mlx5e_hw_gro_supported(mdev) && +- mlx5e_check_fragmented_striding_rq_cap(mdev, PAGE_SHIFT, +- MLX5E_MPWRQ_UMR_MODE_ALIGNED)) +- netdev->hw_features |= NETIF_F_GRO_HW; +- + if (mlx5e_tunnel_any_tx_proto_supported(mdev)) { + netdev->hw_enc_features |= NETIF_F_HW_CSUM; + netdev->hw_enc_features |= NETIF_F_TSO; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1408-net-mlx5e-shampo-separate-pool-for-headers.patch b/SOURCES/1408-net-mlx5e-shampo-separate-pool-for-headers.patch new file mode 100644 index 000000000..a341e05f5 --- 
/dev/null +++ b/SOURCES/1408-net-mlx5e-shampo-separate-pool-for-headers.patch @@ -0,0 +1,304 @@ +From 65f74d0b4614133bc8ed318ed25a4532182f50fc Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:18 -0400 +Subject: [PATCH] net/mlx5e: SHAMPO: Separate pool for headers + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit e225d9bd93ed0bb84014f5f8e241e8e456533e30 +Author: Saeed Mahameed +Date: Mon Jun 16 17:14:36 2025 +0300 + + net/mlx5e: SHAMPO: Separate pool for headers + + Allow allocating a separate page pool for headers when SHAMPO is on. + This will be useful for adding support to zc page pool, which has to be + different from the headers page pool. + For now, the pools are the same. + + Signed-off-by: Saeed Mahameed + Reviewed-by: Dragos Tatulea + Signed-off-by: Cosmin Ratiu + Signed-off-by: Tariq Toukan + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250616141441.1243044-8-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h +index 581eef34f512..c329de1d4f0a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h +@@ -716,7 +716,11 @@ struct mlx5e_rq { + struct bpf_prog __rcu *xdp_prog; + struct mlx5e_xdpsq *xdpsq; + DECLARE_BITMAP(flags, 8); ++ ++ /* page pools */ + struct page_pool *page_pool; ++ struct page_pool *hd_page_pool; ++ + struct mlx5e_xdp_buff mxbuf; + + /* AF_XDP zero-copy */ +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index e552dcf8f13a..59e845367cfd 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -40,6 +40,7 @@ + #include + #include + #include ++#include + #include + #include + #include +@@ -745,6 +746,11 @@ static void mlx5e_rq_shampo_hd_info_free(struct mlx5e_rq 
*rq) + bitmap_free(rq->mpwqe.shampo->bitmap); + } + ++static bool mlx5_rq_needs_separate_hd_pool(struct mlx5e_rq *rq) ++{ ++ return false; ++} ++ + static int mlx5_rq_shampo_alloc(struct mlx5_core_dev *mdev, + struct mlx5e_params *params, + struct mlx5e_rq_param *rqp, +@@ -753,6 +759,7 @@ static int mlx5_rq_shampo_alloc(struct mlx5_core_dev *mdev, + int node) + { + void *wqc = MLX5_ADDR_OF(rqc, rqp->rqc, wq); ++ u32 hd_pool_size; + u16 hd_per_wq; + int wq_size; + int err; +@@ -780,8 +787,34 @@ static int mlx5_rq_shampo_alloc(struct mlx5_core_dev *mdev, + rq->mpwqe.shampo->hd_per_wqe = + mlx5e_shampo_hd_per_wqe(mdev, params, rqp); + wq_size = BIT(MLX5_GET(wq, wqc, log_wq_sz)); +- *pool_size += (rq->mpwqe.shampo->hd_per_wqe * wq_size) / +- MLX5E_SHAMPO_WQ_HEADER_PER_PAGE; ++ hd_pool_size = (rq->mpwqe.shampo->hd_per_wqe * wq_size) / ++ MLX5E_SHAMPO_WQ_HEADER_PER_PAGE; ++ ++ if (mlx5_rq_needs_separate_hd_pool(rq)) { ++ /* Separate page pool for shampo headers */ ++ struct page_pool_params pp_params = { }; ++ ++ pp_params.order = 0; ++ pp_params.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV; ++ pp_params.pool_size = hd_pool_size; ++ pp_params.nid = node; ++ pp_params.dev = rq->pdev; ++ pp_params.napi = rq->cq.napi; ++ pp_params.netdev = rq->netdev; ++ pp_params.dma_dir = rq->buff.map_dir; ++ pp_params.max_len = PAGE_SIZE; ++ ++ rq->hd_page_pool = page_pool_create(&pp_params); ++ if (IS_ERR(rq->hd_page_pool)) { ++ err = PTR_ERR(rq->hd_page_pool); ++ rq->hd_page_pool = NULL; ++ goto err_hds_page_pool; ++ } ++ } else { ++ /* Common page pool, reserve space for headers. 
*/ ++ *pool_size += hd_pool_size; ++ rq->hd_page_pool = NULL; ++ } + + /* gro only data structures */ + rq->hw_gro_data = kvzalloc_node(sizeof(*rq->hw_gro_data), GFP_KERNEL, node); +@@ -793,6 +826,8 @@ static int mlx5_rq_shampo_alloc(struct mlx5_core_dev *mdev, + return 0; + + err_hw_gro_data: ++ page_pool_destroy(rq->hd_page_pool); ++err_hds_page_pool: + mlx5_core_destroy_mkey(mdev, rq->mpwqe.shampo->mkey); + err_umr_mkey: + mlx5e_rq_shampo_hd_info_free(rq); +@@ -807,6 +842,8 @@ static void mlx5e_rq_free_shampo(struct mlx5e_rq *rq) + return; + + kvfree(rq->hw_gro_data); ++ if (rq->hd_page_pool != rq->page_pool) ++ page_pool_destroy(rq->hd_page_pool); + mlx5e_rq_shampo_hd_info_free(rq); + mlx5_core_destroy_mkey(rq->mdev, rq->mpwqe.shampo->mkey); + kvfree(rq->mpwqe.shampo); +@@ -938,6 +975,8 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params, + rq->page_pool = NULL; + goto err_free_by_rq_type; + } ++ if (!rq->hd_page_pool) ++ rq->hd_page_pool = rq->page_pool; + if (xdp_rxq_info_is_reg(&rq->xdp_rxq)) + err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq, + MEM_TYPE_PAGE_POOL, rq->page_pool); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +index 382679838113..36a4780332d7 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +@@ -273,12 +273,12 @@ static inline u32 mlx5e_decompress_cqes_start(struct mlx5e_rq *rq, + + #define MLX5E_PAGECNT_BIAS_MAX (PAGE_SIZE / 64) + +-static int mlx5e_page_alloc_fragmented(struct mlx5e_rq *rq, ++static int mlx5e_page_alloc_fragmented(struct page_pool *pool, + struct mlx5e_frag_page *frag_page) + { + struct page *page; + +- page = page_pool_dev_alloc_pages(rq->page_pool); ++ page = page_pool_dev_alloc_pages(pool); + if (unlikely(!page)) + return -ENOMEM; + +@@ -292,14 +292,14 @@ static int mlx5e_page_alloc_fragmented(struct mlx5e_rq *rq, + return 0; + } + +-static void mlx5e_page_release_fragmented(struct 
mlx5e_rq *rq, ++static void mlx5e_page_release_fragmented(struct page_pool *pool, + struct mlx5e_frag_page *frag_page) + { + u16 drain_count = MLX5E_PAGECNT_BIAS_MAX - frag_page->frags; + struct page *page = frag_page->page; + + if (page_pool_unref_page(page, drain_count) == 0) +- page_pool_put_unrefed_page(rq->page_pool, page, -1, true); ++ page_pool_put_unrefed_page(pool, page, -1, true); + } + + static inline int mlx5e_get_rx_frag(struct mlx5e_rq *rq, +@@ -313,7 +313,8 @@ static inline int mlx5e_get_rx_frag(struct mlx5e_rq *rq, + * offset) should just use the new one without replenishing again + * by themselves. + */ +- err = mlx5e_page_alloc_fragmented(rq, frag->frag_page); ++ err = mlx5e_page_alloc_fragmented(rq->page_pool, ++ frag->frag_page); + + return err; + } +@@ -332,7 +333,7 @@ static inline void mlx5e_put_rx_frag(struct mlx5e_rq *rq, + struct mlx5e_wqe_frag_info *frag) + { + if (mlx5e_frag_can_release(frag)) +- mlx5e_page_release_fragmented(rq, frag->frag_page); ++ mlx5e_page_release_fragmented(rq->page_pool, frag->frag_page); + } + + static inline struct mlx5e_wqe_frag_info *get_frag(struct mlx5e_rq *rq, u16 ix) +@@ -586,7 +587,8 @@ mlx5e_free_rx_mpwqe(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi) + struct mlx5e_frag_page *frag_page; + + frag_page = &wi->alloc_units.frag_pages[i]; +- mlx5e_page_release_fragmented(rq, frag_page); ++ mlx5e_page_release_fragmented(rq->page_pool, ++ frag_page); + } + } + } +@@ -681,11 +683,10 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq, + struct mlx5e_frag_page *frag_page = mlx5e_shampo_hd_to_frag_page(rq, index); + u64 addr; + +- err = mlx5e_page_alloc_fragmented(rq, frag_page); ++ err = mlx5e_page_alloc_fragmented(rq->hd_page_pool, frag_page); + if (unlikely(err)) + goto err_unmap; + +- + addr = page_pool_get_dma_addr(frag_page->page); + + for (int j = 0; j < MLX5E_SHAMPO_WQ_HEADER_PER_PAGE; j++) { +@@ -717,7 +718,8 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq, + if (!header_offset) { + 
struct mlx5e_frag_page *frag_page = mlx5e_shampo_hd_to_frag_page(rq, index); + +- mlx5e_page_release_fragmented(rq, frag_page); ++ mlx5e_page_release_fragmented(rq->hd_page_pool, ++ frag_page); + } + } + +@@ -793,7 +795,7 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix) + for (i = 0; i < rq->mpwqe.pages_per_wqe; i++, frag_page++) { + dma_addr_t addr; + +- err = mlx5e_page_alloc_fragmented(rq, frag_page); ++ err = mlx5e_page_alloc_fragmented(rq->page_pool, frag_page); + if (unlikely(err)) + goto err_unmap; + addr = page_pool_get_dma_addr(frag_page->page); +@@ -838,7 +840,7 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix) + err_unmap: + while (--i >= 0) { + frag_page--; +- mlx5e_page_release_fragmented(rq, frag_page); ++ mlx5e_page_release_fragmented(rq->page_pool, frag_page); + } + + bitmap_fill(wi->skip_release_bitmap, rq->mpwqe.pages_per_wqe); +@@ -857,7 +859,7 @@ mlx5e_free_rx_shampo_hd_entry(struct mlx5e_rq *rq, u16 header_index) + if (((header_index + 1) & (MLX5E_SHAMPO_WQ_HEADER_PER_PAGE - 1)) == 0) { + struct mlx5e_frag_page *frag_page = mlx5e_shampo_hd_to_frag_page(rq, header_index); + +- mlx5e_page_release_fragmented(rq, frag_page); ++ mlx5e_page_release_fragmented(rq->hd_page_pool, frag_page); + } + clear_bit(header_index, shampo->bitmap); + } +@@ -1102,6 +1104,8 @@ INDIRECT_CALLABLE_SCOPE bool mlx5e_post_rx_mpwqes(struct mlx5e_rq *rq) + + if (rq->page_pool) + page_pool_nid_changed(rq->page_pool, numa_mem_id()); ++ if (rq->hd_page_pool) ++ page_pool_nid_changed(rq->hd_page_pool, numa_mem_id()); + + head = rq->mpwqe.actual_wq_head; + i = missing; +@@ -2010,7 +2014,8 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w + if (prog) { + /* area for bpf_xdp_[store|load]_bytes */ + net_prefetchw(page_address(frag_page->page) + frag_offset); +- if (unlikely(mlx5e_page_alloc_fragmented(rq, &wi->linear_page))) { ++ if (unlikely(mlx5e_page_alloc_fragmented(rq->page_pool, ++ &wi->linear_page))) { + 
rq->stats->buff_alloc_err++; + return NULL; + } +@@ -2074,7 +2079,8 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w + + wi->linear_page.frags++; + } +- mlx5e_page_release_fragmented(rq, &wi->linear_page); ++ mlx5e_page_release_fragmented(rq->page_pool, ++ &wi->linear_page); + return NULL; /* page/packet was consumed by XDP */ + } + +@@ -2083,13 +2089,14 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w + mxbuf->xdp.data - mxbuf->xdp.data_hard_start, 0, + mxbuf->xdp.data - mxbuf->xdp.data_meta); + if (unlikely(!skb)) { +- mlx5e_page_release_fragmented(rq, &wi->linear_page); ++ mlx5e_page_release_fragmented(rq->page_pool, ++ &wi->linear_page); + return NULL; + } + + skb_mark_for_recycle(skb); + wi->linear_page.frags++; +- mlx5e_page_release_fragmented(rq, &wi->linear_page); ++ mlx5e_page_release_fragmented(rq->page_pool, &wi->linear_page); + + if (xdp_buff_has_frags(&mxbuf->xdp)) { + struct mlx5e_frag_page *pagep; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1409-net-mlx5e-implement-queue-mgmt-ops-and-single-channel-swap.patch b/SOURCES/1409-net-mlx5e-implement-queue-mgmt-ops-and-single-channel-swap.patch new file mode 100644 index 000000000..a0bf02b42 --- /dev/null +++ b/SOURCES/1409-net-mlx5e-implement-queue-mgmt-ops-and-single-channel-swap.patch @@ -0,0 +1,150 @@ +From d17c81e1bebf992831b10750cc1ee7c6fdc04339 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:18 -0400 +Subject: [PATCH] net/mlx5e: Implement queue mgmt ops and single channel swap + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit b2588ea40ec9472688289c1a644627c0f4a1f33f +Author: Saeed Mahameed +Date: Mon Jun 16 17:14:39 2025 +0300 + + net/mlx5e: Implement queue mgmt ops and single channel swap + + The bulk of the work is done in mlx5e_queue_mem_alloc, where we allocate + and create the new channel resources, similar to + mlx5e_safe_switch_params, but here we do it for a single channel 
using + existing params, sort of a clone channel. + To swap the old channel with the new one, we deactivate and close the + old channel then replace it with the new one, since the swap procedure + doesn't fail in mlx5, we do it all in one place (mlx5e_queue_start). + + Signed-off-by: Saeed Mahameed + Reviewed-by: Dragos Tatulea + Reviewed-by: Tariq Toukan + Signed-off-by: Mark Bloch + Acked-by: Mina Almasry + Link: https://patch.msgid.link/20250616141441.1243044-11-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index 59e845367cfd..9330d90c1f03 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -5479,6 +5479,103 @@ static const struct netdev_stat_ops mlx5e_stat_ops = { + .get_base_stats = mlx5e_get_base_stats, + }; + ++struct mlx5_qmgmt_data { ++ struct mlx5e_channel *c; ++ struct mlx5e_channel_param cparam; ++}; ++ ++static int mlx5e_queue_mem_alloc(struct net_device *dev, void *newq, ++ int queue_index) ++{ ++ struct mlx5_qmgmt_data *new = (struct mlx5_qmgmt_data *)newq; ++ struct mlx5e_priv *priv = netdev_priv(dev); ++ struct mlx5e_channels *chs = &priv->channels; ++ struct mlx5e_params params = chs->params; ++ struct mlx5_core_dev *mdev; ++ int err; ++ ++ mutex_lock(&priv->state_lock); ++ if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) { ++ err = -ENODEV; ++ goto unlock; ++ } ++ ++ if (queue_index >= chs->num) { ++ err = -ERANGE; ++ goto unlock; ++ } ++ ++ if (MLX5E_GET_PFLAG(&chs->params, MLX5E_PFLAG_TX_PORT_TS) || ++ chs->params.ptp_rx || ++ chs->params.xdp_prog || ++ priv->htb) { ++ netdev_err(priv->netdev, ++ "Cloning channels with Port/rx PTP, XDP or HTB is not supported\n"); ++ err = -EOPNOTSUPP; ++ goto unlock; ++ } ++ ++ mdev = mlx5_sd_ch_ix_get_dev(priv->mdev, queue_index); ++ err = mlx5e_build_channel_param(mdev, &params, 
&new->cparam); ++ if (err) ++ goto unlock; ++ ++ err = mlx5e_open_channel(priv, queue_index, &params, NULL, &new->c); ++unlock: ++ mutex_unlock(&priv->state_lock); ++ return err; ++} ++ ++static void mlx5e_queue_mem_free(struct net_device *dev, void *mem) ++{ ++ struct mlx5_qmgmt_data *data = (struct mlx5_qmgmt_data *)mem; ++ ++ /* not supposed to happen since mlx5e_queue_start never fails ++ * but this is how this should be implemented just in case ++ */ ++ if (data->c) ++ mlx5e_close_channel(data->c); ++} ++ ++static int mlx5e_queue_stop(struct net_device *dev, void *oldq, int queue_index) ++{ ++ /* In mlx5 a txq cannot be simply stopped in isolation, only restarted. ++ * mlx5e_queue_start does not fail, we stop the old queue there. ++ * TODO: Improve this. ++ */ ++ return 0; ++} ++ ++static int mlx5e_queue_start(struct net_device *dev, void *newq, ++ int queue_index) ++{ ++ struct mlx5_qmgmt_data *new = (struct mlx5_qmgmt_data *)newq; ++ struct mlx5e_priv *priv = netdev_priv(dev); ++ struct mlx5e_channel *old; ++ ++ mutex_lock(&priv->state_lock); ++ ++ /* stop and close the old */ ++ old = priv->channels.c[queue_index]; ++ mlx5e_deactivate_priv_channels(priv); ++ /* close old before activating new, to avoid napi conflict */ ++ mlx5e_close_channel(old); ++ ++ /* start the new */ ++ priv->channels.c[queue_index] = new->c; ++ mlx5e_activate_priv_channels(priv); ++ mutex_unlock(&priv->state_lock); ++ return 0; ++} ++ ++static const struct netdev_queue_mgmt_ops mlx5e_queue_mgmt_ops = { ++ .ndo_queue_mem_size = sizeof(struct mlx5_qmgmt_data), ++ .ndo_queue_mem_alloc = mlx5e_queue_mem_alloc, ++ .ndo_queue_mem_free = mlx5e_queue_mem_free, ++ .ndo_queue_start = mlx5e_queue_start, ++ .ndo_queue_stop = mlx5e_queue_stop, ++}; ++ + static void mlx5e_build_nic_netdev(struct net_device *netdev) + { + struct mlx5e_priv *priv = netdev_priv(netdev); +@@ -5489,6 +5586,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev) + SET_NETDEV_DEV(netdev, mdev->device); + + 
netdev->netdev_ops = &mlx5e_netdev_ops; ++ netdev->queue_mgmt_ops = &mlx5e_queue_mgmt_ops; + netdev->xdp_metadata_ops = &mlx5e_xdp_metadata_ops; + netdev->xsk_tx_metadata_ops = &mlx5e_xsk_tx_metadata_ops; + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1410-net-mlx5e-support-ethtool-tcp-data-split-settings.patch b/SOURCES/1410-net-mlx5e-support-ethtool-tcp-data-split-settings.patch new file mode 100644 index 000000000..4df96dc76 --- /dev/null +++ b/SOURCES/1410-net-mlx5e-support-ethtool-tcp-data-split-settings.patch @@ -0,0 +1,141 @@ +From ccdbf67ee58fe08fc65b7fa79731868afdf21c63 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Wed, 22 Apr 2026 09:42:15 -0400 +Subject: [PATCH] net/mlx5e: Support ethtool tcp-data-split settings + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 +Conflicts: +Add "#include " to avoid build failure. + +commit 46bcce5dfd330c233e59cd5efd7eb43f049b0a82 +Author: Saeed Mahameed +Date: Mon Jun 16 17:14:40 2025 +0300 + + net/mlx5e: Support ethtool tcp-data-split settings + + In mlx5, tcp header-data split requires HW GRO to be on. + + Enabling it fails when HW GRO is off. + mlx5e_fix_features now keeps HW GRO on when tcp data split is enabled. + Finally, when tcp data split is disabled, features are updated to maybe + remove the forced HW GRO. 
+ + Signed-off-by: Saeed Mahameed + Signed-off-by: Cosmin Ratiu + Reviewed-by: Dragos Tatulea + Reviewed-by: Tariq Toukan + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250616141441.1243044-12-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +index e6c9338ddae8..ff0b9ab2daa0 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +@@ -32,6 +32,7 @@ + + #include + #include ++#include + + #include "en.h" + #include "en/channels.h" +@@ -365,11 +366,6 @@ void mlx5e_ethtool_get_ringparam(struct mlx5e_priv *priv, + param->tx_max_pending = 1 << MLX5E_PARAMS_MAXIMUM_LOG_SQ_SIZE; + param->rx_pending = 1 << priv->channels.params.log_rq_mtu_frames; + param->tx_pending = 1 << priv->channels.params.log_sq_size; +- +- kernel_param->tcp_data_split = +- (priv->channels.params.packet_merge.type == MLX5E_PACKET_MERGE_SHAMPO) ? +- ETHTOOL_TCP_DATA_SPLIT_ENABLED : +- ETHTOOL_TCP_DATA_SPLIT_DISABLED; + } + + static void mlx5e_get_ringparam(struct net_device *dev, +@@ -382,6 +378,27 @@ static void mlx5e_get_ringparam(struct net_device *dev, + mlx5e_ethtool_get_ringparam(priv, param, kernel_param); + } + ++static bool mlx5e_ethtool_set_tcp_data_split(struct mlx5e_priv *priv, ++ u8 tcp_data_split, ++ struct netlink_ext_ack *extack) ++{ ++ struct net_device *dev = priv->netdev; ++ ++ if (tcp_data_split == ETHTOOL_TCP_DATA_SPLIT_ENABLED && ++ !(dev->features & NETIF_F_GRO_HW)) { ++ NL_SET_ERR_MSG_MOD(extack, ++ "TCP-data-split is not supported when GRO HW is disabled"); ++ return false; ++ } ++ ++ /* Might need to disable HW-GRO if it was kept on due to hds. 
*/ ++ if (tcp_data_split == ETHTOOL_TCP_DATA_SPLIT_DISABLED && ++ dev->cfg->hds_config == ETHTOOL_TCP_DATA_SPLIT_ENABLED) ++ netdev_update_features(priv->netdev); ++ ++ return true; ++} ++ + int mlx5e_ethtool_set_ringparam(struct mlx5e_priv *priv, + struct ethtool_ringparam *param, + struct netlink_ext_ack *extack) +@@ -440,6 +457,11 @@ static int mlx5e_set_ringparam(struct net_device *dev, + { + struct mlx5e_priv *priv = netdev_priv(dev); + ++ if (!mlx5e_ethtool_set_tcp_data_split(priv, ++ kernel_param->tcp_data_split, ++ extack)) ++ return -EINVAL; ++ + return mlx5e_ethtool_set_ringparam(priv, param, extack); + } + +@@ -2645,6 +2667,7 @@ const struct ethtool_ops mlx5e_ethtool_ops = { + ETHTOOL_COALESCE_USE_ADAPTIVE | + ETHTOOL_COALESCE_USE_CQE, + .supported_input_xfrm = RXH_XFRM_SYM_OR_XOR, ++ .supported_ring_params = ETHTOOL_RING_USE_TCP_DATA_SPLIT, + .get_drvinfo = mlx5e_get_drvinfo, + .get_link = ethtool_op_get_link, + .get_link_ext_state = mlx5e_get_link_ext_state, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index 9330d90c1f03..4bbf10174fe8 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -39,6 +39,7 @@ + #include + #include + #include ++#include + #include + #include + #include +@@ -4392,6 +4393,7 @@ static netdev_features_t mlx5e_fix_uplink_rep_features(struct net_device *netdev + static netdev_features_t mlx5e_fix_features(struct net_device *netdev, + netdev_features_t features) + { ++ struct netdev_config *cfg = netdev->cfg_pending; + struct mlx5e_priv *priv = netdev_priv(netdev); + struct mlx5e_vlan_table *vlan; + struct mlx5e_params *params; +@@ -4458,6 +4460,13 @@ static netdev_features_t mlx5e_fix_features(struct net_device *netdev, + } + } + ++ /* The header-data split ring param requires HW GRO to stay enabled. 
*/ ++ if (cfg && cfg->hds_config == ETHTOOL_TCP_DATA_SPLIT_ENABLED && ++ !(features & NETIF_F_GRO_HW)) { ++ netdev_warn(netdev, "Keeping HW-GRO enabled, TCP header-data split depends on it\n"); ++ features |= NETIF_F_GRO_HW; ++ } ++ + if (mlx5e_is_uplink_rep(priv)) { + features = mlx5e_fix_uplink_rep_features(netdev, features); + netdev->netns_immutable = true; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1411-net-mlx5-fs-add-multiple-prios-to-rdma-transport-steering-do.patch b/SOURCES/1411-net-mlx5-fs-add-multiple-prios-to-rdma-transport-steering-do.patch new file mode 100644 index 000000000..ab045f171 --- /dev/null +++ b/SOURCES/1411-net-mlx5-fs-add-multiple-prios-to-rdma-transport-steering-do.patch @@ -0,0 +1,103 @@ +From d243df0a0b8c463819ec69b746ed501da46f66e6 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:18 -0400 +Subject: [PATCH] net/mlx5: fs, add multiple prios to RDMA TRANSPORT steering + domain + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 52931f55159ea5c27ad4fe66fc0cb8ad75ab795b +Author: Patrisious Haddad +Date: Tue Jun 17 11:19:15 2025 +0300 + + net/mlx5: fs, add multiple prios to RDMA TRANSPORT steering domain + + RDMA TRANSPORT domains were initially limited to a single priority. + This change allows the domains to have multiple priorities, making + it possible to add several rules and control the order in which + they're evaluated. 
+ + Signed-off-by: Patrisious Haddad + Reviewed-by: Mark Bloch + Link: https://patch.msgid.link/b299cbb4c8678a33da6e6b6988b5bf6145c54b88.1750148083.git.leon@kernel.org + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +index b29e67466701..2a855e50be95 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +@@ -3250,34 +3250,48 @@ static int + init_rdma_transport_rx_root_ns_one(struct mlx5_flow_steering *steering, + int vport_idx) + { ++ struct mlx5_flow_root_namespace *root_ns; + struct fs_prio *prio; ++ int i; + + steering->rdma_transport_rx_root_ns[vport_idx] = + create_root_ns(steering, FS_FT_RDMA_TRANSPORT_RX); + if (!steering->rdma_transport_rx_root_ns[vport_idx]) + return -ENOMEM; + +- /* create 1 prio*/ +- prio = fs_create_prio(&steering->rdma_transport_rx_root_ns[vport_idx]->ns, +- MLX5_RDMA_TRANSPORT_BYPASS_PRIO, 1); +- return PTR_ERR_OR_ZERO(prio); ++ root_ns = steering->rdma_transport_rx_root_ns[vport_idx]; ++ ++ for (i = 0; i < MLX5_RDMA_TRANSPORT_BYPASS_PRIO; i++) { ++ prio = fs_create_prio(&root_ns->ns, i, 1); ++ if (IS_ERR(prio)) ++ return PTR_ERR(prio); ++ } ++ set_prio_attrs(root_ns); ++ return 0; + } + + static int + init_rdma_transport_tx_root_ns_one(struct mlx5_flow_steering *steering, + int vport_idx) + { ++ struct mlx5_flow_root_namespace *root_ns; + struct fs_prio *prio; ++ int i; + + steering->rdma_transport_tx_root_ns[vport_idx] = + create_root_ns(steering, FS_FT_RDMA_TRANSPORT_TX); + if (!steering->rdma_transport_tx_root_ns[vport_idx]) + return -ENOMEM; + +- /* create 1 prio*/ +- prio = fs_create_prio(&steering->rdma_transport_tx_root_ns[vport_idx]->ns, +- MLX5_RDMA_TRANSPORT_BYPASS_PRIO, 1); +- return PTR_ERR_OR_ZERO(prio); ++ root_ns = steering->rdma_transport_tx_root_ns[vport_idx]; ++ ++ for (i = 0; i < MLX5_RDMA_TRANSPORT_BYPASS_PRIO; i++) { 
++ prio = fs_create_prio(&root_ns->ns, i, 1); ++ if (IS_ERR(prio)) ++ return PTR_ERR(prio); ++ } ++ set_prio_attrs(root_ns); ++ return 0; + } + + static int init_rdma_transport_rx_root_ns(struct mlx5_flow_steering *steering) +diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h +index fb5f98fcc726..6ac76a0c3827 100644 +--- a/include/linux/mlx5/fs.h ++++ b/include/linux/mlx5/fs.h +@@ -40,7 +40,7 @@ + + #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v) + +-#define MLX5_RDMA_TRANSPORT_BYPASS_PRIO 0 ++#define MLX5_RDMA_TRANSPORT_BYPASS_PRIO 16 + #define MLX5_FS_MAX_POOL_SIZE BIT(30) + + enum mlx5_flow_destination_type { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1412-net-mlx5-small-refactor-for-general-object-capabilities.patch b/SOURCES/1412-net-mlx5-small-refactor-for-general-object-capabilities.patch new file mode 100644 index 000000000..289c6cb8c --- /dev/null +++ b/SOURCES/1412-net-mlx5-small-refactor-for-general-object-capabilities.patch @@ -0,0 +1,75 @@ +From a8ad8f1ee332d30aea6afc35f434dc416e5b574a Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:18 -0400 +Subject: [PATCH] net/mlx5: Small refactor for general object capabilities + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit ebf8d47121b6ef3f38425a343a72f37c60fd6dbc +Author: Dragos Tatulea +Date: Thu Jun 19 14:37:17 2025 +0300 + + net/mlx5: Small refactor for general object capabilities + + Make enum for capability bits of general object types depend on + the type definitions themselves. + + Make sure that capabilities in the [64,127] bit range are + properly calculated (type id - 64). 
+ + Signed-off-by: Dragos Tatulea + Reviewed-by: Tariq Toukan + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250619113721.60201-2-mbloch@nvidia.com + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h +index 9521159b0857..4077f0921039 100644 +--- a/include/linux/mlx5/mlx5_ifc.h ++++ b/include/linux/mlx5/mlx5_ifc.h +@@ -12500,17 +12500,6 @@ struct mlx5_ifc_affiliated_event_header_bits { + u8 obj_id[0x20]; + }; + +-enum { +- MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_ENCRYPTION_KEY = BIT_ULL(0xc), +- MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_IPSEC = BIT_ULL(0x13), +- MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_SAMPLER = BIT_ULL(0x20), +- MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_FLOW_METER_ASO = BIT_ULL(0x24), +-}; +- +-enum { +- MLX5_HCA_CAP_2_GENERAL_OBJECT_TYPES_RDMA_CTRL = BIT_ULL(0x13), +-}; +- + enum { + MLX5_GENERAL_OBJECT_TYPES_ENCRYPTION_KEY = 0xc, + MLX5_GENERAL_OBJECT_TYPES_IPSEC = 0x13, +@@ -12522,6 +12511,22 @@ enum { + MLX5_GENERAL_OBJECT_TYPES_FLOW_TABLE_ALIAS = 0xff15, + }; + ++enum { ++ MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_ENCRYPTION_KEY = ++ BIT_ULL(MLX5_GENERAL_OBJECT_TYPES_ENCRYPTION_KEY), ++ MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_IPSEC = ++ BIT_ULL(MLX5_GENERAL_OBJECT_TYPES_IPSEC), ++ MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_SAMPLER = ++ BIT_ULL(MLX5_GENERAL_OBJECT_TYPES_SAMPLER), ++ MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_FLOW_METER_ASO = ++ BIT_ULL(MLX5_GENERAL_OBJECT_TYPES_FLOW_METER_ASO), ++}; ++ ++enum { ++ MLX5_HCA_CAP_2_GENERAL_OBJECT_TYPES_RDMA_CTRL = ++ BIT_ULL(MLX5_GENERAL_OBJECT_TYPES_RDMA_CTRL - 0x40), ++}; ++ + enum { + MLX5_IPSEC_OBJECT_ICV_LEN_16B, + }; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1413-net-mlx5-add-ifc-bits-for-pcie-congestion-event-object.patch b/SOURCES/1413-net-mlx5-add-ifc-bits-for-pcie-congestion-event-object.patch new file mode 100644 index 000000000..a4fd88198 --- /dev/null +++ b/SOURCES/1413-net-mlx5-add-ifc-bits-for-pcie-congestion-event-object.patch @@ 
-0,0 +1,90 @@ +From 1da5dd223dd8bf1658881b80fc0124839b3b927e Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:18 -0400 +Subject: [PATCH] net/mlx5: Add IFC bits for PCIe Congestion Event object + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 1f6da56679d33c733aaee929fd9af962ad66edbd +Author: Dragos Tatulea +Date: Thu Jun 19 14:37:18 2025 +0300 + + net/mlx5: Add IFC bits for PCIe Congestion Event object + + Add definitions for the PCIe Congestion Event object + and the relevant FW command structures. + + Signed-off-by: Dragos Tatulea + Reviewed-by: Tariq Toukan + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250619113721.60201-3-mbloch@nvidia.com + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h +index 4077f0921039..3ab683020fab 100644 +--- a/include/linux/mlx5/mlx5_ifc.h ++++ b/include/linux/mlx5/mlx5_ifc.h +@@ -12508,6 +12508,7 @@ enum { + MLX5_GENERAL_OBJECT_TYPES_MACSEC = 0x27, + MLX5_GENERAL_OBJECT_TYPES_INT_KEK = 0x47, + MLX5_GENERAL_OBJECT_TYPES_RDMA_CTRL = 0x53, ++ MLX5_GENERAL_OBJECT_TYPES_PCIE_CONG_EVENT = 0x58, + MLX5_GENERAL_OBJECT_TYPES_FLOW_TABLE_ALIAS = 0xff15, + }; + +@@ -12525,6 +12526,8 @@ enum { + enum { + MLX5_HCA_CAP_2_GENERAL_OBJECT_TYPES_RDMA_CTRL = + BIT_ULL(MLX5_GENERAL_OBJECT_TYPES_RDMA_CTRL - 0x40), ++ MLX5_HCA_CAP_2_GENERAL_OBJECT_TYPES_PCIE_CONG_EVENT = ++ BIT_ULL(MLX5_GENERAL_OBJECT_TYPES_PCIE_CONG_EVENT - 0x40), + }; + + enum { +@@ -13283,4 +13286,41 @@ struct mlx5_ifc_mrtcq_reg_bits { + u8 reserved_at_80[0x180]; + }; + ++struct mlx5_ifc_pcie_cong_event_obj_bits { ++ u8 modify_select_field[0x40]; ++ ++ u8 inbound_event_en[0x1]; ++ u8 outbound_event_en[0x1]; ++ u8 reserved_at_42[0x1e]; ++ ++ u8 reserved_at_60[0x1]; ++ u8 inbound_cong_state[0x3]; ++ u8 reserved_at_64[0x1]; ++ u8 outbound_cong_state[0x3]; ++ u8 reserved_at_68[0x18]; ++ ++ u8 inbound_cong_low_threshold[0x10]; ++ u8 
inbound_cong_high_threshold[0x10]; ++ ++ u8 outbound_cong_low_threshold[0x10]; ++ u8 outbound_cong_high_threshold[0x10]; ++ ++ u8 reserved_at_e0[0x340]; ++}; ++ ++struct mlx5_ifc_pcie_cong_event_cmd_in_bits { ++ struct mlx5_ifc_general_obj_in_cmd_hdr_bits hdr; ++ struct mlx5_ifc_pcie_cong_event_obj_bits cong_obj; ++}; ++ ++struct mlx5_ifc_pcie_cong_event_cmd_out_bits { ++ struct mlx5_ifc_general_obj_out_cmd_hdr_bits hdr; ++ struct mlx5_ifc_pcie_cong_event_obj_bits cong_obj; ++}; ++ ++enum mlx5e_pcie_cong_event_mod_field { ++ MLX5_PCIE_CONG_EVENT_MOD_EVENT_EN = BIT(0), ++ MLX5_PCIE_CONG_EVENT_MOD_THRESH = BIT(2), ++}; ++ + #endif /* MLX5_IFC_H */ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1414-rdma-mlx5-allocate-ib-device-with-net-namespace-supplied-fro.patch b/SOURCES/1414-rdma-mlx5-allocate-ib-device-with-net-namespace-supplied-fro.patch new file mode 100644 index 000000000..7f82b01b0 --- /dev/null +++ b/SOURCES/1414-rdma-mlx5-allocate-ib-device-with-net-namespace-supplied-fro.patch @@ -0,0 +1,99 @@ +From 6fb75c0ba22d1f3c68b6c8be29545b85aae66123 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:19 -0400 +Subject: [PATCH] RDMA/mlx5: Allocate IB device with net namespace supplied + from core dev + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 611d08207d313500d010d8792424346ce70d0cfb +Author: Mark Bloch +Date: Tue Jun 17 11:44:02 2025 +0300 + + RDMA/mlx5: Allocate IB device with net namespace supplied from core dev + + Use the new ib_alloc_device_with_net() API to allocate the IB device + so that it is properly bound to the network namespace obtained via + mlx5_core_net(). This change ensures correct namespace association + (e.g., for containerized setups). + + Additionally, expose mlx5_core_net so that RDMA driver can use it. 
+ + Signed-off-by: Shay Drory + Signed-off-by: Mark Bloch + Reviewed-by: Parav Pandit + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/ib_rep.c b/drivers/infiniband/hw/mlx5/ib_rep.c +index 49af1cfbe6d1..cc8859d3c2f5 100644 +--- a/drivers/infiniband/hw/mlx5/ib_rep.c ++++ b/drivers/infiniband/hw/mlx5/ib_rep.c +@@ -88,7 +88,8 @@ mlx5_ib_vport_rep_load(struct mlx5_core_dev *dev, struct mlx5_eswitch_rep *rep) + else + return mlx5_ib_set_vport_rep(lag_master, rep, vport_index); + +- ibdev = ib_alloc_device(mlx5_ib_dev, ib_dev); ++ ibdev = ib_alloc_device_with_net(mlx5_ib_dev, ib_dev, ++ mlx5_core_net(lag_master)); + if (!ibdev) + return -ENOMEM; + +diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c +index f463b8e7cfca..c03743daeaa8 100644 +--- a/drivers/infiniband/hw/mlx5/main.c ++++ b/drivers/infiniband/hw/mlx5/main.c +@@ -4823,7 +4823,8 @@ static struct ib_device *mlx5_ib_add_sub_dev(struct ib_device *parent, + !MLX5_CAP_GEN_2(mparent->mdev, multiplane_qp_ud)) + return ERR_PTR(-EOPNOTSUPP); + +- mplane = ib_alloc_device(mlx5_ib_dev, ib_dev); ++ mplane = ib_alloc_device_with_net(mlx5_ib_dev, ib_dev, ++ mlx5_core_net(mparent->mdev)); + if (!mplane) + return ERR_PTR(-ENOMEM); + +@@ -4937,7 +4938,8 @@ static int mlx5r_probe(struct auxiliary_device *adev, + + num_ports = max(MLX5_CAP_GEN(mdev, num_ports), + MLX5_CAP_GEN(mdev, num_vhca_ports)); +- dev = ib_alloc_device(mlx5_ib_dev, ib_dev); ++ dev = ib_alloc_device_with_net(mlx5_ib_dev, ib_dev, ++ mlx5_core_net(mdev)); + if (!dev) + return -ENOMEM; + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h +index 37d5f445598c..b111ccd03b02 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h +@@ -45,11 +45,6 @@ int mlx5_crdump_enable(struct mlx5_core_dev *dev); + void mlx5_crdump_disable(struct mlx5_core_dev 
*dev); + int mlx5_crdump_collect(struct mlx5_core_dev *dev, u32 *cr_data); + +-static inline struct net *mlx5_core_net(struct mlx5_core_dev *dev) +-{ +- return devlink_net(priv_to_devlink(dev)); +-} +- + static inline struct net_device *mlx5_uplink_netdev_get(struct mlx5_core_dev *mdev) + { + return mdev->mlx5e_res.uplink_netdev; +diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h +index df76aece6be9..39e9146e079d 100644 +--- a/include/linux/mlx5/driver.h ++++ b/include/linux/mlx5/driver.h +@@ -1350,4 +1350,9 @@ enum { + }; + + bool mlx5_wc_support_get(struct mlx5_core_dev *mdev); ++ ++static inline struct net *mlx5_core_net(struct mlx5_core_dev *dev) ++{ ++ return devlink_net(priv_to_devlink(dev)); ++} + #endif /* MLX5_DRIVER_H */ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1415-net-mlx5e-fix-error-handling-in-rq-memory-model-registration.patch b/SOURCES/1415-net-mlx5e-fix-error-handling-in-rq-memory-model-registration.patch new file mode 100644 index 000000000..e219acfd3 --- /dev/null +++ b/SOURCES/1415-net-mlx5e-fix-error-handling-in-rq-memory-model-registration.patch @@ -0,0 +1,57 @@ +From 531f0dfc0d82b91578d110d5c244fa6da3fd2777 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:19 -0400 +Subject: [PATCH] net/mlx5e: Fix error handling in RQ memory model registration + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 7012d4f3c7a82008113974108bf0c9c0553b424a +Author: Fushuai Wang +Date: Thu Jun 26 13:30:03 2025 +0800 + + net/mlx5e: Fix error handling in RQ memory model registration + + Currently when xdp_rxq_info_reg_mem_model() fails in the XSK path, the + error handling incorrectly jumps to err_destroy_page_pool. While this + may not cause errors, we should make it jump to the correct location. + + Signed-off-by: Fushuai Wang + Reviewed-by: Zhu Yanjun + Acked-by: Dragos Tatulea + Signed-off-by: David S. 
Miller + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index 4bbf10174fe8..62db56b5251f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -950,6 +950,8 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params, + if (xsk) { + err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq, + MEM_TYPE_XSK_BUFF_POOL, NULL); ++ if (err) ++ goto err_free_by_rq_type; + xsk_pool_set_rxq_info(rq->xsk_pool, &rq->xdp_rxq); + } else { + /* Create a page_pool and register it with rxq */ +@@ -978,12 +980,13 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params, + } + if (!rq->hd_page_pool) + rq->hd_page_pool = rq->page_pool; +- if (xdp_rxq_info_is_reg(&rq->xdp_rxq)) ++ if (xdp_rxq_info_is_reg(&rq->xdp_rxq)) { + err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq, + MEM_TYPE_PAGE_POOL, rq->page_pool); ++ if (err) ++ goto err_destroy_page_pool; ++ } + } +- if (err) +- goto err_destroy_page_pool; + + for (i = 0; i < wq_sz; i++) { + if (rq->wq_type == MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ) { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1416-net-mlx5-fs-fix-rdma-transport-init-cleanup-flow.patch b/SOURCES/1416-net-mlx5-fs-fix-rdma-transport-init-cleanup-flow.patch new file mode 100644 index 000000000..e25a5deb0 --- /dev/null +++ b/SOURCES/1416-net-mlx5-fs-fix-rdma-transport-init-cleanup-flow.patch @@ -0,0 +1,89 @@ +From a17a4fe081ef8bc70b2779e3c6582c8874c19fe9 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:19 -0400 +Subject: [PATCH] net/mlx5: fs, fix RDMA TRANSPORT init cleanup flow + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 02943ac2f6fbba8fc5e57c57e7cbc2d7c67ebf0d +Author: Patrisious Haddad +Date: Wed Jul 2 13:24:04 2025 +0300 + + net/mlx5: fs, fix RDMA TRANSPORT init cleanup flow + + Failing during the initialization of root_namespace didn't cleanup + the priorities of the 
namespace on which the failure occurred. + + Properly cleanup said priorities on failure. + + Fixes: 52931f55159e ("net/mlx5: fs, add multiple prios to RDMA TRANSPORT steering domain") + Signed-off-by: Patrisious Haddad + Link: https://patch.msgid.link/78cf89b5d8452caf1e979350b30ada6904362f66.1751451780.git.leon@kernel.org + Reviewed-by: Simon Horman + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +index 2a855e50be95..02808be0e88b 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +@@ -3252,6 +3252,7 @@ init_rdma_transport_rx_root_ns_one(struct mlx5_flow_steering *steering, + { + struct mlx5_flow_root_namespace *root_ns; + struct fs_prio *prio; ++ int ret; + int i; + + steering->rdma_transport_rx_root_ns[vport_idx] = +@@ -3263,11 +3264,17 @@ init_rdma_transport_rx_root_ns_one(struct mlx5_flow_steering *steering, + + for (i = 0; i < MLX5_RDMA_TRANSPORT_BYPASS_PRIO; i++) { + prio = fs_create_prio(&root_ns->ns, i, 1); +- if (IS_ERR(prio)) +- return PTR_ERR(prio); ++ if (IS_ERR(prio)) { ++ ret = PTR_ERR(prio); ++ goto err; ++ } + } + set_prio_attrs(root_ns); + return 0; ++ ++err: ++ cleanup_root_ns(root_ns); ++ return ret; + } + + static int +@@ -3276,6 +3283,7 @@ init_rdma_transport_tx_root_ns_one(struct mlx5_flow_steering *steering, + { + struct mlx5_flow_root_namespace *root_ns; + struct fs_prio *prio; ++ int ret; + int i; + + steering->rdma_transport_tx_root_ns[vport_idx] = +@@ -3287,11 +3295,17 @@ init_rdma_transport_tx_root_ns_one(struct mlx5_flow_steering *steering, + + for (i = 0; i < MLX5_RDMA_TRANSPORT_BYPASS_PRIO; i++) { + prio = fs_create_prio(&root_ns->ns, i, 1); +- if (IS_ERR(prio)) +- return PTR_ERR(prio); ++ if (IS_ERR(prio)) { ++ ret = PTR_ERR(prio); ++ goto err; ++ } + } + set_prio_attrs(root_ns); + return 0; ++ ++err: ++ cleanup_root_ns(root_ns); ++ 
return ret; + } + + static int init_rdma_transport_rx_root_ns(struct mlx5_flow_steering *steering) +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1417-net-mlx5-check-device-memory-pointer-before-usage.patch b/SOURCES/1417-net-mlx5-check-device-memory-pointer-before-usage.patch new file mode 100644 index 000000000..0f45deab4 --- /dev/null +++ b/SOURCES/1417-net-mlx5-check-device-memory-pointer-before-usage.patch @@ -0,0 +1,75 @@ +From 3df9c01eaa62ae6dc8508bc6e068aebf01b66645 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:19 -0400 +Subject: [PATCH] net/mlx5: Check device memory pointer before usage + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 70f238c902b8c0461ae6fbb8d1a0bbddc4350eea +Author: Stav Aviram +Date: Tue Jul 1 15:08:12 2025 +0300 + + net/mlx5: Check device memory pointer before usage + + Add a NULL check before accessing device memory to prevent a crash if + dev->dm allocation in mlx5_init_once() fails. + + Fixes: c9b9dcb430b3 ("net/mlx5: Move device memory management to mlx5_core") + Signed-off-by: Stav Aviram + Link: https://patch.msgid.link/c88711327f4d74d5cebc730dc629607e989ca187.1751370035.git.leon@kernel.org + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/dm.c b/drivers/infiniband/hw/mlx5/dm.c +index b4c97fb62abf..9ded2b7c1e31 100644 +--- a/drivers/infiniband/hw/mlx5/dm.c ++++ b/drivers/infiniband/hw/mlx5/dm.c +@@ -282,7 +282,7 @@ static struct ib_dm *handle_alloc_dm_memic(struct ib_ucontext *ctx, + int err; + u64 address; + +- if (!MLX5_CAP_DEV_MEM(dm_db->dev, memic)) ++ if (!dm_db || !MLX5_CAP_DEV_MEM(dm_db->dev, memic)) + return ERR_PTR(-EOPNOTSUPP); + + dm = kzalloc(sizeof(*dm), GFP_KERNEL); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/dm.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/dm.c +index 7c5516b0a844..8115071c34a4 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/dm.c ++++ 
b/drivers/net/ethernet/mellanox/mlx5/core/lib/dm.c +@@ -30,7 +30,7 @@ struct mlx5_dm *mlx5_dm_create(struct mlx5_core_dev *dev) + + dm = kzalloc(sizeof(*dm), GFP_KERNEL); + if (!dm) +- return ERR_PTR(-ENOMEM); ++ return NULL; + + spin_lock_init(&dm->lock); + +@@ -96,7 +96,7 @@ struct mlx5_dm *mlx5_dm_create(struct mlx5_core_dev *dev) + err_steering: + kfree(dm); + +- return ERR_PTR(-ENOMEM); ++ return NULL; + } + + void mlx5_dm_cleanup(struct mlx5_core_dev *dev) +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c +index 250f7005e79f..42daaf8387da 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c +@@ -1102,9 +1102,6 @@ static int mlx5_init_once(struct mlx5_core_dev *dev) + } + + dev->dm = mlx5_dm_create(dev); +- if (IS_ERR(dev->dm)) +- mlx5_core_warn(dev, "Failed to init device memory %ld\n", PTR_ERR(dev->dm)); +- + dev->tracer = mlx5_fw_tracer_create(dev); + dev->hv_vhca = mlx5_hv_vhca_create(dev); + dev->rsc_dump = mlx5_rsc_dump_create(dev); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1418-net-mlx5-add-no-op-implementation-for-setting-tc-bw-on-rate-.patch b/SOURCES/1418-net-mlx5-add-no-op-implementation-for-setting-tc-bw-on-rate-.patch new file mode 100644 index 000000000..9d4a619c3 --- /dev/null +++ b/SOURCES/1418-net-mlx5-add-no-op-implementation-for-setting-tc-bw-on-rate-.patch @@ -0,0 +1,95 @@ +From 6e33a254f6ca43f7cccbb069278193c54713188e Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:19 -0400 +Subject: [PATCH] net/mlx5: Add no-op implementation for setting tc-bw on rate + objects + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 71092821244a6a8e5bce7eb6154b4e5012302194 +Author: Carolina Jubran +Date: Sun Jun 29 17:21:34 2025 +0300 + + net/mlx5: Add no-op implementation for setting tc-bw on rate objects + + Introduce `mlx5_esw_devlink_rate_node_tc_bw_set()` and + 
`mlx5_esw_devlink_rate_leaf_tc_bw_set()` with no-op logic. + + Future patches will add support for setting traffic class bandwidth + on rate objects. + + Signed-off-by: Carolina Jubran + Reviewed-by: Cosmin Ratiu + Signed-off-by: Tariq Toukan + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250629142138.361537-5-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +index 4b536b384fc0..204055be51c0 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +@@ -376,6 +376,8 @@ static const struct devlink_ops mlx5_devlink_ops = { + .eswitch_encap_mode_get = mlx5_devlink_eswitch_encap_mode_get, + .rate_leaf_tx_share_set = mlx5_esw_devlink_rate_leaf_tx_share_set, + .rate_leaf_tx_max_set = mlx5_esw_devlink_rate_leaf_tx_max_set, ++ .rate_leaf_tc_bw_set = mlx5_esw_devlink_rate_leaf_tc_bw_set, ++ .rate_node_tc_bw_set = mlx5_esw_devlink_rate_node_tc_bw_set, + .rate_node_tx_share_set = mlx5_esw_devlink_rate_node_tx_share_set, + .rate_node_tx_max_set = mlx5_esw_devlink_rate_node_tx_max_set, + .rate_node_new = mlx5_esw_devlink_rate_node_new, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +index ad9f6fca9b6a..9da5f94b687d 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +@@ -906,6 +906,26 @@ int mlx5_esw_devlink_rate_leaf_tx_max_set(struct devlink_rate *rate_leaf, void * + return err; + } + ++int mlx5_esw_devlink_rate_leaf_tc_bw_set(struct devlink_rate *rate_leaf, ++ void *priv, ++ u32 *tc_bw, ++ struct netlink_ext_ack *extack) ++{ ++ NL_SET_ERR_MSG_MOD(extack, ++ "TC bandwidth shares are not supported on leafs"); ++ return -EOPNOTSUPP; ++} ++ ++int mlx5_esw_devlink_rate_node_tc_bw_set(struct devlink_rate *rate_node, ++ 
void *priv, ++ u32 *tc_bw, ++ struct netlink_ext_ack *extack) ++{ ++ NL_SET_ERR_MSG_MOD(extack, ++ "TC bandwidth shares are not supported on nodes"); ++ return -EOPNOTSUPP; ++} ++ + int mlx5_esw_devlink_rate_node_tx_share_set(struct devlink_rate *rate_node, void *priv, + u64 tx_share, struct netlink_ext_ack *extack) + { +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.h b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.h +index ed40ec8f027e..0a50982b0e27 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.h +@@ -21,6 +21,14 @@ int mlx5_esw_devlink_rate_leaf_tx_share_set(struct devlink_rate *rate_leaf, void + u64 tx_share, struct netlink_ext_ack *extack); + int mlx5_esw_devlink_rate_leaf_tx_max_set(struct devlink_rate *rate_leaf, void *priv, + u64 tx_max, struct netlink_ext_ack *extack); ++int mlx5_esw_devlink_rate_leaf_tc_bw_set(struct devlink_rate *rate_node, ++ void *priv, ++ u32 *tc_bw, ++ struct netlink_ext_ack *extack); ++int mlx5_esw_devlink_rate_node_tc_bw_set(struct devlink_rate *rate_node, ++ void *priv, ++ u32 *tc_bw, ++ struct netlink_ext_ack *extack); + int mlx5_esw_devlink_rate_node_tx_share_set(struct devlink_rate *rate_node, void *priv, + u64 tx_share, struct netlink_ext_ack *extack); + int mlx5_esw_devlink_rate_node_tx_max_set(struct devlink_rate *rate_node, void *priv, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1419-net-mlx5-add-support-for-setting-tc-bw-on-nodes.patch b/SOURCES/1419-net-mlx5-add-support-for-setting-tc-bw-on-nodes.patch new file mode 100644 index 000000000..8b3b5ba19 --- /dev/null +++ b/SOURCES/1419-net-mlx5-add-support-for-setting-tc-bw-on-nodes.patch @@ -0,0 +1,467 @@ +From b098b72d471acc38ae06ebd56ec5eb24ee758514 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:19 -0400 +Subject: [PATCH] net/mlx5: Add support for setting tc-bw on nodes +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 
+Content-Transfer-Encoding: 8bit + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 96619c485fa69195ea61a9f288a00383f40b5280 +Author: Carolina Jubran +Date: Sun Jun 29 17:21:35 2025 +0300 + + net/mlx5: Add support for setting tc-bw on nodes + + Introduce support for enabling and disabling Traffic Class (TC) + arbitration for existing devlink rate nodes. This patch adds support + for a new scheduling node type, `SCHED_NODE_TYPE_TC_ARBITER_TSAR`. + + Key changes include: + + - New helper functions for transitioning existing rate nodes to TC + arbiter nodes and vice versa. These functions handle the allocation + of TC arbiter nodes, copying of child nodes, and restoring vport QoS + settings when TC arbitration is disabled. + + - Implementation of `mlx5_esw_devlink_rate_node_tc_bw_set()` to manage + tc-bw configuration on nodes. + + - Introduced stubs for `esw_qos_tc_arbiter_scheduling_setup()` and + `esw_qos_tc_arbiter_scheduling_teardown()`, which will be extended in + future patches to provide full support for tc-bw on devlink rate + objects. + + - Validation functions for tc-bw settings, allowing graceful handling + of unsupported traffic class bandwidth configurations. + + - Updated `__esw_qos_alloc_node()` to insert the new node into the + parent’s children list only if the parent is not NULL. For the root + TSAR, the new node is inserted directly after the allocation call. + + - Don't allow `tc-bw` configuration for nodes containing non-leaf + children. + + This patch lays the groundwork for future support for configuring tc-bw + on devlink rate nodes. Although the infrastructure is in place, full + support for tc-bw is not yet implemented; attempts to set tc-bw on + nodes will return `-EOPNOTSUPP`. + + No functional changes are introduced at this stage. 
+ + Signed-off-by: Carolina Jubran + Reviewed-by: Cosmin Ratiu + Signed-off-by: Tariq Toukan + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250629142138.361537-6-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +index 9da5f94b687d..5394f6bb499e 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +@@ -64,11 +64,13 @@ static void esw_qos_domain_release(struct mlx5_eswitch *esw) + enum sched_node_type { + SCHED_NODE_TYPE_VPORTS_TSAR, + SCHED_NODE_TYPE_VPORT, ++ SCHED_NODE_TYPE_TC_ARBITER_TSAR, + }; + + static const char * const sched_node_type_str[] = { + [SCHED_NODE_TYPE_VPORTS_TSAR] = "vports TSAR", + [SCHED_NODE_TYPE_VPORT] = "vport", ++ [SCHED_NODE_TYPE_TC_ARBITER_TSAR] = "TC Arbiter TSAR", + }; + + struct mlx5_esw_sched_node { +@@ -106,6 +108,13 @@ static void esw_qos_node_attach_to_parent(struct mlx5_esw_sched_node *node) + } + } + ++static int esw_qos_num_tcs(struct mlx5_core_dev *dev) ++{ ++ int num_tcs = mlx5_max_tc(dev) + 1; ++ ++ return num_tcs < DEVLINK_RATE_TCS_MAX ? 
num_tcs : DEVLINK_RATE_TCS_MAX; ++} ++ + static void + esw_qos_node_set_parent(struct mlx5_esw_sched_node *node, struct mlx5_esw_sched_node *parent) + { +@@ -116,6 +125,27 @@ esw_qos_node_set_parent(struct mlx5_esw_sched_node *node, struct mlx5_esw_sched_ + esw_qos_node_attach_to_parent(node); + } + ++static void esw_qos_nodes_set_parent(struct list_head *nodes, ++ struct mlx5_esw_sched_node *parent) ++{ ++ struct mlx5_esw_sched_node *node, *tmp; ++ ++ list_for_each_entry_safe(node, tmp, nodes, entry) { ++ esw_qos_node_set_parent(node, parent); ++ if (!list_empty(&node->children) && ++ parent->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) { ++ struct mlx5_esw_sched_node *child; ++ ++ list_for_each_entry(child, &node->children, entry) { ++ struct mlx5_vport *vport = child->vport; ++ ++ if (vport) ++ vport->qos.sched_node->parent = parent; ++ } ++ } ++ } ++} ++ + void mlx5_esw_qos_vport_qos_free(struct mlx5_vport *vport) + { + kfree(vport->qos.sched_node); +@@ -141,16 +171,24 @@ mlx5_esw_qos_vport_get_parent(const struct mlx5_vport *vport) + + static void esw_qos_sched_elem_warn(struct mlx5_esw_sched_node *node, int err, const char *op) + { +- if (node->vport) { ++ switch (node->type) { ++ case SCHED_NODE_TYPE_VPORT: + esw_warn(node->esw->dev, + "E-Switch %s %s scheduling element failed (vport=%d,err=%d)\n", + op, sched_node_type_str[node->type], node->vport->vport, err); +- return; ++ break; ++ case SCHED_NODE_TYPE_TC_ARBITER_TSAR: ++ case SCHED_NODE_TYPE_VPORTS_TSAR: ++ esw_warn(node->esw->dev, ++ "E-Switch %s %s scheduling element failed (err=%d)\n", ++ op, sched_node_type_str[node->type], err); ++ break; ++ default: ++ esw_warn(node->esw->dev, ++ "E-Switch %s scheduling element failed (err=%d)\n", ++ op, err); ++ break; + } +- +- esw_warn(node->esw->dev, +- "E-Switch %s %s scheduling element failed (err=%d)\n", +- op, sched_node_type_str[node->type], err); + } + + static int esw_qos_node_create_sched_element(struct mlx5_esw_sched_node *node, void *ctx, +@@ -388,6 
+426,14 @@ __esw_qos_alloc_node(struct mlx5_eswitch *esw, u32 tsar_ix, enum sched_node_type + node->parent = parent; + INIT_LIST_HEAD(&node->children); + esw_qos_node_attach_to_parent(node); ++ if (!parent) { ++ /* The caller is responsible for inserting the node into the ++ * parent list if necessary. This function can also be used with ++ * a NULL parent, which doesn't necessarily indicate that it ++ * refers to the root scheduling element. ++ */ ++ list_del_init(&node->entry); ++ } + + return node; + } +@@ -426,6 +472,7 @@ __esw_qos_create_vports_sched_node(struct mlx5_eswitch *esw, struct mlx5_esw_sch + goto err_alloc_node; + } + ++ list_add_tail(&node->entry, &esw->qos.domain->nodes); + esw_qos_normalize_min_rate(esw, NULL, extack); + trace_mlx5_esw_node_qos_create(esw->dev, node, node->ix); + +@@ -498,6 +545,9 @@ static int esw_qos_create(struct mlx5_eswitch *esw, struct netlink_ext_ack *exta + SCHED_NODE_TYPE_VPORTS_TSAR, + NULL)) + esw->qos.node0 = ERR_PTR(-ENOMEM); ++ else ++ list_add_tail(&esw->qos.node0->entry, ++ &esw->qos.domain->nodes); + } + if (IS_ERR(esw->qos.node0)) { + err = PTR_ERR(esw->qos.node0); +@@ -555,6 +605,18 @@ static void esw_qos_put(struct mlx5_eswitch *esw) + esw_qos_destroy(esw); + } + ++static void ++esw_qos_tc_arbiter_scheduling_teardown(struct mlx5_esw_sched_node *node, ++ struct netlink_ext_ack *extack) ++{} ++ ++static int esw_qos_tc_arbiter_scheduling_setup(struct mlx5_esw_sched_node *node, ++ struct netlink_ext_ack *extack) ++{ ++ NL_SET_ERR_MSG_MOD(extack, "TC arbiter elements are not supported."); ++ return -EOPNOTSUPP; ++} ++ + static void esw_qos_vport_disable(struct mlx5_vport *vport, struct netlink_ext_ack *extack) + { + struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node; +@@ -723,6 +785,195 @@ static int esw_qos_vport_update_parent(struct mlx5_vport *vport, struct mlx5_esw + return err; + } + ++static void ++esw_qos_switch_vport_tcs_to_vport(struct mlx5_esw_sched_node *tc_arbiter_node, ++ struct 
mlx5_esw_sched_node *node, ++ struct netlink_ext_ack *extack) ++{ ++ struct mlx5_esw_sched_node *vports_tc_node, *vport_tc_node, *tmp; ++ ++ vports_tc_node = list_first_entry(&tc_arbiter_node->children, ++ struct mlx5_esw_sched_node, ++ entry); ++ ++ list_for_each_entry_safe(vport_tc_node, tmp, &vports_tc_node->children, ++ entry) ++ esw_qos_vport_update_parent(vport_tc_node->vport, node, extack); ++} ++ ++static int esw_qos_switch_tc_arbiter_node_to_vports( ++ struct mlx5_esw_sched_node *tc_arbiter_node, ++ struct mlx5_esw_sched_node *node, ++ struct netlink_ext_ack *extack) ++{ ++ u32 parent_tsar_ix = node->parent ? ++ node->parent->ix : node->esw->qos.root_tsar_ix; ++ int err; ++ ++ err = esw_qos_create_node_sched_elem(node->esw->dev, parent_tsar_ix, ++ node->max_rate, node->bw_share, ++ &node->ix); ++ if (err) { ++ NL_SET_ERR_MSG_MOD(extack, ++ "Failed to create scheduling element for vports node when disabliing vports TC QoS"); ++ return err; ++ } ++ ++ node->type = SCHED_NODE_TYPE_VPORTS_TSAR; ++ ++ /* Disable TC QoS for vports in the arbiter node. */ ++ esw_qos_switch_vport_tcs_to_vport(tc_arbiter_node, node, extack); ++ ++ return 0; ++} ++ ++static int esw_qos_switch_vports_node_to_tc_arbiter( ++ struct mlx5_esw_sched_node *node, ++ struct mlx5_esw_sched_node *tc_arbiter_node, ++ struct netlink_ext_ack *extack) ++{ ++ struct mlx5_esw_sched_node *vport_node, *tmp; ++ struct mlx5_vport *vport; ++ int err; ++ ++ /* Enable TC QoS for each vport in the node. */ ++ list_for_each_entry_safe(vport_node, tmp, &node->children, entry) { ++ vport = vport_node->vport; ++ err = esw_qos_vport_update_parent(vport, tc_arbiter_node, ++ extack); ++ if (err) ++ goto err_out; ++ } ++ ++ /* Destroy the current vports node TSAR. */ ++ err = mlx5_destroy_scheduling_element_cmd(node->esw->dev, ++ SCHEDULING_HIERARCHY_E_SWITCH, ++ node->ix); ++ if (err) ++ goto err_out; ++ ++ return 0; ++err_out: ++ /* Restore vports back into the node if an error occurs. 
*/ ++ esw_qos_switch_vport_tcs_to_vport(tc_arbiter_node, node, NULL); ++ ++ return err; ++} ++ ++static struct mlx5_esw_sched_node * ++esw_qos_move_node(struct mlx5_esw_sched_node *curr_node) ++{ ++ struct mlx5_esw_sched_node *new_node; ++ ++ new_node = __esw_qos_alloc_node(curr_node->esw, curr_node->ix, ++ curr_node->type, NULL); ++ if (!IS_ERR(new_node)) ++ esw_qos_nodes_set_parent(&curr_node->children, new_node); ++ ++ return new_node; ++} ++ ++static int esw_qos_node_disable_tc_arbitration(struct mlx5_esw_sched_node *node, ++ struct netlink_ext_ack *extack) ++{ ++ struct mlx5_esw_sched_node *curr_node; ++ int err; ++ ++ if (node->type != SCHED_NODE_TYPE_TC_ARBITER_TSAR) ++ return 0; ++ ++ /* Allocate a new rate node to hold the current state, which will allow ++ * for restoring the vports back to this node after disabling TC ++ * arbitration. ++ */ ++ curr_node = esw_qos_move_node(node); ++ if (IS_ERR(curr_node)) { ++ NL_SET_ERR_MSG_MOD(extack, "Failed setting up vports node"); ++ return PTR_ERR(curr_node); ++ } ++ ++ /* Disable TC QoS for all vports, and assign them back to the node. */ ++ err = esw_qos_switch_tc_arbiter_node_to_vports(curr_node, node, extack); ++ if (err) ++ goto err_out; ++ ++ /* Clean up the TC arbiter node after disabling TC QoS for vports. */ ++ esw_qos_tc_arbiter_scheduling_teardown(curr_node, extack); ++ goto out; ++err_out: ++ esw_qos_nodes_set_parent(&curr_node->children, node); ++out: ++ __esw_qos_free_node(curr_node); ++ return err; ++} ++ ++static int esw_qos_node_enable_tc_arbitration(struct mlx5_esw_sched_node *node, ++ struct netlink_ext_ack *extack) ++{ ++ struct mlx5_esw_sched_node *curr_node, *child; ++ int err, new_level, max_level; ++ ++ if (node->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) ++ return 0; ++ ++ /* Increase the hierarchy level by one to account for the additional ++ * vports TC scheduling node, and verify that the new level does not ++ * exceed the maximum allowed depth. 
++ */ ++ new_level = node->level + 1; ++ max_level = 1 << MLX5_CAP_QOS(node->esw->dev, log_esw_max_sched_depth); ++ if (new_level > max_level) { ++ NL_SET_ERR_MSG_MOD(extack, ++ "TC arbitration on nodes is not supported beyond max scheduling depth"); ++ return -EOPNOTSUPP; ++ } ++ ++ /* Ensure the node does not contain non-leaf children before assigning ++ * TC bandwidth. ++ */ ++ if (!list_empty(&node->children)) { ++ list_for_each_entry(child, &node->children, entry) { ++ if (!child->vport) { ++ NL_SET_ERR_MSG_MOD(extack, ++ "Cannot configure TC bandwidth on a node with non-leaf children"); ++ return -EOPNOTSUPP; ++ } ++ } ++ } ++ ++ /* Allocate a new node that will store the information of the current ++ * node. This will be used later to restore the node if necessary. ++ */ ++ curr_node = esw_qos_move_node(node); ++ if (IS_ERR(curr_node)) { ++ NL_SET_ERR_MSG_MOD(extack, "Failed setting up node TC QoS"); ++ return PTR_ERR(curr_node); ++ } ++ ++ /* Initialize the TC arbiter node for QoS management. ++ * This step prepares the node for handling Traffic Class arbitration. ++ */ ++ err = esw_qos_tc_arbiter_scheduling_setup(node, extack); ++ if (err) ++ goto err_setup; ++ ++ /* Enable TC QoS for each vport within the current node. 
*/ ++ err = esw_qos_switch_vports_node_to_tc_arbiter(curr_node, node, extack); ++ if (err) ++ goto err_switch_vports; ++ goto out; ++ ++err_switch_vports: ++ esw_qos_tc_arbiter_scheduling_teardown(node, NULL); ++ node->ix = curr_node->ix; ++ node->type = curr_node->type; ++err_setup: ++ esw_qos_nodes_set_parent(&curr_node->children, node); ++out: ++ __esw_qos_free_node(curr_node); ++ return err; ++} ++ + static u32 mlx5_esw_qos_lag_link_speed_get_locked(struct mlx5_core_dev *mdev) + { + struct ethtool_link_ksettings lksettings; +@@ -848,6 +1099,31 @@ static int esw_qos_devlink_rate_to_mbps(struct mlx5_core_dev *mdev, const char * + return 0; + } + ++static bool esw_qos_validate_unsupported_tc_bw(struct mlx5_eswitch *esw, ++ u32 *tc_bw) ++{ ++ int i, num_tcs = esw_qos_num_tcs(esw->dev); ++ ++ for (i = num_tcs; i < DEVLINK_RATE_TCS_MAX; i++) { ++ if (tc_bw[i]) ++ return false; ++ } ++ ++ return true; ++} ++ ++static bool esw_qos_tc_bw_disabled(u32 *tc_bw) ++{ ++ int i; ++ ++ for (i = 0; i < DEVLINK_RATE_TCS_MAX; i++) { ++ if (tc_bw[i]) ++ return false; ++ } ++ ++ return true; ++} ++ + int mlx5_esw_qos_init(struct mlx5_eswitch *esw) + { + if (esw->qos.domain) +@@ -921,9 +1197,28 @@ int mlx5_esw_devlink_rate_node_tc_bw_set(struct devlink_rate *rate_node, + u32 *tc_bw, + struct netlink_ext_ack *extack) + { +- NL_SET_ERR_MSG_MOD(extack, +- "TC bandwidth shares are not supported on nodes"); +- return -EOPNOTSUPP; ++ struct mlx5_esw_sched_node *node = priv; ++ struct mlx5_eswitch *esw = node->esw; ++ bool disable; ++ int err; ++ ++ if (!esw_qos_validate_unsupported_tc_bw(esw, tc_bw)) { ++ NL_SET_ERR_MSG_MOD(extack, ++ "E-Switch traffic classes number is not supported"); ++ return -EOPNOTSUPP; ++ } ++ ++ disable = esw_qos_tc_bw_disabled(tc_bw); ++ esw_qos_lock(esw); ++ if (disable) { ++ err = esw_qos_node_disable_tc_arbitration(node, extack); ++ goto unlock; ++ } ++ ++ err = esw_qos_node_enable_tc_arbitration(node, extack); ++unlock: ++ esw_qos_unlock(esw); ++ return err; + 
} + + int mlx5_esw_devlink_rate_node_tx_share_set(struct devlink_rate *rate_node, void *priv, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1420-net-mlx5-add-traffic-class-scheduling-support-for-vport-qos.patch b/SOURCES/1420-net-mlx5-add-traffic-class-scheduling-support-for-vport-qos.patch new file mode 100644 index 000000000..cdb8e71d8 --- /dev/null +++ b/SOURCES/1420-net-mlx5-add-traffic-class-scheduling-support-for-vport-qos.patch @@ -0,0 +1,687 @@ +From ef32c9409c6ac247f0f571e1c8aea433fee8e101 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:19 -0400 +Subject: [PATCH] net/mlx5: Add traffic class scheduling support for vport QoS + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 97733d1e00a001b1708247af366280154df83b93 +Author: Carolina Jubran +Date: Sun Jun 29 17:21:36 2025 +0300 + + net/mlx5: Add traffic class scheduling support for vport QoS + + Introduce support for traffic class (TC) scheduling on vports by + allowing the vport to own multiple TC scheduling nodes. This patch + enables more granular control of QoS by defining three distinct QoS + states for vports, each providing unique scheduling behavior: + + 1. Regular QoS: The `sched_node` represents the vport directly, + handling QoS as a single scheduling entity. + 2. TC QoS on the vport: The `sched_node` acts as a TC arbiter, enabling + TC scheduling directly on the vport. + 3. TC QoS on the parent node: The `sched_node` functions as a rate + limiter, with TC arbitration enabled at the parent level, associating + multiple scheduling nodes with each vport. + + Key changes include: + + - Added support for new scheduling elements, vport traffic class and + rate limiter. + + - New helper functions for creating, destroying, and restoring vport TC + scheduling nodes, handling transitions between regular QoS and TC + arbitration states. 
+ + - Updated `esw_qos_vport_enable()` and `esw_qos_vport_disable()` to + support both regular QoS and TC arbitration states, ensuring consistent + transitions between scheduling modes. + + - Introduced a `sched_nodes` array under `vport->qos` to store multiple + TC scheduling nodes per vport, enabling finer control over per-TC QoS. + + - Enhanced `esw_qos_vport_update_parent()` to handle transitions between + the three QoS states based on the current and new parent node types. + + This patch lays the groundwork for future support for configuring tc-bw + on vports. Although the infrastructure is in place, full support for + tc-bw is not yet implemented; attempts to set tc-bw on vports will + return `-EOPNOTSUPP`. + + No functional changes are introduced at this stage. + + Signed-off-by: Carolina Jubran + Reviewed-by: Cosmin Ratiu + Signed-off-by: Tariq Toukan + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250629142138.361537-7-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +index 5394f6bb499e..120d19eeb46b 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +@@ -65,12 +65,16 @@ enum sched_node_type { + SCHED_NODE_TYPE_VPORTS_TSAR, + SCHED_NODE_TYPE_VPORT, + SCHED_NODE_TYPE_TC_ARBITER_TSAR, ++ SCHED_NODE_TYPE_RATE_LIMITER, ++ SCHED_NODE_TYPE_VPORT_TC, + }; + + static const char * const sched_node_type_str[] = { + [SCHED_NODE_TYPE_VPORTS_TSAR] = "vports TSAR", + [SCHED_NODE_TYPE_VPORT] = "vport", + [SCHED_NODE_TYPE_TC_ARBITER_TSAR] = "TC Arbiter TSAR", ++ [SCHED_NODE_TYPE_RATE_LIMITER] = "Rate Limiter", ++ [SCHED_NODE_TYPE_VPORT_TC] = "vport TC", + }; + + struct mlx5_esw_sched_node { +@@ -94,6 +98,8 @@ struct mlx5_esw_sched_node { + struct mlx5_vport *vport; + /* Level in the hierarchy. The root node level is 1. 
*/ + u8 level; ++ /* Valid only when this node represents a traffic class. */ ++ u8 tc; + }; + + static void esw_qos_node_attach_to_parent(struct mlx5_esw_sched_node *node) +@@ -148,6 +154,15 @@ static void esw_qos_nodes_set_parent(struct list_head *nodes, + + void mlx5_esw_qos_vport_qos_free(struct mlx5_vport *vport) + { ++ if (vport->qos.sched_nodes) { ++ int num_tcs = esw_qos_num_tcs(vport->qos.sched_node->esw->dev); ++ int i; ++ ++ for (i = 0; i < num_tcs; i++) ++ kfree(vport->qos.sched_nodes[i]); ++ kfree(vport->qos.sched_nodes); ++ } ++ + kfree(vport->qos.sched_node); + memset(&vport->qos, 0, sizeof(vport->qos)); + } +@@ -172,11 +187,19 @@ mlx5_esw_qos_vport_get_parent(const struct mlx5_vport *vport) + static void esw_qos_sched_elem_warn(struct mlx5_esw_sched_node *node, int err, const char *op) + { + switch (node->type) { ++ case SCHED_NODE_TYPE_VPORT_TC: ++ esw_warn(node->esw->dev, ++ "E-Switch %s %s scheduling element failed (vport=%d,tc=%d,err=%d)\n", ++ op, ++ sched_node_type_str[node->type], ++ node->vport->vport, node->tc, err); ++ break; + case SCHED_NODE_TYPE_VPORT: + esw_warn(node->esw->dev, + "E-Switch %s %s scheduling element failed (vport=%d,err=%d)\n", + op, sched_node_type_str[node->type], node->vport->vport, err); + break; ++ case SCHED_NODE_TYPE_RATE_LIMITER: + case SCHED_NODE_TYPE_TC_ARBITER_TSAR: + case SCHED_NODE_TYPE_VPORTS_TSAR: + esw_warn(node->esw->dev, +@@ -271,6 +294,24 @@ static int esw_qos_sched_elem_config(struct mlx5_esw_sched_node *node, u32 max_r + return 0; + } + ++static int esw_qos_create_rate_limit_element(struct mlx5_esw_sched_node *node, ++ struct netlink_ext_ack *extack) ++{ ++ u32 sched_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {}; ++ ++ if (!mlx5_qos_element_type_supported( ++ node->esw->dev, ++ SCHEDULING_CONTEXT_ELEMENT_TYPE_RATE_LIMIT, ++ SCHEDULING_HIERARCHY_E_SWITCH)) ++ return -EOPNOTSUPP; ++ ++ MLX5_SET(scheduling_context, sched_ctx, max_average_bw, node->max_rate); ++ MLX5_SET(scheduling_context, sched_ctx, 
element_type, ++ SCHEDULING_CONTEXT_ELEMENT_TYPE_RATE_LIMIT); ++ ++ return esw_qos_node_create_sched_element(node, sched_ctx, extack); ++} ++ + static u32 esw_qos_calculate_min_rate_divider(struct mlx5_eswitch *esw, + struct mlx5_esw_sched_node *parent) + { +@@ -388,28 +429,64 @@ esw_qos_create_node_sched_elem(struct mlx5_core_dev *dev, u32 parent_element_id, + tsar_ix); + } + +-static int esw_qos_vport_create_sched_element(struct mlx5_esw_sched_node *vport_node, +- struct netlink_ext_ack *extack) ++static int ++esw_qos_vport_create_sched_element(struct mlx5_esw_sched_node *vport_node, ++ struct netlink_ext_ack *extack) + { + u32 sched_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {}; + struct mlx5_core_dev *dev = vport_node->esw->dev; + void *attr; + +- if (!mlx5_qos_element_type_supported(dev, +- SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT, +- SCHEDULING_HIERARCHY_E_SWITCH)) ++ if (!mlx5_qos_element_type_supported( ++ dev, ++ SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT, ++ SCHEDULING_HIERARCHY_E_SWITCH)) + return -EOPNOTSUPP; + + MLX5_SET(scheduling_context, sched_ctx, element_type, + SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT); + attr = MLX5_ADDR_OF(scheduling_context, sched_ctx, element_attributes); + MLX5_SET(vport_element, attr, vport_number, vport_node->vport->vport); +- MLX5_SET(scheduling_context, sched_ctx, parent_element_id, vport_node->parent->ix); +- MLX5_SET(scheduling_context, sched_ctx, max_average_bw, vport_node->max_rate); ++ MLX5_SET(scheduling_context, sched_ctx, parent_element_id, ++ vport_node->parent->ix); ++ MLX5_SET(scheduling_context, sched_ctx, max_average_bw, ++ vport_node->max_rate); + + return esw_qos_node_create_sched_element(vport_node, sched_ctx, extack); + } + ++static int ++esw_qos_vport_tc_create_sched_element(struct mlx5_esw_sched_node *vport_tc_node, ++ u32 rate_limit_elem_ix, ++ struct netlink_ext_ack *extack) ++{ ++ u32 sched_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {}; ++ struct mlx5_core_dev *dev = vport_tc_node->esw->dev; ++ void *attr; ++ ++ 
if (!mlx5_qos_element_type_supported( ++ dev, ++ SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT_TC, ++ SCHEDULING_HIERARCHY_E_SWITCH)) ++ return -EOPNOTSUPP; ++ ++ MLX5_SET(scheduling_context, sched_ctx, element_type, ++ SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT_TC); ++ attr = MLX5_ADDR_OF(scheduling_context, sched_ctx, element_attributes); ++ MLX5_SET(vport_tc_element, attr, vport_number, ++ vport_tc_node->vport->vport); ++ MLX5_SET(vport_tc_element, attr, traffic_class, vport_tc_node->tc); ++ MLX5_SET(scheduling_context, sched_ctx, max_bw_obj_id, ++ rate_limit_elem_ix); ++ MLX5_SET(scheduling_context, sched_ctx, parent_element_id, ++ vport_tc_node->parent->ix); ++ MLX5_SET(scheduling_context, sched_ctx, bw_share, ++ vport_tc_node->bw_share); ++ ++ return esw_qos_node_create_sched_element(vport_tc_node, sched_ctx, ++ extack); ++} ++ + static struct mlx5_esw_sched_node * + __esw_qos_alloc_node(struct mlx5_eswitch *esw, u32 tsar_ix, enum sched_node_type type, + struct mlx5_esw_sched_node *parent) +@@ -617,12 +694,202 @@ static int esw_qos_tc_arbiter_scheduling_setup(struct mlx5_esw_sched_node *node, + return -EOPNOTSUPP; + } + ++static int ++esw_qos_create_vport_tc_sched_node(struct mlx5_vport *vport, ++ u32 rate_limit_elem_ix, ++ struct mlx5_esw_sched_node *vports_tc_node, ++ struct netlink_ext_ack *extack) ++{ ++ struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node; ++ struct mlx5_esw_sched_node *vport_tc_node; ++ u8 tc = vports_tc_node->tc; ++ int err; ++ ++ vport_tc_node = __esw_qos_alloc_node(vport_node->esw, 0, ++ SCHED_NODE_TYPE_VPORT_TC, ++ vports_tc_node); ++ if (!vport_tc_node) ++ return -ENOMEM; ++ ++ vport_tc_node->min_rate = vport_node->min_rate; ++ vport_tc_node->tc = tc; ++ vport_tc_node->vport = vport; ++ err = esw_qos_vport_tc_create_sched_element(vport_tc_node, ++ rate_limit_elem_ix, ++ extack); ++ if (err) ++ goto err_out; ++ ++ vport->qos.sched_nodes[tc] = vport_tc_node; ++ ++ return 0; ++err_out: ++ __esw_qos_free_node(vport_tc_node); ++ return 
err; ++} ++ ++static void ++esw_qos_destroy_vport_tc_sched_elements(struct mlx5_vport *vport, ++ struct netlink_ext_ack *extack) ++{ ++ int i, num_tcs = esw_qos_num_tcs(vport->qos.sched_node->esw->dev); ++ ++ for (i = 0; i < num_tcs; i++) { ++ if (vport->qos.sched_nodes[i]) { ++ __esw_qos_destroy_node(vport->qos.sched_nodes[i], ++ extack); ++ } ++ } ++ ++ kfree(vport->qos.sched_nodes); ++ vport->qos.sched_nodes = NULL; ++} ++ ++static int ++esw_qos_create_vport_tc_sched_elements(struct mlx5_vport *vport, ++ enum sched_node_type type, ++ struct netlink_ext_ack *extack) ++{ ++ struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node; ++ struct mlx5_esw_sched_node *tc_arbiter_node, *vports_tc_node; ++ int err, num_tcs = esw_qos_num_tcs(vport_node->esw->dev); ++ u32 rate_limit_elem_ix; ++ ++ vport->qos.sched_nodes = kcalloc(num_tcs, ++ sizeof(struct mlx5_esw_sched_node *), ++ GFP_KERNEL); ++ if (!vport->qos.sched_nodes) { ++ NL_SET_ERR_MSG_MOD(extack, ++ "Allocating the vport TC scheduling elements failed."); ++ return -ENOMEM; ++ } ++ ++ rate_limit_elem_ix = type == SCHED_NODE_TYPE_RATE_LIMITER ? ++ vport_node->ix : 0; ++ tc_arbiter_node = type == SCHED_NODE_TYPE_RATE_LIMITER ? ++ vport_node->parent : vport_node; ++ list_for_each_entry(vports_tc_node, &tc_arbiter_node->children, entry) { ++ err = esw_qos_create_vport_tc_sched_node(vport, ++ rate_limit_elem_ix, ++ vports_tc_node, ++ extack); ++ if (err) ++ goto err_create_vport_tc; ++ } ++ ++ return 0; ++ ++err_create_vport_tc: ++ esw_qos_destroy_vport_tc_sched_elements(vport, NULL); ++ ++ return err; ++} ++ ++static int ++esw_qos_vport_tc_enable(struct mlx5_vport *vport, enum sched_node_type type, ++ struct netlink_ext_ack *extack) ++{ ++ struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node; ++ int err, new_level, max_level; ++ ++ if (type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) { ++ /* Increase the parent's level by 2 to account for both the ++ * TC arbiter and the vports TC scheduling element. 
++ */ ++ new_level = vport_node->parent->level + 2; ++ max_level = 1 << MLX5_CAP_QOS(vport_node->esw->dev, ++ log_esw_max_sched_depth); ++ if (new_level > max_level) { ++ NL_SET_ERR_MSG_MOD(extack, ++ "TC arbitration on leafs is not supported beyond max scheduling depth"); ++ return -EOPNOTSUPP; ++ } ++ } ++ ++ esw_assert_qos_lock_held(vport->dev->priv.eswitch); ++ ++ if (type == SCHED_NODE_TYPE_RATE_LIMITER) ++ err = esw_qos_create_rate_limit_element(vport_node, extack); ++ else ++ err = esw_qos_tc_arbiter_scheduling_setup(vport_node, extack); ++ if (err) ++ return err; ++ ++ /* Rate limiters impact multiple nodes not directly connected to them ++ * and are not direct members of the QoS hierarchy. ++ * Unlink it from the parent to reflect that. ++ */ ++ if (type == SCHED_NODE_TYPE_RATE_LIMITER) { ++ list_del_init(&vport_node->entry); ++ vport_node->level = 0; ++ } ++ ++ err = esw_qos_create_vport_tc_sched_elements(vport, type, extack); ++ if (err) ++ goto err_sched_nodes; ++ ++ return 0; ++ ++err_sched_nodes: ++ if (type == SCHED_NODE_TYPE_RATE_LIMITER) { ++ esw_qos_node_destroy_sched_element(vport_node, NULL); ++ list_add_tail(&vport_node->entry, ++ &vport_node->parent->children); ++ vport_node->level = vport_node->parent->level + 1; ++ } else { ++ esw_qos_tc_arbiter_scheduling_teardown(vport_node, NULL); ++ } ++ return err; ++} ++ ++static void esw_qos_vport_tc_disable(struct mlx5_vport *vport, ++ struct netlink_ext_ack *extack) ++{ ++ struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node; ++ enum sched_node_type curr_type = vport_node->type; ++ ++ esw_qos_destroy_vport_tc_sched_elements(vport, extack); ++ ++ if (curr_type == SCHED_NODE_TYPE_RATE_LIMITER) ++ esw_qos_node_destroy_sched_element(vport_node, extack); ++ else ++ esw_qos_tc_arbiter_scheduling_teardown(vport_node, extack); ++} ++ ++static int esw_qos_set_vport_tcs_min_rate(struct mlx5_vport *vport, ++ u32 min_rate, ++ struct netlink_ext_ack *extack) ++{ ++ struct mlx5_esw_sched_node 
*vport_node = vport->qos.sched_node; ++ int err, i, num_tcs = esw_qos_num_tcs(vport_node->esw->dev); ++ ++ for (i = 0; i < num_tcs; i++) { ++ err = esw_qos_set_node_min_rate(vport->qos.sched_nodes[i], ++ min_rate, extack); ++ if (err) ++ goto err_out; ++ } ++ vport_node->min_rate = min_rate; ++ ++ return 0; ++err_out: ++ for (--i; i >= 0; i--) { ++ esw_qos_set_node_min_rate(vport->qos.sched_nodes[i], ++ vport_node->min_rate, extack); ++ } ++ return err; ++} ++ + static void esw_qos_vport_disable(struct mlx5_vport *vport, struct netlink_ext_ack *extack) + { + struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node; + struct mlx5_esw_sched_node *parent = vport_node->parent; ++ enum sched_node_type curr_type = vport_node->type; + +- esw_qos_node_destroy_sched_element(vport_node, extack); ++ if (curr_type == SCHED_NODE_TYPE_VPORT) ++ esw_qos_node_destroy_sched_element(vport_node, extack); ++ else ++ esw_qos_vport_tc_disable(vport, extack); + + vport_node->bw_share = 0; + list_del_init(&vport_node->entry); +@@ -631,7 +898,9 @@ static void esw_qos_vport_disable(struct mlx5_vport *vport, struct netlink_ext_a + trace_mlx5_esw_vport_qos_destroy(vport_node->esw->dev, vport); + } + +-static int esw_qos_vport_enable(struct mlx5_vport *vport, struct mlx5_esw_sched_node *parent, ++static int esw_qos_vport_enable(struct mlx5_vport *vport, ++ enum sched_node_type type, ++ struct mlx5_esw_sched_node *parent, + struct netlink_ext_ack *extack) + { + int err; +@@ -639,10 +908,16 @@ static int esw_qos_vport_enable(struct mlx5_vport *vport, struct mlx5_esw_sched_ + esw_assert_qos_lock_held(vport->dev->priv.eswitch); + + esw_qos_node_set_parent(vport->qos.sched_node, parent); +- err = esw_qos_vport_create_sched_element(vport->qos.sched_node, extack); ++ if (type == SCHED_NODE_TYPE_VPORT) { ++ err = esw_qos_vport_create_sched_element(vport->qos.sched_node, ++ extack); ++ } else { ++ err = esw_qos_vport_tc_enable(vport, type, extack); ++ } + if (err) + return err; + ++ 
vport->qos.sched_node->type = type; + esw_qos_normalize_min_rate(parent->esw, parent, extack); + trace_mlx5_esw_vport_qos_create(vport->dev, vport, + vport->qos.sched_node->max_rate, +@@ -673,9 +948,8 @@ static int mlx5_esw_qos_vport_enable(struct mlx5_vport *vport, enum sched_node_t + sched_node->min_rate = min_rate; + sched_node->vport = vport; + vport->qos.sched_node = sched_node; +- err = esw_qos_vport_enable(vport, parent, extack); ++ err = esw_qos_vport_enable(vport, type, parent, extack); + if (err) { +- __esw_qos_free_node(sched_node); + esw_qos_put(esw); + vport->qos.sched_node = NULL; + } +@@ -728,6 +1002,8 @@ static int mlx5_esw_qos_set_vport_min_rate(struct mlx5_vport *vport, u32 min_rat + if (!vport_node) + return mlx5_esw_qos_vport_enable(vport, SCHED_NODE_TYPE_VPORT, NULL, 0, min_rate, + extack); ++ else if (vport_node->type == SCHED_NODE_TYPE_RATE_LIMITER) ++ return esw_qos_set_vport_tcs_min_rate(vport, min_rate, extack); + else + return esw_qos_set_node_min_rate(vport_node, min_rate, extack); + } +@@ -760,12 +1036,60 @@ bool mlx5_esw_qos_get_vport_rate(struct mlx5_vport *vport, u32 *max_rate, u32 *m + return enabled; + } + ++static int esw_qos_vport_tc_check_type(enum sched_node_type curr_type, ++ enum sched_node_type new_type, ++ struct netlink_ext_ack *extack) ++{ ++ if (curr_type == SCHED_NODE_TYPE_TC_ARBITER_TSAR && ++ new_type == SCHED_NODE_TYPE_RATE_LIMITER) { ++ NL_SET_ERR_MSG_MOD(extack, ++ "Cannot switch from vport-level TC arbitration to node-level TC arbitration"); ++ return -EOPNOTSUPP; ++ } ++ ++ if (curr_type == SCHED_NODE_TYPE_RATE_LIMITER && ++ new_type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) { ++ NL_SET_ERR_MSG_MOD(extack, ++ "Cannot switch from node-level TC arbitration to vport-level TC arbitration"); ++ return -EOPNOTSUPP; ++ } ++ ++ return 0; ++} ++ ++static int esw_qos_vport_update(struct mlx5_vport *vport, ++ enum sched_node_type type, ++ struct mlx5_esw_sched_node *parent, ++ struct netlink_ext_ack *extack) ++{ ++ struct 
mlx5_esw_sched_node *curr_parent = vport->qos.sched_node->parent; ++ enum sched_node_type curr_type = vport->qos.sched_node->type; ++ int err; ++ ++ esw_assert_qos_lock_held(vport->dev->priv.eswitch); ++ parent = parent ?: curr_parent; ++ if (curr_type == type && curr_parent == parent) ++ return 0; ++ ++ err = esw_qos_vport_tc_check_type(curr_type, type, extack); ++ if (err) ++ return err; ++ ++ esw_qos_vport_disable(vport, extack); ++ ++ err = esw_qos_vport_enable(vport, type, parent, extack); ++ if (err) ++ esw_qos_vport_enable(vport, curr_type, curr_parent, NULL); ++ ++ return err; ++} ++ + static int esw_qos_vport_update_parent(struct mlx5_vport *vport, struct mlx5_esw_sched_node *parent, + struct netlink_ext_ack *extack) + { + struct mlx5_eswitch *esw = vport->dev->priv.eswitch; + struct mlx5_esw_sched_node *curr_parent; +- int err; ++ enum sched_node_type type; + + esw_assert_qos_lock_held(esw); + curr_parent = vport->qos.sched_node->parent; +@@ -773,16 +1097,17 @@ static int esw_qos_vport_update_parent(struct mlx5_vport *vport, struct mlx5_esw + if (curr_parent == parent) + return 0; + +- esw_qos_vport_disable(vport, extack); +- +- err = esw_qos_vport_enable(vport, parent, extack); +- if (err) { +- if (esw_qos_vport_enable(vport, curr_parent, NULL)) +- esw_warn(parent->esw->dev, "vport restore QoS failed (vport=%d)\n", +- vport->vport); +- } ++ /* Set vport QoS type based on parent node type if different from ++ * default QoS; otherwise, use the vport's current QoS type. 
++ */ ++ if (parent->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) ++ type = SCHED_NODE_TYPE_RATE_LIMITER; ++ else if (curr_parent->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) ++ type = SCHED_NODE_TYPE_VPORT; ++ else ++ type = vport->qos.sched_node->type; + +- return err; ++ return esw_qos_vport_update(vport, type, parent, extack); + } + + static void +@@ -1112,6 +1437,16 @@ static bool esw_qos_validate_unsupported_tc_bw(struct mlx5_eswitch *esw, + return true; + } + ++static bool esw_qos_vport_validate_unsupported_tc_bw(struct mlx5_vport *vport, ++ u32 *tc_bw) ++{ ++ struct mlx5_eswitch *esw = vport->qos.sched_node ? ++ vport->qos.sched_node->parent->esw : ++ vport->dev->priv.eswitch; ++ ++ return esw_qos_validate_unsupported_tc_bw(esw, tc_bw); ++} ++ + static bool esw_qos_tc_bw_disabled(u32 *tc_bw) + { + int i; +@@ -1187,9 +1522,50 @@ int mlx5_esw_devlink_rate_leaf_tc_bw_set(struct devlink_rate *rate_leaf, + u32 *tc_bw, + struct netlink_ext_ack *extack) + { +- NL_SET_ERR_MSG_MOD(extack, +- "TC bandwidth shares are not supported on leafs"); +- return -EOPNOTSUPP; ++ struct mlx5_esw_sched_node *vport_node; ++ struct mlx5_vport *vport = priv; ++ struct mlx5_eswitch *esw; ++ bool disable; ++ int err = 0; ++ ++ esw = vport->dev->priv.eswitch; ++ if (!mlx5_esw_allowed(esw)) ++ return -EPERM; ++ ++ disable = esw_qos_tc_bw_disabled(tc_bw); ++ esw_qos_lock(esw); ++ ++ if (!esw_qos_vport_validate_unsupported_tc_bw(vport, tc_bw)) { ++ NL_SET_ERR_MSG_MOD(extack, ++ "E-Switch traffic classes number is not supported"); ++ err = -EOPNOTSUPP; ++ goto unlock; ++ } ++ ++ vport_node = vport->qos.sched_node; ++ if (disable && !vport_node) ++ goto unlock; ++ ++ if (disable) { ++ if (vport_node->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) ++ err = esw_qos_vport_update(vport, SCHED_NODE_TYPE_VPORT, ++ NULL, extack); ++ goto unlock; ++ } ++ ++ if (!vport_node) { ++ err = mlx5_esw_qos_vport_enable(vport, ++ SCHED_NODE_TYPE_TC_ARBITER_TSAR, ++ NULL, 0, 0, extack); ++ vport_node = 
vport->qos.sched_node; ++ } else { ++ err = esw_qos_vport_update(vport, ++ SCHED_NODE_TYPE_TC_ARBITER_TSAR, ++ NULL, extack); ++ } ++unlock: ++ esw_qos_unlock(esw); ++ return err; + } + + int mlx5_esw_devlink_rate_node_tc_bw_set(struct devlink_rate *rate_node, +@@ -1311,10 +1687,16 @@ int mlx5_esw_qos_vport_update_parent(struct mlx5_vport *vport, struct mlx5_esw_s + } + + esw_qos_lock(esw); +- if (!vport->qos.sched_node && parent) +- err = mlx5_esw_qos_vport_enable(vport, SCHED_NODE_TYPE_VPORT, parent, 0, 0, extack); +- else if (vport->qos.sched_node) ++ if (!vport->qos.sched_node && parent) { ++ enum sched_node_type type; ++ ++ type = parent->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR ? ++ SCHED_NODE_TYPE_RATE_LIMITER : SCHED_NODE_TYPE_VPORT; ++ err = mlx5_esw_qos_vport_enable(vport, type, parent, 0, 0, ++ extack); ++ } else if (vport->qos.sched_node) { + err = esw_qos_vport_update_parent(vport, parent, extack); ++ } + esw_qos_unlock(esw); + return err; + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +index 8573d36785f4..d59fdcb29cb8 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +@@ -212,10 +212,20 @@ struct mlx5_vport { + + struct mlx5_vport_info info; + +- /* Protected with the E-Switch qos domain lock. */ ++ /* Protected with the E-Switch qos domain lock. The Vport QoS can ++ * either be disabled (sched_node is NULL) or in one of three states: ++ * 1. Regular QoS (sched_node is a vport node). ++ * 2. TC QoS enabled on the vport (sched_node is a TC arbiter). ++ * 3. TC QoS enabled on the vport's parent node ++ * (sched_node is a rate limit node). ++ * When TC is enabled in either mode, the vport owns vport TC scheduling ++ * nodes. ++ */ + struct { +- /* Vport scheduling element node. */ ++ /* Vport scheduling node. */ + struct mlx5_esw_sched_node *sched_node; ++ /* Array of vport traffic class scheduling nodes. 
*/ ++ struct mlx5_esw_sched_node **sched_nodes; + } qos; + + u16 vport; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1421-net-mlx5-manage-tc-arbiter-nodes-and-implement-full-support-.patch b/SOURCES/1421-net-mlx5-manage-tc-arbiter-nodes-and-implement-full-support-.patch new file mode 100644 index 000000000..dfff97348 --- /dev/null +++ b/SOURCES/1421-net-mlx5-manage-tc-arbiter-nodes-and-implement-full-support-.patch @@ -0,0 +1,507 @@ +From 68428c109ec29f2fbc9421137b13c0a91bc041d1 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:19 -0400 +Subject: [PATCH] net/mlx5: Manage TC arbiter nodes and implement full support + for tc-bw + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit cf7e73770d1bc0492b20e2b0222a59c6bafbd8ff +Author: Carolina Jubran +Date: Sun Jun 29 17:21:37 2025 +0300 + + net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw + + Introduce support for managing Traffic Class (TC) arbiter nodes and + associated vports TC nodes within the E-Switch QoS hierarchy. This + patch adds support for the new scheduling node type, + `SCHED_NODE_TYPE_VPORTS_TC_TSAR`, and implements full support for + setting tc-bw on both vports and nodes. + + Key changes include: + + - Introduced the new scheduling node type, + `SCHED_NODE_TYPE_VPORTS_TC_TSAR`, for managing vports within the TC + arbiter node. + + - New helper functions for creating and destroying vports TC nodes + under the TC arbiter. + + - Updated the minimum rate normalization function to skip nodes of type + `SCHED_NODE_TYPE_VPORTS_TC_TSAR`. Vports TC TSARs have bandwidth + shares configured on them but not minimum rates, so their `min_rate` + cannot be normalized. + + - Implementation of `esw_qos_tc_arbiter_scheduling_setup()` and + `esw_qos_tc_arbiter_scheduling_teardown()` for initializing and + cleaning up TC arbiter scheduling elements. These functions now fully + support tc-bw configuration on TC arbiter nodes. 
+ + - Introduced a new helper `esw_qos_calculate_tc_bw_divider()` to + compute the total TC bandwidth share, which is used as a divider for + normalizing each TC's share. + + - Added `esw_qos_tc_arbiter_get_bw_shares()` and + `esw_qos_set_tc_arbiter_bw_shares()` to handle the settings of + bandwidth shares for vports traffic class TSARs. + + - `esw_qos_set_tc_arbiter_bw_shares()` normalizes each TC share based + on the total and the firmware's maximum allowed TSAR bandwidth share. + + - Refactored `mlx5_esw_devlink_rate_node_tc_bw_set()` and + `mlx5_esw_devlink_rate_leaf_tc_bw_set()` to fully support configuring + tc-bw on devlink rate nodes and vports, respectively. + + - Refactored `mlx5_esw_qos_node_update_parent()` to ensure that tc-bw + configuration remains compatible with setting a parent on a rate + node, preserving level hierarchy functionality. + + - Refactored `esw_qos_calc_bw_share()` to generalize its input so it + can be used for both minimum rate and bandwidth share calculations. 
+ + Signed-off-by: Carolina Jubran + Reviewed-by: Cosmin Ratiu + Signed-off-by: Tariq Toukan + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250629142138.361537-8-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +index 120d19eeb46b..c24d1f584a46 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +@@ -67,6 +67,7 @@ enum sched_node_type { + SCHED_NODE_TYPE_TC_ARBITER_TSAR, + SCHED_NODE_TYPE_RATE_LIMITER, + SCHED_NODE_TYPE_VPORT_TC, ++ SCHED_NODE_TYPE_VPORTS_TC_TSAR, + }; + + static const char * const sched_node_type_str[] = { +@@ -75,6 +76,7 @@ static const char * const sched_node_type_str[] = { + [SCHED_NODE_TYPE_TC_ARBITER_TSAR] = "TC Arbiter TSAR", + [SCHED_NODE_TYPE_RATE_LIMITER] = "Rate Limiter", + [SCHED_NODE_TYPE_VPORT_TC] = "vport TC", ++ [SCHED_NODE_TYPE_VPORTS_TC_TSAR] = "vports TC TSAR", + }; + + struct mlx5_esw_sched_node { +@@ -187,6 +189,11 @@ mlx5_esw_qos_vport_get_parent(const struct mlx5_vport *vport) + static void esw_qos_sched_elem_warn(struct mlx5_esw_sched_node *node, int err, const char *op) + { + switch (node->type) { ++ case SCHED_NODE_TYPE_VPORTS_TC_TSAR: ++ esw_warn(node->esw->dev, ++ "E-Switch %s %s scheduling element failed (tc=%d,err=%d)\n", ++ op, sched_node_type_str[node->type], node->tc, err); ++ break; + case SCHED_NODE_TYPE_VPORT_TC: + esw_warn(node->esw->dev, + "E-Switch %s %s scheduling element failed (vport=%d,tc=%d,err=%d)\n", +@@ -345,11 +352,13 @@ static u32 esw_qos_calculate_min_rate_divider(struct mlx5_eswitch *esw, + return 0; + } + +-static u32 esw_qos_calc_bw_share(u32 min_rate, u32 divider, u32 fw_max) ++static u32 esw_qos_calc_bw_share(u32 value, u32 divider, u32 fw_max) + { + if (!divider) + return 0; +- return min_t(u32, max_t(u32, DIV_ROUND_UP(min_rate, divider), MLX5_MIN_BW_SHARE), 
fw_max); ++ return min_t(u32, fw_max, ++ max_t(u32, ++ DIV_ROUND_UP(value, divider), MLX5_MIN_BW_SHARE)); + } + + static void esw_qos_update_sched_node_bw_share(struct mlx5_esw_sched_node *node, +@@ -376,7 +385,13 @@ static void esw_qos_normalize_min_rate(struct mlx5_eswitch *esw, + if (node->esw != esw || node->ix == esw->qos.root_tsar_ix) + continue; + +- esw_qos_update_sched_node_bw_share(node, divider, extack); ++ /* Vports TC TSARs don't have a minimum rate configured, ++ * so there's no need to update the bw_share on them. ++ */ ++ if (node->type != SCHED_NODE_TYPE_VPORTS_TC_TSAR) { ++ esw_qos_update_sched_node_bw_share(node, divider, ++ extack); ++ } + + if (list_empty(&node->children)) + continue; +@@ -385,6 +400,20 @@ static void esw_qos_normalize_min_rate(struct mlx5_eswitch *esw, + } + } + ++static u32 esw_qos_calculate_tc_bw_divider(u32 *tc_bw) ++{ ++ u32 total = 0; ++ int i; ++ ++ for (i = 0; i < DEVLINK_RATE_TCS_MAX; i++) ++ total += tc_bw[i]; ++ ++ /* If total is zero, tc-bw config is disabled and we shouldn't reach ++ * here. ++ */ ++ return WARN_ON(!total) ? 
1 : total; ++} ++ + static int esw_qos_set_node_min_rate(struct mlx5_esw_sched_node *node, + u32 min_rate, struct netlink_ext_ack *extack) + { +@@ -527,6 +556,149 @@ static void esw_qos_destroy_node(struct mlx5_esw_sched_node *node, struct netlin + __esw_qos_free_node(node); + } + ++static int esw_qos_create_vports_tc_node(struct mlx5_esw_sched_node *parent, ++ u8 tc, struct netlink_ext_ack *extack) ++{ ++ u32 tsar_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {}; ++ struct mlx5_core_dev *dev = parent->esw->dev; ++ struct mlx5_esw_sched_node *vports_tc_node; ++ void *attr; ++ int err; ++ ++ if (!mlx5_qos_element_type_supported( ++ dev, ++ SCHEDULING_CONTEXT_ELEMENT_TYPE_TSAR, ++ SCHEDULING_HIERARCHY_E_SWITCH) || ++ !mlx5_qos_tsar_type_supported(dev, ++ TSAR_ELEMENT_TSAR_TYPE_DWRR, ++ SCHEDULING_HIERARCHY_E_SWITCH)) ++ return -EOPNOTSUPP; ++ ++ vports_tc_node = __esw_qos_alloc_node(parent->esw, 0, ++ SCHED_NODE_TYPE_VPORTS_TC_TSAR, ++ parent); ++ if (!vports_tc_node) { ++ NL_SET_ERR_MSG_MOD(extack, "E-Switch alloc node failed"); ++ esw_warn(dev, "Failed to alloc vports TC node (tc=%d)\n", tc); ++ return -ENOMEM; ++ } ++ ++ attr = MLX5_ADDR_OF(scheduling_context, tsar_ctx, element_attributes); ++ MLX5_SET(tsar_element, attr, tsar_type, TSAR_ELEMENT_TSAR_TYPE_DWRR); ++ MLX5_SET(tsar_element, attr, traffic_class, tc); ++ MLX5_SET(scheduling_context, tsar_ctx, parent_element_id, parent->ix); ++ MLX5_SET(scheduling_context, tsar_ctx, element_type, ++ SCHEDULING_CONTEXT_ELEMENT_TYPE_TSAR); ++ ++ err = esw_qos_node_create_sched_element(vports_tc_node, tsar_ctx, ++ extack); ++ if (err) ++ goto err_create_sched_element; ++ ++ vports_tc_node->tc = tc; ++ ++ return 0; ++ ++err_create_sched_element: ++ __esw_qos_free_node(vports_tc_node); ++ return err; ++} ++ ++static void ++esw_qos_tc_arbiter_get_bw_shares(struct mlx5_esw_sched_node *tc_arbiter_node, ++ u32 *tc_bw) ++{ ++ struct mlx5_esw_sched_node *vports_tc_node; ++ ++ list_for_each_entry(vports_tc_node, 
&tc_arbiter_node->children, entry) ++ tc_bw[vports_tc_node->tc] = vports_tc_node->bw_share; ++} ++ ++static void ++esw_qos_set_tc_arbiter_bw_shares(struct mlx5_esw_sched_node *tc_arbiter_node, ++ u32 *tc_bw, struct netlink_ext_ack *extack) ++{ ++ struct mlx5_eswitch *esw = tc_arbiter_node->esw; ++ struct mlx5_esw_sched_node *vports_tc_node; ++ u32 divider, fw_max_bw_share; ++ ++ fw_max_bw_share = MLX5_CAP_QOS(esw->dev, max_tsar_bw_share); ++ divider = esw_qos_calculate_tc_bw_divider(tc_bw); ++ list_for_each_entry(vports_tc_node, &tc_arbiter_node->children, entry) { ++ u8 tc = vports_tc_node->tc; ++ u32 bw_share; ++ ++ bw_share = tc_bw[tc] * fw_max_bw_share; ++ bw_share = esw_qos_calc_bw_share(bw_share, divider, ++ fw_max_bw_share); ++ esw_qos_sched_elem_config(vports_tc_node, 0, bw_share, extack); ++ } ++} ++ ++static void ++esw_qos_destroy_vports_tc_nodes(struct mlx5_esw_sched_node *tc_arbiter_node, ++ struct netlink_ext_ack *extack) ++{ ++ struct mlx5_esw_sched_node *vports_tc_node, *tmp; ++ ++ list_for_each_entry_safe(vports_tc_node, tmp, ++ &tc_arbiter_node->children, entry) ++ esw_qos_destroy_node(vports_tc_node, extack); ++} ++ ++static int ++esw_qos_create_vports_tc_nodes(struct mlx5_esw_sched_node *tc_arbiter_node, ++ struct netlink_ext_ack *extack) ++{ ++ struct mlx5_eswitch *esw = tc_arbiter_node->esw; ++ int err, i, num_tcs = esw_qos_num_tcs(esw->dev); ++ ++ for (i = 0; i < num_tcs; i++) { ++ err = esw_qos_create_vports_tc_node(tc_arbiter_node, i, extack); ++ if (err) ++ goto err_tc_node_create; ++ } ++ ++ return 0; ++ ++err_tc_node_create: ++ esw_qos_destroy_vports_tc_nodes(tc_arbiter_node, NULL); ++ return err; ++} ++ ++static int esw_qos_create_tc_arbiter_sched_elem( ++ struct mlx5_esw_sched_node *tc_arbiter_node, ++ struct netlink_ext_ack *extack) ++{ ++ u32 tsar_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {}; ++ u32 tsar_parent_ix; ++ void *attr; ++ ++ if (!mlx5_qos_tsar_type_supported(tc_arbiter_node->esw->dev, ++ TSAR_ELEMENT_TSAR_TYPE_TC_ARB, ++ 
SCHEDULING_HIERARCHY_E_SWITCH)) { ++ NL_SET_ERR_MSG_MOD(extack, ++ "E-Switch TC Arbiter scheduling element is not supported"); ++ return -EOPNOTSUPP; ++ } ++ ++ attr = MLX5_ADDR_OF(scheduling_context, tsar_ctx, element_attributes); ++ MLX5_SET(tsar_element, attr, tsar_type, TSAR_ELEMENT_TSAR_TYPE_TC_ARB); ++ tsar_parent_ix = tc_arbiter_node->parent ? tc_arbiter_node->parent->ix : ++ tc_arbiter_node->esw->qos.root_tsar_ix; ++ MLX5_SET(scheduling_context, tsar_ctx, parent_element_id, ++ tsar_parent_ix); ++ MLX5_SET(scheduling_context, tsar_ctx, element_type, ++ SCHEDULING_CONTEXT_ELEMENT_TYPE_TSAR); ++ MLX5_SET(scheduling_context, tsar_ctx, max_average_bw, ++ tc_arbiter_node->max_rate); ++ MLX5_SET(scheduling_context, tsar_ctx, bw_share, ++ tc_arbiter_node->bw_share); ++ ++ return esw_qos_node_create_sched_element(tc_arbiter_node, tsar_ctx, ++ extack); ++} ++ + static struct mlx5_esw_sched_node * + __esw_qos_create_vports_sched_node(struct mlx5_eswitch *esw, struct mlx5_esw_sched_node *parent, + struct netlink_ext_ack *extack) +@@ -591,6 +763,9 @@ static void __esw_qos_destroy_node(struct mlx5_esw_sched_node *node, struct netl + { + struct mlx5_eswitch *esw = node->esw; + ++ if (node->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) ++ esw_qos_destroy_vports_tc_nodes(node, extack); ++ + trace_mlx5_esw_node_qos_destroy(esw->dev, node, node->ix); + esw_qos_destroy_node(node, extack); + esw_qos_normalize_min_rate(esw, NULL, extack); +@@ -685,13 +860,38 @@ static void esw_qos_put(struct mlx5_eswitch *esw) + static void + esw_qos_tc_arbiter_scheduling_teardown(struct mlx5_esw_sched_node *node, + struct netlink_ext_ack *extack) +-{} ++{ ++ /* Clean up all Vports TC nodes within the TC arbiter node. */ ++ esw_qos_destroy_vports_tc_nodes(node, extack); ++ /* Destroy the scheduling element for the TC arbiter node itself. 
*/ ++ esw_qos_node_destroy_sched_element(node, extack); ++} + + static int esw_qos_tc_arbiter_scheduling_setup(struct mlx5_esw_sched_node *node, + struct netlink_ext_ack *extack) + { +- NL_SET_ERR_MSG_MOD(extack, "TC arbiter elements are not supported."); +- return -EOPNOTSUPP; ++ u32 curr_ix = node->ix; ++ int err; ++ ++ err = esw_qos_create_tc_arbiter_sched_elem(node, extack); ++ if (err) ++ return err; ++ /* Initialize the vports TC nodes within created TC arbiter TSAR. */ ++ err = esw_qos_create_vports_tc_nodes(node, extack); ++ if (err) ++ goto err_vports_tc_nodes; ++ ++ node->type = SCHED_NODE_TYPE_TC_ARBITER_TSAR; ++ ++ return 0; ++ ++err_vports_tc_nodes: ++ /* If initialization fails, clean up the scheduling element ++ * for the TC arbiter node. ++ */ ++ esw_qos_node_destroy_sched_element(node, NULL); ++ node->ix = curr_ix; ++ return err; + } + + static int +@@ -1064,6 +1264,7 @@ static int esw_qos_vport_update(struct mlx5_vport *vport, + { + struct mlx5_esw_sched_node *curr_parent = vport->qos.sched_node->parent; + enum sched_node_type curr_type = vport->qos.sched_node->type; ++ u32 curr_tc_bw[DEVLINK_RATE_TCS_MAX] = {0}; + int err; + + esw_assert_qos_lock_held(vport->dev->priv.eswitch); +@@ -1075,11 +1276,23 @@ static int esw_qos_vport_update(struct mlx5_vport *vport, + if (err) + return err; + ++ if (curr_type == SCHED_NODE_TYPE_TC_ARBITER_TSAR && curr_type == type) { ++ esw_qos_tc_arbiter_get_bw_shares(vport->qos.sched_node, ++ curr_tc_bw); ++ } ++ + esw_qos_vport_disable(vport, extack); + + err = esw_qos_vport_enable(vport, type, parent, extack); +- if (err) ++ if (err) { + esw_qos_vport_enable(vport, curr_type, curr_parent, NULL); ++ extack = NULL; ++ } ++ ++ if (curr_type == SCHED_NODE_TYPE_TC_ARBITER_TSAR && curr_type == type) { ++ esw_qos_set_tc_arbiter_bw_shares(vport->qos.sched_node, ++ curr_tc_bw, extack); ++ } + + return err; + } +@@ -1563,6 +1776,8 @@ int mlx5_esw_devlink_rate_leaf_tc_bw_set(struct devlink_rate *rate_leaf, + 
SCHED_NODE_TYPE_TC_ARBITER_TSAR, + NULL, extack); + } ++ if (!err) ++ esw_qos_set_tc_arbiter_bw_shares(vport_node, tc_bw, extack); + unlock: + esw_qos_unlock(esw); + return err; +@@ -1592,6 +1807,8 @@ int mlx5_esw_devlink_rate_node_tc_bw_set(struct devlink_rate *rate_node, + } + + err = esw_qos_node_enable_tc_arbitration(node, extack); ++ if (!err) ++ esw_qos_set_tc_arbiter_bw_shares(node, tc_bw, extack); + unlock: + esw_qos_unlock(esw); + return err; +@@ -1716,6 +1933,20 @@ int mlx5_esw_devlink_rate_leaf_parent_set(struct devlink_rate *devlink_rate, + return mlx5_esw_qos_vport_update_parent(vport, node, extack); + } + ++static bool esw_qos_is_node_empty(struct mlx5_esw_sched_node *node) ++{ ++ if (list_empty(&node->children)) ++ return true; ++ ++ if (node->type != SCHED_NODE_TYPE_TC_ARBITER_TSAR) ++ return false; ++ ++ node = list_first_entry(&node->children, struct mlx5_esw_sched_node, ++ entry); ++ ++ return esw_qos_is_node_empty(node); ++} ++ + static int + mlx5_esw_qos_node_validate_set_parent(struct mlx5_esw_sched_node *node, + struct mlx5_esw_sched_node *parent, +@@ -1729,13 +1960,26 @@ mlx5_esw_qos_node_validate_set_parent(struct mlx5_esw_sched_node *node, + return -EOPNOTSUPP; + } + +- if (!list_empty(&node->children)) { ++ if (!esw_qos_is_node_empty(node)) { + NL_SET_ERR_MSG_MOD(extack, + "Cannot reassign a node that contains rate objects"); + return -EOPNOTSUPP; + } + ++ if (parent && parent->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) { ++ NL_SET_ERR_MSG_MOD(extack, ++ "Cannot attach a node to a parent with TC bandwidth configured"); ++ return -EOPNOTSUPP; ++ } ++ + new_level = parent ? parent->level + 1 : 2; ++ if (node->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) { ++ /* Increase by one to account for the vports TC scheduling ++ * element. 
++ */ ++ new_level += 1; ++ } ++ + max_level = 1 << MLX5_CAP_QOS(node->esw->dev, log_esw_max_sched_depth); + if (new_level > max_level) { + NL_SET_ERR_MSG_MOD(extack, +@@ -1746,6 +1990,32 @@ mlx5_esw_qos_node_validate_set_parent(struct mlx5_esw_sched_node *node, + return 0; + } + ++static int ++esw_qos_tc_arbiter_node_update_parent(struct mlx5_esw_sched_node *node, ++ struct mlx5_esw_sched_node *parent, ++ struct netlink_ext_ack *extack) ++{ ++ struct mlx5_esw_sched_node *curr_parent = node->parent; ++ u32 curr_tc_bw[DEVLINK_RATE_TCS_MAX] = {0}; ++ struct mlx5_eswitch *esw = node->esw; ++ int err; ++ ++ esw_qos_tc_arbiter_get_bw_shares(node, curr_tc_bw); ++ esw_qos_tc_arbiter_scheduling_teardown(node, extack); ++ esw_qos_node_set_parent(node, parent); ++ err = esw_qos_tc_arbiter_scheduling_setup(node, extack); ++ if (err) { ++ esw_qos_node_set_parent(node, curr_parent); ++ if (esw_qos_tc_arbiter_scheduling_setup(node, extack)) { ++ esw_warn(esw->dev, "Node restore QoS failed\n"); ++ return err; ++ } ++ } ++ esw_qos_set_tc_arbiter_bw_shares(node, curr_tc_bw, extack); ++ ++ return err; ++} ++ + static int esw_qos_vports_node_update_parent(struct mlx5_esw_sched_node *node, + struct mlx5_esw_sched_node *parent, + struct netlink_ext_ack *extack) +@@ -1792,7 +2062,13 @@ static int mlx5_esw_qos_node_update_parent(struct mlx5_esw_sched_node *node, + + esw_qos_lock(esw); + curr_parent = node->parent; +- err = esw_qos_vports_node_update_parent(node, parent, extack); ++ if (node->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) { ++ err = esw_qos_tc_arbiter_node_update_parent(node, parent, ++ extack); ++ } else { ++ err = esw_qos_vports_node_update_parent(node, parent, extack); ++ } ++ + if (err) + goto out; + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1422-net-mlx5-hws-remove-unused-create-dest-array-parameter.patch b/SOURCES/1422-net-mlx5-hws-remove-unused-create-dest-array-parameter.patch new file mode 100644 index 000000000..258d8b561 --- /dev/null +++ 
b/SOURCES/1422-net-mlx5-hws-remove-unused-create-dest-array-parameter.patch @@ -0,0 +1,115 @@ +From b46008ad0339b5ac38a1db24879b4e9304e368c1 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:19 -0400 +Subject: [PATCH] net/mlx5: HWS, remove unused create_dest_array parameter + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 60afb51c89414b3d0061226415651f29a7eaf932 +Author: Vlad Dogaru +Date: Thu Jul 3 21:54:22 2025 +0300 + + net/mlx5: HWS, remove unused create_dest_array parameter + + `flow_source` is not used anywhere in mlx5hws_action_create_dest_array. + + Signed-off-by: Vlad Dogaru + Signed-off-by: Yevgeny Kliteynik + Reviewed-by: Simon Horman + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250703185431.445571-2-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c +index 447ea3f8722c..396804369b00 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c +@@ -1358,12 +1358,9 @@ mlx5hws_action_create_modify_header(struct mlx5hws_context *ctx, + } + + struct mlx5hws_action * +-mlx5hws_action_create_dest_array(struct mlx5hws_context *ctx, +- size_t num_dest, ++mlx5hws_action_create_dest_array(struct mlx5hws_context *ctx, size_t num_dest, + struct mlx5hws_action_dest_attr *dests, +- bool ignore_flow_level, +- u32 flow_source, +- u32 flags) ++ bool ignore_flow_level, u32 flags) + { + struct mlx5hws_cmd_set_fte_dest *dest_list = NULL; + struct mlx5hws_cmd_ft_create_attr ft_attr = {0}; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c +index bf4643d0ce17..57592b92e24b 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c ++++ 
b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c +@@ -571,14 +571,12 @@ static void mlx5_fs_put_dest_action_sampler(struct mlx5_fs_hws_context *fs_ctx, + static struct mlx5hws_action * + mlx5_fs_create_action_dest_array(struct mlx5hws_context *ctx, + struct mlx5hws_action_dest_attr *dests, +- u32 num_of_dests, bool ignore_flow_level, +- u32 flow_source) ++ u32 num_of_dests, bool ignore_flow_level) + { + u32 flags = MLX5HWS_ACTION_FLAG_HWS_FDB | MLX5HWS_ACTION_FLAG_SHARED; + + return mlx5hws_action_create_dest_array(ctx, num_of_dests, dests, +- ignore_flow_level, +- flow_source, flags); ++ ignore_flow_level, flags); + } + + static struct mlx5hws_action * +@@ -1015,7 +1013,6 @@ static int mlx5_fs_fte_get_hws_actions(struct mlx5_flow_root_namespace *ns, + } + (*ractions)[num_actions++].action = dest_actions->dest; + } else if (num_dest_actions > 1) { +- u32 flow_source = fte->act_dests.flow_context.flow_source; + bool ignore_flow_level; + + if (num_actions == MLX5_FLOW_CONTEXT_ACTION_MAX || +@@ -1025,10 +1022,10 @@ static int mlx5_fs_fte_get_hws_actions(struct mlx5_flow_root_namespace *ns, + } + ignore_flow_level = + !!(fte_action->flags & FLOW_ACT_IGNORE_FLOW_LEVEL); +- tmp_action = mlx5_fs_create_action_dest_array(ctx, dest_actions, +- num_dest_actions, +- ignore_flow_level, +- flow_source); ++ tmp_action = ++ mlx5_fs_create_action_dest_array(ctx, dest_actions, ++ num_dest_actions, ++ ignore_flow_level); + if (!tmp_action) { + err = -EOPNOTSUPP; + goto free_actions; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h +index d8ac6c196211..a1295a311b70 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h +@@ -727,18 +727,14 @@ mlx5hws_action_create_push_vlan(struct mlx5hws_context *ctx, u32 flags); + * @dests: The destination array. 
Each contains a destination action and can + * have additional actions. + * @ignore_flow_level: Whether to turn on 'ignore_flow_level' for this dest. +- * @flow_source: Source port of the traffic for this actions. + * @flags: Action creation flags (enum mlx5hws_action_flags). + * + * Return: pointer to mlx5hws_action on success NULL otherwise. + */ + struct mlx5hws_action * +-mlx5hws_action_create_dest_array(struct mlx5hws_context *ctx, +- size_t num_dest, ++mlx5hws_action_create_dest_array(struct mlx5hws_context *ctx, size_t num_dest, + struct mlx5hws_action_dest_attr *dests, +- bool ignore_flow_level, +- u32 flow_source, +- u32 flags); ++ bool ignore_flow_level, u32 flags); + + /** + * mlx5hws_action_create_insert_header - Create insert header action. +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1423-net-mlx5-hws-remove-incorrect-comment.patch b/SOURCES/1423-net-mlx5-hws-remove-incorrect-comment.patch new file mode 100644 index 000000000..f03117416 --- /dev/null +++ b/SOURCES/1423-net-mlx5-hws-remove-incorrect-comment.patch @@ -0,0 +1,41 @@ +From f894d6ec31f57af4bcee260cb6ec1df09596a8df Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:19 -0400 +Subject: [PATCH] net/mlx5: HWS, remove incorrect comment + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 26b06579d50d5f0462c45b63291b90ea85664237 +Author: Yevgeny Kliteynik +Date: Thu Jul 3 21:54:23 2025 +0300 + + net/mlx5: HWS, remove incorrect comment + + Removing incorrect comment section that is probably some + copy-paste artifact. 
+ + Signed-off-by: Yevgeny Kliteynik + Reviewed-by: Vlad Dogaru + Reviewed-by: Simon Horman + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250703185431.445571-3-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +index 9e057f808ea5..665e6e285db5 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +@@ -876,8 +876,6 @@ int mlx5hws_bwc_rule_create_simple(struct mlx5hws_bwc_rule *bwc_rule, + + /* At this point the rule wasn't added. + * It could be because there was collision, or some other problem. +- * If we don't dive deeper than API, the only thing we know is that +- * the status of completion is RTE_FLOW_OP_ERROR. + * Try rehash by size and insert rule again - last chance. + */ + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1424-net-mlx5-hws-export-rule-skip-logic.patch b/SOURCES/1424-net-mlx5-hws-export-rule-skip-logic.patch new file mode 100644 index 000000000..4eefa950f --- /dev/null +++ b/SOURCES/1424-net-mlx5-hws-export-rule-skip-logic.patch @@ -0,0 +1,69 @@ +From 67fe87adf964e4b80fc63dbdebc34853c49aae5e Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:20 -0400 +Subject: [PATCH] net/mlx5: HWS, Export rule skip logic + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit d8e7ab591b506b56dc96021674938d0c7905e656 +Author: Vlad Dogaru +Date: Thu Jul 3 21:54:24 2025 +0300 + + net/mlx5: HWS, Export rule skip logic + + The bwc layer will use `mlx5hws_rule_skip` to keep track of numbers of + RX and TX rules individually, so export this function for future usage. 
+ + Signed-off-by: Vlad Dogaru + Reviewed-by: Yevgeny Kliteynik + Signed-off-by: Mark Bloch + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/20250703185431.445571-4-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.c +index 5342a4cc7194..4883e4e1d251 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.c +@@ -3,10 +3,8 @@ + + #include "internal.h" + +-static void hws_rule_skip(struct mlx5hws_matcher *matcher, +- struct mlx5hws_match_template *mt, +- u32 flow_source, +- bool *skip_rx, bool *skip_tx) ++void mlx5hws_rule_skip(struct mlx5hws_matcher *matcher, u32 flow_source, ++ bool *skip_rx, bool *skip_tx) + { + /* By default FDB rules are added to both RX and TX */ + *skip_rx = false; +@@ -66,7 +64,8 @@ static void hws_rule_init_dep_wqe(struct mlx5hws_send_ring_dep_wqe *dep_wqe, + attr->rule_idx : 0; + + if (tbl->type == MLX5HWS_TABLE_TYPE_FDB) { +- hws_rule_skip(matcher, mt, attr->flow_source, &skip_rx, &skip_tx); ++ mlx5hws_rule_skip(matcher, attr->flow_source, ++ &skip_rx, &skip_tx); + + if (!skip_rx) { + dep_wqe->rtc_0 = matcher->match_ste.rtc_0_id; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.h +index 1c47a9c11572..d0f082b8dbf5 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.h +@@ -69,6 +69,9 @@ struct mlx5hws_rule { + */ + }; + ++void mlx5hws_rule_skip(struct mlx5hws_matcher *matcher, u32 flow_source, ++ bool *skip_rx, bool *skip_tx); ++ + void mlx5hws_rule_free_action_ste(struct mlx5hws_action_ste_chunk *action_ste); + + int mlx5hws_rule_move_hws_remove(struct mlx5hws_rule *rule, +-- +2.50.1 (Apple Git-155) + diff --git 
a/SOURCES/1425-net-mlx5-hws-refactor-rule-skip-logic.patch b/SOURCES/1425-net-mlx5-hws-refactor-rule-skip-logic.patch new file mode 100644 index 000000000..1c76c607a --- /dev/null +++ b/SOURCES/1425-net-mlx5-hws-refactor-rule-skip-logic.patch @@ -0,0 +1,65 @@ +From 25f4e39ec27d87699dc411f887d1b118c76f99b8 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:20 -0400 +Subject: [PATCH] net/mlx5: HWS, Refactor rule skip logic + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 3dcac700d20b9e426386d7f59f1601db038fbf1c +Author: Vlad Dogaru +Date: Thu Jul 3 21:54:25 2025 +0300 + + net/mlx5: HWS, Refactor rule skip logic + + Reduce nesting by adding a couple of early return statements. + + Signed-off-by: Vlad Dogaru + Reviewed-by: Yevgeny Kliteynik + Signed-off-by: Mark Bloch + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/20250703185431.445571-5-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.c +index 4883e4e1d251..a94f094e72ba 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.c +@@ -12,20 +12,21 @@ void mlx5hws_rule_skip(struct mlx5hws_matcher *matcher, u32 flow_source, + + if (flow_source == MLX5_FLOW_CONTEXT_FLOW_SOURCE_LOCAL_VPORT) { + *skip_rx = true; +- } else if (flow_source == MLX5_FLOW_CONTEXT_FLOW_SOURCE_UPLINK) { ++ return; ++ } ++ ++ if (flow_source == MLX5_FLOW_CONTEXT_FLOW_SOURCE_UPLINK) { + *skip_tx = true; +- } else { +- /* If no flow source was set for current rule, +- * check for flow source in matcher attributes. 
+- */ +- if (matcher->attr.optimize_flow_src) { +- *skip_tx = +- matcher->attr.optimize_flow_src == MLX5HWS_MATCHER_FLOW_SRC_WIRE; +- *skip_rx = +- matcher->attr.optimize_flow_src == MLX5HWS_MATCHER_FLOW_SRC_VPORT; +- return; +- } ++ return; + } ++ ++ /* If no flow source was set for current rule, ++ * check for flow source in matcher attributes. ++ */ ++ *skip_tx = matcher->attr.optimize_flow_src == ++ MLX5HWS_MATCHER_FLOW_SRC_WIRE; ++ *skip_rx = matcher->attr.optimize_flow_src == ++ MLX5HWS_MATCHER_FLOW_SRC_VPORT; + } + + static void +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1426-net-mlx5-hws-create-stes-directly-from-matcher.patch b/SOURCES/1426-net-mlx5-hws-create-stes-directly-from-matcher.patch new file mode 100644 index 000000000..b182aa9a6 --- /dev/null +++ b/SOURCES/1426-net-mlx5-hws-create-stes-directly-from-matcher.patch @@ -0,0 +1,201 @@ +From b1464c79682d1805a33f13c4bb4448ef2ef54697 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:20 -0400 +Subject: [PATCH] net/mlx5: HWS, Create STEs directly from matcher + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 59807d071724f4e639fa9ebf841b12fb97e5dbf2 +Author: Vlad Dogaru +Date: Thu Jul 3 21:54:26 2025 +0300 + + net/mlx5: HWS, Create STEs directly from matcher + + Matchers were using the pool abstraction solely as a convenience + to allocate two STE ranges. The pool's core functionality, that + of allocating individual items from the range, was unused. + Matchers rely either on the hardware to hash rules into a table, + or on a user-provided index. + + Remove the STE pool from the matcher and allocate the STE ranges + manually instead. 
+ + Signed-off-by: Vlad Dogaru + Reviewed-by: Yevgeny Kliteynik + Reviewed-by: Simon Horman + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250703185431.445571-6-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c +index 91568d6c1dac..f9b75aefcaa7 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c +@@ -118,7 +118,6 @@ static int hws_debug_dump_matcher(struct seq_file *f, struct mlx5hws_matcher *ma + { + enum mlx5hws_table_type tbl_type = matcher->tbl->type; + struct mlx5hws_cmd_ft_query_attr ft_attr = {0}; +- struct mlx5hws_pool *ste_pool; + u64 icm_addr_0 = 0; + u64 icm_addr_1 = 0; + u32 ste_0_id = -1; +@@ -133,12 +132,9 @@ static int hws_debug_dump_matcher(struct seq_file *f, struct mlx5hws_matcher *ma + matcher->end_ft_id, + matcher->col_matcher ? 
HWS_PTR_TO_ID(matcher->col_matcher) : 0); + +- ste_pool = matcher->match_ste.pool; +- if (ste_pool) { +- ste_0_id = mlx5hws_pool_get_base_id(ste_pool); +- if (tbl_type == MLX5HWS_TABLE_TYPE_FDB) +- ste_1_id = mlx5hws_pool_get_base_mirror_id(ste_pool); +- } ++ ste_0_id = matcher->match_ste.ste_0_base; ++ if (tbl_type == MLX5HWS_TABLE_TYPE_FDB) ++ ste_1_id = matcher->match_ste.ste_1_base; + + seq_printf(f, ",%d,%d,%d,%d", + matcher->match_ste.rtc_0_id, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c +index ce28ee1c0e41..b0fcaf508e06 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c +@@ -507,10 +507,8 @@ static int hws_matcher_create_rtc(struct mlx5hws_matcher *matcher) + } + } + +- obj_id = mlx5hws_pool_get_base_id(matcher->match_ste.pool); +- + rtc_attr.pd = ctx->pd_num; +- rtc_attr.ste_base = obj_id; ++ rtc_attr.ste_base = matcher->match_ste.ste_0_base; + rtc_attr.reparse_mode = mlx5hws_context_get_reparse_mode(ctx); + rtc_attr.table_type = mlx5hws_table_get_res_fw_ft_type(tbl->type, false); + hws_matcher_set_rtc_attr_sz(matcher, &rtc_attr, false); +@@ -527,9 +525,7 @@ static int hws_matcher_create_rtc(struct mlx5hws_matcher *matcher) + } + + if (tbl->type == MLX5HWS_TABLE_TYPE_FDB) { +- obj_id = mlx5hws_pool_get_base_mirror_id( +- matcher->match_ste.pool); +- rtc_attr.ste_base = obj_id; ++ rtc_attr.ste_base = matcher->match_ste.ste_1_base; + rtc_attr.table_type = mlx5hws_table_get_res_fw_ft_type(tbl->type, true); + + obj_id = mlx5hws_pool_get_base_mirror_id(ctx->stc_pool); +@@ -588,21 +584,6 @@ hws_matcher_check_attr_sz(struct mlx5hws_cmd_query_caps *caps, + return 0; + } + +-static void hws_matcher_set_pool_attr(struct mlx5hws_pool_attr *attr, +- struct mlx5hws_matcher *matcher) +-{ +- switch (matcher->attr.optimize_flow_src) { +- case MLX5HWS_MATCHER_FLOW_SRC_VPORT: +- 
attr->opt_type = MLX5HWS_POOL_OPTIMIZE_ORIG; +- break; +- case MLX5HWS_MATCHER_FLOW_SRC_WIRE: +- attr->opt_type = MLX5HWS_POOL_OPTIMIZE_MIRROR; +- break; +- default: +- break; +- } +-} +- + static int hws_matcher_check_and_process_at(struct mlx5hws_matcher *matcher, + struct mlx5hws_action_template *at) + { +@@ -683,8 +664,8 @@ static void hws_matcher_set_ip_version_match(struct mlx5hws_matcher *matcher) + + static int hws_matcher_bind_mt(struct mlx5hws_matcher *matcher) + { ++ struct mlx5hws_cmd_ste_create_attr ste_attr = {}; + struct mlx5hws_context *ctx = matcher->tbl->ctx; +- struct mlx5hws_pool_attr pool_attr = {0}; + int ret; + + /* Calculate match, range and hash definers */ +@@ -699,22 +680,39 @@ static int hws_matcher_bind_mt(struct mlx5hws_matcher *matcher) + + hws_matcher_set_ip_version_match(matcher); + +- /* Create an STE pool per matcher*/ +- pool_attr.table_type = matcher->tbl->type; +- pool_attr.pool_type = MLX5HWS_POOL_TYPE_STE; +- pool_attr.alloc_log_sz = matcher->attr.table.sz_col_log + +- matcher->attr.table.sz_row_log; +- hws_matcher_set_pool_attr(&pool_attr, matcher); +- +- matcher->match_ste.pool = mlx5hws_pool_create(ctx, &pool_attr); +- if (!matcher->match_ste.pool) { +- mlx5hws_err(ctx, "Failed to allocate matcher STE pool\n"); +- ret = -EOPNOTSUPP; ++ /* Create an STE range each for RX and TX. */ ++ ste_attr.table_type = FS_FT_FDB_RX; ++ ste_attr.log_obj_range = ++ matcher->attr.optimize_flow_src == ++ MLX5HWS_MATCHER_FLOW_SRC_VPORT ? ++ 0 : matcher->attr.table.sz_col_log + ++ matcher->attr.table.sz_row_log; ++ ++ ret = mlx5hws_cmd_ste_create(ctx->mdev, &ste_attr, ++ &matcher->match_ste.ste_0_base); ++ if (ret) { ++ mlx5hws_err(ctx, "Failed to allocate RX STE range (%d)\n", ret); + goto uninit_match_definer; + } + ++ ste_attr.table_type = FS_FT_FDB_TX; ++ ste_attr.log_obj_range = ++ matcher->attr.optimize_flow_src == ++ MLX5HWS_MATCHER_FLOW_SRC_WIRE ? 
++ 0 : matcher->attr.table.sz_col_log + ++ matcher->attr.table.sz_row_log; ++ ++ ret = mlx5hws_cmd_ste_create(ctx->mdev, &ste_attr, ++ &matcher->match_ste.ste_1_base); ++ if (ret) { ++ mlx5hws_err(ctx, "Failed to allocate TX STE range (%d)\n", ret); ++ goto destroy_rx_ste_range; ++ } ++ + return 0; + ++destroy_rx_ste_range: ++ mlx5hws_cmd_ste_destroy(ctx->mdev, matcher->match_ste.ste_0_base); + uninit_match_definer: + if (!(matcher->flags & MLX5HWS_MATCHER_FLAGS_COLLISION)) + mlx5hws_definer_mt_uninit(ctx, matcher->mt); +@@ -723,9 +721,12 @@ static int hws_matcher_bind_mt(struct mlx5hws_matcher *matcher) + + static void hws_matcher_unbind_mt(struct mlx5hws_matcher *matcher) + { +- mlx5hws_pool_destroy(matcher->match_ste.pool); ++ struct mlx5hws_context *ctx = matcher->tbl->ctx; ++ ++ mlx5hws_cmd_ste_destroy(ctx->mdev, matcher->match_ste.ste_1_base); ++ mlx5hws_cmd_ste_destroy(ctx->mdev, matcher->match_ste.ste_0_base); + if (!(matcher->flags & MLX5HWS_MATCHER_FLAGS_COLLISION)) +- mlx5hws_definer_mt_uninit(matcher->tbl->ctx, matcher->mt); ++ mlx5hws_definer_mt_uninit(ctx, matcher->mt); + } + + static int +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h +index 32e83cddcd60..ae20bcebfdde 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h +@@ -48,7 +48,8 @@ struct mlx5hws_match_template { + struct mlx5hws_matcher_match_ste { + u32 rtc_0_id; + u32 rtc_1_id; +- struct mlx5hws_pool *pool; ++ u32 ste_0_base; ++ u32 ste_1_base; + }; + + enum { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1427-net-mlx5-hws-decouple-matcher-rx-and-tx-sizes.patch b/SOURCES/1427-net-mlx5-hws-decouple-matcher-rx-and-tx-sizes.patch new file mode 100644 index 000000000..f49866b1e --- /dev/null +++ b/SOURCES/1427-net-mlx5-hws-decouple-matcher-rx-and-tx-sizes.patch @@ -0,0 +1,362 @@ +From 
4c392e9f4c088dd99889101b5410690187003a43 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:20 -0400 +Subject: [PATCH] net/mlx5: HWS, Decouple matcher RX and TX sizes + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit c8332ce096913bc6624cdbd5276fa49dc92fa532 +Author: Vlad Dogaru +Date: Thu Jul 3 21:54:27 2025 +0300 + + net/mlx5: HWS, Decouple matcher RX and TX sizes + + Kernel HWS only uses FDB tables and, as such, creates two lower level + containers (RTCs) for each matcher: one for RX and one for TX. Allow + these RTCs to differ in size by converting the size part of the matcher + attribute to a two element array. + + Signed-off-by: Vlad Dogaru + Reviewed-by: Yevgeny Kliteynik + Reviewed-by: Simon Horman + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250703185431.445571-7-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +index 665e6e285db5..009641e6c874 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +@@ -48,7 +48,7 @@ static void hws_bwc_unlock_all_queues(struct mlx5hws_context *ctx) + + static void hws_bwc_matcher_init_attr(struct mlx5hws_bwc_matcher *bwc_matcher, + u32 priority, +- u8 size_log, ++ u8 size_log_rx, u8 size_log_tx, + struct mlx5hws_matcher_attr *attr) + { + struct mlx5hws_bwc_matcher *first_matcher = +@@ -62,7 +62,8 @@ static void hws_bwc_matcher_init_attr(struct mlx5hws_bwc_matcher *bwc_matcher, + attr->optimize_flow_src = MLX5HWS_MATCHER_FLOW_SRC_ANY; + attr->insert_mode = MLX5HWS_MATCHER_INSERT_BY_HASH; + attr->distribute_mode = MLX5HWS_MATCHER_DISTRIBUTE_BY_HASH; +- attr->rule.num_log = size_log; ++ attr->size[MLX5HWS_MATCHER_SIZE_TYPE_RX].rule.num_log = size_log_rx; ++ attr->size[MLX5HWS_MATCHER_SIZE_TYPE_TX].rule.num_log = size_log_tx; 
+ attr->resizable = true; + attr->max_num_of_at_attach = MLX5HWS_BWC_MATCHER_ATTACH_AT_NUM; + +@@ -93,6 +94,7 @@ int mlx5hws_bwc_matcher_create_simple(struct mlx5hws_bwc_matcher *bwc_matcher, + hws_bwc_matcher_init_attr(bwc_matcher, + priority, + MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG, ++ MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG, + &attr); + + bwc_matcher->priority = priority; +@@ -696,6 +698,7 @@ static int hws_bwc_matcher_move(struct mlx5hws_bwc_matcher *bwc_matcher) + hws_bwc_matcher_init_attr(bwc_matcher, + bwc_matcher->priority, + bwc_matcher->size_log, ++ bwc_matcher->size_log, + &matcher_attr); + + old_matcher = bwc_matcher->matcher; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c +index f9b75aefcaa7..2ec8cb10139a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c +@@ -99,17 +99,19 @@ hws_debug_dump_matcher_attr(struct seq_file *f, struct mlx5hws_matcher *matcher) + { + struct mlx5hws_matcher_attr *attr = &matcher->attr; + +- seq_printf(f, "%d,0x%llx,%d,%d,%d,%d,%d,%d,%d,%d\n", ++ seq_printf(f, "%d,0x%llx,%d,%d,%d,%d,%d,%d,%d,%d,-1,-1,%d,%d\n", + MLX5HWS_DEBUG_RES_TYPE_MATCHER_ATTR, + HWS_PTR_TO_ID(matcher), + attr->priority, + attr->mode, +- attr->table.sz_row_log, +- attr->table.sz_col_log, ++ attr->size[MLX5HWS_MATCHER_SIZE_TYPE_RX].table.sz_row_log, ++ attr->size[MLX5HWS_MATCHER_SIZE_TYPE_RX].table.sz_col_log, + attr->optimize_using_rule_idx, + attr->optimize_flow_src, + attr->insert_mode, +- attr->distribute_mode); ++ attr->distribute_mode, ++ attr->size[MLX5HWS_MATCHER_SIZE_TYPE_TX].table.sz_row_log, ++ attr->size[MLX5HWS_MATCHER_SIZE_TYPE_TX].table.sz_col_log); + + return 0; + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c +index b0fcaf508e06..f3ea09caba2b 100644 +--- 
a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c +@@ -468,12 +468,16 @@ static int hws_matcher_create_rtc(struct mlx5hws_matcher *matcher) + struct mlx5hws_cmd_rtc_create_attr rtc_attr = {0}; + struct mlx5hws_match_template *mt = matcher->mt; + struct mlx5hws_context *ctx = matcher->tbl->ctx; ++ union mlx5hws_matcher_size *size_rx, *size_tx; + struct mlx5hws_table *tbl = matcher->tbl; + u32 obj_id; + int ret; + +- rtc_attr.log_size = attr->table.sz_row_log; +- rtc_attr.log_depth = attr->table.sz_col_log; ++ size_rx = &attr->size[MLX5HWS_MATCHER_SIZE_TYPE_RX]; ++ size_tx = &attr->size[MLX5HWS_MATCHER_SIZE_TYPE_TX]; ++ ++ rtc_attr.log_size = size_rx->table.sz_row_log; ++ rtc_attr.log_depth = size_rx->table.sz_col_log; + rtc_attr.is_frst_jumbo = mlx5hws_matcher_mt_is_jumbo(mt); + rtc_attr.is_scnd_range = 0; + rtc_attr.miss_ft_id = matcher->end_ft_id; +@@ -525,6 +529,8 @@ static int hws_matcher_create_rtc(struct mlx5hws_matcher *matcher) + } + + if (tbl->type == MLX5HWS_TABLE_TYPE_FDB) { ++ rtc_attr.log_size = size_tx->table.sz_row_log; ++ rtc_attr.log_depth = size_tx->table.sz_col_log; + rtc_attr.ste_base = matcher->match_ste.ste_1_base; + rtc_attr.table_type = mlx5hws_table_get_res_fw_ft_type(tbl->type, true); + +@@ -562,23 +568,33 @@ hws_matcher_check_attr_sz(struct mlx5hws_cmd_query_caps *caps, + struct mlx5hws_matcher *matcher) + { + struct mlx5hws_matcher_attr *attr = &matcher->attr; ++ struct mlx5hws_context *ctx = matcher->tbl->ctx; ++ union mlx5hws_matcher_size *size; ++ int i; + +- if (attr->table.sz_col_log > caps->rtc_log_depth_max) { +- mlx5hws_err(matcher->tbl->ctx, "Matcher depth exceeds limit %d\n", +- caps->rtc_log_depth_max); +- return -EOPNOTSUPP; +- } ++ for (i = 0; i < 2; i++) { ++ size = &attr->size[i]; + +- if (attr->table.sz_col_log + attr->table.sz_row_log > caps->ste_alloc_log_max) { +- mlx5hws_err(matcher->tbl->ctx, "Total matcher size exceeds limit %d\n", 
+- caps->ste_alloc_log_max); +- return -EOPNOTSUPP; +- } ++ if (size->table.sz_col_log > caps->rtc_log_depth_max) { ++ mlx5hws_err(ctx, "Matcher depth exceeds limit %d\n", ++ caps->rtc_log_depth_max); ++ return -EOPNOTSUPP; ++ } + +- if (attr->table.sz_col_log + attr->table.sz_row_log < caps->ste_alloc_log_gran) { +- mlx5hws_err(matcher->tbl->ctx, "Total matcher size below limit %d\n", +- caps->ste_alloc_log_gran); +- return -EOPNOTSUPP; ++ if (size->table.sz_col_log + size->table.sz_row_log > ++ caps->ste_alloc_log_max) { ++ mlx5hws_err(ctx, ++ "Total matcher size exceeds limit %d\n", ++ caps->ste_alloc_log_max); ++ return -EOPNOTSUPP; ++ } ++ ++ if (size->table.sz_col_log + size->table.sz_row_log < ++ caps->ste_alloc_log_gran) { ++ mlx5hws_err(ctx, "Total matcher size below limit %d\n", ++ caps->ste_alloc_log_gran); ++ return -EOPNOTSUPP; ++ } + } + + return 0; +@@ -666,6 +682,7 @@ static int hws_matcher_bind_mt(struct mlx5hws_matcher *matcher) + { + struct mlx5hws_cmd_ste_create_attr ste_attr = {}; + struct mlx5hws_context *ctx = matcher->tbl->ctx; ++ union mlx5hws_matcher_size *size; + int ret; + + /* Calculate match, range and hash definers */ +@@ -682,11 +699,11 @@ static int hws_matcher_bind_mt(struct mlx5hws_matcher *matcher) + + /* Create an STE range each for RX and TX. */ + ste_attr.table_type = FS_FT_FDB_RX; ++ size = &matcher->attr.size[MLX5HWS_MATCHER_SIZE_TYPE_RX]; + ste_attr.log_obj_range = + matcher->attr.optimize_flow_src == +- MLX5HWS_MATCHER_FLOW_SRC_VPORT ? +- 0 : matcher->attr.table.sz_col_log + +- matcher->attr.table.sz_row_log; ++ MLX5HWS_MATCHER_FLOW_SRC_VPORT ? 
++ 0 : size->table.sz_col_log + size->table.sz_row_log; + + ret = mlx5hws_cmd_ste_create(ctx->mdev, &ste_attr, + &matcher->match_ste.ste_0_base); +@@ -696,11 +713,11 @@ static int hws_matcher_bind_mt(struct mlx5hws_matcher *matcher) + } + + ste_attr.table_type = FS_FT_FDB_TX; ++ size = &matcher->attr.size[MLX5HWS_MATCHER_SIZE_TYPE_TX]; + ste_attr.log_obj_range = + matcher->attr.optimize_flow_src == +- MLX5HWS_MATCHER_FLOW_SRC_WIRE ? +- 0 : matcher->attr.table.sz_col_log + +- matcher->attr.table.sz_row_log; ++ MLX5HWS_MATCHER_FLOW_SRC_WIRE ? ++ 0 : size->table.sz_col_log + size->table.sz_row_log; + + ret = mlx5hws_cmd_ste_create(ctx->mdev, &ste_attr, + &matcher->match_ste.ste_1_base); +@@ -735,6 +752,10 @@ hws_matcher_validate_insert_mode(struct mlx5hws_cmd_query_caps *caps, + { + struct mlx5hws_matcher_attr *attr = &matcher->attr; + struct mlx5hws_context *ctx = matcher->tbl->ctx; ++ union mlx5hws_matcher_size *size_rx, *size_tx; ++ ++ size_rx = &matcher->attr.size[MLX5HWS_MATCHER_SIZE_TYPE_RX]; ++ size_tx = &matcher->attr.size[MLX5HWS_MATCHER_SIZE_TYPE_TX]; + + switch (attr->insert_mode) { + case MLX5HWS_MATCHER_INSERT_BY_HASH: +@@ -745,7 +766,7 @@ hws_matcher_validate_insert_mode(struct mlx5hws_cmd_query_caps *caps, + break; + + case MLX5HWS_MATCHER_INSERT_BY_INDEX: +- if (attr->table.sz_col_log) { ++ if (size_rx->table.sz_col_log || size_tx->table.sz_col_log) { + mlx5hws_err(ctx, "Matcher with INSERT_BY_INDEX supports only Nx1 table size\n"); + return -EOPNOTSUPP; + } +@@ -765,7 +786,10 @@ hws_matcher_validate_insert_mode(struct mlx5hws_cmd_query_caps *caps, + return -EOPNOTSUPP; + } + +- if (attr->table.sz_row_log > MLX5_IFC_RTC_LINEAR_LOOKUP_TBL_LOG_MAX) { ++ if (size_rx->table.sz_row_log > ++ MLX5_IFC_RTC_LINEAR_LOOKUP_TBL_LOG_MAX || ++ size_tx->table.sz_row_log > ++ MLX5_IFC_RTC_LINEAR_LOOKUP_TBL_LOG_MAX) { + mlx5hws_err(ctx, "Matcher with linear distribute: rows exceed limit %d", + MLX5_IFC_RTC_LINEAR_LOOKUP_TBL_LOG_MAX); + return -EOPNOTSUPP; +@@ -789,6 
+813,10 @@ hws_matcher_process_attr(struct mlx5hws_cmd_query_caps *caps, + struct mlx5hws_matcher *matcher) + { + struct mlx5hws_matcher_attr *attr = &matcher->attr; ++ union mlx5hws_matcher_size *size_rx, *size_tx; ++ ++ size_rx = &attr->size[MLX5HWS_MATCHER_SIZE_TYPE_RX]; ++ size_tx = &attr->size[MLX5HWS_MATCHER_SIZE_TYPE_TX]; + + if (hws_matcher_validate_insert_mode(caps, matcher)) + return -EOPNOTSUPP; +@@ -800,8 +828,12 @@ hws_matcher_process_attr(struct mlx5hws_cmd_query_caps *caps, + + /* Convert number of rules to the required depth */ + if (attr->mode == MLX5HWS_MATCHER_RESOURCE_MODE_RULE && +- attr->insert_mode == MLX5HWS_MATCHER_INSERT_BY_HASH) +- attr->table.sz_col_log = hws_matcher_rules_to_tbl_depth(attr->rule.num_log); ++ attr->insert_mode == MLX5HWS_MATCHER_INSERT_BY_HASH) { ++ size_rx->table.sz_col_log = ++ hws_matcher_rules_to_tbl_depth(size_rx->rule.num_log); ++ size_tx->table.sz_col_log = ++ hws_matcher_rules_to_tbl_depth(size_tx->rule.num_log); ++ } + + matcher->flags |= attr->resizable ? MLX5HWS_MATCHER_FLAGS_RESIZABLE : 0; + matcher->flags |= attr->isolated_matcher_end_ft_id ? 
+@@ -862,14 +894,19 @@ static int + hws_matcher_create_col_matcher(struct mlx5hws_matcher *matcher) + { + struct mlx5hws_context *ctx = matcher->tbl->ctx; ++ union mlx5hws_matcher_size *size_rx, *size_tx; + struct mlx5hws_matcher *col_matcher; +- int ret; ++ int i, ret; ++ ++ size_rx = &matcher->attr.size[MLX5HWS_MATCHER_SIZE_TYPE_RX]; ++ size_tx = &matcher->attr.size[MLX5HWS_MATCHER_SIZE_TYPE_TX]; + + if (matcher->attr.mode != MLX5HWS_MATCHER_RESOURCE_MODE_RULE || + matcher->attr.insert_mode == MLX5HWS_MATCHER_INSERT_BY_INDEX) + return 0; + +- if (!hws_matcher_requires_col_tbl(matcher->attr.rule.num_log)) ++ if (!hws_matcher_requires_col_tbl(size_rx->rule.num_log) && ++ !hws_matcher_requires_col_tbl(size_tx->rule.num_log)) + return 0; + + col_matcher = kzalloc(sizeof(*matcher), GFP_KERNEL); +@@ -886,10 +923,16 @@ hws_matcher_create_col_matcher(struct mlx5hws_matcher *matcher) + col_matcher->flags |= MLX5HWS_MATCHER_FLAGS_COLLISION; + col_matcher->attr.mode = MLX5HWS_MATCHER_RESOURCE_MODE_HTABLE; + col_matcher->attr.optimize_flow_src = matcher->attr.optimize_flow_src; +- col_matcher->attr.table.sz_row_log = matcher->attr.rule.num_log; +- col_matcher->attr.table.sz_col_log = MLX5HWS_MATCHER_ASSURED_COL_TBL_DEPTH; +- if (col_matcher->attr.table.sz_row_log > MLX5HWS_MATCHER_ASSURED_ROW_RATIO) +- col_matcher->attr.table.sz_row_log -= MLX5HWS_MATCHER_ASSURED_ROW_RATIO; ++ for (i = 0; i < 2; i++) { ++ union mlx5hws_matcher_size *dst = &col_matcher->attr.size[i]; ++ union mlx5hws_matcher_size *src = &matcher->attr.size[i]; ++ ++ dst->table.sz_row_log = src->rule.num_log; ++ dst->table.sz_col_log = MLX5HWS_MATCHER_ASSURED_COL_TBL_DEPTH; ++ if (dst->table.sz_row_log > MLX5HWS_MATCHER_ASSURED_ROW_RATIO) ++ dst->table.sz_row_log -= ++ MLX5HWS_MATCHER_ASSURED_ROW_RATIO; ++ } + + col_matcher->attr.max_num_of_at_attach = matcher->attr.max_num_of_at_attach; + col_matcher->attr.isolated_matcher_end_ft_id = +diff --git 
a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h +index a1295a311b70..59c14745ed0c 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h +@@ -93,6 +93,23 @@ enum mlx5hws_matcher_distribute_mode { + MLX5HWS_MATCHER_DISTRIBUTE_BY_LINEAR = 0x1, + }; + ++enum mlx5hws_matcher_size_type { ++ MLX5HWS_MATCHER_SIZE_TYPE_RX, ++ MLX5HWS_MATCHER_SIZE_TYPE_TX, ++ MLX5HWS_MATCHER_SIZE_TYPE_MAX, ++}; ++ ++union mlx5hws_matcher_size { ++ struct { ++ u8 sz_row_log; ++ u8 sz_col_log; ++ } table; ++ ++ struct { ++ u8 num_log; ++ } rule; ++}; ++ + struct mlx5hws_matcher_attr { + /* Processing priority inside table */ + u32 priority; +@@ -107,16 +124,7 @@ struct mlx5hws_matcher_attr { + enum mlx5hws_matcher_distribute_mode distribute_mode; + /* Define whether the created matcher supports resizing into a bigger matcher */ + bool resizable; +- union { +- struct { +- u8 sz_row_log; +- u8 sz_col_log; +- } table; +- +- struct { +- u8 num_log; +- } rule; +- }; ++ union mlx5hws_matcher_size size[MLX5HWS_MATCHER_SIZE_TYPE_MAX]; + /* Optional AT attach configuration - Max number of additional AT */ + u8 max_num_of_at_attach; + /* Optional end FT (miss FT ID) for match RTC (for isolated matcher) */ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1428-net-mlx5-hws-track-matcher-sizes-individually.patch b/SOURCES/1428-net-mlx5-hws-track-matcher-sizes-individually.patch new file mode 100644 index 000000000..1bf60aef5 --- /dev/null +++ b/SOURCES/1428-net-mlx5-hws-track-matcher-sizes-individually.patch @@ -0,0 +1,472 @@ +From 913991117a976a1753529cf7bc98f969cf0f7b91 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:20 -0400 +Subject: [PATCH] net/mlx5: HWS, Track matcher sizes individually + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 6b44fffdc7b792e1d32b104c76504c3722e9704f +Author: 
Vlad Dogaru +Date: Thu Jul 3 21:54:28 2025 +0300 + + net/mlx5: HWS, Track matcher sizes individually + + Track and grow matcher sizes individually for RX and TX RTCs. This + allows RX-only or TX-only use cases to effectively halve the device + resources they use. + + For testing we used a simple module that inserts 1M RX-only rules and + measured the number of pages the device requests, and memory usage as + reported by `free -h`. + + Pages Memory + Before this patch: 300k 1.5GiB + After this patch: 160k 900MiB + + Signed-off-by: Vlad Dogaru + Reviewed-by: Yevgeny Kliteynik + Signed-off-by: Mark Bloch + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/20250703185431.445571-8-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +index 009641e6c874..516634237cb8 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +@@ -93,12 +93,11 @@ int mlx5hws_bwc_matcher_create_simple(struct mlx5hws_bwc_matcher *bwc_matcher, + + hws_bwc_matcher_init_attr(bwc_matcher, + priority, +- MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG, +- MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG, ++ bwc_matcher->rx_size.size_log, ++ bwc_matcher->tx_size.size_log, + &attr); + + bwc_matcher->priority = priority; +- bwc_matcher->size_log = MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG; + + bwc_matcher->size_of_at_array = MLX5HWS_BWC_MATCHER_ATTACH_AT_NUM; + bwc_matcher->at = kcalloc(bwc_matcher->size_of_at_array, +@@ -150,6 +149,20 @@ int mlx5hws_bwc_matcher_create_simple(struct mlx5hws_bwc_matcher *bwc_matcher, + return -EINVAL; + } + ++static void ++hws_bwc_matcher_init_size_rxtx(struct mlx5hws_bwc_matcher_size *size) ++{ ++ size->size_log = MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG; ++ atomic_set(&size->num_of_rules, 0); ++ atomic_set(&size->rehash_required, false); ++} ++ ++static void 
hws_bwc_matcher_init_size(struct mlx5hws_bwc_matcher *bwc_matcher) ++{ ++ hws_bwc_matcher_init_size_rxtx(&bwc_matcher->rx_size); ++ hws_bwc_matcher_init_size_rxtx(&bwc_matcher->tx_size); ++} ++ + struct mlx5hws_bwc_matcher * + mlx5hws_bwc_matcher_create(struct mlx5hws_table *table, + u32 priority, +@@ -170,8 +183,7 @@ mlx5hws_bwc_matcher_create(struct mlx5hws_table *table, + if (!bwc_matcher) + return NULL; + +- atomic_set(&bwc_matcher->num_of_rules, 0); +- atomic_set(&bwc_matcher->rehash_required, false); ++ hws_bwc_matcher_init_size(bwc_matcher); + + /* Check if the required match params can be all matched + * in single STE, otherwise complex matcher is needed. +@@ -221,12 +233,13 @@ int mlx5hws_bwc_matcher_destroy_simple(struct mlx5hws_bwc_matcher *bwc_matcher) + + int mlx5hws_bwc_matcher_destroy(struct mlx5hws_bwc_matcher *bwc_matcher) + { +- u32 num_of_rules = atomic_read(&bwc_matcher->num_of_rules); ++ u32 rx_rules = atomic_read(&bwc_matcher->rx_size.num_of_rules); ++ u32 tx_rules = atomic_read(&bwc_matcher->tx_size.num_of_rules); + +- if (num_of_rules) ++ if (rx_rules || tx_rules) + mlx5hws_err(bwc_matcher->matcher->tbl->ctx, +- "BWC matcher destroy: matcher still has %d rules\n", +- num_of_rules); ++ "BWC matcher destroy: matcher still has %u RX and %u TX rules\n", ++ rx_rules, tx_rules); + + if (bwc_matcher->complex) + mlx5hws_bwc_matcher_destroy_complex(bwc_matcher); +@@ -386,6 +399,16 @@ hws_bwc_rule_destroy_hws_sync(struct mlx5hws_bwc_rule *bwc_rule, + return 0; + } + ++static void hws_bwc_rule_cnt_dec(struct mlx5hws_bwc_rule *bwc_rule) ++{ ++ struct mlx5hws_bwc_matcher *bwc_matcher = bwc_rule->bwc_matcher; ++ ++ if (!bwc_rule->skip_rx) ++ atomic_dec(&bwc_matcher->rx_size.num_of_rules); ++ if (!bwc_rule->skip_tx) ++ atomic_dec(&bwc_matcher->tx_size.num_of_rules); ++} ++ + int mlx5hws_bwc_rule_destroy_simple(struct mlx5hws_bwc_rule *bwc_rule) + { + struct mlx5hws_bwc_matcher *bwc_matcher = bwc_rule->bwc_matcher; +@@ -402,7 +425,7 @@ int 
mlx5hws_bwc_rule_destroy_simple(struct mlx5hws_bwc_rule *bwc_rule) + mutex_lock(queue_lock); + + ret = hws_bwc_rule_destroy_hws_sync(bwc_rule, &attr); +- atomic_dec(&bwc_matcher->num_of_rules); ++ hws_bwc_rule_cnt_dec(bwc_rule); + hws_bwc_rule_list_remove(bwc_rule); + + mutex_unlock(queue_lock); +@@ -489,25 +512,27 @@ hws_bwc_rule_update_sync(struct mlx5hws_bwc_rule *bwc_rule, + } + + static bool +-hws_bwc_matcher_size_maxed_out(struct mlx5hws_bwc_matcher *bwc_matcher) ++hws_bwc_matcher_size_maxed_out(struct mlx5hws_bwc_matcher *bwc_matcher, ++ struct mlx5hws_bwc_matcher_size *size) + { + struct mlx5hws_cmd_query_caps *caps = bwc_matcher->matcher->tbl->ctx->caps; + + /* check the match RTC size */ +- return (bwc_matcher->size_log + MLX5HWS_MATCHER_ASSURED_MAIN_TBL_DEPTH + ++ return (size->size_log + MLX5HWS_MATCHER_ASSURED_MAIN_TBL_DEPTH + + MLX5HWS_BWC_MATCHER_SIZE_LOG_STEP) > + (caps->ste_alloc_log_max - 1); + } + + static bool + hws_bwc_matcher_rehash_size_needed(struct mlx5hws_bwc_matcher *bwc_matcher, ++ struct mlx5hws_bwc_matcher_size *size, + u32 num_of_rules) + { +- if (unlikely(hws_bwc_matcher_size_maxed_out(bwc_matcher))) ++ if (unlikely(hws_bwc_matcher_size_maxed_out(bwc_matcher, size))) + return false; + + if (unlikely((num_of_rules * 100 / MLX5HWS_BWC_MATCHER_REHASH_PERCENT_TH) >= +- (1UL << bwc_matcher->size_log))) ++ (1UL << size->size_log))) + return true; + + return false; +@@ -564,20 +589,21 @@ hws_bwc_matcher_extend_at(struct mlx5hws_bwc_matcher *bwc_matcher, + } + + static int +-hws_bwc_matcher_extend_size(struct mlx5hws_bwc_matcher *bwc_matcher) ++hws_bwc_matcher_extend_size(struct mlx5hws_bwc_matcher *bwc_matcher, ++ struct mlx5hws_bwc_matcher_size *size) + { + struct mlx5hws_context *ctx = bwc_matcher->matcher->tbl->ctx; + struct mlx5hws_cmd_query_caps *caps = ctx->caps; + +- if (unlikely(hws_bwc_matcher_size_maxed_out(bwc_matcher))) { ++ if (unlikely(hws_bwc_matcher_size_maxed_out(bwc_matcher, size))) { + mlx5hws_err(ctx, "Can't resize 
matcher: depth exceeds limit %d\n", + caps->rtc_log_depth_max); + return -ENOMEM; + } + +- bwc_matcher->size_log = +- min(bwc_matcher->size_log + MLX5HWS_BWC_MATCHER_SIZE_LOG_STEP, +- caps->ste_alloc_log_max - MLX5HWS_MATCHER_ASSURED_MAIN_TBL_DEPTH); ++ size->size_log = min(size->size_log + MLX5HWS_BWC_MATCHER_SIZE_LOG_STEP, ++ caps->ste_alloc_log_max - ++ MLX5HWS_MATCHER_ASSURED_MAIN_TBL_DEPTH); + + return 0; + } +@@ -697,8 +723,8 @@ static int hws_bwc_matcher_move(struct mlx5hws_bwc_matcher *bwc_matcher) + + hws_bwc_matcher_init_attr(bwc_matcher, + bwc_matcher->priority, +- bwc_matcher->size_log, +- bwc_matcher->size_log, ++ bwc_matcher->rx_size.size_log, ++ bwc_matcher->tx_size.size_log, + &matcher_attr); + + old_matcher = bwc_matcher->matcher; +@@ -736,21 +762,39 @@ static int hws_bwc_matcher_move(struct mlx5hws_bwc_matcher *bwc_matcher) + static int + hws_bwc_matcher_rehash_size(struct mlx5hws_bwc_matcher *bwc_matcher) + { ++ bool need_rx_rehash, need_tx_rehash; + int ret; + +- /* If the current matcher size is already at its max size, we can't +- * do the rehash. Skip it and try adding the rule again - perhaps +- * there was some change. ++ need_rx_rehash = atomic_read(&bwc_matcher->rx_size.rehash_required); ++ need_tx_rehash = atomic_read(&bwc_matcher->tx_size.rehash_required); ++ ++ /* It is possible that another rule has already performed rehash. ++ * Need to check again if we really need rehash. + */ +- if (hws_bwc_matcher_size_maxed_out(bwc_matcher)) ++ if (!need_rx_rehash && !need_tx_rehash) + return 0; + +- /* It is possible that other rule has already performed rehash. +- * Need to check again if we really need rehash. ++ /* If the current matcher RX/TX size is already at its max size, ++ * it can't be rehashed. 
+ */ +- if (!atomic_read(&bwc_matcher->rehash_required) && +- !hws_bwc_matcher_rehash_size_needed(bwc_matcher, +- atomic_read(&bwc_matcher->num_of_rules))) ++ if (need_rx_rehash && ++ hws_bwc_matcher_size_maxed_out(bwc_matcher, ++ &bwc_matcher->rx_size)) { ++ atomic_set(&bwc_matcher->rx_size.rehash_required, false); ++ need_rx_rehash = false; ++ } ++ if (need_tx_rehash && ++ hws_bwc_matcher_size_maxed_out(bwc_matcher, ++ &bwc_matcher->tx_size)) { ++ atomic_set(&bwc_matcher->tx_size.rehash_required, false); ++ need_tx_rehash = false; ++ } ++ ++ /* If both RX and TX rehash flags are now off, it means that whatever ++ * we wanted to rehash is now at its max size - no rehash can be done. ++ * Return and try adding the rule again - perhaps there was some change. ++ */ ++ if (!need_rx_rehash && !need_tx_rehash) + return 0; + + /* Now we're done all the checking - do the rehash: +@@ -759,12 +803,22 @@ hws_bwc_matcher_rehash_size(struct mlx5hws_bwc_matcher *bwc_matcher) + * - move all the rules to the new matcher + * - destroy the old matcher + */ ++ atomic_set(&bwc_matcher->rx_size.rehash_required, false); ++ atomic_set(&bwc_matcher->tx_size.rehash_required, false); + +- atomic_set(&bwc_matcher->rehash_required, false); ++ if (need_rx_rehash) { ++ ret = hws_bwc_matcher_extend_size(bwc_matcher, ++ &bwc_matcher->rx_size); ++ if (ret) ++ return ret; ++ } + +- ret = hws_bwc_matcher_extend_size(bwc_matcher); +- if (ret) +- return ret; ++ if (need_tx_rehash) { ++ ret = hws_bwc_matcher_extend_size(bwc_matcher, ++ &bwc_matcher->tx_size); ++ if (ret) ++ return ret; ++ } + + return hws_bwc_matcher_move(bwc_matcher); + } +@@ -816,6 +870,62 @@ static int hws_bwc_rule_get_at_idx(struct mlx5hws_bwc_rule *bwc_rule, + return at_idx; + } + ++static void hws_bwc_rule_cnt_inc_rxtx(struct mlx5hws_bwc_rule *bwc_rule, ++ struct mlx5hws_bwc_matcher_size *size) ++{ ++ u32 num_of_rules = atomic_inc_return(&size->num_of_rules); ++ ++ if 
(unlikely(hws_bwc_matcher_rehash_size_needed(bwc_rule->bwc_matcher, ++ size, num_of_rules))) ++ atomic_set(&size->rehash_required, true); ++} ++ ++static void hws_bwc_rule_cnt_inc(struct mlx5hws_bwc_rule *bwc_rule) ++{ ++ struct mlx5hws_bwc_matcher *bwc_matcher = bwc_rule->bwc_matcher; ++ ++ if (!bwc_rule->skip_rx) ++ hws_bwc_rule_cnt_inc_rxtx(bwc_rule, &bwc_matcher->rx_size); ++ if (!bwc_rule->skip_tx) ++ hws_bwc_rule_cnt_inc_rxtx(bwc_rule, &bwc_matcher->tx_size); ++} ++ ++static int hws_bwc_rule_cnt_inc_with_rehash(struct mlx5hws_bwc_rule *bwc_rule, ++ u16 bwc_queue_idx) ++{ ++ struct mlx5hws_bwc_matcher *bwc_matcher = bwc_rule->bwc_matcher; ++ struct mlx5hws_context *ctx = bwc_matcher->matcher->tbl->ctx; ++ struct mutex *queue_lock; /* Protect the queue */ ++ int ret; ++ ++ hws_bwc_rule_cnt_inc(bwc_rule); ++ ++ if (!atomic_read(&bwc_matcher->rx_size.rehash_required) && ++ !atomic_read(&bwc_matcher->tx_size.rehash_required)) ++ return 0; ++ ++ queue_lock = hws_bwc_get_queue_lock(ctx, bwc_queue_idx); ++ mutex_unlock(queue_lock); ++ ++ hws_bwc_lock_all_queues(ctx); ++ ret = hws_bwc_matcher_rehash_size(bwc_matcher); ++ hws_bwc_unlock_all_queues(ctx); ++ ++ mutex_lock(queue_lock); ++ ++ if (likely(!ret)) ++ return 0; ++ ++ /* Failed to rehash. Print a diagnostic and rollback the counters. 
*/ ++ mlx5hws_err(ctx, ++ "BWC rule insertion: rehash to sizes [%d, %d] failed (%d)\n", ++ bwc_matcher->rx_size.size_log, ++ bwc_matcher->tx_size.size_log, ret); ++ hws_bwc_rule_cnt_dec(bwc_rule); ++ ++ return ret; ++} ++ + int mlx5hws_bwc_rule_create_simple(struct mlx5hws_bwc_rule *bwc_rule, + u32 *match_param, + struct mlx5hws_rule_action rule_actions[], +@@ -826,7 +936,6 @@ int mlx5hws_bwc_rule_create_simple(struct mlx5hws_bwc_rule *bwc_rule, + struct mlx5hws_context *ctx = bwc_matcher->matcher->tbl->ctx; + struct mlx5hws_rule_attr rule_attr; + struct mutex *queue_lock; /* Protect the queue */ +- u32 num_of_rules; + int ret = 0; + int at_idx; + +@@ -844,26 +953,10 @@ int mlx5hws_bwc_rule_create_simple(struct mlx5hws_bwc_rule *bwc_rule, + return -EINVAL; + } + +- /* check if number of rules require rehash */ +- num_of_rules = atomic_inc_return(&bwc_matcher->num_of_rules); +- +- if (unlikely(hws_bwc_matcher_rehash_size_needed(bwc_matcher, num_of_rules))) { ++ ret = hws_bwc_rule_cnt_inc_with_rehash(bwc_rule, bwc_queue_idx); ++ if (unlikely(ret)) { + mutex_unlock(queue_lock); +- +- hws_bwc_lock_all_queues(ctx); +- ret = hws_bwc_matcher_rehash_size(bwc_matcher); +- hws_bwc_unlock_all_queues(ctx); +- +- if (ret) { +- mlx5hws_err(ctx, "BWC rule insertion: rehash size [%d -> %d] failed (%d)\n", +- bwc_matcher->size_log - MLX5HWS_BWC_MATCHER_SIZE_LOG_STEP, +- bwc_matcher->size_log, +- ret); +- atomic_dec(&bwc_matcher->num_of_rules); +- return ret; +- } +- +- mutex_lock(queue_lock); ++ return ret; + } + + ret = hws_bwc_rule_create_sync(bwc_rule, +@@ -881,8 +974,11 @@ int mlx5hws_bwc_rule_create_simple(struct mlx5hws_bwc_rule *bwc_rule, + * It could be because there was collision, or some other problem. + * Try rehash by size and insert rule again - last chance. 
+ */ ++ if (!bwc_rule->skip_rx) ++ atomic_set(&bwc_matcher->rx_size.rehash_required, true); ++ if (!bwc_rule->skip_tx) ++ atomic_set(&bwc_matcher->tx_size.rehash_required, true); + +- atomic_set(&bwc_matcher->rehash_required, true); + mutex_unlock(queue_lock); + + hws_bwc_lock_all_queues(ctx); +@@ -891,7 +987,7 @@ int mlx5hws_bwc_rule_create_simple(struct mlx5hws_bwc_rule *bwc_rule, + + if (ret) { + mlx5hws_err(ctx, "BWC rule insertion: rehash failed (%d)\n", ret); +- atomic_dec(&bwc_matcher->num_of_rules); ++ hws_bwc_rule_cnt_dec(bwc_rule); + return ret; + } + +@@ -907,7 +1003,7 @@ int mlx5hws_bwc_rule_create_simple(struct mlx5hws_bwc_rule *bwc_rule, + if (unlikely(ret)) { + mutex_unlock(queue_lock); + mlx5hws_err(ctx, "BWC rule insertion failed (%d)\n", ret); +- atomic_dec(&bwc_matcher->num_of_rules); ++ hws_bwc_rule_cnt_dec(bwc_rule); + return ret; + } + +@@ -937,6 +1033,10 @@ mlx5hws_bwc_rule_create(struct mlx5hws_bwc_matcher *bwc_matcher, + if (unlikely(!bwc_rule)) + return NULL; + ++ bwc_rule->flow_source = flow_source; ++ mlx5hws_rule_skip(bwc_matcher->matcher, flow_source, ++ &bwc_rule->skip_rx, &bwc_rule->skip_tx); ++ + bwc_queue_idx = hws_bwc_gen_queue_idx(ctx); + + if (bwc_matcher->complex) +@@ -972,7 +1072,8 @@ hws_bwc_rule_action_update(struct mlx5hws_bwc_rule *bwc_rule, + + idx = bwc_rule->bwc_queue_idx; + +- mlx5hws_bwc_rule_fill_attr(bwc_matcher, idx, 0, &rule_attr); ++ mlx5hws_bwc_rule_fill_attr(bwc_matcher, idx, bwc_rule->flow_source, ++ &rule_attr); + queue_lock = hws_bwc_get_queue_lock(ctx, idx); + + mutex_lock(queue_lock); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h +index d21fc247a510..af391d70c14f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h +@@ -19,6 +19,13 @@ + #define MLX5HWS_BWC_POLLING_TIMEOUT 60 + + struct mlx5hws_bwc_matcher_complex_data; ++ ++struct 
mlx5hws_bwc_matcher_size { ++ u8 size_log; ++ atomic_t num_of_rules; ++ atomic_t rehash_required; ++}; ++ + struct mlx5hws_bwc_matcher { + struct mlx5hws_matcher *matcher; + struct mlx5hws_match_template *mt; +@@ -27,10 +34,9 @@ struct mlx5hws_bwc_matcher { + struct mlx5hws_bwc_matcher *complex_first_bwc_matcher; + u8 num_of_at; + u8 size_of_at_array; +- u8 size_log; + u32 priority; +- atomic_t num_of_rules; +- atomic_t rehash_required; ++ struct mlx5hws_bwc_matcher_size rx_size; ++ struct mlx5hws_bwc_matcher_size tx_size; + struct list_head *rules; + }; + +@@ -39,7 +45,10 @@ struct mlx5hws_bwc_rule { + struct mlx5hws_rule *rule; + struct mlx5hws_bwc_rule *isolated_bwc_rule; + struct mlx5hws_bwc_complex_rule_hash_node *complex_hash_node; ++ u32 flow_source; + u16 bwc_queue_idx; ++ bool skip_rx; ++ bool skip_tx; + struct list_head list_node; + }; + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1429-net-mlx5-hws-rearrange-to-prevent-forward-declaration.patch b/SOURCES/1429-net-mlx5-hws-rearrange-to-prevent-forward-declaration.patch new file mode 100644 index 000000000..36d8dc67a --- /dev/null +++ b/SOURCES/1429-net-mlx5-hws-rearrange-to-prevent-forward-declaration.patch @@ -0,0 +1,293 @@ +From cf75c92f3fa3b78a47f3cbe408f25616a0df6c02 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:20 -0400 +Subject: [PATCH] net/mlx5: HWS, Rearrange to prevent forward declaration + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 29063103f864fb63f7f7c436e670c5804df1b55b +Author: Yevgeny Kliteynik +Date: Thu Jul 3 21:54:29 2025 +0300 + + net/mlx5: HWS, Rearrange to prevent forward declaration + + As a preparation for the following patch that will add support + for shrinking empty matchers, rearrange the code to prevent + forward declaration of functions. 
+ + Signed-off-by: Yevgeny Kliteynik + Signed-off-by: Mark Bloch + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/20250703185431.445571-9-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +index 516634237cb8..15d817cbcd9d 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +@@ -71,6 +71,130 @@ static void hws_bwc_matcher_init_attr(struct mlx5hws_bwc_matcher *bwc_matcher, + first_matcher ? first_matcher->matcher->end_ft_id : 0; + } + ++static int ++hws_bwc_matcher_move_all_simple(struct mlx5hws_bwc_matcher *bwc_matcher) ++{ ++ bool move_error = false, poll_error = false, drain_error = false; ++ struct mlx5hws_context *ctx = bwc_matcher->matcher->tbl->ctx; ++ struct mlx5hws_matcher *matcher = bwc_matcher->matcher; ++ u16 bwc_queues = mlx5hws_bwc_queues(ctx); ++ struct mlx5hws_rule_attr rule_attr; ++ struct mlx5hws_bwc_rule *bwc_rule; ++ struct mlx5hws_send_engine *queue; ++ struct list_head *rules_list; ++ u32 pending_rules; ++ int i, ret = 0; ++ ++ mlx5hws_bwc_rule_fill_attr(bwc_matcher, 0, 0, &rule_attr); ++ ++ for (i = 0; i < bwc_queues; i++) { ++ if (list_empty(&bwc_matcher->rules[i])) ++ continue; ++ ++ pending_rules = 0; ++ rule_attr.queue_id = mlx5hws_bwc_get_queue_id(ctx, i); ++ rules_list = &bwc_matcher->rules[i]; ++ ++ list_for_each_entry(bwc_rule, rules_list, list_node) { ++ ret = mlx5hws_matcher_resize_rule_move(matcher, ++ bwc_rule->rule, ++ &rule_attr); ++ if (unlikely(ret && !move_error)) { ++ mlx5hws_err(ctx, ++ "Moving BWC rule: move failed (%d), attempting to move rest of the rules\n", ++ ret); ++ move_error = true; ++ } ++ ++ pending_rules++; ++ ret = mlx5hws_bwc_queue_poll(ctx, ++ rule_attr.queue_id, ++ &pending_rules, ++ false); ++ if (unlikely(ret && !poll_error)) { ++ mlx5hws_err(ctx, 
++ "Moving BWC rule: poll failed (%d), attempting to move rest of the rules\n", ++ ret); ++ poll_error = true; ++ } ++ } ++ ++ if (pending_rules) { ++ queue = &ctx->send_queue[rule_attr.queue_id]; ++ mlx5hws_send_engine_flush_queue(queue); ++ ret = mlx5hws_bwc_queue_poll(ctx, ++ rule_attr.queue_id, ++ &pending_rules, ++ true); ++ if (unlikely(ret && !drain_error)) { ++ mlx5hws_err(ctx, ++ "Moving BWC rule: drain failed (%d), attempting to move rest of the rules\n", ++ ret); ++ drain_error = true; ++ } ++ } ++ } ++ ++ if (move_error || poll_error || drain_error) ++ ret = -EINVAL; ++ ++ return ret; ++} ++ ++static int hws_bwc_matcher_move_all(struct mlx5hws_bwc_matcher *bwc_matcher) ++{ ++ if (!bwc_matcher->complex) ++ return hws_bwc_matcher_move_all_simple(bwc_matcher); ++ ++ return mlx5hws_bwc_matcher_move_all_complex(bwc_matcher); ++} ++ ++static int hws_bwc_matcher_move(struct mlx5hws_bwc_matcher *bwc_matcher) ++{ ++ struct mlx5hws_context *ctx = bwc_matcher->matcher->tbl->ctx; ++ struct mlx5hws_matcher_attr matcher_attr = {0}; ++ struct mlx5hws_matcher *old_matcher; ++ struct mlx5hws_matcher *new_matcher; ++ int ret; ++ ++ hws_bwc_matcher_init_attr(bwc_matcher, ++ bwc_matcher->priority, ++ bwc_matcher->rx_size.size_log, ++ bwc_matcher->tx_size.size_log, ++ &matcher_attr); ++ ++ old_matcher = bwc_matcher->matcher; ++ new_matcher = mlx5hws_matcher_create(old_matcher->tbl, ++ &bwc_matcher->mt, 1, ++ bwc_matcher->at, ++ bwc_matcher->num_of_at, ++ &matcher_attr); ++ if (!new_matcher) { ++ mlx5hws_err(ctx, "Rehash error: matcher creation failed\n"); ++ return -ENOMEM; ++ } ++ ++ ret = mlx5hws_matcher_resize_set_target(old_matcher, new_matcher); ++ if (ret) { ++ mlx5hws_err(ctx, "Rehash error: failed setting resize target\n"); ++ return ret; ++ } ++ ++ ret = hws_bwc_matcher_move_all(bwc_matcher); ++ if (ret) ++ mlx5hws_err(ctx, "Rehash error: moving rules failed, attempting to remove the old matcher\n"); ++ ++ /* Error during rehash can't be rolled back. 
++ * The best option here is to allow the rehash to complete and remove ++ * the old matcher - can't leave the matcher in the 'in_resize' state. ++ */ ++ ++ bwc_matcher->matcher = new_matcher; ++ mlx5hws_matcher_destroy(old_matcher); ++ ++ return ret; ++} ++ + int mlx5hws_bwc_matcher_create_simple(struct mlx5hws_bwc_matcher *bwc_matcher, + struct mlx5hws_table *table, + u32 priority, +@@ -636,129 +760,6 @@ hws_bwc_matcher_find_at(struct mlx5hws_bwc_matcher *bwc_matcher, + return -1; + } + +-static int hws_bwc_matcher_move_all_simple(struct mlx5hws_bwc_matcher *bwc_matcher) +-{ +- bool move_error = false, poll_error = false, drain_error = false; +- struct mlx5hws_context *ctx = bwc_matcher->matcher->tbl->ctx; +- struct mlx5hws_matcher *matcher = bwc_matcher->matcher; +- u16 bwc_queues = mlx5hws_bwc_queues(ctx); +- struct mlx5hws_rule_attr rule_attr; +- struct mlx5hws_bwc_rule *bwc_rule; +- struct mlx5hws_send_engine *queue; +- struct list_head *rules_list; +- u32 pending_rules; +- int i, ret = 0; +- +- mlx5hws_bwc_rule_fill_attr(bwc_matcher, 0, 0, &rule_attr); +- +- for (i = 0; i < bwc_queues; i++) { +- if (list_empty(&bwc_matcher->rules[i])) +- continue; +- +- pending_rules = 0; +- rule_attr.queue_id = mlx5hws_bwc_get_queue_id(ctx, i); +- rules_list = &bwc_matcher->rules[i]; +- +- list_for_each_entry(bwc_rule, rules_list, list_node) { +- ret = mlx5hws_matcher_resize_rule_move(matcher, +- bwc_rule->rule, +- &rule_attr); +- if (unlikely(ret && !move_error)) { +- mlx5hws_err(ctx, +- "Moving BWC rule: move failed (%d), attempting to move rest of the rules\n", +- ret); +- move_error = true; +- } +- +- pending_rules++; +- ret = mlx5hws_bwc_queue_poll(ctx, +- rule_attr.queue_id, +- &pending_rules, +- false); +- if (unlikely(ret && !poll_error)) { +- mlx5hws_err(ctx, +- "Moving BWC rule: poll failed (%d), attempting to move rest of the rules\n", +- ret); +- poll_error = true; +- } +- } +- +- if (pending_rules) { +- queue = &ctx->send_queue[rule_attr.queue_id]; +- 
mlx5hws_send_engine_flush_queue(queue); +- ret = mlx5hws_bwc_queue_poll(ctx, +- rule_attr.queue_id, +- &pending_rules, +- true); +- if (unlikely(ret && !drain_error)) { +- mlx5hws_err(ctx, +- "Moving BWC rule: drain failed (%d), attempting to move rest of the rules\n", +- ret); +- drain_error = true; +- } +- } +- } +- +- if (move_error || poll_error || drain_error) +- ret = -EINVAL; +- +- return ret; +-} +- +-static int hws_bwc_matcher_move_all(struct mlx5hws_bwc_matcher *bwc_matcher) +-{ +- if (!bwc_matcher->complex) +- return hws_bwc_matcher_move_all_simple(bwc_matcher); +- +- return mlx5hws_bwc_matcher_move_all_complex(bwc_matcher); +-} +- +-static int hws_bwc_matcher_move(struct mlx5hws_bwc_matcher *bwc_matcher) +-{ +- struct mlx5hws_context *ctx = bwc_matcher->matcher->tbl->ctx; +- struct mlx5hws_matcher_attr matcher_attr = {0}; +- struct mlx5hws_matcher *old_matcher; +- struct mlx5hws_matcher *new_matcher; +- int ret; +- +- hws_bwc_matcher_init_attr(bwc_matcher, +- bwc_matcher->priority, +- bwc_matcher->rx_size.size_log, +- bwc_matcher->tx_size.size_log, +- &matcher_attr); +- +- old_matcher = bwc_matcher->matcher; +- new_matcher = mlx5hws_matcher_create(old_matcher->tbl, +- &bwc_matcher->mt, 1, +- bwc_matcher->at, +- bwc_matcher->num_of_at, +- &matcher_attr); +- if (!new_matcher) { +- mlx5hws_err(ctx, "Rehash error: matcher creation failed\n"); +- return -ENOMEM; +- } +- +- ret = mlx5hws_matcher_resize_set_target(old_matcher, new_matcher); +- if (ret) { +- mlx5hws_err(ctx, "Rehash error: failed setting resize target\n"); +- return ret; +- } +- +- ret = hws_bwc_matcher_move_all(bwc_matcher); +- if (ret) +- mlx5hws_err(ctx, "Rehash error: moving rules failed, attempting to remove the old matcher\n"); +- +- /* Error during rehash can't be rolled back. +- * The best option here is to allow the rehash to complete and remove +- * the old matcher - can't leave the matcher in the 'in_resize' state. 
+- */ +- +- bwc_matcher->matcher = new_matcher; +- mlx5hws_matcher_destroy(old_matcher); +- +- return ret; +-} +- + static int + hws_bwc_matcher_rehash_size(struct mlx5hws_bwc_matcher *bwc_matcher) + { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1430-net-mlx5-hws-shrink-empty-matchers.patch b/SOURCES/1430-net-mlx5-hws-shrink-empty-matchers.patch new file mode 100644 index 000000000..b3bfc0ab9 --- /dev/null +++ b/SOURCES/1430-net-mlx5-hws-shrink-empty-matchers.patch @@ -0,0 +1,127 @@ +From e075f373741f00716f58d60a197e0a08abb15ffa Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:20 -0400 +Subject: [PATCH] net/mlx5: HWS, Shrink empty matchers + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 96e4c4a1a5bc6b6bc1ba48c6dfd2246df24f2f63 +Author: Yevgeny Kliteynik +Date: Thu Jul 3 21:54:30 2025 +0300 + + net/mlx5: HWS, Shrink empty matchers + + Matcher size is dynamic: it starts at initial size, and then it grows + through rehash as more and more rules are added to this matcher. + When rules are deleted, matcher's size is not decreased. Rehash + approach is greedy. The idea is: if the matcher got to a certain size + at some point, chances are - it will get to this size again, so it is + better to avoid costly rehash operations whenever possible. + + However, when all the rules of the matcher are deleted, this should + be viewed as special case. If the matcher actually got to the point + where it has zero rules, it might be an indication that some usecase + from the past is no longer happening. This is where some ICM can be + freed. + + This patch handles this case: when a number of rules in a matcher + goes down to zero, the matcher's tables are shrunk to the initial + size. 
+ + Signed-off-by: Yevgeny Kliteynik + Reviewed-by: Vlad Dogaru + Signed-off-by: Mark Bloch + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/20250703185431.445571-10-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +index 15d817cbcd9d..92de4b761a83 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +@@ -533,6 +533,70 @@ static void hws_bwc_rule_cnt_dec(struct mlx5hws_bwc_rule *bwc_rule) + atomic_dec(&bwc_matcher->tx_size.num_of_rules); + } + ++static int ++hws_bwc_matcher_rehash_shrink(struct mlx5hws_bwc_matcher *bwc_matcher) ++{ ++ struct mlx5hws_bwc_matcher_size *rx_size = &bwc_matcher->rx_size; ++ struct mlx5hws_bwc_matcher_size *tx_size = &bwc_matcher->tx_size; ++ ++ /* It is possible that another thread has added a rule. ++ * Need to check again if we really need rehash/shrink. ++ */ ++ if (atomic_read(&rx_size->num_of_rules) || ++ atomic_read(&tx_size->num_of_rules)) ++ return 0; ++ ++ /* If the current matcher RX/TX size is already at its initial size. 
*/ ++ if (rx_size->size_log == MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG && ++ tx_size->size_log == MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG) ++ return 0; ++ ++ /* Now we've done all the checking - do the shrinking: ++ * - reset match RTC size to the initial size ++ * - create new matcher ++ * - move the rules, which will not do anything as the matcher is empty ++ * - destroy the old matcher ++ */ ++ ++ rx_size->size_log = MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG; ++ tx_size->size_log = MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG; ++ ++ return hws_bwc_matcher_move(bwc_matcher); ++} ++ ++static int hws_bwc_rule_cnt_dec_with_shrink(struct mlx5hws_bwc_rule *bwc_rule, ++ u16 bwc_queue_idx) ++{ ++ struct mlx5hws_bwc_matcher *bwc_matcher = bwc_rule->bwc_matcher; ++ struct mlx5hws_context *ctx = bwc_matcher->matcher->tbl->ctx; ++ struct mutex *queue_lock; /* Protect the queue */ ++ int ret; ++ ++ hws_bwc_rule_cnt_dec(bwc_rule); ++ ++ if (atomic_read(&bwc_matcher->rx_size.num_of_rules) || ++ atomic_read(&bwc_matcher->tx_size.num_of_rules)) ++ return 0; ++ ++ /* Matcher has no more rules - shrink it to save ICM. 
*/ ++ ++ queue_lock = hws_bwc_get_queue_lock(ctx, bwc_queue_idx); ++ mutex_unlock(queue_lock); ++ ++ hws_bwc_lock_all_queues(ctx); ++ ret = hws_bwc_matcher_rehash_shrink(bwc_matcher); ++ hws_bwc_unlock_all_queues(ctx); ++ ++ mutex_lock(queue_lock); ++ ++ if (unlikely(ret)) ++ mlx5hws_err(ctx, ++ "BWC rule deletion: shrinking empty matcher failed (%d)\n", ++ ret); ++ ++ return ret; ++} ++ + int mlx5hws_bwc_rule_destroy_simple(struct mlx5hws_bwc_rule *bwc_rule) + { + struct mlx5hws_bwc_matcher *bwc_matcher = bwc_rule->bwc_matcher; +@@ -549,8 +613,8 @@ int mlx5hws_bwc_rule_destroy_simple(struct mlx5hws_bwc_rule *bwc_rule) + mutex_lock(queue_lock); + + ret = hws_bwc_rule_destroy_hws_sync(bwc_rule, &attr); +- hws_bwc_rule_cnt_dec(bwc_rule); + hws_bwc_rule_list_remove(bwc_rule); ++ hws_bwc_rule_cnt_dec_with_shrink(bwc_rule, idx); + + mutex_unlock(queue_lock); + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1431-net-mlx5-add-hws-as-secondary-steering-mode.patch b/SOURCES/1431-net-mlx5-add-hws-as-secondary-steering-mode.patch new file mode 100644 index 000000000..5287b70dd --- /dev/null +++ b/SOURCES/1431-net-mlx5-add-hws-as-secondary-steering-mode.patch @@ -0,0 +1,43 @@ +From 301cdb8808d0dad4d74a35db3f9b28ddadb76151 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:20 -0400 +Subject: [PATCH] net/mlx5: Add HWS as secondary steering mode + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit a9aec713d0d9d6c3d918df26c61ee42ee2c28676 +Author: Moshe Shemesh +Date: Thu Jul 3 21:54:31 2025 +0300 + + net/mlx5: Add HWS as secondary steering mode + + Add HW Steering (HWS) as a secondary option for device steering mode. If + the device does not support SW Steering (SWS), HW Steering will be used + as the default, provided it is supported. FW Steering will now be + selected as the default only if both HWS and SWS are unavailable. 
+ + Signed-off-by: Moshe Shemesh + Reviewed-by: Yevgeny Kliteynik + Signed-off-by: Mark Bloch + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/20250703185431.445571-11-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +index 02808be0e88b..0de287392c32 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +@@ -3952,6 +3952,8 @@ int mlx5_fs_core_alloc(struct mlx5_core_dev *dev) + + if (mlx5_fs_dr_is_supported(dev)) + steering->mode = MLX5_FLOW_STEERING_MODE_SMFS; ++ else if (mlx5_fs_hws_is_supported(dev)) ++ steering->mode = MLX5_FLOW_STEERING_MODE_HMFS; + else + steering->mode = MLX5_FLOW_STEERING_MODE_DMFS; + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1432-net-mlx5-fix-spelling-mistake-disabliing-disabling.patch b/SOURCES/1432-net-mlx5-fix-spelling-mistake-disabliing-disabling.patch new file mode 100644 index 000000000..8f4a81c90 --- /dev/null +++ b/SOURCES/1432-net-mlx5-fix-spelling-mistake-disabliing-disabling.patch @@ -0,0 +1,38 @@ +From b68e58157bbab0aaf4bacdef12cf8b75cd66724d Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:20 -0400 +Subject: [PATCH] net/mlx5: Fix spelling mistake "disabliing" -> "disabling" + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 0e86f3eb83c0d90ec082ceac925a500ec6ad604e +Author: Colin Ian King +Date: Thu Jul 3 11:22:19 2025 +0100 + + net/mlx5: Fix spelling mistake "disabliing" -> "disabling" + + There is a spelling mistake in a NL_SET_ERR_MSG_MOD message. Fix it. 
+ + Signed-off-by: Colin Ian King + Reviewed-by: Mark Bloch + Link: https://patch.msgid.link/20250703102219.1248399-1-colin.i.king@gmail.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +index c24d1f584a46..e1cef8dd3b4d 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +@@ -1353,7 +1353,7 @@ static int esw_qos_switch_tc_arbiter_node_to_vports( + &node->ix); + if (err) { + NL_SET_ERR_MSG_MOD(extack, +- "Failed to create scheduling element for vports node when disabliing vports TC QoS"); ++ "Failed to create scheduling element for vports node when disabling vports TC QoS"); + return err; + } + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1433-eth-mlx5-migrate-to-the-rxfh-context-ops.patch b/SOURCES/1433-eth-mlx5-migrate-to-the-rxfh-context-ops.patch new file mode 100644 index 000000000..f4dbb322d --- /dev/null +++ b/SOURCES/1433-eth-mlx5-migrate-to-the-rxfh-context-ops.patch @@ -0,0 +1,352 @@ +From 8aa720939f44d98068823c9ed8eefadc9bf00fb1 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:20 -0400 +Subject: [PATCH] eth: mlx5: migrate to the *_rxfh_context ops + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit afc55a0659a60c6b5cde2907c9454ce8fcb0844f +Author: Jakub Kicinski +Date: Mon Jul 7 11:41:13 2025 -0700 + + eth: mlx5: migrate to the *_rxfh_context ops + + Convert mlx5 to dedicated RXFH ops. This is a fairly shallow + conversion, TBH, most of the driver code stays as is, but we + let the core allocate the context ID for the driver. + + mlx5e_rx_res_rss_get_rxfh() and friends are made void, since + core only calls the driver for context 0. The second call + is right after context creation so it must exist (tm). + + Tested with drivers/net/hw/rss_ctx.py on MCX6. 
+ + Reviewed-by: Gal Pressman + Link: https://patch.msgid.link/20250707184115.2285277-4-kuba@kernel.org + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rss.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rss.c +index 74cd111ee320..c68ba0e58fa6 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rss.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rss.c +@@ -567,7 +567,8 @@ int mlx5e_rss_packet_merge_set_param(struct mlx5e_rss *rss, + return final_err; + } + +-int mlx5e_rss_get_rxfh(struct mlx5e_rss *rss, u32 *indir, u8 *key, u8 *hfunc, bool *symmetric) ++void mlx5e_rss_get_rxfh(struct mlx5e_rss *rss, u32 *indir, u8 *key, u8 *hfunc, ++ bool *symmetric) + { + if (indir) + memcpy(indir, rss->indir.table, +@@ -582,8 +583,6 @@ int mlx5e_rss_get_rxfh(struct mlx5e_rss *rss, u32 *indir, u8 *key, u8 *hfunc, bo + + if (symmetric) + *symmetric = rss->hash.symmetric; +- +- return 0; + } + + int mlx5e_rss_set_rxfh(struct mlx5e_rss *rss, const u32 *indir, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rss.h b/drivers/net/ethernet/mellanox/mlx5/core/en/rss.h +index 8ac902190010..c6c1b2847cf5 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rss.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rss.h +@@ -47,7 +47,8 @@ void mlx5e_rss_disable(struct mlx5e_rss *rss); + + int mlx5e_rss_packet_merge_set_param(struct mlx5e_rss *rss, + struct mlx5e_packet_merge_param *pkt_merge_param); +-int mlx5e_rss_get_rxfh(struct mlx5e_rss *rss, u32 *indir, u8 *key, u8 *hfunc, bool *symmetric); ++void mlx5e_rss_get_rxfh(struct mlx5e_rss *rss, u32 *indir, u8 *key, u8 *hfunc, ++ bool *symmetric); + int mlx5e_rss_set_rxfh(struct mlx5e_rss *rss, const u32 *indir, + const u8 *key, const u8 *hfunc, const bool *symmetric, + u32 *rqns, u32 *vhca_ids, unsigned int num_rqns); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c +index 
5fcbe47337b0..e5cce2df3649 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c +@@ -71,17 +71,12 @@ static int mlx5e_rx_res_rss_init_def(struct mlx5e_rx_res *res, + return 0; + } + +-int mlx5e_rx_res_rss_init(struct mlx5e_rx_res *res, u32 *rss_idx, unsigned int init_nch) ++int mlx5e_rx_res_rss_init(struct mlx5e_rx_res *res, u32 rss_idx, unsigned int init_nch) + { + bool inner_ft_support = res->features & MLX5E_RX_RES_FEATURE_INNER_FT; + struct mlx5e_rss *rss; +- int i; +- +- for (i = 1; i < MLX5E_MAX_NUM_RSS; i++) +- if (!res->rss[i]) +- break; + +- if (i == MLX5E_MAX_NUM_RSS) ++ if (WARN_ON_ONCE(res->rss[rss_idx])) + return -ENOSPC; + + rss = mlx5e_rss_init(res->mdev, inner_ft_support, res->drop_rqn, +@@ -97,8 +92,7 @@ int mlx5e_rx_res_rss_init(struct mlx5e_rx_res *res, u32 *rss_idx, unsigned int i + mlx5e_rss_enable(rss, res->rss_rqns, vhca_ids, res->rss_nch); + } + +- res->rss[i] = rss; +- *rss_idx = i; ++ res->rss[rss_idx] = rss; + + return 0; + } +@@ -193,19 +187,17 @@ void mlx5e_rx_res_rss_set_indir_uniform(struct mlx5e_rx_res *res, unsigned int n + mlx5e_rss_set_indir_uniform(res->rss[0], nch); + } + +-int mlx5e_rx_res_rss_get_rxfh(struct mlx5e_rx_res *res, u32 rss_idx, +- u32 *indir, u8 *key, u8 *hfunc, bool *symmetric) ++void mlx5e_rx_res_rss_get_rxfh(struct mlx5e_rx_res *res, u32 rss_idx, ++ u32 *indir, u8 *key, u8 *hfunc, bool *symmetric) + { +- struct mlx5e_rss *rss; ++ struct mlx5e_rss *rss = NULL; + +- if (rss_idx >= MLX5E_MAX_NUM_RSS) +- return -EINVAL; +- +- rss = res->rss[rss_idx]; +- if (!rss) +- return -ENOENT; ++ if (rss_idx < MLX5E_MAX_NUM_RSS) ++ rss = res->rss[rss_idx]; ++ if (WARN_ON_ONCE(!rss)) ++ return; + +- return mlx5e_rss_get_rxfh(rss, indir, key, hfunc, symmetric); ++ mlx5e_rss_get_rxfh(rss, indir, key, hfunc, symmetric); + } + + int mlx5e_rx_res_rss_set_rxfh(struct mlx5e_rx_res *res, u32 rss_idx, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h +index 3e09d91281af..1d049e2aa264 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h +@@ -48,8 +48,9 @@ void mlx5e_rx_res_xsk_update(struct mlx5e_rx_res *res, struct mlx5e_channels *ch + + /* Configuration API */ + void mlx5e_rx_res_rss_set_indir_uniform(struct mlx5e_rx_res *res, unsigned int nch); +-int mlx5e_rx_res_rss_get_rxfh(struct mlx5e_rx_res *res, u32 rss_idx, +- u32 *indir, u8 *key, u8 *hfunc, bool *symmetric); ++void mlx5e_rx_res_rss_get_rxfh(struct mlx5e_rx_res *res, u32 rss_idx, ++ u32 *indir, u8 *key, u8 *hfunc, ++ bool *symmetric); + int mlx5e_rx_res_rss_set_rxfh(struct mlx5e_rx_res *res, u32 rss_idx, + const u32 *indir, const u8 *key, const u8 *hfunc, + const bool *symmetric); +@@ -61,7 +62,7 @@ int mlx5e_rx_res_rss_set_hash_fields(struct mlx5e_rx_res *res, u32 rss_idx, + int mlx5e_rx_res_packet_merge_set_param(struct mlx5e_rx_res *res, + struct mlx5e_packet_merge_param *pkt_merge_param); + +-int mlx5e_rx_res_rss_init(struct mlx5e_rx_res *res, u32 *rss_idx, unsigned int init_nch); ++int mlx5e_rx_res_rss_init(struct mlx5e_rx_res *res, u32 rss_idx, unsigned int init_nch); + int mlx5e_rx_res_rss_destroy(struct mlx5e_rx_res *res, u32 rss_idx); + int mlx5e_rx_res_rss_cnt(struct mlx5e_rx_res *res); + int mlx5e_rx_res_rss_index(struct mlx5e_rx_res *res, struct mlx5e_rss *rss); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +index ff0b9ab2daa0..81e819f8722c 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +@@ -1480,61 +1480,121 @@ static u32 mlx5e_get_rxfh_indir_size(struct net_device *netdev) + static int mlx5e_get_rxfh(struct net_device *netdev, struct ethtool_rxfh_param *rxfh) + { + struct mlx5e_priv *priv = netdev_priv(netdev); +- u32 rss_context = rxfh->rss_context; + bool 
symmetric; +- int err; + + mutex_lock(&priv->state_lock); +- err = mlx5e_rx_res_rss_get_rxfh(priv->rx_res, rss_context, +- rxfh->indir, rxfh->key, &rxfh->hfunc, &symmetric); ++ mlx5e_rx_res_rss_get_rxfh(priv->rx_res, 0, rxfh->indir, rxfh->key, ++ &rxfh->hfunc, &symmetric); + mutex_unlock(&priv->state_lock); + +- if (err) +- return err; +- + if (symmetric) + rxfh->input_xfrm = RXH_XFRM_SYM_OR_XOR; + + return 0; + } + +-static int mlx5e_set_rxfh(struct net_device *dev, struct ethtool_rxfh_param *rxfh, +- struct netlink_ext_ack *extack) ++static int mlx5e_rxfh_hfunc_check(struct mlx5e_priv *priv, ++ const struct ethtool_rxfh_param *rxfh) + { +- bool symmetric = rxfh->input_xfrm == RXH_XFRM_SYM_OR_XOR; +- struct mlx5e_priv *priv = netdev_priv(dev); +- u32 *rss_context = &rxfh->rss_context; +- u8 hfunc = rxfh->hfunc; + unsigned int count; +- int err; +- +- mutex_lock(&priv->state_lock); + + count = priv->channels.params.num_channels; + +- if (hfunc == ETH_RSS_HASH_XOR) { ++ if (rxfh->hfunc == ETH_RSS_HASH_XOR) { + unsigned int xor8_max_channels = mlx5e_rqt_max_num_channels_allowed_for_xor8(); + + if (count > xor8_max_channels) { +- err = -EINVAL; + netdev_err(priv->netdev, "%s: Cannot set RSS hash function to XOR, current number of channels (%d) exceeds the maximum allowed for XOR8 RSS hfunc (%d)\n", + __func__, count, xor8_max_channels); +- goto unlock; ++ return -EINVAL; + } + } + +- if (*rss_context && rxfh->rss_delete) { +- err = mlx5e_rx_res_rss_destroy(priv->rx_res, *rss_context); ++ return 0; ++} ++ ++static int mlx5e_set_rxfh(struct net_device *dev, ++ struct ethtool_rxfh_param *rxfh, ++ struct netlink_ext_ack *extack) ++{ ++ bool symmetric = rxfh->input_xfrm == RXH_XFRM_SYM_OR_XOR; ++ struct mlx5e_priv *priv = netdev_priv(dev); ++ u8 hfunc = rxfh->hfunc; ++ int err; ++ ++ mutex_lock(&priv->state_lock); ++ ++ err = mlx5e_rxfh_hfunc_check(priv, rxfh); ++ if (err) + goto unlock; +- } + +- if (*rss_context == ETH_RXFH_CONTEXT_ALLOC) { +- err = 
mlx5e_rx_res_rss_init(priv->rx_res, rss_context, count); +- if (err) +- goto unlock; +- } ++ err = mlx5e_rx_res_rss_set_rxfh(priv->rx_res, rxfh->rss_context, ++ rxfh->indir, rxfh->key, ++ hfunc == ETH_RSS_HASH_NO_CHANGE ? NULL : &hfunc, ++ rxfh->input_xfrm == RXH_XFRM_NO_CHANGE ? NULL : &symmetric); ++ ++unlock: ++ mutex_unlock(&priv->state_lock); ++ return err; ++} ++ ++static int mlx5e_create_rxfh_context(struct net_device *dev, ++ struct ethtool_rxfh_context *ctx, ++ const struct ethtool_rxfh_param *rxfh, ++ struct netlink_ext_ack *extack) ++{ ++ bool symmetric = rxfh->input_xfrm == RXH_XFRM_SYM_OR_XOR; ++ struct mlx5e_priv *priv = netdev_priv(dev); ++ u8 hfunc = rxfh->hfunc; ++ int err; ++ ++ mutex_lock(&priv->state_lock); ++ ++ err = mlx5e_rxfh_hfunc_check(priv, rxfh); ++ if (err) ++ goto unlock; ++ ++ err = mlx5e_rx_res_rss_init(priv->rx_res, rxfh->rss_context, ++ priv->channels.params.num_channels); ++ if (err) ++ goto unlock; ++ ++ err = mlx5e_rx_res_rss_set_rxfh(priv->rx_res, rxfh->rss_context, ++ rxfh->indir, rxfh->key, ++ hfunc == ETH_RSS_HASH_NO_CHANGE ? NULL : &hfunc, ++ rxfh->input_xfrm == RXH_XFRM_NO_CHANGE ? 
NULL : &symmetric); ++ if (err) ++ goto unlock; ++ ++ mlx5e_rx_res_rss_get_rxfh(priv->rx_res, rxfh->rss_context, ++ ethtool_rxfh_context_indir(ctx), ++ ethtool_rxfh_context_key(ctx), ++ &ctx->hfunc, &symmetric); ++ if (symmetric) ++ ctx->input_xfrm = RXH_XFRM_SYM_OR_XOR; ++ ++unlock: ++ mutex_unlock(&priv->state_lock); ++ return err; ++} + +- err = mlx5e_rx_res_rss_set_rxfh(priv->rx_res, *rss_context, ++static int mlx5e_modify_rxfh_context(struct net_device *dev, ++ struct ethtool_rxfh_context *ctx, ++ const struct ethtool_rxfh_param *rxfh, ++ struct netlink_ext_ack *extack) ++{ ++ bool symmetric = rxfh->input_xfrm == RXH_XFRM_SYM_OR_XOR; ++ struct mlx5e_priv *priv = netdev_priv(dev); ++ u8 hfunc = rxfh->hfunc; ++ int err; ++ ++ mutex_lock(&priv->state_lock); ++ ++ err = mlx5e_rxfh_hfunc_check(priv, rxfh); ++ if (err) ++ goto unlock; ++ ++ err = mlx5e_rx_res_rss_set_rxfh(priv->rx_res, rxfh->rss_context, + rxfh->indir, rxfh->key, + hfunc == ETH_RSS_HASH_NO_CHANGE ? NULL : &hfunc, + rxfh->input_xfrm == RXH_XFRM_NO_CHANGE ? 
NULL : &symmetric); +@@ -1544,6 +1604,20 @@ static int mlx5e_set_rxfh(struct net_device *dev, struct ethtool_rxfh_param *rxf + return err; + } + ++static int mlx5e_remove_rxfh_context(struct net_device *dev, ++ struct ethtool_rxfh_context *ctx, ++ u32 rss_context, ++ struct netlink_ext_ack *extack) ++{ ++ struct mlx5e_priv *priv = netdev_priv(dev); ++ int err; ++ ++ mutex_lock(&priv->state_lock); ++ err = mlx5e_rx_res_rss_destroy(priv->rx_res, rss_context); ++ mutex_unlock(&priv->state_lock); ++ return err; ++} ++ + #define MLX5E_PFC_PREVEN_AUTO_TOUT_MSEC 100 + #define MLX5E_PFC_PREVEN_TOUT_MAX_MSEC 8000 + #define MLX5E_PFC_PREVEN_MINOR_PRECENT 85 +@@ -2659,9 +2733,9 @@ static void mlx5e_get_ts_stats(struct net_device *netdev, + + const struct ethtool_ops mlx5e_ethtool_ops = { + .cap_link_lanes_supported = true, +- .cap_rss_ctx_supported = true, + .rxfh_per_ctx_fields = true, + .rxfh_per_ctx_key = true, ++ .rxfh_max_num_contexts = MLX5E_MAX_NUM_RSS, + .supported_coalesce_params = ETHTOOL_COALESCE_USECS | + ETHTOOL_COALESCE_MAX_FRAMES | + ETHTOOL_COALESCE_USE_ADAPTIVE | +@@ -2690,6 +2764,9 @@ const struct ethtool_ops mlx5e_ethtool_ops = { + .set_rxfh = mlx5e_set_rxfh, + .get_rxfh_fields = mlx5e_get_rxfh_fields, + .set_rxfh_fields = mlx5e_set_rxfh_fields, ++ .create_rxfh_context = mlx5e_create_rxfh_context, ++ .modify_rxfh_context = mlx5e_modify_rxfh_context, ++ .remove_rxfh_context = mlx5e_remove_rxfh_context, + .get_rxnfc = mlx5e_get_rxnfc, + .set_rxnfc = mlx5e_set_rxnfc, + .get_tunable = mlx5e_get_tunable, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1434-net-mlx5e-remove-unused-vlan-insertion-logic-in-tx-path.patch b/SOURCES/1434-net-mlx5e-remove-unused-vlan-insertion-logic-in-tx-path.patch new file mode 100644 index 000000000..80c31e8be --- /dev/null +++ b/SOURCES/1434-net-mlx5e-remove-unused-vlan-insertion-logic-in-tx-path.patch @@ -0,0 +1,121 @@ +From 3d587b868f1ba20843807f7b018fb1f51b41e34e Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 
2026 18:03:21 -0400 +Subject: [PATCH] net/mlx5e: Remove unused VLAN insertion logic in TX path + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit ade89d1f2486e5189b3a3f0a9e917defa4ff0779 +Author: Carolina Jubran +Date: Wed Jul 9 00:16:23 2025 +0300 + + net/mlx5e: Remove unused VLAN insertion logic in TX path + + The VLAN insertion capability (`wqe_vlan_insert`) was never enabled on + all mlx5 devices. When VLAN TX offload is advertised but this + capability is not supported, the driver uses inline headers to insert + the VLAN tag. + + To support this, the driver used to set the + `MLX5E_SQ_STATE_VLAN_NEED_L2_INLINE` bit to enforce L2 inline mode + when `wqe_vlan_insert` was not supported. Since the capability is + disabled on all devices, this logic was always active, and the SQ flag + has become redundant. L2 inline is enforced unconditionally for + VLAN-tagged packets. + + The `skb_vlan_tag_present()` check in the else-if block of + `mlx5e_sq_xmit_wqe()` is never true by this point in the TX flow, + as the VLAN tag has already been inserted by the driver using inline + headers. As a result, this code is never executed. + + Remove the redundant SQ state, dead VLAN insertion code block, and + related logic. 
+ + Signed-off-by: Carolina Jubran + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/1752009387-13300-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h +index c329de1d4f0a..866652575105 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h +@@ -383,7 +383,6 @@ enum { + MLX5E_SQ_STATE_RECOVERING, + MLX5E_SQ_STATE_IPSEC, + MLX5E_SQ_STATE_DIM, +- MLX5E_SQ_STATE_VLAN_NEED_L2_INLINE, + MLX5E_SQ_STATE_PENDING_XSK_TX, + MLX5E_SQ_STATE_PENDING_TLS_RX_RESYNC, + MLX5E_NUM_SQ_STATES, /* Must be kept last */ +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c +index 131ed97ca997..71fb20f63bc3 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c +@@ -339,8 +339,6 @@ static int mlx5e_ptp_alloc_txqsq(struct mlx5e_ptp *c, int txq_ix, + sq->stats = &c->priv->ptp_stats.sq[tc]; + sq->ptpsq = ptpsq; + INIT_WORK(&sq->recover_work, mlx5e_tx_err_cqe_work); +- if (!MLX5_CAP_ETH(mdev, wqe_vlan_insert)) +- set_bit(MLX5E_SQ_STATE_VLAN_NEED_L2_INLINE, &sq->state); + sq->stop_room = param->stop_room; + sq->ptp_cyc2time = mlx5_sq_ts_translator(mdev); + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c +index a107aad01865..2439495e36f8 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c +@@ -13,7 +13,6 @@ static const char * const sq_sw_state_type_name[] = { + [MLX5E_SQ_STATE_RECOVERING] = "recovering", + [MLX5E_SQ_STATE_IPSEC] = "ipsec", + [MLX5E_SQ_STATE_DIM] = "dim", +- [MLX5E_SQ_STATE_VLAN_NEED_L2_INLINE] = "vlan_need_l2_inline", + 
[MLX5E_SQ_STATE_PENDING_XSK_TX] = "pending_xsk_tx", + [MLX5E_SQ_STATE_PENDING_TLS_RX_RESYNC] = "pending_tls_rx_resync", + }; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index 62db56b5251f..0396ec0928d5 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -1670,8 +1670,6 @@ static int mlx5e_alloc_txqsq(struct mlx5e_channel *c, + sq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu); + sq->max_sq_mpw_wqebbs = mlx5e_get_max_sq_aligned_wqebbs(mdev); + INIT_WORK(&sq->recover_work, mlx5e_tx_err_cqe_work); +- if (!MLX5_CAP_ETH(mdev, wqe_vlan_insert)) +- set_bit(MLX5E_SQ_STATE_VLAN_NEED_L2_INLINE, &sq->state); + if (mlx5_ipsec_device_caps(c->priv->mdev)) + set_bit(MLX5E_SQ_STATE_IPSEC, &sq->state); + if (param->is_mpw) +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c +index 55a8629f0792..e6a301ba3254 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c +@@ -256,8 +256,7 @@ mlx5e_tx_wqe_inline_mode(struct mlx5e_txqsq *sq, struct sk_buff *skb, + + mode = sq->min_inline_mode; + +- if (skb_vlan_tag_present(skb) && +- test_bit(MLX5E_SQ_STATE_VLAN_NEED_L2_INLINE, &sq->state)) ++ if (skb_vlan_tag_present(skb)) + mode = max_t(u8, MLX5_INLINE_MODE_L2, mode); + + return mode; +@@ -483,12 +482,6 @@ mlx5e_sq_xmit_wqe(struct mlx5e_txqsq *sq, struct sk_buff *skb, + } + eseg->inline_hdr.sz |= cpu_to_be16(ihs); + dseg += wqe_attr->ds_cnt_inl; +- } else if (skb_vlan_tag_present(skb)) { +- eseg->insert.type = cpu_to_be16(MLX5_ETH_WQE_INSERT_VLAN); +- if (skb->vlan_proto == cpu_to_be16(ETH_P_8021AD)) +- eseg->insert.type |= cpu_to_be16(MLX5_ETH_WQE_SVLAN); +- eseg->insert.vlan_tci = cpu_to_be16(skb_vlan_tag_get(skb)); +- stats->added_vlan_packets++; + } + + dseg += wqe_attr->ds_cnt_ids; +-- +2.50.1 (Apple Git-155) + diff 
--git a/SOURCES/1435-net-mlx5e-ct-extract-a-memcmp-from-a-spinlock-section.patch b/SOURCES/1435-net-mlx5e-ct-extract-a-memcmp-from-a-spinlock-section.patch new file mode 100644 index 000000000..80c245c50 --- /dev/null +++ b/SOURCES/1435-net-mlx5e-ct-extract-a-memcmp-from-a-spinlock-section.patch @@ -0,0 +1,57 @@ +From 5a21b2dc63b360eb3cf2a51245c079f2b343085c Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:21 -0400 +Subject: [PATCH] net/mlx5e: CT: extract a memcmp from a spinlock section + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 122d86aa2a0c8950ea958a3d258eccbb5872bd68 +Author: Cosmin Ratiu +Date: Wed Jul 9 00:16:24 2025 +0300 + + net/mlx5e: CT: extract a memcmp from a spinlock section + + This reduces the time the lock is held and reduces contention. + + Signed-off-by: Cosmin Ratiu + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/1752009387-13300-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c +index 81332cd4a582..870d12364f99 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c +@@ -1195,6 +1195,7 @@ mlx5_tc_ct_block_flow_offload_add(struct mlx5_ct_ft *ft, + struct flow_action_entry *meta_action; + unsigned long cookie = flow->cookie; + struct mlx5_ct_entry *entry; ++ bool has_nat; + int err; + + meta_action = mlx5_tc_ct_get_ct_metadata_action(flow_rule); +@@ -1236,6 +1237,8 @@ mlx5_tc_ct_block_flow_offload_add(struct mlx5_ct_ft *ft, + err = mlx5_tc_ct_rule_to_tuple_nat(&entry->tuple_nat, flow_rule); + if (err) + goto err_set; ++ has_nat = memcmp(&entry->tuple, &entry->tuple_nat, ++ sizeof(entry->tuple)); + + spin_lock_bh(&ct_priv->ht_lock); + +@@ -1244,7 +1247,7 @@ mlx5_tc_ct_block_flow_offload_add(struct 
mlx5_ct_ft *ft, + if (err) + goto err_entries; + +- if (memcmp(&entry->tuple, &entry->tuple_nat, sizeof(entry->tuple))) { ++ if (has_nat) { + err = rhashtable_lookup_insert_fast(&ct_priv->ct_tuples_nat_ht, + &entry->tuple_nat_node, + tuples_nat_ht_params); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1436-net-mlx5e-replace-recursive-vlan-push-handling-with-an-itera.patch b/SOURCES/1436-net-mlx5e-replace-recursive-vlan-push-handling-with-an-itera.patch new file mode 100644 index 000000000..50cee9346 --- /dev/null +++ b/SOURCES/1436-net-mlx5e-replace-recursive-vlan-push-handling-with-an-itera.patch @@ -0,0 +1,89 @@ +From d0850dc98029f973ea0cdf275fb32cf5758aedb7 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:21 -0400 +Subject: [PATCH] net/mlx5e: Replace recursive VLAN push handling with an + iterative loop + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit c0ca344d796cf00c169f63f09eea0f4905778be1 +Author: Gal Pressman +Date: Wed Jul 9 00:16:25 2025 +0300 + + net/mlx5e: Replace recursive VLAN push handling with an iterative loop + + mlx5e_tc_act_vlan_add_push_action() uses tail-recursion to walk through + a stack of VLAN devices. + + There is no need for a complicated recursion with unnecessary stack + consumption and less obvious code flow, rewrite the function so that it + uses a do while loop instead. 
+ + Signed-off-by: Gal Pressman + Reviewed-by: Cosmin Ratiu + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/1752009387-13300-4-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/vlan.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/vlan.c +index a13c5e707b83..9bdb5820c553 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/vlan.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/vlan.c +@@ -94,29 +94,30 @@ mlx5e_tc_act_vlan_add_push_action(struct mlx5e_priv *priv, + struct net_device **out_dev, + struct netlink_ext_ack *extack) + { +- struct net_device *vlan_dev = *out_dev; +- struct flow_action_entry vlan_act = { +- .id = FLOW_ACTION_VLAN_PUSH, +- .vlan.vid = vlan_dev_vlan_id(vlan_dev), +- .vlan.proto = vlan_dev_vlan_proto(vlan_dev), +- .vlan.prio = 0, +- }; +- int err; +- +- err = parse_tc_vlan_action(priv, &vlan_act, attr->esw_attr, &attr->action, extack, NULL); +- if (err) +- return err; +- +- rcu_read_lock(); +- *out_dev = dev_get_by_index_rcu(dev_net(vlan_dev), dev_get_iflink(vlan_dev)); +- rcu_read_unlock(); +- if (!*out_dev) +- return -ENODEV; ++ do { ++ struct net_device *vlan_dev = *out_dev; ++ struct flow_action_entry vlan_act = { ++ .id = FLOW_ACTION_VLAN_PUSH, ++ .vlan.vid = vlan_dev_vlan_id(vlan_dev), ++ .vlan.proto = vlan_dev_vlan_proto(vlan_dev), ++ .vlan.prio = 0, ++ }; ++ int err; ++ ++ err = parse_tc_vlan_action(priv, &vlan_act, attr->esw_attr, ++ &attr->action, extack, NULL); ++ if (err) ++ return err; + +- if (is_vlan_dev(*out_dev)) +- err = mlx5e_tc_act_vlan_add_push_action(priv, attr, out_dev, extack); ++ rcu_read_lock(); ++ *out_dev = dev_get_by_index_rcu(dev_net(vlan_dev), ++ dev_get_iflink(vlan_dev)); ++ rcu_read_unlock(); ++ if (!*out_dev) ++ return -ENODEV; ++ } while (is_vlan_dev(*out_dev)); + +- return err; ++ return 0; + } + + int +-- +2.50.1 (Apple Git-155) + 
diff --git a/SOURCES/1437-net-mlx5-warn-when-write-combining-is-not-supported.patch b/SOURCES/1437-net-mlx5-warn-when-write-combining-is-not-supported.patch new file mode 100644 index 000000000..1caf641c1 --- /dev/null +++ b/SOURCES/1437-net-mlx5-warn-when-write-combining-is-not-supported.patch @@ -0,0 +1,44 @@ +From 93600079ef308e0d3ae39dc234c5eed7534aa941 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:21 -0400 +Subject: [PATCH] net/mlx5: Warn when write combining is not supported + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit d980f371b134d5d66d082161171a6be613975dfc +Author: Maor Gottlieb +Date: Wed Jul 9 00:16:26 2025 +0300 + + net/mlx5: Warn when write combining is not supported + + Warn if write combining is not supported, as it can impact latency. + Add the warning message to be printed only when the driver actually + run the test and detect unsupported state, rather than when + inheriting parent's result for SFs. + + Signed-off-by: Maor Gottlieb + Reviewed-by: Michael Guralnik + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/1752009387-13300-5-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/wc.c b/drivers/net/ethernet/mellanox/mlx5/core/wc.c +index 740b719e7072..2f0316616fa4 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/wc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/wc.c +@@ -378,6 +378,9 @@ static void mlx5_core_test_wc(struct mlx5_core_dev *mdev) + mlx5_free_bfreg(mdev, &sq->bfreg); + err_alloc_bfreg: + kfree(sq); ++ ++ if (mdev->wc_state == MLX5_WC_STATE_UNSUPPORTED) ++ mlx5_core_warn(mdev, "Write combining is not supported\n"); + } + + bool mlx5_wc_support_get(struct mlx5_core_dev *mdev) +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1438-net-mlx5e-rx-remove-unnecessary-rqt-redirects.patch 
b/SOURCES/1438-net-mlx5e-rx-remove-unnecessary-rqt-redirects.patch new file mode 100644 index 000000000..001c2df58 --- /dev/null +++ b/SOURCES/1438-net-mlx5e-rx-remove-unnecessary-rqt-redirects.patch @@ -0,0 +1,67 @@ +From f7c1631cf13fab0afa3883271bdca40e09aec5c1 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:21 -0400 +Subject: [PATCH] net/mlx5e: RX, Remove unnecessary RQT redirects + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit a194be578376f7365db9bf1b8193c74546c86121 +Author: Tariq Toukan +Date: Wed Jul 9 00:16:27 2025 +0300 + + net/mlx5e: RX, Remove unnecessary RQT redirects + + RQTs (Receive Queue Table) should redirect traffic to the channels' RQs + when they're active. Otherwise, redirect to the designated "drop RQ". + + RQTs are created in "inactive" state, pointing to the "drop RQ". + In activate and de-activate flows, do not "deactivate" the rest of RQTs + (beyond the num of channels), as they are already inactive. + + This cuts down unnecessary execution of FW commands (MODIFY_RQT), and + improves the latency of open/close channels or configuration change. + + Perf: + NIC: Connect-X7. + Configuration: 1 combined channel, max num channels 248. + Measure time for "interface up + interface down". + + Before: 0.313 sec + After: 0.057 sec (5.5x faster) + + 247 MODIFY_RQT commands saved in interface up. + 247 MODIFY_RQT commands saved in interface down. 
+ + Signed-off-by: Tariq Toukan + Reviewed-by: Dragos Tatulea + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/1752009387-13300-6-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c +index e5cce2df3649..a2acbfee2b77 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c +@@ -571,8 +571,6 @@ void mlx5e_rx_res_channels_activate(struct mlx5e_rx_res *res, struct mlx5e_chann + + for (ix = 0; ix < nch; ix++) + mlx5e_rx_res_channel_activate_direct(res, chs, ix); +- for (ix = nch; ix < res->max_nch; ix++) +- mlx5e_rx_res_channel_deactivate_direct(res, ix); + + if (res->features & MLX5E_RX_RES_FEATURE_PTP) { + u32 rqn; +@@ -595,7 +593,7 @@ void mlx5e_rx_res_channels_deactivate(struct mlx5e_rx_res *res) + + mlx5e_rx_res_rss_disable(res); + +- for (ix = 0; ix < res->max_nch; ix++) ++ for (ix = 0; ix < res->rss_nch; ix++) + mlx5e_rx_res_channel_deactivate_direct(res, ix); + + if (res->features & MLX5E_RX_RES_FEATURE_PTP) { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1439-net-mlx5-expose-hca-capability-bits-for-mkey-max-page-size.patch b/SOURCES/1439-net-mlx5-expose-hca-capability-bits-for-mkey-max-page-size.patch new file mode 100644 index 000000000..89a4e1583 --- /dev/null +++ b/SOURCES/1439-net-mlx5-expose-hca-capability-bits-for-mkey-max-page-size.patch @@ -0,0 +1,41 @@ +From 264ed9ba5b356121e93841f29b6daedae5653c16 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:21 -0400 +Subject: [PATCH] net/mlx5: Expose HCA capability bits for mkey max page size + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 8feaf9832be52be16e588029366e27940f6b88ea +Author: Michael Guralnik +Date: Wed Jul 9 09:42:08 2025 +0300 + + net/mlx5: Expose HCA capability bits for mkey max page size + + Expose the HCA 
capability for maximal page size that can be configured + for an mkey. Used for enforcing capabilities when working with highly + contiguous memory and using large page sizes. + + Signed-off-by: Michael Guralnik + Link: https://patch.msgid.link/3e4d3fda37934430f65f72601519e22bf396fd05.1751979184.git.leon@kernel.org + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h +index 3ab683020fab..ebcd5b9b648e 100644 +--- a/include/linux/mlx5/mlx5_ifc.h ++++ b/include/linux/mlx5/mlx5_ifc.h +@@ -2170,7 +2170,9 @@ struct mlx5_ifc_cmd_hca_cap_2_bits { + u8 min_mkey_log_entity_size_fixed_buffer[0x5]; + u8 ec_vf_vport_base[0x10]; + +- u8 reserved_at_3a0[0xa]; ++ u8 reserved_at_3a0[0x2]; ++ u8 max_mkey_log_entity_size_fixed_buffer[0x6]; ++ u8 reserved_at_3a8[0x2]; + u8 max_mkey_log_entity_size_mtt[0x6]; + u8 max_rqt_vhca_id[0x10]; + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1440-rdma-mlx5-fix-umr-modifying-of-mkey-page-size.patch b/SOURCES/1440-rdma-mlx5-fix-umr-modifying-of-mkey-page-size.patch new file mode 100644 index 000000000..55293fb3f --- /dev/null +++ b/SOURCES/1440-rdma-mlx5-fix-umr-modifying-of-mkey-page-size.patch @@ -0,0 +1,80 @@ +From 44bec6a44a04c6548a35a7f6ab4072e14adde57f Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:21 -0400 +Subject: [PATCH] RDMA/mlx5: Fix UMR modifying of mkey page size + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit c4f96972c3c206ac8f6770b5ecd5320b561d0058 +Author: Edward Srouji +Date: Wed Jul 9 09:42:09 2025 +0300 + + RDMA/mlx5: Fix UMR modifying of mkey page size + + When changing the page size on an mkey, the driver needs to set the + appropriate bits in the mkey mask to indicate which fields are being + modified. + The 6th bit of a page size in mlx5 driver is considered an extension, + and this bit has a dedicated capability and mask bits. 
+ + Previously, the driver was not setting this mask in the mkey mask when + performing page size changes, regardless of its hardware support, + potentially leading to an incorrect page size updates. + + This fixes the issue by setting the relevant bit in the mkey mask when + performing page size changes on an mkey and the 6th bit of this field is + supported by the hardware. + + Fixes: cef7dde8836a ("net/mlx5: Expand mkey page size to support 6 bits") + Signed-off-by: Edward Srouji + Reviewed-by: Michael Guralnik + Link: https://patch.msgid.link/9f43a9c73bf2db6085a99dc836f7137e76579f09.1751979184.git.leon@kernel.org + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/umr.c b/drivers/infiniband/hw/mlx5/umr.c +index 5be4426a2884..25601dea9e30 100644 +--- a/drivers/infiniband/hw/mlx5/umr.c ++++ b/drivers/infiniband/hw/mlx5/umr.c +@@ -32,13 +32,15 @@ static __be64 get_umr_disable_mr_mask(void) + return cpu_to_be64(result); + } + +-static __be64 get_umr_update_translation_mask(void) ++static __be64 get_umr_update_translation_mask(struct mlx5_ib_dev *dev) + { + u64 result; + + result = MLX5_MKEY_MASK_LEN | + MLX5_MKEY_MASK_PAGE_SIZE | + MLX5_MKEY_MASK_START_ADDR; ++ if (MLX5_CAP_GEN_2(dev->mdev, umr_log_entity_size_5)) ++ result |= MLX5_MKEY_MASK_PAGE_SIZE_5; + + return cpu_to_be64(result); + } +@@ -654,7 +656,7 @@ static void mlx5r_umr_final_update_xlt(struct mlx5_ib_dev *dev, + flags & MLX5_IB_UPD_XLT_ENABLE || flags & MLX5_IB_UPD_XLT_ADDR; + + if (update_translation) { +- wqe->ctrl_seg.mkey_mask |= get_umr_update_translation_mask(); ++ wqe->ctrl_seg.mkey_mask |= get_umr_update_translation_mask(dev); + if (!mr->ibmr.length) + MLX5_SET(mkc, &wqe->mkey_seg, length64, 1); + } +diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h +index c9e0d0f437af..3b506482b4fa 100644 +--- a/include/linux/mlx5/device.h ++++ b/include/linux/mlx5/device.h +@@ -279,6 +279,7 @@ enum { + MLX5_MKEY_MASK_SMALL_FENCE = 1ull 
<< 23, + MLX5_MKEY_MASK_RELAXED_ORDERING_WRITE = 1ull << 25, + MLX5_MKEY_MASK_FREE = 1ull << 29, ++ MLX5_MKEY_MASK_PAGE_SIZE_5 = 1ull << 42, + MLX5_MKEY_MASK_RELAXED_ORDERING_READ = 1ull << 47, + }; + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1441-net-mlx5-expose-disciplined-fr-counter-through-hca-capabilit.patch b/SOURCES/1441-net-mlx5-expose-disciplined-fr-counter-through-hca-capabilit.patch new file mode 100644 index 000000000..fba761f39 --- /dev/null +++ b/SOURCES/1441-net-mlx5-expose-disciplined-fr-counter-through-hca-capabilit.patch @@ -0,0 +1,45 @@ +From c23046bf130b75046bc7db82c78d2fd9c1a640b7 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:21 -0400 +Subject: [PATCH] net/mlx5: Expose disciplined_fr_counter through HCA + capabilities in mlx5_ifc +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit cbe080f931f48bc7b054008fc2567d1c8c247a89 +Author: Carolina Jubran +Date: Wed Jul 9 15:41:06 2025 +0300 + + net/mlx5: Expose disciplined_fr_counter through HCA capabilities in mlx5_ifc + + Introduce the `disciplined_fr_counter` capability bit to indicate that + the device’s free-running cycle counter is disciplined to real-time. 
+ + Signed-off-by: Carolina Jubran + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1752064867-16874-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h +index ebcd5b9b648e..818bc0e06bf2 100644 +--- a/include/linux/mlx5/mlx5_ifc.h ++++ b/include/linux/mlx5/mlx5_ifc.h +@@ -1845,7 +1845,8 @@ struct mlx5_ifc_cmd_hca_cap_bits { + + u8 log_bf_reg_size[0x5]; + +- u8 reserved_at_270[0x3]; ++ u8 disciplined_fr_counter[0x1]; ++ u8 reserved_at_271[0x2]; + u8 qp_error_syndrome[0x1]; + u8 reserved_at_274[0x2]; + u8 lag_dct[0x2]; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1442-net-mlx5-ifc-updates-for-disabled-host-pf.patch b/SOURCES/1442-net-mlx5-ifc-updates-for-disabled-host-pf.patch new file mode 100644 index 000000000..2ded1012a --- /dev/null +++ b/SOURCES/1442-net-mlx5-ifc-updates-for-disabled-host-pf.patch @@ -0,0 +1,41 @@ +From 1d023170f39f56f036fcd497e7f39916968648fa Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:21 -0400 +Subject: [PATCH] net/mlx5: IFC updates for disabled host PF + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit cd1746cb6555a2238c4aae9f9d60b637a61bf177 +Author: Daniel Jurgens +Date: Wed Jul 9 15:41:07 2025 +0300 + + net/mlx5: IFC updates for disabled host PF + + The port 2 host PF can be disabled, this bit reflects that setting. 
+ + Signed-off-by: Daniel Jurgens + Reviewed-by: William Tu + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1752064867-16874-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h +index 818bc0e06bf2..74b9f7c16346 100644 +--- a/include/linux/mlx5/mlx5_ifc.h ++++ b/include/linux/mlx5/mlx5_ifc.h +@@ -12382,7 +12382,9 @@ struct mlx5_ifc_mtrc_ctrl_bits { + + struct mlx5_ifc_host_params_context_bits { + u8 host_number[0x8]; +- u8 reserved_at_8[0x7]; ++ u8 reserved_at_8[0x5]; ++ u8 host_pf_not_exist[0x1]; ++ u8 reserved_at_14[0x1]; + u8 host_pf_disabled[0x1]; + u8 host_num_of_vfs[0x10]; + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1443-net-mlx5e-create-destroy-pcie-congestion-event-object.patch b/SOURCES/1443-net-mlx5e-create-destroy-pcie-congestion-event-object.patch new file mode 100644 index 000000000..8850a6318 --- /dev/null +++ b/SOURCES/1443-net-mlx5e-create-destroy-pcie-congestion-event-object.patch @@ -0,0 +1,271 @@ +From ccb319f355cbf175b471e454325000513fc581b1 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:21 -0400 +Subject: [PATCH] net/mlx5e: Create/destroy PCIe Congestion Event object + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit ab2b0d4d639435f382583b107997a4ce805a665b +Author: Dragos Tatulea +Date: Tue Jul 15 17:30:20 2025 +0300 + + net/mlx5e: Create/destroy PCIe Congestion Event object + + Add initial infrastructure to create and destroy the PCIe Congestion + Event object if the object is supported. + + The verb for the object creation function is "set" instead of + "create" because the function will accommodate the modify operation + as well in a subsequent patch. + + The next patches will hook it up to the event handler and will add + actual functionality. 
+ + Signed-off-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1752589821-145787-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile +index d292e6a9e22c..650df18a9216 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile ++++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile +@@ -29,7 +29,7 @@ mlx5_core-$(CONFIG_MLX5_CORE_EN) += en/rqt.o en/tir.o en/rss.o en/rx_res.o \ + en/reporter_tx.o en/reporter_rx.o en/params.o en/xsk/pool.o \ + en/xsk/setup.o en/xsk/rx.o en/xsk/tx.o en/devlink.o en/ptp.o \ + en/qos.o en/htb.o en/trap.o en/fs_tt_redirect.o en/selq.o \ +- lib/crypto.o lib/sd.o ++ lib/crypto.o lib/sd.o en/pcie_cong_event.o + + # + # Netdev extra +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h +index 866652575105..a4116dfb14f4 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h +@@ -920,6 +920,8 @@ struct mlx5e_priv { + struct notifier_block events_nb; + struct notifier_block blocking_events_nb; + ++ struct mlx5e_pcie_cong_event *cong_event; ++ + struct udp_tunnel_nic_info nic_info; + #ifdef CONFIG_MLX5_CORE_EN_DCB + struct mlx5e_dcbx dcbx; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c b/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c +new file mode 100644 +index 000000000000..9595f8f9a94d +--- /dev/null ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c +@@ -0,0 +1,140 @@ ++// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB ++// Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. 
++ ++#include "en.h" ++#include "pcie_cong_event.h" ++ ++struct mlx5e_pcie_cong_thresh { ++ u16 inbound_high; ++ u16 inbound_low; ++ u16 outbound_high; ++ u16 outbound_low; ++}; ++ ++struct mlx5e_pcie_cong_event { ++ u64 obj_id; ++ ++ struct mlx5e_priv *priv; ++}; ++ ++/* In units of 0.01 % */ ++static const struct mlx5e_pcie_cong_thresh default_thresh_config = { ++ .inbound_high = 9000, ++ .inbound_low = 7500, ++ .outbound_high = 9000, ++ .outbound_low = 7500, ++}; ++ ++static int ++mlx5_cmd_pcie_cong_event_set(struct mlx5_core_dev *dev, ++ const struct mlx5e_pcie_cong_thresh *config, ++ u64 *obj_id) ++{ ++ u32 in[MLX5_ST_SZ_DW(pcie_cong_event_cmd_in)] = {}; ++ u32 out[MLX5_ST_SZ_DW(general_obj_out_cmd_hdr)]; ++ void *cong_obj; ++ void *hdr; ++ int err; ++ ++ hdr = MLX5_ADDR_OF(pcie_cong_event_cmd_in, in, hdr); ++ cong_obj = MLX5_ADDR_OF(pcie_cong_event_cmd_in, in, cong_obj); ++ ++ MLX5_SET(general_obj_in_cmd_hdr, hdr, opcode, ++ MLX5_CMD_OP_CREATE_GENERAL_OBJECT); ++ ++ MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type, ++ MLX5_GENERAL_OBJECT_TYPES_PCIE_CONG_EVENT); ++ ++ MLX5_SET(pcie_cong_event_obj, cong_obj, inbound_event_en, 1); ++ MLX5_SET(pcie_cong_event_obj, cong_obj, outbound_event_en, 1); ++ ++ MLX5_SET(pcie_cong_event_obj, cong_obj, ++ inbound_cong_high_threshold, config->inbound_high); ++ MLX5_SET(pcie_cong_event_obj, cong_obj, ++ inbound_cong_low_threshold, config->inbound_low); ++ ++ MLX5_SET(pcie_cong_event_obj, cong_obj, ++ outbound_cong_high_threshold, config->outbound_high); ++ MLX5_SET(pcie_cong_event_obj, cong_obj, ++ outbound_cong_low_threshold, config->outbound_low); ++ ++ err = mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); ++ if (err) ++ return err; ++ ++ *obj_id = MLX5_GET(general_obj_out_cmd_hdr, out, obj_id); ++ ++ mlx5_core_dbg(dev, "PCIe congestion event (obj_id=%llu) created. 
Config: in: [%u, %u], out: [%u, %u]\n", ++ *obj_id, ++ config->inbound_high, config->inbound_low, ++ config->outbound_high, config->outbound_low); ++ ++ return 0; ++} ++ ++static int mlx5_cmd_pcie_cong_event_destroy(struct mlx5_core_dev *dev, ++ u64 obj_id) ++{ ++ u32 in[MLX5_ST_SZ_DW(pcie_cong_event_cmd_in)] = {}; ++ u32 out[MLX5_ST_SZ_DW(general_obj_out_cmd_hdr)]; ++ void *hdr; ++ ++ hdr = MLX5_ADDR_OF(pcie_cong_event_cmd_in, in, hdr); ++ MLX5_SET(general_obj_in_cmd_hdr, hdr, opcode, ++ MLX5_CMD_OP_DESTROY_GENERAL_OBJECT); ++ MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type, ++ MLX5_GENERAL_OBJECT_TYPES_PCIE_CONG_EVENT); ++ MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_id, obj_id); ++ ++ return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); ++} ++ ++int mlx5e_pcie_cong_event_init(struct mlx5e_priv *priv) ++{ ++ struct mlx5e_pcie_cong_event *cong_event; ++ struct mlx5_core_dev *mdev = priv->mdev; ++ int err; ++ ++ if (!mlx5_pcie_cong_event_supported(mdev)) ++ return 0; ++ ++ cong_event = kvzalloc_node(sizeof(*cong_event), GFP_KERNEL, ++ mdev->priv.numa_node); ++ if (!cong_event) ++ return -ENOMEM; ++ ++ cong_event->priv = priv; ++ ++ err = mlx5_cmd_pcie_cong_event_set(mdev, &default_thresh_config, ++ &cong_event->obj_id); ++ if (err) { ++ mlx5_core_warn(mdev, "Error creating a PCIe congestion event object\n"); ++ goto err_free; ++ } ++ ++ priv->cong_event = cong_event; ++ ++ return 0; ++ ++err_free: ++ kvfree(cong_event); ++ ++ return err; ++} ++ ++void mlx5e_pcie_cong_event_cleanup(struct mlx5e_priv *priv) ++{ ++ struct mlx5e_pcie_cong_event *cong_event = priv->cong_event; ++ struct mlx5_core_dev *mdev = priv->mdev; ++ ++ if (!cong_event) ++ return; ++ ++ priv->cong_event = NULL; ++ ++ if (mlx5_cmd_pcie_cong_event_destroy(mdev, cong_event->obj_id)) ++ mlx5_core_warn(mdev, "Error destroying PCIe congestion event (obj_id=%llu)\n", ++ cong_event->obj_id); ++ ++ kvfree(cong_event); ++} +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.h +new file mode 100644 +index 000000000000..b1ea46bf648a +--- /dev/null ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.h +@@ -0,0 +1,10 @@ ++/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */ ++/* Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. */ ++ ++#ifndef __MLX5_PCIE_CONG_EVENT_H__ ++#define __MLX5_PCIE_CONG_EVENT_H__ ++ ++int mlx5e_pcie_cong_event_init(struct mlx5e_priv *priv); ++void mlx5e_pcie_cong_event_cleanup(struct mlx5e_priv *priv); ++ ++#endif /* __MLX5_PCIE_CONG_EVENT_H__ */ +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index 0396ec0928d5..5503882839b8 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -76,6 +76,7 @@ + #include "en/trap.h" + #include "lib/devcom.h" + #include "lib/sd.h" ++#include "en/pcie_cong_event.h" + + static bool mlx5e_hw_gro_supported(struct mlx5_core_dev *mdev) + { +@@ -5972,6 +5973,7 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv) + if (mlx5e_monitor_counter_supported(priv)) + mlx5e_monitor_counter_init(priv); + ++ mlx5e_pcie_cong_event_init(priv); + mlx5e_hv_vhca_stats_create(priv); + if (netdev->reg_state != NETREG_REGISTERED) + return; +@@ -6002,6 +6004,7 @@ static void mlx5e_nic_disable(struct mlx5e_priv *priv) + + mlx5e_nic_set_rx_mode(priv); + ++ mlx5e_pcie_cong_event_cleanup(priv); + mlx5e_hv_vhca_stats_destroy(priv); + if (mlx5e_monitor_counter_supported(priv)) + mlx5e_monitor_counter_cleanup(priv); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h +index 2e02bdea8361..c518380c4ce7 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h +@@ -495,4 +495,17 @@ static inline int mlx5_max_eq_cap_get(const struct mlx5_core_dev *dev) + + return 1 
<< MLX5_CAP_GEN(dev, log_max_eq); + } ++ ++static inline bool mlx5_pcie_cong_event_supported(struct mlx5_core_dev *dev) ++{ ++ u64 features = MLX5_CAP_GEN_2_64(dev, general_obj_types_127_64); ++ ++ if (!(features & MLX5_HCA_CAP_2_GENERAL_OBJECT_TYPES_PCIE_CONG_EVENT)) ++ return false; ++ ++ if (dev->sd) ++ return false; ++ ++ return true; ++} + #endif /* __MLX5_CORE_H__ */ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1444-net-mlx5e-add-device-pcie-congestion-ethtool-stats.patch b/SOURCES/1444-net-mlx5e-add-device-pcie-congestion-ethtool-stats.patch new file mode 100644 index 000000000..f90122146 --- /dev/null +++ b/SOURCES/1444-net-mlx5e-add-device-pcie-congestion-ethtool-stats.patch @@ -0,0 +1,360 @@ +From cd788818a76e7ddc52d8ae6184196774d92e14cd Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:22 -0400 +Subject: [PATCH] net/mlx5e: Add device PCIe congestion ethtool stats + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 8890ee6dcf6e3d396677308553a8a57f29c7109e +Author: Dragos Tatulea +Date: Tue Jul 15 17:30:21 2025 +0300 + + net/mlx5e: Add device PCIe congestion ethtool stats + + Implement the PCIe Congestion Event notifier which triggers a work item + to query the PCIe Congestion Event object. The result of the congestion + state is reflected in the new ethtool stats: + + * pci_bw_inbound_high: the device has crossed the high threshold for + inbound PCIe traffic. + * pci_bw_inbound_low: the device has crossed the low threshold for + inbound PCIe traffic + * pci_bw_outbound_high: the device has crossed the high threshold for + outbound PCIe traffic. + * pci_bw_outbound_low: the device has crossed the low threshold for + outbound PCIe traffic + + The high and low thresholds are currently configured at 90% and 75%. + These are hysteresis thresholds which help to check if the + PCI bus on the device side is in a congested state. + + If low + 1 = high then the device is in a congested state. 
If low == high + then the device is not in a congested state. + + The counters are also documented. + + A follow-up patch will make the thresholds configurable. + + Signed-off-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1752589821-145787-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst +index 43d72c8b713b..754c81436408 100644 +--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst ++++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst +@@ -1341,3 +1341,35 @@ Device Counters + - The number of times the device owned queue had not enough buffers + allocated. + - Error ++ ++ * - `pci_bw_inbound_high` ++ - The number of times the device crossed the high inbound pcie bandwidth ++ threshold. To be compared to pci_bw_inbound_low to check if the device ++ is in a congested state. ++ If pci_bw_inbound_high == pci_bw_inbound_low then the device is not congested. ++ If pci_bw_inbound_high > pci_bw_inbound_low then the device is congested. ++ - Tnformative ++ ++ * - `pci_bw_inbound_low` ++ - The number of times the device crossed the low inbound PCIe bandwidth ++ threshold. To be compared to pci_bw_inbound_high to check if the device ++ is in a congested state. ++ If pci_bw_inbound_high == pci_bw_inbound_low then the device is not congested. ++ If pci_bw_inbound_high > pci_bw_inbound_low then the device is congested. ++ - Informative ++ ++ * - `pci_bw_outbound_high` ++ - The number of times the device crossed the high outbound pcie bandwidth ++ threshold. To be compared to pci_bw_outbound_low to check if the device ++ is in a congested state. ++ If pci_bw_outbound_high == pci_bw_outbound_low then the device is not congested. 
++ If pci_bw_outbound_high > pci_bw_outbound_low then the device is congested. ++ - Informative ++ ++ * - `pci_bw_outbound_low` ++ - The number of times the device crossed the low outbound PCIe bandwidth ++ threshold. To be compared to pci_bw_outbound_high to check if the device ++ is in a congested state. ++ If pci_bw_outbound_high == pci_bw_outbound_low then the device is not congested. ++ If pci_bw_outbound_high > pci_bw_outbound_low then the device is congested. ++ - Informative +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c b/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c +index 9595f8f9a94d..0ed017569a19 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c +@@ -4,6 +4,13 @@ + #include "en.h" + #include "pcie_cong_event.h" + ++#define MLX5E_CONG_HIGH_STATE 0x7 ++ ++enum { ++ MLX5E_INBOUND_CONG = BIT(0), ++ MLX5E_OUTBOUND_CONG = BIT(1), ++}; ++ + struct mlx5e_pcie_cong_thresh { + u16 inbound_high; + u16 inbound_low; +@@ -11,10 +18,27 @@ struct mlx5e_pcie_cong_thresh { + u16 outbound_low; + }; + ++struct mlx5e_pcie_cong_stats { ++ u32 pci_bw_inbound_high; ++ u32 pci_bw_inbound_low; ++ u32 pci_bw_outbound_high; ++ u32 pci_bw_outbound_low; ++}; ++ + struct mlx5e_pcie_cong_event { + u64 obj_id; + + struct mlx5e_priv *priv; ++ ++ /* For event notifier and workqueue. */ ++ struct work_struct work; ++ struct mlx5_nb nb; ++ ++ /* Stores last read state. */ ++ u8 state; ++ ++ /* For ethtool stats group. 
*/ ++ struct mlx5e_pcie_cong_stats stats; + }; + + /* In units of 0.01 % */ +@@ -25,6 +49,51 @@ static const struct mlx5e_pcie_cong_thresh default_thresh_config = { + .outbound_low = 7500, + }; + ++static const struct counter_desc mlx5e_pcie_cong_stats_desc[] = { ++ { MLX5E_DECLARE_STAT(struct mlx5e_pcie_cong_stats, ++ pci_bw_inbound_high) }, ++ { MLX5E_DECLARE_STAT(struct mlx5e_pcie_cong_stats, ++ pci_bw_inbound_low) }, ++ { MLX5E_DECLARE_STAT(struct mlx5e_pcie_cong_stats, ++ pci_bw_outbound_high) }, ++ { MLX5E_DECLARE_STAT(struct mlx5e_pcie_cong_stats, ++ pci_bw_outbound_low) }, ++}; ++ ++#define NUM_PCIE_CONG_COUNTERS ARRAY_SIZE(mlx5e_pcie_cong_stats_desc) ++ ++static MLX5E_DECLARE_STATS_GRP_OP_NUM_STATS(pcie_cong) ++{ ++ return priv->cong_event ? NUM_PCIE_CONG_COUNTERS : 0; ++} ++ ++static MLX5E_DECLARE_STATS_GRP_OP_UPDATE_STATS(pcie_cong) {} ++ ++static MLX5E_DECLARE_STATS_GRP_OP_FILL_STRS(pcie_cong) ++{ ++ if (!priv->cong_event) ++ return; ++ ++ for (int i = 0; i < NUM_PCIE_CONG_COUNTERS; i++) ++ ethtool_puts(data, mlx5e_pcie_cong_stats_desc[i].format); ++} ++ ++static MLX5E_DECLARE_STATS_GRP_OP_FILL_STATS(pcie_cong) ++{ ++ if (!priv->cong_event) ++ return; ++ ++ for (int i = 0; i < NUM_PCIE_CONG_COUNTERS; i++) { ++ u32 ctr = MLX5E_READ_CTR32_CPU(&priv->cong_event->stats, ++ mlx5e_pcie_cong_stats_desc, ++ i); ++ ++ mlx5e_ethtool_put_stat(data, ctr); ++ } ++} ++ ++MLX5E_DEFINE_STATS_GRP(pcie_cong, 0); ++ + static int + mlx5_cmd_pcie_cong_event_set(struct mlx5_core_dev *dev, + const struct mlx5e_pcie_cong_thresh *config, +@@ -89,6 +158,97 @@ static int mlx5_cmd_pcie_cong_event_destroy(struct mlx5_core_dev *dev, + return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); + } + ++static int mlx5_cmd_pcie_cong_event_query(struct mlx5_core_dev *dev, ++ u64 obj_id, ++ u32 *state) ++{ ++ u32 in[MLX5_ST_SZ_DW(pcie_cong_event_cmd_in)] = {}; ++ u32 out[MLX5_ST_SZ_DW(pcie_cong_event_cmd_out)]; ++ void *obj; ++ void *hdr; ++ u8 cong; ++ int err; ++ ++ hdr = 
MLX5_ADDR_OF(pcie_cong_event_cmd_in, in, hdr); ++ ++ MLX5_SET(general_obj_in_cmd_hdr, hdr, opcode, ++ MLX5_CMD_OP_QUERY_GENERAL_OBJECT); ++ MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type, ++ MLX5_GENERAL_OBJECT_TYPES_PCIE_CONG_EVENT); ++ MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_id, obj_id); ++ ++ err = mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); ++ if (err) ++ return err; ++ ++ obj = MLX5_ADDR_OF(pcie_cong_event_cmd_out, out, cong_obj); ++ ++ if (state) { ++ cong = MLX5_GET(pcie_cong_event_obj, obj, inbound_cong_state); ++ if (cong == MLX5E_CONG_HIGH_STATE) ++ *state |= MLX5E_INBOUND_CONG; ++ ++ cong = MLX5_GET(pcie_cong_event_obj, obj, outbound_cong_state); ++ if (cong == MLX5E_CONG_HIGH_STATE) ++ *state |= MLX5E_OUTBOUND_CONG; ++ } ++ ++ return 0; ++} ++ ++static void mlx5e_pcie_cong_event_work(struct work_struct *work) ++{ ++ struct mlx5e_pcie_cong_event *cong_event; ++ struct mlx5_core_dev *dev; ++ struct mlx5e_priv *priv; ++ u32 new_cong_state = 0; ++ u32 changes; ++ int err; ++ ++ cong_event = container_of(work, struct mlx5e_pcie_cong_event, work); ++ priv = cong_event->priv; ++ dev = priv->mdev; ++ ++ err = mlx5_cmd_pcie_cong_event_query(dev, cong_event->obj_id, ++ &new_cong_state); ++ if (err) { ++ mlx5_core_warn(dev, "Error %d when querying PCIe cong event object (obj_id=%llu).\n", ++ err, cong_event->obj_id); ++ return; ++ } ++ ++ changes = cong_event->state ^ new_cong_state; ++ if (!changes) ++ return; ++ ++ cong_event->state = new_cong_state; ++ ++ if (changes & MLX5E_INBOUND_CONG) { ++ if (new_cong_state & MLX5E_INBOUND_CONG) ++ cong_event->stats.pci_bw_inbound_high++; ++ else ++ cong_event->stats.pci_bw_inbound_low++; ++ } ++ ++ if (changes & MLX5E_OUTBOUND_CONG) { ++ if (new_cong_state & MLX5E_OUTBOUND_CONG) ++ cong_event->stats.pci_bw_outbound_high++; ++ else ++ cong_event->stats.pci_bw_outbound_low++; ++ } ++} ++ ++static int mlx5e_pcie_cong_event_handler(struct notifier_block *nb, ++ unsigned long event, void *eqe) ++{ ++ struct 
mlx5e_pcie_cong_event *cong_event; ++ ++ cong_event = mlx5_nb_cof(nb, struct mlx5e_pcie_cong_event, nb); ++ queue_work(cong_event->priv->wq, &cong_event->work); ++ ++ return NOTIFY_OK; ++} ++ + int mlx5e_pcie_cong_event_init(struct mlx5e_priv *priv) + { + struct mlx5e_pcie_cong_event *cong_event; +@@ -103,6 +263,10 @@ int mlx5e_pcie_cong_event_init(struct mlx5e_priv *priv) + if (!cong_event) + return -ENOMEM; + ++ INIT_WORK(&cong_event->work, mlx5e_pcie_cong_event_work); ++ MLX5_NB_INIT(&cong_event->nb, mlx5e_pcie_cong_event_handler, ++ OBJECT_CHANGE); ++ + cong_event->priv = priv; + + err = mlx5_cmd_pcie_cong_event_set(mdev, &default_thresh_config, +@@ -112,10 +276,18 @@ int mlx5e_pcie_cong_event_init(struct mlx5e_priv *priv) + goto err_free; + } + ++ err = mlx5_eq_notifier_register(mdev, &cong_event->nb); ++ if (err) { ++ mlx5_core_warn(mdev, "Error registering notifier for the PCIe congestion event\n"); ++ goto err_obj_destroy; ++ } ++ + priv->cong_event = cong_event; + + return 0; + ++err_obj_destroy: ++ mlx5_cmd_pcie_cong_event_destroy(mdev, cong_event->obj_id); + err_free: + kvfree(cong_event); + +@@ -132,6 +304,9 @@ void mlx5e_pcie_cong_event_cleanup(struct mlx5e_priv *priv) + + priv->cong_event = NULL; + ++ mlx5_eq_notifier_unregister(mdev, &cong_event->nb); ++ cancel_work_sync(&cong_event->work); ++ + if (mlx5_cmd_pcie_cong_event_destroy(mdev, cong_event->obj_id)) + mlx5_core_warn(mdev, "Error destroying PCIe congestion event (obj_id=%llu)\n", + cong_event->obj_id); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c +index 19664fa7f217..87536f158d07 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c +@@ -2612,6 +2612,7 @@ mlx5e_stats_grp_t mlx5e_nic_stats_grps[] = { + #ifdef CONFIG_MLX5_MACSEC + &MLX5E_STATS_GRP(macsec_hw), + #endif ++ &MLX5E_STATS_GRP(pcie_cong), + }; + + unsigned int mlx5e_nic_stats_grps_num(struct 
mlx5e_priv *priv) +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h +index def5dea1463d..72dbcc1928ef 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h +@@ -535,5 +535,6 @@ extern MLX5E_DECLARE_STATS_GRP(ipsec_hw); + extern MLX5E_DECLARE_STATS_GRP(ipsec_sw); + extern MLX5E_DECLARE_STATS_GRP(ptp); + extern MLX5E_DECLARE_STATS_GRP(macsec_hw); ++extern MLX5E_DECLARE_STATS_GRP(pcie_cong); + + #endif /* __MLX5_EN_STATS_H__ */ +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c +index dfb079e59d85..66dce17219a6 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c +@@ -585,6 +585,9 @@ static void gather_async_events_mask(struct mlx5_core_dev *dev, u64 mask[4]) + async_event_mask |= + (1ull << MLX5_EVENT_TYPE_OBJECT_CHANGE); + ++ if (mlx5_pcie_cong_event_supported(dev)) ++ async_event_mask |= (1ull << MLX5_EVENT_TYPE_OBJECT_CHANGE); ++ + mask[0] = async_event_mask; + + if (MLX5_CAP_GEN(dev, event_cap)) +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1445-net-mlx5-fix-an-is-err-vs-null-bug-in-esw-qos-move-node.patch b/SOURCES/1445-net-mlx5-fix-an-is-err-vs-null-bug-in-esw-qos-move-node.patch new file mode 100644 index 000000000..966233f24 --- /dev/null +++ b/SOURCES/1445-net-mlx5-fix-an-is-err-vs-null-bug-in-esw-qos-move-node.patch @@ -0,0 +1,44 @@ +From 3896bc463eaccebeeb68c34cf97221d49c43f0b3 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:22 -0400 +Subject: [PATCH] net/mlx5: Fix an IS_ERR() vs NULL bug in esw_qos_move_node() + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 49be1e245ea3e3515c5989ce1af215d8500dec85 +Author: Dan Carpenter +Date: Tue Jul 15 18:01:30 2025 -0500 + + net/mlx5: Fix an IS_ERR() vs NULL bug in esw_qos_move_node() + + The __esw_qos_alloc_node() function 
returns NULL on error. It doesn't + return error pointers. Update the error checking to match. + + Fixes: 96619c485fa6 ("net/mlx5: Add support for setting tc-bw on nodes") + Signed-off-by: Dan Carpenter + Reviewed-by: Tariq Toukan + Link: https://patch.msgid.link/0ce4ec2a-2b5d-4652-9638-e715a99902a7@sabinyo.mountain + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +index e1cef8dd3b4d..91d863c8c152 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +@@ -1405,9 +1405,10 @@ esw_qos_move_node(struct mlx5_esw_sched_node *curr_node) + + new_node = __esw_qos_alloc_node(curr_node->esw, curr_node->ix, + curr_node->type, NULL); +- if (!IS_ERR(new_node)) +- esw_qos_nodes_set_parent(&curr_node->children, new_node); ++ if (!new_node) ++ return ERR_PTR(-ENOMEM); + ++ esw_qos_nodes_set_parent(&curr_node->children, new_node); + return new_node; + } + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1446-net-mlx5-hws-enable-ipsec-hardware-offload-in-legacy-mode.patch b/SOURCES/1446-net-mlx5-hws-enable-ipsec-hardware-offload-in-legacy-mode.patch new file mode 100644 index 000000000..1bb00c3a7 --- /dev/null +++ b/SOURCES/1446-net-mlx5-hws-enable-ipsec-hardware-offload-in-legacy-mode.patch @@ -0,0 +1,46 @@ +From 714f88650e67c8c680525cd46b6a1ba36654fce9 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:22 -0400 +Subject: [PATCH] net/mlx5: HWS, Enable IPSec hardware offload in legacy mode + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 159846ffbaf5e723b4eafdf6f951028cfda61601 +Author: Lama Kayal +Date: Wed Jul 16 17:17:47 2025 +0300 + + net/mlx5: HWS, Enable IPSec hardware offload in legacy mode + + IPSec hardware offload in legacy mode should not be affected by the + steering mode, hence it should also work properly with hmfs mode. 
+ + Remove steering mode validation when calculating the cap for packet + offload, this will also enable the missing cap MLX5_IPSEC_CAP_PRIO + needed for crypto offload. + + Signed-off-by: Lama Kayal + Reviewed-by: Jianbo Liu + Signed-off-by: Tariq Toukan + Reviewed-by: Michal Swiatkowski + Link: https://patch.msgid.link/1752675472-201445-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c +index 820debf3fbbf..ef7322d381af 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c +@@ -42,8 +42,7 @@ u32 mlx5_ipsec_device_caps(struct mlx5_core_dev *mdev) + + if (MLX5_CAP_IPSEC(mdev, ipsec_full_offload) && + (mdev->priv.steering->mode == MLX5_FLOW_STEERING_MODE_DMFS || +- (mdev->priv.steering->mode == MLX5_FLOW_STEERING_MODE_SMFS && +- is_mdev_legacy_mode(mdev)))) { ++ is_mdev_legacy_mode(mdev))) { + if (MLX5_CAP_FLOWTABLE_NIC_TX(mdev, + reformat_add_esp_trasport) && + MLX5_CAP_FLOWTABLE_NIC_RX(mdev, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1447-net-mlx5e-fix-kdoc-warning-on-eswitch-h.patch b/SOURCES/1447-net-mlx5e-fix-kdoc-warning-on-eswitch-h.patch new file mode 100644 index 000000000..22307dcab --- /dev/null +++ b/SOURCES/1447-net-mlx5e-fix-kdoc-warning-on-eswitch-h.patch @@ -0,0 +1,43 @@ +From 97e26aa456907b095f3c1ee54b720ccca2bac2e4 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:22 -0400 +Subject: [PATCH] net/mlx5e: fix kdoc warning on eswitch.h + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 394d31d52fb64f622f51a461a4d7fc0c683a980f +Author: Moshe Shemesh +Date: Wed Jul 16 17:17:48 2025 +0300 + + net/mlx5e: fix kdoc warning on eswitch.h + + Fix the following kdoc warning: + git ls-files *.[ch] | egrep 
drivers/net/ethernet/mellanox/mlx5/core/ |\ + xargs scripts/kernel-doc --none + drivers/net/ethernet/mellanox/mlx5/core/eswitch.h:824: warning: cannot + understand function prototype: 'struct mlx5_esw_event_info ' + + Signed-off-by: Moshe Shemesh + Signed-off-by: Tariq Toukan + Reviewed-by: Michal Swiatkowski + Link: https://patch.msgid.link/1752675472-201445-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +index d59fdcb29cb8..b0b8ef3ec3c4 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +@@ -827,7 +827,7 @@ void mlx5_esw_vport_vhca_id_clear(struct mlx5_eswitch *esw, u16 vport_num); + int mlx5_eswitch_vhca_id_to_vport(struct mlx5_eswitch *esw, u16 vhca_id, u16 *vport_num); + + /** +- * mlx5_esw_event_info - Indicates eswitch mode changed/changing. ++ * struct mlx5_esw_event_info - Indicates eswitch mode changed/changing. + * + * @new_mode: New mode of eswitch. 
+ */ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1448-net-mlx5e-properly-access-rcu-protected-qdisc-sleeping-varia.patch b/SOURCES/1448-net-mlx5e-properly-access-rcu-protected-qdisc-sleeping-varia.patch new file mode 100644 index 000000000..718adad51 --- /dev/null +++ b/SOURCES/1448-net-mlx5e-properly-access-rcu-protected-qdisc-sleeping-varia.patch @@ -0,0 +1,50 @@ +From e124614f134e8ae058d0a4bbd972e283f8f6a3cb Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:03:22 -0400 +Subject: [PATCH] net/mlx5e: Properly access RCU protected qdisc_sleeping + variable + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 2a601b2d35623065d31ebaf697b07502d54878c9 +Author: Leon Romanovsky +Date: Wed Jul 16 17:17:49 2025 +0300 + + net/mlx5e: Properly access RCU protected qdisc_sleeping variable + + qdisc_sleeping variable is declared as "struct Qdisc __rcu" and + as such needs proper annotation while accessing it. + + Without rtnl_dereference(), the following error is generated by sparse: + + drivers/net/ethernet/mellanox/mlx5/core/en/qos.c:377:40: warning: + incorrect type in initializer (different address spaces) + drivers/net/ethernet/mellanox/mlx5/core/en/qos.c:377:40: expected + struct Qdisc *qdisc + drivers/net/ethernet/mellanox/mlx5/core/en/qos.c:377:40: got struct + Qdisc [noderef] __rcu *qdisc_sleeping + + Signed-off-by: Leon Romanovsky + Signed-off-by: Tariq Toukan + Reviewed-by: Michal Swiatkowski + Link: https://patch.msgid.link/1752675472-201445-4-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c +index f0744a45db92..4e461cb03b83 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c +@@ -374,7 +374,7 @@ void mlx5e_reactivate_qos_sq(struct mlx5e_priv *priv, u16 qid, struct netdev_que + void 
mlx5e_reset_qdisc(struct net_device *dev, u16 qid) + { + struct netdev_queue *dev_queue = netdev_get_tx_queue(dev, qid); +- struct Qdisc *qdisc = dev_queue->qdisc_sleeping; ++ struct Qdisc *qdisc = rtnl_dereference(dev_queue->qdisc_sleeping); + + if (!qdisc) + return; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1449-net-mlx5-add-ifc-bits-to-support-rss-for-ipsec-offload.patch b/SOURCES/1449-net-mlx5-add-ifc-bits-to-support-rss-for-ipsec-offload.patch new file mode 100644 index 000000000..beed06c45 --- /dev/null +++ b/SOURCES/1449-net-mlx5-add-ifc-bits-to-support-rss-for-ipsec-offload.patch @@ -0,0 +1,140 @@ +From 5b55319dbd507f05d71a112154a2cfb085f0e6e3 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:04:04 -0400 +Subject: [PATCH] net/mlx5: Add IFC bits to support RSS for IPSec offload + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 438794e93f6271af93f0d16a1851725115b5fd51 +Author: Jianbo Liu +Date: Thu Jul 17 09:48:13 2025 +0300 + + net/mlx5: Add IFC bits to support RSS for IPSec offload + + This adds the capabilities, ipsec_next_header and inner/outer + l4_type_ext fields to support RSS for the decrypted packets. + + These fields are specifically for firmware steering. HWS validation + logic is updated to correctly handle the changes, ensuring the + unsupported fields are not set. + + Besides, reserved_at_c4 is fixed to reserved_at_d4 to reflect the + accurate offset within the structure. 
+ + Signed-off-by: Jianbo Liu + Reviewed-by: Carolina Jubran + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1752734895-257735-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c +index d45e1145d197..c6436c3a7a83 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c +@@ -727,8 +727,9 @@ hws_definer_conv_outer(struct mlx5hws_definer_conv_data *cd, + u32 *s_ipv6, *d_ipv6; + + if (HWS_IS_FLD_SET_SZ(match_param, outer_headers.l4_type, 0x2) || +- HWS_IS_FLD_SET_SZ(match_param, outer_headers.reserved_at_c2, 0xe) || +- HWS_IS_FLD_SET_SZ(match_param, outer_headers.reserved_at_c4, 0x4)) { ++ HWS_IS_FLD_SET_SZ(match_param, outer_headers.l4_type_ext, 0x4) || ++ HWS_IS_FLD_SET_SZ(match_param, outer_headers.reserved_at_c6, 0xa) || ++ HWS_IS_FLD_SET_SZ(match_param, outer_headers.reserved_at_d4, 0x4)) { + mlx5hws_err(cd->ctx, "Unsupported outer parameters set\n"); + return -EINVAL; + } +@@ -903,8 +904,9 @@ hws_definer_conv_inner(struct mlx5hws_definer_conv_data *cd, + u32 *s_ipv6, *d_ipv6; + + if (HWS_IS_FLD_SET_SZ(match_param, inner_headers.l4_type, 0x2) || +- HWS_IS_FLD_SET_SZ(match_param, inner_headers.reserved_at_c2, 0xe) || +- HWS_IS_FLD_SET_SZ(match_param, inner_headers.reserved_at_c4, 0x4)) { ++ HWS_IS_FLD_SET_SZ(match_param, inner_headers.l4_type_ext, 0x4) || ++ HWS_IS_FLD_SET_SZ(match_param, inner_headers.reserved_at_c6, 0xa) || ++ HWS_IS_FLD_SET_SZ(match_param, inner_headers.reserved_at_d4, 0x4)) { + mlx5hws_err(cd->ctx, "Unsupported inner parameters set\n"); + return -EINVAL; + } +@@ -1279,7 +1281,8 @@ hws_definer_conv_misc2(struct mlx5hws_definer_conv_data *cd, + struct mlx5hws_definer_fc *curr_fc; + + if (HWS_IS_FLD_SET_SZ(match_param, 
misc_parameters_2.reserved_at_1a0, 0x8) || +- HWS_IS_FLD_SET_SZ(match_param, misc_parameters_2.reserved_at_1b8, 0x8) || ++ HWS_IS_FLD_SET_SZ(match_param, ++ misc_parameters_2.ipsec_next_header, 0x8) || + HWS_IS_FLD_SET_SZ(match_param, misc_parameters_2.reserved_at_1c0, 0x40) || + HWS_IS_FLD_SET(match_param, misc_parameters_2.macsec_syndrome) || + HWS_IS_FLD_SET(match_param, misc_parameters_2.ipsec_syndrome)) { +diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h +index 74b9f7c16346..f6b8a4bfab26 100644 +--- a/include/linux/mlx5/mlx5_ifc.h ++++ b/include/linux/mlx5/mlx5_ifc.h +@@ -420,7 +420,8 @@ struct mlx5_ifc_flow_table_fields_supported_bits { + + /* Table 2170 - Flow Table Fields Supported 2 Format */ + struct mlx5_ifc_flow_table_fields_supported_2_bits { +- u8 reserved_at_0[0x2]; ++ u8 inner_l4_type_ext[0x1]; ++ u8 outer_l4_type_ext[0x1]; + u8 inner_l4_type[0x1]; + u8 outer_l4_type[0x1]; + u8 reserved_at_4[0xa]; +@@ -429,7 +430,11 @@ struct mlx5_ifc_flow_table_fields_supported_2_bits { + u8 tunnel_header_0_1[0x1]; + u8 reserved_at_11[0xf]; + +- u8 reserved_at_20[0x60]; ++ u8 reserved_at_20[0xf]; ++ u8 ipsec_next_header[0x1]; ++ u8 reserved_at_30[0x10]; ++ ++ u8 reserved_at_40[0x40]; + }; + + struct mlx5_ifc_flow_table_prop_layout_bits { +@@ -552,6 +557,13 @@ enum { + MLX5_PACKET_L4_TYPE_UDP, + }; + ++enum { ++ MLX5_PACKET_L4_TYPE_EXT_NONE, ++ MLX5_PACKET_L4_TYPE_EXT_TCP, ++ MLX5_PACKET_L4_TYPE_EXT_UDP, ++ MLX5_PACKET_L4_TYPE_EXT_ICMP, ++}; ++ + struct mlx5_ifc_fte_match_set_lyr_2_4_bits { + u8 smac_47_16[0x20]; + +@@ -578,10 +590,10 @@ struct mlx5_ifc_fte_match_set_lyr_2_4_bits { + u8 tcp_dport[0x10]; + + u8 l4_type[0x2]; +- u8 reserved_at_c2[0xe]; ++ u8 l4_type_ext[0x4]; ++ u8 reserved_at_c6[0xa]; + u8 ipv4_ihl[0x4]; +- u8 reserved_at_c4[0x4]; +- ++ u8 reserved_at_d4[0x4]; + u8 ttl_hoplimit[0x8]; + + u8 udp_sport[0x10]; +@@ -689,10 +701,9 @@ struct mlx5_ifc_fte_match_set_misc2_bits { + u8 metadata_reg_a[0x20]; + + u8 
reserved_at_1a0[0x8]; +- + u8 macsec_syndrome[0x8]; + u8 ipsec_syndrome[0x8]; +- u8 reserved_at_1b8[0x8]; ++ u8 ipsec_next_header[0x8]; + + u8 reserved_at_1c0[0x40]; + }; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1450-net-mlx5-add-ifc-bits-and-enums-for-buf-ownership.patch b/SOURCES/1450-net-mlx5-add-ifc-bits-and-enums-for-buf-ownership.patch new file mode 100644 index 000000000..f5189e32f --- /dev/null +++ b/SOURCES/1450-net-mlx5-add-ifc-bits-and-enums-for-buf-ownership.patch @@ -0,0 +1,60 @@ +From a34de299819925d89bfda50f13e278d49eabe506 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:04:04 -0400 +Subject: [PATCH] net/mlx5: Add IFC bits and enums for buf_ownership + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 6f09ee0b583cad4f2b6a82842c26235bee3d5c2e +Author: Oren Sidi +Date: Thu Jul 17 09:48:14 2025 +0300 + + net/mlx5: Add IFC bits and enums for buf_ownership + + Extend structure layouts and defines buf_ownership. + buf_ownership indicates whether the buffer is managed by SW or FW. 
+ + Signed-off-by: Oren Sidi + Reviewed-by: Alex Lazar + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1752734895-257735-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h +index f6b8a4bfab26..0bb817b8c697 100644 +--- a/include/linux/mlx5/mlx5_ifc.h ++++ b/include/linux/mlx5/mlx5_ifc.h +@@ -10473,8 +10473,16 @@ struct mlx5_ifc_pifr_reg_bits { + u8 port_filter_update_en[8][0x20]; + }; + ++enum { ++ MLX5_BUF_OWNERSHIP_UNKNOWN = 0x0, ++ MLX5_BUF_OWNERSHIP_FW_OWNED = 0x1, ++ MLX5_BUF_OWNERSHIP_SW_OWNED = 0x2, ++}; ++ + struct mlx5_ifc_pfcc_reg_bits { +- u8 reserved_at_0[0x8]; ++ u8 reserved_at_0[0x4]; ++ u8 buf_ownership[0x2]; ++ u8 reserved_at_6[0x2]; + u8 local_port[0x8]; + u8 reserved_at_10[0xb]; + u8 ppan_mask_n[0x1]; +@@ -10610,7 +10618,9 @@ struct mlx5_ifc_pcam_enhanced_features_bits { + u8 fec_200G_per_lane_in_pplm[0x1]; + u8 reserved_at_1e[0x2a]; + u8 fec_100G_per_lane_in_pplm[0x1]; +- u8 reserved_at_49[0x1f]; ++ u8 reserved_at_49[0xa]; ++ u8 buffer_ownership[0x1]; ++ u8 resereved_at_54[0x14]; + u8 fec_50G_per_lane_in_pplm[0x1]; + u8 reserved_at_69[0x4]; + u8 rx_icrc_encapsulated_counter[0x1]; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1451-net-mlx5-expose-cable-length-field-in-pfcc-register.patch b/SOURCES/1451-net-mlx5-expose-cable-length-field-in-pfcc-register.patch new file mode 100644 index 000000000..fca44f5af --- /dev/null +++ b/SOURCES/1451-net-mlx5-expose-cable-length-field-in-pfcc-register.patch @@ -0,0 +1,89 @@ +From 028519c5490c7b257c0f4e93a467c313351e5602 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:04:04 -0400 +Subject: [PATCH] net/mlx5: Expose cable_length field in PFCC register + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 9a0048e0ae14cb7babfd459ec920234e8a2ab86e +Author: Oren Sidi +Date: Thu Jul 17 09:48:15 2025 +0300 + + net/mlx5: Expose cable_length 
field in PFCC register + + Introduce new "cable_length" field in PFCC register and related fields + to enhance rx buffer configuration management: + 1. cable_length: Shifts cable length handling to fw by storing a + manually entered length from user in PFCC.cable_length + 2. lane_rate_oper: In a case where PFCC.cable_length is not supported, + helps compute a default cable length + + Signed-off-by: Oren Sidi + Reviewed-by: Alex Lazar + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1752734895-257735-4-git-send-email-tariqt@nvidia.com + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h +index 0bb817b8c697..21ed80a892a7 100644 +--- a/include/linux/mlx5/mlx5_ifc.h ++++ b/include/linux/mlx5/mlx5_ifc.h +@@ -9993,6 +9993,10 @@ struct mlx5_ifc_pude_reg_bits { + u8 reserved_at_20[0x60]; + }; + ++enum { ++ MLX5_PTYS_CONNECTOR_TYPE_PORT_DA = 0x7, ++}; ++ + struct mlx5_ifc_ptys_reg_bits { + u8 reserved_at_0[0x1]; + u8 an_disable_admin[0x1]; +@@ -10029,7 +10033,8 @@ struct mlx5_ifc_ptys_reg_bits { + u8 ib_link_width_oper[0x10]; + u8 ib_proto_oper[0x10]; + +- u8 reserved_at_160[0x1c]; ++ u8 reserved_at_160[0x8]; ++ u8 lane_rate_oper[0x14]; + u8 connector_type[0x4]; + + u8 eth_proto_lp_advertise[0x20]; +@@ -10484,7 +10489,8 @@ struct mlx5_ifc_pfcc_reg_bits { + u8 buf_ownership[0x2]; + u8 reserved_at_6[0x2]; + u8 local_port[0x8]; +- u8 reserved_at_10[0xb]; ++ u8 reserved_at_10[0xa]; ++ u8 cable_length_mask[0x1]; + u8 ppan_mask_n[0x1]; + u8 minor_stall_mask[0x1]; + u8 critical_stall_mask[0x1]; +@@ -10513,7 +10519,10 @@ struct mlx5_ifc_pfcc_reg_bits { + u8 device_stall_minor_watermark[0x10]; + u8 device_stall_critical_watermark[0x10]; + +- u8 reserved_at_a0[0x60]; ++ u8 reserved_at_a0[0x18]; ++ u8 cable_length[0x8]; ++ ++ u8 reserved_at_c0[0x40]; + }; + + struct mlx5_ifc_pelc_reg_bits { +@@ -10614,7 +10623,9 @@ struct mlx5_ifc_mtutc_reg_bits { + struct 
mlx5_ifc_pcam_enhanced_features_bits { + u8 reserved_at_0[0x10]; + u8 ppcnt_recovery_counters[0x1]; +- u8 reserved_at_11[0xc]; ++ u8 reserved_at_11[0x7]; ++ u8 cable_length[0x1]; ++ u8 reserved_at_19[0x4]; + u8 fec_200G_per_lane_in_pplm[0x1]; + u8 reserved_at_1e[0x2a]; + u8 fec_100G_per_lane_in_pplm[0x1]; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1452-net-mlx5e-shampo-cleanup-reservation-size-formula.patch b/SOURCES/1452-net-mlx5e-shampo-cleanup-reservation-size-formula.patch new file mode 100644 index 000000000..a6bcf5dcb --- /dev/null +++ b/SOURCES/1452-net-mlx5e-shampo-cleanup-reservation-size-formula.patch @@ -0,0 +1,145 @@ +From 543d15cff83bf56fc1377b9d635e9e88d981de46 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:04:04 -0400 +Subject: [PATCH] net/mlx5e: SHAMPO, Cleanup reservation size formula + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit bc2d44b83f2b333719560740068663a2b405deaf +Author: Lama Kayal +Date: Mon Jul 21 10:13:17 2025 +0300 + + net/mlx5e: SHAMPO, Cleanup reservation size formula + + The reservation size formula can be reduced to a simple evaluation of + MLX5E_SHAMPO_WQ_RESRV_SIZE. This leaves mlx5e_shampo_get_log_rsrv_size() + with one single use, which can be replaced with a macro for simplicity. + + Also, function mlx5e_shampo_get_log_rsrv_size() is used only throughout + params.c, make it static. 
+ + Signed-off-by: Lama Kayal + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Reviewed-by: Michal Swiatkowski + Reviewed-by: Jacob Keller + Link: https://patch.msgid.link/1753081999-326247-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h +index a4116dfb14f4..8b39c49a3c2a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h +@@ -85,8 +85,9 @@ struct page_pool; + #define MLX5E_SHAMPO_WQ_HEADER_PER_PAGE (PAGE_SIZE >> MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE) + #define MLX5E_SHAMPO_LOG_WQ_HEADER_PER_PAGE (PAGE_SHIFT - MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE) + #define MLX5E_SHAMPO_WQ_BASE_HEAD_ENTRY_SIZE (64) +-#define MLX5E_SHAMPO_WQ_RESRV_SIZE (64 * 1024) +-#define MLX5E_SHAMPO_WQ_BASE_RESRV_SIZE (4096) ++#define MLX5E_SHAMPO_WQ_RESRV_SIZE_BASE_SHIFT (12) ++#define MLX5E_SHAMPO_WQ_LOG_RESRV_SIZE (16) ++#define MLX5E_SHAMPO_WQ_RESRV_SIZE BIT(MLX5E_SHAMPO_WQ_LOG_RESRV_SIZE) + + #define MLX5_MPWRQ_MIN_LOG_STRIDE_SZ(mdev) \ + (6 + MLX5_CAP_GEN(mdev, cache_line_128byte)) /* HW restriction */ +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c +index fc945bce933a..86f6147de22b 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c +@@ -420,19 +420,10 @@ u8 mlx5e_shampo_get_log_hd_entry_size(struct mlx5_core_dev *mdev, + return order_base_2(DIV_ROUND_UP(MLX5E_RX_MAX_HEAD, MLX5E_SHAMPO_WQ_BASE_HEAD_ENTRY_SIZE)); + } + +-u8 mlx5e_shampo_get_log_rsrv_size(struct mlx5_core_dev *mdev, +- struct mlx5e_params *params) ++static u8 mlx5e_shampo_get_log_pkt_per_rsrv(struct mlx5e_params *params) + { +- return order_base_2(MLX5E_SHAMPO_WQ_RESRV_SIZE / MLX5E_SHAMPO_WQ_BASE_RESRV_SIZE); +-} +- +-u8 
mlx5e_shampo_get_log_pkt_per_rsrv(struct mlx5_core_dev *mdev, +- struct mlx5e_params *params) +-{ +- u32 resrv_size = BIT(mlx5e_shampo_get_log_rsrv_size(mdev, params)) * +- MLX5E_SHAMPO_WQ_BASE_RESRV_SIZE; +- +- return order_base_2(DIV_ROUND_UP(resrv_size, params->sw_mtu)); ++ return order_base_2(DIV_ROUND_UP(MLX5E_SHAMPO_WQ_RESRV_SIZE, ++ params->sw_mtu)); + } + + u8 mlx5e_mpwqe_get_log_stride_size(struct mlx5_core_dev *mdev, +@@ -834,13 +825,12 @@ static u32 mlx5e_shampo_get_log_cq_size(struct mlx5_core_dev *mdev, + struct mlx5e_params *params, + struct mlx5e_xsk_param *xsk) + { +- int rsrv_size = BIT(mlx5e_shampo_get_log_rsrv_size(mdev, params)) * +- MLX5E_SHAMPO_WQ_BASE_RESRV_SIZE; + u16 num_strides = BIT(mlx5e_mpwqe_get_log_num_strides(mdev, params, xsk)); +- int pkt_per_rsrv = BIT(mlx5e_shampo_get_log_pkt_per_rsrv(mdev, params)); + u8 log_stride_sz = mlx5e_mpwqe_get_log_stride_size(mdev, params, xsk); ++ int pkt_per_rsrv = BIT(mlx5e_shampo_get_log_pkt_per_rsrv(params)); + int wq_size = BIT(mlx5e_mpwqe_get_log_rq_size(mdev, params, xsk)); + int wqe_size = BIT(log_stride_sz) * num_strides; ++ int rsrv_size = MLX5E_SHAMPO_WQ_RESRV_SIZE; + + /* +1 is for the case that the pkt_per_rsrv dont consume the reservation + * so we get a filler cqe for the rest of the reservation. 
+@@ -932,10 +922,11 @@ int mlx5e_build_rq_param(struct mlx5_core_dev *mdev, + + MLX5_SET(wq, wq, shampo_enable, true); + MLX5_SET(wq, wq, log_reservation_size, +- mlx5e_shampo_get_log_rsrv_size(mdev, params)); ++ MLX5E_SHAMPO_WQ_LOG_RESRV_SIZE - ++ MLX5E_SHAMPO_WQ_RESRV_SIZE_BASE_SHIFT); + MLX5_SET(wq, wq, + log_max_num_of_packets_per_reservation, +- mlx5e_shampo_get_log_pkt_per_rsrv(mdev, params)); ++ mlx5e_shampo_get_log_pkt_per_rsrv(params)); + MLX5_SET(wq, wq, log_headers_entry_size, + mlx5e_shampo_get_log_hd_entry_size(mdev, params)); + lro_timeout = +@@ -1048,18 +1039,17 @@ u32 mlx5e_shampo_hd_per_wqe(struct mlx5_core_dev *mdev, + struct mlx5e_params *params, + struct mlx5e_rq_param *rq_param) + { +- int resv_size = BIT(mlx5e_shampo_get_log_rsrv_size(mdev, params)) * +- MLX5E_SHAMPO_WQ_BASE_RESRV_SIZE; + u16 num_strides = BIT(mlx5e_mpwqe_get_log_num_strides(mdev, params, NULL)); +- int pkt_per_resv = BIT(mlx5e_shampo_get_log_pkt_per_rsrv(mdev, params)); + u8 log_stride_sz = mlx5e_mpwqe_get_log_stride_size(mdev, params, NULL); ++ int pkt_per_rsrv = BIT(mlx5e_shampo_get_log_pkt_per_rsrv(params)); + int wqe_size = BIT(log_stride_sz) * num_strides; ++ int rsrv_size = MLX5E_SHAMPO_WQ_RESRV_SIZE; + u32 hd_per_wqe; + + /* Assumption: hd_per_wqe % 8 == 0. 
*/ +- hd_per_wqe = (wqe_size / resv_size) * pkt_per_resv; +- mlx5_core_dbg(mdev, "%s hd_per_wqe = %d rsrv_size = %d wqe_size = %d pkt_per_resv = %d\n", +- __func__, hd_per_wqe, resv_size, wqe_size, pkt_per_resv); ++ hd_per_wqe = (wqe_size / rsrv_size) * pkt_per_rsrv; ++ mlx5_core_dbg(mdev, "%s hd_per_wqe = %d rsrv_size = %d wqe_size = %d pkt_per_rsrv = %d\n", ++ __func__, hd_per_wqe, rsrv_size, wqe_size, pkt_per_rsrv); + return hd_per_wqe; + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.h b/drivers/net/ethernet/mellanox/mlx5/core/en/params.h +index bd5877acc5b1..919895f64dcd 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.h +@@ -97,10 +97,6 @@ u8 mlx5e_mpwqe_get_log_rq_size(struct mlx5_core_dev *mdev, + struct mlx5e_xsk_param *xsk); + u8 mlx5e_shampo_get_log_hd_entry_size(struct mlx5_core_dev *mdev, + struct mlx5e_params *params); +-u8 mlx5e_shampo_get_log_rsrv_size(struct mlx5_core_dev *mdev, +- struct mlx5e_params *params); +-u8 mlx5e_shampo_get_log_pkt_per_rsrv(struct mlx5_core_dev *mdev, +- struct mlx5e_params *params); + u32 mlx5e_shampo_hd_per_wqe(struct mlx5_core_dev *mdev, + struct mlx5e_params *params, + struct mlx5e_rq_param *rq_param); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1453-net-mlx5e-shampo-remove-mlx5e-shampo-get-log-hd-entry-size.patch b/SOURCES/1453-net-mlx5e-shampo-remove-mlx5e-shampo-get-log-hd-entry-size.patch new file mode 100644 index 000000000..803fe5e4f --- /dev/null +++ b/SOURCES/1453-net-mlx5e-shampo-remove-mlx5e-shampo-get-log-hd-entry-size.patch @@ -0,0 +1,83 @@ +From 120cdc1802d8b873c3e75e1f0bfe0f784a78b0ea Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:04:04 -0400 +Subject: [PATCH] net/mlx5e: SHAMPO, Remove + mlx5e_shampo_get_log_hd_entry_size() + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit eee529c0044e06959a40c6dba6d85df493f54fc3 +Author: Lama Kayal +Date: Mon Jul 21 10:13:18 2025 
+0300 + + net/mlx5e: SHAMPO, Remove mlx5e_shampo_get_log_hd_entry_size() + + Refactor mlx5e_shampo_get_log_hd_entry_size() as macro, for more + simplicity. + + Signed-off-by: Lama Kayal + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Reviewed-by: Michal Swiatkowski + Reviewed-by: Jacob Keller + Link: https://patch.msgid.link/1753081999-326247-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h +index 8b39c49a3c2a..765180379e62 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h +@@ -84,7 +84,7 @@ struct page_pool; + #define MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE (9) + #define MLX5E_SHAMPO_WQ_HEADER_PER_PAGE (PAGE_SIZE >> MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE) + #define MLX5E_SHAMPO_LOG_WQ_HEADER_PER_PAGE (PAGE_SHIFT - MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE) +-#define MLX5E_SHAMPO_WQ_BASE_HEAD_ENTRY_SIZE (64) ++#define MLX5E_SHAMPO_WQ_BASE_HEAD_ENTRY_SIZE_SHIFT (6) + #define MLX5E_SHAMPO_WQ_RESRV_SIZE_BASE_SHIFT (12) + #define MLX5E_SHAMPO_WQ_LOG_RESRV_SIZE (16) + #define MLX5E_SHAMPO_WQ_RESRV_SIZE BIT(MLX5E_SHAMPO_WQ_LOG_RESRV_SIZE) +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c +index 86f6147de22b..3cca06a74cf9 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c +@@ -414,12 +414,6 @@ u8 mlx5e_mpwqe_get_log_rq_size(struct mlx5_core_dev *mdev, + return params->log_rq_mtu_frames - log_pkts_per_wqe; + } + +-u8 mlx5e_shampo_get_log_hd_entry_size(struct mlx5_core_dev *mdev, +- struct mlx5e_params *params) +-{ +- return order_base_2(DIV_ROUND_UP(MLX5E_RX_MAX_HEAD, MLX5E_SHAMPO_WQ_BASE_HEAD_ENTRY_SIZE)); +-} +- + static u8 mlx5e_shampo_get_log_pkt_per_rsrv(struct mlx5e_params *params) + { + return 
order_base_2(DIV_ROUND_UP(MLX5E_SHAMPO_WQ_RESRV_SIZE, +@@ -928,7 +922,8 @@ int mlx5e_build_rq_param(struct mlx5_core_dev *mdev, + log_max_num_of_packets_per_reservation, + mlx5e_shampo_get_log_pkt_per_rsrv(params)); + MLX5_SET(wq, wq, log_headers_entry_size, +- mlx5e_shampo_get_log_hd_entry_size(mdev, params)); ++ MLX5E_SHAMPO_LOG_HEADER_ENTRY_SIZE - ++ MLX5E_SHAMPO_WQ_BASE_HEAD_ENTRY_SIZE_SHIFT); + lro_timeout = + mlx5e_choose_lro_timeout(mdev, + MLX5E_DEFAULT_SHAMPO_TIMEOUT); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.h b/drivers/net/ethernet/mellanox/mlx5/core/en/params.h +index 919895f64dcd..488ccdbc1e2c 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.h +@@ -95,8 +95,6 @@ bool mlx5e_rx_mpwqe_is_linear_skb(struct mlx5_core_dev *mdev, + u8 mlx5e_mpwqe_get_log_rq_size(struct mlx5_core_dev *mdev, + struct mlx5e_params *params, + struct mlx5e_xsk_param *xsk); +-u8 mlx5e_shampo_get_log_hd_entry_size(struct mlx5_core_dev *mdev, +- struct mlx5e_params *params); + u32 mlx5e_shampo_hd_per_wqe(struct mlx5_core_dev *mdev, + struct mlx5e_params *params, + struct mlx5e_rq_param *rq_param); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1454-net-mlx5e-remove-duplicate-mkey-from-shampo-header.patch b/SOURCES/1454-net-mlx5e-remove-duplicate-mkey-from-shampo-header.patch new file mode 100644 index 000000000..3bc528947 --- /dev/null +++ b/SOURCES/1454-net-mlx5e-remove-duplicate-mkey-from-shampo-header.patch @@ -0,0 +1,142 @@ +From df022bfa01899f989571317ef32c93103244750e Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:04:04 -0400 +Subject: [PATCH] net/mlx5e: Remove duplicate mkey from SHAMPO header + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit eeaf11464f38db9307b7d9ed6c7750b83c344ff8 +Author: Lama Kayal +Date: Mon Jul 21 10:13:19 2025 +0300 + + net/mlx5e: Remove duplicate mkey from SHAMPO header + + SHAMPO structure holds two variations of 
the mkey, which is unnecessary, + a duplication that's repeated per rq. + + Remove duplicate mkey information and keep only one version, the one + used in the fast path, rename field to reflect field type clearly. + + Signed-off-by: Lama Kayal + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Reviewed-by: Michal Swiatkowski + Reviewed-by: Jacob Keller + Link: https://patch.msgid.link/1753081999-326247-4-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h +index 765180379e62..130756b1cc94 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h +@@ -630,14 +630,13 @@ struct mlx5e_dma_info { + }; + + struct mlx5e_shampo_hd { +- u32 mkey; + struct mlx5e_frag_page *pages; + u32 hd_per_wq; + u16 hd_per_wqe; + unsigned long *bitmap; + u16 pi; + u16 ci; +- __be32 key; ++ __be32 mkey_be; + }; + + struct mlx5e_hw_gro_data { +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index 5503882839b8..91c1d56d79f5 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -546,18 +546,26 @@ static int mlx5e_create_rq_umr_mkey(struct mlx5_core_dev *mdev, struct mlx5e_rq + } + + static int mlx5e_create_rq_hd_umr_mkey(struct mlx5_core_dev *mdev, +- u16 hd_per_wq, u32 *umr_mkey) ++ u16 hd_per_wq, __be32 *umr_mkey) + { + u32 max_ksm_size = BIT(MLX5_CAP_GEN(mdev, log_max_klm_list_size)); ++ u32 mkey; ++ int err; + + if (max_ksm_size < hd_per_wq) { + mlx5_core_err(mdev, "max ksm list size 0x%x is smaller than shampo header buffer list size 0x%x\n", + max_ksm_size, hd_per_wq); + return -EINVAL; + } +- return mlx5e_create_umr_ksm_mkey(mdev, hd_per_wq, +- MLX5E_SHAMPO_LOG_HEADER_ENTRY_SIZE, +- umr_mkey); ++ ++ err = mlx5e_create_umr_ksm_mkey(mdev, 
hd_per_wq, ++ MLX5E_SHAMPO_LOG_HEADER_ENTRY_SIZE, ++ &mkey); ++ if (err) ++ return err; ++ ++ *umr_mkey = cpu_to_be32(mkey); ++ return 0; + } + + static void mlx5e_init_frags_partition(struct mlx5e_rq *rq) +@@ -781,11 +789,10 @@ static int mlx5_rq_shampo_alloc(struct mlx5_core_dev *mdev, + goto err_shampo_hd_info_alloc; + + err = mlx5e_create_rq_hd_umr_mkey(mdev, hd_per_wq, +- &rq->mpwqe.shampo->mkey); ++ &rq->mpwqe.shampo->mkey_be); + if (err) + goto err_umr_mkey; + +- rq->mpwqe.shampo->key = cpu_to_be32(rq->mpwqe.shampo->mkey); + rq->mpwqe.shampo->hd_per_wqe = + mlx5e_shampo_hd_per_wqe(mdev, params, rqp); + wq_size = BIT(MLX5_GET(wq, wqc, log_wq_sz)); +@@ -830,7 +837,7 @@ static int mlx5_rq_shampo_alloc(struct mlx5_core_dev *mdev, + err_hw_gro_data: + page_pool_destroy(rq->hd_page_pool); + err_hds_page_pool: +- mlx5_core_destroy_mkey(mdev, rq->mpwqe.shampo->mkey); ++ mlx5_core_destroy_mkey(mdev, be32_to_cpu(rq->mpwqe.shampo->mkey_be)); + err_umr_mkey: + mlx5e_rq_shampo_hd_info_free(rq); + err_shampo_hd_info_alloc: +@@ -847,7 +854,8 @@ static void mlx5e_rq_free_shampo(struct mlx5e_rq *rq) + if (rq->hd_page_pool != rq->page_pool) + page_pool_destroy(rq->hd_page_pool); + mlx5e_rq_shampo_hd_info_free(rq); +- mlx5_core_destroy_mkey(rq->mdev, rq->mpwqe.shampo->mkey); ++ mlx5_core_destroy_mkey(rq->mdev, ++ be32_to_cpu(rq->mpwqe.shampo->mkey_be)); + kvfree(rq->mpwqe.shampo); + } + +@@ -1115,7 +1123,8 @@ int mlx5e_create_rq(struct mlx5e_rq *rq, struct mlx5e_rq_param *param, u16 q_cou + if (test_bit(MLX5E_RQ_STATE_SHAMPO, &rq->state)) { + MLX5_SET(wq, wq, log_headers_buffer_entry_num, + order_base_2(rq->mpwqe.shampo->hd_per_wq)); +- MLX5_SET(wq, wq, headers_mkey, rq->mpwqe.shampo->mkey); ++ MLX5_SET(wq, wq, headers_mkey, ++ be32_to_cpu(rq->mpwqe.shampo->mkey_be)); + } + + mlx5_fill_page_frag_array(&rq->wq_ctrl.buf, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +index 36a4780332d7..724c5db20c19 100644 +--- 
a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +@@ -676,7 +676,7 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq, + wqe_bbs = MLX5E_KSM_UMR_WQEBBS(ksm_entries); + pi = mlx5e_icosq_get_next_pi(sq, wqe_bbs); + umr_wqe = mlx5_wq_cyc_get_wqe(&sq->wq, pi); +- build_ksm_umr(sq, umr_wqe, shampo->key, index, ksm_entries); ++ build_ksm_umr(sq, umr_wqe, shampo->mkey_be, index, ksm_entries); + + WARN_ON_ONCE(ksm_entries & (MLX5E_SHAMPO_WQ_HEADER_PER_PAGE - 1)); + while (i < ksm_entries) { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1455-pci-tph-expose-pcie-tph-get-st-table-size.patch b/SOURCES/1455-pci-tph-expose-pcie-tph-get-st-table-size.patch new file mode 100644 index 000000000..29ebd3834 --- /dev/null +++ b/SOURCES/1455-pci-tph-expose-pcie-tph-get-st-table-size.patch @@ -0,0 +1,95 @@ +From 73ba61cbb0ce67722aa9069f6abc63761e6877b2 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:08:11 -0400 +Subject: [PATCH] PCI/TPH: Expose pcie_tph_get_st_table_size() + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 0a61ec9cc51b0e43981222005444508437e95b33 +Author: Yishai Hadas +Date: Thu Jul 17 15:17:25 2025 +0300 + + PCI/TPH: Expose pcie_tph_get_st_table_size() + + Expose pcie_tph_get_st_table_size() to be used by drivers as will be + done in the next patch from the series. + + Signed-off-by: Yishai Hadas + Acked-by: Bjorn Helgaas + Link: https://patch.msgid.link/9ae851e0ee42cc56d2a30276e116b65091030ceb.1752752567.git.leon@kernel.org + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c +index 07de59ca2ebf..6822c2e3f93f 100644 +--- a/drivers/pci/tph.c ++++ b/drivers/pci/tph.c +@@ -168,7 +168,7 @@ static u32 get_st_table_loc(struct pci_dev *pdev) + * Return the size of ST table. If ST table is not in TPH Requester Extended + * Capability space, return 0. Otherwise return the ST Table Size + 1. 
+ */ +-static u16 get_st_table_size(struct pci_dev *pdev) ++u16 pcie_tph_get_st_table_size(struct pci_dev *pdev) + { + u32 reg; + u32 loc; +@@ -185,6 +185,7 @@ static u16 get_st_table_size(struct pci_dev *pdev) + + return FIELD_GET(PCI_TPH_CAP_ST_MASK, reg) + 1; + } ++EXPORT_SYMBOL(pcie_tph_get_st_table_size); + + /* Return device's Root Port completer capability */ + static u8 get_rp_completer_type(struct pci_dev *pdev) +@@ -253,7 +254,7 @@ static int write_tag_to_st_table(struct pci_dev *pdev, int index, u16 tag) + int offset; + + /* Check if index is out of bound */ +- st_table_size = get_st_table_size(pdev); ++ st_table_size = pcie_tph_get_st_table_size(pdev); + if (index >= st_table_size) + return -ENXIO; + +@@ -485,7 +486,7 @@ void pci_restore_tph_state(struct pci_dev *pdev) + pci_write_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, *cap++); + st_entry = (u16 *)cap; + offset = PCI_TPH_BASE_SIZEOF; +- num_entries = get_st_table_size(pdev); ++ num_entries = pcie_tph_get_st_table_size(pdev); + for (i = 0; i < num_entries; i++) { + pci_write_config_word(pdev, pdev->tph_cap + offset, + *st_entry++); +@@ -517,7 +518,7 @@ void pci_save_tph_state(struct pci_dev *pdev) + /* Save all ST entries in extended capability structure */ + st_entry = (u16 *)cap; + offset = PCI_TPH_BASE_SIZEOF; +- num_entries = get_st_table_size(pdev); ++ num_entries = pcie_tph_get_st_table_size(pdev); + for (i = 0; i < num_entries; i++) { + pci_read_config_word(pdev, pdev->tph_cap + offset, + st_entry++); +@@ -541,7 +542,7 @@ void pci_tph_init(struct pci_dev *pdev) + if (!pdev->tph_cap) + return; + +- num_entries = get_st_table_size(pdev); ++ num_entries = pcie_tph_get_st_table_size(pdev); + save_size = sizeof(u32) + num_entries * sizeof(u16); + pci_add_ext_cap_save_buffer(pdev, PCI_EXT_CAP_ID_TPH, save_size); + } +diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h +index c3e806c13d64..9e4e331b1603 100644 +--- a/include/linux/pci-tph.h ++++ b/include/linux/pci-tph.h +@@ -28,6 
+28,7 @@ int pcie_tph_get_cpu_st(struct pci_dev *dev, + unsigned int cpu_uid, u16 *tag); + void pcie_disable_tph(struct pci_dev *pdev); + int pcie_enable_tph(struct pci_dev *pdev, int mode); ++u16 pcie_tph_get_st_table_size(struct pci_dev *pdev); + #else + static inline int pcie_tph_set_st_entry(struct pci_dev *pdev, + unsigned int index, u16 tag) +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1456-net-mlx5-expose-ifc-bits-for-tph.patch b/SOURCES/1456-net-mlx5-expose-ifc-bits-for-tph.patch new file mode 100644 index 000000000..581adafee --- /dev/null +++ b/SOURCES/1456-net-mlx5-expose-ifc-bits-for-tph.patch @@ -0,0 +1,65 @@ +From cd08612ff17299301ce7302f125ed6c09670a1eb Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Thu, 16 Apr 2026 18:04:05 -0400 +Subject: [PATCH] net/mlx5: Expose IFC bits for TPH + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 5f9ec7880e6b3c4d0cf242fe28506d0b084328b1 +Author: Yishai Hadas +Date: Thu Jul 17 15:17:26 2025 +0300 + + net/mlx5: Expose IFC bits for TPH + + Expose IFC bits for the TPH functionality. 
+ + Signed-off-by: Yishai Hadas + Reviewed-by: Edward Srouji + Reviewed-by: Moshe Shemesh + Link: https://patch.msgid.link/38ea3a0d56551364214e8edf359c9c77c9a3b71b.1752752567.git.leon@kernel.org + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h +index 21ed80a892a7..fd3bedd8dbcb 100644 +--- a/include/linux/mlx5/mlx5_ifc.h ++++ b/include/linux/mlx5/mlx5_ifc.h +@@ -1870,7 +1870,9 @@ struct mlx5_ifc_cmd_hca_cap_bits { + u8 reserved_at_280[0x10]; + u8 max_wqe_sz_sq[0x10]; + +- u8 reserved_at_2a0[0xb]; ++ u8 reserved_at_2a0[0x7]; ++ u8 mkey_pcie_tph[0x1]; ++ u8 reserved_at_2a8[0x3]; + u8 shampo[0x1]; + u8 reserved_at_2ac[0x4]; + u8 max_wqe_sz_rq[0x10]; +@@ -4417,6 +4419,10 @@ enum { + MLX5_MKC_ACCESS_MODE_CROSSING = 0x6, + }; + ++enum { ++ MLX5_MKC_PCIE_TPH_NO_STEERING_TAG_INDEX = 0, ++}; ++ + struct mlx5_ifc_mkc_bits { + u8 reserved_at_0[0x1]; + u8 free[0x1]; +@@ -4468,7 +4474,11 @@ struct mlx5_ifc_mkc_bits { + u8 relaxed_ordering_read[0x1]; + u8 log_page_size[0x6]; + +- u8 reserved_at_1e0[0x20]; ++ u8 reserved_at_1e0[0x5]; ++ u8 pcie_tph_en[0x1]; ++ u8 pcie_tph_ph[0x2]; ++ u8 pcie_tph_steering_tag_index[0x8]; ++ u8 reserved_at_1f0[0x10]; + }; + + struct mlx5_ifc_pkey_bits { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1457-net-mlx5-add-support-for-device-steering-tag.patch b/SOURCES/1457-net-mlx5-add-support-for-device-steering-tag.patch new file mode 100644 index 000000000..da663705b --- /dev/null +++ b/SOURCES/1457-net-mlx5-add-support-for-device-steering-tag.patch @@ -0,0 +1,348 @@ +From 5034908f65e9590cfbd34d152e901eca8fa0af9b Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:20:02 -0400 +Subject: [PATCH] net/mlx5: Add support for device steering tag + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 +Conflicts: +A context diff due to the leftover definition of "struct mlx5_thermal" +that was supposed to be deleted by the following commit: 
+c9398e30c639 ("net/mlx5: Expose NIC temperature via hardware monitoring kernel API") + +commit 888a7776f4fb04c19bec70c737c61c2f383c6b1e +Author: Yishai Hadas +Date: Thu Jul 17 15:17:27 2025 +0300 + + net/mlx5: Add support for device steering tag + + Background, from PCIe specification 6.2. + + TLP Processing Hints (TPH) + -------------------------- + TLP Processing Hints is an optional feature that provides hints in + Request TLP headers to facilitate optimized processing of Requests that + target Memory Space. These Processing Hints enable the system hardware + (e.g., the Root Complex and/or Endpoints) to optimize platform + resources such as system and memory interconnect on a per TLP basis. + Steering Tags are system-specific values used to identify a processing + resource that a Requester explicitly targets. System software discovers + and identifies TPH capabilities to determine the Steering Tag allocation + for each Function that supports TPH. + + This patch adds steering tag support for mlx5 based NICs by: + + - Enabling the TPH functionality over PCI if both FW and OS support it. + - Managing steering tags and their matching steering indexes by + writing a ST to an ST index over the PCI configuration space. + - Exposing APIs to upper layers (e.g.,mlx5_ib) to allow usage of + the PCI TPH infrastructure. + + Further details: + - Upon probing of a device, the feature will be enabled based + on both capability detection and OS support. + + - It will retrieve the appropriate ST for a given CPU ID and memory + type using the pcie_tph_get_cpu_st() API. + + - It will track available ST indices according to the configuration + space table size (expected to be 63 entries), reserving index 0 to + indicate non-TPH use. + + - It will assign a free ST index with a ST using the + pcie_tph_set_st_entry() API. + + - It will reuse the same index for identical (CPU ID + memory type) + combinations by maintaining a reference count per entry. 
+ + - It will expose APIs to upper layers (e.g., mlx5_ib) to allow usage of + the PCI TPH infrastructure. + + - SF will use its parent PF stuff. + + Signed-off-by: Yishai Hadas + Link: https://patch.msgid.link/de1ae7398e9e34eacd8c10845683df44fc9e32f8.1752752567.git.leon@kernel.org + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile +index 650df18a9216..a253c73db9e5 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile ++++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile +@@ -167,5 +167,10 @@ mlx5_core-$(CONFIG_MLX5_SF) += sf/vhca_event.o sf/dev/dev.o sf/dev/driver.o irq_ + # + mlx5_core-$(CONFIG_MLX5_SF_MANAGER) += sf/cmd.o sf/hw_table.o sf/devlink.o + ++# ++# TPH support ++# ++mlx5_core-$(CONFIG_PCIE_TPH) += lib/st.o ++ + obj-$(CONFIG_MLX5_DPLL) += mlx5_dpll.o + mlx5_dpll-y := dpll.o +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c +new file mode 100644 +index 000000000000..47fe215f66bf +--- /dev/null ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c +@@ -0,0 +1,164 @@ ++// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB ++/* ++ * Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. 
All rights reserved ++ */ ++ ++#include ++#include ++ ++#include "mlx5_core.h" ++#include "lib/mlx5.h" ++ ++struct mlx5_st_idx_data { ++ refcount_t usecount; ++ u16 tag; ++}; ++ ++struct mlx5_st { ++ /* serialize access upon alloc/free flows */ ++ struct mutex lock; ++ struct xa_limit index_limit; ++ struct xarray idx_xa; /* key == index, value == struct mlx5_st_idx_data */ ++}; ++ ++struct mlx5_st *mlx5_st_create(struct mlx5_core_dev *dev) ++{ ++ struct pci_dev *pdev = dev->pdev; ++ struct mlx5_st *st; ++ u16 num_entries; ++ int ret; ++ ++ if (!MLX5_CAP_GEN(dev, mkey_pcie_tph)) ++ return NULL; ++ ++#ifdef CONFIG_MLX5_SF ++ if (mlx5_core_is_sf(dev)) ++ return dev->priv.parent_mdev->st; ++#endif ++ ++ /* Checking whether the device is capable */ ++ if (!pdev->tph_cap) ++ return NULL; ++ ++ num_entries = pcie_tph_get_st_table_size(pdev); ++ /* We need a reserved entry for non TPH cases */ ++ if (num_entries < 2) ++ return NULL; ++ ++ /* The OS doesn't support ST */ ++ ret = pcie_enable_tph(pdev, PCI_TPH_ST_DS_MODE); ++ if (ret) ++ return NULL; ++ ++ st = kzalloc(sizeof(*st), GFP_KERNEL); ++ if (!st) ++ goto end; ++ ++ mutex_init(&st->lock); ++ xa_init_flags(&st->idx_xa, XA_FLAGS_ALLOC); ++ /* entry 0 is reserved for non TPH cases */ ++ st->index_limit.min = MLX5_MKC_PCIE_TPH_NO_STEERING_TAG_INDEX + 1; ++ st->index_limit.max = num_entries - 1; ++ ++ return st; ++ ++end: ++ pcie_disable_tph(dev->pdev); ++ return NULL; ++} ++ ++void mlx5_st_destroy(struct mlx5_core_dev *dev) ++{ ++ struct mlx5_st *st = dev->st; ++ ++ if (mlx5_core_is_sf(dev) || !st) ++ return; ++ ++ pcie_disable_tph(dev->pdev); ++ WARN_ON_ONCE(!xa_empty(&st->idx_xa)); ++ kfree(st); ++} ++ ++int mlx5_st_alloc_index(struct mlx5_core_dev *dev, enum tph_mem_type mem_type, ++ unsigned int cpu_uid, u16 *st_index) ++{ ++ struct mlx5_st_idx_data *idx_data; ++ struct mlx5_st *st = dev->st; ++ unsigned long index; ++ u32 xa_id; ++ u16 tag; ++ int ret; ++ ++ if (!st) ++ return -EOPNOTSUPP; ++ ++ ret = 
pcie_tph_get_cpu_st(dev->pdev, mem_type, cpu_uid, &tag); ++ if (ret) ++ return ret; ++ ++ mutex_lock(&st->lock); ++ ++ xa_for_each(&st->idx_xa, index, idx_data) { ++ if (tag == idx_data->tag) { ++ refcount_inc(&idx_data->usecount); ++ *st_index = index; ++ goto end; ++ } ++ } ++ ++ idx_data = kzalloc(sizeof(*idx_data), GFP_KERNEL); ++ if (!idx_data) { ++ ret = -ENOMEM; ++ goto end; ++ } ++ ++ refcount_set(&idx_data->usecount, 1); ++ idx_data->tag = tag; ++ ++ ret = xa_alloc(&st->idx_xa, &xa_id, idx_data, st->index_limit, GFP_KERNEL); ++ if (ret) ++ goto clean_idx_data; ++ ++ ret = pcie_tph_set_st_entry(dev->pdev, xa_id, tag); ++ if (ret) ++ goto clean_idx_xa; ++ ++ *st_index = xa_id; ++ goto end; ++ ++clean_idx_xa: ++ xa_erase(&st->idx_xa, xa_id); ++clean_idx_data: ++ kfree(idx_data); ++end: ++ mutex_unlock(&st->lock); ++ return ret; ++} ++EXPORT_SYMBOL_GPL(mlx5_st_alloc_index); ++ ++int mlx5_st_dealloc_index(struct mlx5_core_dev *dev, u16 st_index) ++{ ++ struct mlx5_st_idx_data *idx_data; ++ struct mlx5_st *st = dev->st; ++ int ret = 0; ++ ++ if (!st) ++ return -EOPNOTSUPP; ++ ++ mutex_lock(&st->lock); ++ idx_data = xa_load(&st->idx_xa, st_index); ++ if (WARN_ON_ONCE(!idx_data)) { ++ ret = -EINVAL; ++ goto end; ++ } ++ ++ if (refcount_dec_and_test(&idx_data->usecount)) { ++ xa_erase(&st->idx_xa, st_index); ++ /* We leave PCI config space as was before, no mkey will refer to it */ ++ } ++ ++end: ++ mutex_unlock(&st->lock); ++ return ret; ++} ++EXPORT_SYMBOL_GPL(mlx5_st_dealloc_index); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c +index 42daaf8387da..f6b04b2ae623 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c +@@ -1102,6 +1102,7 @@ static int mlx5_init_once(struct mlx5_core_dev *dev) + } + + dev->dm = mlx5_dm_create(dev); ++ dev->st = mlx5_st_create(dev); + dev->tracer = mlx5_fw_tracer_create(dev); + dev->hv_vhca = 
mlx5_hv_vhca_create(dev); + dev->rsc_dump = mlx5_rsc_dump_create(dev); +@@ -1150,6 +1151,7 @@ static void mlx5_cleanup_once(struct mlx5_core_dev *dev) + mlx5_rsc_dump_destroy(dev); + mlx5_hv_vhca_destroy(dev->hv_vhca); + mlx5_fw_tracer_destroy(dev->tracer); ++ mlx5_st_destroy(dev); + mlx5_dm_cleanup(dev); + mlx5_fs_core_free(dev); + mlx5_sf_table_cleanup(dev); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h +index c518380c4ce7..b6d53db27cd5 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h +@@ -300,6 +300,15 @@ int mlx5_set_mtppse(struct mlx5_core_dev *mdev, u8 pin, u8 arm, u8 mode); + struct mlx5_dm *mlx5_dm_create(struct mlx5_core_dev *dev); + void mlx5_dm_cleanup(struct mlx5_core_dev *dev); + ++#ifdef CONFIG_PCIE_TPH ++struct mlx5_st *mlx5_st_create(struct mlx5_core_dev *dev); ++void mlx5_st_destroy(struct mlx5_core_dev *dev); ++#else ++static inline struct mlx5_st * ++mlx5_st_create(struct mlx5_core_dev *dev) { return NULL; } ++static inline void mlx5_st_destroy(struct mlx5_core_dev *dev) { return; } ++#endif ++ + void mlx5_toggle_port_link(struct mlx5_core_dev *dev); + int mlx5_set_port_admin_status(struct mlx5_core_dev *dev, + enum mlx5_port_status status); +diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h +index 39e9146e079d..8c5fbfb85749 100644 +--- a/include/linux/mlx5/driver.h ++++ b/include/linux/mlx5/driver.h +@@ -36,6 +36,7 @@ + #include + #include + #include ++#include + #include + #include + #include +@@ -688,7 +689,7 @@ struct mlx5_fw_tracer; + struct mlx5_vxlan; + struct mlx5_geneve; + struct mlx5_hv_vhca; +-struct mlx5_thermal; ++struct mlx5_st; + + #define MLX5_LOG_SW_ICM_BLOCK_SIZE(dev) (MLX5_CAP_DEV_MEM(dev, log_sw_icm_alloc_granularity)) + #define MLX5_SW_ICM_BLOCK_SIZE(dev) (1 << MLX5_LOG_SW_ICM_BLOCK_SIZE(dev)) +@@ -758,6 +759,7 @@ struct mlx5_core_dev { + u32 issi; + struct 
mlx5e_resources mlx5e_res; + struct mlx5_dm *dm; ++ struct mlx5_st *st; + struct mlx5_vxlan *vxlan; + struct mlx5_geneve *geneve; + struct { +@@ -1161,6 +1163,23 @@ int mlx5_dm_sw_icm_alloc(struct mlx5_core_dev *dev, enum mlx5_sw_icm_type type, + int mlx5_dm_sw_icm_dealloc(struct mlx5_core_dev *dev, enum mlx5_sw_icm_type type, + u64 length, u16 uid, phys_addr_t addr, u32 obj_id); + ++#ifdef CONFIG_PCIE_TPH ++int mlx5_st_alloc_index(struct mlx5_core_dev *dev, enum tph_mem_type mem_type, ++ unsigned int cpu_uid, u16 *st_index); ++int mlx5_st_dealloc_index(struct mlx5_core_dev *dev, u16 st_index); ++#else ++static inline int mlx5_st_alloc_index(struct mlx5_core_dev *dev, ++ enum tph_mem_type mem_type, ++ unsigned int cpu_uid, u16 *st_index) ++{ ++ return -EOPNOTSUPP; ++} ++static inline int mlx5_st_dealloc_index(struct mlx5_core_dev *dev, u16 st_index) ++{ ++ return -EOPNOTSUPP; ++} ++#endif ++ + struct mlx5_core_dev *mlx5_vf_get_core_dev(struct pci_dev *pdev); + void mlx5_vf_put_core_dev(struct mlx5_core_dev *mdev); + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1458-net-mlx5-fix-build-wframe-larger-than-warnings.patch b/SOURCES/1458-net-mlx5-fix-build-wframe-larger-than-warnings.patch new file mode 100644 index 000000000..70899c3f6 --- /dev/null +++ b/SOURCES/1458-net-mlx5-fix-build-wframe-larger-than-warnings.patch @@ -0,0 +1,220 @@ +From ac499e95a7f84f3fafcf362d7ea068edb6850edf Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:25:38 -0400 +Subject: [PATCH] net/mlx5: Fix build -Wframe-larger-than warnings +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 4335012705499aa24cec714ab746c0d3abf97cab +Author: Zhu Yanjun +Date: Tue Jul 22 14:20:23 2025 -0700 + + net/mlx5: Fix build -Wframe-larger-than warnings + + When building, the following warnings will appear. 
+ " + pci_irq.c: In function ‘mlx5_ctrl_irq_request’: + pci_irq.c:494:1: warning: the frame size of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=] + + pci_irq.c: In function ‘mlx5_irq_request_vector’: + pci_irq.c:561:1: warning: the frame size of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=] + + eq.c: In function ‘comp_irq_request_sf’: + eq.c:897:1: warning: the frame size of 1080 bytes is larger than 1024 bytes [-Wframe-larger-than=] + + irq_affinity.c: In function ‘irq_pool_request_irq’: + irq_affinity.c:74:1: warning: the frame size of 1048 bytes is larger than 1024 bytes [-Wframe-larger-than=] + " + + These warnings indicate that the stack frame size exceeds 1024 bytes in + these functions. + + To resolve this, instead of allocating large memory buffers on the stack, + it is better to use kvzalloc to allocate memory dynamically on the heap. + This approach reduces stack usage and eliminates these frame size warnings. + + Acked-by: Junxian Huang + Signed-off-by: Zhu Yanjun + Reviewed-by: Tariq Toukan + Link: https://patch.msgid.link/20250722212023.244296-1-yanjun.zhu@linux.dev + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c +index 66dce17219a6..1ab77159409d 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c +@@ -876,19 +876,25 @@ static int comp_irq_request_sf(struct mlx5_core_dev *dev, u16 vecidx) + { + struct mlx5_irq_pool *pool = mlx5_irq_table_get_comp_irq_pool(dev); + struct mlx5_eq_table *table = dev->priv.eq_table; +- struct irq_affinity_desc af_desc = {}; ++ struct irq_affinity_desc *af_desc; + struct mlx5_irq *irq; + +- /* In case SF irq pool does not exist, fallback to the PF irqs*/ ++ /* In case SF irq pool does not exist, fallback to the PF irqs */ + if (!mlx5_irq_pool_is_sf_pool(pool)) + return comp_irq_request_pci(dev, vecidx); + +- 
af_desc.is_managed = false; +- cpumask_copy(&af_desc.mask, cpu_online_mask); +- cpumask_andnot(&af_desc.mask, &af_desc.mask, &table->used_cpus); +- irq = mlx5_irq_affinity_request(dev, pool, &af_desc); +- if (IS_ERR(irq)) ++ af_desc = kvzalloc(sizeof(*af_desc), GFP_KERNEL); ++ if (!af_desc) ++ return -ENOMEM; ++ ++ af_desc->is_managed = false; ++ cpumask_copy(&af_desc->mask, cpu_online_mask); ++ cpumask_andnot(&af_desc->mask, &af_desc->mask, &table->used_cpus); ++ irq = mlx5_irq_affinity_request(dev, pool, af_desc); ++ if (IS_ERR(irq)) { ++ kvfree(af_desc); + return PTR_ERR(irq); ++ } + + cpumask_or(&table->used_cpus, &table->used_cpus, mlx5_irq_get_affinity_mask(irq)); + mlx5_core_dbg(pool->dev, "IRQ %u mapped to cpu %*pbl, %u EQs on this irq\n", +@@ -896,6 +902,8 @@ static int comp_irq_request_sf(struct mlx5_core_dev *dev, u16 vecidx) + cpumask_pr_args(mlx5_irq_get_affinity_mask(irq)), + mlx5_irq_read_locked(irq) / MLX5_EQ_REFS_PER_IRQ); + ++ kvfree(af_desc); ++ + return xa_err(xa_store(&table->comp_irqs, vecidx, irq, GFP_KERNEL)); + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/irq_affinity.c b/drivers/net/ethernet/mellanox/mlx5/core/irq_affinity.c +index 2691d88cdee1..82d3c2568244 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/irq_affinity.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/irq_affinity.c +@@ -47,29 +47,40 @@ static int cpu_get_least_loaded(struct mlx5_irq_pool *pool, + static struct mlx5_irq * + irq_pool_request_irq(struct mlx5_irq_pool *pool, struct irq_affinity_desc *af_desc) + { +- struct irq_affinity_desc auto_desc = {}; ++ struct irq_affinity_desc *auto_desc; + struct mlx5_irq *irq; + u32 irq_index; + int err; + ++ auto_desc = kvzalloc(sizeof(*auto_desc), GFP_KERNEL); ++ if (!auto_desc) ++ return ERR_PTR(-ENOMEM); ++ + err = xa_alloc(&pool->irqs, &irq_index, NULL, pool->xa_num_irqs, GFP_KERNEL); +- if (err) ++ if (err) { ++ kvfree(auto_desc); + return ERR_PTR(err); ++ } ++ + if (pool->irqs_per_cpu) { + if 
(cpumask_weight(&af_desc->mask) > 1) + /* if req_mask contain more then one CPU, set the least loadad CPU + * of req_mask + */ + cpumask_set_cpu(cpu_get_least_loaded(pool, &af_desc->mask), +- &auto_desc.mask); ++ &auto_desc->mask); + else + cpu_get(pool, cpumask_first(&af_desc->mask)); + } ++ + irq = mlx5_irq_alloc(pool, irq_index, +- cpumask_empty(&auto_desc.mask) ? af_desc : &auto_desc, ++ cpumask_empty(&auto_desc->mask) ? af_desc : auto_desc, + NULL); + if (IS_ERR(irq)) + xa_erase(&pool->irqs, irq_index); ++ ++ kvfree(auto_desc); ++ + return irq; + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c +index 40024cfa3099..692ef9c2f729 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c +@@ -470,26 +470,32 @@ void mlx5_ctrl_irq_release(struct mlx5_core_dev *dev, struct mlx5_irq *ctrl_irq) + struct mlx5_irq *mlx5_ctrl_irq_request(struct mlx5_core_dev *dev) + { + struct mlx5_irq_pool *pool = ctrl_irq_pool_get(dev); +- struct irq_affinity_desc af_desc; ++ struct irq_affinity_desc *af_desc; + struct mlx5_irq *irq; + +- cpumask_copy(&af_desc.mask, cpu_online_mask); +- af_desc.is_managed = false; ++ af_desc = kvzalloc(sizeof(*af_desc), GFP_KERNEL); ++ if (!af_desc) ++ return ERR_PTR(-ENOMEM); ++ ++ cpumask_copy(&af_desc->mask, cpu_online_mask); ++ af_desc->is_managed = false; + if (!mlx5_irq_pool_is_sf_pool(pool)) { + /* In case we are allocating a control IRQ from a pci device's pool. + * This can happen also for a SF if the SFs pool is empty. + */ + if (!pool->xa_num_irqs.max) { +- cpumask_clear(&af_desc.mask); ++ cpumask_clear(&af_desc->mask); + /* In case we only have a single IRQ for PF/VF */ +- cpumask_set_cpu(cpumask_first(cpu_online_mask), &af_desc.mask); ++ cpumask_set_cpu(cpumask_first(cpu_online_mask), &af_desc->mask); + } + /* Allocate the IRQ in index 0. 
The vector was already allocated */ +- irq = irq_pool_request_vector(pool, 0, &af_desc, NULL); ++ irq = irq_pool_request_vector(pool, 0, af_desc, NULL); + } else { +- irq = mlx5_irq_affinity_request(dev, pool, &af_desc); ++ irq = mlx5_irq_affinity_request(dev, pool, af_desc); + } + ++ kvfree(af_desc); ++ + return irq; + } + +@@ -548,16 +554,26 @@ struct mlx5_irq *mlx5_irq_request_vector(struct mlx5_core_dev *dev, u16 cpu, + { + struct mlx5_irq_table *table = mlx5_irq_table_get(dev); + struct mlx5_irq_pool *pool = table->pcif_pool; +- struct irq_affinity_desc af_desc; + int offset = MLX5_IRQ_VEC_COMP_BASE; ++ struct irq_affinity_desc *af_desc; ++ struct mlx5_irq *irq; ++ ++ af_desc = kvzalloc(sizeof(*af_desc), GFP_KERNEL); ++ if (!af_desc) ++ return ERR_PTR(-ENOMEM); + + if (!pool->xa_num_irqs.max) + offset = 0; + +- af_desc.is_managed = false; +- cpumask_clear(&af_desc.mask); +- cpumask_set_cpu(cpu, &af_desc.mask); +- return mlx5_irq_request(dev, vecidx + offset, &af_desc, rmap); ++ af_desc->is_managed = false; ++ cpumask_clear(&af_desc->mask); ++ cpumask_set_cpu(cpu, &af_desc->mask); ++ ++ irq = mlx5_irq_request(dev, vecidx + offset, af_desc, rmap); ++ ++ kvfree(af_desc); ++ ++ return irq; + } + + static struct mlx5_irq_pool * +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1459-net-fix-typos.patch b/SOURCES/1459-net-fix-typos.patch new file mode 100644 index 000000000..fd3b333bf --- /dev/null +++ b/SOURCES/1459-net-fix-typos.patch @@ -0,0 +1,42 @@ +From 51c8ae5f817c369912dd8b597d224bb562481baf Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:26:59 -0400 +Subject: [PATCH] net: Fix typos + +Conflicts: +Include only the mlx5 hunk. + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit fe09560f82415d6592e74821e031a76eed173a03 +Author: Bjorn Helgaas +Date: Wed Jul 23 15:15:05 2025 -0500 + + net: Fix typos + + Fix typos in comments and error messages. 
+ + Signed-off-by: Bjorn Helgaas + Reviewed-by: David Arinzon + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/20250723201528.2908218-1-helgaas@kernel.org + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h +index 130756b1cc94..ebe793bba8dd 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h +@@ -375,7 +375,7 @@ struct mlx5e_sq_dma { + enum mlx5e_dma_map_type type; + }; + +-/* Keep this enum consistent with with the corresponding strings array ++/* Keep this enum consistent with the corresponding strings array + * declared in en/reporter_tx.c + */ + enum { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1460-net-mlx5e-clear-read-only-port-buffer-size-in-pbmc-before-up.patch b/SOURCES/1460-net-mlx5e-clear-read-only-port-buffer-size-in-pbmc-before-up.patch new file mode 100644 index 000000000..b3e0ef066 --- /dev/null +++ b/SOURCES/1460-net-mlx5e-clear-read-only-port-buffer-size-in-pbmc-before-up.patch @@ -0,0 +1,53 @@ +From eb9b792b02f53bb7126f64e8739c50016165d05e Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:26:59 -0400 +Subject: [PATCH] net/mlx5e: Clear Read-Only port buffer size in PBMC before + update + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit fd4b97246a23c1149479b88490946bcfbd28de63 +Author: Alexei Lazar +Date: Wed Jul 23 10:44:30 2025 +0300 + + net/mlx5e: Clear Read-Only port buffer size in PBMC before update + + When updating the PBMC register, we read its current value, + modify desired fields, then write it back. + + The port_buffer_size field within PBMC is Read-Only (RO). + If this RO field contains a non-zero value when read, + attempting to write it back will cause the entire PBMC + register update to fail. 
+ + This commit ensures port_buffer_size is explicitly cleared + to zero after reading the PBMC register but before writing + back the modified value. + This allows updates to other fields in the PBMC register to succeed. + + Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration") + Signed-off-by: Alexei Lazar + Reviewed-by: Yael Chemla + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1753256672-337784-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c +index 8e25f4ef5ccc..5ae787656a7c 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c +@@ -331,6 +331,9 @@ static int port_set_buffer(struct mlx5e_priv *priv, + if (err) + goto out; + ++ /* RO bits should be set to 0 on write */ ++ MLX5_SET(pbmc_reg, in, port_buffer_size, 0); ++ + err = mlx5e_port_set_pbmc(mdev, in); + out: + kfree(in); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1461-net-mlx5e-remove-skb-secpath-if-xfrm-state-is-not-found.patch b/SOURCES/1461-net-mlx5e-remove-skb-secpath-if-xfrm-state-is-not-found.patch new file mode 100644 index 000000000..eda39b1e4 --- /dev/null +++ b/SOURCES/1461-net-mlx5e-remove-skb-secpath-if-xfrm-state-is-not-found.patch @@ -0,0 +1,113 @@ +From 5dae68507960568d99b6cd300975915c69f9b0c6 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:26:59 -0400 +Subject: [PATCH] net/mlx5e: Remove skb secpath if xfrm state is not found + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 6d19c44b5c6dd72f9a357d0399604ec16a77de3c +Author: Jianbo Liu +Date: Wed Jul 23 10:44:31 2025 +0300 + + net/mlx5e: Remove skb secpath if xfrm state is not found + + Hardware returns a unique identifier for a decrypted packet's xfrm + state, this state is looked up in an xarray. 
However, the state might + have been freed by the time of this lookup. + + Currently, if the state is not found, only a counter is incremented. + The secpath (sp) extension on the skb is not removed, resulting in + sp->len becoming 0. + + Subsequently, functions like __xfrm_policy_check() attempt to access + fields such as xfrm_input_state(skb)->xso.type (which dereferences + sp->xvec[sp->len - 1]) without first validating sp->len. This leads to + a crash when dereferencing an invalid state pointer. + + This patch prevents the crash by explicitly removing the secpath + extension from the skb if the xfrm state is not found after hardware + decryption. This ensures downstream functions do not operate on a + zero-length secpath. + + BUG: unable to handle page fault for address: ffffffff000002c8 + #PF: supervisor read access in kernel mode + #PF: error_code(0x0000) - not-present page + PGD 282e067 P4D 282e067 PUD 0 + Oops: Oops: 0000 [#1] SMP + CPU: 12 UID: 0 PID: 0 Comm: swapper/12 Not tainted 6.15.0-rc7_for_upstream_min_debug_2025_05_27_22_44 #1 NONE + Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 + RIP: 0010:__xfrm_policy_check+0x61a/0xa30 + Code: b6 77 7f 83 e6 02 74 14 4d 8b af d8 00 00 00 41 0f b6 45 05 c1 e0 03 48 98 49 01 c5 41 8b 45 00 83 e8 01 48 98 49 8b 44 c5 10 <0f> b6 80 c8 02 00 00 83 e0 0c 3c 04 0f 84 0c 02 00 00 31 ff 80 fa + RSP: 0018:ffff88885fb04918 EFLAGS: 00010297 + RAX: ffffffff00000000 RBX: 0000000000000002 RCX: 0000000000000000 + RDX: 0000000000000002 RSI: 0000000000000002 RDI: 0000000000000000 + RBP: ffffffff8311af80 R08: 0000000000000020 R09: 00000000c2eda353 + R10: ffff88812be2bbc8 R11: 000000001faab533 R12: ffff88885fb049c8 + R13: ffff88812be2bbc8 R14: 0000000000000000 R15: ffff88811896ae00 + FS: 0000000000000000(0000) GS:ffff8888dca82000(0000) knlGS:0000000000000000 + CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 + CR2: ffffffff000002c8 CR3: 0000000243050002 CR4: 
0000000000372eb0 + DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 + DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 + Call Trace: + + ? try_to_wake_up+0x108/0x4c0 + ? udp4_lib_lookup2+0xbe/0x150 + ? udp_lib_lport_inuse+0x100/0x100 + ? __udp4_lib_lookup+0x2b0/0x410 + __xfrm_policy_check2.constprop.0+0x11e/0x130 + udp_queue_rcv_one_skb+0x1d/0x530 + udp_unicast_rcv_skb+0x76/0x90 + __udp4_lib_rcv+0xa64/0xe90 + ip_protocol_deliver_rcu+0x20/0x130 + ip_local_deliver_finish+0x75/0xa0 + ip_local_deliver+0xc1/0xd0 + ? ip_protocol_deliver_rcu+0x130/0x130 + ip_sublist_rcv+0x1f9/0x240 + ? ip_rcv_finish_core+0x430/0x430 + ip_list_rcv+0xfc/0x130 + __netif_receive_skb_list_core+0x181/0x1e0 + netif_receive_skb_list_internal+0x200/0x360 + ? mlx5e_build_rx_skb+0x1bc/0xda0 [mlx5_core] + gro_receive_skb+0xfd/0x210 + mlx5e_handle_rx_cqe_mpwrq+0x141/0x280 [mlx5_core] + mlx5e_poll_rx_cq+0xcc/0x8e0 [mlx5_core] + ? mlx5e_handle_rx_dim+0x91/0xd0 [mlx5_core] + mlx5e_napi_poll+0x114/0xab0 [mlx5_core] + __napi_poll+0x25/0x170 + net_rx_action+0x32d/0x3a0 + ? mlx5_eq_comp_int+0x8d/0x280 [mlx5_core] + ? 
notifier_call_chain+0x33/0xa0 + handle_softirqs+0xda/0x250 + irq_exit_rcu+0x6d/0xc0 + common_interrupt+0x81/0xa0 + + + Fixes: b2ac7541e377 ("net/mlx5e: IPsec: Add Connect-X IPsec Rx data path offload") + Signed-off-by: Jianbo Liu + Reviewed-by: Dragos Tatulea + Reviewed-by: Yael Chemla + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1753256672-337784-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c +index 727fa7c18523..6056106edcc6 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c +@@ -327,6 +327,10 @@ void mlx5e_ipsec_offload_handle_rx_skb(struct net_device *netdev, + if (unlikely(!sa_entry)) { + rcu_read_unlock(); + atomic64_inc(&ipsec->sw_stats.ipsec_rx_drop_sadb_miss); ++ /* Clear secpath to prevent invalid dereference ++ * in downstream XFRM policy checks. ++ */ ++ secpath_reset(skb); + return; + } + xfrm_state_hold(sa_entry->x); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1462-net-mlx5e-fix-potential-deadlock-by-deferring-rx-timeout-rec.patch b/SOURCES/1462-net-mlx5e-fix-potential-deadlock-by-deferring-rx-timeout-rec.patch new file mode 100644 index 000000000..72f64ab0e --- /dev/null +++ b/SOURCES/1462-net-mlx5e-fix-potential-deadlock-by-deferring-rx-timeout-rec.patch @@ -0,0 +1,157 @@ +From 8ebb6e8e7eda49cd1bcc7bd9aded34c22d677dcc Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:26:59 -0400 +Subject: [PATCH] net/mlx5e: Fix potential deadlock by deferring RX timeout + recovery + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 +Conflicts: +Adjust the change to use rtnl_trylock(). 
+ +commit e80d65561571db5024fbdd5ec3f5472cfc485d21 +Author: Shahar Shitrit +Date: Wed Jul 23 10:44:32 2025 +0300 + + net/mlx5e: Fix potential deadlock by deferring RX timeout recovery + + mlx5e_reporter_rx_timeout() is currently invoked synchronously + in the driver's open error flow. This causes the thread holding + priv->state_lock to attempt acquiring the devlink lock, which + can result in a circular dependency with other devlink operations. + + For example: + + - Devlink health diagnose flow: + - __devlink_nl_pre_doit() acquires the devlink lock. + - devlink_nl_health_reporter_diagnose_doit() invokes the + driver's diagnose callback. + - mlx5e_rx_reporter_diagnose() then attempts to acquire + priv->state_lock. + + - Driver open flow: + - mlx5e_open() acquires priv->state_lock. + - If an error occurs, devlink_health_reporter may be called, + attempting to acquire the devlink lock. + + To prevent this circular locking scenario, defer the RX timeout + recovery by scheduling it via a workqueue. This ensures that the + recovery work acquires locks in a consistent order: first the + devlink lock, then priv->state_lock. + + Additionally, make the recovery work acquire the netdev instance + lock to safely synchronize with the open/close channel flows, + similar to mlx5e_tx_timeout_work. Repeatedly attempt to acquire + the netdev instance lock until it is taken or the target RQ is no + longer active, as indicated by the MLX5E_STATE_CHANNELS_ACTIVE bit. 
+ + Fixes: 32c57fb26863 ("net/mlx5e: Report and recover from rx timeout") + Signed-off-by: Shahar Shitrit + Reviewed-by: Cosmin Ratiu + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1753256672-337784-4-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h +index ebe793bba8dd..5e150e083829 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h +@@ -726,6 +726,7 @@ struct mlx5e_rq { + struct xsk_buff_pool *xsk_pool; + + struct work_struct recover_work; ++ struct work_struct rx_timeout_work; + + /* control */ + struct mlx5_wq_ctrl wq_ctrl; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c +index e106f0696486..1b9ea72abc5a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c +@@ -170,16 +170,23 @@ static int mlx5e_rx_reporter_err_rq_cqe_recover(void *ctx) + static int mlx5e_rx_reporter_timeout_recover(void *ctx) + { + struct mlx5_eq_comp *eq; ++ struct mlx5e_priv *priv; + struct mlx5e_rq *rq; + int err; + + rq = ctx; ++ priv = rq->priv; ++ ++ mutex_lock(&priv->state_lock); ++ + eq = rq->cq.mcq.eq; + + err = mlx5e_health_channel_eq_recover(rq->netdev, eq, rq->cq.ch_stats); + if (err && rq->icosq) + clear_bit(MLX5E_SQ_STATE_ENABLED, &rq->icosq->state); + ++ mutex_unlock(&priv->state_lock); ++ + return err; + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index 91c1d56d79f5..4cc80cda8a09 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -676,6 +676,27 @@ static void mlx5e_rq_err_cqe_work(struct work_struct *recover_work) + 
mlx5e_reporter_rq_cqe_err(rq); + } + ++static void mlx5e_rq_timeout_work(struct work_struct *timeout_work) ++{ ++ struct mlx5e_rq *rq = container_of(timeout_work, ++ struct mlx5e_rq, ++ rx_timeout_work); ++ ++ /* Acquire netdev instance lock to synchronize with channel close and ++ * reopen flows. Either successfully obtain the lock, or detect that ++ * channels are closing for another reason, making this work no longer ++ * necessary. ++ */ ++ while (!rtnl_trylock()) { ++ if (!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &rq->priv->state)) ++ return; ++ msleep(20); ++ } ++ ++ mlx5e_reporter_rx_timeout(rq); ++ rtnl_unlock(); ++} ++ + static int mlx5e_alloc_mpwqe_rq_drop_page(struct mlx5e_rq *rq) + { + rq->wqe_overflow.page = alloc_page(GFP_KERNEL); +@@ -874,6 +895,7 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params, + + rqp->wq.db_numa_node = node; + INIT_WORK(&rq->recover_work, mlx5e_rq_err_cqe_work); ++ INIT_WORK(&rq->rx_timeout_work, mlx5e_rq_timeout_work); + + if (params->xdp_prog) + bpf_prog_inc(params->xdp_prog); +@@ -1254,7 +1276,8 @@ int mlx5e_wait_for_min_rx_wqes(struct mlx5e_rq *rq, int wait_time) + netdev_warn(rq->netdev, "Failed to get min RX wqes on Channel[%d] RQN[0x%x] wq cur_sz(%d) min_rx_wqes(%d)\n", + rq->ix, rq->rqn, mlx5e_rqwq_get_cur_sz(rq), min_wqes); + +- mlx5e_reporter_rx_timeout(rq); ++ queue_work(rq->priv->wq, &rq->rx_timeout_work); ++ + return -ETIMEDOUT; + } + +@@ -1425,6 +1448,7 @@ void mlx5e_close_rq(struct mlx5e_rq *rq) + if (rq->dim) + cancel_work_sync(&rq->dim->work); + cancel_work_sync(&rq->recover_work); ++ cancel_work_sync(&rq->rx_timeout_work); + mlx5e_destroy_rq(rq); + mlx5e_free_rx_descs(rq); + mlx5e_free_rq(rq); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1463-net-mlx5e-support-routed-networks-during-ipsec-macs-initiali.patch b/SOURCES/1463-net-mlx5e-support-routed-networks-during-ipsec-macs-initiali.patch new file mode 100644 index 000000000..2cd01cafa --- /dev/null +++
b/SOURCES/1463-net-mlx5e-support-routed-networks-during-ipsec-macs-initiali.patch @@ -0,0 +1,158 @@ +From 6f1e0e77ee9fe96ebe49a651a1131bbf6435ada0 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:26:59 -0400 +Subject: [PATCH] net/mlx5e: Support routed networks during IPsec MACs + initialization + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 71670f766b8f4c1490e07ad4394e8e27c03b2e91 +Author: Alexandre Cassen +Date: Tue Jul 22 17:23:47 2025 +0300 + + net/mlx5e: Support routed networks during IPsec MACs initialization + + Remote IPsec tunnel endpoint may refer to a network segment that is + not directly connected to the host. In such a case, IPsec tunnel + endpoints are connected to a router and reachable via a routing path. + In IPsec packet offload mode, HW is initialized with the MAC address + of both IPsec tunnel endpoints. + + Extend the current IPsec init MACs procedure to resolve nexthop for + routed networks. Direct neighbour lookup and probe is still used + for directly connected networks and as a fallback mechanism if fib + lookup fails. 
+ + Signed-off-by: Alexandre Cassen + Signed-off-by: Leon Romanovsky + Reviewed-by: Cosmin Ratiu + Signed-off-by: Tariq Toukan + Reviewed-by: Michal Swiatkowski + Link: https://patch.msgid.link/1753194228-333722-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +index 77f61cd28a79..00e77c71e201 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +@@ -36,6 +36,7 @@ + #include + #include + #include ++#include + + #include "en.h" + #include "eswitch.h" +@@ -259,9 +260,15 @@ static void mlx5e_ipsec_init_macs(struct mlx5e_ipsec_sa_entry *sa_entry, + struct mlx5_accel_esp_xfrm_attrs *attrs) + { + struct mlx5_core_dev *mdev = mlx5e_ipsec_sa2dev(sa_entry); ++ struct mlx5e_ipsec_addr *addrs = &attrs->addrs; + struct net_device *netdev = sa_entry->dev; ++ struct xfrm_state *x = sa_entry->x; ++ struct dst_entry *rt_dst_entry; ++ struct flowi4 fl4 = {}; ++ struct flowi6 fl6 = {}; + struct neighbour *n; + u8 addr[ETH_ALEN]; ++ struct rtable *rt; + const void *pkey; + u8 *dst, *src; + +@@ -274,18 +281,89 @@ static void mlx5e_ipsec_init_macs(struct mlx5e_ipsec_sa_entry *sa_entry, + case XFRM_DEV_OFFLOAD_IN: + src = attrs->dmac; + dst = attrs->smac; +- pkey = &attrs->addrs.saddr.a4; ++ ++ switch (addrs->family) { ++ case AF_INET: ++ fl4.flowi4_proto = x->sel.proto; ++ fl4.daddr = addrs->saddr.a4; ++ fl4.saddr = addrs->daddr.a4; ++ pkey = &addrs->saddr.a4; ++ break; ++ case AF_INET6: ++ fl6.flowi6_proto = x->sel.proto; ++ memcpy(fl6.daddr.s6_addr32, addrs->saddr.a6, 16); ++ memcpy(fl6.saddr.s6_addr32, addrs->daddr.a6, 16); ++ pkey = &addrs->saddr.a6; ++ break; ++ default: ++ return; ++ } + break; + case XFRM_DEV_OFFLOAD_OUT: + src = attrs->smac; + dst = attrs->dmac; +- pkey = &attrs->addrs.daddr.a4; ++ switch (addrs->family) 
{ ++ case AF_INET: ++ fl4.flowi4_proto = x->sel.proto; ++ fl4.daddr = addrs->daddr.a4; ++ fl4.saddr = addrs->saddr.a4; ++ pkey = &addrs->daddr.a4; ++ break; ++ case AF_INET6: ++ fl6.flowi6_proto = x->sel.proto; ++ memcpy(fl6.daddr.s6_addr32, addrs->daddr.a6, 16); ++ memcpy(fl6.saddr.s6_addr32, addrs->saddr.a6, 16); ++ pkey = &addrs->daddr.a6; ++ break; ++ default: ++ return; ++ } + break; + default: + return; + } + + ether_addr_copy(src, addr); ++ ++ /* Destination can refer to a routed network, so perform FIB lookup ++ * to resolve nexthop and get its MAC. Neighbour resolution is used as ++ * fallback. ++ */ ++ switch (addrs->family) { ++ case AF_INET: ++ rt = ip_route_output_key(dev_net(netdev), &fl4); ++ if (IS_ERR(rt)) ++ goto neigh; ++ ++ if (rt->rt_type != RTN_UNICAST) { ++ ip_rt_put(rt); ++ goto neigh; ++ } ++ rt_dst_entry = &rt->dst; ++ break; ++ case AF_INET6: ++ rt_dst_entry = ipv6_stub->ipv6_dst_lookup_flow( ++ dev_net(netdev), NULL, &fl6, NULL); ++ if (IS_ERR(rt_dst_entry)) ++ goto neigh; ++ break; ++ default: ++ return; ++ } ++ ++ n = dst_neigh_lookup(rt_dst_entry, pkey); ++ if (!n) { ++ dst_release(rt_dst_entry); ++ goto neigh; ++ } ++ ++ neigh_ha_snapshot(addr, n, netdev); ++ ether_addr_copy(dst, addr); ++ dst_release(rt_dst_entry); ++ neigh_release(n); ++ return; ++ ++neigh: + n = neigh_lookup(&arp_tbl, pkey, netdev); + if (!n) { + n = neigh_create(&arp_tbl, pkey, netdev); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1464-net-mlx5e-expose-tis-via-devlink-tx-reporter-diagnose.patch b/SOURCES/1464-net-mlx5e-expose-tis-via-devlink-tx-reporter-diagnose.patch new file mode 100644 index 000000000..bfc41cbd5 --- /dev/null +++ b/SOURCES/1464-net-mlx5e-expose-tis-via-devlink-tx-reporter-diagnose.patch @@ -0,0 +1,78 @@ +From f418c4b8d61a2d8f7bdb2a10e71e2bdba863f7be Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:26:59 -0400 +Subject: [PATCH] net/mlx5e: Expose TIS via devlink tx reporter diagnose + +JIRA: 
https://redhat.atlassian.net/browse/RHEL-169055 + +commit 5474ca2118191abe27ba089737eace66a9b51d8f +Author: Feng Liu +Date: Tue Jul 22 17:23:48 2025 +0300 + + net/mlx5e: Expose TIS via devlink tx reporter diagnose + + Underneath "TIS Config" tag expose TIS diagnostic information. + Expose the tisn of each TC under each lag port. + + $ sudo devlink health diagnose auxiliary/mlx5_core.eth.2/131072 reporter tx + ...... + TIS Config: + lag port: 0 tc: 0 tisn: 0 + lag port: 1 tc: 0 tisn: 8 + ...... + + Signed-off-by: Feng Liu + Reviewed-by: Aya Levin + Signed-off-by: Tariq Toukan + Reviewed-by: Michal Swiatkowski + Link: https://patch.msgid.link/1753194228-333722-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c +index 2439495e36f8..069ab8aaac5c 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c +@@ -315,6 +315,30 @@ mlx5e_tx_reporter_diagnose_common_config(struct devlink_health_reporter *reporte + mlx5e_health_fmsg_named_obj_nest_end(fmsg); + } + ++static void ++mlx5e_tx_reporter_diagnose_tis_config(struct devlink_health_reporter *reporter, ++ struct devlink_fmsg *fmsg) ++{ ++ struct mlx5e_priv *priv = devlink_health_reporter_priv(reporter); ++ u8 num_tc = mlx5e_get_dcb_num_tc(&priv->channels.params); ++ u32 tc, i, tisn; ++ ++ devlink_fmsg_arr_pair_nest_start(fmsg, "TIS Config"); ++ for (i = 0; i < mlx5e_get_num_lag_ports(priv->mdev); i++) { ++ for (tc = 0; tc < num_tc; tc++) { ++ tisn = mlx5e_profile_get_tisn(priv->mdev, priv, ++ priv->profile, i, tc); ++ ++ devlink_fmsg_obj_nest_start(fmsg); ++ devlink_fmsg_u32_pair_put(fmsg, "lag port", i); ++ devlink_fmsg_u32_pair_put(fmsg, "tc", tc); ++ devlink_fmsg_u32_pair_put(fmsg, "tisn", tisn); ++ devlink_fmsg_obj_nest_end(fmsg); ++ } ++ } ++ 
devlink_fmsg_arr_pair_nest_end(fmsg); ++} ++ + static int mlx5e_tx_reporter_diagnose(struct devlink_health_reporter *reporter, + struct devlink_fmsg *fmsg, + struct netlink_ext_ack *extack) +@@ -330,6 +354,7 @@ static int mlx5e_tx_reporter_diagnose(struct devlink_health_reporter *reporter, + goto unlock; + + mlx5e_tx_reporter_diagnose_common_config(reporter, fmsg); ++ mlx5e_tx_reporter_diagnose_tis_config(reporter, fmsg); + devlink_fmsg_arr_pair_nest_start(fmsg, "SQs"); + + for (i = 0; i < priv->channels.num; i++) { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1465-net-mlx5-correctly-set-gso-segs-when-lro-is-used.patch b/SOURCES/1465-net-mlx5-correctly-set-gso-segs-when-lro-is-used.patch new file mode 100644 index 000000000..e5ebd299a --- /dev/null +++ b/SOURCES/1465-net-mlx5-correctly-set-gso-segs-when-lro-is-used.patch @@ -0,0 +1,61 @@ +From 44d0bcecf55051a134440d9552d7f9e5488eeb75 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:00 -0400 +Subject: [PATCH] net/mlx5: Correctly set gso_segs when LRO is used + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 77bf1c55b2acc7fa3734b14f4561e3d75aea1a90 +Author: Christoph Paasch +Date: Tue Jul 29 11:34:00 2025 -0700 + + net/mlx5: Correctly set gso_segs when LRO is used + + When gso_segs is left at 0, a number of assumptions will end up being + incorrect throughout the stack. + + For example, in the GRO-path, we set NAPI_GRO_CB()->count to gso_segs. + So, if a non-LRO'ed packet followed by an LRO'ed packet is being + processed in GRO, the first one will have NAPI_GRO_CB()->count set to 1 and + the next one to 0 (in dev_gro_receive()). + Since commit 531d0d32de3e + ("net/mlx5: Correctly set gso_size when LRO is used") + these packets will get merged (as their gso_size now matches). + So, we end up in gro_complete() with NAPI_GRO_CB()->count == 1 and thus + don't call inet_gro_complete(). 
Meaning, checksum-validation in + tcp_checksum_complete() will fail with a "hw csum failure". + + Even before the above mentioned commit, incorrect gso_segs means that other + things like TCP's accounting of incoming packets (tp->segs_in, + data_segs_in, rcv_ooopack) will be incorrect. Which means that if one + does bytes_received/data_segs_in, the result will be bigger than the + MTU. + + Fix this by initializing gso_segs correctly when LRO is used. + + Fixes: e586b3b0baee ("net/mlx5: Ethernet Datapath files") + Reported-by: Gal Pressman + Closes: https://lore.kernel.org/netdev/6583783f-f0fb-4fb1-a415-feec8155bc69@nvidia.com/ + Signed-off-by: Christoph Paasch + Reviewed-by: Gal Pressman + Reviewed-by: Eric Dumazet + Link: https://patch.msgid.link/20250729-mlx5_gso_segs-v1-1-b48c480c1c12@openai.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +index 724c5db20c19..3301d5495134 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +@@ -1573,6 +1573,7 @@ static inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe, + unsigned int hdrlen = mlx5e_lro_update_hdr(skb, cqe, cqe_bcnt); + + skb_shinfo(skb)->gso_size = DIV_ROUND_UP(cqe_bcnt - hdrlen, lro_num_seg); ++ skb_shinfo(skb)->gso_segs = lro_num_seg; + /* Subtract one since we already counted this as one + * "regular" packet in mlx5e_complete_rx_cqe() + */ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1466-net-mlx5-hws-fix-bad-parameter-in-cq-creation.patch b/SOURCES/1466-net-mlx5-hws-fix-bad-parameter-in-cq-creation.patch new file mode 100644 index 000000000..5af0dbc4b --- /dev/null +++ b/SOURCES/1466-net-mlx5-hws-fix-bad-parameter-in-cq-creation.patch @@ -0,0 +1,39 @@ +From cd9308a78bfd2a2630260ab69e87f12affdcf017 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:00 -0400 +Subject: [PATCH] net/mlx5: 
HWS, fix bad parameter in CQ creation + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 2462c1b9217246a889ec318b3894d84e4dd709c6 +Author: Yevgeny Kliteynik +Date: Sun Aug 17 23:23:17 2025 +0300 + + net/mlx5: HWS, fix bad parameter in CQ creation + + 'cqe_sz' valid value should be 0 for 64-byte CQE. + + Fixes: 2ca62599aa0b ("net/mlx5: HWS, added send engine and context handling") + Signed-off-by: Yevgeny Kliteynik + Reviewed-by: Vlad Dogaru + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250817202323.308604-2-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/send.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/send.c +index c4b22be19a9b..b0595c9b09e4 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/send.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/send.c +@@ -964,7 +964,6 @@ static int hws_send_ring_open_cq(struct mlx5_core_dev *mdev, + return -ENOMEM; + + MLX5_SET(cqc, cqc_data, uar_page, mdev->priv.uar->index); +- MLX5_SET(cqc, cqc_data, cqe_sz, queue->num_entries); + MLX5_SET(cqc, cqc_data, log_cq_size, ilog2(queue->num_entries)); + + err = hws_send_ring_alloc_cq(mdev, numa_node, queue, cqc_data, cq); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1467-net-mlx5-hws-fix-simple-rules-rehash-error-flow.patch b/SOURCES/1467-net-mlx5-hws-fix-simple-rules-rehash-error-flow.patch new file mode 100644 index 000000000..9a5c96d1a --- /dev/null +++ b/SOURCES/1467-net-mlx5-hws-fix-simple-rules-rehash-error-flow.patch @@ -0,0 +1,146 @@ +From ecf0cb0c6c0e3b29c4e807a8b13feda75d404863 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:00 -0400 +Subject: [PATCH] net/mlx5: HWS, fix simple rules rehash error flow + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 615b690612b7785ab8632f6a5a941550622e4e36 +Author: Yevgeny Kliteynik +Date: Sun Aug 17 23:23:18 2025 +0300 + 
+ net/mlx5: HWS, fix simple rules rehash error flow + + Moving rules from matcher to matcher should not fail. + However, if it does fail due to various reasons, the error flow + should allow the kernel to continue functioning (albeit with broken + steering rules) instead of going into series of soft lock-ups or + some other problematic behaviour. + + This patch fixes the error flow for moving simple rules: + - If new rule creation fails before it was even enqeued, do not + poll for completion + - If TIMEOUT happened while moving the rule, no point trying + to poll for completions for other rules. Something is broken, + completion won't come, just abort the rehash sequence. + - If some other completion with error received, don't give up. + Continue handling rest of the rules to minimize the damage. + - Make sure that the first error code that was received will + be actually returned to the caller instead of replacing it + with the generic error code. + + All the aforementioned issues stem from the same bad error flow, + so no point fixing them one by one and leaving partially broken + code - fixing them in one patch. 
+ + Fixes: ef94799a8741 ("net/mlx5: HWS, rework rehash loop") + Signed-off-by: Yevgeny Kliteynik + Reviewed-by: Vlad Dogaru + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250817202323.308604-3-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +index 92de4b761a83..0219a49b2326 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +@@ -74,9 +74,9 @@ static void hws_bwc_matcher_init_attr(struct mlx5hws_bwc_matcher *bwc_matcher, + static int + hws_bwc_matcher_move_all_simple(struct mlx5hws_bwc_matcher *bwc_matcher) + { +- bool move_error = false, poll_error = false, drain_error = false; + struct mlx5hws_context *ctx = bwc_matcher->matcher->tbl->ctx; + struct mlx5hws_matcher *matcher = bwc_matcher->matcher; ++ int drain_error = 0, move_error = 0, poll_error = 0; + u16 bwc_queues = mlx5hws_bwc_queues(ctx); + struct mlx5hws_rule_attr rule_attr; + struct mlx5hws_bwc_rule *bwc_rule; +@@ -99,11 +99,15 @@ hws_bwc_matcher_move_all_simple(struct mlx5hws_bwc_matcher *bwc_matcher) + ret = mlx5hws_matcher_resize_rule_move(matcher, + bwc_rule->rule, + &rule_attr); +- if (unlikely(ret && !move_error)) { +- mlx5hws_err(ctx, +- "Moving BWC rule: move failed (%d), attempting to move rest of the rules\n", +- ret); +- move_error = true; ++ if (unlikely(ret)) { ++ if (!move_error) { ++ mlx5hws_err(ctx, ++ "Moving BWC rule: move failed (%d), attempting to move rest of the rules\n", ++ ret); ++ move_error = ret; ++ } ++ /* Rule wasn't queued, no need to poll */ ++ continue; + } + + pending_rules++; +@@ -111,11 +115,19 @@ hws_bwc_matcher_move_all_simple(struct mlx5hws_bwc_matcher *bwc_matcher) + rule_attr.queue_id, + &pending_rules, + false); +- if (unlikely(ret && !poll_error)) { +- mlx5hws_err(ctx, +- "Moving BWC rule: poll 
failed (%d), attempting to move rest of the rules\n", +- ret); +- poll_error = true; ++ if (unlikely(ret)) { ++ if (ret == -ETIMEDOUT) { ++ mlx5hws_err(ctx, ++ "Moving BWC rule: timeout polling for completions (%d), aborting rehash\n", ++ ret); ++ return ret; ++ } ++ if (!poll_error) { ++ mlx5hws_err(ctx, ++ "Moving BWC rule: polling for completions failed (%d), attempting to move rest of the rules\n", ++ ret); ++ poll_error = ret; ++ } + } + } + +@@ -126,17 +138,30 @@ hws_bwc_matcher_move_all_simple(struct mlx5hws_bwc_matcher *bwc_matcher) + rule_attr.queue_id, + &pending_rules, + true); +- if (unlikely(ret && !drain_error)) { +- mlx5hws_err(ctx, +- "Moving BWC rule: drain failed (%d), attempting to move rest of the rules\n", +- ret); +- drain_error = true; ++ if (unlikely(ret)) { ++ if (ret == -ETIMEDOUT) { ++ mlx5hws_err(ctx, ++ "Moving bwc rule: timeout draining completions (%d), aborting rehash\n", ++ ret); ++ return ret; ++ } ++ if (!drain_error) { ++ mlx5hws_err(ctx, ++ "Moving bwc rule: drain failed (%d), attempting to move rest of the rules\n", ++ ret); ++ drain_error = ret; ++ } + } + } + } + +- if (move_error || poll_error || drain_error) +- ret = -EINVAL; ++ /* Return the first error that happened */ ++ if (unlikely(move_error)) ++ return move_error; ++ if (unlikely(poll_error)) ++ return poll_error; ++ if (unlikely(drain_error)) ++ return drain_error; + + return ret; + } +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1468-net-mlx5-hws-fix-complex-rules-rehash-error-flow.patch b/SOURCES/1468-net-mlx5-hws-fix-complex-rules-rehash-error-flow.patch new file mode 100644 index 000000000..80e64cb0e --- /dev/null +++ b/SOURCES/1468-net-mlx5-hws-fix-complex-rules-rehash-error-flow.patch @@ -0,0 +1,126 @@ +From 87f9d3bc475e376926bcd033f4d3779713a2f0e3 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:00 -0400 +Subject: [PATCH] net/mlx5: HWS, fix complex rules rehash error flow + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + 
+commit 4a842b1bf18a32ee0c25dd6dd98728b786a76fe4 +Author: Yevgeny Kliteynik +Date: Sun Aug 17 23:23:19 2025 +0300 + + net/mlx5: HWS, fix complex rules rehash error flow + + Moving rules from matcher to matcher should not fail. + However, if it does fail due to various reasons, the error flow + should allow the kernel to continue functioning (albeit with broken + steering rules) instead of going into series of soft lock-ups or + some other problematic behaviour. + + Similar to the simple rules, complex rules rehash logic suffers + from the same problems. This patch fixes the error flow for moving + complex rules: + - If new rule creation fails before it was even enqeued, do not + poll for completion + - If TIMEOUT happened while moving the rule, no point trying + to poll for completions for other rules. Something is broken, + completion won't come, just abort the rehash sequence. + - If some other completion with error received, don't give up. + Continue handling rest of the rules to minimize the damage. + - Make sure that the first error code that was received will + be actually returned to the caller instead of replacing it + with the generic error code. + + All the aforementioned issues stem from the same bad error flow, + so no point fixing them one by one and leaving partially broken + code - fixing them in one patch. 
+ + Fixes: 17e0accac577 ("net/mlx5: HWS, support complex matchers") + Signed-off-by: Yevgeny Kliteynik + Reviewed-by: Vlad Dogaru + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250817202323.308604-4-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.c +index ca7501c57468..14e79579c719 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.c +@@ -1328,11 +1328,11 @@ mlx5hws_bwc_matcher_move_all_complex(struct mlx5hws_bwc_matcher *bwc_matcher) + { + struct mlx5hws_context *ctx = bwc_matcher->matcher->tbl->ctx; + struct mlx5hws_matcher *matcher = bwc_matcher->matcher; +- bool move_error = false, poll_error = false; + u16 bwc_queues = mlx5hws_bwc_queues(ctx); + struct mlx5hws_bwc_rule *tmp_bwc_rule; + struct mlx5hws_rule_attr rule_attr; + struct mlx5hws_table *isolated_tbl; ++ int move_error = 0, poll_error = 0; + struct mlx5hws_rule *tmp_rule; + struct list_head *rules_list; + u32 expected_completions = 1; +@@ -1391,11 +1391,15 @@ mlx5hws_bwc_matcher_move_all_complex(struct mlx5hws_bwc_matcher *bwc_matcher) + ret = mlx5hws_matcher_resize_rule_move(matcher, + tmp_rule, + &rule_attr); +- if (unlikely(ret && !move_error)) { +- mlx5hws_err(ctx, +- "Moving complex BWC rule failed (%d), attempting to move rest of the rules\n", +- ret); +- move_error = true; ++ if (unlikely(ret)) { ++ if (!move_error) { ++ mlx5hws_err(ctx, ++ "Moving complex BWC rule: move failed (%d), attempting to move rest of the rules\n", ++ ret); ++ move_error = ret; ++ } ++ /* Rule wasn't queued, no need to poll */ ++ continue; + } + + expected_completions = 1; +@@ -1403,11 +1407,19 @@ mlx5hws_bwc_matcher_move_all_complex(struct mlx5hws_bwc_matcher *bwc_matcher) + rule_attr.queue_id, + &expected_completions, + 
true); +- if (unlikely(ret && !poll_error)) { +- mlx5hws_err(ctx, +- "Moving complex BWC rule: poll failed (%d), attempting to move rest of the rules\n", +- ret); +- poll_error = true; ++ if (unlikely(ret)) { ++ if (ret == -ETIMEDOUT) { ++ mlx5hws_err(ctx, ++ "Moving complex BWC rule: timeout polling for completions (%d), aborting rehash\n", ++ ret); ++ return ret; ++ } ++ if (!poll_error) { ++ mlx5hws_err(ctx, ++ "Moving complex BWC rule: polling for completions failed (%d), attempting to move rest of the rules\n", ++ ret); ++ poll_error = ret; ++ } + } + + /* Done moving the rule to the new matcher, +@@ -1422,8 +1434,11 @@ mlx5hws_bwc_matcher_move_all_complex(struct mlx5hws_bwc_matcher *bwc_matcher) + } + } + +- if (move_error || poll_error) +- ret = -EINVAL; ++ /* Return the first error that happened */ ++ if (unlikely(move_error)) ++ return move_error; ++ if (unlikely(poll_error)) ++ return poll_error; + + return ret; + } +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1469-net-mlx5-hws-prevent-rehash-from-filling-up-the-queues.patch b/SOURCES/1469-net-mlx5-hws-prevent-rehash-from-filling-up-the-queues.patch new file mode 100644 index 000000000..0552e074b --- /dev/null +++ b/SOURCES/1469-net-mlx5-hws-prevent-rehash-from-filling-up-the-queues.patch @@ -0,0 +1,60 @@ +From 85b6561f38b73bc4a2ff631d7bf93e45feb6f3cf Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:00 -0400 +Subject: [PATCH] net/mlx5: HWS, prevent rehash from filling up the queues + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 1a72298d27ce4d41b3fd405f6921e8711815767a +Author: Yevgeny Kliteynik +Date: Sun Aug 17 23:23:20 2025 +0300 + + net/mlx5: HWS, prevent rehash from filling up the queues + + While moving the rules during rehash, CQ is not drained. The flush + and drain happens only when all the rules of a certain queue have been + moved. 
This behaviour can lead to accumulating large quantity of rules + that haven't got their completion yet, and eventually will fill up + the queue and will cause the rehash to fail. + + Fix this problem by requiring drain once the number of outstanding + completions reaches a certain threshold. + + Fixes: ef94799a8741 ("net/mlx5: HWS, rework rehash loop") + Signed-off-by: Yevgeny Kliteynik + Reviewed-by: Vlad Dogaru + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250817202323.308604-5-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +index 0219a49b2326..2a59be11fe55 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +@@ -84,6 +84,7 @@ hws_bwc_matcher_move_all_simple(struct mlx5hws_bwc_matcher *bwc_matcher) + struct list_head *rules_list; + u32 pending_rules; + int i, ret = 0; ++ bool drain; + + mlx5hws_bwc_rule_fill_attr(bwc_matcher, 0, 0, &rule_attr); + +@@ -111,10 +112,12 @@ hws_bwc_matcher_move_all_simple(struct mlx5hws_bwc_matcher *bwc_matcher) + } + + pending_rules++; ++ drain = pending_rules >= ++ hws_bwc_get_burst_th(ctx, rule_attr.queue_id); + ret = mlx5hws_bwc_queue_poll(ctx, + rule_attr.queue_id, + &pending_rules, +- false); ++ drain); + if (unlikely(ret)) { + if (ret == -ETIMEDOUT) { + mlx5hws_err(ctx, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1470-net-mlx5-hws-don-t-rehash-on-every-kind-of-insertion-failure.patch b/SOURCES/1470-net-mlx5-hws-don-t-rehash-on-every-kind-of-insertion-failure.patch new file mode 100644 index 000000000..26ebdab95 --- /dev/null +++ b/SOURCES/1470-net-mlx5-hws-don-t-rehash-on-every-kind-of-insertion-failure.patch @@ -0,0 +1,57 @@ +From 701cc59d96f2bba3314c909d9b5d62ff97e5341d Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:00 -0400 
+Subject: [PATCH] net/mlx5: HWS, don't rehash on every kind of insertion + failure + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 7c60952f83584bc4950057cfed2cc3c87343b5db +Author: Yevgeny Kliteynik +Date: Sun Aug 17 23:23:21 2025 +0300 + + net/mlx5: HWS, don't rehash on every kind of insertion failure + + If rule creation failed due to a full queue, due to timeout + in polling for completion, or due to matcher being in resize, + don't try to initiate rehash sequence - rehash would have + failed anyway. + + Fixes: 2111bb970c78 ("net/mlx5: HWS, added backward-compatible API handling") + Signed-off-by: Yevgeny Kliteynik + Reviewed-by: Vlad Dogaru + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250817202323.308604-6-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +index 2a59be11fe55..adeccc588e5d 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +@@ -1063,6 +1063,21 @@ int mlx5hws_bwc_rule_create_simple(struct mlx5hws_bwc_rule *bwc_rule, + return 0; /* rule inserted successfully */ + } + ++ /* Rule insertion could fail due to queue being full, timeout, or ++ * matcher in resize. In such cases, no point in trying to rehash. ++ */ ++ if (ret == -EBUSY || ret == -ETIMEDOUT || ret == -EAGAIN) { ++ mutex_unlock(queue_lock); ++ mlx5hws_err(ctx, ++ "BWC rule insertion failed - %s (%d)\n", ++ ret == -EBUSY ? "queue is full" : ++ ret == -ETIMEDOUT ? "timeout" : ++ ret == -EAGAIN ? "matcher in resize" : "N/A", ++ ret); ++ hws_bwc_rule_cnt_dec(bwc_rule); ++ return ret; ++ } ++ + /* At this point the rule wasn't added. + * It could be because there was collision, or some other problem. + * Try rehash by size and insert rule again - last chance. 
+-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1471-net-mlx5-hws-fix-table-creation-uid.patch b/SOURCES/1471-net-mlx5-hws-fix-table-creation-uid.patch new file mode 100644 index 000000000..96386ac9a --- /dev/null +++ b/SOURCES/1471-net-mlx5-hws-fix-table-creation-uid.patch @@ -0,0 +1,179 @@ +From 5ba0229ef0a40fce8e7bc1ccb7cda677afe9277d Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:00 -0400 +Subject: [PATCH] net/mlx5: HWS, Fix table creation UID + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 8a51507320ebddaab32610199774f69cd7d53e78 +Author: Alex Vesker +Date: Sun Aug 17 23:23:22 2025 +0300 + + net/mlx5: HWS, Fix table creation UID + + During table creation, caller passes a UID using ft_attr. The UID + value was ignored, which leads to problems when the caller sets the + UID to a non-zero value, such as SHARED_RESOURCE_UID (0xffff) - the + internal FT objects will be created with UID=0. + + Fixes: 0869701cba3d ("net/mlx5: HWS, added FW commands handling") + Signed-off-by: Alex Vesker + Reviewed-by: Yevgeny Kliteynik + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250817202323.308604-7-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.c +index 9c83753e4592..0bdcab2e5cf3 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.c +@@ -55,6 +55,7 @@ int mlx5hws_cmd_flow_table_create(struct mlx5_core_dev *mdev, + + MLX5_SET(create_flow_table_in, in, opcode, MLX5_CMD_OP_CREATE_FLOW_TABLE); + MLX5_SET(create_flow_table_in, in, table_type, ft_attr->type); ++ MLX5_SET(create_flow_table_in, in, uid, ft_attr->uid); + + ft_ctx = MLX5_ADDR_OF(create_flow_table_in, in, flow_table_context); + MLX5_SET(flow_table_context, ft_ctx, level, ft_attr->level); +diff --git 
a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.h +index fa6bff210266..122ccc671628 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.h +@@ -36,6 +36,7 @@ struct mlx5hws_cmd_set_fte_attr { + struct mlx5hws_cmd_ft_create_attr { + u8 type; + u8 level; ++ u16 uid; + bool rtc_valid; + bool decap_en; + bool reformat_en; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c +index 57592b92e24b..131e74b2b774 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c +@@ -267,6 +267,7 @@ static int mlx5_cmd_hws_create_flow_table(struct mlx5_flow_root_namespace *ns, + + tbl_attr.type = MLX5HWS_TABLE_TYPE_FDB; + tbl_attr.level = ft_attr->level; ++ tbl_attr.uid = ft_attr->uid; + tbl = mlx5hws_table_create(ctx, &tbl_attr); + if (!tbl) { + mlx5_core_err(ns->dev, "Failed creating hws flow_table\n"); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c +index f3ea09caba2b..32f87fdf3213 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c +@@ -85,6 +85,7 @@ static int hws_matcher_create_end_ft_isolated(struct mlx5hws_matcher *matcher) + + ret = mlx5hws_table_create_default_ft(tbl->ctx->mdev, + tbl, ++ 0, + &matcher->end_ft_id); + if (ret) { + mlx5hws_err(tbl->ctx, "Isolated matcher: failed to create end flow table\n"); +@@ -112,7 +113,9 @@ static int hws_matcher_create_end_ft(struct mlx5hws_matcher *matcher) + if (mlx5hws_matcher_is_isolated(matcher)) + ret = hws_matcher_create_end_ft_isolated(matcher); + else +- ret = mlx5hws_table_create_default_ft(tbl->ctx->mdev, tbl, ++ 
ret = mlx5hws_table_create_default_ft(tbl->ctx->mdev, ++ tbl, ++ 0, + &matcher->end_ft_id); + + if (ret) { +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h +index 59c14745ed0c..2498ceff2060 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h +@@ -75,6 +75,7 @@ struct mlx5hws_context_attr { + struct mlx5hws_table_attr { + enum mlx5hws_table_type type; + u32 level; ++ u16 uid; + }; + + enum mlx5hws_matcher_flow_src { +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/table.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/table.c +index 568f691733f3..6113383ae47b 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/table.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/table.c +@@ -9,6 +9,7 @@ u32 mlx5hws_table_get_id(struct mlx5hws_table *tbl) + } + + static void hws_table_init_next_ft_attr(struct mlx5hws_table *tbl, ++ u16 uid, + struct mlx5hws_cmd_ft_create_attr *ft_attr) + { + ft_attr->type = tbl->fw_ft_type; +@@ -16,7 +17,9 @@ static void hws_table_init_next_ft_attr(struct mlx5hws_table *tbl, + ft_attr->level = tbl->ctx->caps->fdb_ft.max_level - 1; + else + ft_attr->level = tbl->ctx->caps->nic_ft.max_level - 1; ++ + ft_attr->rtc_valid = true; ++ ft_attr->uid = uid; + } + + static void hws_table_set_cap_attr(struct mlx5hws_table *tbl, +@@ -119,12 +122,12 @@ static int hws_table_connect_to_default_miss_tbl(struct mlx5hws_table *tbl, u32 + + int mlx5hws_table_create_default_ft(struct mlx5_core_dev *mdev, + struct mlx5hws_table *tbl, +- u32 *ft_id) ++ u16 uid, u32 *ft_id) + { + struct mlx5hws_cmd_ft_create_attr ft_attr = {0}; + int ret; + +- hws_table_init_next_ft_attr(tbl, &ft_attr); ++ hws_table_init_next_ft_attr(tbl, uid, &ft_attr); + hws_table_set_cap_attr(tbl, &ft_attr); + + ret = mlx5hws_cmd_flow_table_create(mdev, &ft_attr, 
ft_id); +@@ -189,7 +192,10 @@ static int hws_table_init(struct mlx5hws_table *tbl) + } + + mutex_lock(&ctx->ctrl_lock); +- ret = mlx5hws_table_create_default_ft(tbl->ctx->mdev, tbl, &tbl->ft_id); ++ ret = mlx5hws_table_create_default_ft(tbl->ctx->mdev, ++ tbl, ++ tbl->uid, ++ &tbl->ft_id); + if (ret) { + mlx5hws_err(tbl->ctx, "Failed to create flow table object\n"); + mutex_unlock(&ctx->ctrl_lock); +@@ -239,6 +245,7 @@ struct mlx5hws_table *mlx5hws_table_create(struct mlx5hws_context *ctx, + tbl->ctx = ctx; + tbl->type = attr->type; + tbl->level = attr->level; ++ tbl->uid = attr->uid; + + ret = hws_table_init(tbl); + if (ret) { +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/table.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/table.h +index 0400cce0c317..1246f9bd8422 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/table.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/table.h +@@ -18,6 +18,7 @@ struct mlx5hws_table { + enum mlx5hws_table_type type; + u32 fw_ft_type; + u32 level; ++ u16 uid; + struct list_head matchers_list; + struct list_head tbl_list_node; + struct mlx5hws_default_miss default_miss; +@@ -47,7 +48,7 @@ u32 mlx5hws_table_get_res_fw_ft_type(enum mlx5hws_table_type tbl_type, + + int mlx5hws_table_create_default_ft(struct mlx5_core_dev *mdev, + struct mlx5hws_table *tbl, +- u32 *ft_id); ++ u16 uid, u32 *ft_id); + + void mlx5hws_table_destroy_default_ft(struct mlx5hws_table *tbl, + u32 ft_id); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1472-net-mlx5-ct-use-the-correct-counter-offset.patch b/SOURCES/1472-net-mlx5-ct-use-the-correct-counter-offset.patch new file mode 100644 index 000000000..59cbf73ee --- /dev/null +++ b/SOURCES/1472-net-mlx5-ct-use-the-correct-counter-offset.patch @@ -0,0 +1,48 @@ +From e9c25119d53e4ed49454a852dbf0d4a0b1998973 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:00 -0400 +Subject: [PATCH] net/mlx5: CT: Use the correct counter offset 
+ +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit d2d6f950cb43be6845a41cac5956cb2a10e657e5 +Author: Vlad Dogaru +Date: Sun Aug 17 23:23:23 2025 +0300 + + net/mlx5: CT: Use the correct counter offset + + Specifying the counter action is not enough, as it is used by multiple + counters that were allocated in a bulk. By omitting the offset, rules + will be associated with a different counter from the same bulk. + Subsequently, the CT subsystem checks the correct counter, assumes that + no traffic has triggered the rule, and ages out the rule. The end result + is intermittent offloading of long lived connections, as rules are aged + out then promptly re-added. + + Fix this by specifying the correct offset along with the counter rule. + + Fixes: 34eea5b12a10 ("net/mlx5e: CT: Add initial support for Hardware Steering") + Signed-off-by: Vlad Dogaru + Reviewed-by: Yevgeny Kliteynik + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250817202323.308604-8-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/ct_fs_hmfs.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/ct_fs_hmfs.c +index a4263137fef5..01d522b02947 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/ct_fs_hmfs.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/ct_fs_hmfs.c +@@ -173,6 +173,8 @@ static void mlx5_ct_fs_hmfs_fill_rule_actions(struct mlx5_ct_fs_hmfs *fs_hmfs, + + memset(rule_actions, 0, NUM_CT_HMFS_RULES * sizeof(*rule_actions)); + rule_actions[0].action = mlx5_fc_get_hws_action(fs_hmfs->ctx, attr->counter); ++ rule_actions[0].counter.offset = ++ attr->counter->id - attr->counter->bulk->base_id; + /* Modify header is special, it may require extra arguments outside the action itself. 
*/ + if (mh_action->mh_data) { + rule_actions[1].modify_header.offset = mh_action->mh_data->offset; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1473-net-mlx5-base-ecvf-devlink-port-attrs-from-0.patch b/SOURCES/1473-net-mlx5-base-ecvf-devlink-port-attrs-from-0.patch new file mode 100644 index 000000000..0c78fdd9d --- /dev/null +++ b/SOURCES/1473-net-mlx5-base-ecvf-devlink-port-attrs-from-0.patch @@ -0,0 +1,49 @@ +From 4664ef9e5ba7021bb556c8e0f1f690d7538c2b11 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:00 -0400 +Subject: [PATCH] net/mlx5: Base ECVF devlink port attrs from 0 + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit bc17455bc843b2f4b206e0bb8139013eb3d3c08b +Author: Daniel Jurgens +Date: Wed Aug 20 16:32:02 2025 +0300 + + net/mlx5: Base ECVF devlink port attrs from 0 + + Adjust the vport number by the base ECVF vport number so the port + attributes start at 0. Previously the port attributes would start 1 + after the maximum number of host VFs. 
+ + Fixes: dc13180824b7 ("net/mlx5: Enable devlink port for embedded cpu VF vports") + Signed-off-by: Daniel Jurgens + Reviewed-by: Parav Pandit + Reviewed-by: Saeed Mahameed + Signed-off-by: Tariq Toukan + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250820133209.389065-2-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c +index b7102e14d23d..c33accadae0f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c +@@ -47,10 +47,12 @@ static void mlx5_esw_offloads_pf_vf_devlink_port_attrs_set(struct mlx5_eswitch * + devlink_port_attrs_pci_vf_set(dl_port, controller_num, pfnum, + vport_num - 1, external); + } else if (mlx5_core_is_ec_vf_vport(esw->dev, vport_num)) { ++ u16 base_vport = mlx5_core_ec_vf_vport_base(dev); ++ + memcpy(dl_port->attrs.switch_id.id, ppid.id, ppid.id_len); + dl_port->attrs.switch_id.id_len = ppid.id_len; + devlink_port_attrs_pci_vf_set(dl_port, 0, pfnum, +- vport_num - 1, false); ++ vport_num - base_vport, false); + } + } + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1474-net-mlx5-remove-default-qos-group-and-attach-vports-directly.patch b/SOURCES/1474-net-mlx5-remove-default-qos-group-and-attach-vports-directly.patch new file mode 100644 index 000000000..661644ba3 --- /dev/null +++ b/SOURCES/1474-net-mlx5-remove-default-qos-group-and-attach-vports-directly.patch @@ -0,0 +1,299 @@ +From 1aea4efe28aadf5ebb2abbcaa1286585a6c71a01 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:00 -0400 +Subject: [PATCH] net/mlx5: Remove default QoS group and attach vports directly + to root TSAR + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 330f0f6713a39581936decac72331e6ab7f13529 +Author: Carolina Jubran +Date: Wed Aug 20 16:32:03 2025 +0300 + + 
net/mlx5: Remove default QoS group and attach vports directly to root TSAR + + Currently, the driver creates a default group (`node0`) and attaches + all vports to it unless the user explicitly sets a parent group. As a + result, when a user configures tx_share on a group and tx_share on + a VF, the expectation is for the group and the VF to share bandwidth + relatively. However, since the VF is not connected to the same parent + (but to the default node), the proportional share logic is not applied + correctly. + + To fix this, remove the default group (`node0`) and instead connect + vports directly to the root TSAR when no parent is specified. This + ensures that vports and groups share the same root scheduler and their + tx_share values are compared directly under the same hierarchy. + + Fixes: 0fe132eac38c ("net/mlx5: E-switch, Allow to add vports to rate groups") + Signed-off-by: Carolina Jubran + Reviewed-by: Cosmin Ratiu + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250820133209.389065-3-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +index 91d863c8c152..cd58d3934596 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +@@ -462,6 +462,7 @@ static int + esw_qos_vport_create_sched_element(struct mlx5_esw_sched_node *vport_node, + struct netlink_ext_ack *extack) + { ++ struct mlx5_esw_sched_node *parent = vport_node->parent; + u32 sched_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {}; + struct mlx5_core_dev *dev = vport_node->esw->dev; + void *attr; +@@ -477,7 +478,7 @@ esw_qos_vport_create_sched_element(struct mlx5_esw_sched_node *vport_node, + attr = MLX5_ADDR_OF(scheduling_context, sched_ctx, element_attributes); + MLX5_SET(vport_element, attr, vport_number, vport_node->vport->vport); + MLX5_SET(scheduling_context, sched_ctx, 
parent_element_id, +- vport_node->parent->ix); ++ parent ? parent->ix : vport_node->esw->qos.root_tsar_ix); + MLX5_SET(scheduling_context, sched_ctx, max_average_bw, + vport_node->max_rate); + +@@ -786,48 +787,15 @@ static int esw_qos_create(struct mlx5_eswitch *esw, struct netlink_ext_ack *exta + return err; + } + +- if (MLX5_CAP_QOS(dev, log_esw_max_sched_depth)) { +- esw->qos.node0 = __esw_qos_create_vports_sched_node(esw, NULL, extack); +- } else { +- /* The eswitch doesn't support scheduling nodes. +- * Create a software-only node0 using the root TSAR to attach vport QoS to. +- */ +- if (!__esw_qos_alloc_node(esw, +- esw->qos.root_tsar_ix, +- SCHED_NODE_TYPE_VPORTS_TSAR, +- NULL)) +- esw->qos.node0 = ERR_PTR(-ENOMEM); +- else +- list_add_tail(&esw->qos.node0->entry, +- &esw->qos.domain->nodes); +- } +- if (IS_ERR(esw->qos.node0)) { +- err = PTR_ERR(esw->qos.node0); +- esw_warn(dev, "E-Switch create rate node 0 failed (%d)\n", err); +- goto err_node0; +- } + refcount_set(&esw->qos.refcnt, 1); + + return 0; +- +-err_node0: +- if (mlx5_destroy_scheduling_element_cmd(esw->dev, SCHEDULING_HIERARCHY_E_SWITCH, +- esw->qos.root_tsar_ix)) +- esw_warn(esw->dev, "E-Switch destroy root TSAR failed.\n"); +- +- return err; + } + + static void esw_qos_destroy(struct mlx5_eswitch *esw) + { + int err; + +- if (esw->qos.node0->ix != esw->qos.root_tsar_ix) +- __esw_qos_destroy_node(esw->qos.node0, NULL); +- else +- __esw_qos_free_node(esw->qos.node0); +- esw->qos.node0 = NULL; +- + err = mlx5_destroy_scheduling_element_cmd(esw->dev, + SCHEDULING_HIERARCHY_E_SWITCH, + esw->qos.root_tsar_ix); +@@ -990,13 +958,16 @@ esw_qos_vport_tc_enable(struct mlx5_vport *vport, enum sched_node_type type, + struct netlink_ext_ack *extack) + { + struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node; +- int err, new_level, max_level; ++ struct mlx5_esw_sched_node *parent = vport_node->parent; ++ int err; + + if (type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) { ++ int new_level, max_level; ++ + 
/* Increase the parent's level by 2 to account for both the + * TC arbiter and the vports TC scheduling element. + */ +- new_level = vport_node->parent->level + 2; ++ new_level = (parent ? parent->level : 2) + 2; + max_level = 1 << MLX5_CAP_QOS(vport_node->esw->dev, + log_esw_max_sched_depth); + if (new_level > max_level) { +@@ -1033,9 +1004,7 @@ esw_qos_vport_tc_enable(struct mlx5_vport *vport, enum sched_node_type type, + err_sched_nodes: + if (type == SCHED_NODE_TYPE_RATE_LIMITER) { + esw_qos_node_destroy_sched_element(vport_node, NULL); +- list_add_tail(&vport_node->entry, +- &vport_node->parent->children); +- vport_node->level = vport_node->parent->level + 1; ++ esw_qos_node_attach_to_parent(vport_node); + } else { + esw_qos_tc_arbiter_scheduling_teardown(vport_node, NULL); + } +@@ -1083,7 +1052,6 @@ static int esw_qos_set_vport_tcs_min_rate(struct mlx5_vport *vport, + static void esw_qos_vport_disable(struct mlx5_vport *vport, struct netlink_ext_ack *extack) + { + struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node; +- struct mlx5_esw_sched_node *parent = vport_node->parent; + enum sched_node_type curr_type = vport_node->type; + + if (curr_type == SCHED_NODE_TYPE_VPORT) +@@ -1093,7 +1061,7 @@ static void esw_qos_vport_disable(struct mlx5_vport *vport, struct netlink_ext_a + + vport_node->bw_share = 0; + list_del_init(&vport_node->entry); +- esw_qos_normalize_min_rate(parent->esw, parent, extack); ++ esw_qos_normalize_min_rate(vport_node->esw, vport_node->parent, extack); + + trace_mlx5_esw_vport_qos_destroy(vport_node->esw->dev, vport); + } +@@ -1103,25 +1071,23 @@ static int esw_qos_vport_enable(struct mlx5_vport *vport, + struct mlx5_esw_sched_node *parent, + struct netlink_ext_ack *extack) + { ++ struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node; + int err; + + esw_assert_qos_lock_held(vport->dev->priv.eswitch); + +- esw_qos_node_set_parent(vport->qos.sched_node, parent); +- if (type == SCHED_NODE_TYPE_VPORT) { +- err = 
esw_qos_vport_create_sched_element(vport->qos.sched_node, +- extack); +- } else { ++ esw_qos_node_set_parent(vport_node, parent); ++ if (type == SCHED_NODE_TYPE_VPORT) ++ err = esw_qos_vport_create_sched_element(vport_node, extack); ++ else + err = esw_qos_vport_tc_enable(vport, type, extack); +- } + if (err) + return err; + +- vport->qos.sched_node->type = type; +- esw_qos_normalize_min_rate(parent->esw, parent, extack); +- trace_mlx5_esw_vport_qos_create(vport->dev, vport, +- vport->qos.sched_node->max_rate, +- vport->qos.sched_node->bw_share); ++ vport_node->type = type; ++ esw_qos_normalize_min_rate(vport_node->esw, parent, extack); ++ trace_mlx5_esw_vport_qos_create(vport->dev, vport, vport_node->max_rate, ++ vport_node->bw_share); + + return 0; + } +@@ -1132,6 +1098,7 @@ static int mlx5_esw_qos_vport_enable(struct mlx5_vport *vport, enum sched_node_t + { + struct mlx5_eswitch *esw = vport->dev->priv.eswitch; + struct mlx5_esw_sched_node *sched_node; ++ struct mlx5_eswitch *parent_esw; + int err; + + esw_assert_qos_lock_held(esw); +@@ -1139,10 +1106,12 @@ static int mlx5_esw_qos_vport_enable(struct mlx5_vport *vport, enum sched_node_t + if (err) + return err; + +- parent = parent ?: esw->qos.node0; +- sched_node = __esw_qos_alloc_node(parent->esw, 0, type, parent); ++ parent_esw = parent ? 
parent->esw : esw; ++ sched_node = __esw_qos_alloc_node(parent_esw, 0, type, parent); + if (!sched_node) + return -ENOMEM; ++ if (!parent) ++ list_add_tail(&sched_node->entry, &esw->qos.domain->nodes); + + sched_node->max_rate = max_rate; + sched_node->min_rate = min_rate; +@@ -1168,7 +1137,7 @@ void mlx5_esw_qos_vport_disable(struct mlx5_vport *vport) + goto unlock; + + parent = vport->qos.sched_node->parent; +- WARN(parent != esw->qos.node0, "Disabling QoS on port before detaching it from node"); ++ WARN(parent, "Disabling QoS on port before detaching it from node"); + + esw_qos_vport_disable(vport, NULL); + mlx5_esw_qos_vport_qos_free(vport); +@@ -1268,7 +1237,6 @@ static int esw_qos_vport_update(struct mlx5_vport *vport, + int err; + + esw_assert_qos_lock_held(vport->dev->priv.eswitch); +- parent = parent ?: curr_parent; + if (curr_type == type && curr_parent == parent) + return 0; + +@@ -1306,16 +1274,16 @@ static int esw_qos_vport_update_parent(struct mlx5_vport *vport, struct mlx5_esw + + esw_assert_qos_lock_held(esw); + curr_parent = vport->qos.sched_node->parent; +- parent = parent ?: esw->qos.node0; + if (curr_parent == parent) + return 0; + + /* Set vport QoS type based on parent node type if different from + * default QoS; otherwise, use the vport's current QoS type. + */ +- if (parent->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) ++ if (parent && parent->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) + type = SCHED_NODE_TYPE_RATE_LIMITER; +- else if (curr_parent->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) ++ else if (curr_parent && ++ curr_parent->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) + type = SCHED_NODE_TYPE_VPORT; + else + type = vport->qos.sched_node->type; +@@ -1654,9 +1622,10 @@ static bool esw_qos_validate_unsupported_tc_bw(struct mlx5_eswitch *esw, + static bool esw_qos_vport_validate_unsupported_tc_bw(struct mlx5_vport *vport, + u32 *tc_bw) + { +- struct mlx5_eswitch *esw = vport->qos.sched_node ? 
+- vport->qos.sched_node->parent->esw : +- vport->dev->priv.eswitch; ++ struct mlx5_esw_sched_node *node = vport->qos.sched_node; ++ struct mlx5_eswitch *esw = vport->dev->priv.eswitch; ++ ++ esw = (node && node->parent) ? node->parent->esw : esw; + + return esw_qos_validate_unsupported_tc_bw(esw, tc_bw); + } +@@ -1763,7 +1732,7 @@ int mlx5_esw_devlink_rate_leaf_tc_bw_set(struct devlink_rate *rate_leaf, + if (disable) { + if (vport_node->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) + err = esw_qos_vport_update(vport, SCHED_NODE_TYPE_VPORT, +- NULL, extack); ++ vport_node->parent, extack); + goto unlock; + } + +@@ -1775,7 +1744,7 @@ int mlx5_esw_devlink_rate_leaf_tc_bw_set(struct devlink_rate *rate_leaf, + } else { + err = esw_qos_vport_update(vport, + SCHED_NODE_TYPE_TC_ARBITER_TSAR, +- NULL, extack); ++ vport_node->parent, extack); + } + if (!err) + esw_qos_set_tc_arbiter_bw_shares(vport_node, tc_bw, extack); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +index b0b8ef3ec3c4..45506ad56847 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +@@ -373,11 +373,6 @@ struct mlx5_eswitch { + refcount_t refcnt; + u32 root_tsar_ix; + struct mlx5_qos_domain *domain; +- /* Contains all vports with QoS enabled but no explicit node. +- * Cannot be NULL if QoS is enabled, but may be a fake node +- * referencing the root TSAR if the esw doesn't support nodes. 
+- */ +- struct mlx5_esw_sched_node *node0; + } qos; + + struct mlx5_esw_bridge_offloads *br_offloads; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1475-net-mlx5e-preserve-tc-bw-during-parent-changes.patch b/SOURCES/1475-net-mlx5e-preserve-tc-bw-during-parent-changes.patch new file mode 100644 index 000000000..116514886 --- /dev/null +++ b/SOURCES/1475-net-mlx5e-preserve-tc-bw-during-parent-changes.patch @@ -0,0 +1,110 @@ +From 563c9ce2e47d6913030f6d78e8cde9303038c48e Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:01 -0400 +Subject: [PATCH] net/mlx5e: Preserve tc-bw during parent changes + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit e8f973576ca5387ffd2917b8ae661d3f9acde526 +Author: Carolina Jubran +Date: Wed Aug 20 16:32:04 2025 +0300 + + net/mlx5e: Preserve tc-bw during parent changes + + When changing parent of a node/leaf with tc-bw configured, the code + saves and restores tc-bw values. However, it was reading the converted + hardware bw_share values (where 0 becomes 1) instead of the original + user values, causing incorrect tc-bw calculations after parent change. + + Store original tc-bw values in the node structure and use them directly + for save/restore operations. + + Fixes: cf7e73770d1b ("net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw") + Signed-off-by: Carolina Jubran + Reviewed-by: Cosmin Ratiu + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250820133209.389065-4-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +index cd58d3934596..4ed5968f1638 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +@@ -102,6 +102,8 @@ struct mlx5_esw_sched_node { + u8 level; + /* Valid only when this node represents a traffic class. 
*/ + u8 tc; ++ /* Valid only for a TC arbiter node or vport TC arbiter. */ ++ u32 tc_bw[DEVLINK_RATE_TCS_MAX]; + }; + + static void esw_qos_node_attach_to_parent(struct mlx5_esw_sched_node *node) +@@ -609,10 +611,7 @@ static void + esw_qos_tc_arbiter_get_bw_shares(struct mlx5_esw_sched_node *tc_arbiter_node, + u32 *tc_bw) + { +- struct mlx5_esw_sched_node *vports_tc_node; +- +- list_for_each_entry(vports_tc_node, &tc_arbiter_node->children, entry) +- tc_bw[vports_tc_node->tc] = vports_tc_node->bw_share; ++ memcpy(tc_bw, tc_arbiter_node->tc_bw, sizeof(tc_arbiter_node->tc_bw)); + } + + static void +@@ -629,6 +628,7 @@ esw_qos_set_tc_arbiter_bw_shares(struct mlx5_esw_sched_node *tc_arbiter_node, + u8 tc = vports_tc_node->tc; + u32 bw_share; + ++ tc_arbiter_node->tc_bw[tc] = tc_bw[tc]; + bw_share = tc_bw[tc] * fw_max_bw_share; + bw_share = esw_qos_calc_bw_share(bw_share, divider, + fw_max_bw_share); +@@ -1060,6 +1060,7 @@ static void esw_qos_vport_disable(struct mlx5_vport *vport, struct netlink_ext_a + esw_qos_vport_tc_disable(vport, extack); + + vport_node->bw_share = 0; ++ memset(vport_node->tc_bw, 0, sizeof(vport_node->tc_bw)); + list_del_init(&vport_node->entry); + esw_qos_normalize_min_rate(vport_node->esw, vport_node->parent, extack); + +@@ -1231,8 +1232,9 @@ static int esw_qos_vport_update(struct mlx5_vport *vport, + struct mlx5_esw_sched_node *parent, + struct netlink_ext_ack *extack) + { +- struct mlx5_esw_sched_node *curr_parent = vport->qos.sched_node->parent; +- enum sched_node_type curr_type = vport->qos.sched_node->type; ++ struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node; ++ struct mlx5_esw_sched_node *curr_parent = vport_node->parent; ++ enum sched_node_type curr_type = vport_node->type; + u32 curr_tc_bw[DEVLINK_RATE_TCS_MAX] = {0}; + int err; + +@@ -1244,10 +1246,8 @@ static int esw_qos_vport_update(struct mlx5_vport *vport, + if (err) + return err; + +- if (curr_type == SCHED_NODE_TYPE_TC_ARBITER_TSAR && curr_type == type) { +- 
esw_qos_tc_arbiter_get_bw_shares(vport->qos.sched_node, +- curr_tc_bw); +- } ++ if (curr_type == SCHED_NODE_TYPE_TC_ARBITER_TSAR && curr_type == type) ++ esw_qos_tc_arbiter_get_bw_shares(vport_node, curr_tc_bw); + + esw_qos_vport_disable(vport, extack); + +@@ -1258,8 +1258,8 @@ static int esw_qos_vport_update(struct mlx5_vport *vport, + } + + if (curr_type == SCHED_NODE_TYPE_TC_ARBITER_TSAR && curr_type == type) { +- esw_qos_set_tc_arbiter_bw_shares(vport->qos.sched_node, +- curr_tc_bw, extack); ++ esw_qos_set_tc_arbiter_bw_shares(vport_node, curr_tc_bw, ++ extack); + } + + return err; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1476-net-mlx5-destroy-vport-qos-element-when-no-configuration-rem.patch b/SOURCES/1476-net-mlx5-destroy-vport-qos-element-when-no-configuration-rem.patch new file mode 100644 index 000000000..b7502c4f1 --- /dev/null +++ b/SOURCES/1476-net-mlx5-destroy-vport-qos-element-when-no-configuration-rem.patch @@ -0,0 +1,151 @@ +From b80130b7b7999d4307a31abd933abe52a149bb54 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:01 -0400 +Subject: [PATCH] net/mlx5: Destroy vport QoS element when no configuration + remains + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit b697ef4d1d136948d282384e6cc3d1af469ea123 +Author: Carolina Jubran +Date: Wed Aug 20 16:32:05 2025 +0300 + + net/mlx5: Destroy vport QoS element when no configuration remains + + If a VF has been configured and the user later clears all QoS settings, + the vport element remains in the firmware QoS tree. This leads to + inconsistent behavior compared to VFs that were never configured, since + the FW assumes that unconfigured VFs are outside the QoS hierarchy. + As a result, the bandwidth share across VFs may differ, even though + none of them appear to have any configuration. + + Align the driver behavior with the FW expectation by destroying the + vport QoS element when all configurations are removed. 
+ + Fixes: c9497c98901c ("net/mlx5: Add support for setting VF min rate") + Fixes: cf7e73770d1b ("net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw") + Signed-off-by: Carolina Jubran + Reviewed-by: Cosmin Ratiu + Signed-off-by: Mark Bloch + Reviewed-by: Przemek Kitszel + Link: https://patch.msgid.link/20250820133209.389065-5-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +index 4ed5968f1638..452a948a3e6d 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +@@ -1127,6 +1127,19 @@ static int mlx5_esw_qos_vport_enable(struct mlx5_vport *vport, enum sched_node_t + return err; + } + ++static void mlx5_esw_qos_vport_disable_locked(struct mlx5_vport *vport) ++{ ++ struct mlx5_eswitch *esw = vport->dev->priv.eswitch; ++ ++ esw_assert_qos_lock_held(esw); ++ if (!vport->qos.sched_node) ++ return; ++ ++ esw_qos_vport_disable(vport, NULL); ++ mlx5_esw_qos_vport_qos_free(vport); ++ esw_qos_put(esw); ++} ++ + void mlx5_esw_qos_vport_disable(struct mlx5_vport *vport) + { + struct mlx5_eswitch *esw = vport->dev->priv.eswitch; +@@ -1140,9 +1153,7 @@ void mlx5_esw_qos_vport_disable(struct mlx5_vport *vport) + parent = vport->qos.sched_node->parent; + WARN(parent, "Disabling QoS on port before detaching it from node"); + +- esw_qos_vport_disable(vport, NULL); +- mlx5_esw_qos_vport_qos_free(vport); +- esw_qos_put(esw); ++ mlx5_esw_qos_vport_disable_locked(vport); + unlock: + esw_qos_unlock(esw); + } +@@ -1642,6 +1653,21 @@ static bool esw_qos_tc_bw_disabled(u32 *tc_bw) + return true; + } + ++static void esw_vport_qos_prune_empty(struct mlx5_vport *vport) ++{ ++ struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node; ++ ++ esw_assert_qos_lock_held(vport->dev->priv.eswitch); ++ if (!vport_node) ++ return; ++ ++ if (vport_node->parent || 
vport_node->max_rate || ++ vport_node->min_rate || !esw_qos_tc_bw_disabled(vport_node->tc_bw)) ++ return; ++ ++ mlx5_esw_qos_vport_disable_locked(vport); ++} ++ + int mlx5_esw_qos_init(struct mlx5_eswitch *esw) + { + if (esw->qos.domain) +@@ -1675,6 +1701,10 @@ int mlx5_esw_devlink_rate_leaf_tx_share_set(struct devlink_rate *rate_leaf, void + + esw_qos_lock(esw); + err = mlx5_esw_qos_set_vport_min_rate(vport, tx_share, extack); ++ if (err) ++ goto out; ++ esw_vport_qos_prune_empty(vport); ++out: + esw_qos_unlock(esw); + return err; + } +@@ -1696,6 +1726,10 @@ int mlx5_esw_devlink_rate_leaf_tx_max_set(struct devlink_rate *rate_leaf, void * + + esw_qos_lock(esw); + err = mlx5_esw_qos_set_vport_max_rate(vport, tx_max, extack); ++ if (err) ++ goto out; ++ esw_vport_qos_prune_empty(vport); ++out: + esw_qos_unlock(esw); + return err; + } +@@ -1733,6 +1767,7 @@ int mlx5_esw_devlink_rate_leaf_tc_bw_set(struct devlink_rate *rate_leaf, + if (vport_node->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) + err = esw_qos_vport_update(vport, SCHED_NODE_TYPE_VPORT, + vport_node->parent, extack); ++ esw_vport_qos_prune_empty(vport); + goto unlock; + } + +@@ -1893,14 +1928,20 @@ int mlx5_esw_devlink_rate_leaf_parent_set(struct devlink_rate *devlink_rate, + void *priv, void *parent_priv, + struct netlink_ext_ack *extack) + { +- struct mlx5_esw_sched_node *node; ++ struct mlx5_esw_sched_node *node = parent ? 
parent_priv : NULL; + struct mlx5_vport *vport = priv; ++ int err; + +- if (!parent) +- return mlx5_esw_qos_vport_update_parent(vport, NULL, extack); ++ err = mlx5_esw_qos_vport_update_parent(vport, node, extack); ++ if (!err) { ++ struct mlx5_eswitch *esw = vport->dev->priv.eswitch; ++ ++ esw_qos_lock(esw); ++ esw_vport_qos_prune_empty(vport); ++ esw_qos_unlock(esw); ++ } + +- node = parent_priv; +- return mlx5_esw_qos_vport_update_parent(vport, node, extack); ++ return err; + } + + static bool esw_qos_is_node_empty(struct mlx5_esw_sched_node *node) +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1477-net-mlx5-fix-qos-reference-leak-in-vport-enable-error-path.patch b/SOURCES/1477-net-mlx5-fix-qos-reference-leak-in-vport-enable-error-path.patch new file mode 100644 index 000000000..f5efa6291 --- /dev/null +++ b/SOURCES/1477-net-mlx5-fix-qos-reference-leak-in-vport-enable-error-path.patch @@ -0,0 +1,44 @@ +From 5b2ddfb962de15082b8a2f6959cb9b18e384f355 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:01 -0400 +Subject: [PATCH] net/mlx5: Fix QoS reference leak in vport enable error path + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 3c114fb2afe493066df5b9e560ef37216b153c90 +Author: Carolina Jubran +Date: Wed Aug 20 16:32:06 2025 +0300 + + net/mlx5: Fix QoS reference leak in vport enable error path + + Add missing esw_qos_put() call when __esw_qos_alloc_node() fails in + mlx5_esw_qos_vport_enable(). 
+ + Fixes: be034baba83e ("net/mlx5: Make vport QoS enablement more flexible for future extensions") + Signed-off-by: Carolina Jubran + Reviewed-by: Cosmin Ratiu + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250820133209.389065-6-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +index 452a948a3e6d..41aec07bb6c2 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +@@ -1109,8 +1109,10 @@ static int mlx5_esw_qos_vport_enable(struct mlx5_vport *vport, enum sched_node_t + + parent_esw = parent ? parent->esw : esw; + sched_node = __esw_qos_alloc_node(parent_esw, 0, type, parent); +- if (!sched_node) ++ if (!sched_node) { ++ esw_qos_put(esw); + return -ENOMEM; ++ } + if (!parent) + list_add_tail(&sched_node->entry, &esw->qos.domain->nodes); + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1478-net-mlx5-restore-missing-scheduling-node-cleanup-on-vport-en.patch b/SOURCES/1478-net-mlx5-restore-missing-scheduling-node-cleanup-on-vport-en.patch new file mode 100644 index 000000000..74eab8a73 --- /dev/null +++ b/SOURCES/1478-net-mlx5-restore-missing-scheduling-node-cleanup-on-vport-en.patch @@ -0,0 +1,41 @@ +From 511f926b17986007f23bb5a1351e214af05d8b4c Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:01 -0400 +Subject: [PATCH] net/mlx5: Restore missing scheduling node cleanup on vport + enable failure + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 51b17c98e3dbb2093a81b0490050a0eaa919ebee +Author: Carolina Jubran +Date: Wed Aug 20 16:32:07 2025 +0300 + + net/mlx5: Restore missing scheduling node cleanup on vport enable failure + + Restore the __esw_qos_free_node() call removed by the offending commit. 
+ + Fixes: 97733d1e00a0 ("net/mlx5: Add traffic class scheduling support for vport QoS") + Signed-off-by: Carolina Jubran + Reviewed-by: Tariq Toukan + Reviewed-by: Cosmin Ratiu + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250820133209.389065-7-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +index 41aec07bb6c2..8b4977650183 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +@@ -1122,6 +1122,7 @@ static int mlx5_esw_qos_vport_enable(struct mlx5_vport *vport, enum sched_node_t + vport->qos.sched_node = sched_node; + err = esw_qos_vport_enable(vport, type, parent, extack); + if (err) { ++ __esw_qos_free_node(sched_node); + esw_qos_put(esw); + vport->qos.sched_node = NULL; + } +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1479-net-mlx5e-query-fw-for-buffer-ownership.patch b/SOURCES/1479-net-mlx5e-query-fw-for-buffer-ownership.patch new file mode 100644 index 000000000..e218f4022 --- /dev/null +++ b/SOURCES/1479-net-mlx5e-query-fw-for-buffer-ownership.patch @@ -0,0 +1,142 @@ +From 9f88327ce89be95757619078e64831f60cb79111 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:01 -0400 +Subject: [PATCH] net/mlx5e: Query FW for buffer ownership + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 451d2849ea66659040b59ae3cb7e50cc97404733 +Author: Alexei Lazar +Date: Wed Aug 20 16:32:08 2025 +0300 + + net/mlx5e: Query FW for buffer ownership + + The SW currently saves local buffer ownership when setting + the buffer. + This means that the SW assumes it has ownership of the buffer + after the command is set. + + If setting the buffer fails and we remain in FW ownership, + the local buffer ownership state incorrectly remains as SW-owned. 
+ This leads to incorrect behavior in subsequent PFC commands, + causing failures. + + Instead of saving local buffer ownership in SW, + query the FW for buffer ownership when setting the buffer. + This ensures that the buffer ownership state is accurately + reflected, avoiding the issues caused by incorrect ownership + states. + + Fixes: ecdf2dadee8e ("net/mlx5e: Receive buffer support for DCBX") + Signed-off-by: Alexei Lazar + Reviewed-by: Shahar Shitrit + Reviewed-by: Dragos Tatulea + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250820133209.389065-8-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/dcbnl.h b/drivers/net/ethernet/mellanox/mlx5/core/en/dcbnl.h +index b59aee75de94..2c98a5299df3 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/dcbnl.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/dcbnl.h +@@ -26,7 +26,6 @@ struct mlx5e_dcbx { + u8 cap; + + /* Buffer configuration */ +- bool manual_buffer; + u32 cable_len; + u32 xoff; + u16 port_buff_cell_sz; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c +index 8705cffc747f..b08328fe1aa3 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c +@@ -362,6 +362,7 @@ static int mlx5e_dcbnl_ieee_getpfc(struct net_device *dev, + static int mlx5e_dcbnl_ieee_setpfc(struct net_device *dev, + struct ieee_pfc *pfc) + { ++ u8 buffer_ownership = MLX5_BUF_OWNERSHIP_UNKNOWN; + struct mlx5e_priv *priv = netdev_priv(dev); + struct mlx5_core_dev *mdev = priv->mdev; + u32 old_cable_len = priv->dcbx.cable_len; +@@ -389,7 +390,14 @@ static int mlx5e_dcbnl_ieee_setpfc(struct net_device *dev, + + if (MLX5_BUFFER_SUPPORTED(mdev)) { + pfc_new.pfc_en = (changed & MLX5E_PORT_BUFFER_PFC) ? 
pfc->pfc_en : curr_pfc_en; +- if (priv->dcbx.manual_buffer) ++ ret = mlx5_query_port_buffer_ownership(mdev, ++ &buffer_ownership); ++ if (ret) ++ netdev_err(dev, ++ "%s, Failed to get buffer ownership: %d\n", ++ __func__, ret); ++ ++ if (buffer_ownership == MLX5_BUF_OWNERSHIP_SW_OWNED) + ret = mlx5e_port_manual_buffer_config(priv, changed, + dev->mtu, &pfc_new, + NULL, NULL); +@@ -982,7 +990,6 @@ static int mlx5e_dcbnl_setbuffer(struct net_device *dev, + if (!changed) + return 0; + +- priv->dcbx.manual_buffer = true; + err = mlx5e_port_manual_buffer_config(priv, changed, dev->mtu, NULL, + buffer_size, prio2buffer); + return err; +@@ -1250,7 +1257,6 @@ void mlx5e_dcbnl_initialize(struct mlx5e_priv *priv) + priv->dcbx.cap |= DCB_CAP_DCBX_HOST; + + priv->dcbx.port_buff_cell_sz = mlx5e_query_port_buffers_cell_size(priv); +- priv->dcbx.manual_buffer = false; + priv->dcbx.cable_len = MLX5E_DEFAULT_CABLE_LEN; + + mlx5e_ets_init(priv); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h +index b6d53db27cd5..9d3504f5abfa 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h +@@ -367,6 +367,8 @@ int mlx5_query_port_dcbx_param(struct mlx5_core_dev *mdev, u32 *out); + int mlx5_set_port_dcbx_param(struct mlx5_core_dev *mdev, u32 *in); + int mlx5_set_trust_state(struct mlx5_core_dev *mdev, u8 trust_state); + int mlx5_query_trust_state(struct mlx5_core_dev *mdev, u8 *trust_state); ++int mlx5_query_port_buffer_ownership(struct mlx5_core_dev *mdev, ++ u8 *buffer_ownership); + int mlx5_set_dscp2prio(struct mlx5_core_dev *mdev, u8 dscp, u8 prio); + int mlx5_query_dscp2prio(struct mlx5_core_dev *mdev, u8 *dscp2prio); + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c b/drivers/net/ethernet/mellanox/mlx5/core/port.c +index 7b99e08a7964..aa9f2b0a77d3 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c ++++ 
b/drivers/net/ethernet/mellanox/mlx5/core/port.c +@@ -968,6 +968,26 @@ int mlx5_query_trust_state(struct mlx5_core_dev *mdev, u8 *trust_state) + return err; + } + ++int mlx5_query_port_buffer_ownership(struct mlx5_core_dev *mdev, ++ u8 *buffer_ownership) ++{ ++ u32 out[MLX5_ST_SZ_DW(pfcc_reg)] = {}; ++ int err; ++ ++ if (!MLX5_CAP_PCAM_FEATURE(mdev, buffer_ownership)) { ++ *buffer_ownership = MLX5_BUF_OWNERSHIP_UNKNOWN; ++ return 0; ++ } ++ ++ err = mlx5_query_pfcc_reg(mdev, out, sizeof(out)); ++ if (err) ++ return err; ++ ++ *buffer_ownership = MLX5_GET(pfcc_reg, out, buf_ownership); ++ ++ return 0; ++} ++ + int mlx5_set_dscp2prio(struct mlx5_core_dev *mdev, u8 dscp, u8 prio) + { + int sz = MLX5_ST_SZ_BYTES(qpdpm_reg); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1480-net-mlx5e-preserve-shared-buffer-capacity-during-headroom-up.patch b/SOURCES/1480-net-mlx5e-preserve-shared-buffer-capacity-during-headroom-up.patch new file mode 100644 index 000000000..882cd2ec7 --- /dev/null +++ b/SOURCES/1480-net-mlx5e-preserve-shared-buffer-capacity-during-headroom-up.patch @@ -0,0 +1,110 @@ +From e3dabd2aa80641b16aa1f10f3a08a3312ed081e8 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:01 -0400 +Subject: [PATCH] net/mlx5e: Preserve shared buffer capacity during headroom + updates + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 8b0587a885fdb34fd6090a3f8625cb7ac1444826 +Author: Armen Ratner +Date: Wed Aug 20 16:32:09 2025 +0300 + + net/mlx5e: Preserve shared buffer capacity during headroom updates + + When port buffer headroom changes, port_update_shared_buffer() + recalculates the shared buffer size and splits it in a 3:1 ratio + (lossy:lossless) - Currently, the calculation is: + lossless = shared / 4; + lossy = (shared / 4) * 3; + + Meaning, the calculation dropped the remainder of shared % 4 due to + integer division, unintentionally reducing the total shared buffer + by up to three cells on each update. 
Over time, this could shrink + the buffer below usable size. + + Fix it by changing the calculation to: + lossless = shared / 4; + lossy = shared - lossless; + + This retains all buffer cells while still approximating the + intended 3:1 split, preventing capacity loss over time. + + While at it, perform headroom calculations in units of cells rather than + in bytes for more accurate calculations avoiding extra divisions. + + Fixes: a440030d8946 ("net/mlx5e: Update shared buffer along with device buffer changes") + Signed-off-by: Armen Ratner + Signed-off-by: Maher Sanalla + Reviewed-by: Tariq Toukan + Signed-off-by: Alexei Lazar + Signed-off-by: Mark Bloch + Reviewed-by: Przemek Kitszel + Link: https://patch.msgid.link/20250820133209.389065-9-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c +index 5ae787656a7c..3efa8bf1d14e 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c +@@ -272,8 +272,8 @@ static int port_update_shared_buffer(struct mlx5_core_dev *mdev, + /* Total shared buffer size is split in a ratio of 3:1 between + * lossy and lossless pools respectively. 
+ */ +- lossy_epool_size = (shared_buffer_size / 4) * 3; + lossless_ipool_size = shared_buffer_size / 4; ++ lossy_epool_size = shared_buffer_size - lossless_ipool_size; + + mlx5e_port_set_sbpr(mdev, 0, MLX5_EGRESS_DIR, MLX5_LOSSY_POOL, 0, + lossy_epool_size); +@@ -288,14 +288,12 @@ static int port_set_buffer(struct mlx5e_priv *priv, + u16 port_buff_cell_sz = priv->dcbx.port_buff_cell_sz; + struct mlx5_core_dev *mdev = priv->mdev; + int sz = MLX5_ST_SZ_BYTES(pbmc_reg); +- u32 new_headroom_size = 0; +- u32 current_headroom_size; ++ u32 current_headroom_cells = 0; ++ u32 new_headroom_cells = 0; + void *in; + int err; + int i; + +- current_headroom_size = port_buffer->headroom_size; +- + in = kzalloc(sz, GFP_KERNEL); + if (!in) + return -ENOMEM; +@@ -306,12 +304,14 @@ static int port_set_buffer(struct mlx5e_priv *priv, + + for (i = 0; i < MLX5E_MAX_NETWORK_BUFFER; i++) { + void *buffer = MLX5_ADDR_OF(pbmc_reg, in, buffer[i]); ++ current_headroom_cells += MLX5_GET(bufferx_reg, buffer, size); ++ + u64 size = port_buffer->buffer[i].size; + u64 xoff = port_buffer->buffer[i].xoff; + u64 xon = port_buffer->buffer[i].xon; + +- new_headroom_size += size; + do_div(size, port_buff_cell_sz); ++ new_headroom_cells += size; + do_div(xoff, port_buff_cell_sz); + do_div(xon, port_buff_cell_sz); + MLX5_SET(bufferx_reg, buffer, size, size); +@@ -320,10 +320,8 @@ static int port_set_buffer(struct mlx5e_priv *priv, + MLX5_SET(bufferx_reg, buffer, xon_threshold, xon); + } + +- new_headroom_size /= port_buff_cell_sz; +- current_headroom_size /= port_buff_cell_sz; +- err = port_update_shared_buffer(priv->mdev, current_headroom_size, +- new_headroom_size); ++ err = port_update_shared_buffer(priv->mdev, current_headroom_cells, ++ new_headroom_cells); + if (err) + goto out; + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1481-net-mlx5-hws-fix-memory-leak-in-hws-pool-buddy-init-error-pa.patch b/SOURCES/1481-net-mlx5-hws-fix-memory-leak-in-hws-pool-buddy-init-error-pa.patch new file mode 
100644 index 000000000..14b424afc --- /dev/null +++ b/SOURCES/1481-net-mlx5-hws-fix-memory-leak-in-hws-pool-buddy-init-error-pa.patch @@ -0,0 +1,43 @@ +From 223d0918d6f47f7a9bca9ccd248bfc9f9e3dfcc6 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:01 -0400 +Subject: [PATCH] net/mlx5: HWS, Fix memory leak in hws_pool_buddy_init error + path + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 2c0a959bebdc1ada13cf9a8242f177c5400299e6 +Author: Lama Kayal +Date: Mon Aug 25 17:34:24 2025 +0300 + + net/mlx5: HWS, Fix memory leak in hws_pool_buddy_init error path + + In the error path of hws_pool_buddy_init(), the buddy allocator cleanup + doesn't free the allocator structure itself, causing a memory leak. + + Add the missing kfree() to properly release all allocated memory. + + Fixes: c61afff94373 ("net/mlx5: HWS, added memory management handling") + Signed-off-by: Lama Kayal + Reviewed-by: Tariq Toukan + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250825143435.598584-2-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.c +index 7e37d6e9eb83..7b5071c3df36 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.c +@@ -124,6 +124,7 @@ static int hws_pool_buddy_init(struct mlx5hws_pool *pool) + mlx5hws_err(pool->ctx, "Failed to create resource type: %d size %zu\n", + pool->type, pool->alloc_log_sz); + mlx5hws_buddy_cleanup(buddy); ++ kfree(buddy); + return -ENOMEM; + } + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1482-net-mlx5-hws-fix-memory-leak-in-hws-action-get-shared-stc-ni.patch b/SOURCES/1482-net-mlx5-hws-fix-memory-leak-in-hws-action-get-shared-stc-ni.patch new file mode 100644 index 000000000..eed884841 --- /dev/null +++ 
b/SOURCES/1482-net-mlx5-hws-fix-memory-leak-in-hws-action-get-shared-stc-ni.patch @@ -0,0 +1,45 @@ +From c1f59cf7c514ce94f11a1211050c606e42d03dca Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:01 -0400 +Subject: [PATCH] net/mlx5: HWS, Fix memory leak in + hws_action_get_shared_stc_nic error flow + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit a630f83592cdad1253523a1b760cfe78fef6cd9c +Author: Lama Kayal +Date: Mon Aug 25 17:34:25 2025 +0300 + + net/mlx5: HWS, Fix memory leak in hws_action_get_shared_stc_nic error flow + + When an invalid stc_type is provided, the function allocates memory for + shared_stc but jumps to unlock_and_out without freeing it, causing a + memory leak. + + Fix by jumping to free_shared_stc label instead to ensure proper cleanup. + + Fixes: 504e536d9010 ("net/mlx5: HWS, added actions handling") + Signed-off-by: Lama Kayal + Reviewed-by: Tariq Toukan + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250825143435.598584-3-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c +index 396804369b00..6b36a4a7d895 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c +@@ -117,7 +117,7 @@ static int hws_action_get_shared_stc_nic(struct mlx5hws_context *ctx, + mlx5hws_err(ctx, "No such stc_type: %d\n", stc_type); + pr_warn("HWS: Invalid stc_type: %d\n", stc_type); + ret = -EINVAL; +- goto unlock_and_out; ++ goto free_shared_stc; + } + + ret = mlx5hws_action_alloc_single_stc(ctx, &stc_attr, tbl_type, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1483-net-mlx5-hws-fix-uninitialized-variables-in-mlx5hws-pat-calc.patch b/SOURCES/1483-net-mlx5-hws-fix-uninitialized-variables-in-mlx5hws-pat-calc.patch new file mode 100644 index 
000000000..65275df14 --- /dev/null +++ b/SOURCES/1483-net-mlx5-hws-fix-uninitialized-variables-in-mlx5hws-pat-calc.patch @@ -0,0 +1,57 @@ +From 74dac1eedfab4ceb5c4e79d3fd50685dd9d1f812 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:01 -0400 +Subject: [PATCH] net/mlx5: HWS, Fix uninitialized variables in + mlx5hws_pat_calc_nop error flow + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 24b6e53140475b56cadcccd4e82a93aa5bacf1eb +Author: Lama Kayal +Date: Mon Aug 25 17:34:26 2025 +0300 + + net/mlx5: HWS, Fix uninitialized variables in mlx5hws_pat_calc_nop error flow + + In mlx5hws_pat_calc_nop(), src_field and dst_field are passed to + hws_action_modify_get_target_fields() which should set their values. + However, if an invalid action type is encountered, these variables + remain uninitialized and are later used to update prev_src_field + and prev_dst_field. + + Initialize both variables to INVALID_FIELD to ensure they have + defined values in all code paths. 
+ + Fixes: 01e035fd0380 ("net/mlx5: HWS, handle modify header actions dependency") + Signed-off-by: Lama Kayal + Reviewed-by: Tariq Toukan + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250825143435.598584-4-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pat_arg.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pat_arg.c +index 51e4c551e0ef..622fd579f140 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pat_arg.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pat_arg.c +@@ -527,7 +527,6 @@ int mlx5hws_pat_calc_nop(__be64 *pattern, size_t num_actions, + u32 *nop_locations, __be64 *new_pat) + { + u16 prev_src_field = INVALID_FIELD, prev_dst_field = INVALID_FIELD; +- u16 src_field, dst_field; + u8 action_type; + bool dependent; + size_t i, j; +@@ -539,6 +538,9 @@ int mlx5hws_pat_calc_nop(__be64 *pattern, size_t num_actions, + return 0; + + for (i = 0, j = 0; i < num_actions; i++, j++) { ++ u16 src_field = INVALID_FIELD; ++ u16 dst_field = INVALID_FIELD; ++ + if (j >= max_actions) + return -EINVAL; + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1484-net-mlx5-hws-fix-pattern-destruction-in-mlx5hws-pat-get-patt.patch b/SOURCES/1484-net-mlx5-hws-fix-pattern-destruction-in-mlx5hws-pat-get-patt.patch new file mode 100644 index 000000000..0ccdb92cd --- /dev/null +++ b/SOURCES/1484-net-mlx5-hws-fix-pattern-destruction-in-mlx5hws-pat-get-patt.patch @@ -0,0 +1,52 @@ +From d1f05a9096aedbf63150bffeb5a0d1b64c6fef49 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:01 -0400 +Subject: [PATCH] net/mlx5: HWS, Fix pattern destruction in + mlx5hws_pat_get_pattern error path + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 00a50e4e8974cbf5d6a1dc91cfa5cce4aa7af05a +Author: Lama Kayal +Date: Mon Aug 25 17:34:27 2025 +0300 + + net/mlx5: HWS, Fix pattern destruction in mlx5hws_pat_get_pattern 
error path + + In mlx5hws_pat_get_pattern(), when mlx5hws_pat_add_pattern_to_cache() + fails, the function attempts to clean up the pattern created by + mlx5hws_cmd_header_modify_pattern_create(). However, it incorrectly + uses *pattern_id which hasn't been set yet, instead of the local + ptrn_id variable that contains the actual pattern ID. + + This results in attempting to destroy a pattern using uninitialized + data from the output parameter, rather than the valid pattern ID + returned by the firmware. + + Use ptrn_id instead of *pattern_id in the cleanup path to properly + destroy the created pattern. + + Fixes: aefc15a0fa1c ("net/mlx5: HWS, added modify header pattern and args handling") + Signed-off-by: Lama Kayal + Reviewed-by: Tariq Toukan + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250825143435.598584-5-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pat_arg.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pat_arg.c +index 622fd579f140..d56271a9e4f0 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pat_arg.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pat_arg.c +@@ -279,7 +279,7 @@ int mlx5hws_pat_get_pattern(struct mlx5hws_context *ctx, + return ret; + + clean_pattern: +- mlx5hws_cmd_header_modify_pattern_destroy(ctx->mdev, *pattern_id); ++ mlx5hws_cmd_header_modify_pattern_destroy(ctx->mdev, ptrn_id); + out_unlock: + mutex_unlock(&ctx->pattern_cache->lock); + return ret; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1485-net-mlx5-reload-auxiliary-drivers-on-fw-activate.patch b/SOURCES/1485-net-mlx5-reload-auxiliary-drivers-on-fw-activate.patch new file mode 100644 index 000000000..84a92fdae --- /dev/null +++ b/SOURCES/1485-net-mlx5-reload-auxiliary-drivers-on-fw-activate.patch @@ -0,0 +1,53 @@ +From 16da4a8dd6e2c76f90cc71ed53bbe7c5864360be Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: 
Fri, 17 Apr 2026 11:27:01 -0400 +Subject: [PATCH] net/mlx5: Reload auxiliary drivers on fw_activate + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 34cc6a54914f478c93e176450fae6313404f9f74 +Author: Moshe Shemesh +Date: Mon Aug 25 17:34:28 2025 +0300 + + net/mlx5: Reload auxiliary drivers on fw_activate + + The devlink reload fw_activate command performs firmware activation + followed by driver reload, while devlink reload driver_reinit triggers + only driver reload. However, the driver reload logic differs between the + two modes, as on driver_reinit mode mlx5 also reloads auxiliary drivers, + while in fw_activate mode the auxiliary drivers are suspended where + applicable. + + Additionally, following the cited commit, if the device has multiple PFs, + the behavior during fw_activate may vary between PFs: one PF may suspend + auxiliary drivers, while another reloads them. + + Align devlink dev reload fw_activate behavior with devlink dev reload + driver_reinit, to reload all auxiliary drivers. 
+ + Fixes: 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend") + Signed-off-by: Moshe Shemesh + Reviewed-by: Tariq Toukan + Reviewed-by: Akiva Goldberger + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250825143435.598584-6-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +index 204055be51c0..b97dfb6f005f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +@@ -160,7 +160,7 @@ static int mlx5_devlink_reload_fw_activate(struct devlink *devlink, struct netli + if (err) + return err; + +- mlx5_unload_one_devl_locked(dev, true); ++ mlx5_unload_one_devl_locked(dev, false); + err = mlx5_health_wait_pci_up(dev); + if (err) + NL_SET_ERR_MSG_MOD(extack, "FW activate aborted, PCI reads fail after reset"); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1486-net-mlx5-fix-lockdep-assertion-on-sync-reset-unload-event.patch b/SOURCES/1486-net-mlx5-fix-lockdep-assertion-on-sync-reset-unload-event.patch new file mode 100644 index 000000000..2f02b6250 --- /dev/null +++ b/SOURCES/1486-net-mlx5-fix-lockdep-assertion-on-sync-reset-unload-event.patch @@ -0,0 +1,259 @@ +From 8d67e5b6048a7a5d0a98c8990787da19ebb1b140 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:02 -0400 +Subject: [PATCH] net/mlx5: Fix lockdep assertion on sync reset unload event + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 902a8bc23a24882200f57cadc270e15a2cfaf2bb +Author: Moshe Shemesh +Date: Mon Aug 25 17:34:29 2025 +0300 + + net/mlx5: Fix lockdep assertion on sync reset unload event + + Fix lockdep assertion triggered during sync reset unload event. When the + sync reset flow is initiated using the devlink reload fw_activate + option, the PF already holds the devlink lock while handling unload + event. 
In this case, delegate sync reset unload event handling back to + the devlink callback process to avoid double-locking and resolve the + lockdep warning. + + Kernel log: + WARNING: CPU: 9 PID: 1578 at devl_assert_locked+0x31/0x40 + [...] + Call Trace: + + mlx5_unload_one_devl_locked+0x2c/0xc0 [mlx5_core] + mlx5_sync_reset_unload_event+0xaf/0x2f0 [mlx5_core] + process_one_work+0x222/0x640 + worker_thread+0x199/0x350 + kthread+0x10b/0x230 + ? __pfx_worker_thread+0x10/0x10 + ? __pfx_kthread+0x10/0x10 + ret_from_fork+0x8e/0x100 + ? __pfx_kthread+0x10/0x10 + ret_from_fork_asm+0x1a/0x30 + + + Fixes: 7a9770f1bfea ("net/mlx5: Handle sync reset unload event") + Signed-off-by: Moshe Shemesh + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250825143435.598584-7-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +index b97dfb6f005f..8c53fe5aa306 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +@@ -160,7 +160,7 @@ static int mlx5_devlink_reload_fw_activate(struct devlink *devlink, struct netli + if (err) + return err; + +- mlx5_unload_one_devl_locked(dev, false); ++ mlx5_sync_reset_unload_flow(dev, true); + err = mlx5_health_wait_pci_up(dev); + if (err) + NL_SET_ERR_MSG_MOD(extack, "FW activate aborted, PCI reads fail after reset"); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c +index 69933addd921..38b9b184ae01 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c +@@ -12,7 +12,8 @@ enum { + MLX5_FW_RESET_FLAGS_NACK_RESET_REQUEST, + MLX5_FW_RESET_FLAGS_PENDING_COMP, + MLX5_FW_RESET_FLAGS_DROP_NEW_REQUESTS, +- MLX5_FW_RESET_FLAGS_RELOAD_REQUIRED ++ MLX5_FW_RESET_FLAGS_RELOAD_REQUIRED, ++ 
MLX5_FW_RESET_FLAGS_UNLOAD_EVENT, + }; + + struct mlx5_fw_reset { +@@ -219,7 +220,7 @@ int mlx5_fw_reset_set_live_patch(struct mlx5_core_dev *dev) + return mlx5_reg_mfrl_set(dev, MLX5_MFRL_REG_RESET_LEVEL0, 0, 0, false); + } + +-static void mlx5_fw_reset_complete_reload(struct mlx5_core_dev *dev, bool unloaded) ++static void mlx5_fw_reset_complete_reload(struct mlx5_core_dev *dev) + { + struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset; + struct devlink *devlink = priv_to_devlink(dev); +@@ -228,8 +229,7 @@ static void mlx5_fw_reset_complete_reload(struct mlx5_core_dev *dev, bool unload + if (test_bit(MLX5_FW_RESET_FLAGS_PENDING_COMP, &fw_reset->reset_flags)) { + complete(&fw_reset->done); + } else { +- if (!unloaded) +- mlx5_unload_one(dev, false); ++ mlx5_sync_reset_unload_flow(dev, false); + if (mlx5_health_wait_pci_up(dev)) + mlx5_core_err(dev, "reset reload flow aborted, PCI reads still not working\n"); + else +@@ -272,7 +272,7 @@ static void mlx5_sync_reset_reload_work(struct work_struct *work) + + mlx5_sync_reset_clear_reset_requested(dev, false); + mlx5_enter_error_state(dev, true); +- mlx5_fw_reset_complete_reload(dev, false); ++ mlx5_fw_reset_complete_reload(dev); + } + + #define MLX5_RESET_POLL_INTERVAL (HZ / 10) +@@ -586,6 +586,65 @@ static int mlx5_sync_pci_reset(struct mlx5_core_dev *dev, u8 reset_method) + return err; + } + ++void mlx5_sync_reset_unload_flow(struct mlx5_core_dev *dev, bool locked) ++{ ++ struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset; ++ unsigned long timeout; ++ int poll_freq = 20; ++ bool reset_action; ++ u8 rst_state; ++ int err; ++ ++ if (locked) ++ mlx5_unload_one_devl_locked(dev, false); ++ else ++ mlx5_unload_one(dev, false); ++ ++ if (!test_bit(MLX5_FW_RESET_FLAGS_UNLOAD_EVENT, &fw_reset->reset_flags)) ++ return; ++ ++ mlx5_set_fw_rst_ack(dev); ++ mlx5_core_warn(dev, "Sync Reset Unload done, device reset expected\n"); ++ ++ reset_action = false; ++ timeout = jiffies + msecs_to_jiffies(mlx5_tout_ms(dev, RESET_UNLOAD)); 
++ do { ++ rst_state = mlx5_get_fw_rst_state(dev); ++ if (rst_state == MLX5_FW_RST_STATE_TOGGLE_REQ || ++ rst_state == MLX5_FW_RST_STATE_IDLE) { ++ reset_action = true; ++ break; ++ } ++ if (rst_state == MLX5_FW_RST_STATE_DROP_MODE) { ++ mlx5_core_info(dev, "Sync Reset Drop mode ack\n"); ++ mlx5_set_fw_rst_ack(dev); ++ poll_freq = 1000; ++ } ++ msleep(poll_freq); ++ } while (!time_after(jiffies, timeout)); ++ ++ if (!reset_action) { ++ mlx5_core_err(dev, "Got timeout waiting for sync reset action, state = %u\n", ++ rst_state); ++ fw_reset->ret = -ETIMEDOUT; ++ goto done; ++ } ++ ++ mlx5_core_warn(dev, "Sync Reset, got reset action. rst_state = %u\n", ++ rst_state); ++ if (rst_state == MLX5_FW_RST_STATE_TOGGLE_REQ) { ++ err = mlx5_sync_pci_reset(dev, fw_reset->reset_method); ++ if (err) { ++ mlx5_core_warn(dev, "mlx5_sync_pci_reset failed, err %d\n", ++ err); ++ fw_reset->ret = err; ++ } ++ } ++ ++done: ++ clear_bit(MLX5_FW_RESET_FLAGS_UNLOAD_EVENT, &fw_reset->reset_flags); ++} ++ + static void mlx5_sync_reset_now_event(struct work_struct *work) + { + struct mlx5_fw_reset *fw_reset = container_of(work, struct mlx5_fw_reset, +@@ -613,17 +672,13 @@ static void mlx5_sync_reset_now_event(struct work_struct *work) + mlx5_enter_error_state(dev, true); + done: + fw_reset->ret = err; +- mlx5_fw_reset_complete_reload(dev, false); ++ mlx5_fw_reset_complete_reload(dev); + } + + static void mlx5_sync_reset_unload_event(struct work_struct *work) + { + struct mlx5_fw_reset *fw_reset; + struct mlx5_core_dev *dev; +- unsigned long timeout; +- int poll_freq = 20; +- bool reset_action; +- u8 rst_state; + int err; + + fw_reset = container_of(work, struct mlx5_fw_reset, reset_unload_work); +@@ -632,6 +687,7 @@ static void mlx5_sync_reset_unload_event(struct work_struct *work) + if (mlx5_sync_reset_clear_reset_requested(dev, false)) + return; + ++ set_bit(MLX5_FW_RESET_FLAGS_UNLOAD_EVENT, &fw_reset->reset_flags); + mlx5_core_warn(dev, "Sync Reset Unload. 
Function is forced down.\n"); + + err = mlx5_cmd_fast_teardown_hca(dev); +@@ -640,49 +696,7 @@ static void mlx5_sync_reset_unload_event(struct work_struct *work) + else + mlx5_enter_error_state(dev, true); + +- if (test_bit(MLX5_FW_RESET_FLAGS_PENDING_COMP, &fw_reset->reset_flags)) +- mlx5_unload_one_devl_locked(dev, false); +- else +- mlx5_unload_one(dev, false); +- +- mlx5_set_fw_rst_ack(dev); +- mlx5_core_warn(dev, "Sync Reset Unload done, device reset expected\n"); +- +- reset_action = false; +- timeout = jiffies + msecs_to_jiffies(mlx5_tout_ms(dev, RESET_UNLOAD)); +- do { +- rst_state = mlx5_get_fw_rst_state(dev); +- if (rst_state == MLX5_FW_RST_STATE_TOGGLE_REQ || +- rst_state == MLX5_FW_RST_STATE_IDLE) { +- reset_action = true; +- break; +- } +- if (rst_state == MLX5_FW_RST_STATE_DROP_MODE) { +- mlx5_core_info(dev, "Sync Reset Drop mode ack\n"); +- mlx5_set_fw_rst_ack(dev); +- poll_freq = 1000; +- } +- msleep(poll_freq); +- } while (!time_after(jiffies, timeout)); +- +- if (!reset_action) { +- mlx5_core_err(dev, "Got timeout waiting for sync reset action, state = %u\n", +- rst_state); +- fw_reset->ret = -ETIMEDOUT; +- goto done; +- } +- +- mlx5_core_warn(dev, "Sync Reset, got reset action. 
rst_state = %u\n", rst_state); +- if (rst_state == MLX5_FW_RST_STATE_TOGGLE_REQ) { +- err = mlx5_sync_pci_reset(dev, fw_reset->reset_method); +- if (err) { +- mlx5_core_warn(dev, "mlx5_sync_pci_reset failed, err %d\n", err); +- fw_reset->ret = err; +- } +- } +- +-done: +- mlx5_fw_reset_complete_reload(dev, true); ++ mlx5_fw_reset_complete_reload(dev); + } + + static void mlx5_sync_reset_abort_event(struct work_struct *work) +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h +index ea527d06a85f..d5b28525c960 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h +@@ -12,6 +12,7 @@ int mlx5_fw_reset_set_reset_sync(struct mlx5_core_dev *dev, u8 reset_type_sel, + int mlx5_fw_reset_set_live_patch(struct mlx5_core_dev *dev); + + int mlx5_fw_reset_wait_reset_done(struct mlx5_core_dev *dev); ++void mlx5_sync_reset_unload_flow(struct mlx5_core_dev *dev, bool locked); + int mlx5_fw_reset_verify_fw_complete(struct mlx5_core_dev *dev, + struct netlink_ext_ack *extack); + void mlx5_fw_reset_events_start(struct mlx5_core_dev *dev); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1487-net-mlx5-nack-sync-reset-when-sfs-are-present.patch b/SOURCES/1487-net-mlx5-nack-sync-reset-when-sfs-are-present.patch new file mode 100644 index 000000000..586b2864c --- /dev/null +++ b/SOURCES/1487-net-mlx5-nack-sync-reset-when-sfs-are-present.patch @@ -0,0 +1,100 @@ +From 7c29b174a5e289e35ebc9cf72567d198522b3183 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:02 -0400 +Subject: [PATCH] net/mlx5: Nack sync reset when SFs are present + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 26e42ec7712d392d561964514b1f253b1a96f42d +Author: Moshe Shemesh +Date: Mon Aug 25 17:34:30 2025 +0300 + + net/mlx5: Nack sync reset when SFs are present + + If PF (Physical Function) has SFs (Sub-Functions), since the SFs are not + taking part 
in the synchronization flow, sync reset can lead to fatal + error on the SFs, as the function will be closed unexpectedly from the + SF point of view. + + Add a check to prevent sync reset when there are SFs on a PF device + which is not ECPF, as ECPF is teardowned gracefully before reset. + + Fixes: 92501fa6e421 ("net/mlx5: Ack on sync_reset_request only if PF can do reset_now") + Signed-off-by: Moshe Shemesh + Reviewed-by: Parav Pandit + Reviewed-by: Tariq Toukan + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250825143435.598584-8-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c +index 38b9b184ae01..22995131824a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c +@@ -6,6 +6,7 @@ + #include "fw_reset.h" + #include "diag/fw_tracer.h" + #include "lib/tout.h" ++#include "sf/sf.h" + + enum { + MLX5_FW_RESET_FLAGS_RESET_REQUESTED, +@@ -428,6 +429,11 @@ static bool mlx5_is_reset_now_capable(struct mlx5_core_dev *dev, + return false; + } + ++ if (!mlx5_core_is_ecpf(dev) && !mlx5_sf_table_empty(dev)) { ++ mlx5_core_warn(dev, "SFs should be removed before reset\n"); ++ return false; ++ } ++ + #if IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE) + if (reset_method != MLX5_MFRL_REG_PCI_RESET_METHOD_HOT_RESET) { + err = mlx5_check_hotplug_interrupt(dev, bridge); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c +index 0864ba625c07..3304f25cc805 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c +@@ -518,3 +518,13 @@ void mlx5_sf_table_cleanup(struct mlx5_core_dev *dev) + WARN_ON(!xa_empty(&table->function_ids)); + kfree(table); + } ++ ++bool mlx5_sf_table_empty(const struct mlx5_core_dev *dev) ++{ ++ struct 
mlx5_sf_table *table = dev->priv.sf_table; ++ ++ if (!table) ++ return true; ++ ++ return xa_empty(&table->function_ids); ++} +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h b/drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h +index 860f9ddb7107..89559a37997a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h +@@ -17,6 +17,7 @@ void mlx5_sf_hw_table_destroy(struct mlx5_core_dev *dev); + + int mlx5_sf_table_init(struct mlx5_core_dev *dev); + void mlx5_sf_table_cleanup(struct mlx5_core_dev *dev); ++bool mlx5_sf_table_empty(const struct mlx5_core_dev *dev); + + int mlx5_devlink_sf_port_new(struct devlink *devlink, + const struct devlink_port_new_attrs *add_attr, +@@ -61,6 +62,11 @@ static inline void mlx5_sf_table_cleanup(struct mlx5_core_dev *dev) + { + } + ++static inline bool mlx5_sf_table_empty(const struct mlx5_core_dev *dev) ++{ ++ return true; ++} ++ + #endif + + #endif +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1488-net-mlx5-prevent-flow-steering-mode-changes-in-switchdev-mod.patch b/SOURCES/1488-net-mlx5-prevent-flow-steering-mode-changes-in-switchdev-mod.patch new file mode 100644 index 000000000..f77e7cb1d --- /dev/null +++ b/SOURCES/1488-net-mlx5-prevent-flow-steering-mode-changes-in-switchdev-mod.patch @@ -0,0 +1,62 @@ +From ffafacde352f98dec23fba49d1430594a39ee7f8 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:02 -0400 +Subject: [PATCH] net/mlx5: Prevent flow steering mode changes in switchdev + mode + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit cf9a8627b9a369ba01d37be6f71b297beb688faa +Author: Moshe Shemesh +Date: Mon Aug 25 17:34:31 2025 +0300 + + net/mlx5: Prevent flow steering mode changes in switchdev mode + + Changing flow steering modes is not allowed when eswitch is in switchdev + mode. 
This fix ensures that any steering mode change, including to + firmware steering, is correctly blocked while eswitch mode is switchdev. + + Fixes: e890acd5ff18 ("net/mlx5: Add devlink flow_steering_mode parameter") + Signed-off-by: Moshe Shemesh + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250825143435.598584-9-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +index 0de287392c32..fafccfe9fb64 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +@@ -3734,6 +3734,13 @@ static int mlx5_fs_mode_validate(struct devlink *devlink, u32 id, + char *value = val.vstr; + u8 eswitch_mode; + ++ eswitch_mode = mlx5_eswitch_mode(dev); ++ if (eswitch_mode == MLX5_ESWITCH_OFFLOADS) { ++ NL_SET_ERR_MSG_FMT_MOD(extack, ++ "Changing fs mode is not supported when eswitch offloads enabled."); ++ return -EOPNOTSUPP; ++ } ++ + if (!strcmp(value, "dmfs")) + return 0; + +@@ -3759,14 +3766,6 @@ static int mlx5_fs_mode_validate(struct devlink *devlink, u32 id, + return -EINVAL; + } + +- eswitch_mode = mlx5_eswitch_mode(dev); +- if (eswitch_mode == MLX5_ESWITCH_OFFLOADS) { +- NL_SET_ERR_MSG_FMT_MOD(extack, +- "Moving to %s is not supported when eswitch offloads enabled.", +- value); +- return -EOPNOTSUPP; +- } +- + return 0; + } + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1489-net-mlx5e-set-local-xoff-after-fw-update.patch b/SOURCES/1489-net-mlx5e-set-local-xoff-after-fw-update.patch new file mode 100644 index 000000000..e5a486c43 --- /dev/null +++ b/SOURCES/1489-net-mlx5e-set-local-xoff-after-fw-update.patch @@ -0,0 +1,52 @@ +From ecbfbdf8a69145aac9f6ce649ff1f0df6e1492c6 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:02 -0400 +Subject: [PATCH] net/mlx5e: Set local Xoff after FW update + +JIRA: 
https://redhat.atlassian.net/browse/RHEL-169055 + +commit aca0c31af61e0d5cf1675a0cbd29460b95ae693c +Author: Alexei Lazar +Date: Mon Aug 25 17:34:34 2025 +0300 + + net/mlx5e: Set local Xoff after FW update + + The local Xoff value is being set before the firmware (FW) update. + In case of a failure where the FW is not updated with the new value, + there is no fallback to the previous value. + Update the local Xoff value after the FW has been successfully set. + + Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration") + Signed-off-by: Alexei Lazar + Reviewed-by: Tariq Toukan + Reviewed-by: Dragos Tatulea + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250825143435.598584-12-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c +index 3efa8bf1d14e..4720523813b9 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c +@@ -575,7 +575,6 @@ int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv, + if (err) + return err; + } +- priv->dcbx.xoff = xoff; + + /* Apply the settings */ + if (update_buffer) { +@@ -584,6 +583,8 @@ int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv, + return err; + } + ++ priv->dcbx.xoff = xoff; ++ + if (update_prio2buffer) + err = mlx5e_port_set_priority2buffer(priv->mdev, prio2buffer); + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1490-net-mlx5e-harden-uplink-netdev-access-against-device-unbind.patch b/SOURCES/1490-net-mlx5e-harden-uplink-netdev-access-against-device-unbind.patch new file mode 100644 index 000000000..365d19abc --- /dev/null +++ b/SOURCES/1490-net-mlx5e-harden-uplink-netdev-access-against-device-unbind.patch @@ -0,0 +1,154 @@ +From 0c57717c5a423776e7d1d71db56227cca621d0d1 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:02 -0400 
+Subject: [PATCH] net/mlx5e: Harden uplink netdev access against device unbind + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 6b4be64fd9fec16418f365c2d8e47a7566e9eba5 +Author: Jianbo Liu +Date: Mon Sep 15 15:24:32 2025 +0300 + + net/mlx5e: Harden uplink netdev access against device unbind + + The function mlx5_uplink_netdev_get() gets the uplink netdevice + pointer from mdev->mlx5e_res.uplink_netdev. However, the netdevice can + be removed and its pointer cleared when unbound from the mlx5_core.eth + driver. This results in a NULL pointer, causing a kernel panic. + + BUG: unable to handle page fault for address: 0000000000001300 + at RIP: 0010:mlx5e_vport_rep_load+0x22a/0x270 [mlx5_core] + Call Trace: + + mlx5_esw_offloads_rep_load+0x68/0xe0 [mlx5_core] + esw_offloads_enable+0x593/0x910 [mlx5_core] + mlx5_eswitch_enable_locked+0x341/0x420 [mlx5_core] + mlx5_devlink_eswitch_mode_set+0x17e/0x3a0 [mlx5_core] + devlink_nl_eswitch_set_doit+0x60/0xd0 + genl_family_rcv_msg_doit+0xe0/0x130 + genl_rcv_msg+0x183/0x290 + netlink_rcv_skb+0x4b/0xf0 + genl_rcv+0x24/0x40 + netlink_unicast+0x255/0x380 + netlink_sendmsg+0x1f3/0x420 + __sock_sendmsg+0x38/0x60 + __sys_sendto+0x119/0x180 + do_syscall_64+0x53/0x1d0 + entry_SYSCALL_64_after_hwframe+0x4b/0x53 + + Ensure the pointer is valid before use by checking it for NULL. If it + is valid, immediately call netdev_hold() to take a reference, and + preventing the netdevice from being freed while it is in use. 
+ + Fixes: 7a9fb35e8c3a ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode") + Signed-off-by: Jianbo Liu + Reviewed-by: Cosmin Ratiu + Reviewed-by: Jiri Pirko + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1757939074-617281-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +index 2640cace0f76..5766be2c0153 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +@@ -1499,12 +1499,21 @@ static const struct mlx5e_profile mlx5e_uplink_rep_profile = { + static int + mlx5e_vport_uplink_rep_load(struct mlx5_core_dev *dev, struct mlx5_eswitch_rep *rep) + { +- struct mlx5e_priv *priv = netdev_priv(mlx5_uplink_netdev_get(dev)); + struct mlx5e_rep_priv *rpriv = mlx5e_rep_to_rep_priv(rep); ++ struct net_device *netdev; ++ struct mlx5e_priv *priv; ++ int err; ++ ++ netdev = mlx5_uplink_netdev_get(dev); ++ if (!netdev) ++ return 0; + ++ priv = netdev_priv(netdev); + rpriv->netdev = priv->netdev; +- return mlx5e_netdev_change_profile(priv, &mlx5e_uplink_rep_profile, +- rpriv); ++ err = mlx5e_netdev_change_profile(priv, &mlx5e_uplink_rep_profile, ++ rpriv); ++ mlx5_uplink_netdev_put(dev, netdev); ++ return err; + } + + static void +@@ -1631,8 +1640,16 @@ mlx5e_vport_rep_unload(struct mlx5_eswitch_rep *rep) + { + struct mlx5e_rep_priv *rpriv = mlx5e_rep_to_rep_priv(rep); + struct net_device *netdev = rpriv->netdev; +- struct mlx5e_priv *priv = netdev_priv(netdev); +- void *ppriv = priv->ppriv; ++ struct mlx5e_priv *priv; ++ void *ppriv; ++ ++ if (!netdev) { ++ ppriv = rpriv; ++ goto free_ppriv; ++ } ++ ++ priv = netdev_priv(netdev); ++ ppriv = priv->ppriv; + + if (rep->vport == MLX5_VPORT_UPLINK) { + mlx5e_vport_uplink_rep_unload(rpriv); +diff --git 
a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +index 8b4977650183..5f2d6c35f1ad 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +@@ -1515,6 +1515,7 @@ static u32 mlx5_esw_qos_lag_link_speed_get_locked(struct mlx5_core_dev *mdev) + speed = lksettings.base.speed; + + out: ++ mlx5_uplink_netdev_put(mdev, slave); + return speed; + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h +index b111ccd03b02..74ea5da58b7e 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h +@@ -47,7 +47,20 @@ int mlx5_crdump_collect(struct mlx5_core_dev *dev, u32 *cr_data); + + static inline struct net_device *mlx5_uplink_netdev_get(struct mlx5_core_dev *mdev) + { +- return mdev->mlx5e_res.uplink_netdev; ++ struct mlx5e_resources *mlx5e_res = &mdev->mlx5e_res; ++ struct net_device *netdev; ++ ++ mutex_lock(&mlx5e_res->uplink_netdev_lock); ++ netdev = mlx5e_res->uplink_netdev; ++ netdev_hold(netdev, &mlx5e_res->tracker, GFP_KERNEL); ++ mutex_unlock(&mlx5e_res->uplink_netdev_lock); ++ return netdev; ++} ++ ++static inline void mlx5_uplink_netdev_put(struct mlx5_core_dev *mdev, ++ struct net_device *netdev) ++{ ++ netdev_put(netdev, &mdev->mlx5e_res.tracker); + } + + struct mlx5_sd; +diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h +index 8c5fbfb85749..10fe492e1fed 100644 +--- a/include/linux/mlx5/driver.h ++++ b/include/linux/mlx5/driver.h +@@ -663,6 +663,7 @@ struct mlx5e_resources { + bool tisn_valid; + } hw_objs; + struct net_device *uplink_netdev; ++ netdevice_tracker tracker; + struct mutex uplink_netdev_lock; + struct mlx5_crypto_dek_priv *dek_priv; + }; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1491-net-mlx5e-add-a-miss-level-for-ipsec-crypto-offload.patch 
b/SOURCES/1491-net-mlx5e-add-a-miss-level-for-ipsec-crypto-offload.patch new file mode 100644 index 000000000..590c50689 --- /dev/null +++ b/SOURCES/1491-net-mlx5e-add-a-miss-level-for-ipsec-crypto-offload.patch @@ -0,0 +1,103 @@ +From b7ac00cd85d83a5a3b439546e01e6a1f7fd35a2b Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:02 -0400 +Subject: [PATCH] net/mlx5e: Add a miss level for ipsec crypto offload + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 7601a0a46216f4ba05adff2de75923b4e8e585c2 +Author: Lama Kayal +Date: Mon Sep 15 15:24:34 2025 +0300 + + net/mlx5e: Add a miss level for ipsec crypto offload + + The cited commit adds a miss table for switchdev mode. But it + uses the same level as policy table. Will hit the following error + when running command: + + # ip xfrm state add src 192.168.1.22 dst 192.168.1.21 proto \ + esp spi 1001 reqid 10001 aead 'rfc4106(gcm(aes))' \ + 0x3a189a7f9374955d3817886c8587f1da3df387ff 128 \ + mode tunnel offload dev enp8s0f0 dir in + Error: mlx5_core: Device failed to offload this state. + + The dmesg error is: + + mlx5_core 0000:03:00.0: ipsec_miss_create:578:(pid 311797): fail to create IPsec miss_rule err=-22 + + Fix it by adding a new miss level to avoid the error. 
+ + Fixes: 7d9e292ecd67 ("net/mlx5e: Move IPSec policy check after decryption") + Signed-off-by: Jianbo Liu + Signed-off-by: Chris Mi + Signed-off-by: Lama Kayal + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1757939074-617281-4-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h +index 9560fcba643f..ac65e3191480 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h +@@ -92,6 +92,7 @@ enum { + MLX5E_ACCEL_FS_ESP_FT_LEVEL = MLX5E_INNER_TTC_FT_LEVEL + 1, + MLX5E_ACCEL_FS_ESP_FT_ERR_LEVEL, + MLX5E_ACCEL_FS_POL_FT_LEVEL, ++ MLX5E_ACCEL_FS_POL_MISS_FT_LEVEL, + MLX5E_ACCEL_FS_ESP_FT_ROCE_LEVEL, + #endif + }; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h +index ffcd0cdeb775..23703f28386a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h +@@ -185,6 +185,7 @@ struct mlx5e_ipsec_rx_create_attr { + u32 family; + int prio; + int pol_level; ++ int pol_miss_level; + int sa_level; + int status_level; + enum mlx5_flow_namespace_type chains_ns; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c +index 98b6a3a623f9..65dc3529283b 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c +@@ -747,6 +747,7 @@ static void ipsec_rx_create_attr_set(struct mlx5e_ipsec *ipsec, + attr->family = family; + attr->prio = MLX5E_NIC_PRIO; + attr->pol_level = MLX5E_ACCEL_FS_POL_FT_LEVEL; ++ attr->pol_miss_level = MLX5E_ACCEL_FS_POL_MISS_FT_LEVEL; + attr->sa_level = MLX5E_ACCEL_FS_ESP_FT_LEVEL; + attr->status_level = 
MLX5E_ACCEL_FS_ESP_FT_ERR_LEVEL; + attr->chains_ns = MLX5_FLOW_NAMESPACE_KERNEL; +@@ -833,7 +834,7 @@ static int ipsec_rx_chains_create_miss(struct mlx5e_ipsec *ipsec, + + ft_attr.max_fte = 1; + ft_attr.autogroup.max_num_groups = 1; +- ft_attr.level = attr->pol_level; ++ ft_attr.level = attr->pol_miss_level; + ft_attr.prio = attr->prio; + + ft = mlx5_create_auto_grouped_flow_table(attr->ns, &ft_attr); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +index fafccfe9fb64..80245c38dbad 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +@@ -114,9 +114,9 @@ + #define ETHTOOL_NUM_PRIOS 11 + #define ETHTOOL_MIN_LEVEL (KERNEL_MIN_LEVEL + ETHTOOL_NUM_PRIOS) + /* Vlan, mac, ttc, inner ttc, {UDP/ANY/aRFS/accel/{esp, esp_err}}, IPsec policy, +- * {IPsec RoCE MPV,Alias table},IPsec RoCE policy ++ * IPsec policy miss, {IPsec RoCE MPV,Alias table},IPsec RoCE policy + */ +-#define KERNEL_NIC_PRIO_NUM_LEVELS 10 ++#define KERNEL_NIC_PRIO_NUM_LEVELS 11 + #define KERNEL_NIC_NUM_PRIOS 1 + /* One more level for tc, and one more for promisc */ + #define KERNEL_MIN_LEVEL (KERNEL_NIC_PRIO_NUM_LEVELS + 2) +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1492-net-mlx5-hws-ignore-flow-level-for-multi-dest-table.patch b/SOURCES/1492-net-mlx5-hws-ignore-flow-level-for-multi-dest-table.patch new file mode 100644 index 000000000..705061a59 --- /dev/null +++ b/SOURCES/1492-net-mlx5-hws-ignore-flow-level-for-multi-dest-table.patch @@ -0,0 +1,119 @@ +From 51c27a17e77ddff742da7ca6798930dae1987ff4 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:02 -0400 +Subject: [PATCH] net/mlx5: HWS, ignore flow level for multi-dest table + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit efb877cf27e300e47e1c051f4e8fd80fc42325d5 +Author: Yevgeny Kliteynik +Date: Mon Sep 22 10:11:33 2025 +0300 + + net/mlx5: HWS, ignore flow level for 
multi-dest table + + When HWS creates multi-dest FW table and adds rules to + forward to other tables, ignore the flow level enforcement + in FW, because HWS is responsible for table levels. + + This fixes the following error: + + mlx5_core 0000:08:00.0: mlx5_cmd_out_err:818:(pid 192306): + SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, + status bad parameter(0x3), syndrome (0x6ae84c), err(-22) + + Fixes: 504e536d9010 ("net/mlx5: HWS, added actions handling") + Signed-off-by: Yevgeny Kliteynik + Reviewed-by: Moshe Shemesh + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1758525094-816583-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c +index 6b36a4a7d895..fe56b59e24c5 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c +@@ -1360,7 +1360,7 @@ mlx5hws_action_create_modify_header(struct mlx5hws_context *ctx, + struct mlx5hws_action * + mlx5hws_action_create_dest_array(struct mlx5hws_context *ctx, size_t num_dest, + struct mlx5hws_action_dest_attr *dests, +- bool ignore_flow_level, u32 flags) ++ u32 flags) + { + struct mlx5hws_cmd_set_fte_dest *dest_list = NULL; + struct mlx5hws_cmd_ft_create_attr ft_attr = {0}; +@@ -1397,7 +1397,7 @@ mlx5hws_action_create_dest_array(struct mlx5hws_context *ctx, size_t num_dest, + MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE; + dest_list[i].destination_id = dests[i].dest->dest_obj.obj_id; + fte_attr.action_flags |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST; +- fte_attr.ignore_flow_level = ignore_flow_level; ++ fte_attr.ignore_flow_level = 1; + if (dests[i].is_wire_ft) + last_dest_idx = i; + break; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c 
b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c +index 131e74b2b774..6a4c4cccd643 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c +@@ -572,12 +572,12 @@ static void mlx5_fs_put_dest_action_sampler(struct mlx5_fs_hws_context *fs_ctx, + static struct mlx5hws_action * + mlx5_fs_create_action_dest_array(struct mlx5hws_context *ctx, + struct mlx5hws_action_dest_attr *dests, +- u32 num_of_dests, bool ignore_flow_level) ++ u32 num_of_dests) + { + u32 flags = MLX5HWS_ACTION_FLAG_HWS_FDB | MLX5HWS_ACTION_FLAG_SHARED; + + return mlx5hws_action_create_dest_array(ctx, num_of_dests, dests, +- ignore_flow_level, flags); ++ flags); + } + + static struct mlx5hws_action * +@@ -1014,19 +1014,14 @@ static int mlx5_fs_fte_get_hws_actions(struct mlx5_flow_root_namespace *ns, + } + (*ractions)[num_actions++].action = dest_actions->dest; + } else if (num_dest_actions > 1) { +- bool ignore_flow_level; +- + if (num_actions == MLX5_FLOW_CONTEXT_ACTION_MAX || + num_fs_actions == MLX5_FLOW_CONTEXT_ACTION_MAX) { + err = -EOPNOTSUPP; + goto free_actions; + } +- ignore_flow_level = +- !!(fte_action->flags & FLOW_ACT_IGNORE_FLOW_LEVEL); + tmp_action = + mlx5_fs_create_action_dest_array(ctx, dest_actions, +- num_dest_actions, +- ignore_flow_level); ++ num_dest_actions); + if (!tmp_action) { + err = -EOPNOTSUPP; + goto free_actions; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h +index 2498ceff2060..1ad7a50d938b 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h +@@ -735,7 +735,6 @@ mlx5hws_action_create_push_vlan(struct mlx5hws_context *ctx, u32 flags); + * @num_dest: The number of dests attributes. + * @dests: The destination array. 
Each contains a destination action and can + * have additional actions. +- * @ignore_flow_level: Whether to turn on 'ignore_flow_level' for this dest. + * @flags: Action creation flags (enum mlx5hws_action_flags). + * + * Return: pointer to mlx5hws_action on success NULL otherwise. +@@ -743,7 +742,7 @@ mlx5hws_action_create_push_vlan(struct mlx5hws_context *ctx, u32 flags); + struct mlx5hws_action * + mlx5hws_action_create_dest_array(struct mlx5hws_context *ctx, size_t num_dest, + struct mlx5hws_action_dest_attr *dests, +- bool ignore_flow_level, u32 flags); ++ u32 flags); + + /** + * mlx5hws_action_create_insert_header - Create insert header action. +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1493-net-mlx5e-fix-missing-fec-rs-stats-for-rs-544-514-interleave.patch b/SOURCES/1493-net-mlx5e-fix-missing-fec-rs-stats-for-rs-544-514-interleave.patch new file mode 100644 index 000000000..ed71aa7e6 --- /dev/null +++ b/SOURCES/1493-net-mlx5e-fix-missing-fec-rs-stats-for-rs-544-514-interleave.patch @@ -0,0 +1,43 @@ +From a462cd4d181450388f7a92e030eb445a9e26d143 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:27:02 -0400 +Subject: [PATCH] net/mlx5e: Fix missing FEC RS stats for + RS_544_514_INTERLEAVED_QUAD + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 6d0477d0d067a53c1d48d0aff1fd52e151721871 +Author: Carolina Jubran +Date: Mon Sep 22 10:11:34 2025 +0300 + + net/mlx5e: Fix missing FEC RS stats for RS_544_514_INTERLEAVED_QUAD + + Include MLX5E_FEC_RS_544_514_INTERLEAVED_QUAD in the FEC RS stats + handling. This addresses a gap introduced when adding support for + 200G/lane link modes. 
+ + Fixes: 4e343c11efbb ("net/mlx5e: Support FEC settings for 200G per lane link modes") + Signed-off-by: Carolina Jubran + Reviewed-by: Dragos Tatulea + Reviewed-by: Yael Chemla + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1758525094-816583-4-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c +index 87536f158d07..c6185ddba04b 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c +@@ -1466,6 +1466,7 @@ static void fec_set_block_stats(struct mlx5e_priv *priv, + case MLX5E_FEC_RS_528_514: + case MLX5E_FEC_RS_544_514: + case MLX5E_FEC_LLRS_272_257_1: ++ case MLX5E_FEC_RS_544_514_INTERLEAVED_QUAD: + fec_set_rs_stats(fec_stats, out); + return; + case MLX5E_FEC_FIRECODE: +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1494-rdma-mlx5-support-driver-apis-pre-destroy-cq-and-post-destro.patch b/SOURCES/1494-rdma-mlx5-support-driver-apis-pre-destroy-cq-and-post-destro.patch new file mode 100644 index 000000000..d5634e7db --- /dev/null +++ b/SOURCES/1494-rdma-mlx5-support-driver-apis-pre-destroy-cq-and-post-destro.patch @@ -0,0 +1,148 @@ +From 7bb84b07914794814787055dc7b0ac6ae57a3e6c Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:35:08 -0400 +Subject: [PATCH] RDMA/mlx5: Support driver APIs pre_destroy_cq and + post_destroy_cq + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit b5eeb8365d196c95dbb0fd0a5b5a69a44832f16f +Author: Mark Zhang +Date: Mon Jun 16 13:42:38 2025 +0300 + + RDMA/mlx5: Support driver APIs pre_destroy_cq and post_destroy_cq + + - pre_destroy_cq: Destroy FW CQ object so that no new CQ event would + be generated; + - post_destroy_cq: Release all resources. + + This patch, along with last one, fixes the crash below. 
+ + Unable to handle kernel paging request at virtual address ffff8000114b1180 + Mem abort info: + ESR = 0x96000047 + EC = 0x25: DABT (current EL), IL = 32 bits + SET = 0, FnV = 0 + EA = 0, S1PTW = 0 + Data abort info: + ISV = 0, ISS = 0x00000047 + CM = 0, WnR = 1 + swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000f4582000 + [ffff8000114b1180] pgd=00000447fffff003, p4d=00000447fffff003, pud=00000447ffffe003, pmd=00000447ffffb003, pte=0000000000000000 + Internal error: Oops: 96000047 [#1] SMP + Modules linked in: udp_diag uio_pci_generic uio tcp_diag inet_diag binfmt_misc sn_core_odd(OE) rpcrdma(OE) xprtrdma(OE) ib_isert(OE) ib_iser(OE) ib_srpt(OE) ib_srp(OE) ib_ipoib(OE) kpatch_9658536(OK) kpatch_9322385(OK) kpatch_8843421(OK) kpatch_8636216(OK) vfat fat aes_ce_blk crypto_simd cryptd aes_ce_cipher crct10dif_ce ghash_ce sm4_ce sha2_ce sha256_arm64 sha1_ce sbsa_gwdt sg acpi_ipmi ipmi_si ipmi_msghandler m1_uncore_ddrss_pmu m1_uncore_cmn_pmu team_yosemite9rc6(OE) vnic(OE) ip_tables mlx5_ib(OE) sd_mod ast mlx5_core(OE) i2c_algo_bit drm_vram_helper psample drm_kms_helper mlxdevm(OE) auxiliary(OE) mlxfw(OE) syscopyarea sysfillrect tls sysimgblt fb_sys_fops drm_ttm_helper nvme ttm nvme_core drm t10_pi i2c_designware_platform i2c_designware_core i2c_core ahci libahci libata rdma_ucm(OE) ib_uverbs(OE) rdma_cm(OE) iw_cm(OE) ib_cm(OE) ib_umad(OE) ib_core(OE) ib_ucm(OE) mlx_compat(OE) [last unloaded: ipmi_devintf] + CPU: 83 PID: 59375 Comm: kworker/u253:1 Kdump: loaded Tainted: G OE K 5.10.84-004.ali5000.alios7.aarch64 #1 + Hardware name: Inspur AliServer-Xuanwu2.0AM-02-2UM1P-5B/AS1221MG1, BIOS 1.2.M1.AL.P.158.00 08/31/2023 + Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core] + pstate: 82c00089 (Nzcv daIf +PAN +UAO +TCO BTYPE=--) + pc : native_queued_spin_lock_slowpath+0x1c4/0x31c + lr : mlx5_ib_poll_cq+0x18c/0x2f8 [mlx5_ib] + sp : ffff80002be1bc80 + x29: ffff80002be1bc80 x28: ffff000810e69000 + x27: ffff000810e69000 x26: ffff000810e69200 + x25: 0000000000000000 x24: 
ffff8000117db000 + x23: ffff04000156b780 x22: 0000000000000000 + x21: ffff04000ce6c160 x20: ffff0008196a4000 + x19: 0000000000000010 x18: 0000000000000020 + x17: 0000000000000000 x16: 0000000000000000 + x15: ffff040055a364e8 x14: ffffffffffffffff + x13: ffff80002318bda8 x12: ffff0400358836e8 + x11: 0000000000000040 x10: 0000000000000eb0 + x9 : 0000000000000000 x8 : 0000000000000000 + x7 : ffff04477fa20140 x6 : ffff8000114b1140 + x5 : ffff04477fa20140 x4 : ffff8000114b1180 + x3 : ffff000810e69200 x2 : ffff8000114b1180 + x1 : 0000000001500000 x0 : ffff04477fa20148 + Call trace: + native_queued_spin_lock_slowpath+0x1c4/0x31c + __ib_process_cq+0x74/0x1b8 [ib_core] + ib_cq_poll_work+0x34/0xa0 [ib_core] + process_one_work+0x1d8/0x4b0 + worker_thread+0x180/0x440 + kthread+0x114/0x120 + Code: 910020e0 8b0400c4 f862d929 aa0403e2 (f8296847) + ---[ end trace 387be2290557729c ]--- + Kernel panic - not syncing: Oops: Fatal exception + SMP: stopping secondary CPUs + Kernel Offset: disabled + CPU features: 0x9850817,7a60aa38 + Memory Limit: none + Starting crashdump kernel... + Bye! 
+ + Signed-off-by: Mark Zhang + Link: https://patch.msgid.link/aaf0072f350d1c7e8731f43b79e11a560bafb9e0.1750070205.git.leon@kernel.org + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c +index 3657ab8f848c..78cd3151f2ed 100644 +--- a/drivers/infiniband/hw/mlx5/cq.c ++++ b/drivers/infiniband/hw/mlx5/cq.c +@@ -1052,20 +1052,31 @@ int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, + return err; + } + +-int mlx5_ib_destroy_cq(struct ib_cq *cq, struct ib_udata *udata) ++int mlx5_ib_pre_destroy_cq(struct ib_cq *cq) + { + struct mlx5_ib_dev *dev = to_mdev(cq->device); + struct mlx5_ib_cq *mcq = to_mcq(cq); ++ ++ return mlx5_core_destroy_cq(dev->mdev, &mcq->mcq); ++} ++ ++void mlx5_ib_post_destroy_cq(struct ib_cq *cq) ++{ ++ destroy_cq_kernel(to_mdev(cq->device), to_mcq(cq)); ++} ++ ++int mlx5_ib_destroy_cq(struct ib_cq *cq, struct ib_udata *udata) ++{ + int ret; + +- ret = mlx5_core_destroy_cq(dev->mdev, &mcq->mcq); ++ ret = mlx5_ib_pre_destroy_cq(cq); + if (ret) + return ret; + + if (udata) +- destroy_cq_user(mcq, udata); ++ destroy_cq_user(to_mcq(cq), udata); + else +- destroy_cq_kernel(dev, mcq); ++ mlx5_ib_post_destroy_cq(cq); + return 0; + } + +diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c +index c03743daeaa8..bdd82b371ba0 100644 +--- a/drivers/infiniband/hw/mlx5/main.c ++++ b/drivers/infiniband/hw/mlx5/main.c +@@ -4190,7 +4190,9 @@ static const struct ib_device_ops mlx5_ib_dev_ops = { + .modify_port = mlx5_ib_modify_port, + .modify_qp = mlx5_ib_modify_qp, + .modify_srq = mlx5_ib_modify_srq, ++ .pre_destroy_cq = mlx5_ib_pre_destroy_cq, + .poll_cq = mlx5_ib_poll_cq, ++ .post_destroy_cq = mlx5_ib_post_destroy_cq, + .post_recv = mlx5_ib_post_recv_nodrain, + .post_send = mlx5_ib_post_send_nodrain, + .post_srq_recv = mlx5_ib_post_srq_recv, +diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h +index 54ca6e010bd4..66cec408df2c 100644 +--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h ++++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h +@@ -1368,6 +1368,8 @@ int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, + struct uverbs_attr_bundle *attrs); + int mlx5_ib_destroy_cq(struct ib_cq *cq, struct ib_udata *udata); + int mlx5_ib_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc); ++int mlx5_ib_pre_destroy_cq(struct ib_cq *cq); ++void mlx5_ib_post_destroy_cq(struct ib_cq *cq); + int mlx5_ib_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags); + int mlx5_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period); + int mlx5_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1495-rdma-mlx5-add-multiple-priorities-support-to-rdma-transport-.patch b/SOURCES/1495-rdma-mlx5-add-multiple-priorities-support-to-rdma-transport-.patch new file mode 100644 index 000000000..5be779070 --- /dev/null +++ b/SOURCES/1495-rdma-mlx5-add-multiple-priorities-support-to-rdma-transport-.patch @@ -0,0 +1,148 @@ +From 2a9f1cd9f95b3225269fbcc0b684b17d422103c1 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:35:08 -0400 +Subject: [PATCH] RDMA/mlx5: Add multiple priorities support to RDMA TRANSPORT + userspace tables + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 40852c890119ebf39a741f50db13ae941f230d05 +Author: Patrisious Haddad +Date: Tue Jun 17 11:19:16 2025 +0300 + + RDMA/mlx5: Add multiple priorities support to RDMA TRANSPORT userspace tables + + Support the creation of RDMA TRANSPORT tables over multiple priorities + via matcher creation. 
+ + Signed-off-by: Patrisious Haddad + Reviewed-by: Mark Bloch + Link: https://patch.msgid.link/bb38e50ae4504e979c6568d41939402a4cf15635.1750148083.git.leon@kernel.org + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/fs.c b/drivers/infiniband/hw/mlx5/fs.c +index 680627f1de33..ebcc05f766e1 100644 +--- a/drivers/infiniband/hw/mlx5/fs.c ++++ b/drivers/infiniband/hw/mlx5/fs.c +@@ -1966,7 +1966,8 @@ _get_flow_table(struct mlx5_ib_dev *dev, u16 user_priority, + break; + case MLX5_FLOW_NAMESPACE_RDMA_TRANSPORT_RX: + case MLX5_FLOW_NAMESPACE_RDMA_TRANSPORT_TX: +- if (ib_port == 0 || user_priority > MLX5_RDMA_TRANSPORT_BYPASS_PRIO) ++ if (ib_port == 0 || ++ user_priority >= MLX5_RDMA_TRANSPORT_BYPASS_PRIO) + return ERR_PTR(-EINVAL); + ret = mlx5_ib_fill_transport_ns_info(dev, ns_type, &flags, + &vport_idx, &vport, +@@ -2016,10 +2017,10 @@ _get_flow_table(struct mlx5_ib_dev *dev, u16 user_priority, + prio = &dev->flow_db->rdma_tx[priority]; + break; + case MLX5_FLOW_NAMESPACE_RDMA_TRANSPORT_RX: +- prio = &dev->flow_db->rdma_transport_rx[ib_port - 1]; ++ prio = &dev->flow_db->rdma_transport_rx[priority][ib_port - 1]; + break; + case MLX5_FLOW_NAMESPACE_RDMA_TRANSPORT_TX: +- prio = &dev->flow_db->rdma_transport_tx[ib_port - 1]; ++ prio = &dev->flow_db->rdma_transport_tx[priority][ib_port - 1]; + break; + default: return ERR_PTR(-EINVAL); + } +@@ -3466,31 +3467,40 @@ static const struct ib_device_ops flow_ops = { + + int mlx5_ib_fs_init(struct mlx5_ib_dev *dev) + { ++ int i, j; ++ + dev->flow_db = kzalloc(sizeof(*dev->flow_db), GFP_KERNEL); + + if (!dev->flow_db) + return -ENOMEM; + +- dev->flow_db->rdma_transport_rx = kcalloc(dev->num_ports, +- sizeof(struct mlx5_ib_flow_prio), +- GFP_KERNEL); +- if (!dev->flow_db->rdma_transport_rx) +- goto free_flow_db; ++ for (i = 0; i < MLX5_RDMA_TRANSPORT_BYPASS_PRIO; i++) { ++ dev->flow_db->rdma_transport_rx[i] = ++ kcalloc(dev->num_ports, ++ sizeof(struct mlx5_ib_flow_prio), 
GFP_KERNEL); ++ if (!dev->flow_db->rdma_transport_rx[i]) ++ goto free_rdma_transport_rx; ++ } + +- dev->flow_db->rdma_transport_tx = kcalloc(dev->num_ports, +- sizeof(struct mlx5_ib_flow_prio), +- GFP_KERNEL); +- if (!dev->flow_db->rdma_transport_tx) +- goto free_rdma_transport_rx; ++ for (j = 0; j < MLX5_RDMA_TRANSPORT_BYPASS_PRIO; j++) { ++ dev->flow_db->rdma_transport_tx[j] = ++ kcalloc(dev->num_ports, ++ sizeof(struct mlx5_ib_flow_prio), GFP_KERNEL); ++ if (!dev->flow_db->rdma_transport_tx[j]) ++ goto free_rdma_transport_tx; ++ } + + mutex_init(&dev->flow_db->lock); + + ib_set_device_ops(&dev->ib_dev, &flow_ops); + return 0; + ++free_rdma_transport_tx: ++ while (j--) ++ kfree(dev->flow_db->rdma_transport_tx[j]); + free_rdma_transport_rx: +- kfree(dev->flow_db->rdma_transport_rx); +-free_flow_db: ++ while (i--) ++ kfree(dev->flow_db->rdma_transport_rx[i]); + kfree(dev->flow_db); + return -ENOMEM; + } +diff --git a/drivers/infiniband/hw/mlx5/fs.h b/drivers/infiniband/hw/mlx5/fs.h +index 2ebe86e5be10..7abba0e2837c 100644 +--- a/drivers/infiniband/hw/mlx5/fs.h ++++ b/drivers/infiniband/hw/mlx5/fs.h +@@ -13,6 +13,8 @@ void mlx5_ib_fs_cleanup_anchor(struct mlx5_ib_dev *dev); + + static inline void mlx5_ib_fs_cleanup(struct mlx5_ib_dev *dev) + { ++ int i; ++ + /* When a steering anchor is created, a special flow table is also + * created for the user to reference. Since the user can reference it, + * the kernel cannot trust that when the user destroys the steering +@@ -25,8 +27,10 @@ static inline void mlx5_ib_fs_cleanup(struct mlx5_ib_dev *dev) + * is a safe assumption that all references are gone. 
+ */ + mlx5_ib_fs_cleanup_anchor(dev); +- kfree(dev->flow_db->rdma_transport_tx); +- kfree(dev->flow_db->rdma_transport_rx); ++ for (i = 0; i < MLX5_RDMA_TRANSPORT_BYPASS_PRIO; i++) ++ kfree(dev->flow_db->rdma_transport_tx[i]); ++ for (i = 0; i < MLX5_RDMA_TRANSPORT_BYPASS_PRIO; i++) ++ kfree(dev->flow_db->rdma_transport_rx[i]); + kfree(dev->flow_db); + } + #endif /* _MLX5_IB_FS_H */ +diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h +index 66cec408df2c..37f902860210 100644 +--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h ++++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h +@@ -316,8 +316,8 @@ struct mlx5_ib_flow_db { + struct mlx5_ib_flow_prio rdma_tx[MLX5_IB_NUM_FLOW_FT]; + struct mlx5_ib_flow_prio opfcs[MLX5_IB_OPCOUNTER_MAX]; + struct mlx5_flow_table *lag_demux_ft; +- struct mlx5_ib_flow_prio *rdma_transport_rx; +- struct mlx5_ib_flow_prio *rdma_transport_tx; ++ struct mlx5_ib_flow_prio *rdma_transport_rx[MLX5_RDMA_TRANSPORT_BYPASS_PRIO]; ++ struct mlx5_ib_flow_prio *rdma_transport_tx[MLX5_RDMA_TRANSPORT_BYPASS_PRIO]; + /* Protect flow steering bypass flow tables + * when add/del flow rules. 
+ * only single add/removal of flow steering rule could be done +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1496-rdma-mlx5-check-cap-net-raw-in-user-namespace-for-flow-creat.patch b/SOURCES/1496-rdma-mlx5-check-cap-net-raw-in-user-namespace-for-flow-creat.patch new file mode 100644 index 000000000..c6b77205d --- /dev/null +++ b/SOURCES/1496-rdma-mlx5-check-cap-net-raw-in-user-namespace-for-flow-creat.patch @@ -0,0 +1,47 @@ +From 677a231096d3c49d2b9a19995ba2cec1db49ba01 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:35:08 -0400 +Subject: [PATCH] RDMA/mlx5: Check CAP_NET_RAW in user namespace for flow + create + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 95a89ec304c38f7447cdbf271f2d1cbad4c3bf81 +Author: Parav Pandit +Date: Thu Jun 26 21:58:05 2025 +0300 + + RDMA/mlx5: Check CAP_NET_RAW in user namespace for flow create + + Currently, the capability check is done in the default + init_user_ns user namespace. When a process runs in a + non default user namespace, such check fails. Due to this + when a process is running using Podman, it fails to create + the flow. + + Since the RDMA device is a resource within a network namespace, + use the network namespace associated with the RDMA device to + determine its owning user namespace. 
+ + Fixes: 322694412400 ("IB/mlx5: Introduce driver create and destroy flow methods") + Signed-off-by: Parav Pandit + Link: https://patch.msgid.link/a4dcd5e3ac6904ef50b19e56942ca6ab0728794c.1750963874.git.leon@kernel.org + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/fs.c b/drivers/infiniband/hw/mlx5/fs.c +index ebcc05f766e1..58e058c067d3 100644 +--- a/drivers/infiniband/hw/mlx5/fs.c ++++ b/drivers/infiniband/hw/mlx5/fs.c +@@ -2459,7 +2459,7 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_CREATE_FLOW)( + struct mlx5_ib_dev *dev; + u32 flags; + +- if (!capable(CAP_NET_RAW)) ++ if (!rdma_uattrs_has_raw_cap(attrs)) + return -EPERM; + + fs_matcher = uverbs_attr_get_obj(attrs, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1497-rdma-mlx5-check-cap-net-raw-in-user-namespace-for-anchor-cre.patch b/SOURCES/1497-rdma-mlx5-check-cap-net-raw-in-user-namespace-for-anchor-cre.patch new file mode 100644 index 000000000..e222c2f43 --- /dev/null +++ b/SOURCES/1497-rdma-mlx5-check-cap-net-raw-in-user-namespace-for-anchor-cre.patch @@ -0,0 +1,47 @@ +From 45706e874cdf6c8294b8830c9069356170a15b38 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:35:08 -0400 +Subject: [PATCH] RDMA/mlx5: Check CAP_NET_RAW in user namespace for anchor + create + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 14957e8125e767bfd40a3ac61b1d6b8e62ee0a98 +Author: Parav Pandit +Date: Thu Jun 26 21:58:06 2025 +0300 + + RDMA/mlx5: Check CAP_NET_RAW in user namespace for anchor create + + Currently, the capability check is done in the default + init_user_ns user namespace. When a process runs in a + non default user namespace, such check fails. Due to this + when a process is running using Podman, it fails to create + the anchor. + + Since the RDMA device is a resource within a network namespace, + use the network namespace associated with the RDMA device to + determine its owning user namespace. 
+ + Fixes: 0c6ab0ca9a66 ("RDMA/mlx5: Expose steering anchor to userspace") + Signed-off-by: Parav Pandit + Link: https://patch.msgid.link/c2376ca75e7658e2cbd1f619cf28fbe98c906419.1750963874.git.leon@kernel.org + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/fs.c b/drivers/infiniband/hw/mlx5/fs.c +index 58e058c067d3..bab2f58240c9 100644 +--- a/drivers/infiniband/hw/mlx5/fs.c ++++ b/drivers/infiniband/hw/mlx5/fs.c +@@ -2990,7 +2990,7 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_STEERING_ANCHOR_CREATE)( + u32 ft_id; + int err; + +- if (!capable(CAP_NET_RAW)) ++ if (!rdma_dev_has_raw_cap(&dev->ib_dev)) + return -EPERM; + + err = uverbs_get_const(&ib_uapi_ft_type, attrs, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1498-rdma-mlx5-check-cap-net-raw-in-user-namespace-for-devx-creat.patch b/SOURCES/1498-rdma-mlx5-check-cap-net-raw-in-user-namespace-for-devx-creat.patch new file mode 100644 index 000000000..3247b4877 --- /dev/null +++ b/SOURCES/1498-rdma-mlx5-check-cap-net-raw-in-user-namespace-for-devx-creat.patch @@ -0,0 +1,47 @@ +From 2fd1c308495b0ae84f02a16b4d4b58df99920114 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:35:08 -0400 +Subject: [PATCH] RDMA/mlx5: Check CAP_NET_RAW in user namespace for devx + create + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit bd82467f17e0940c6f6a5396278cda586c9cb6fd +Author: Parav Pandit +Date: Thu Jun 26 21:58:10 2025 +0300 + + RDMA/mlx5: Check CAP_NET_RAW in user namespace for devx create + + Currently, the capability check is done in the default + init_user_ns user namespace. When a process runs in a + non default user namespace, such check fails. Due to this + when a process is running using Podman, it fails to create + the devx object. + + Since the RDMA device is a resource within a network namespace, + use the network namespace associated with the RDMA device to + determine its owning user namespace. 
+ + Fixes: a8b92ca1b0e5 ("IB/mlx5: Introduce DEVX") + Signed-off-by: Parav Pandit + Link: https://patch.msgid.link/36ee87e92defd81410c6a2b33f9d6c0d6dcfd64c.1750963874.git.leon@kernel.org + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c +index fae11535b1a0..f96c46b93bec 100644 +--- a/drivers/infiniband/hw/mlx5/devx.c ++++ b/drivers/infiniband/hw/mlx5/devx.c +@@ -159,7 +159,7 @@ int mlx5_ib_devx_create(struct mlx5_ib_dev *dev, bool is_user, u64 req_ucaps) + uctx = MLX5_ADDR_OF(create_uctx_in, in, uctx); + if (is_user && + (MLX5_CAP_GEN(dev->mdev, uctx_cap) & MLX5_UCTX_CAP_RAW_TX) && +- capable(CAP_NET_RAW)) ++ rdma_dev_has_raw_cap(&dev->ib_dev)) + cap |= MLX5_UCTX_CAP_RAW_TX; + if (is_user && + (MLX5_CAP_GEN(dev->mdev, uctx_cap) & +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1499-rdma-mlx5-align-mkc-page-size-capability-check-to-prm.patch b/SOURCES/1499-rdma-mlx5-align-mkc-page-size-capability-check-to-prm.patch new file mode 100644 index 000000000..f3ed387f3 --- /dev/null +++ b/SOURCES/1499-rdma-mlx5-align-mkc-page-size-capability-check-to-prm.patch @@ -0,0 +1,142 @@ +From 0eddc9480134b5ba251ae340546845445fbfe6e5 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:35:09 -0400 +Subject: [PATCH] RDMA/mlx5: Align mkc page size capability check to PRM +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit fcfb03597b7d7737aac6bdfda1f7b5d152cfed73 +Author: Michael Guralnik +Date: Wed Jul 9 09:42:10 2025 +0300 + + RDMA/mlx5: Align mkc page size capability check to PRM + + Align the capabilities checked when using the log_page_size 6th bit in the + mkey context to the PRM definition. The upper and lower bounds are set by + max/min caps, and modification of the 6th bit by UMR is allowed only when + a specific UMR cap is set. 
+ Current implementation falsely assumes all page sizes up-to 2^63 are + supported when the UMR cap is set. In case the upper bound cap is lower + than 63, this might result a FW syndrome on mkey creation, e.g: + mlx5_core 0000:c1:00.0: mlx5_cmd_out_err:832:(pid 0): CREATE_MKEY(0×200) op_mod(0×0) failed, status bad parameter(0×3), syndrome (0×38a711), err(-22) + + Previous cap enforcement is still correct for all current HW, FW and + driver combinations. However, this patch aligns the code to be PRM + compliant in the general case. + + Signed-off-by: Michael Guralnik + Link: https://patch.msgid.link/eab4eeb4785105a4bb5eb362dc0b3662cd840412.1751979184.git.leon@kernel.org + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h +index 37f902860210..6095d8c58ff6 100644 +--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h ++++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h +@@ -1748,18 +1748,59 @@ static inline u32 smi_to_native_portnum(struct mlx5_ib_dev *dev, u32 port) + return (port - 1) / dev->num_ports + 1; + } + ++static inline unsigned int get_max_log_entity_size_cap(struct mlx5_ib_dev *dev, ++ int access_mode) ++{ ++ int max_log_size = 0; ++ ++ if (access_mode == MLX5_MKC_ACCESS_MODE_MTT) ++ max_log_size = ++ MLX5_CAP_GEN_2(dev->mdev, max_mkey_log_entity_size_mtt); ++ else if (access_mode == MLX5_MKC_ACCESS_MODE_KSM) ++ max_log_size = MLX5_CAP_GEN_2( ++ dev->mdev, max_mkey_log_entity_size_fixed_buffer); ++ ++ if (!max_log_size || ++ (max_log_size > 31 && ++ !MLX5_CAP_GEN_2(dev->mdev, umr_log_entity_size_5))) ++ max_log_size = 31; ++ ++ return max_log_size; ++} ++ ++static inline unsigned int get_min_log_entity_size_cap(struct mlx5_ib_dev *dev, ++ int access_mode) ++{ ++ int min_log_size = 0; ++ ++ if (access_mode == MLX5_MKC_ACCESS_MODE_KSM && ++ MLX5_CAP_GEN_2(dev->mdev, ++ min_mkey_log_entity_size_fixed_buffer_valid)) ++ min_log_size = MLX5_CAP_GEN_2( ++ dev->mdev, 
min_mkey_log_entity_size_fixed_buffer); ++ else ++ min_log_size = ++ MLX5_CAP_GEN_2(dev->mdev, log_min_mkey_entity_size); ++ ++ min_log_size = max(min_log_size, MLX5_ADAPTER_PAGE_SHIFT); ++ return min_log_size; ++} ++ + /* + * For mkc users, instead of a page_offset the command has a start_iova which + * specifies both the page_offset and the on-the-wire IOVA + */ + static __always_inline unsigned long + mlx5_umem_mkc_find_best_pgsz(struct mlx5_ib_dev *dev, struct ib_umem *umem, +- u64 iova) ++ u64 iova, int access_mode) + { +- int page_size_bits = +- MLX5_CAP_GEN_2(dev->mdev, umr_log_entity_size_5) ? 6 : 5; +- unsigned long bitmap = +- __mlx5_log_page_size_to_bitmap(page_size_bits, 0); ++ unsigned int max_log_entity_size_cap, min_log_entity_size_cap; ++ unsigned long bitmap; ++ ++ max_log_entity_size_cap = get_max_log_entity_size_cap(dev, access_mode); ++ min_log_entity_size_cap = get_min_log_entity_size_cap(dev, access_mode); ++ ++ bitmap = GENMASK_ULL(max_log_entity_size_cap, min_log_entity_size_cap); + + return ib_umem_find_best_pgsz(umem, bitmap, iova); + } +diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c +index 5a7b234bdfd9..0a31f8c248e4 100644 +--- a/drivers/infiniband/hw/mlx5/mr.c ++++ b/drivers/infiniband/hw/mlx5/mr.c +@@ -1130,7 +1130,8 @@ static struct mlx5_ib_mr *alloc_cacheable_mr(struct ib_pd *pd, + if (umem->is_dmabuf) + page_size = mlx5_umem_dmabuf_default_pgsz(umem, iova); + else +- page_size = mlx5_umem_mkc_find_best_pgsz(dev, umem, iova); ++ page_size = mlx5_umem_mkc_find_best_pgsz(dev, umem, iova, ++ access_mode); + if (WARN_ON(!page_size)) + return ERR_PTR(-EINVAL); + +@@ -1435,8 +1436,8 @@ static struct ib_mr *create_real_mr(struct ib_pd *pd, struct ib_umem *umem, + mr = alloc_cacheable_mr(pd, umem, iova, access_flags, + MLX5_MKC_ACCESS_MODE_MTT); + } else { +- unsigned long page_size = +- mlx5_umem_mkc_find_best_pgsz(dev, umem, iova); ++ unsigned long page_size = mlx5_umem_mkc_find_best_pgsz( ++ dev, umem, iova, 
MLX5_MKC_ACCESS_MODE_MTT); + + mutex_lock(&dev->slow_path_mutex); + mr = reg_create(pd, umem, iova, access_flags, page_size, +@@ -1756,7 +1757,8 @@ static bool can_use_umr_rereg_pas(struct mlx5_ib_mr *mr, + if (!mlx5r_umr_can_load_pas(dev, new_umem->length)) + return false; + +- *page_size = mlx5_umem_mkc_find_best_pgsz(dev, new_umem, iova); ++ *page_size = mlx5_umem_mkc_find_best_pgsz( ++ dev, new_umem, iova, mr->mmkey.cache_ent->rb_key.access_mode); + if (WARN_ON(!*page_size)) + return false; + return (mr->mmkey.cache_ent->rb_key.ndescs) >= +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1500-rdma-mlx5-optimize-dmabuf-mkey-page-size.patch b/SOURCES/1500-rdma-mlx5-optimize-dmabuf-mkey-page-size.patch new file mode 100644 index 000000000..6b84ad0a0 --- /dev/null +++ b/SOURCES/1500-rdma-mlx5-optimize-dmabuf-mkey-page-size.patch @@ -0,0 +1,571 @@ +From 7b4607e2c48935209521c40fd141d9f44f49c7a1 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:35:09 -0400 +Subject: [PATCH] RDMA/mlx5: Optimize DMABUF mkey page size + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit e73242aa14d2ec7f4a1a13688366bb36dc0fe5b7 +Author: Edward Srouji +Date: Wed Jul 9 09:42:11 2025 +0300 + + RDMA/mlx5: Optimize DMABUF mkey page size + + The current implementation of DMABUF memory registration uses a fixed + page size for the memory key (mkey), which can lead to suboptimal + performance when the underlying memory layout may offer better page + size. + + The optimization improves performance by reducing the number of page + table entries required for the mkey, leading to less MTT/KSM descriptors + that the HCA must go through to find translations, fewer cache-lines, + and shorter UMR work requests on mkey updates such as when + re-registering or reusing a cacheable mkey. + + To ensure safe page size updates, the implementation uses a 5-step + process: + 1. 
Make the first X entries non-present, while X is calculated to be + minimal according to a large page shift that can be used to cover the + MR length. + 2. Update the page size to the large supported page size + 3. Load the remaining N-X entries according to the (optimized) + page shift + 4. Update the page size according to the (optimized) page shift + 5. Load the first X entries with the correct translations + + This ensures that at no point is the MR accessible with a partially + updated translation table, maintaining correctness and preventing + access to stale or inconsistent mappings, such as having an mkey + advertising the new page size while some of the underlying page table + entries still contain the old page size translations. + + Signed-off-by: Edward Srouji + Reviewed-by: Michael Guralnik + Link: https://patch.msgid.link/bc05a6b2142c02f96a90635f9a4458ee4bbbf39f.1751979184.git.leon@kernel.org + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h +index 6095d8c58ff6..326fb13484de 100644 +--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h ++++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h +@@ -100,19 +100,6 @@ unsigned long __mlx5_umem_find_best_quantized_pgoff( + __mlx5_bit_sz(typ, page_offset_fld), 0, scale, \ + page_offset_quantized) + +-static inline unsigned long +-mlx5_umem_dmabuf_find_best_pgsz(struct ib_umem_dmabuf *umem_dmabuf) +-{ +- /* +- * mkeys used for dmabuf are fixed at PAGE_SIZE because we must be able +- * to hold any sgl after a move operation. Ideally the mkc page size +- * could be changed at runtime to be optimal, but right now the driver +- * cannot do that. 
+- */ +- return ib_umem_find_best_pgsz(&umem_dmabuf->umem, PAGE_SIZE, +- umem_dmabuf->umem.iova); +-} +- + enum { + MLX5_IB_MMAP_OFFSET_START = 9, + MLX5_IB_MMAP_OFFSET_END = 255, +@@ -348,6 +335,7 @@ struct mlx5_ib_flow_db { + #define MLX5_IB_UPD_XLT_ACCESS BIT(5) + #define MLX5_IB_UPD_XLT_INDIRECT BIT(6) + #define MLX5_IB_UPD_XLT_DOWNGRADE BIT(7) ++#define MLX5_IB_UPD_XLT_KEEP_PGSZ BIT(8) + + /* Private QP creation flags to be passed in ib_qp_init_attr.create_flags. + * +@@ -735,6 +723,8 @@ struct mlx5_ib_mr { + struct mlx5_ib_mr *dd_crossed_mr; + struct list_head dd_node; + u8 revoked :1; ++ /* Indicates previous dmabuf page fault occurred */ ++ u8 dmabuf_faulted:1; + struct mlx5_ib_mkey null_mmkey; + }; + }; +@@ -1805,4 +1795,14 @@ mlx5_umem_mkc_find_best_pgsz(struct mlx5_ib_dev *dev, struct ib_umem *umem, + return ib_umem_find_best_pgsz(umem, bitmap, iova); + } + ++static inline unsigned long ++mlx5_umem_dmabuf_find_best_pgsz(struct ib_umem_dmabuf *umem_dmabuf, ++ int access_mode) ++{ ++ return mlx5_umem_mkc_find_best_pgsz(to_mdev(umem_dmabuf->umem.ibdev), ++ &umem_dmabuf->umem, ++ umem_dmabuf->umem.iova, ++ access_mode); ++} ++ + #endif /* MLX5_IB_H */ +diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c +index f6abd64f07f7..1c63cc0b9409 100644 +--- a/drivers/infiniband/hw/mlx5/odp.c ++++ b/drivers/infiniband/hw/mlx5/odp.c +@@ -836,9 +836,13 @@ static int pagefault_dmabuf_mr(struct mlx5_ib_mr *mr, size_t bcnt, + u32 *bytes_mapped, u32 flags) + { + struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(mr->umem); ++ int access_mode = mr->data_direct ? 
MLX5_MKC_ACCESS_MODE_KSM : ++ MLX5_MKC_ACCESS_MODE_MTT; ++ unsigned int old_page_shift = mr->page_shift; ++ unsigned int page_shift; ++ unsigned long page_size; + u32 xlt_flags = 0; + int err; +- unsigned long page_size; + + if (flags & MLX5_PF_FLAGS_ENABLE) + xlt_flags |= MLX5_IB_UPD_XLT_ENABLE; +@@ -850,20 +854,33 @@ static int pagefault_dmabuf_mr(struct mlx5_ib_mr *mr, size_t bcnt, + return err; + } + +- page_size = mlx5_umem_dmabuf_find_best_pgsz(umem_dmabuf); ++ page_size = mlx5_umem_dmabuf_find_best_pgsz(umem_dmabuf, access_mode); + if (!page_size) { + ib_umem_dmabuf_unmap_pages(umem_dmabuf); + err = -EINVAL; + } else { +- if (mr->data_direct) +- err = mlx5r_umr_update_data_direct_ksm_pas(mr, xlt_flags); +- else +- err = mlx5r_umr_update_mr_pas(mr, xlt_flags); ++ page_shift = order_base_2(page_size); ++ if (page_shift != mr->page_shift && mr->dmabuf_faulted) { ++ err = mlx5r_umr_dmabuf_update_pgsz(mr, xlt_flags, ++ page_shift); ++ } else { ++ mr->page_shift = page_shift; ++ if (mr->data_direct) ++ err = mlx5r_umr_update_data_direct_ksm_pas( ++ mr, xlt_flags); ++ else ++ err = mlx5r_umr_update_mr_pas(mr, ++ xlt_flags); ++ } + } + dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv); + +- if (err) ++ if (err) { ++ mr->page_shift = old_page_shift; + return err; ++ } ++ ++ mr->dmabuf_faulted = 1; + + if (bytes_mapped) + *bytes_mapped += bcnt; +diff --git a/drivers/infiniband/hw/mlx5/umr.c b/drivers/infiniband/hw/mlx5/umr.c +index 25601dea9e30..b097d8839cad 100644 +--- a/drivers/infiniband/hw/mlx5/umr.c ++++ b/drivers/infiniband/hw/mlx5/umr.c +@@ -659,6 +659,8 @@ static void mlx5r_umr_final_update_xlt(struct mlx5_ib_dev *dev, + wqe->ctrl_seg.mkey_mask |= get_umr_update_translation_mask(dev); + if (!mr->ibmr.length) + MLX5_SET(mkc, &wqe->mkey_seg, length64, 1); ++ if (flags & MLX5_IB_UPD_XLT_KEEP_PGSZ) ++ wqe->ctrl_seg.mkey_mask &= ~MLX5_MKEY_MASK_PAGE_SIZE; + } + + wqe->ctrl_seg.xlt_octowords = +@@ -666,46 +668,78 @@ static void mlx5r_umr_final_update_xlt(struct 
mlx5_ib_dev *dev, + wqe->data_seg.byte_count = cpu_to_be32(sg->length); + } + ++static void ++_mlx5r_umr_init_wqe(struct mlx5_ib_mr *mr, struct mlx5r_umr_wqe *wqe, ++ struct ib_sge *sg, unsigned int flags, ++ unsigned int page_shift, bool dd) ++{ ++ struct mlx5_ib_dev *dev = mr_to_mdev(mr); ++ ++ mlx5r_umr_set_update_xlt_ctrl_seg(&wqe->ctrl_seg, flags, sg); ++ mlx5r_umr_set_update_xlt_mkey_seg(dev, &wqe->mkey_seg, mr, page_shift); ++ if (dd) /* Use the data direct internal kernel PD */ ++ MLX5_SET(mkc, &wqe->mkey_seg, pd, dev->ddr.pdn); ++ mlx5r_umr_set_update_xlt_data_seg(&wqe->data_seg, sg); ++} ++ + static int +-_mlx5r_umr_update_mr_pas(struct mlx5_ib_mr *mr, unsigned int flags, bool dd) ++_mlx5r_umr_update_mr_pas(struct mlx5_ib_mr *mr, unsigned int flags, bool dd, ++ size_t start_block, size_t nblocks) + { + size_t ent_size = dd ? sizeof(struct mlx5_ksm) : sizeof(struct mlx5_mtt); + struct mlx5_ib_dev *dev = mr_to_mdev(mr); + struct device *ddev = &dev->mdev->pdev->dev; + struct mlx5r_umr_wqe wqe = {}; ++ size_t processed_blocks = 0; + struct ib_block_iter biter; ++ size_t cur_block_idx = 0; + struct mlx5_ksm *cur_ksm; + struct mlx5_mtt *cur_mtt; + size_t orig_sg_length; ++ size_t total_blocks; + size_t final_size; + void *curr_entry; + struct ib_sge sg; + void *entry; +- u64 offset = 0; ++ u64 offset; + int err = 0; + +- entry = mlx5r_umr_create_xlt(dev, &sg, +- ib_umem_num_dma_blocks(mr->umem, 1 << mr->page_shift), +- ent_size, flags); ++ total_blocks = ib_umem_num_dma_blocks(mr->umem, 1UL << mr->page_shift); ++ if (start_block > total_blocks) ++ return -EINVAL; ++ ++ /* nblocks 0 means update all blocks starting from start_block */ ++ if (nblocks) ++ total_blocks = nblocks; ++ ++ entry = mlx5r_umr_create_xlt(dev, &sg, total_blocks, ent_size, flags); + if (!entry) + return -ENOMEM; + + orig_sg_length = sg.length; +- mlx5r_umr_set_update_xlt_ctrl_seg(&wqe.ctrl_seg, flags, &sg); +- mlx5r_umr_set_update_xlt_mkey_seg(dev, &wqe.mkey_seg, mr, +- mr->page_shift); +- 
if (dd) { +- /* Use the data direct internal kernel PD */ +- MLX5_SET(mkc, &wqe.mkey_seg, pd, dev->ddr.pdn); ++ ++ _mlx5r_umr_init_wqe(mr, &wqe, &sg, flags, mr->page_shift, dd); ++ ++ /* Set initial translation offset to start_block */ ++ offset = (u64)start_block * ent_size; ++ mlx5r_umr_update_offset(&wqe.ctrl_seg, offset); ++ ++ if (dd) + cur_ksm = entry; +- } else { ++ else + cur_mtt = entry; +- } +- +- mlx5r_umr_set_update_xlt_data_seg(&wqe.data_seg, &sg); + + curr_entry = entry; ++ + rdma_umem_for_each_dma_block(mr->umem, &biter, BIT(mr->page_shift)) { ++ if (cur_block_idx < start_block) { ++ cur_block_idx++; ++ continue; ++ } ++ ++ if (nblocks && processed_blocks >= nblocks) ++ break; ++ + if (curr_entry == entry + sg.length) { + dma_sync_single_for_device(ddev, sg.addr, sg.length, + DMA_TO_DEVICE); +@@ -727,6 +761,11 @@ _mlx5r_umr_update_mr_pas(struct mlx5_ib_mr *mr, unsigned int flags, bool dd) + if (dd) { + cur_ksm->va = cpu_to_be64(rdma_block_iter_dma_address(&biter)); + cur_ksm->key = cpu_to_be32(dev->ddr.mkey); ++ if (mr->umem->is_dmabuf && ++ (flags & MLX5_IB_UPD_XLT_ZAP)) { ++ cur_ksm->va = 0; ++ cur_ksm->key = 0; ++ } + cur_ksm++; + curr_entry = cur_ksm; + } else { +@@ -738,6 +777,8 @@ _mlx5r_umr_update_mr_pas(struct mlx5_ib_mr *mr, unsigned int flags, bool dd) + cur_mtt++; + curr_entry = cur_mtt; + } ++ ++ processed_blocks++; + } + + final_size = curr_entry - entry; +@@ -754,13 +795,32 @@ _mlx5r_umr_update_mr_pas(struct mlx5_ib_mr *mr, unsigned int flags, bool dd) + return err; + } + +-int mlx5r_umr_update_data_direct_ksm_pas(struct mlx5_ib_mr *mr, unsigned int flags) ++int mlx5r_umr_update_data_direct_ksm_pas_range(struct mlx5_ib_mr *mr, ++ unsigned int flags, ++ size_t start_block, ++ size_t nblocks) + { + /* No invalidation flow is expected */ +- if (WARN_ON(!mr->umem->is_dmabuf) || (flags & MLX5_IB_UPD_XLT_ZAP)) ++ if (WARN_ON(!mr->umem->is_dmabuf) || ((flags & MLX5_IB_UPD_XLT_ZAP) && ++ !(flags & MLX5_IB_UPD_XLT_KEEP_PGSZ))) + return -EINVAL; 
+ +- return _mlx5r_umr_update_mr_pas(mr, flags, true); ++ return _mlx5r_umr_update_mr_pas(mr, flags, true, start_block, nblocks); ++} ++ ++int mlx5r_umr_update_data_direct_ksm_pas(struct mlx5_ib_mr *mr, ++ unsigned int flags) ++{ ++ return mlx5r_umr_update_data_direct_ksm_pas_range(mr, flags, 0, 0); ++} ++ ++int mlx5r_umr_update_mr_pas_range(struct mlx5_ib_mr *mr, unsigned int flags, ++ size_t start_block, size_t nblocks) ++{ ++ if (WARN_ON(mr->umem->is_odp)) ++ return -EINVAL; ++ ++ return _mlx5r_umr_update_mr_pas(mr, flags, false, start_block, nblocks); + } + + /* +@@ -770,10 +830,7 @@ int mlx5r_umr_update_data_direct_ksm_pas(struct mlx5_ib_mr *mr, unsigned int fla + */ + int mlx5r_umr_update_mr_pas(struct mlx5_ib_mr *mr, unsigned int flags) + { +- if (WARN_ON(mr->umem->is_odp)) +- return -EINVAL; +- +- return _mlx5r_umr_update_mr_pas(mr, flags, false); ++ return mlx5r_umr_update_mr_pas_range(mr, flags, 0, 0); + } + + static bool umr_can_use_indirect_mkey(struct mlx5_ib_dev *dev) +@@ -866,3 +923,202 @@ int mlx5r_umr_update_xlt(struct mlx5_ib_mr *mr, u64 idx, int npages, + mlx5r_umr_unmap_free_xlt(dev, xlt, &sg); + return err; + } ++ ++/* ++ * Update only the page-size (log_page_size) field of an existing memory key ++ * using UMR. This is useful when the MR's physical layout stays the same ++ * but the optimal page shift has changed (e.g. dmabuf after pages are ++ * pinned and the HW can switch from 4K to huge-page alignment). 
++ */ ++int mlx5r_umr_update_mr_page_shift(struct mlx5_ib_mr *mr, ++ unsigned int page_shift, ++ bool dd) ++{ ++ struct mlx5_ib_dev *dev = mr_to_mdev(mr); ++ struct mlx5r_umr_wqe wqe = {}; ++ int err; ++ ++ /* Build UMR wqe: we touch only PAGE_SIZE, so use the dedicated mask */ ++ wqe.ctrl_seg.mkey_mask = get_umr_update_translation_mask(dev); ++ ++ /* MR must be free while page size is modified */ ++ wqe.ctrl_seg.flags = MLX5_UMR_CHECK_FREE | MLX5_UMR_INLINE; ++ ++ /* Fill mkey segment with the new page size, keep the rest unchanged */ ++ MLX5_SET(mkc, &wqe.mkey_seg, log_page_size, page_shift); ++ ++ if (dd) ++ MLX5_SET(mkc, &wqe.mkey_seg, pd, dev->ddr.pdn); ++ else ++ MLX5_SET(mkc, &wqe.mkey_seg, pd, to_mpd(mr->ibmr.pd)->pdn); ++ ++ MLX5_SET64(mkc, &wqe.mkey_seg, start_addr, mr->ibmr.iova); ++ MLX5_SET64(mkc, &wqe.mkey_seg, len, mr->ibmr.length); ++ MLX5_SET(mkc, &wqe.mkey_seg, qpn, 0xffffff); ++ MLX5_SET(mkc, &wqe.mkey_seg, mkey_7_0, ++ mlx5_mkey_variant(mr->mmkey.key)); ++ ++ err = mlx5r_umr_post_send_wait(dev, mr->mmkey.key, &wqe, false); ++ if (!err) ++ mr->page_shift = page_shift; ++ ++ return err; ++} ++ ++static inline int ++_mlx5r_dmabuf_umr_update_pas(struct mlx5_ib_mr *mr, unsigned int flags, ++ size_t start_block, size_t nblocks, bool dd) ++{ ++ if (dd) ++ return mlx5r_umr_update_data_direct_ksm_pas_range(mr, flags, ++ start_block, ++ nblocks); ++ else ++ return mlx5r_umr_update_mr_pas_range(mr, flags, start_block, ++ nblocks); ++} ++ ++/** ++ * This function makes an mkey non-present by zapping the translation entries of ++ * the mkey by zapping (zeroing out) the first N entries, where N is determined ++ * by the largest page size supported by the device and the MR length. ++ * It then updates the mkey's page size to the largest possible value, ensuring ++ * the MR is completely non-present and safe for further updates. ++ * It is useful to update the page size of a dmabuf MR on a page fault. 
++ * ++ * Return: On success, returns the number of entries that were zapped. ++ * On error, returns a negative error code. ++ */ ++static int _mlx5r_umr_zap_mkey(struct mlx5_ib_mr *mr, ++ unsigned int flags, ++ unsigned int page_shift, ++ bool dd) ++{ ++ unsigned int old_page_shift = mr->page_shift; ++ struct mlx5_ib_dev *dev = mr_to_mdev(mr); ++ unsigned int max_page_shift; ++ size_t page_shift_nblocks; ++ unsigned int max_log_size; ++ int access_mode; ++ size_t nblocks; ++ int err; ++ ++ access_mode = dd ? MLX5_MKC_ACCESS_MODE_KSM : MLX5_MKC_ACCESS_MODE_MTT; ++ flags |= MLX5_IB_UPD_XLT_KEEP_PGSZ | MLX5_IB_UPD_XLT_ZAP | ++ MLX5_IB_UPD_XLT_ATOMIC; ++ max_log_size = get_max_log_entity_size_cap(dev, access_mode); ++ max_page_shift = order_base_2(mr->ibmr.length); ++ max_page_shift = min(max(max_page_shift, page_shift), max_log_size); ++ /* Count blocks in units of max_page_shift, we will zap exactly this ++ * many to make the whole MR non-present. ++ * Block size must be aligned to MLX5_UMR_FLEX_ALIGNMENT since it may ++ * be used as offset into the XLT later on. ++ */ ++ nblocks = ib_umem_num_dma_blocks(mr->umem, 1UL << max_page_shift); ++ if (dd) ++ nblocks = ALIGN(nblocks, MLX5_UMR_KSM_NUM_ENTRIES_ALIGNMENT); ++ else ++ nblocks = ALIGN(nblocks, MLX5_UMR_MTT_NUM_ENTRIES_ALIGNMENT); ++ page_shift_nblocks = ib_umem_num_dma_blocks(mr->umem, ++ 1UL << page_shift); ++ /* If the number of blocks at max possible page shift is greater than ++ * the number of blocks at the new page size, we should just go over the ++ * whole mkey entries. ++ */ ++ if (nblocks >= page_shift_nblocks) ++ nblocks = 0; ++ ++ /* Make the first nblocks entries non-present without changing ++ * page size yet. ++ */ ++ if (nblocks) ++ mr->page_shift = max_page_shift; ++ err = _mlx5r_dmabuf_umr_update_pas(mr, flags, 0, nblocks, dd); ++ if (err) { ++ mr->page_shift = old_page_shift; ++ return err; ++ } ++ ++ /* Change page size to the max page size now that the MR is completely ++ * non-present. 
++ */ ++ if (nblocks) { ++ err = mlx5r_umr_update_mr_page_shift(mr, max_page_shift, dd); ++ if (err) { ++ mr->page_shift = old_page_shift; ++ return err; ++ } ++ } ++ ++ return err ? err : nblocks; ++} ++ ++/** ++ * mlx5r_umr_dmabuf_update_pgsz - Safely update DMABUF MR page size and its ++ * entries accordingly ++ * @mr: The memory region to update ++ * @xlt_flags: Translation table update flags ++ * @page_shift: The new (optimized) page shift to use ++ * ++ * This function updates the page size and mkey translation entries for a DMABUF ++ * MR in a safe, multi-step process to avoid exposing partially updated mappings ++ * The update is performed in 5 steps: ++ * 1. Make the first X entries non-present, while X is calculated to be ++ * minimal according to a large page shift that can be used to cover the ++ * MR length. ++ * 2. Update the page size to the large supported page size ++ * 3. Load the remaining N-X entries according to the (optimized) page_shift ++ * 4. Update the page size according to the (optimized) page_shift ++ * 5. Load the first X entries with the correct translations ++ * ++ * This ensures that at no point is the MR accessible with a partially updated ++ * translation table, maintaining correctness and preventing access to stale or ++ * inconsistent mappings. ++ * ++ * Returns 0 on success or a negative error code on failure. 
++ */ ++int mlx5r_umr_dmabuf_update_pgsz(struct mlx5_ib_mr *mr, u32 xlt_flags, ++ unsigned int page_shift) ++{ ++ unsigned int old_page_shift = mr->page_shift; ++ size_t zapped_blocks; ++ size_t total_blocks; ++ int err; ++ ++ zapped_blocks = _mlx5r_umr_zap_mkey(mr, xlt_flags, page_shift, ++ mr->data_direct); ++ if (zapped_blocks < 0) ++ return zapped_blocks; ++ ++ /* _mlx5r_umr_zap_mkey already enables the mkey */ ++ xlt_flags &= ~MLX5_IB_UPD_XLT_ENABLE; ++ mr->page_shift = page_shift; ++ total_blocks = ib_umem_num_dma_blocks(mr->umem, 1UL << mr->page_shift); ++ if (zapped_blocks && zapped_blocks < total_blocks) { ++ /* Update PAS according to the new page size but don't update ++ * the page size in the mkey yet. ++ */ ++ err = _mlx5r_dmabuf_umr_update_pas( ++ mr, ++ xlt_flags | MLX5_IB_UPD_XLT_KEEP_PGSZ, ++ zapped_blocks, ++ total_blocks - zapped_blocks, ++ mr->data_direct); ++ if (err) ++ goto err; ++ } ++ ++ err = mlx5r_umr_update_mr_page_shift(mr, mr->page_shift, ++ mr->data_direct); ++ if (err) ++ goto err; ++ err = _mlx5r_dmabuf_umr_update_pas(mr, xlt_flags, 0, zapped_blocks, ++ mr->data_direct); ++ if (err) ++ goto err; ++ ++ return 0; ++err: ++ mr->page_shift = old_page_shift; ++ return err; ++} +diff --git a/drivers/infiniband/hw/mlx5/umr.h b/drivers/infiniband/hw/mlx5/umr.h +index 4a02c9b5aad8..e9361f0140e7 100644 +--- a/drivers/infiniband/hw/mlx5/umr.h ++++ b/drivers/infiniband/hw/mlx5/umr.h +@@ -94,9 +94,20 @@ struct mlx5r_umr_wqe { + int mlx5r_umr_revoke_mr(struct mlx5_ib_mr *mr); + int mlx5r_umr_rereg_pd_access(struct mlx5_ib_mr *mr, struct ib_pd *pd, + int access_flags); +-int mlx5r_umr_update_mr_pas(struct mlx5_ib_mr *mr, unsigned int flags); ++int mlx5r_umr_update_data_direct_ksm_pas_range(struct mlx5_ib_mr *mr, ++ unsigned int flags, ++ size_t start_block, ++ size_t nblocks); + int mlx5r_umr_update_data_direct_ksm_pas(struct mlx5_ib_mr *mr, unsigned int flags); ++int mlx5r_umr_update_mr_pas_range(struct mlx5_ib_mr *mr, unsigned int flags, ++ 
size_t start_block, size_t nblocks); ++int mlx5r_umr_update_mr_pas(struct mlx5_ib_mr *mr, unsigned int flags); + int mlx5r_umr_update_xlt(struct mlx5_ib_mr *mr, u64 idx, int npages, + int page_shift, int flags); ++int mlx5r_umr_update_mr_page_shift(struct mlx5_ib_mr *mr, ++ unsigned int page_shift, ++ bool dd); ++int mlx5r_umr_dmabuf_update_pgsz(struct mlx5_ib_mr *mr, u32 xlt_flags, ++ unsigned int page_shift); + + #endif /* _MLX5_IB_UMR_H */ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1501-rdma-mlx5-remove-redundant-check-on-err-on-return-expression.patch b/SOURCES/1501-rdma-mlx5-remove-redundant-check-on-err-on-return-expression.patch new file mode 100644 index 000000000..7a807cde2 --- /dev/null +++ b/SOURCES/1501-rdma-mlx5-remove-redundant-check-on-err-on-return-expression.patch @@ -0,0 +1,41 @@ +From 46a6cf3e0875d7e11fe831ffceb067c6f3707c78 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:35:09 -0400 +Subject: [PATCH] RDMA/mlx5: remove redundant check on err on return expression + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit aee80e6ffc5878a90ca5c16760b2c4f3f3d7343f +Author: Colin Ian King +Date: Thu Jul 17 12:21:08 2025 +0100 + + RDMA/mlx5: remove redundant check on err on return expression + + Currently all paths that set err and then check it for an error + perform immediate returns, hence err always zero at the end of + the function _mlx5r_umr_zap_mkey. The return expression + err ? err : nblocks has a redundant check on the err since err + is always zero, so just return nblocks instead. 
+ + Signed-off-by: Colin Ian King + Link: https://patch.msgid.link/20250717112108.4036171-1-colin.i.king@gmail.com + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/umr.c b/drivers/infiniband/hw/mlx5/umr.c +index b097d8839cad..fa5c4ea685b9 100644 +--- a/drivers/infiniband/hw/mlx5/umr.c ++++ b/drivers/infiniband/hw/mlx5/umr.c +@@ -1050,7 +1050,7 @@ static int _mlx5r_umr_zap_mkey(struct mlx5_ib_mr *mr, + } + } + +- return err ? err : nblocks; ++ return nblocks; + } + + /** +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1502-rdma-mlx5-fix-returned-type-from-mlx5r-umr-zap-mkey.patch b/SOURCES/1502-rdma-mlx5-fix-returned-type-from-mlx5r-umr-zap-mkey.patch new file mode 100644 index 000000000..f5791c5e1 --- /dev/null +++ b/SOURCES/1502-rdma-mlx5-fix-returned-type-from-mlx5r-umr-zap-mkey.patch @@ -0,0 +1,119 @@ +From e83f0e3b0976e6266badf6107c3ea4f65e21687c Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:35:09 -0400 +Subject: [PATCH] RDMA/mlx5: Fix returned type from _mlx5r_umr_zap_mkey() + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit d59ebb4549ff9bdba7abf6a5246a749e7f4a36ed +Author: Leon Romanovsky +Date: Sun Jul 20 12:25:34 2025 +0300 + + RDMA/mlx5: Fix returned type from _mlx5r_umr_zap_mkey() + + As Colin reported: + "The variable zapped_blocks is a size_t type and is being assigned a int + return value from the call to _mlx5r_umr_zap_mkey. Since zapped_blocks is an + unsigned type, the error check for zapped_blocks < 0 will never be true." + + So separate return error and nblocks assignment. 
+ + Fixes: e73242aa14d2 ("RDMA/mlx5: Optimize DMABUF mkey page size") + Reported-by: Colin King (gmail) + Closes: https://lore.kernel.org/all/79166fb1-3b73-4d37-af02-a17b22eb8e64@gmail.com + Link: https://patch.msgid.link/71d8ea208ac7eaa4438af683b9afaed78625e419.1753003467.git.leon@kernel.org + Reviewed-by: Zhu Yanjun + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/umr.c b/drivers/infiniband/hw/mlx5/umr.c +index fa5c4ea685b9..054f6dae2415 100644 +--- a/drivers/infiniband/hw/mlx5/umr.c ++++ b/drivers/infiniband/hw/mlx5/umr.c +@@ -992,6 +992,7 @@ _mlx5r_dmabuf_umr_update_pas(struct mlx5_ib_mr *mr, unsigned int flags, + static int _mlx5r_umr_zap_mkey(struct mlx5_ib_mr *mr, + unsigned int flags, + unsigned int page_shift, ++ size_t *nblocks, + bool dd) + { + unsigned int old_page_shift = mr->page_shift; +@@ -1000,7 +1001,6 @@ static int _mlx5r_umr_zap_mkey(struct mlx5_ib_mr *mr, + size_t page_shift_nblocks; + unsigned int max_log_size; + int access_mode; +- size_t nblocks; + int err; + + access_mode = dd ? MLX5_MKC_ACCESS_MODE_KSM : MLX5_MKC_ACCESS_MODE_MTT; +@@ -1014,26 +1014,26 @@ static int _mlx5r_umr_zap_mkey(struct mlx5_ib_mr *mr, + * Block size must be aligned to MLX5_UMR_FLEX_ALIGNMENT since it may + * be used as offset into the XLT later on. + */ +- nblocks = ib_umem_num_dma_blocks(mr->umem, 1UL << max_page_shift); ++ *nblocks = ib_umem_num_dma_blocks(mr->umem, 1UL << max_page_shift); + if (dd) +- nblocks = ALIGN(nblocks, MLX5_UMR_KSM_NUM_ENTRIES_ALIGNMENT); ++ *nblocks = ALIGN(*nblocks, MLX5_UMR_KSM_NUM_ENTRIES_ALIGNMENT); + else +- nblocks = ALIGN(nblocks, MLX5_UMR_MTT_NUM_ENTRIES_ALIGNMENT); ++ *nblocks = ALIGN(*nblocks, MLX5_UMR_MTT_NUM_ENTRIES_ALIGNMENT); + page_shift_nblocks = ib_umem_num_dma_blocks(mr->umem, + 1UL << page_shift); + /* If the number of blocks at max possible page shift is greater than + * the number of blocks at the new page size, we should just go over the + * whole mkey entries. 
+ */ +- if (nblocks >= page_shift_nblocks) +- nblocks = 0; ++ if (*nblocks >= page_shift_nblocks) ++ *nblocks = 0; + + /* Make the first nblocks entries non-present without changing + * page size yet. + */ +- if (nblocks) ++ if (*nblocks) + mr->page_shift = max_page_shift; +- err = _mlx5r_dmabuf_umr_update_pas(mr, flags, 0, nblocks, dd); ++ err = _mlx5r_dmabuf_umr_update_pas(mr, flags, 0, *nblocks, dd); + if (err) { + mr->page_shift = old_page_shift; + return err; +@@ -1042,7 +1042,7 @@ static int _mlx5r_umr_zap_mkey(struct mlx5_ib_mr *mr, + /* Change page size to the max page size now that the MR is completely + * non-present. + */ +- if (nblocks) { ++ if (*nblocks) { + err = mlx5r_umr_update_mr_page_shift(mr, max_page_shift, dd); + if (err) { + mr->page_shift = old_page_shift; +@@ -1050,7 +1050,7 @@ static int _mlx5r_umr_zap_mkey(struct mlx5_ib_mr *mr, + } + } + +- return nblocks; ++ return 0; + } + + /** +@@ -1085,10 +1085,10 @@ int mlx5r_umr_dmabuf_update_pgsz(struct mlx5_ib_mr *mr, u32 xlt_flags, + size_t total_blocks; + int err; + +- zapped_blocks = _mlx5r_umr_zap_mkey(mr, xlt_flags, page_shift, +- mr->data_direct); +- if (zapped_blocks < 0) +- return zapped_blocks; ++ err = _mlx5r_umr_zap_mkey(mr, xlt_flags, page_shift, &zapped_blocks, ++ mr->data_direct); ++ if (err) ++ return err; + + /* _mlx5r_umr_zap_mkey already enables the mkey */ + xlt_flags &= ~MLX5_IB_UPD_XLT_ENABLE; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1503-rdma-mlx5-fix-incorrect-mkey-masking.patch b/SOURCES/1503-rdma-mlx5-fix-incorrect-mkey-masking.patch new file mode 100644 index 000000000..75d4f7aca --- /dev/null +++ b/SOURCES/1503-rdma-mlx5-fix-incorrect-mkey-masking.patch @@ -0,0 +1,43 @@ +From 303a5934f15321dfa8360cd86a148655b9737ff7 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:35:09 -0400 +Subject: [PATCH] RDMA/mlx5: Fix incorrect MKEY masking + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit b83440736864ad96f863666fea49bd14ab17547d 
+Author: Leon Romanovsky +Date: Sun Jul 20 12:25:35 2025 +0300 + + RDMA/mlx5: Fix incorrect MKEY masking + + mkey_mask is __be64 type, while MLX5_MKEY_MASK_PAGE_SIZE is declared as + unsigned long long. This causes to the static checkers errors: + + drivers/infiniband/hw/mlx5/umr.c:663:49: warning: invalid assignment: &= + drivers/infiniband/hw/mlx5/umr.c:663:49: left side has type restricted __be64 + drivers/infiniband/hw/mlx5/umr.c:663:49: right side has type int + + Fixes: e73242aa14d2 ("RDMA/mlx5: Optimize DMABUF mkey page size") + Link: https://patch.msgid.link/e354d70b98dfa5ecf4c236a36cd36b64add9d9de.1753003467.git.leon@kernel.org + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/umr.c b/drivers/infiniband/hw/mlx5/umr.c +index 054f6dae2415..7ef35cddce81 100644 +--- a/drivers/infiniband/hw/mlx5/umr.c ++++ b/drivers/infiniband/hw/mlx5/umr.c +@@ -660,7 +660,8 @@ static void mlx5r_umr_final_update_xlt(struct mlx5_ib_dev *dev, + if (!mr->ibmr.length) + MLX5_SET(mkc, &wqe->mkey_seg, length64, 1); + if (flags & MLX5_IB_UPD_XLT_KEEP_PGSZ) +- wqe->ctrl_seg.mkey_mask &= ~MLX5_MKEY_MASK_PAGE_SIZE; ++ wqe->ctrl_seg.mkey_mask &= ++ cpu_to_be64(~MLX5_MKEY_MASK_PAGE_SIZE); + } + + wqe->ctrl_seg.xlt_octowords = +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1504-rdma-mlx5-add-dmah-object-support.patch b/SOURCES/1504-rdma-mlx5-add-dmah-object-support.patch new file mode 100644 index 000000000..dab2e1f59 --- /dev/null +++ b/SOURCES/1504-rdma-mlx5-add-dmah-object-support.patch @@ -0,0 +1,171 @@ +From af56cc5d21a505cbc7dd28b09e0de1b09cb1fc3e Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:35:09 -0400 +Subject: [PATCH] RDMA/mlx5: Add DMAH object support + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 3c819070754c3e81ad7be07e77fad83a658022f7 +Author: Yishai Hadas +Date: Thu Jul 17 15:17:30 2025 +0300 + + RDMA/mlx5: Add DMAH object support + + This patch introduces support for 
allocating and deallocating the DMAH + object. + + Further details: + ---------------- + The DMAH API is exposed to upper layers only if the underlying device + supports TPH. + + It uses the mlx5_core steering tag (ST) APIs to get a steering tag index + based on the provided input. + + The obtained index is stored in the device-specific mlx5_dmah structure + for future use. + + Upcoming patches in the series will integrate the allocated DMAH into + the memory region (MR) registration process. + + Signed-off-by: Yishai Hadas + Reviewed-by: Edward Srouji + Link: https://patch.msgid.link/778550776799d82edb4d05da249a1cff00160b50.1752752567.git.leon@kernel.org + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/Makefile b/drivers/infiniband/hw/mlx5/Makefile +index 11878ddf7cc7..dd7bb377f491 100644 +--- a/drivers/infiniband/hw/mlx5/Makefile ++++ b/drivers/infiniband/hw/mlx5/Makefile +@@ -8,6 +8,7 @@ mlx5_ib-y := ah.o \ + cq.o \ + data_direct.o \ + dm.o \ ++ dmah.o \ + doorbell.o \ + fs.o \ + gsi.o \ +diff --git a/drivers/infiniband/hw/mlx5/dmah.c b/drivers/infiniband/hw/mlx5/dmah.c +new file mode 100644 +index 000000000000..362a88992ffa +--- /dev/null ++++ b/drivers/infiniband/hw/mlx5/dmah.c +@@ -0,0 +1,54 @@ ++// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB ++/* ++ * Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. 
All rights reserved ++ */ ++ ++#include ++#include ++#include "dmah.h" ++ ++#define UVERBS_MODULE_NAME mlx5_ib ++#include ++ ++static int mlx5_ib_alloc_dmah(struct ib_dmah *ibdmah, ++ struct uverbs_attr_bundle *attrs) ++{ ++ struct mlx5_core_dev *mdev = to_mdev(ibdmah->device)->mdev; ++ struct mlx5_ib_dmah *dmah = to_mdmah(ibdmah); ++ u16 st_bits = BIT(IB_DMAH_CPU_ID_EXISTS) | ++ BIT(IB_DMAH_MEM_TYPE_EXISTS); ++ int err; ++ ++ /* PH is a must for TPH following PCIe spec 6.2-1.0 */ ++ if (!(ibdmah->valid_fields & BIT(IB_DMAH_PH_EXISTS))) ++ return -EINVAL; ++ ++ /* ST is optional; however, partial data for it is not allowed */ ++ if (ibdmah->valid_fields & st_bits) { ++ if ((ibdmah->valid_fields & st_bits) != st_bits) ++ return -EINVAL; ++ err = mlx5_st_alloc_index(mdev, ibdmah->mem_type, ++ ibdmah->cpu_id, &dmah->st_index); ++ if (err) ++ return err; ++ } ++ ++ return 0; ++} ++ ++static int mlx5_ib_dealloc_dmah(struct ib_dmah *ibdmah, ++ struct uverbs_attr_bundle *attrs) ++{ ++ struct mlx5_ib_dmah *dmah = to_mdmah(ibdmah); ++ struct mlx5_core_dev *mdev = to_mdev(ibdmah->device)->mdev; ++ ++ if (ibdmah->valid_fields & BIT(IB_DMAH_CPU_ID_EXISTS)) ++ return mlx5_st_dealloc_index(mdev, dmah->st_index); ++ ++ return 0; ++} ++ ++const struct ib_device_ops mlx5_ib_dev_dmah_ops = { ++ .alloc_dmah = mlx5_ib_alloc_dmah, ++ .dealloc_dmah = mlx5_ib_dealloc_dmah, ++}; +diff --git a/drivers/infiniband/hw/mlx5/dmah.h b/drivers/infiniband/hw/mlx5/dmah.h +new file mode 100644 +index 000000000000..68de72b4744a +--- /dev/null ++++ b/drivers/infiniband/hw/mlx5/dmah.h +@@ -0,0 +1,23 @@ ++/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */ ++/* ++ * Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. 
All rights reserved ++ */ ++ ++#ifndef _MLX5_IB_DMAH_H ++#define _MLX5_IB_DMAH_H ++ ++#include "mlx5_ib.h" ++ ++extern const struct ib_device_ops mlx5_ib_dev_dmah_ops; ++ ++struct mlx5_ib_dmah { ++ struct ib_dmah ibdmah; ++ u16 st_index; ++}; ++ ++static inline struct mlx5_ib_dmah *to_mdmah(struct ib_dmah *ibdmah) ++{ ++ return container_of(ibdmah, struct mlx5_ib_dmah, ibdmah); ++} ++ ++#endif /* _MLX5_IB_DMAH_H */ +diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c +index bdd82b371ba0..08e4e6d85f7b 100644 +--- a/drivers/infiniband/hw/mlx5/main.c ++++ b/drivers/infiniband/hw/mlx5/main.c +@@ -50,6 +50,7 @@ + #include + #include "macsec.h" + #include "data_direct.h" ++#include "dmah.h" + + #define UVERBS_MODULE_NAME mlx5_ib + #include +@@ -4214,6 +4215,7 @@ static const struct ib_device_ops mlx5_ib_dev_ops = { + INIT_RDMA_OBJ_SIZE(ib_ah, mlx5_ib_ah, ibah), + INIT_RDMA_OBJ_SIZE(ib_counters, mlx5_ib_mcounters, ibcntrs), + INIT_RDMA_OBJ_SIZE(ib_cq, mlx5_ib_cq, ibcq), ++ INIT_RDMA_OBJ_SIZE(ib_dmah, mlx5_ib_dmah, ibdmah), + INIT_RDMA_OBJ_SIZE(ib_pd, mlx5_ib_pd, ibpd), + INIT_RDMA_OBJ_SIZE(ib_qp, mlx5_ib_qp, ibqp), + INIT_RDMA_OBJ_SIZE(ib_srq, mlx5_ib_srq, ibsrq), +@@ -4341,6 +4343,9 @@ static int mlx5_ib_stage_caps_init(struct mlx5_ib_dev *dev) + MLX5_GENERAL_OBJ_TYPES_CAP_SW_ICM) + ib_set_device_ops(&dev->ib_dev, &mlx5_ib_dev_dm_ops); + ++ if (mdev->st) ++ ib_set_device_ops(&dev->ib_dev, &mlx5_ib_dev_dmah_ops); ++ + ib_set_device_ops(&dev->ib_dev, &mlx5_ib_dev_ops); + + if (IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS)) +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1505-rdma-mlx5-add-dmah-support-for-reg-user-mr-reg-user-dmabuf-m.patch b/SOURCES/1505-rdma-mlx5-add-dmah-support-for-reg-user-mr-reg-user-dmabuf-m.patch new file mode 100644 index 000000000..30f8479c4 --- /dev/null +++ b/SOURCES/1505-rdma-mlx5-add-dmah-support-for-reg-user-mr-reg-user-dmabuf-m.patch @@ -0,0 +1,394 @@ +From 44f646c103b3c937ffa87d5233c889c6cf0092dd Mon Sep 17 
00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:35:09 -0400 +Subject: [PATCH] RDMA/mlx5: Add DMAH support for + reg_user_mr/reg_user_dmabuf_mr + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit e1bed9a94da86a7c01b985c2e9a030207269cbc7 +Author: Yishai Hadas +Date: Thu Jul 17 15:17:32 2025 +0300 + + RDMA/mlx5: Add DMAH support for reg_user_mr/reg_user_dmabuf_mr + + As part of this enhancement, allow the creation of an MKEY associated + with a DMA handle. + + Additional notes: + + MKEYs with TPH (i.e. TLP Processing Hints) attributes are currently not + UMR-capable unless explicitly enabled by firmware or hardware. + Therefore, to maintain such MKEYs in the MR cache, the TPH fields have + been added to the rb_key structure, with a dedicated hash bucket. + + The ability to bypass the kernel verbs flow and create an MKEY with TPH + attributes using DEVX has been restricted. TPH must follow the standard + InfiniBand flow, where a DMAH is created with the appropriate security + checks and management mechanisms in place. + + DMA handles are currently not supported in conjunction with On-Demand + Paging (ODP). + + Re-registration of memory regions originally created with TPH attributes + is currently not supported. 
+ + Signed-off-by: Yishai Hadas + Reviewed-by: Edward Srouji + Link: https://patch.msgid.link/1c485651cf8417694ddebb80446c5093d5a791a9.1752752567.git.leon@kernel.org + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c +index f96c46b93bec..8694df5bf5ae 100644 +--- a/drivers/infiniband/hw/mlx5/devx.c ++++ b/drivers/infiniband/hw/mlx5/devx.c +@@ -1393,6 +1393,10 @@ static int devx_handle_mkey_create(struct mlx5_ib_dev *dev, + } + + MLX5_SET(create_mkey_in, in, mkey_umem_valid, 1); ++ /* TPH is not allowed to bypass the regular kernel's verbs flow */ ++ MLX5_SET(mkc, mkc, pcie_tph_en, 0); ++ MLX5_SET(mkc, mkc, pcie_tph_steering_tag_index, ++ MLX5_MKC_PCIE_TPH_NO_STEERING_TAG_INDEX); + return 0; + } + +diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h +index 326fb13484de..e64997ba2f59 100644 +--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h ++++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h +@@ -634,8 +634,13 @@ enum mlx5_mkey_type { + MLX5_MKEY_IMPLICIT_CHILD, + }; + ++/* Used for non-existent ph value */ ++#define MLX5_IB_NO_PH 0xff ++ + struct mlx5r_cache_rb_key { + u8 ats:1; ++ u8 ph; ++ u16 st_index; + unsigned int access_mode; + unsigned int access_flags; + unsigned int ndescs; +diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c +index 0a31f8c248e4..1317f2cb38a4 100644 +--- a/drivers/infiniband/hw/mlx5/mr.c ++++ b/drivers/infiniband/hw/mlx5/mr.c +@@ -44,6 +44,7 @@ + #include "mlx5_ib.h" + #include "umr.h" + #include "data_direct.h" ++#include "dmah.h" + + enum { + MAX_PENDING_REG_MR = 8, +@@ -57,7 +58,7 @@ create_mkey_callback(int status, struct mlx5_async_work *context); + static struct mlx5_ib_mr *reg_create(struct ib_pd *pd, struct ib_umem *umem, + u64 iova, int access_flags, + unsigned long page_size, bool populate, +- int access_mode); ++ int access_mode, u16 st_index, u8 ph); + static int 
__mlx5_ib_dereg_mr(struct ib_mr *ibmr); + + static void set_mkc_access_pd_addr_fields(void *mkc, int acc, u64 start_addr, +@@ -256,6 +257,14 @@ static void set_cache_mkc(struct mlx5_cache_ent *ent, void *mkc) + get_mkc_octo_size(ent->rb_key.access_mode, + ent->rb_key.ndescs)); + MLX5_SET(mkc, mkc, log_page_size, PAGE_SHIFT); ++ ++ if (ent->rb_key.ph != MLX5_IB_NO_PH) { ++ MLX5_SET(mkc, mkc, pcie_tph_en, 1); ++ MLX5_SET(mkc, mkc, pcie_tph_ph, ent->rb_key.ph); ++ if (ent->rb_key.st_index != MLX5_MKC_PCIE_TPH_NO_STEERING_TAG_INDEX) ++ MLX5_SET(mkc, mkc, pcie_tph_steering_tag_index, ++ ent->rb_key.st_index); ++ } + } + + /* Asynchronously schedule new MRs to be populated in the cache. */ +@@ -641,6 +650,14 @@ static int cache_ent_key_cmp(struct mlx5r_cache_rb_key key1, + if (res) + return res; + ++ res = key1.st_index - key2.st_index; ++ if (res) ++ return res; ++ ++ res = key1.ph - key2.ph; ++ if (res) ++ return res; ++ + /* + * keep ndescs the last in the compare table since the find function + * searches for an exact match on all properties and only closest +@@ -712,6 +729,8 @@ mkey_cache_ent_from_rb_key(struct mlx5_ib_dev *dev, + smallest->rb_key.access_mode == rb_key.access_mode && + smallest->rb_key.access_flags == rb_key.access_flags && + smallest->rb_key.ats == rb_key.ats && ++ smallest->rb_key.st_index == rb_key.st_index && ++ smallest->rb_key.ph == rb_key.ph && + smallest->rb_key.ndescs <= ndescs_limit) ? 
+ smallest : + NULL; +@@ -786,7 +805,8 @@ struct mlx5_ib_mr *mlx5_mr_cache_alloc(struct mlx5_ib_dev *dev, + struct mlx5r_cache_rb_key rb_key = { + .ndescs = ndescs, + .access_mode = access_mode, +- .access_flags = get_unchangeable_access_flags(dev, access_flags) ++ .access_flags = get_unchangeable_access_flags(dev, access_flags), ++ .ph = MLX5_IB_NO_PH, + }; + struct mlx5_cache_ent *ent = mkey_cache_ent_from_rb_key(dev, rb_key); + +@@ -943,6 +963,7 @@ int mlx5_mkey_cache_init(struct mlx5_ib_dev *dev) + struct rb_root *root = &dev->cache.rb_root; + struct mlx5r_cache_rb_key rb_key = { + .access_mode = MLX5_MKC_ACCESS_MODE_MTT, ++ .ph = MLX5_IB_NO_PH, + }; + struct mlx5_cache_ent *ent; + struct rb_node *node; +@@ -1119,7 +1140,8 @@ static unsigned int mlx5_umem_dmabuf_default_pgsz(struct ib_umem *umem, + + static struct mlx5_ib_mr *alloc_cacheable_mr(struct ib_pd *pd, + struct ib_umem *umem, u64 iova, +- int access_flags, int access_mode) ++ int access_flags, int access_mode, ++ u16 st_index, u8 ph) + { + struct mlx5_ib_dev *dev = to_mdev(pd->device); + struct mlx5r_cache_rb_key rb_key = {}; +@@ -1139,6 +1161,8 @@ static struct mlx5_ib_mr *alloc_cacheable_mr(struct ib_pd *pd, + rb_key.ndescs = ib_umem_num_dma_blocks(umem, page_size); + rb_key.ats = mlx5_umem_needs_ats(dev, umem, access_flags); + rb_key.access_flags = get_unchangeable_access_flags(dev, access_flags); ++ rb_key.st_index = st_index; ++ rb_key.ph = ph; + ent = mkey_cache_ent_from_rb_key(dev, rb_key); + /* + * If the MR can't come from the cache then synchronously create an uncached +@@ -1146,7 +1170,8 @@ static struct mlx5_ib_mr *alloc_cacheable_mr(struct ib_pd *pd, + */ + if (!ent) { + mutex_lock(&dev->slow_path_mutex); +- mr = reg_create(pd, umem, iova, access_flags, page_size, false, access_mode); ++ mr = reg_create(pd, umem, iova, access_flags, page_size, false, access_mode, ++ st_index, ph); + mutex_unlock(&dev->slow_path_mutex); + if (IS_ERR(mr)) + return mr; +@@ -1231,7 +1256,7 @@ 
reg_create_crossing_vhca_mr(struct ib_pd *pd, u64 iova, u64 length, int access_f + static struct mlx5_ib_mr *reg_create(struct ib_pd *pd, struct ib_umem *umem, + u64 iova, int access_flags, + unsigned long page_size, bool populate, +- int access_mode) ++ int access_mode, u16 st_index, u8 ph) + { + struct mlx5_ib_dev *dev = to_mdev(pd->device); + struct mlx5_ib_mr *mr; +@@ -1241,7 +1266,8 @@ static struct mlx5_ib_mr *reg_create(struct ib_pd *pd, struct ib_umem *umem, + u32 *in; + int err; + bool pg_cap = !!(MLX5_CAP_GEN(dev->mdev, pg)) && +- (access_mode == MLX5_MKC_ACCESS_MODE_MTT); ++ (access_mode == MLX5_MKC_ACCESS_MODE_MTT) && ++ (ph == MLX5_IB_NO_PH); + bool ksm_mode = (access_mode == MLX5_MKC_ACCESS_MODE_KSM); + + if (!page_size) +@@ -1305,6 +1331,13 @@ static struct mlx5_ib_mr *reg_create(struct ib_pd *pd, struct ib_umem *umem, + get_octo_len(iova, umem->length, mr->page_shift)); + } + ++ if (ph != MLX5_IB_NO_PH) { ++ MLX5_SET(mkc, mkc, pcie_tph_en, 1); ++ MLX5_SET(mkc, mkc, pcie_tph_ph, ph); ++ if (st_index != MLX5_MKC_PCIE_TPH_NO_STEERING_TAG_INDEX) ++ MLX5_SET(mkc, mkc, pcie_tph_steering_tag_index, st_index); ++ } ++ + err = mlx5_ib_create_mkey(dev, &mr->mmkey, in, inlen); + if (err) { + mlx5_ib_warn(dev, "create mkey failed\n"); +@@ -1424,24 +1457,37 @@ struct ib_mr *mlx5_ib_reg_dm_mr(struct ib_pd *pd, struct ib_dm *dm, + } + + static struct ib_mr *create_real_mr(struct ib_pd *pd, struct ib_umem *umem, +- u64 iova, int access_flags) ++ u64 iova, int access_flags, ++ struct ib_dmah *dmah) + { + struct mlx5_ib_dev *dev = to_mdev(pd->device); + struct mlx5_ib_mr *mr = NULL; + bool xlt_with_umr; ++ u16 st_index = MLX5_MKC_PCIE_TPH_NO_STEERING_TAG_INDEX; ++ u8 ph = MLX5_IB_NO_PH; + int err; + ++ if (dmah) { ++ struct mlx5_ib_dmah *mdmah = to_mdmah(dmah); ++ ++ ph = dmah->ph; ++ if (dmah->valid_fields & BIT(IB_DMAH_CPU_ID_EXISTS)) ++ st_index = mdmah->st_index; ++ } ++ + xlt_with_umr = mlx5r_umr_can_load_pas(dev, umem->length); + if (xlt_with_umr) { + mr = 
alloc_cacheable_mr(pd, umem, iova, access_flags, +- MLX5_MKC_ACCESS_MODE_MTT); ++ MLX5_MKC_ACCESS_MODE_MTT, ++ st_index, ph); + } else { + unsigned long page_size = mlx5_umem_mkc_find_best_pgsz( + dev, umem, iova, MLX5_MKC_ACCESS_MODE_MTT); + + mutex_lock(&dev->slow_path_mutex); + mr = reg_create(pd, umem, iova, access_flags, page_size, +- true, MLX5_MKC_ACCESS_MODE_MTT); ++ true, MLX5_MKC_ACCESS_MODE_MTT, ++ st_index, ph); + mutex_unlock(&dev->slow_path_mutex); + } + if (IS_ERR(mr)) { +@@ -1505,7 +1551,9 @@ static struct ib_mr *create_user_odp_mr(struct ib_pd *pd, u64 start, u64 length, + return ERR_CAST(odp); + + mr = alloc_cacheable_mr(pd, &odp->umem, iova, access_flags, +- MLX5_MKC_ACCESS_MODE_MTT); ++ MLX5_MKC_ACCESS_MODE_MTT, ++ MLX5_MKC_PCIE_TPH_NO_STEERING_TAG_INDEX, ++ MLX5_IB_NO_PH); + if (IS_ERR(mr)) { + ib_umem_release(&odp->umem); + return ERR_CAST(mr); +@@ -1536,7 +1584,8 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, + struct ib_umem *umem; + int err; + +- if (!IS_ENABLED(CONFIG_INFINIBAND_USER_MEM) || dmah) ++ if (!IS_ENABLED(CONFIG_INFINIBAND_USER_MEM) || ++ ((access_flags & IB_ACCESS_ON_DEMAND) && dmah)) + return ERR_PTR(-EOPNOTSUPP); + + mlx5_ib_dbg(dev, "start 0x%llx, iova 0x%llx, length 0x%llx, access_flags 0x%x\n", +@@ -1552,7 +1601,7 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, + umem = ib_umem_get(&dev->ib_dev, start, length, access_flags); + if (IS_ERR(umem)) + return ERR_CAST(umem); +- return create_real_mr(pd, umem, iova, access_flags); ++ return create_real_mr(pd, umem, iova, access_flags, dmah); + } + + static void mlx5_ib_dmabuf_invalidate_cb(struct dma_buf_attachment *attach) +@@ -1577,12 +1626,15 @@ static struct dma_buf_attach_ops mlx5_ib_dmabuf_attach_ops = { + static struct ib_mr * + reg_user_mr_dmabuf(struct ib_pd *pd, struct device *dma_device, + u64 offset, u64 length, u64 virt_addr, +- int fd, int access_flags, int access_mode) ++ int fd, int access_flags, int 
access_mode, ++ struct ib_dmah *dmah) + { + bool pinned_mode = (access_mode == MLX5_MKC_ACCESS_MODE_KSM); + struct mlx5_ib_dev *dev = to_mdev(pd->device); + struct mlx5_ib_mr *mr = NULL; + struct ib_umem_dmabuf *umem_dmabuf; ++ u16 st_index = MLX5_MKC_PCIE_TPH_NO_STEERING_TAG_INDEX; ++ u8 ph = MLX5_IB_NO_PH; + int err; + + err = mlx5r_umr_resource_init(dev); +@@ -1605,8 +1657,17 @@ reg_user_mr_dmabuf(struct ib_pd *pd, struct device *dma_device, + return ERR_CAST(umem_dmabuf); + } + ++ if (dmah) { ++ struct mlx5_ib_dmah *mdmah = to_mdmah(dmah); ++ ++ ph = dmah->ph; ++ if (dmah->valid_fields & BIT(IB_DMAH_CPU_ID_EXISTS)) ++ st_index = mdmah->st_index; ++ } ++ + mr = alloc_cacheable_mr(pd, &umem_dmabuf->umem, virt_addr, +- access_flags, access_mode); ++ access_flags, access_mode, ++ st_index, ph); + if (IS_ERR(mr)) { + ib_umem_release(&umem_dmabuf->umem); + return ERR_CAST(mr); +@@ -1663,7 +1724,8 @@ reg_user_mr_dmabuf_by_data_direct(struct ib_pd *pd, u64 offset, + access_flags &= ~IB_ACCESS_RELAXED_ORDERING; + crossed_mr = reg_user_mr_dmabuf(pd, &data_direct_dev->pdev->dev, + offset, length, virt_addr, fd, +- access_flags, MLX5_MKC_ACCESS_MODE_KSM); ++ access_flags, MLX5_MKC_ACCESS_MODE_KSM, ++ NULL); + if (IS_ERR(crossed_mr)) { + ret = PTR_ERR(crossed_mr); + goto end; +@@ -1698,7 +1760,7 @@ struct ib_mr *mlx5_ib_reg_user_mr_dmabuf(struct ib_pd *pd, u64 offset, + int err; + + if (!IS_ENABLED(CONFIG_INFINIBAND_USER_MEM) || +- !IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING) || dmah) ++ !IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING)) + return ERR_PTR(-EOPNOTSUPP); + + if (uverbs_attr_is_valid(attrs, MLX5_IB_ATTR_REG_DMABUF_MR_ACCESS_FLAGS)) { +@@ -1723,7 +1785,8 @@ struct ib_mr *mlx5_ib_reg_user_mr_dmabuf(struct ib_pd *pd, u64 offset, + + return reg_user_mr_dmabuf(pd, pd->device->dma_device, + offset, length, virt_addr, +- fd, access_flags, MLX5_MKC_ACCESS_MODE_MTT); ++ fd, access_flags, MLX5_MKC_ACCESS_MODE_MTT, ++ dmah); + } + + /* +@@ -1821,7 +1884,8 @@ struct ib_mr 
*mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags, u64 start, + struct mlx5_ib_mr *mr = to_mmr(ib_mr); + int err; + +- if (!IS_ENABLED(CONFIG_INFINIBAND_USER_MEM) || mr->data_direct) ++ if (!IS_ENABLED(CONFIG_INFINIBAND_USER_MEM) || mr->data_direct || ++ mr->mmkey.rb_key.ph != MLX5_IB_NO_PH) + return ERR_PTR(-EOPNOTSUPP); + + mlx5_ib_dbg( +@@ -1865,7 +1929,7 @@ struct ib_mr *mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags, u64 start, + atomic_sub(ib_umem_num_pages(umem), &dev->mdev->priv.reg_pages); + + return create_real_mr(new_pd, umem, mr->ibmr.iova, +- new_access_flags); ++ new_access_flags, NULL); + } + + /* +@@ -1896,7 +1960,7 @@ struct ib_mr *mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags, u64 start, + } + return NULL; + } +- return create_real_mr(new_pd, new_umem, iova, new_access_flags); ++ return create_real_mr(new_pd, new_umem, iova, new_access_flags, NULL); + } + + /* +diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c +index 1c63cc0b9409..0e8ae85af5a6 100644 +--- a/drivers/infiniband/hw/mlx5/odp.c ++++ b/drivers/infiniband/hw/mlx5/odp.c +@@ -1883,6 +1883,7 @@ int mlx5_odp_init_mkey_cache(struct mlx5_ib_dev *dev) + struct mlx5r_cache_rb_key rb_key = { + .access_mode = MLX5_MKC_ACCESS_MODE_KSM, + .ndescs = mlx5_imr_ksm_entries, ++ .ph = MLX5_IB_NO_PH, + }; + struct mlx5_cache_ent *ent; + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1506-rdma-mlx5-refactor-optional-counters-steering-code.patch b/SOURCES/1506-rdma-mlx5-refactor-optional-counters-steering-code.patch new file mode 100644 index 000000000..e8cc4a08c --- /dev/null +++ b/SOURCES/1506-rdma-mlx5-refactor-optional-counters-steering-code.patch @@ -0,0 +1,358 @@ +From ff4e6069b1f973258676f846a6bb2bea72eecebf Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:35:09 -0400 +Subject: [PATCH] RDMA/mlx5: Refactor optional counters steering code + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 
10d4de4189533840f296ce7e48eac05b985d96bc +Author: Patrisious Haddad +Date: Sun Jul 20 12:37:24 2025 +0300 + + RDMA/mlx5: Refactor optional counters steering code + + Currently there isn't a clear layer separation between the counters and + the steering code, whereas the steering code is doing redundant access + to the counter struct. + + Separate the fs.c and counters.c, where fs code won't access or be + aware of counter structs but only the objects it needs. + + As a result, move mlx5_rdma_counter struct from the header file to be + an internal struct for the counters file only. + + Signed-off-by: Patrisious Haddad + Reviewed-by: Edward Srouji + Link: https://patch.msgid.link/9854d1fdb140e4a6552b7a2fd1a028cfe488a935.1753004208.git.leon@kernel.org + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/counters.c b/drivers/infiniband/hw/mlx5/counters.c +index a506fafd2b15..e042e0719ead 100644 +--- a/drivers/infiniband/hw/mlx5/counters.c ++++ b/drivers/infiniband/hw/mlx5/counters.c +@@ -16,6 +16,18 @@ struct mlx5_ib_counter { + u32 type; + }; + ++struct mlx5_rdma_counter { ++ struct rdma_counter rdma_counter; ++ ++ struct mlx5_fc *fc[MLX5_IB_OPCOUNTER_MAX]; ++ struct xarray qpn_opfc_xa; ++}; ++ ++static struct mlx5_rdma_counter *to_mcounter(struct rdma_counter *counter) ++{ ++ return container_of(counter, struct mlx5_rdma_counter, rdma_counter); ++} ++ + #define INIT_Q_COUNTER(_name) \ + { .name = #_name, .offset = MLX5_BYTE_OFF(query_q_counter_out, _name)} + +@@ -602,7 +614,7 @@ static int mlx5_ib_counter_dealloc(struct rdma_counter *counter) + return 0; + + WARN_ON(!xa_empty(&mcounter->qpn_opfc_xa)); +- mlx5r_fs_destroy_fcs(dev, counter); ++ mlx5r_fs_destroy_fcs(dev, mcounter->fc); + MLX5_SET(dealloc_q_counter_in, in, opcode, + MLX5_CMD_OP_DEALLOC_Q_COUNTER); + MLX5_SET(dealloc_q_counter_in, in, counter_set_id, counter->id); +@@ -612,6 +624,7 @@ static int mlx5_ib_counter_dealloc(struct rdma_counter *counter) + 
static int mlx5_ib_counter_bind_qp(struct rdma_counter *counter, + struct ib_qp *qp, u32 port) + { ++ struct mlx5_rdma_counter *mcounter = to_mcounter(counter); + struct mlx5_ib_dev *dev = to_mdev(qp->device); + bool new = false; + int err; +@@ -635,7 +648,11 @@ static int mlx5_ib_counter_bind_qp(struct rdma_counter *counter, + if (err) + goto fail_set_counter; + +- err = mlx5r_fs_bind_op_fc(qp, counter, port); ++ if (!counter->mode.bind_opcnt) ++ return 0; ++ ++ err = mlx5r_fs_bind_op_fc(qp, mcounter->fc, &mcounter->qpn_opfc_xa, ++ port); + if (err) + goto fail_bind_op_fc; + +@@ -655,9 +672,12 @@ static int mlx5_ib_counter_bind_qp(struct rdma_counter *counter, + static int mlx5_ib_counter_unbind_qp(struct ib_qp *qp, u32 port) + { + struct rdma_counter *counter = qp->counter; ++ struct mlx5_rdma_counter *mcounter; + int err; + +- mlx5r_fs_unbind_op_fc(qp, counter); ++ mcounter = to_mcounter(counter); ++ ++ mlx5r_fs_unbind_op_fc(qp, &mcounter->qpn_opfc_xa); + + err = mlx5_ib_qp_set_counter(qp, NULL); + if (err) +@@ -666,7 +686,9 @@ static int mlx5_ib_counter_unbind_qp(struct ib_qp *qp, u32 port) + return 0; + + fail_set_counter: +- mlx5r_fs_bind_op_fc(qp, counter, port); ++ if (counter->mode.bind_opcnt) ++ mlx5r_fs_bind_op_fc(qp, mcounter->fc, ++ &mcounter->qpn_opfc_xa, port); + return err; + } + +diff --git a/drivers/infiniband/hw/mlx5/counters.h b/drivers/infiniband/hw/mlx5/counters.h +index bd03cee42014..a04e7dd59455 100644 +--- a/drivers/infiniband/hw/mlx5/counters.h ++++ b/drivers/infiniband/hw/mlx5/counters.h +@@ -8,19 +8,6 @@ + + #include "mlx5_ib.h" + +-struct mlx5_rdma_counter { +- struct rdma_counter rdma_counter; +- +- struct mlx5_fc *fc[MLX5_IB_OPCOUNTER_MAX]; +- struct xarray qpn_opfc_xa; +-}; +- +-static inline struct mlx5_rdma_counter * +-to_mcounter(struct rdma_counter *counter) +-{ +- return container_of(counter, struct mlx5_rdma_counter, rdma_counter); +-} +- + int mlx5_ib_counters_init(struct mlx5_ib_dev *dev); + void 
mlx5_ib_counters_cleanup(struct mlx5_ib_dev *dev); + void mlx5_ib_counters_clear_description(struct ib_counters *counters); +diff --git a/drivers/infiniband/hw/mlx5/fs.c b/drivers/infiniband/hw/mlx5/fs.c +index bab2f58240c9..b0f7663c24c1 100644 +--- a/drivers/infiniband/hw/mlx5/fs.c ++++ b/drivers/infiniband/hw/mlx5/fs.c +@@ -1012,14 +1012,14 @@ static int get_per_qp_prio(struct mlx5_ib_dev *dev, + return 0; + } + +-static struct mlx5_per_qp_opfc * +-get_per_qp_opfc(struct mlx5_rdma_counter *mcounter, u32 qp_num, bool *new) ++static struct mlx5_per_qp_opfc *get_per_qp_opfc(struct xarray *qpn_opfc_xa, ++ u32 qp_num, bool *new) + { + struct mlx5_per_qp_opfc *per_qp_opfc; + + *new = false; + +- per_qp_opfc = xa_load(&mcounter->qpn_opfc_xa, qp_num); ++ per_qp_opfc = xa_load(qpn_opfc_xa, qp_num); + if (per_qp_opfc) + return per_qp_opfc; + per_qp_opfc = kzalloc(sizeof(*per_qp_opfc), GFP_KERNEL); +@@ -1032,7 +1032,8 @@ get_per_qp_opfc(struct mlx5_rdma_counter *mcounter, u32 qp_num, bool *new) + } + + static int add_op_fc_rules(struct mlx5_ib_dev *dev, +- struct mlx5_rdma_counter *mcounter, ++ struct mlx5_fc *fc_arr[MLX5_IB_OPCOUNTER_MAX], ++ struct xarray *qpn_opfc_xa, + struct mlx5_per_qp_opfc *per_qp_opfc, + struct mlx5_ib_flow_prio *prio, + enum mlx5_ib_optional_counter_type type, +@@ -1055,7 +1056,7 @@ static int add_op_fc_rules(struct mlx5_ib_dev *dev, + return 0; + } + +- opfc->fc = mcounter->fc[type]; ++ opfc->fc = fc_arr[type]; + + spec = kcalloc(MAX_OPFC_RULES, sizeof(*spec), GFP_KERNEL); + if (!spec) { +@@ -1148,8 +1149,7 @@ static int add_op_fc_rules(struct mlx5_ib_dev *dev, + } + prio->refcount += spec_num; + +- err = xa_err(xa_store(&mcounter->qpn_opfc_xa, qp_num, per_qp_opfc, +- GFP_KERNEL)); ++ err = xa_err(xa_store(qpn_opfc_xa, qp_num, per_qp_opfc, GFP_KERNEL)); + if (err) + goto del_rules; + +@@ -1168,8 +1168,9 @@ static int add_op_fc_rules(struct mlx5_ib_dev *dev, + return err; + } + +-static bool is_fc_shared_and_in_use(struct mlx5_rdma_counter 
*mcounter, +- u32 type, struct mlx5_fc **fc) ++static bool ++is_fc_shared_and_in_use(struct mlx5_fc *fc_arr[MLX5_IB_OPCOUNTER_MAX], u32 type, ++ struct mlx5_fc **fc) + { + u32 shared_fc_type; + +@@ -1190,7 +1191,7 @@ static bool is_fc_shared_and_in_use(struct mlx5_rdma_counter *mcounter, + return false; + } + +- *fc = mcounter->fc[shared_fc_type]; ++ *fc = fc_arr[shared_fc_type]; + if (!(*fc)) + return false; + +@@ -1198,24 +1199,23 @@ static bool is_fc_shared_and_in_use(struct mlx5_rdma_counter *mcounter, + } + + void mlx5r_fs_destroy_fcs(struct mlx5_ib_dev *dev, +- struct rdma_counter *counter) ++ struct mlx5_fc *fc_arr[MLX5_IB_OPCOUNTER_MAX]) + { +- struct mlx5_rdma_counter *mcounter = to_mcounter(counter); + struct mlx5_fc *in_use_fc; + int i; + + for (i = MLX5_IB_OPCOUNTER_CC_RX_CE_PKTS_PER_QP; + i <= MLX5_IB_OPCOUNTER_RDMA_RX_BYTES_PER_QP; i++) { +- if (!mcounter->fc[i]) ++ if (!fc_arr[i]) + continue; + +- if (is_fc_shared_and_in_use(mcounter, i, &in_use_fc)) { +- mcounter->fc[i] = NULL; ++ if (is_fc_shared_and_in_use(fc_arr, i, &in_use_fc)) { ++ fc_arr[i] = NULL; + continue; + } + +- mlx5_fc_destroy(dev->mdev, mcounter->fc[i]); +- mcounter->fc[i] = NULL; ++ mlx5_fc_destroy(dev->mdev, fc_arr[i]); ++ fc_arr[i] = NULL; + } + } + +@@ -1359,16 +1359,15 @@ void mlx5_ib_fs_remove_op_fc(struct mlx5_ib_dev *dev, + put_per_qp_prio(dev, type); + } + +-void mlx5r_fs_unbind_op_fc(struct ib_qp *qp, struct rdma_counter *counter) ++void mlx5r_fs_unbind_op_fc(struct ib_qp *qp, struct xarray *qpn_opfc_xa) + { +- struct mlx5_rdma_counter *mcounter = to_mcounter(counter); +- struct mlx5_ib_dev *dev = to_mdev(counter->device); ++ struct mlx5_ib_dev *dev = to_mdev(qp->device); + struct mlx5_per_qp_opfc *per_qp_opfc; + struct mlx5_ib_op_fc *in_use_opfc; + struct mlx5_ib_flow_prio *prio; + int i, j; + +- per_qp_opfc = xa_load(&mcounter->qpn_opfc_xa, qp->qp_num); ++ per_qp_opfc = xa_load(qpn_opfc_xa, qp->qp_num); + if (!per_qp_opfc) + return; + +@@ -1394,13 +1393,13 @@ void 
mlx5r_fs_unbind_op_fc(struct ib_qp *qp, struct rdma_counter *counter) + } + + kfree(per_qp_opfc); +- xa_erase(&mcounter->qpn_opfc_xa, qp->qp_num); ++ xa_erase(qpn_opfc_xa, qp->qp_num); + } + +-int mlx5r_fs_bind_op_fc(struct ib_qp *qp, struct rdma_counter *counter, +- u32 port) ++int mlx5r_fs_bind_op_fc(struct ib_qp *qp, ++ struct mlx5_fc *fc_arr[MLX5_IB_OPCOUNTER_MAX], ++ struct xarray *qpn_opfc_xa, u32 port) + { +- struct mlx5_rdma_counter *mcounter = to_mcounter(counter); + struct mlx5_ib_dev *dev = to_mdev(qp->device); + struct mlx5_per_qp_opfc *per_qp_opfc; + struct mlx5_ib_flow_prio *prio; +@@ -1410,9 +1409,6 @@ int mlx5r_fs_bind_op_fc(struct ib_qp *qp, struct rdma_counter *counter, + int i, err, per_qp_type; + bool new; + +- if (!counter->mode.bind_opcnt) +- return 0; +- + cnts = &dev->port[port - 1].cnts; + + for (i = 0; i <= MLX5_IB_OPCOUNTER_RDMA_RX_BYTES; i++) { +@@ -1424,23 +1420,22 @@ int mlx5r_fs_bind_op_fc(struct ib_qp *qp, struct rdma_counter *counter, + prio = get_opfc_prio(dev, per_qp_type); + WARN_ON(!prio->flow_table); + +- if (is_fc_shared_and_in_use(mcounter, per_qp_type, &in_use_fc)) +- mcounter->fc[per_qp_type] = in_use_fc; ++ if (is_fc_shared_and_in_use(fc_arr, per_qp_type, &in_use_fc)) ++ fc_arr[per_qp_type] = in_use_fc; + +- if (!mcounter->fc[per_qp_type]) { +- mcounter->fc[per_qp_type] = mlx5_fc_create(dev->mdev, +- false); +- if (IS_ERR(mcounter->fc[per_qp_type])) +- return PTR_ERR(mcounter->fc[per_qp_type]); ++ if (!fc_arr[per_qp_type]) { ++ fc_arr[per_qp_type] = mlx5_fc_create(dev->mdev, false); ++ if (IS_ERR(fc_arr[per_qp_type])) ++ return PTR_ERR(fc_arr[per_qp_type]); + } + +- per_qp_opfc = get_per_qp_opfc(mcounter, qp->qp_num, &new); ++ per_qp_opfc = get_per_qp_opfc(qpn_opfc_xa, qp->qp_num, &new); + if (!per_qp_opfc) { + err = -ENOMEM; + goto free_fc; + } +- err = add_op_fc_rules(dev, mcounter, per_qp_opfc, prio, +- per_qp_type, qp->qp_num, port); ++ err = add_op_fc_rules(dev, fc_arr, qpn_opfc_xa, per_qp_opfc, ++ prio, per_qp_type, 
qp->qp_num, port); + if (err) + goto del_rules; + } +@@ -1448,12 +1443,12 @@ int mlx5r_fs_bind_op_fc(struct ib_qp *qp, struct rdma_counter *counter, + return 0; + + del_rules: +- mlx5r_fs_unbind_op_fc(qp, counter); ++ mlx5r_fs_unbind_op_fc(qp, qpn_opfc_xa); + if (new) + kfree(per_qp_opfc); + free_fc: +- if (xa_empty(&mcounter->qpn_opfc_xa)) +- mlx5r_fs_destroy_fcs(dev, counter); ++ if (xa_empty(qpn_opfc_xa)) ++ mlx5r_fs_destroy_fcs(dev, fc_arr); + return err; + } + +diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h +index e64997ba2f59..1b646761d5d5 100644 +--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h ++++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h +@@ -890,13 +890,14 @@ void mlx5_ib_fs_remove_op_fc(struct mlx5_ib_dev *dev, + struct mlx5_ib_op_fc *opfc, + enum mlx5_ib_optional_counter_type type); + +-int mlx5r_fs_bind_op_fc(struct ib_qp *qp, struct rdma_counter *counter, +- u32 port); ++int mlx5r_fs_bind_op_fc(struct ib_qp *qp, ++ struct mlx5_fc *fc_arr[MLX5_IB_OPCOUNTER_MAX], ++ struct xarray *qpn_opfc_xa, u32 port); + +-void mlx5r_fs_unbind_op_fc(struct ib_qp *qp, struct rdma_counter *counter); ++void mlx5r_fs_unbind_op_fc(struct ib_qp *qp, struct xarray *qpn_opfc_xa); + + void mlx5r_fs_destroy_fcs(struct mlx5_ib_dev *dev, +- struct rdma_counter *counter); ++ struct mlx5_fc *fc_arr[MLX5_IB_OPCOUNTER_MAX]); + + struct mlx5_ib_multiport_info; + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1507-ib-mlx5-fix-obj-type-mismatch-for-srq-event-subscriptions.patch b/SOURCES/1507-ib-mlx5-fix-obj-type-mismatch-for-srq-event-subscriptions.patch new file mode 100644 index 000000000..a199fb627 --- /dev/null +++ b/SOURCES/1507-ib-mlx5-fix-obj-type-mismatch-for-srq-event-subscriptions.patch @@ -0,0 +1,52 @@ +From 421118c839b06cb8a5813d102897509002765133 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Fri, 17 Apr 2026 11:35:09 -0400 +Subject: [PATCH] IB/mlx5: Fix obj_type mismatch for SRQ event subscriptions + +JIRA: 
https://redhat.atlassian.net/browse/RHEL-169055 + +commit 85fe9f565d2d5af95ac2bbaa5082b8ce62b039f5 +Author: Or Har-Toov +Date: Wed Aug 13 15:43:20 2025 +0300 + + IB/mlx5: Fix obj_type mismatch for SRQ event subscriptions + + Fix a bug where the driver's event subscription logic for SRQ-related + events incorrectly sets obj_type for RMP objects. + + When subscribing to SRQ events, get_legacy_obj_type() did not handle + the MLX5_CMD_OP_CREATE_RMP case, which caused obj_type to be 0 + (default). + This led to a mismatch between the obj_type used during subscription + (0) and the value used during notification (1, taken from the event's + type field). As a result, event mapping for SRQ objects could fail and + event notification would not be delivered correctly. + + This fix adds handling for MLX5_CMD_OP_CREATE_RMP in get_legacy_obj_type, + returning MLX5_EVENT_QUEUE_TYPE_RQ so obj_type is consistent between + subscription and notification. + + Fixes: 759738537142 ("IB/mlx5: Enable subscription for device events over DEVX") + Link: https://patch.msgid.link/r/8f1048e3fdd1fde6b90607ce0ed251afaf8a148c.1755088962.git.leon@kernel.org + Signed-off-by: Or Har-Toov + Reviewed-by: Edward Srouji + Signed-off-by: Leon Romanovsky + Signed-off-by: Jason Gunthorpe + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c +index 8694df5bf5ae..4e2edf3378d7 100644 +--- a/drivers/infiniband/hw/mlx5/devx.c ++++ b/drivers/infiniband/hw/mlx5/devx.c +@@ -233,6 +233,7 @@ static u16 get_legacy_obj_type(u16 opcode) + { + switch (opcode) { + case MLX5_CMD_OP_CREATE_RQ: ++ case MLX5_CMD_OP_CREATE_RMP: + return MLX5_EVENT_QUEUE_TYPE_RQ; + case MLX5_CMD_OP_CREATE_QP: + return MLX5_EVENT_QUEUE_TYPE_QP; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1508-net-mlx5-don-t-use-pk-through-tracepoints.patch b/SOURCES/1508-net-mlx5-don-t-use-pk-through-tracepoints.patch new file mode 100644 index 000000000..61497740a --- /dev/null +++ 
b/SOURCES/1508-net-mlx5-don-t-use-pk-through-tracepoints.patch @@ -0,0 +1,56 @@ +From f0f4ea25153362ece899e7a2d8d40ccc171e1e9e Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sat, 18 Apr 2026 17:12:40 -0400 +Subject: [PATCH] net/mlx5: Don't use %pK through tracepoints +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit e2068f74b97653356ad7d6ce456db1f5b7fb575e +Author: Thomas Weißschuh +Date: Mon Aug 11 11:43:19 2025 +0200 + + net/mlx5: Don't use %pK through tracepoints + + In the past %pK was preferable to %p as it would not leak raw pointer + values into the kernel log. + Since commit ad67b74d2469 ("printk: hash addresses printed with %p") + the regular %p has been improved to avoid this issue. + Furthermore, restricted pointers ("%pK") were never meant to be used + through tracepoints. They can still unintentionally leak raw pointers or + acquire sleeping locks in atomic contexts. + + Switch to the regular pointer formatting which is safer and + easier to reason about. + There are still a few users of %pK left, but these use it through seq_file, + for which its usage is safe. 
+ + Signed-off-by: Thomas Weißschuh + Reviewed-by: Aleksandr Loktionov + Reviewed-by: Tariq Toukan + Reviewed-by: Simon Horman + Reviewed-by: Paul Menzel + Reviewed-by: Jacob Keller + Link: https://patch.msgid.link/20250811-restricted-pointers-net-v5-2-2e2fdc7d3f2c@linutronix.de + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/diag/dev_tracepoint.h b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/diag/dev_tracepoint.h +index 7f7c9af5deed..ce834680f504 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/diag/dev_tracepoint.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/diag/dev_tracepoint.h +@@ -28,7 +28,7 @@ DECLARE_EVENT_CLASS(mlx5_sf_dev_template, + __entry->hw_fn_id = sfdev->fn_id; + __entry->sfnum = sfdev->sfnum; + ), +- TP_printk("(%s) sfdev=%pK aux_id=%d hw_id=0x%x sfnum=%u\n", ++ TP_printk("(%s) sfdev=%p aux_id=%d hw_id=0x%x sfnum=%u\n", + __get_str(devname), __entry->sfdev, + __entry->aux_id, __entry->hw_fn_id, + __entry->sfnum) +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1509-net-mlx5-mlx5-ifc-add-hardware-definitions-needed-for-adjace.patch b/SOURCES/1509-net-mlx5-mlx5-ifc-add-hardware-definitions-needed-for-adjace.patch new file mode 100644 index 000000000..5928c96e6 --- /dev/null +++ b/SOURCES/1509-net-mlx5-mlx5-ifc-add-hardware-definitions-needed-for-adjace.patch @@ -0,0 +1,215 @@ +From 857a7a08e1d720b6a9dd18630a3bae53e013a460 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sat, 18 Apr 2026 17:12:40 -0400 +Subject: [PATCH] net/mlx5: mlx5_ifc, Add hardware definitions needed for + adjacent vports + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 2335b3f56690f76ac34b972fcaef368bab1f76f2 +Author: Saeed Mahameed +Date: Mon Jun 16 23:41:53 2025 -0700 + + net/mlx5: mlx5_ifc, Add hardware definitions needed for adjacent vports + + Next patches will implement the discovery and creation of adjacent + functions vports, this patch introduces the 
hardware structures + definitions needed for the driver implementation. + + Signed-off-by: Saeed Mahameed + Reviewed-by: Mark Bloch + Reviewed-by: Parav Pandit + Reviewed-by: Jack Morgenstein + Signed-off-by: Alexei Lazar + +Signed-off-by: Kamal Heib + +diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h +index fd3bedd8dbcb..4362dcb3b8fa 100644 +--- a/include/linux/mlx5/mlx5_ifc.h ++++ b/include/linux/mlx5/mlx5_ifc.h +@@ -189,6 +189,9 @@ enum { + MLX5_CMD_OP_QUERY_XRQ_ERROR_PARAMS = 0x727, + MLX5_CMD_OP_RELEASE_XRQ_ERROR = 0x729, + MLX5_CMD_OP_MODIFY_XRQ = 0x72a, ++ MLX5_CMD_OPCODE_QUERY_DELEGATED_VHCA = 0x732, ++ MLX5_CMD_OPCODE_CREATE_ESW_VPORT = 0x733, ++ MLX5_CMD_OPCODE_DESTROY_ESW_VPORT = 0x734, + MLX5_CMD_OP_QUERY_ESW_FUNCTIONS = 0x740, + MLX5_CMD_OP_QUERY_VPORT_STATE = 0x750, + MLX5_CMD_OP_MODIFY_VPORT_STATE = 0x751, +@@ -2206,7 +2209,19 @@ struct mlx5_ifc_cmd_hca_cap_2_bits { + + u8 reserved_at_440[0x8]; + u8 max_num_eqs_24b[0x18]; +- u8 reserved_at_460[0x3a0]; ++ ++ u8 reserved_at_460[0x160]; ++ ++ u8 query_adjacent_functions_id[0x1]; ++ u8 ingress_egress_esw_vport_connect[0x1]; ++ u8 function_id_type_vhca_id[0x1]; ++ u8 reserved_at_5c3[0xd]; ++ u8 delegate_vhca_management_profiles[0x10]; ++ ++ u8 delegated_vhca_max[0x10]; ++ u8 delegate_vhca_max[0x10]; ++ ++ u8 reserved_at_600[0x200]; + }; + + enum mlx5_ifc_flow_destination_type { +@@ -5158,7 +5173,9 @@ struct mlx5_ifc_set_hca_cap_in_bits { + + u8 other_function[0x1]; + u8 ec_vf_function[0x1]; +- u8 reserved_at_42[0xe]; ++ u8 reserved_at_42[0x1]; ++ u8 function_id_type[0x1]; ++ u8 reserved_at_44[0xc]; + u8 function_id[0x10]; + + u8 reserved_at_60[0x20]; +@@ -6356,7 +6373,9 @@ struct mlx5_ifc_query_hca_cap_in_bits { + + u8 other_function[0x1]; + u8 ec_vf_function[0x1]; +- u8 reserved_at_42[0xe]; ++ u8 reserved_at_42[0x1]; ++ u8 function_id_type[0x1]; ++ u8 reserved_at_44[0xc]; + u8 function_id[0x10]; + + u8 reserved_at_60[0x20]; +@@ -6982,6 +7001,28 @@ struct 
mlx5_ifc_query_esw_vport_context_in_bits { + u8 reserved_at_60[0x20]; + }; + ++struct mlx5_ifc_destroy_esw_vport_out_bits { ++ u8 status[0x8]; ++ u8 reserved_at_8[0x18]; ++ ++ u8 syndrome[0x20]; ++ ++ u8 reserved_at_40[0x20]; ++}; ++ ++struct mlx5_ifc_destroy_esw_vport_in_bits { ++ u8 opcode[0x10]; ++ u8 uid[0x10]; ++ ++ u8 reserved_at_20[0x10]; ++ u8 op_mod[0x10]; ++ ++ u8 reserved_at_40[0x10]; ++ u8 vport_num[0x10]; ++ ++ u8 reserved_at_60[0x20]; ++}; ++ + struct mlx5_ifc_modify_esw_vport_context_out_bits { + u8 status[0x8]; + u8 reserved_at_8[0x18]; +@@ -7483,6 +7524,85 @@ struct mlx5_ifc_query_adapter_in_bits { + u8 reserved_at_40[0x40]; + }; + ++struct mlx5_ifc_function_vhca_rid_info_reg_bits { ++ u8 host_number[0x8]; ++ u8 host_pci_device_function[0x8]; ++ u8 host_pci_bus[0x8]; ++ u8 reserved_at_18[0x3]; ++ u8 pci_bus_assigned[0x1]; ++ u8 function_type[0x4]; ++ ++ u8 parent_pci_device_function[0x8]; ++ u8 parent_pci_bus[0x8]; ++ u8 vhca_id[0x10]; ++ ++ u8 reserved_at_40[0x10]; ++ u8 function_id[0x10]; ++ ++ u8 reserved_at_60[0x20]; ++}; ++ ++struct mlx5_ifc_delegated_function_vhca_rid_info_bits { ++ struct mlx5_ifc_function_vhca_rid_info_reg_bits function_vhca_rid_info; ++ ++ u8 reserved_at_80[0x18]; ++ u8 manage_profile[0x8]; ++ ++ u8 reserved_at_a0[0x60]; ++}; ++ ++struct mlx5_ifc_query_delegated_vhca_out_bits { ++ u8 status[0x8]; ++ u8 reserved_at_8[0x18]; ++ ++ u8 syndrome[0x20]; ++ ++ u8 reserved_at_40[0x20]; ++ ++ u8 reserved_at_60[0x10]; ++ u8 functions_count[0x10]; ++ ++ u8 reserved_at_80[0x80]; ++ ++ struct mlx5_ifc_delegated_function_vhca_rid_info_bits ++ delegated_function_vhca_rid_info[]; ++}; ++ ++struct mlx5_ifc_query_delegated_vhca_in_bits { ++ u8 opcode[0x10]; ++ u8 uid[0x10]; ++ ++ u8 reserved_at_20[0x10]; ++ u8 op_mod[0x10]; ++ ++ u8 reserved_at_40[0x40]; ++}; ++ ++struct mlx5_ifc_create_esw_vport_out_bits { ++ u8 status[0x8]; ++ u8 reserved_at_8[0x18]; ++ ++ u8 syndrome[0x20]; ++ ++ u8 reserved_at_40[0x20]; ++ ++ u8 reserved_at_60[0x10]; ++ 
u8 vport_num[0x10]; ++}; ++ ++struct mlx5_ifc_create_esw_vport_in_bits { ++ u8 opcode[0x10]; ++ u8 reserved_at_10[0x10]; ++ ++ u8 reserved_at_20[0x10]; ++ u8 op_mod[0x10]; ++ ++ u8 reserved_at_40[0x10]; ++ u8 managed_vhca_id[0x10]; ++ ++ u8 reserved_at_60[0x20]; ++}; ++ + struct mlx5_ifc_qp_2rst_out_bits { + u8 status[0x8]; + u8 reserved_at_8[0x18]; +@@ -7610,7 +7730,12 @@ struct mlx5_ifc_modify_vport_state_in_bits { + u8 reserved_at_41[0xf]; + u8 vport_number[0x10]; + +- u8 reserved_at_60[0x18]; ++ u8 reserved_at_60[0x10]; ++ u8 ingress_connect[0x1]; ++ u8 egress_connect[0x1]; ++ u8 ingress_connect_valid[0x1]; ++ u8 egress_connect_valid[0x1]; ++ u8 reserved_at_74[0x4]; + u8 admin_state[0x4]; + u8 reserved_at_7c[0x4]; + }; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1510-net-mlx5-e-switch-cache-vport-vhca-id-on-first-cap-query.patch b/SOURCES/1510-net-mlx5-e-switch-cache-vport-vhca-id-on-first-cap-query.patch new file mode 100644 index 000000000..1e208ec17 --- /dev/null +++ b/SOURCES/1510-net-mlx5-e-switch-cache-vport-vhca-id-on-first-cap-query.patch @@ -0,0 +1,167 @@ +From 9d3ab89a09acc99709471de7e52a0cd864903abc Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sat, 18 Apr 2026 17:12:40 -0400 +Subject: [PATCH] net/mlx5: E-Switch, Cache vport vhca id on first cap query + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 864c05b9bc4077c99730512babc991a9d92730e0 +Author: Saeed Mahameed +Date: Sat Jun 14 02:29:41 2025 -0700 + + net/mlx5: E-Switch, Cache vport vhca id on first cap query + + We need vhca_id to set up the vhca_id to vport mapping for every vport, + for that we query the firmware in mlx5_esw_vport_vhca_id_set, and it is + redundant since in esw_vport_setup, we already query hca caps which has + the vhca_id, cache it there and save 2 extra fw queries per vport. 
+ + Signed-off-by: Saeed Mahameed + Reviewed-by: Mark Bloch + Reviewed-by: Parav Pandit + Signed-off-by: Alexei Lazar + Reviewed-by: Feng Liu + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +index 4917d185d0c3..eeffe9c4aa56 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +@@ -820,6 +820,7 @@ static int mlx5_esw_vport_caps_get(struct mlx5_eswitch *esw, struct mlx5_vport * + + hca_caps = MLX5_ADDR_OF(query_hca_cap_out, query_ctx, capability); + vport->info.roce_enabled = MLX5_GET(cmd_hca_cap, hca_caps, roce); ++ vport->vhca_id = MLX5_GET(cmd_hca_cap, hca_caps, vhca_id); + + if (!MLX5_CAP_GEN_MAX(esw->dev, hca_cap_2)) + goto out_free; +@@ -929,7 +930,7 @@ int mlx5_esw_vport_enable(struct mlx5_eswitch *esw, struct mlx5_vport *vport, + + if (!mlx5_esw_is_manager_vport(esw, vport_num) && + MLX5_CAP_GEN(esw->dev, vhca_resource_manager)) { +- ret = mlx5_esw_vport_vhca_id_set(esw, vport_num); ++ ret = mlx5_esw_vport_vhca_id_map(esw, vport); + if (ret) + goto err_vhca_mapping; + } +@@ -973,7 +974,7 @@ void mlx5_esw_vport_disable(struct mlx5_eswitch *esw, struct mlx5_vport *vport) + + if (!mlx5_esw_is_manager_vport(esw, vport_num) && + MLX5_CAP_GEN(esw->dev, vhca_resource_manager)) +- mlx5_esw_vport_vhca_id_clear(esw, vport_num); ++ mlx5_esw_vport_vhca_id_unmap(esw, vport); + + if (vport->vport != MLX5_VPORT_PF && + (vport->info.ipsec_crypto_enabled || vport->info.ipsec_packet_enabled)) +@@ -1710,6 +1711,7 @@ static int mlx5_esw_vport_alloc(struct mlx5_eswitch *esw, + vport->vport = vport_num; + vport->index = index; + vport->info.link_state = MLX5_VPORT_ADMIN_STATE_AUTO; ++ vport->vhca_id = MLX5_VHCA_ID_INVALID; + INIT_WORK(&vport->vport_change_handler, esw_vport_change_handler); + err = xa_insert(&esw->vports, vport_num, vport, GFP_KERNEL); + if (err) +diff --git 
a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +index 45506ad56847..32aab5e1e673 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +@@ -197,6 +197,11 @@ static inline struct mlx5_vport *mlx5_devlink_port_vport_get(struct devlink_port + return mlx5_devlink_port_get(dl_port)->vport; + } + ++#define MLX5_VHCA_ID_INVALID (-1) ++ ++#define MLX5_VPORT_INVAL_VHCA_ID(vport) \ ++ ((vport)->vhca_id == MLX5_VHCA_ID_INVALID) ++ + struct mlx5_vport { + struct mlx5_core_dev *dev; + struct hlist_head uc_list[MLX5_L2_ADDR_HASH_SIZE]; +@@ -209,6 +214,7 @@ struct mlx5_vport { + struct vport_egress egress; + u32 default_metadata; + u32 metadata; ++ int vhca_id; + + struct mlx5_vport_info info; + +@@ -817,8 +823,10 @@ struct devlink_port *mlx5_esw_offloads_devlink_port(struct mlx5_eswitch *esw, u1 + + int mlx5_esw_sf_max_hpf_functions(struct mlx5_core_dev *dev, u16 *max_sfs, u16 *sf_base_id); + +-int mlx5_esw_vport_vhca_id_set(struct mlx5_eswitch *esw, u16 vport_num); +-void mlx5_esw_vport_vhca_id_clear(struct mlx5_eswitch *esw, u16 vport_num); ++int mlx5_esw_vport_vhca_id_map(struct mlx5_eswitch *esw, ++ struct mlx5_vport *vport); ++void mlx5_esw_vport_vhca_id_unmap(struct mlx5_eswitch *esw, ++ struct mlx5_vport *vport); + int mlx5_eswitch_vhca_id_to_vport(struct mlx5_eswitch *esw, u16 vhca_id, u16 *vport_num); + + /** +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +index bee906661282..19decaa8a96e 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +@@ -4161,23 +4161,28 @@ u32 mlx5_eswitch_get_vport_metadata_for_match(struct mlx5_eswitch *esw, + } + EXPORT_SYMBOL(mlx5_eswitch_get_vport_metadata_for_match); + +-int mlx5_esw_vport_vhca_id_set(struct mlx5_eswitch *esw, u16 vport_num) ++int 
mlx5_esw_vport_vhca_id_map(struct mlx5_eswitch *esw, ++ struct mlx5_vport *vport) + { + u16 *old_entry, *vhca_map_entry, vhca_id; +- int err; + +- err = mlx5_vport_get_vhca_id(esw->dev, vport_num, &vhca_id); +- if (err) { +- esw_warn(esw->dev, "Getting vhca_id for vport failed (vport=%u,err=%d)\n", +- vport_num, err); +- return err; ++ if (WARN_ONCE(MLX5_VPORT_INVAL_VHCA_ID(vport), ++ "vport %d vhca_id is not set", vport->vport)) { ++ int err; ++ ++ err = mlx5_vport_get_vhca_id(vport->dev, vport->vport, ++ &vhca_id); ++ if (err) ++ return err; ++ vport->vhca_id = vhca_id; + } + ++ vhca_id = vport->vhca_id; + vhca_map_entry = kmalloc(sizeof(*vhca_map_entry), GFP_KERNEL); + if (!vhca_map_entry) + return -ENOMEM; + +- *vhca_map_entry = vport_num; ++ *vhca_map_entry = vport->vport; + old_entry = xa_store(&esw->offloads.vhca_map, vhca_id, vhca_map_entry, GFP_KERNEL); + if (xa_is_err(old_entry)) { + kfree(vhca_map_entry); +@@ -4187,17 +4192,12 @@ int mlx5_esw_vport_vhca_id_set(struct mlx5_eswitch *esw, u16 vport_num) + return 0; + } + +-void mlx5_esw_vport_vhca_id_clear(struct mlx5_eswitch *esw, u16 vport_num) ++void mlx5_esw_vport_vhca_id_unmap(struct mlx5_eswitch *esw, ++ struct mlx5_vport *vport) + { +- u16 *vhca_map_entry, vhca_id; +- int err; +- +- err = mlx5_vport_get_vhca_id(esw->dev, vport_num, &vhca_id); +- if (err) +- esw_warn(esw->dev, "Getting vhca_id for vport failed (vport=%hu,err=%d)\n", +- vport_num, err); ++ u16 *vhca_map_entry; + +- vhca_map_entry = xa_erase(&esw->offloads.vhca_map, vhca_id); ++ vhca_map_entry = xa_erase(&esw->offloads.vhca_map, vport->vhca_id); + kfree(vhca_map_entry); + } + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1511-net-mlx5-e-switch-set-query-hca-cap-via-vhca-id.patch b/SOURCES/1511-net-mlx5-e-switch-set-query-hca-cap-via-vhca-id.patch new file mode 100644 index 000000000..897f6507d --- /dev/null +++ b/SOURCES/1511-net-mlx5-e-switch-set-query-hca-cap-via-vhca-id.patch @@ -0,0 +1,180 @@ +From 
3e3dbc4c44a45831254561a2f5423e61497e4d76 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sat, 18 Apr 2026 17:12:40 -0400 +Subject: [PATCH] net/mlx5: E-Switch, Set/Query hca cap via vhca id + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 1baf30426553efb3ac518b0e9d5c1c3f8ed7762a +Author: Saeed Mahameed +Date: Mon Jun 16 15:07:59 2025 -0700 + + net/mlx5: E-Switch, Set/Query hca cap via vhca id + + Dynamically created vports require vhca id as input to set/query other + vport hca cap, when FW is capable and the vhca id of a vport is valid + use it instead of the local function id. + + Signed-off-by: Saeed Mahameed + Signed-off-by: Adithya Jayachandran + Reviewed-by: Parav Pandit + Reviewed-by: Feng Liu + Reviewed-by: William Tu + Reviewed-by: Mark Bloch + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +index eeffe9c4aa56..21c42138d93c 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +@@ -840,6 +840,18 @@ static int mlx5_esw_vport_caps_get(struct mlx5_eswitch *esw, struct mlx5_vport * + return err; + } + ++bool mlx5_esw_vport_vhca_id(struct mlx5_eswitch *esw, u16 vportn, u16 *vhca_id) ++{ ++ struct mlx5_vport *vport; ++ ++ vport = mlx5_eswitch_get_vport(esw, vportn); ++ if (IS_ERR(vport) || MLX5_VPORT_INVAL_VHCA_ID(vport)) ++ return false; ++ ++ *vhca_id = vport->vhca_id; ++ return true; ++} ++ + static int esw_vport_setup(struct mlx5_eswitch *esw, struct mlx5_vport *vport) + { + bool vst_mode_steering = esw_vst_mode_is_steering(esw); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +index 32aab5e1e673..8e99d5e20c46 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +@@ -828,6 +828,7 @@ int mlx5_esw_vport_vhca_id_map(struct mlx5_eswitch *esw, + 
void mlx5_esw_vport_vhca_id_unmap(struct mlx5_eswitch *esw, + struct mlx5_vport *vport); + int mlx5_eswitch_vhca_id_to_vport(struct mlx5_eswitch *esw, u16 vhca_id, u16 *vport_num); ++bool mlx5_esw_vport_vhca_id(struct mlx5_eswitch *esw, u16 vportn, u16 *vhca_id); + + /** + * struct mlx5_esw_event_info - Indicates eswitch mode changed/changing. +@@ -968,6 +969,13 @@ static inline bool mlx5_eswitch_block_ipsec(struct mlx5_core_dev *dev) + } + + static inline void mlx5_eswitch_unblock_ipsec(struct mlx5_core_dev *dev) {} ++ ++static inline bool ++mlx5_esw_vport_vhca_id(struct mlx5_eswitch *esw, u16 vportn, u16 *vhca_id) ++{ ++ return -EOPNOTSUPP; ++} ++ + #endif /* CONFIG_MLX5_ESWITCH */ + + #endif /* __MLX5_ESWITCH_H__ */ +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c +index da5c24fc7b30..231bedc6a252 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c +@@ -36,6 +36,7 @@ + #include + #include + #include "mlx5_core.h" ++#include "eswitch.h" + #include "sf/sf.h" + + /* Mutex to hold while enabling or disabling RoCE */ +@@ -1189,18 +1190,44 @@ u64 mlx5_query_nic_system_image_guid(struct mlx5_core_dev *mdev) + } + EXPORT_SYMBOL_GPL(mlx5_query_nic_system_image_guid); + ++static bool mlx5_vport_use_vhca_id_as_func_id(struct mlx5_core_dev *dev, ++ u16 vport_num, u16 *vhca_id) ++{ ++ if (!MLX5_CAP_GEN_2(dev, function_id_type_vhca_id)) ++ return false; ++ ++ return mlx5_esw_vport_vhca_id(dev->priv.eswitch, vport_num, vhca_id); ++} ++ + int mlx5_vport_get_other_func_cap(struct mlx5_core_dev *dev, u16 vport, void *out, + u16 opmod) + { +- bool ec_vf_func = mlx5_core_is_ec_vf_vport(dev, vport); + u8 in[MLX5_ST_SZ_BYTES(query_hca_cap_in)] = {}; ++ u16 vhca_id = 0, function_id = 0; ++ bool ec_vf_func = false; ++ ++ /* if this vport is referring to a vport on the ec PF (embedded cpu ) ++ * let the FW know which domain we are querying since vport numbers 
or ++ * function_ids are not unique across the different PF domains, ++ * unless we use vhca_id as the function_id below. ++ */ ++ ec_vf_func = mlx5_core_is_ec_vf_vport(dev, vport); ++ function_id = mlx5_vport_to_func_id(dev, vport, ec_vf_func); ++ ++ if (mlx5_vport_use_vhca_id_as_func_id(dev, vport, &vhca_id)) { ++ MLX5_SET(query_hca_cap_in, in, function_id_type, 1); ++ function_id = vhca_id; ++ ec_vf_func = false; ++ mlx5_core_dbg(dev, "%s using vhca_id as function_id for vport %d vhca_id 0x%x\n", ++ __func__, vport, vhca_id); ++ } + + opmod = (opmod << 1) | (HCA_CAP_OPMOD_GET_MAX & 0x01); + MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP); + MLX5_SET(query_hca_cap_in, in, op_mod, opmod); +- MLX5_SET(query_hca_cap_in, in, function_id, mlx5_vport_to_func_id(dev, vport, ec_vf_func)); + MLX5_SET(query_hca_cap_in, in, other_function, true); + MLX5_SET(query_hca_cap_in, in, ec_vf_function, ec_vf_func); ++ MLX5_SET(query_hca_cap_in, in, function_id, function_id); + return mlx5_cmd_exec_inout(dev, query_hca_cap, in, out); + } + EXPORT_SYMBOL_GPL(mlx5_vport_get_other_func_cap); +@@ -1233,8 +1260,9 @@ int mlx5_vport_get_vhca_id(struct mlx5_core_dev *dev, u16 vport, u16 *vhca_id) + int mlx5_vport_set_other_func_cap(struct mlx5_core_dev *dev, const void *hca_cap, + u16 vport, u16 opmod) + { +- bool ec_vf_func = mlx5_core_is_ec_vf_vport(dev, vport); + int set_sz = MLX5_ST_SZ_BYTES(set_hca_cap_in); ++ u16 vhca_id = 0, function_id = 0; ++ bool ec_vf_func = false; + void *set_hca_cap; + void *set_ctx; + int ret; +@@ -1243,14 +1271,29 @@ int mlx5_vport_set_other_func_cap(struct mlx5_core_dev *dev, const void *hca_cap + if (!set_ctx) + return -ENOMEM; + ++ /* if this vport is referring to a vport on the ec PF (embedded cpu ) ++ * let the FW know which domain we are querying since vport numbers or ++ * function_ids are not unique across the different PF domains, ++ * unless we use vhca_id as the function_id below. 
++ */ ++ ec_vf_func = mlx5_core_is_ec_vf_vport(dev, vport); ++ function_id = mlx5_vport_to_func_id(dev, vport, ec_vf_func); ++ ++ if (mlx5_vport_use_vhca_id_as_func_id(dev, vport, &vhca_id)) { ++ MLX5_SET(set_hca_cap_in, set_ctx, function_id_type, 1); ++ function_id = vhca_id; ++ ec_vf_func = false; ++ mlx5_core_dbg(dev, "%s using vhca_id as function_id for vport %d vhca_id 0x%x\n", ++ __func__, vport, vhca_id); ++ } ++ + MLX5_SET(set_hca_cap_in, set_ctx, opcode, MLX5_CMD_OP_SET_HCA_CAP); + MLX5_SET(set_hca_cap_in, set_ctx, op_mod, opmod << 1); + set_hca_cap = MLX5_ADDR_OF(set_hca_cap_in, set_ctx, capability); + memcpy(set_hca_cap, hca_cap, MLX5_ST_SZ_BYTES(cmd_hca_cap)); +- MLX5_SET(set_hca_cap_in, set_ctx, function_id, +- mlx5_vport_to_func_id(dev, vport, ec_vf_func)); + MLX5_SET(set_hca_cap_in, set_ctx, other_function, true); + MLX5_SET(set_hca_cap_in, set_ctx, ec_vf_function, ec_vf_func); ++ MLX5_SET(set_hca_cap_in, set_ctx, function_id, function_id); + ret = mlx5_cmd_exec_in(dev, set_hca_cap, set_ctx); + + kfree(set_ctx); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1512-rdma-net-mlx5-export-mlx5-vport-get-vhca-id.patch b/SOURCES/1512-rdma-net-mlx5-export-mlx5-vport-get-vhca-id.patch new file mode 100644 index 000000000..9b52bc1b9 --- /dev/null +++ b/SOURCES/1512-rdma-net-mlx5-export-mlx5-vport-get-vhca-id.patch @@ -0,0 +1,213 @@ +From 113633f8dbc3d4d8c2bc4327fa79c49e99a9170a Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sat, 18 Apr 2026 17:12:40 -0400 +Subject: [PATCH] {rdma,net}/mlx5: export mlx5_vport_get_vhca_id + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 40653f280b2640e5caa94eeedee43e0f1df97704 +Author: Saeed Mahameed +Date: Mon Jun 16 17:28:20 2025 -0700 + + {rdma,net}/mlx5: export mlx5_vport_get_vhca_id + + vhca id is already cached in the vport structure no need to query on + every mlx5 layer, use the mlx5_vport_get_vhca_id, where possible. 
+ + Signed-off-by: Saeed Mahameed + Reviewed-by: Mark Bloch + Reviewed-by: Parav Pandit + Signed-off-by: Alexei Lazar + Reviewed-by: Feng Liu + Reviewed-by: Tariq Toukan + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/std_types.c b/drivers/infiniband/hw/mlx5/std_types.c +index bdb568411091..2fcf553044e1 100644 +--- a/drivers/infiniband/hw/mlx5/std_types.c ++++ b/drivers/infiniband/hw/mlx5/std_types.c +@@ -83,33 +83,14 @@ static int fill_vport_icm_addr(struct mlx5_core_dev *mdev, u16 vport, + static int fill_vport_vhca_id(struct mlx5_core_dev *mdev, u16 vport, + struct mlx5_ib_uapi_query_port *info) + { +- size_t out_sz = MLX5_ST_SZ_BYTES(query_hca_cap_out); +- u32 in[MLX5_ST_SZ_DW(query_hca_cap_in)] = {}; +- void *out; +- int err; +- +- out = kzalloc(out_sz, GFP_KERNEL); +- if (!out) +- return -ENOMEM; ++ int err = mlx5_vport_get_vhca_id(mdev, vport, &info->vport_vhca_id); + +- MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP); +- MLX5_SET(query_hca_cap_in, in, other_function, true); +- MLX5_SET(query_hca_cap_in, in, function_id, vport); +- MLX5_SET(query_hca_cap_in, in, op_mod, +- MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE | +- HCA_CAP_OPMOD_GET_CUR); +- +- err = mlx5_cmd_exec(mdev, in, sizeof(in), out, out_sz); + if (err) +- goto out; +- +- info->vport_vhca_id = MLX5_GET(query_hca_cap_out, out, +- capability.cmd_hca_cap.vhca_id); ++ return err; + + info->flags |= MLX5_IB_UAPI_QUERY_PORT_VPORT_VHCA_ID; +-out: +- kfree(out); +- return err; ++ ++ return 0; + } + + static int fill_multiport_info(struct mlx5_ib_dev *dev, u32 port_num, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c +index 878f9b46bf18..73f5b62b8c7f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c +@@ -1,6 +1,8 @@ + // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB + /* Copyright (c) 
2023, NVIDIA CORPORATION & AFFILIATES. */ + ++#include ++ + #include "reporter_vnic.h" + #include "en_stats.h" + #include "devlink.h" +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h +index 9d3504f5abfa..082259b56816 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h +@@ -449,8 +449,6 @@ int mlx5_vport_set_other_func_cap(struct mlx5_core_dev *dev, const void *hca_cap + #define mlx5_vport_get_other_func_general_cap(dev, vport, out) \ + mlx5_vport_get_other_func_cap(dev, vport, out, MLX5_CAP_GENERAL) + +-int mlx5_vport_get_vhca_id(struct mlx5_core_dev *dev, u16 vport, u16 *vhca_id); +- + static inline u32 mlx5_sriov_get_vf_total_msix(struct pci_dev *pdev) + { + struct mlx5_core_dev *dev = pci_get_drvdata(pdev); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.c +index 0bdcab2e5cf3..acb0317f930b 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.c +@@ -1200,22 +1200,28 @@ int mlx5hws_cmd_query_caps(struct mlx5_core_dev *mdev, + int mlx5hws_cmd_query_gvmi(struct mlx5_core_dev *mdev, bool other_function, + u16 vport_number, u16 *gvmi) + { +- bool ec_vf_func = other_function ? 
mlx5_core_is_ec_vf_vport(mdev, vport_number) : false; + u32 in[MLX5_ST_SZ_DW(query_hca_cap_in)] = {}; + int out_size; + void *out; + int err; + ++ if (other_function) { ++ err = mlx5_vport_get_vhca_id(mdev, vport_number, gvmi); ++ if (!err) ++ return 0; ++ ++ mlx5_core_err(mdev, "Failed to get vport vhca id for vport %d\n", ++ vport_number); ++ return err; ++ } ++ ++ /* get vhca_id for `this` function */ + out_size = MLX5_ST_SZ_BYTES(query_hca_cap_out); + out = kzalloc(out_size, GFP_KERNEL); + if (!out) + return -ENOMEM; + + MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP); +- MLX5_SET(query_hca_cap_in, in, other_function, other_function); +- MLX5_SET(query_hca_cap_in, in, function_id, +- mlx5_vport_to_func_id(mdev, vport_number, ec_vf_func)); +- MLX5_SET(query_hca_cap_in, in, ec_vf_function, ec_vf_func); + MLX5_SET(query_hca_cap_in, in, op_mod, + MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE << 1 | HCA_CAP_OPMOD_GET_CUR); + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_cmd.c +index baefb9a3fa05..bf99b933fd14 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_cmd.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_cmd.c +@@ -2,6 +2,7 @@ + /* Copyright (c) 2019 Mellanox Technologies. */ + + #include "dr_types.h" ++#include "eswitch.h" + + int mlx5dr_cmd_query_esw_vport_context(struct mlx5_core_dev *mdev, + bool other_vport, +@@ -34,21 +35,28 @@ int mlx5dr_cmd_query_esw_vport_context(struct mlx5_core_dev *mdev, + int mlx5dr_cmd_query_gvmi(struct mlx5_core_dev *mdev, bool other_vport, + u16 vport_number, u16 *gvmi) + { +- bool ec_vf_func = other_vport ? 
mlx5_core_is_ec_vf_vport(mdev, vport_number) : false; + u32 in[MLX5_ST_SZ_DW(query_hca_cap_in)] = {}; + int out_size; + void *out; + int err; + ++ if (other_vport) { ++ err = mlx5_vport_get_vhca_id(mdev, vport_number, gvmi); ++ if (!err) ++ return 0; ++ ++ mlx5_core_err(mdev, "Failed to get vport vhca id for vport %d\n", ++ vport_number); ++ return err; ++ } ++ ++ /* get vhca_id for `this` function */ + out_size = MLX5_ST_SZ_BYTES(query_hca_cap_out); + out = kzalloc(out_size, GFP_KERNEL); + if (!out) + return -ENOMEM; + + MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP); +- MLX5_SET(query_hca_cap_in, in, other_function, other_vport); +- MLX5_SET(query_hca_cap_in, in, function_id, mlx5_vport_to_func_id(mdev, vport_number, ec_vf_func)); +- MLX5_SET(query_hca_cap_in, in, ec_vf_function, ec_vf_func); + MLX5_SET(query_hca_cap_in, in, op_mod, + MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE << 1 | + HCA_CAP_OPMOD_GET_CUR); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c +index 231bedc6a252..2ed2e530b07d 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c +@@ -1239,7 +1239,9 @@ int mlx5_vport_get_vhca_id(struct mlx5_core_dev *dev, u16 vport, u16 *vhca_id) + void *hca_caps; + int err; + +- *vhca_id = 0; ++ /* try get vhca_id via eswitch */ ++ if (mlx5_esw_vport_vhca_id(dev->priv.eswitch, vport, vhca_id)) ++ return 0; + + query_ctx = kzalloc(query_out_sz, GFP_KERNEL); + if (!query_ctx) +@@ -1256,6 +1258,7 @@ int mlx5_vport_get_vhca_id(struct mlx5_core_dev *dev, u16 vport, u16 *vhca_id) + kfree(query_ctx); + return err; + } ++EXPORT_SYMBOL_GPL(mlx5_vport_get_vhca_id); + + int mlx5_vport_set_other_func_cap(struct mlx5_core_dev *dev, const void *hca_cap, + u16 vport, u16 opmod) +diff --git a/include/linux/mlx5/vport.h b/include/linux/mlx5/vport.h +index c36cc6d82926..c87b9507cfa1 100644 +--- a/include/linux/mlx5/vport.h ++++ 
b/include/linux/mlx5/vport.h +@@ -135,4 +135,6 @@ int mlx5_nic_vport_unaffiliate_multiport(struct mlx5_core_dev *port_mdev); + u64 mlx5_query_nic_system_image_guid(struct mlx5_core_dev *mdev); + int mlx5_vport_get_other_func_cap(struct mlx5_core_dev *dev, u16 vport, void *out, + u16 opmod); ++int mlx5_vport_get_vhca_id(struct mlx5_core_dev *dev, u16 vport, u16 *vhca_id); ++ + #endif /* __MLX5_VPORT_H__ */ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1513-net-mlx5-query-to-see-if-host-pf-is-disabled.patch b/SOURCES/1513-net-mlx5-query-to-see-if-host-pf-is-disabled.patch new file mode 100644 index 000000000..6268668d2 --- /dev/null +++ b/SOURCES/1513-net-mlx5-query-to-see-if-host-pf-is-disabled.patch @@ -0,0 +1,80 @@ +From bb5b306016d1fbae12cb837c79c4311c5a6449f2 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sat, 18 Apr 2026 17:12:40 -0400 +Subject: [PATCH] net/mlx5: Query to see if host PF is disabled + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 9e84de72aef9bcf0e751a0bff3ac91b0cf52366f +Author: Daniel Jurgens +Date: Wed Aug 13 22:19:55 2025 +0300 + + net/mlx5: Query to see if host PF is disabled + + The host PF can be disabled, query firmware to check if the host PF of + this function exists. 
+ + Signed-off-by: Daniel Jurgens + Reviewed-by: William Tu + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1755112796-467444-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +index 21c42138d93c..e8e053aa4111 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +@@ -1051,6 +1051,25 @@ const u32 *mlx5_esw_query_functions(struct mlx5_core_dev *dev) + return ERR_PTR(err); + } + ++static int mlx5_esw_host_functions_enabled_query(struct mlx5_eswitch *esw) ++{ ++ const u32 *query_host_out; ++ ++ if (!mlx5_core_is_ecpf_esw_manager(esw->dev)) ++ return 0; ++ ++ query_host_out = mlx5_esw_query_functions(esw->dev); ++ if (IS_ERR(query_host_out)) ++ return PTR_ERR(query_host_out); ++ ++ esw->esw_funcs.host_funcs_disabled = ++ MLX5_GET(query_esw_functions_out, query_host_out, ++ host_params_context.host_pf_not_exist); ++ ++ kvfree(query_host_out); ++ return 0; ++} ++ + static void mlx5_eswitch_event_handler_register(struct mlx5_eswitch *esw) + { + if (esw->mode == MLX5_ESWITCH_OFFLOADS && mlx5_eswitch_is_funcs_handler(esw->dev)) { +@@ -1888,6 +1907,10 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev) + goto abort; + } + ++ err = mlx5_esw_host_functions_enabled_query(esw); ++ if (err) ++ goto abort; ++ + err = mlx5_esw_vports_init(esw); + if (err) + goto abort; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +index 8e99d5e20c46..eacce5bece10 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +@@ -329,6 +329,7 @@ struct mlx5_host_work { + + struct mlx5_esw_functions { + struct mlx5_nb nb; ++ bool host_funcs_disabled; + u16 num_vfs; + u16 num_ec_vfs; + }; +-- +2.50.1 (Apple Git-155) + diff --git 
a/SOURCES/1514-net-mlx5-support-disabling-host-pfs.patch b/SOURCES/1514-net-mlx5-support-disabling-host-pfs.patch new file mode 100644 index 000000000..6a3113c58 --- /dev/null +++ b/SOURCES/1514-net-mlx5-support-disabling-host-pfs.patch @@ -0,0 +1,257 @@ +From c6a2b2fe5d05fec43296bf69739c4cdee2efb6b5 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:19:36 -0400 +Subject: [PATCH] net/mlx5: Support disabling host PFs + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 +Conflicts: +Context diff due to an upstream conflicts that fixed in the following: +38dad812bb50 ("Merge tag 'mlx5-next-vhca-id' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux") + +commit 520369ef43a8504f9d54ee219bb6c692d2e40028 +Author: Daniel Jurgens +Date: Wed Aug 13 22:19:56 2025 +0300 + + net/mlx5: Support disabling host PFs + + Some devices support disabling the physical function on the host. When + this is configured the vports for the host functions do not exist. + + This patch checks if host functions are enabled before trying to access + their vports. 
+ + Signed-off-by: Daniel Jurgens + Reviewed-by: William Tu + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1755112796-467444-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +index e8e053aa4111..9fe5a45124fd 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +@@ -1310,17 +1310,19 @@ mlx5_eswitch_enable_pf_vf_vports(struct mlx5_eswitch *esw, + esw->mode == MLX5_ESWITCH_LEGACY; + + /* Enable PF vport */ +- if (pf_needed) { ++ if (pf_needed && mlx5_esw_host_functions_enabled(esw->dev)) { + ret = mlx5_eswitch_load_pf_vf_vport(esw, MLX5_VPORT_PF, + enabled_events); + if (ret) + return ret; + } + +- /* Enable external host PF HCA */ +- ret = host_pf_enable_hca(esw->dev); +- if (ret) +- goto pf_hca_err; ++ if (mlx5_esw_host_functions_enabled(esw->dev)) { ++ /* Enable external host PF HCA */ ++ ret = host_pf_enable_hca(esw->dev); ++ if (ret) ++ goto pf_hca_err; ++ } + + /* Enable ECPF vport */ + if (mlx5_ecpf_vport_exists(esw->dev)) { +@@ -1352,9 +1354,10 @@ mlx5_eswitch_enable_pf_vf_vports(struct mlx5_eswitch *esw, + if (mlx5_ecpf_vport_exists(esw->dev)) + mlx5_eswitch_unload_pf_vf_vport(esw, MLX5_VPORT_ECPF); + ecpf_err: +- host_pf_disable_hca(esw->dev); ++ if (mlx5_esw_host_functions_enabled(esw->dev)) ++ host_pf_disable_hca(esw->dev); + pf_hca_err: +- if (pf_needed) ++ if (pf_needed && mlx5_esw_host_functions_enabled(esw->dev)) + mlx5_eswitch_unload_pf_vf_vport(esw, MLX5_VPORT_PF); + return ret; + } +@@ -1374,10 +1377,12 @@ void mlx5_eswitch_disable_pf_vf_vports(struct mlx5_eswitch *esw) + mlx5_eswitch_unload_pf_vf_vport(esw, MLX5_VPORT_ECPF); + } + +- host_pf_disable_hca(esw->dev); ++ if (mlx5_esw_host_functions_enabled(esw->dev)) ++ host_pf_disable_hca(esw->dev); + +- if (mlx5_core_is_ecpf_esw_manager(esw->dev) || +- 
esw->mode == MLX5_ESWITCH_LEGACY) ++ if ((mlx5_core_is_ecpf_esw_manager(esw->dev) || ++ esw->mode == MLX5_ESWITCH_LEGACY) && ++ mlx5_esw_host_functions_enabled(esw->dev)) + mlx5_eswitch_unload_pf_vf_vport(esw, MLX5_VPORT_PF); + } + +@@ -1706,7 +1711,8 @@ int mlx5_esw_sf_max_hpf_functions(struct mlx5_core_dev *dev, u16 *max_sfs, u16 * + void *hca_caps; + int err; + +- if (!mlx5_core_is_ecpf(dev)) { ++ if (!mlx5_core_is_ecpf(dev) || ++ !mlx5_esw_host_functions_enabled(dev)) { + *max_sfs = 0; + return 0; + } +@@ -1783,21 +1789,23 @@ static int mlx5_esw_vports_init(struct mlx5_eswitch *esw) + + xa_init(&esw->vports); + +- err = mlx5_esw_vport_alloc(esw, idx, MLX5_VPORT_PF); +- if (err) +- goto err; +- if (esw->first_host_vport == MLX5_VPORT_PF) +- xa_set_mark(&esw->vports, idx, MLX5_ESW_VPT_HOST_FN); +- idx++; +- +- for (i = 0; i < mlx5_core_max_vfs(dev); i++) { +- err = mlx5_esw_vport_alloc(esw, idx, idx); ++ if (mlx5_esw_host_functions_enabled(dev)) { ++ err = mlx5_esw_vport_alloc(esw, idx, MLX5_VPORT_PF); + if (err) + goto err; +- xa_set_mark(&esw->vports, idx, MLX5_ESW_VPT_VF); +- xa_set_mark(&esw->vports, idx, MLX5_ESW_VPT_HOST_FN); ++ if (esw->first_host_vport == MLX5_VPORT_PF) ++ xa_set_mark(&esw->vports, idx, MLX5_ESW_VPT_HOST_FN); + idx++; ++ for (i = 0; i < mlx5_core_max_vfs(dev); i++) { ++ err = mlx5_esw_vport_alloc(esw, idx, idx); ++ if (err) ++ goto err; ++ xa_set_mark(&esw->vports, idx, MLX5_ESW_VPT_VF); ++ xa_set_mark(&esw->vports, idx, MLX5_ESW_VPT_HOST_FN); ++ idx++; ++ } + } ++ + base_sf_num = mlx5_sf_start_function_id(dev); + for (i = 0; i < mlx5_sf_max_functions(dev); i++) { + err = mlx5_esw_vport_alloc(esw, idx, base_sf_num + i); +@@ -1897,6 +1905,7 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev) + goto free_esw; + + esw->dev = dev; ++ dev->priv.eswitch = esw; + esw->manager_vport = mlx5_eswitch_manager_vport(dev); + esw->first_host_vport = mlx5_eswitch_first_host_vport_num(dev); + +@@ -1915,7 +1924,6 @@ int mlx5_eswitch_init(struct 
mlx5_core_dev *dev) + if (err) + goto abort; + +- dev->priv.eswitch = esw; + err = esw_offloads_init(esw); + if (err) + goto reps_err; +@@ -2447,3 +2455,11 @@ void mlx5_eswitch_unblock_ipsec(struct mlx5_core_dev *dev) + dev->num_ipsec_offloads--; + mutex_unlock(&esw->state_lock); + } ++ ++bool mlx5_esw_host_functions_enabled(const struct mlx5_core_dev *dev) ++{ ++ if (!dev->priv.eswitch) ++ return true; ++ ++ return !dev->priv.eswitch->esw_funcs.host_funcs_disabled; ++} +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +index eacce5bece10..cfd6b1b8c6f4 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +@@ -903,6 +903,7 @@ int mlx5_esw_ipsec_vf_packet_offload_set(struct mlx5_eswitch *esw, struct mlx5_v + bool enable); + int mlx5_esw_ipsec_vf_packet_offload_supported(struct mlx5_core_dev *dev, + u16 vport_num); ++bool mlx5_esw_host_functions_enabled(const struct mlx5_core_dev *dev); + #else /* CONFIG_MLX5_ESWITCH */ + /* eswitch API stubs */ + static inline int mlx5_eswitch_init(struct mlx5_core_dev *dev) { return 0; } +@@ -971,6 +972,12 @@ static inline bool mlx5_eswitch_block_ipsec(struct mlx5_core_dev *dev) + + static inline void mlx5_eswitch_unblock_ipsec(struct mlx5_core_dev *dev) {} + ++static inline bool ++mlx5_esw_host_functions_enabled(const struct mlx5_core_dev *dev) ++{ ++ return true; ++} ++ + static inline bool + mlx5_esw_vport_vhca_id(struct mlx5_eswitch *esw, u16 vportn, u16 *vhca_id) + { +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +index 19decaa8a96e..cdba7bc448ee 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +@@ -1213,7 +1213,8 @@ static int esw_add_fdb_peer_miss_rules(struct mlx5_eswitch *esw, + misc = MLX5_ADDR_OF(fte_match_param, 
spec->match_value, + misc_parameters); + +- if (mlx5_core_is_ecpf_esw_manager(peer_dev)) { ++ if (mlx5_core_is_ecpf_esw_manager(peer_dev) && ++ mlx5_esw_host_functions_enabled(peer_dev)) { + peer_vport = mlx5_eswitch_get_vport(peer_esw, MLX5_VPORT_PF); + esw_set_peer_miss_rule_source_port(esw, peer_esw, spec, + MLX5_VPORT_PF); +@@ -1239,19 +1240,21 @@ static int esw_add_fdb_peer_miss_rules(struct mlx5_eswitch *esw, + flows[peer_vport->index] = flow; + } + +- mlx5_esw_for_each_vf_vport(peer_esw, i, peer_vport, +- mlx5_core_max_vfs(peer_dev)) { +- esw_set_peer_miss_rule_source_port(esw, +- peer_esw, +- spec, peer_vport->vport); ++ if (mlx5_esw_host_functions_enabled(esw->dev)) { ++ mlx5_esw_for_each_vf_vport(peer_esw, i, peer_vport, ++ mlx5_core_max_vfs(peer_dev)) { ++ esw_set_peer_miss_rule_source_port(esw, peer_esw, ++ spec, ++ peer_vport->vport); + +- flow = mlx5_add_flow_rules(mlx5_eswitch_get_slow_fdb(esw), +- spec, &flow_act, &dest, 1); +- if (IS_ERR(flow)) { +- err = PTR_ERR(flow); +- goto add_vf_flow_err; ++ flow = mlx5_add_flow_rules(mlx5_eswitch_get_slow_fdb(esw), ++ spec, &flow_act, &dest, 1); ++ if (IS_ERR(flow)) { ++ err = PTR_ERR(flow); ++ goto add_vf_flow_err; ++ } ++ flows[peer_vport->index] = flow; + } +- flows[peer_vport->index] = flow; + } + + if (mlx5_core_ec_sriov_enabled(peer_dev)) { +@@ -1301,7 +1304,9 @@ static int esw_add_fdb_peer_miss_rules(struct mlx5_eswitch *esw, + mlx5_del_flow_rules(flows[peer_vport->index]); + } + add_ecpf_flow_err: +- if (mlx5_core_is_ecpf_esw_manager(peer_dev)) { ++ ++ if (mlx5_core_is_ecpf_esw_manager(peer_dev) && ++ mlx5_esw_host_functions_enabled(peer_dev)) { + peer_vport = mlx5_eswitch_get_vport(peer_esw, MLX5_VPORT_PF); + mlx5_del_flow_rules(flows[peer_vport->index]); + } +@@ -4059,7 +4064,8 @@ mlx5_eswitch_vport_has_rep(const struct mlx5_eswitch *esw, u16 vport_num) + { + /* Currently, only ECPF based device has representor for host PF. 
*/ + if (vport_num == MLX5_VPORT_PF && +- !mlx5_core_is_ecpf_esw_manager(esw->dev)) ++ (!mlx5_core_is_ecpf_esw_manager(esw->dev) || ++ !mlx5_esw_host_functions_enabled(esw->dev))) + return false; + + if (vport_num == MLX5_VPORT_ECPF && +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1515-net-mlx5e-set-default-burst-period-for-tx-and-rx-reporters.patch b/SOURCES/1515-net-mlx5e-set-default-burst-period-for-tx-and-rx-reporters.patch new file mode 100644 index 000000000..136c0ba8d --- /dev/null +++ b/SOURCES/1515-net-mlx5e-set-default-burst-period-for-tx-and-rx-reporters.patch @@ -0,0 +1,89 @@ +From ae5fd3463475a68bcecf16609574d3df05474f3b Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:56 -0400 +Subject: [PATCH] net/mlx5e: Set default burst period for TX and RX reporters + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 2d5ccb93bbb4f161ccb677ce0b2ebcfe4a089d62 +Author: Shahar Shitrit +Date: Sun Aug 24 11:43:54 2025 +0300 + + net/mlx5e: Set default burst period for TX and RX reporters + + System errors can sometimes cause multiple errors to be reported + to the TX reporter at the same time. For instance, lost interrupts + may cause several SQs to time out simultaneously. When dev_watchdog + notifies the driver for that, it iterates over all SQs to trigger + recovery for the timed-out ones, via TX health reporter. + However, grace period allows only one recovery at a time, so only + the first SQ recovers while others remain blocked. Since no further + recoveries are allowed during the grace period, subsequent errors + cause the reporter to enter an ERROR state, requiring manual + intervention. + + To address this, set the TX reporter's default burst period + to 0.5 second. This allows the reporter to detect and handle all + timed-out SQs within this window before initiating the grace period. + + To account for the possibility of a similar issue in the RX reporter, + its default burst period is also configured. 
+ + Additionally, while here, align the TX definition prefix with the RX, + as these are used only in EN driver. + + Signed-off-by: Shahar Shitrit + Reviewed-by: Dragos Tatulea + Reviewed-by: Carolina Jubran + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20250824084354.533182-6-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c +index 1b9ea72abc5a..eb1cace5910c 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c +@@ -652,6 +652,7 @@ void mlx5e_reporter_icosq_resume_recovery(struct mlx5e_channel *c) + } + + #define MLX5E_REPORTER_RX_GRACEFUL_PERIOD 500 ++#define MLX5E_REPORTER_RX_BURST_PERIOD 500 + + static const struct devlink_health_reporter_ops mlx5_rx_reporter_ops = { + .name = "rx", +@@ -659,6 +660,7 @@ static const struct devlink_health_reporter_ops mlx5_rx_reporter_ops = { + .diagnose = mlx5e_rx_reporter_diagnose, + .dump = mlx5e_rx_reporter_dump, + .default_graceful_period = MLX5E_REPORTER_RX_GRACEFUL_PERIOD, ++ .default_burst_period = MLX5E_REPORTER_RX_BURST_PERIOD, + }; + + void mlx5e_reporter_rx_create(struct mlx5e_priv *priv) +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c +index 069ab8aaac5c..8907c5378f54 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c +@@ -543,14 +543,16 @@ void mlx5e_reporter_tx_ptpsq_unhealthy(struct mlx5e_ptpsq *ptpsq) + mlx5e_health_report(priv, priv->tx_reporter, err_str, &err_ctx); + } + +-#define MLX5_REPORTER_TX_GRACEFUL_PERIOD 500 ++#define MLX5E_REPORTER_TX_GRACEFUL_PERIOD 500 ++#define MLX5E_REPORTER_TX_BURST_PERIOD 500 + + static const struct devlink_health_reporter_ops mlx5_tx_reporter_ops = { + .name = 
"tx", + .recover = mlx5e_tx_reporter_recover, + .diagnose = mlx5e_tx_reporter_diagnose, + .dump = mlx5e_tx_reporter_dump, +- .default_graceful_period = MLX5_REPORTER_TX_GRACEFUL_PERIOD, ++ .default_graceful_period = MLX5E_REPORTER_TX_GRACEFUL_PERIOD, ++ .default_burst_period = MLX5E_REPORTER_TX_BURST_PERIOD, + }; + + void mlx5e_reporter_tx_create(struct mlx5e_priv *priv) +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1516-eth-mlx5-remove-kconfig-co-dependency-with-vxlan.patch b/SOURCES/1516-eth-mlx5-remove-kconfig-co-dependency-with-vxlan.patch new file mode 100644 index 000000000..c46fe66fb --- /dev/null +++ b/SOURCES/1516-eth-mlx5-remove-kconfig-co-dependency-with-vxlan.patch @@ -0,0 +1,57 @@ +From c2622b6a6d390d99cd9e0fee8ad6a711265c46d6 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:56 -0400 +Subject: [PATCH] eth: mlx5: remove Kconfig co-dependency with VXLAN + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 15d157c3ad018a7b9d99d0cf35d6b163e570728e +Author: Jakub Kicinski +Date: Wed Aug 27 16:43:19 2025 -0700 + + eth: mlx5: remove Kconfig co-dependency with VXLAN + + mlx5 has a Kconfig co-dependency on VXLAN, even tho it doesn't + call any VXLAN function (unlike mlxsw). Perhaps this dates back + to very old days when tunnel ports were fetched directly from + VXLAN. + + Remove the dependency to allow MLX5=y + VXLAN=m kernel configs. + But still avoid compiling in the lib/vxlan code if VXLAN=n. 
+ + Reviewed-by: Saeed Mahameed + Link: https://patch.msgid.link/20250827234319.3504852-1-kuba@kernel.org + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig +index 6ec7d6e0181d..8ef2ac2060ba 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig ++++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig +@@ -8,7 +8,6 @@ config MLX5_CORE + depends on PCI + select AUXILIARY_BUS + select NET_DEVLINK +- depends on VXLAN || !VXLAN + depends on MLXFW || !MLXFW + depends on PTP_1588_CLOCK_OPTIONAL + depends on PCI_HYPERV_INTERFACE || !PCI_HYPERV_INTERFACE +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile +index a253c73db9e5..206223ce63a8 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile ++++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile +@@ -85,7 +85,9 @@ mlx5_core-$(CONFIG_MLX5_BRIDGE) += esw/bridge.o esw/bridge_mcast.o esw/bridge + + mlx5_core-$(CONFIG_HWMON) += hwmon.o + mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o +-mlx5_core-$(CONFIG_VXLAN) += lib/vxlan.o ++ifneq ($(CONFIG_VXLAN),) ++ mlx5_core-y += lib/vxlan.o ++endif + mlx5_core-$(CONFIG_PTP_1588_CLOCK) += lib/clock.o + mlx5_core-$(CONFIG_PCI_HYPERV_INTERFACE) += lib/hv.o lib/hv_vhca.o + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1517-net-mlx5-fs-convert-vport-acls-root-namespaces-to-xarray.patch b/SOURCES/1517-net-mlx5-fs-convert-vport-acls-root-namespaces-to-xarray.patch new file mode 100644 index 000000000..2dc5dcadb --- /dev/null +++ b/SOURCES/1517-net-mlx5-fs-convert-vport-acls-root-namespaces-to-xarray.patch @@ -0,0 +1,327 @@ +From 97520697555ea273161d35f4f201b27d6dbc02d9 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:56 -0400 +Subject: [PATCH] net/mlx5: FS, Convert vport acls root namespaces to xarray + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 
2e894b99c0172759f89af506cdd898ced0b14e13 +Author: Saeed Mahameed +Date: Fri Aug 29 15:37:16 2025 -0700 + + net/mlx5: FS, Convert vport acls root namespaces to xarray + + Before this patch it was a linear array and could only support a certain + number of vports, in the next patches, vport numbers are not bound to a + well known limit, thus convert acl root name space storage to xarray. + + In addition create fs_core public API to add/remove vport acl namespaces + as it is the eswitch responsibility to create the vports and their + root name spaces for acls, in the next patch we will move + mlx5_fs_ingress_acls_{init,cleanup} to eswitch and will use + the individual mlx5_fs_vport_{egress,ingresS}_acl_ns_{add,remove} + APIs for dynamically create vports. + + Signed-off-by: Saeed Mahameed + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/20250829223722.900629-2-saeed@kernel.org + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +index 80245c38dbad..6028c163d9a2 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +@@ -2793,30 +2793,32 @@ struct mlx5_flow_namespace *mlx5_get_flow_namespace(struct mlx5_core_dev *dev, + } + EXPORT_SYMBOL(mlx5_get_flow_namespace); + ++struct mlx5_vport_acl_root_ns { ++ u16 vport_idx; ++ struct mlx5_flow_root_namespace *root_ns; ++}; ++ + struct mlx5_flow_namespace * + mlx5_get_flow_vport_namespace(struct mlx5_core_dev *dev, + enum mlx5_flow_namespace_type type, int vport_idx) + { + struct mlx5_flow_steering *steering = dev->priv.steering; ++ struct mlx5_vport_acl_root_ns *vport_ns; + + if (!steering) + return NULL; + + switch (type) { + case MLX5_FLOW_NAMESPACE_ESW_EGRESS: +- if (vport_idx >= steering->esw_egress_acl_vports) +- return NULL; +- if (steering->esw_egress_root_ns && +- steering->esw_egress_root_ns[vport_idx]) +- return 
&steering->esw_egress_root_ns[vport_idx]->ns; ++ vport_ns = xa_load(&steering->esw_egress_root_ns, vport_idx); ++ if (vport_ns) ++ return &vport_ns->root_ns->ns; + else + return NULL; + case MLX5_FLOW_NAMESPACE_ESW_INGRESS: +- if (vport_idx >= steering->esw_ingress_acl_vports) +- return NULL; +- if (steering->esw_ingress_root_ns && +- steering->esw_ingress_root_ns[vport_idx]) +- return &steering->esw_ingress_root_ns[vport_idx]->ns; ++ vport_ns = xa_load(&steering->esw_ingress_root_ns, vport_idx); ++ if (vport_ns) ++ return &vport_ns->root_ns->ns; + else + return NULL; + case MLX5_FLOW_NAMESPACE_RDMA_TRANSPORT_RX: +@@ -3575,30 +3577,102 @@ static int init_fdb_root_ns(struct mlx5_flow_steering *steering) + return err; + } + +-static int init_egress_acl_root_ns(struct mlx5_flow_steering *steering, int vport) ++static void ++mlx5_fs_remove_vport_acl_root_ns(struct xarray *esw_acl_root_ns, u16 vport_idx) ++{ ++ struct mlx5_vport_acl_root_ns *vport_ns; ++ ++ vport_ns = xa_erase(esw_acl_root_ns, vport_idx); ++ if (vport_ns) { ++ cleanup_root_ns(vport_ns->root_ns); ++ kfree(vport_ns); ++ } ++} ++ ++static int ++mlx5_fs_add_vport_acl_root_ns(struct mlx5_flow_steering *steering, ++ struct xarray *esw_acl_root_ns, ++ enum fs_flow_table_type table_type, ++ u16 vport_idx) + { ++ struct mlx5_vport_acl_root_ns *vport_ns; + struct fs_prio *prio; ++ int err; ++ ++ /* sanity check, intended xarrays are used */ ++ if (WARN_ON(esw_acl_root_ns != &steering->esw_egress_root_ns && ++ esw_acl_root_ns != &steering->esw_ingress_root_ns)) ++ return -EINVAL; + +- steering->esw_egress_root_ns[vport] = create_root_ns(steering, FS_FT_ESW_EGRESS_ACL); +- if (!steering->esw_egress_root_ns[vport]) ++ if (table_type != FS_FT_ESW_EGRESS_ACL && ++ table_type != FS_FT_ESW_INGRESS_ACL) { ++ mlx5_core_err(steering->dev, ++ "Invalid table type %d for egress/ingress ACLs\n", ++ table_type); ++ return -EINVAL; ++ } ++ ++ if (xa_load(esw_acl_root_ns, vport_idx)) ++ return -EEXIST; ++ ++ vport_ns = 
kzalloc(sizeof(*vport_ns), GFP_KERNEL); ++ if (!vport_ns) + return -ENOMEM; + ++ vport_ns->root_ns = create_root_ns(steering, table_type); ++ if (!vport_ns->root_ns) { ++ err = -ENOMEM; ++ goto kfree_vport_ns; ++ } ++ + /* create 1 prio*/ +- prio = fs_create_prio(&steering->esw_egress_root_ns[vport]->ns, 0, 1); +- return PTR_ERR_OR_ZERO(prio); ++ prio = fs_create_prio(&vport_ns->root_ns->ns, 0, 1); ++ if (IS_ERR(prio)) { ++ err = PTR_ERR(prio); ++ goto cleanup_root_ns; ++ } ++ ++ vport_ns->vport_idx = vport_idx; ++ err = xa_insert(esw_acl_root_ns, vport_idx, vport_ns, GFP_KERNEL); ++ if (err) ++ goto cleanup_root_ns; ++ return 0; ++ ++cleanup_root_ns: ++ cleanup_root_ns(vport_ns->root_ns); ++kfree_vport_ns: ++ kfree(vport_ns); ++ return err; + } + +-static int init_ingress_acl_root_ns(struct mlx5_flow_steering *steering, int vport) ++int mlx5_fs_vport_egress_acl_ns_add(struct mlx5_flow_steering *steering, ++ u16 vport_idx) + { +- struct fs_prio *prio; ++ return mlx5_fs_add_vport_acl_root_ns(steering, ++ &steering->esw_egress_root_ns, ++ FS_FT_ESW_EGRESS_ACL, vport_idx); ++} + +- steering->esw_ingress_root_ns[vport] = create_root_ns(steering, FS_FT_ESW_INGRESS_ACL); +- if (!steering->esw_ingress_root_ns[vport]) +- return -ENOMEM; ++int mlx5_fs_vport_ingress_acl_ns_add(struct mlx5_flow_steering *steering, ++ u16 vport_idx) ++{ ++ return mlx5_fs_add_vport_acl_root_ns(steering, ++ &steering->esw_ingress_root_ns, ++ FS_FT_ESW_INGRESS_ACL, vport_idx); ++} + +- /* create 1 prio*/ +- prio = fs_create_prio(&steering->esw_ingress_root_ns[vport]->ns, 0, 1); +- return PTR_ERR_OR_ZERO(prio); ++void mlx5_fs_vport_egress_acl_ns_remove(struct mlx5_flow_steering *steering, ++ int vport_idx) ++{ ++ mlx5_fs_remove_vport_acl_root_ns(&steering->esw_egress_root_ns, ++ vport_idx); ++} ++ ++void mlx5_fs_vport_ingress_acl_ns_remove(struct mlx5_flow_steering *steering, ++ int vport_idx) ++{ ++ mlx5_fs_remove_vport_acl_root_ns(&steering->esw_ingress_root_ns, ++ vport_idx); + } + + int 
mlx5_fs_egress_acls_init(struct mlx5_core_dev *dev, int total_vports) +@@ -3607,15 +3681,10 @@ int mlx5_fs_egress_acls_init(struct mlx5_core_dev *dev, int total_vports) + int err; + int i; + +- steering->esw_egress_root_ns = +- kcalloc(total_vports, +- sizeof(*steering->esw_egress_root_ns), +- GFP_KERNEL); +- if (!steering->esw_egress_root_ns) +- return -ENOMEM; ++ xa_init(&steering->esw_egress_root_ns); + + for (i = 0; i < total_vports; i++) { +- err = init_egress_acl_root_ns(steering, i); ++ err = mlx5_fs_vport_egress_acl_ns_add(steering, i); + if (err) + goto cleanup_root_ns; + } +@@ -3623,10 +3692,9 @@ int mlx5_fs_egress_acls_init(struct mlx5_core_dev *dev, int total_vports) + return 0; + + cleanup_root_ns: +- for (i--; i >= 0; i--) +- cleanup_root_ns(steering->esw_egress_root_ns[i]); +- kfree(steering->esw_egress_root_ns); +- steering->esw_egress_root_ns = NULL; ++ while (i--) ++ mlx5_fs_vport_egress_acl_ns_remove(steering, i); ++ xa_destroy(&steering->esw_egress_root_ns); + return err; + } + +@@ -3635,14 +3703,10 @@ void mlx5_fs_egress_acls_cleanup(struct mlx5_core_dev *dev) + struct mlx5_flow_steering *steering = dev->priv.steering; + int i; + +- if (!steering->esw_egress_root_ns) +- return; +- + for (i = 0; i < steering->esw_egress_acl_vports; i++) +- cleanup_root_ns(steering->esw_egress_root_ns[i]); ++ mlx5_fs_vport_egress_acl_ns_remove(steering, i); + +- kfree(steering->esw_egress_root_ns); +- steering->esw_egress_root_ns = NULL; ++ xa_destroy(&steering->esw_egress_root_ns); + } + + int mlx5_fs_ingress_acls_init(struct mlx5_core_dev *dev, int total_vports) +@@ -3651,15 +3715,10 @@ int mlx5_fs_ingress_acls_init(struct mlx5_core_dev *dev, int total_vports) + int err; + int i; + +- steering->esw_ingress_root_ns = +- kcalloc(total_vports, +- sizeof(*steering->esw_ingress_root_ns), +- GFP_KERNEL); +- if (!steering->esw_ingress_root_ns) +- return -ENOMEM; ++ xa_init(&steering->esw_ingress_root_ns); + + for (i = 0; i < total_vports; i++) { +- err = 
init_ingress_acl_root_ns(steering, i); ++ err = mlx5_fs_vport_ingress_acl_ns_add(steering, i); + if (err) + goto cleanup_root_ns; + } +@@ -3667,10 +3726,10 @@ int mlx5_fs_ingress_acls_init(struct mlx5_core_dev *dev, int total_vports) + return 0; + + cleanup_root_ns: +- for (i--; i >= 0; i--) +- cleanup_root_ns(steering->esw_ingress_root_ns[i]); +- kfree(steering->esw_ingress_root_ns); +- steering->esw_ingress_root_ns = NULL; ++ while (i--) ++ mlx5_fs_vport_ingress_acl_ns_remove(steering, i); ++ ++ xa_destroy(&steering->esw_ingress_root_ns); + return err; + } + +@@ -3679,14 +3738,10 @@ void mlx5_fs_ingress_acls_cleanup(struct mlx5_core_dev *dev) + struct mlx5_flow_steering *steering = dev->priv.steering; + int i; + +- if (!steering->esw_ingress_root_ns) +- return; +- + for (i = 0; i < steering->esw_ingress_acl_vports; i++) +- cleanup_root_ns(steering->esw_ingress_root_ns[i]); ++ mlx5_fs_vport_ingress_acl_ns_remove(steering, i); + +- kfree(steering->esw_ingress_root_ns); +- steering->esw_ingress_root_ns = NULL; ++ xa_destroy(&steering->esw_ingress_root_ns); + } + + u32 mlx5_fs_get_capabilities(struct mlx5_core_dev *dev, enum mlx5_flow_namespace_type type) +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h +index e6a95b310b55..4dbf2485fb9e 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h +@@ -151,8 +151,8 @@ struct mlx5_flow_steering { + struct mlx5_flow_root_namespace *root_ns; + struct mlx5_flow_root_namespace *fdb_root_ns; + struct mlx5_flow_namespace **fdb_sub_ns; +- struct mlx5_flow_root_namespace **esw_egress_root_ns; +- struct mlx5_flow_root_namespace **esw_ingress_root_ns; ++ struct xarray esw_egress_root_ns; ++ struct xarray esw_ingress_root_ns; + struct mlx5_flow_root_namespace *sniffer_tx_root_ns; + struct mlx5_flow_root_namespace *sniffer_rx_root_ns; + struct mlx5_flow_root_namespace *rdma_rx_root_ns; +@@ -384,6 +384,15 
@@ void mlx5_fs_egress_acls_cleanup(struct mlx5_core_dev *dev); + int mlx5_fs_ingress_acls_init(struct mlx5_core_dev *dev, int total_vports); + void mlx5_fs_ingress_acls_cleanup(struct mlx5_core_dev *dev); + ++int mlx5_fs_vport_egress_acl_ns_add(struct mlx5_flow_steering *steering, ++ u16 vport_idx); ++int mlx5_fs_vport_ingress_acl_ns_add(struct mlx5_flow_steering *steering, ++ u16 vport_idx); ++void mlx5_fs_vport_egress_acl_ns_remove(struct mlx5_flow_steering *steering, ++ int vport_idx); ++void mlx5_fs_vport_ingress_acl_ns_remove(struct mlx5_flow_steering *steering, ++ int vport_idx); ++ + u32 mlx5_fs_get_capabilities(struct mlx5_core_dev *dev, enum mlx5_flow_namespace_type type); + + struct mlx5_flow_root_namespace *find_root(struct fs_node *node); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1518-net-mlx5-e-switch-move-vport-acls-root-namespaces-creation-t.patch b/SOURCES/1518-net-mlx5-e-switch-move-vport-acls-root-namespaces-creation-t.patch new file mode 100644 index 000000000..464cb2626 --- /dev/null +++ b/SOURCES/1518-net-mlx5-e-switch-move-vport-acls-root-namespaces-creation-t.patch @@ -0,0 +1,271 @@ +From 57ec704d66d385d9c9b5c3aa7728dc04b0287551 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:56 -0400 +Subject: [PATCH] net/mlx5: E-Switch, Move vport acls root namespaces creation + to eswitch + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit faa6ac53cdaa26f80e4b44e6255a52bd67b83acb +Author: Saeed Mahameed +Date: Fri Aug 29 15:37:17 2025 -0700 + + net/mlx5: E-Switch, Move vport acls root namespaces creation to eswitch + + Move the loop that creates the vports ACLs root name spaces to eswitch, + since it is the eswitch responsibility to decide when and how many + vports ACLs root namespaces to create, in the next patches we will use + the fs_core vport ACL root namespace APIs to create/remove root ns + ACLs dynamically for dynamically created vports. 
+ + Signed-off-by: Saeed Mahameed + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/20250829223722.900629-3-saeed@kernel.org + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +index 9fe5a45124fd..900650a1a66e 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +@@ -1439,19 +1439,76 @@ static void mlx5_esw_mode_change_notify(struct mlx5_eswitch *esw, u16 mode) + blocking_notifier_call_chain(&esw->n_head, 0, &info); + } + ++static int mlx5_esw_egress_acls_init(struct mlx5_core_dev *dev) ++{ ++ struct mlx5_flow_steering *steering = dev->priv.steering; ++ int total_vports = mlx5_eswitch_get_total_vports(dev); ++ int err; ++ int i; ++ ++ for (i = 0; i < total_vports; i++) { ++ err = mlx5_fs_vport_egress_acl_ns_add(steering, i); ++ if (err) ++ goto acl_ns_remove; ++ } ++ return 0; ++ ++acl_ns_remove: ++ while (i--) ++ mlx5_fs_vport_egress_acl_ns_remove(steering, i); ++ return err; ++} ++ ++static void mlx5_esw_egress_acls_cleanup(struct mlx5_core_dev *dev) ++{ ++ struct mlx5_flow_steering *steering = dev->priv.steering; ++ int total_vports = mlx5_eswitch_get_total_vports(dev); ++ int i; ++ ++ for (i = total_vports - 1; i >= 0; i--) ++ mlx5_fs_vport_egress_acl_ns_remove(steering, i); ++} ++ ++static int mlx5_esw_ingress_acls_init(struct mlx5_core_dev *dev) ++{ ++ struct mlx5_flow_steering *steering = dev->priv.steering; ++ int total_vports = mlx5_eswitch_get_total_vports(dev); ++ int err; ++ int i; ++ ++ for (i = 0; i < total_vports; i++) { ++ err = mlx5_fs_vport_ingress_acl_ns_add(steering, i); ++ if (err) ++ goto acl_ns_remove; ++ } ++ return 0; ++ ++acl_ns_remove: ++ while (i--) ++ mlx5_fs_vport_ingress_acl_ns_remove(steering, i); ++ return err; ++} ++ ++static void mlx5_esw_ingress_acls_cleanup(struct mlx5_core_dev *dev) ++{ ++ struct mlx5_flow_steering *steering = 
dev->priv.steering; ++ int total_vports = mlx5_eswitch_get_total_vports(dev); ++ int i; ++ ++ for (i = total_vports - 1; i >= 0; i--) ++ mlx5_fs_vport_ingress_acl_ns_remove(steering, i); ++} ++ + static int mlx5_esw_acls_ns_init(struct mlx5_eswitch *esw) + { + struct mlx5_core_dev *dev = esw->dev; +- int total_vports; + int err; + + if (esw->flags & MLX5_ESWITCH_VPORT_ACL_NS_CREATED) + return 0; + +- total_vports = mlx5_eswitch_get_total_vports(dev); +- + if (MLX5_CAP_ESW_EGRESS_ACL(dev, ft_support)) { +- err = mlx5_fs_egress_acls_init(dev, total_vports); ++ err = mlx5_esw_egress_acls_init(dev); + if (err) + return err; + } else { +@@ -1459,7 +1516,7 @@ static int mlx5_esw_acls_ns_init(struct mlx5_eswitch *esw) + } + + if (MLX5_CAP_ESW_INGRESS_ACL(dev, ft_support)) { +- err = mlx5_fs_ingress_acls_init(dev, total_vports); ++ err = mlx5_esw_ingress_acls_init(dev); + if (err) + goto err; + } else { +@@ -1470,7 +1527,7 @@ static int mlx5_esw_acls_ns_init(struct mlx5_eswitch *esw) + + err: + if (MLX5_CAP_ESW_EGRESS_ACL(dev, ft_support)) +- mlx5_fs_egress_acls_cleanup(dev); ++ mlx5_esw_egress_acls_cleanup(dev); + return err; + } + +@@ -1480,9 +1537,9 @@ static void mlx5_esw_acls_ns_cleanup(struct mlx5_eswitch *esw) + + esw->flags &= ~MLX5_ESWITCH_VPORT_ACL_NS_CREATED; + if (MLX5_CAP_ESW_INGRESS_ACL(dev, ft_support)) +- mlx5_fs_ingress_acls_cleanup(dev); ++ mlx5_esw_ingress_acls_cleanup(dev); + if (MLX5_CAP_ESW_EGRESS_ACL(dev, ft_support)) +- mlx5_fs_egress_acls_cleanup(dev); ++ mlx5_esw_egress_acls_cleanup(dev); + } + + /** +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +index 6028c163d9a2..2db3ffb0a2b2 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +@@ -3675,75 +3675,6 @@ void mlx5_fs_vport_ingress_acl_ns_remove(struct mlx5_flow_steering *steering, + vport_idx); + } + +-int mlx5_fs_egress_acls_init(struct mlx5_core_dev *dev, int 
total_vports) +-{ +- struct mlx5_flow_steering *steering = dev->priv.steering; +- int err; +- int i; +- +- xa_init(&steering->esw_egress_root_ns); +- +- for (i = 0; i < total_vports; i++) { +- err = mlx5_fs_vport_egress_acl_ns_add(steering, i); +- if (err) +- goto cleanup_root_ns; +- } +- steering->esw_egress_acl_vports = total_vports; +- return 0; +- +-cleanup_root_ns: +- while (i--) +- mlx5_fs_vport_egress_acl_ns_remove(steering, i); +- xa_destroy(&steering->esw_egress_root_ns); +- return err; +-} +- +-void mlx5_fs_egress_acls_cleanup(struct mlx5_core_dev *dev) +-{ +- struct mlx5_flow_steering *steering = dev->priv.steering; +- int i; +- +- for (i = 0; i < steering->esw_egress_acl_vports; i++) +- mlx5_fs_vport_egress_acl_ns_remove(steering, i); +- +- xa_destroy(&steering->esw_egress_root_ns); +-} +- +-int mlx5_fs_ingress_acls_init(struct mlx5_core_dev *dev, int total_vports) +-{ +- struct mlx5_flow_steering *steering = dev->priv.steering; +- int err; +- int i; +- +- xa_init(&steering->esw_ingress_root_ns); +- +- for (i = 0; i < total_vports; i++) { +- err = mlx5_fs_vport_ingress_acl_ns_add(steering, i); +- if (err) +- goto cleanup_root_ns; +- } +- steering->esw_ingress_acl_vports = total_vports; +- return 0; +- +-cleanup_root_ns: +- while (i--) +- mlx5_fs_vport_ingress_acl_ns_remove(steering, i); +- +- xa_destroy(&steering->esw_ingress_root_ns); +- return err; +-} +- +-void mlx5_fs_ingress_acls_cleanup(struct mlx5_core_dev *dev) +-{ +- struct mlx5_flow_steering *steering = dev->priv.steering; +- int i; +- +- for (i = 0; i < steering->esw_ingress_acl_vports; i++) +- mlx5_fs_vport_ingress_acl_ns_remove(steering, i); +- +- xa_destroy(&steering->esw_ingress_root_ns); +-} +- + u32 mlx5_fs_get_capabilities(struct mlx5_core_dev *dev, enum mlx5_flow_namespace_type type) + { + struct mlx5_flow_root_namespace *root; +@@ -3873,6 +3804,11 @@ void mlx5_fs_core_cleanup(struct mlx5_core_dev *dev) + { + struct mlx5_flow_steering *steering = dev->priv.steering; + ++ 
WARN_ON(!xa_empty(&steering->esw_egress_root_ns)); ++ WARN_ON(!xa_empty(&steering->esw_ingress_root_ns)); ++ xa_destroy(&steering->esw_egress_root_ns); ++ xa_destroy(&steering->esw_ingress_root_ns); ++ + cleanup_root_ns(steering->root_ns); + cleanup_fdb_root_ns(steering); + cleanup_root_ns(steering->port_sel_root_ns); +@@ -3963,6 +3899,8 @@ int mlx5_fs_core_init(struct mlx5_core_dev *dev) + goto err; + } + ++ xa_init(&steering->esw_egress_root_ns); ++ xa_init(&steering->esw_ingress_root_ns); + return 0; + + err: +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h +index 4dbf2485fb9e..8458ce203dac 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h +@@ -159,8 +159,6 @@ struct mlx5_flow_steering { + struct mlx5_flow_root_namespace *rdma_tx_root_ns; + struct mlx5_flow_root_namespace *egress_root_ns; + struct mlx5_flow_root_namespace *port_sel_root_ns; +- int esw_egress_acl_vports; +- int esw_ingress_acl_vports; + struct mlx5_flow_root_namespace **rdma_transport_rx_root_ns; + struct mlx5_flow_root_namespace **rdma_transport_tx_root_ns; + int rdma_transport_rx_vports; +@@ -379,11 +377,6 @@ void mlx5_fs_core_free(struct mlx5_core_dev *dev); + int mlx5_fs_core_init(struct mlx5_core_dev *dev); + void mlx5_fs_core_cleanup(struct mlx5_core_dev *dev); + +-int mlx5_fs_egress_acls_init(struct mlx5_core_dev *dev, int total_vports); +-void mlx5_fs_egress_acls_cleanup(struct mlx5_core_dev *dev); +-int mlx5_fs_ingress_acls_init(struct mlx5_core_dev *dev, int total_vports); +-void mlx5_fs_ingress_acls_cleanup(struct mlx5_core_dev *dev); +- + int mlx5_fs_vport_egress_acl_ns_add(struct mlx5_flow_steering *steering, + u16 vport_idx); + int mlx5_fs_vport_ingress_acl_ns_add(struct mlx5_flow_steering *steering, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1519-net-mlx5-e-switch-add-support-for-adjacent-functions-vports-.patch 
b/SOURCES/1519-net-mlx5-e-switch-add-support-for-adjacent-functions-vports-.patch new file mode 100644 index 000000000..00cb75153 --- /dev/null +++ b/SOURCES/1519-net-mlx5-e-switch-add-support-for-adjacent-functions-vports-.patch @@ -0,0 +1,420 @@ +From 641a800bf548400ee76d7e052972fb3873894321 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:56 -0400 +Subject: [PATCH] net/mlx5: E-Switch, Add support for adjacent functions vports + discovery + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 17426c5d4b1db69c0aa34d0bf6a1552cc2c68d95 +Author: Adithya Jayachandran +Date: Fri Aug 29 15:37:18 2025 -0700 + + net/mlx5: E-Switch, Add support for adjacent functions vports discovery + + Adding driver support to query adjacent functions vports, AKA + delegated vports. + + Adjacent functions can delegate their sriov vfs to other sibling PF in + the system, to be managed by the eswitch capable sibling PF. + E.g, ECPF to Host PF, multi host PF between each other, etc. + + Only supported in switchdev mode. 
+ + Signed-off-by: Adithya Jayachandran + Signed-off-by: Saeed Mahameed + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/20250829223722.900629-4-saeed@kernel.org + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile +index 206223ce63a8..a65ab661375a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile ++++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile +@@ -69,7 +69,7 @@ mlx5_core-$(CONFIG_MLX5_TC_SAMPLE) += en/tc/sample.o + # Core extra + # + mlx5_core-$(CONFIG_MLX5_ESWITCH) += eswitch.o eswitch_offloads.o eswitch_offloads_termtbl.o \ +- ecpf.o rdma.o esw/legacy.o \ ++ ecpf.o rdma.o esw/legacy.o esw/adj_vport.o \ + esw/devlink_port.o esw/vporttbl.o esw/qos.o esw/ipsec.o + + mlx5_core-$(CONFIG_MLX5_ESWITCH) += esw/acl/helper.o \ +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/adj_vport.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/adj_vport.c +new file mode 100644 +index 000000000000..37a06c0949d5 +--- /dev/null ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/adj_vport.c +@@ -0,0 +1,185 @@ ++// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB ++// Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
++ ++#include "eswitch.h" ++ ++enum { ++ MLX5_ADJ_VPORT_DISCONNECT = 0x0, ++ MLX5_ADJ_VPORT_CONNECT = 0x1, ++}; ++ ++static int mlx5_esw_adj_vport_modify(struct mlx5_core_dev *dev, ++ u16 vport, bool connect) ++{ ++ u32 in[MLX5_ST_SZ_DW(modify_vport_state_in)] = {}; ++ ++ MLX5_SET(modify_vport_state_in, in, opcode, ++ MLX5_CMD_OP_MODIFY_VPORT_STATE); ++ MLX5_SET(modify_vport_state_in, in, op_mod, ++ MLX5_VPORT_STATE_OP_MOD_ESW_VPORT); ++ MLX5_SET(modify_vport_state_in, in, other_vport, 1); ++ MLX5_SET(modify_vport_state_in, in, vport_number, vport); ++ MLX5_SET(modify_vport_state_in, in, ingress_connect_valid, 1); ++ MLX5_SET(modify_vport_state_in, in, egress_connect_valid, 1); ++ MLX5_SET(modify_vport_state_in, in, ingress_connect, connect); ++ MLX5_SET(modify_vport_state_in, in, egress_connect, connect); ++ ++ return mlx5_cmd_exec_in(dev, modify_vport_state, in); ++} ++ ++static void mlx5_esw_destroy_esw_vport(struct mlx5_core_dev *dev, u16 vport) ++{ ++ u32 in[MLX5_ST_SZ_DW(destroy_esw_vport_in)] = {}; ++ ++ MLX5_SET(destroy_esw_vport_in, in, opcode, ++ MLX5_CMD_OPCODE_DESTROY_ESW_VPORT); ++ MLX5_SET(destroy_esw_vport_in, in, vport_num, vport); ++ ++ mlx5_cmd_exec_in(dev, destroy_esw_vport, in); ++} ++ ++static int mlx5_esw_create_esw_vport(struct mlx5_core_dev *dev, u16 vhca_id, ++ u16 *vport_num) ++{ ++ u32 out[MLX5_ST_SZ_DW(create_esw_vport_out)] = {}; ++ u32 in[MLX5_ST_SZ_DW(create_esw_vport_in)] = {}; ++ int err; ++ ++ MLX5_SET(create_esw_vport_in, in, opcode, ++ MLX5_CMD_OPCODE_CREATE_ESW_VPORT); ++ MLX5_SET(create_esw_vport_in, in, managed_vhca_id, vhca_id); ++ ++ err = mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); ++ if (!err) ++ *vport_num = MLX5_GET(create_esw_vport_out, out, vport_num); ++ ++ return err; ++} ++ ++static int mlx5_esw_adj_vport_create(struct mlx5_eswitch *esw, u16 vhca_id) ++{ ++ struct mlx5_vport *vport; ++ u16 vport_num; ++ int err; ++ ++ err = mlx5_esw_create_esw_vport(esw->dev, vhca_id, &vport_num); ++ if (err) { ++ 
esw_warn(esw->dev, ++ "Failed to create adjacent vport for vhca_id %d, err %d\n", ++ vhca_id, err); ++ return err; ++ } ++ ++ esw_debug(esw->dev, "Created adjacent vport[%d] %d for vhca_id 0x%x\n", ++ esw->last_vport_idx, vport_num, vhca_id); ++ ++ err = mlx5_esw_vport_alloc(esw, esw->last_vport_idx++, vport_num); ++ if (err) ++ goto destroy_esw_vport; ++ ++ xa_set_mark(&esw->vports, vport_num, MLX5_ESW_VPT_VF); ++ vport = mlx5_eswitch_get_vport(esw, vport_num); ++ vport->adjacent = true; ++ vport->vhca_id = vhca_id; ++ ++ mlx5_esw_adj_vport_modify(esw->dev, vport_num, MLX5_ADJ_VPORT_CONNECT); ++ return 0; ++ ++destroy_esw_vport: ++ mlx5_esw_destroy_esw_vport(esw->dev, vport_num); ++ return err; ++} ++ ++static void mlx5_esw_adj_vport_destroy(struct mlx5_eswitch *esw, ++ struct mlx5_vport *vport) ++{ ++ u16 vport_num = vport->vport; ++ ++ esw_debug(esw->dev, "Destroying adjacent vport %d for vhca_id 0x%x\n", ++ vport_num, vport->vhca_id); ++ mlx5_esw_adj_vport_modify(esw->dev, vport_num, ++ MLX5_ADJ_VPORT_DISCONNECT); ++ mlx5_esw_vport_free(esw, vport); ++ /* Reset the vport index back so new adj vports can use this index. ++ * When vport count can incrementally change, this needs to be modified. 
++ */ ++ esw->last_vport_idx--; ++ mlx5_esw_destroy_esw_vport(esw->dev, vport_num); ++} ++ ++void mlx5_esw_adjacent_vhcas_cleanup(struct mlx5_eswitch *esw) ++{ ++ struct mlx5_vport *vport; ++ unsigned long i; ++ ++ if (!MLX5_CAP_GEN_2(esw->dev, delegated_vhca_max)) ++ return; ++ ++ mlx5_esw_for_each_vf_vport(esw, i, vport, U16_MAX) { ++ if (!vport->adjacent) ++ continue; ++ mlx5_esw_adj_vport_destroy(esw, vport); ++ } ++} ++ ++void mlx5_esw_adjacent_vhcas_setup(struct mlx5_eswitch *esw) ++{ ++ u32 delegated_vhca_max = MLX5_CAP_GEN_2(esw->dev, delegated_vhca_max); ++ u32 in[MLX5_ST_SZ_DW(query_delegated_vhca_in)] = {}; ++ int outlen, err, i = 0; ++ u8 *out; ++ u32 count; ++ ++ if (!delegated_vhca_max) ++ return; ++ ++ outlen = MLX5_ST_SZ_BYTES(query_delegated_vhca_out) + ++ delegated_vhca_max * ++ MLX5_ST_SZ_BYTES(delegated_function_vhca_rid_info); ++ ++ esw_debug(esw->dev, "delegated_vhca_max=%d\n", delegated_vhca_max); ++ ++ out = kvzalloc(outlen, GFP_KERNEL); ++ if (!out) ++ return; ++ ++ MLX5_SET(query_delegated_vhca_in, in, opcode, ++ MLX5_CMD_OPCODE_QUERY_DELEGATED_VHCA); ++ ++ err = mlx5_cmd_exec(esw->dev, in, sizeof(in), out, outlen); ++ if (err) { ++ kvfree(out); ++ esw_warn(esw->dev, "Failed to query delegated vhca, err %d\n", ++ err); ++ return; ++ } ++ ++ count = MLX5_GET(query_delegated_vhca_out, out, functions_count); ++ esw_debug(esw->dev, "Delegated vhca functions count %d\n", count); ++ ++ for (i = 0; i < count; i++) { ++ void *rid_info, *rid_info_reg; ++ u16 vhca_id; ++ ++ rid_info = MLX5_ADDR_OF(query_delegated_vhca_out, out, ++ delegated_function_vhca_rid_info[i]); ++ ++ rid_info_reg = MLX5_ADDR_OF(delegated_function_vhca_rid_info, ++ rid_info, function_vhca_rid_info); ++ ++ vhca_id = MLX5_GET(function_vhca_rid_info_reg, rid_info_reg, ++ vhca_id); ++ esw_debug(esw->dev, "Delegating vhca_id 0x%x rid info:\n", ++ vhca_id); ++ ++ err = mlx5_esw_adj_vport_create(esw, vhca_id); ++ if (err) { ++ esw_warn(esw->dev, ++ "Failed to init adjacent vhca 0x%x, 
err %d\n", ++ vhca_id, err); ++ break; ++ } ++ } ++ ++ kvfree(out); ++} +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +index 900650a1a66e..10eca910a2db 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +@@ -1217,7 +1217,8 @@ void mlx5_eswitch_unload_vf_vports(struct mlx5_eswitch *esw, u16 num_vfs) + unsigned long i; + + mlx5_esw_for_each_vf_vport(esw, i, vport, num_vfs) { +- if (!vport->enabled) ++ /* Adjacent VFs are unloaded separately */ ++ if (!vport->enabled || vport->adjacent) + continue; + mlx5_eswitch_unload_pf_vf_vport(esw, vport->vport); + } +@@ -1236,6 +1237,42 @@ static void mlx5_eswitch_unload_ec_vf_vports(struct mlx5_eswitch *esw, + } + } + ++static void mlx5_eswitch_unload_adj_vf_vports(struct mlx5_eswitch *esw) ++{ ++ struct mlx5_vport *vport; ++ unsigned long i; ++ ++ mlx5_esw_for_each_vf_vport(esw, i, vport, U16_MAX) { ++ if (!vport->enabled || !vport->adjacent) ++ continue; ++ mlx5_eswitch_unload_pf_vf_vport(esw, vport->vport); ++ } ++} ++ ++static int ++mlx5_eswitch_load_adj_vf_vports(struct mlx5_eswitch *esw, ++ enum mlx5_eswitch_vport_event enabled_events) ++{ ++ struct mlx5_vport *vport; ++ unsigned long i; ++ int err; ++ ++ mlx5_esw_for_each_vf_vport(esw, i, vport, U16_MAX) { ++ if (!vport->adjacent) ++ continue; ++ err = mlx5_eswitch_load_pf_vf_vport(esw, vport->vport, ++ enabled_events); ++ if (err) ++ goto unload_adj_vf_vport; ++ } ++ ++ return 0; ++ ++unload_adj_vf_vport: ++ mlx5_eswitch_unload_adj_vf_vports(esw); ++ return err; ++} ++ + int mlx5_eswitch_load_vf_vports(struct mlx5_eswitch *esw, u16 num_vfs, + enum mlx5_eswitch_vport_event enabled_events) + { +@@ -1345,8 +1382,16 @@ mlx5_eswitch_enable_pf_vf_vports(struct mlx5_eswitch *esw, + enabled_events); + if (ret) + goto vf_err; ++ ++ /* Enable adjacent VF vports */ ++ ret = mlx5_eswitch_load_adj_vf_vports(esw, enabled_events); ++ if (ret) ++ 
goto unload_vf_vports; ++ + return 0; + ++unload_vf_vports: ++ mlx5_eswitch_unload_vf_vports(esw, esw->esw_funcs.num_vfs); + vf_err: + if (mlx5_core_ec_sriov_enabled(esw->dev)) + mlx5_eswitch_unload_ec_vf_vports(esw, esw->esw_funcs.num_ec_vfs); +@@ -1367,6 +1412,8 @@ mlx5_eswitch_enable_pf_vf_vports(struct mlx5_eswitch *esw, + */ + void mlx5_eswitch_disable_pf_vf_vports(struct mlx5_eswitch *esw) + { ++ mlx5_eswitch_unload_adj_vf_vports(esw); ++ + mlx5_eswitch_unload_vf_vports(esw, esw->esw_funcs.num_vfs); + + if (mlx5_core_ec_sriov_enabled(esw->dev)) +@@ -1791,8 +1838,7 @@ int mlx5_esw_sf_max_hpf_functions(struct mlx5_core_dev *dev, u16 *max_sfs, u16 * + return err; + } + +-static int mlx5_esw_vport_alloc(struct mlx5_eswitch *esw, +- int index, u16 vport_num) ++int mlx5_esw_vport_alloc(struct mlx5_eswitch *esw, int index, u16 vport_num) + { + struct mlx5_vport *vport; + int err; +@@ -1819,8 +1865,9 @@ static int mlx5_esw_vport_alloc(struct mlx5_eswitch *esw, + return err; + } + +-static void mlx5_esw_vport_free(struct mlx5_eswitch *esw, struct mlx5_vport *vport) ++void mlx5_esw_vport_free(struct mlx5_eswitch *esw, struct mlx5_vport *vport) + { ++ esw->total_vports--; + xa_erase(&esw->vports, vport->vport); + kfree(vport); + } +@@ -1904,6 +1951,9 @@ static int mlx5_esw_vports_init(struct mlx5_eswitch *esw) + err = mlx5_esw_vport_alloc(esw, idx, MLX5_VPORT_UPLINK); + if (err) + goto err; ++ ++ /* Adjacent vports or other dynamically create vports will use this */ ++ esw->last_vport_idx = ++idx; + return 0; + + err: +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +index cfd6b1b8c6f4..2c0e5ca73f3d 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +@@ -216,6 +216,7 @@ struct mlx5_vport { + u32 metadata; + int vhca_id; + ++ bool adjacent; /* delegated vhca from adjacent function */ + struct mlx5_vport_info info; + + /* Protected with 
the E-Switch qos domain lock. The Vport QoS can +@@ -384,6 +385,7 @@ struct mlx5_eswitch { + + struct mlx5_esw_bridge_offloads *br_offloads; + struct mlx5_esw_offload offloads; ++ u32 last_vport_idx; + int mode; + u16 manager_vport; + u16 first_host_vport; +@@ -417,6 +419,8 @@ int mlx5_esw_qos_modify_vport_rate(struct mlx5_eswitch *esw, u16 vport_num, u32 + /* E-Switch API */ + int mlx5_eswitch_init(struct mlx5_core_dev *dev); + void mlx5_eswitch_cleanup(struct mlx5_eswitch *esw); ++int mlx5_esw_vport_alloc(struct mlx5_eswitch *esw, int index, u16 vport_num); ++void mlx5_esw_vport_free(struct mlx5_eswitch *esw, struct mlx5_vport *vport); + + #define MLX5_ESWITCH_IGNORE_NUM_VFS (-1) + int mlx5_eswitch_enable_locked(struct mlx5_eswitch *esw, int num_vfs); +@@ -622,6 +626,9 @@ bool mlx5_esw_multipath_prereq(struct mlx5_core_dev *dev0, + + const u32 *mlx5_esw_query_functions(struct mlx5_core_dev *dev); + ++void mlx5_esw_adjacent_vhcas_setup(struct mlx5_eswitch *esw); ++void mlx5_esw_adjacent_vhcas_cleanup(struct mlx5_eswitch *esw); ++ + #define MLX5_DEBUG_ESWITCH_MASK BIT(3) + + #define esw_info(__dev, format, ...) 
\ +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +index cdba7bc448ee..fb03981d5036 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +@@ -3538,6 +3538,8 @@ int esw_offloads_enable(struct mlx5_eswitch *esw) + int err; + + mutex_init(&esw->offloads.termtbl_mutex); ++ mlx5_esw_adjacent_vhcas_setup(esw); ++ + err = mlx5_rdma_enable_roce(esw->dev); + if (err) + goto err_roce; +@@ -3602,6 +3604,7 @@ int esw_offloads_enable(struct mlx5_eswitch *esw) + err_metadata: + mlx5_rdma_disable_roce(esw->dev); + err_roce: ++ mlx5_esw_adjacent_vhcas_cleanup(esw); + mutex_destroy(&esw->offloads.termtbl_mutex); + return err; + } +@@ -3635,6 +3638,7 @@ void esw_offloads_disable(struct mlx5_eswitch *esw) + mapping_destroy(esw->offloads.reg_c0_obj_pool); + esw_offloads_metadata_uninit(esw); + mlx5_rdma_disable_roce(esw->dev); ++ mlx5_esw_adjacent_vhcas_cleanup(esw); + mutex_destroy(&esw->offloads.termtbl_mutex); + } + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1520-net-mlx5-e-switch-create-acls-root-namespace-for-adjacent-vp.patch b/SOURCES/1520-net-mlx5-e-switch-create-acls-root-namespace-for-adjacent-vp.patch new file mode 100644 index 000000000..bd9096930 --- /dev/null +++ b/SOURCES/1520-net-mlx5-e-switch-create-acls-root-namespace-for-adjacent-vp.patch @@ -0,0 +1,60 @@ +From 3c7bde53049a9fd2ec77d02d1c80aa511db89454 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:56 -0400 +Subject: [PATCH] net/mlx5: E-Switch, Create acls root namespace for adjacent + vports + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 9984ec9f1f502dc5e19daf4210b221f554ca35db +Author: Saeed Mahameed +Date: Fri Aug 29 15:37:19 2025 -0700 + + net/mlx5: E-Switch, Create acls root namespace for adjacent vports + + Use the new vport acl root namespace add/remove API to create the + missing acl 
root name spaces per each new adjacent function vport. + + Signed-off-by: Saeed Mahameed + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/20250829223722.900629-5-saeed@kernel.org + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/adj_vport.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/adj_vport.c +index 37a06c0949d5..1d104b3fe9e0 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/adj_vport.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/adj_vport.c +@@ -1,6 +1,7 @@ + // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB + // Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. + ++#include "fs_core.h" + #include "eswitch.h" + + enum { +@@ -82,6 +83,9 @@ static int mlx5_esw_adj_vport_create(struct mlx5_eswitch *esw, u16 vhca_id) + vport->adjacent = true; + vport->vhca_id = vhca_id; + ++ mlx5_fs_vport_egress_acl_ns_add(esw->dev->priv.steering, vport->index); ++ mlx5_fs_vport_ingress_acl_ns_add(esw->dev->priv.steering, vport->index); ++ + mlx5_esw_adj_vport_modify(esw->dev, vport_num, MLX5_ADJ_VPORT_CONNECT); + return 0; + +@@ -99,6 +103,10 @@ static void mlx5_esw_adj_vport_destroy(struct mlx5_eswitch *esw, + vport_num, vport->vhca_id); + mlx5_esw_adj_vport_modify(esw->dev, vport_num, + MLX5_ADJ_VPORT_DISCONNECT); ++ mlx5_fs_vport_egress_acl_ns_remove(esw->dev->priv.steering, ++ vport->index); ++ mlx5_fs_vport_ingress_acl_ns_remove(esw->dev->priv.steering, ++ vport->index); + mlx5_esw_vport_free(esw, vport); + /* Reset the vport index back so new adj vports can use this index. + * When vport count can incrementally change, this needs to be modified. 
+-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1521-net-mlx5-e-switch-register-representors-for-adjacent-vports.patch b/SOURCES/1521-net-mlx5-e-switch-register-representors-for-adjacent-vports.patch new file mode 100644 index 000000000..adfb22d47 --- /dev/null +++ b/SOURCES/1521-net-mlx5-e-switch-register-representors-for-adjacent-vports.patch @@ -0,0 +1,134 @@ +From 769550b37c1dfb703c621e938b789e4437093ffc Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:56 -0400 +Subject: [PATCH] net/mlx5: E-Switch, Register representors for adjacent vports + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit a0a7002b943997f5a4a9103ab92db388965f7aff +Author: Saeed Mahameed +Date: Fri Aug 29 15:37:20 2025 -0700 + + net/mlx5: E-Switch, Register representors for adjacent vports + + Register representors for adjacent vports dynamically when they are + discovered. Dynamically added representors state will now be set to + 'REGISTERED' when the representor type was already registered, + otherwise they won't be loaded. 
+ + Signed-off-by: Saeed Mahameed + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/20250829223722.900629-6-saeed@kernel.org + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/adj_vport.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/adj_vport.c +index 1d104b3fe9e0..3380f85678bc 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/adj_vport.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/adj_vport.c +@@ -85,10 +85,19 @@ static int mlx5_esw_adj_vport_create(struct mlx5_eswitch *esw, u16 vhca_id) + + mlx5_fs_vport_egress_acl_ns_add(esw->dev->priv.steering, vport->index); + mlx5_fs_vport_ingress_acl_ns_add(esw->dev->priv.steering, vport->index); ++ err = mlx5_esw_offloads_rep_add(esw, vport); ++ if (err) ++ goto acl_ns_remove; + + mlx5_esw_adj_vport_modify(esw->dev, vport_num, MLX5_ADJ_VPORT_CONNECT); + return 0; + ++acl_ns_remove: ++ mlx5_fs_vport_ingress_acl_ns_remove(esw->dev->priv.steering, ++ vport->index); ++ mlx5_fs_vport_egress_acl_ns_remove(esw->dev->priv.steering, ++ vport->index); ++ mlx5_esw_vport_free(esw, vport); + destroy_esw_vport: + mlx5_esw_destroy_esw_vport(esw->dev, vport_num); + return err; +@@ -103,6 +112,7 @@ static void mlx5_esw_adj_vport_destroy(struct mlx5_eswitch *esw, + vport_num, vport->vhca_id); + mlx5_esw_adj_vport_modify(esw->dev, vport_num, + MLX5_ADJ_VPORT_DISCONNECT); ++ mlx5_esw_offloads_rep_remove(esw, vport); + mlx5_fs_vport_egress_acl_ns_remove(esw->dev->priv.steering, + vport->index); + mlx5_fs_vport_ingress_acl_ns_remove(esw->dev->priv.steering, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +index 2c0e5ca73f3d..6d36d8bbb979 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +@@ -838,6 +838,11 @@ void mlx5_esw_vport_vhca_id_unmap(struct mlx5_eswitch *esw, + int mlx5_eswitch_vhca_id_to_vport(struct 
mlx5_eswitch *esw, u16 vhca_id, u16 *vport_num); + bool mlx5_esw_vport_vhca_id(struct mlx5_eswitch *esw, u16 vportn, u16 *vhca_id); + ++void mlx5_esw_offloads_rep_remove(struct mlx5_eswitch *esw, ++ const struct mlx5_vport *vport); ++int mlx5_esw_offloads_rep_add(struct mlx5_eswitch *esw, ++ const struct mlx5_vport *vport); ++ + /** + * struct mlx5_esw_event_info - Indicates eswitch mode changed/changing. + * +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +index fb03981d5036..d57f86d297ab 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +@@ -2378,7 +2378,20 @@ static int esw_offloads_start(struct mlx5_eswitch *esw, + return 0; + } + +-static int mlx5_esw_offloads_rep_init(struct mlx5_eswitch *esw, const struct mlx5_vport *vport) ++void mlx5_esw_offloads_rep_remove(struct mlx5_eswitch *esw, ++ const struct mlx5_vport *vport) ++{ ++ struct mlx5_eswitch_rep *rep = xa_load(&esw->offloads.vport_reps, ++ vport->vport); ++ ++ if (!rep) ++ return; ++ xa_erase(&esw->offloads.vport_reps, vport->vport); ++ kfree(rep); ++} ++ ++int mlx5_esw_offloads_rep_add(struct mlx5_eswitch *esw, ++ const struct mlx5_vport *vport) + { + struct mlx5_eswitch_rep *rep; + int rep_type; +@@ -2390,9 +2403,19 @@ static int mlx5_esw_offloads_rep_init(struct mlx5_eswitch *esw, const struct mlx + + rep->vport = vport->vport; + rep->vport_index = vport->index; +- for (rep_type = 0; rep_type < NUM_REP_TYPES; rep_type++) +- atomic_set(&rep->rep_data[rep_type].state, REP_UNREGISTERED); +- ++ for (rep_type = 0; rep_type < NUM_REP_TYPES; rep_type++) { ++ if (!esw->offloads.rep_ops[rep_type]) { ++ atomic_set(&rep->rep_data[rep_type].state, ++ REP_UNREGISTERED); ++ continue; ++ } ++ /* Dynamic/delegated vports add their representors after ++ * mlx5_eswitch_register_vport_reps, so mark them as registered ++ * for them to be loaded 
later with the others. ++ */ ++ rep->esw = esw; ++ atomic_set(&rep->rep_data[rep_type].state, REP_REGISTERED); ++ } + err = xa_insert(&esw->offloads.vport_reps, rep->vport, rep, GFP_KERNEL); + if (err) + goto insert_err; +@@ -2430,7 +2453,7 @@ static int esw_offloads_init_reps(struct mlx5_eswitch *esw) + xa_init(&esw->offloads.vport_reps); + + mlx5_esw_for_each_vport(esw, i, vport) { +- err = mlx5_esw_offloads_rep_init(esw, vport); ++ err = mlx5_esw_offloads_rep_add(esw, vport); + if (err) + goto err; + } +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1522-net-mlx5-e-switch-set-representor-attributes-for-adjacent-vf.patch b/SOURCES/1522-net-mlx5-e-switch-set-representor-attributes-for-adjacent-vf.patch new file mode 100644 index 000000000..31ec2bd03 --- /dev/null +++ b/SOURCES/1522-net-mlx5-e-switch-set-representor-attributes-for-adjacent-vf.patch @@ -0,0 +1,138 @@ +From cb384ea85892df1bafcb9e6d92919a27840649c6 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:57 -0400 +Subject: [PATCH] net/mlx5: E-switch, Set representor attributes for adjacent + VFs + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 5d8ae2c2cfe88a2c7458e18f30df4c655dfa983e +Author: Adithya Jayachandran +Date: Fri Aug 29 15:37:21 2025 -0700 + + net/mlx5: E-switch, Set representor attributes for adjacent VFs + + Adjacent vfs get their devlink port information from firmware, + use the information (pfnum, function id) from FW when populating the + devlink port attributes. 
+ + Before: + $ devlink port show + pci/0000:00:03.0/180225: type eth netdev eth0 flavour pcivf controller 0 pfnum 0 vfnum 49152 external false splittable false + function: + hw_addr 00:00:00:00:00:00 + + After: + $ devlink port show + pci/0000:00:03.0/180225: type eth netdev enp0s3npf0vf2 flavour pcivf controller 0 pfnum 0 vfnum 2 external false splittable false + function: + hw_addr 00:00:00:00:00:00 + + Signed-off-by: Adithya Jayachandran + Signed-off-by: Saeed Mahameed + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/20250829223722.900629-7-saeed@kernel.org + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/adj_vport.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/adj_vport.c +index 3380f85678bc..0091ba697bae 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/adj_vport.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/adj_vport.c +@@ -57,7 +57,8 @@ static int mlx5_esw_create_esw_vport(struct mlx5_core_dev *dev, u16 vhca_id, + return err; + } + +-static int mlx5_esw_adj_vport_create(struct mlx5_eswitch *esw, u16 vhca_id) ++static int mlx5_esw_adj_vport_create(struct mlx5_eswitch *esw, u16 vhca_id, ++ const void *rid_info_reg) + { + struct mlx5_vport *vport; + u16 vport_num; +@@ -83,6 +84,12 @@ static int mlx5_esw_adj_vport_create(struct mlx5_eswitch *esw, u16 vhca_id) + vport->adjacent = true; + vport->vhca_id = vhca_id; + ++ vport->adj_info.parent_pci_devfn = ++ MLX5_GET(function_vhca_rid_info_reg, rid_info_reg, ++ parent_pci_device_function); ++ vport->adj_info.function_id = ++ MLX5_GET(function_vhca_rid_info_reg, rid_info_reg, function_id); ++ + mlx5_fs_vport_egress_acl_ns_add(esw->dev->priv.steering, vport->index); + mlx5_fs_vport_ingress_acl_ns_add(esw->dev->priv.steering, vport->index); + err = mlx5_esw_offloads_rep_add(esw, vport); +@@ -176,7 +183,7 @@ void mlx5_esw_adjacent_vhcas_setup(struct mlx5_eswitch *esw) + esw_debug(esw->dev, "Delegated vhca functions count 
%d\n", count); + + for (i = 0; i < count; i++) { +- void *rid_info, *rid_info_reg; ++ const void *rid_info, *rid_info_reg; + u16 vhca_id; + + rid_info = MLX5_ADDR_OF(query_delegated_vhca_out, out, +@@ -187,10 +194,9 @@ void mlx5_esw_adjacent_vhcas_setup(struct mlx5_eswitch *esw) + + vhca_id = MLX5_GET(function_vhca_rid_info_reg, rid_info_reg, + vhca_id); +- esw_debug(esw->dev, "Delegating vhca_id 0x%x rid info:\n", +- vhca_id); ++ esw_debug(esw->dev, "Delegating vhca_id 0x%x\n", vhca_id); + +- err = mlx5_esw_adj_vport_create(esw, vhca_id); ++ err = mlx5_esw_adj_vport_create(esw, vhca_id, rid_info_reg); + if (err) { + esw_warn(esw->dev, + "Failed to init adjacent vhca 0x%x, err %d\n", +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c +index c33accadae0f..cf88a106d80d 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c +@@ -27,6 +27,7 @@ static void mlx5_esw_offloads_pf_vf_devlink_port_attrs_set(struct mlx5_eswitch * + { + struct mlx5_core_dev *dev = esw->dev; + struct netdev_phys_item_id ppid = {}; ++ struct mlx5_vport *vport; + u32 controller_num = 0; + bool external; + u16 pfnum; +@@ -42,10 +43,18 @@ static void mlx5_esw_offloads_pf_vf_devlink_port_attrs_set(struct mlx5_eswitch * + dl_port->attrs.switch_id.id_len = ppid.id_len; + devlink_port_attrs_pci_pf_set(dl_port, controller_num, pfnum, external); + } else if (mlx5_eswitch_is_vf_vport(esw, vport_num)) { ++ u16 func_id = vport_num - 1; ++ ++ vport = mlx5_eswitch_get_vport(esw, vport_num); + memcpy(dl_port->attrs.switch_id.id, ppid.id, ppid.id_len); + dl_port->attrs.switch_id.id_len = ppid.id_len; ++ if (vport->adjacent) { ++ func_id = vport->adj_info.function_id; ++ pfnum = vport->adj_info.parent_pci_devfn; ++ } ++ + devlink_port_attrs_pci_vf_set(dl_port, controller_num, pfnum, +- vport_num - 1, external); ++ func_id, external); + } else if 
(mlx5_core_is_ec_vf_vport(esw->dev, vport_num)) { + u16 base_vport = mlx5_core_ec_vf_vport_base(dev); + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +index 6d36d8bbb979..4fe285ce32aa 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +@@ -217,6 +217,11 @@ struct mlx5_vport { + int vhca_id; + + bool adjacent; /* delegated vhca from adjacent function */ ++ struct { ++ u16 parent_pci_devfn; /* Adjacent parent PCI device function */ ++ u16 function_id; /* Function ID of the delegated VPort */ ++ } adj_info; ++ + struct mlx5_vport_info info; + + /* Protected with the E-Switch qos domain lock. The Vport QoS can +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1523-net-mlx5-dr-hws-use-the-cached-vhca-id-for-this-device.patch b/SOURCES/1523-net-mlx5-dr-hws-use-the-cached-vhca-id-for-this-device.patch new file mode 100644 index 000000000..07497d3c9 --- /dev/null +++ b/SOURCES/1523-net-mlx5-dr-hws-use-the-cached-vhca-id-for-this-device.patch @@ -0,0 +1,133 @@ +From 445001490208492dac2b44949d28ad9854942329 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:57 -0400 +Subject: [PATCH] net/mlx5: {DR,HWS}, Use the cached vhca_id for this device + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 0c2a02f3c066d4b50ebb66178843df83f33e4f1b +Author: Saeed Mahameed +Date: Fri Aug 29 15:37:22 2025 -0700 + + net/mlx5: {DR,HWS}, Use the cached vhca_id for this device + + The mlx5 driver caches many capabilities to be used by mlx5 layers. + + In SW and HW steering we can use the cached vhca_id instead of invoking + FW commands. 
+ + Signed-off-by: Saeed Mahameed + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/20250829223722.900629-8-saeed@kernel.org + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.c +index acb0317f930b..f22eaf506d28 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.c +@@ -1200,40 +1200,20 @@ int mlx5hws_cmd_query_caps(struct mlx5_core_dev *mdev, + int mlx5hws_cmd_query_gvmi(struct mlx5_core_dev *mdev, bool other_function, + u16 vport_number, u16 *gvmi) + { +- u32 in[MLX5_ST_SZ_DW(query_hca_cap_in)] = {}; +- int out_size; +- void *out; + int err; + +- if (other_function) { +- err = mlx5_vport_get_vhca_id(mdev, vport_number, gvmi); +- if (!err) +- return 0; +- +- mlx5_core_err(mdev, "Failed to get vport vhca id for vport %d\n", +- vport_number); +- return err; ++ if (!other_function) { ++ /* self vhca_id */ ++ *gvmi = MLX5_CAP_GEN(mdev, vhca_id); ++ return 0; + } + +- /* get vhca_id for `this` function */ +- out_size = MLX5_ST_SZ_BYTES(query_hca_cap_out); +- out = kzalloc(out_size, GFP_KERNEL); +- if (!out) +- return -ENOMEM; +- +- MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP); +- MLX5_SET(query_hca_cap_in, in, op_mod, +- MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE << 1 | HCA_CAP_OPMOD_GET_CUR); +- +- err = mlx5_cmd_exec_inout(mdev, query_hca_cap, in, out); ++ err = mlx5_vport_get_vhca_id(mdev, vport_number, gvmi); + if (err) { +- kfree(out); ++ mlx5_core_err(mdev, "Failed to get vport vhca id for vport %d\n", ++ vport_number); + return err; + } + +- *gvmi = MLX5_GET(query_hca_cap_out, out, capability.cmd_hca_cap.vhca_id); +- +- kfree(out); +- + return 0; + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_cmd.c +index 
bf99b933fd14..1ebb2b15c080 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_cmd.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_cmd.c +@@ -35,41 +35,21 @@ int mlx5dr_cmd_query_esw_vport_context(struct mlx5_core_dev *mdev, + int mlx5dr_cmd_query_gvmi(struct mlx5_core_dev *mdev, bool other_vport, + u16 vport_number, u16 *gvmi) + { +- u32 in[MLX5_ST_SZ_DW(query_hca_cap_in)] = {}; +- int out_size; +- void *out; + int err; + +- if (other_vport) { +- err = mlx5_vport_get_vhca_id(mdev, vport_number, gvmi); +- if (!err) +- return 0; +- +- mlx5_core_err(mdev, "Failed to get vport vhca id for vport %d\n", +- vport_number); +- return err; ++ if (!other_vport) { ++ /* self vhca_id */ ++ *gvmi = MLX5_CAP_GEN(mdev, vhca_id); ++ return 0; + } + +- /* get vhca_id for `this` function */ +- out_size = MLX5_ST_SZ_BYTES(query_hca_cap_out); +- out = kzalloc(out_size, GFP_KERNEL); +- if (!out) +- return -ENOMEM; +- +- MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP); +- MLX5_SET(query_hca_cap_in, in, op_mod, +- MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE << 1 | +- HCA_CAP_OPMOD_GET_CUR); +- +- err = mlx5_cmd_exec_inout(mdev, query_hca_cap, in, out); ++ err = mlx5_vport_get_vhca_id(mdev, vport_number, gvmi); + if (err) { +- kfree(out); ++ mlx5_core_err(mdev, "Failed to get vport vhca id for vport %d\n", ++ vport_number); + return err; + } + +- *gvmi = MLX5_GET(query_hca_cap_out, out, capability.cmd_hca_cap.vhca_id); +- +- kfree(out); + return 0; + } + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1524-net-mlx5-add-psp-capabilities-structures-and-bits.patch b/SOURCES/1524-net-mlx5-add-psp-capabilities-structures-and-bits.patch new file mode 100644 index 000000000..d7d3628e6 --- /dev/null +++ b/SOURCES/1524-net-mlx5-add-psp-capabilities-structures-and-bits.patch @@ -0,0 +1,266 @@ +From 4fade4931589e2d29e81f53e02b684a77ad6ac5f Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:57 -0400 +Subject: [PATCH] net/mlx5: 
Add PSP capabilities structures and bits + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 04a3134f88a4bd03001a3093144819523cfca99e +Author: Saeed Mahameed +Date: Tue Sep 2 22:45:24 2025 -0700 + + net/mlx5: Add PSP capabilities structures and bits + + Add mlx5_ifc PSP related capabilities structures and HW definitions + needed for PSP support in mlx5. + + Link: https://lore.kernel.org/netdev/20250828162953.2707727-1-daniel.zahka@gmail.com/ + Signed-off-by: Saeed Mahameed + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c +index 57476487e31f..eeb4437975f2 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c +@@ -294,6 +294,12 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev) + return err; + } + ++ if (MLX5_CAP_GEN(dev, psp)) { ++ err = mlx5_core_get_caps(dev, MLX5_CAP_PSP); ++ if (err) ++ return err; ++ } ++ + return 0; + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c +index f6b04b2ae623..6175aa0bbbb7 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c +@@ -1798,6 +1798,7 @@ static const int types[] = { + MLX5_CAP_VDPA_EMULATION, + MLX5_CAP_IPSEC, + MLX5_CAP_PORT_SELECTION, ++ MLX5_CAP_PSP, + MLX5_CAP_MACSEC, + MLX5_CAP_ADV_VIRTUALIZATION, + MLX5_CAP_CRYPTO, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c +index c6436c3a7a83..c4bb6967f74d 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c +@@ -1280,7 +1280,7 @@ hws_definer_conv_misc2(struct mlx5hws_definer_conv_data *cd, + struct mlx5hws_definer_fc *fc = cd->fc; + struct mlx5hws_definer_fc *curr_fc; + +- if (HWS_IS_FLD_SET_SZ(match_param, 
misc_parameters_2.reserved_at_1a0, 0x8) || ++ if (HWS_IS_FLD_SET_SZ(match_param, misc_parameters_2.psp_syndrome, 0x8) || + HWS_IS_FLD_SET_SZ(match_param, + misc_parameters_2.ipsec_next_header, 0x8) || + HWS_IS_FLD_SET_SZ(match_param, misc_parameters_2.reserved_at_1c0, 0x40) || +diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h +index 3b506482b4fa..04ceff52e19b 100644 +--- a/include/linux/mlx5/device.h ++++ b/include/linux/mlx5/device.h +@@ -1238,6 +1238,7 @@ enum mlx5_cap_type { + MLX5_CAP_IPSEC, + MLX5_CAP_CRYPTO = 0x1a, + MLX5_CAP_SHAMPO = 0x1d, ++ MLX5_CAP_PSP = 0x1e, + MLX5_CAP_MACSEC = 0x1f, + MLX5_CAP_GENERAL_2 = 0x20, + MLX5_CAP_PORT_SELECTION = 0x25, +@@ -1477,6 +1478,9 @@ enum mlx5_qcam_feature_groups { + #define MLX5_CAP_SHAMPO(mdev, cap) \ + MLX5_GET(shampo_cap, mdev->caps.hca[MLX5_CAP_SHAMPO]->cur, cap) + ++#define MLX5_CAP_PSP(mdev, cap)\ ++ MLX5_GET(psp_cap, (mdev)->caps.hca[MLX5_CAP_PSP]->cur, cap) ++ + enum { + MLX5_CMD_STAT_OK = 0x0, + MLX5_CMD_STAT_INT_ERR = 0x1, +diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h +index 4362dcb3b8fa..5934c24d8dcb 100644 +--- a/include/linux/mlx5/mlx5_ifc.h ++++ b/include/linux/mlx5/mlx5_ifc.h +@@ -314,6 +314,8 @@ enum { + MLX5_CMD_OP_CREATE_UMEM = 0xa08, + MLX5_CMD_OP_DESTROY_UMEM = 0xa0a, + MLX5_CMD_OP_SYNC_STEERING = 0xb00, ++ MLX5_CMD_OP_PSP_GEN_SPI = 0xb10, ++ MLX5_CMD_OP_PSP_ROTATE_KEY = 0xb11, + MLX5_CMD_OP_QUERY_VHCA_STATE = 0xb0d, + MLX5_CMD_OP_MODIFY_VHCA_STATE = 0xb0e, + MLX5_CMD_OP_SYNC_CRYPTO = 0xb12, +@@ -489,12 +491,14 @@ struct mlx5_ifc_flow_table_prop_layout_bits { + u8 execute_aso[0x1]; + u8 reserved_at_47[0x19]; + +- u8 reserved_at_60[0x2]; ++ u8 reformat_l2_to_l3_psp_tunnel[0x1]; ++ u8 reformat_l3_psp_tunnel_to_l2[0x1]; + u8 reformat_insert[0x1]; + u8 reformat_remove[0x1]; + u8 macsec_encrypt[0x1]; + u8 macsec_decrypt[0x1]; +- u8 reserved_at_66[0x2]; ++ u8 psp_encrypt[0x1]; ++ u8 psp_decrypt[0x1]; + u8 reformat_add_macsec[0x1]; + u8 
reformat_remove_macsec[0x1]; + u8 reparse[0x1]; +@@ -703,7 +707,7 @@ struct mlx5_ifc_fte_match_set_misc2_bits { + + u8 metadata_reg_a[0x20]; + +- u8 reserved_at_1a0[0x8]; ++ u8 psp_syndrome[0x8]; + u8 macsec_syndrome[0x8]; + u8 ipsec_syndrome[0x8]; + u8 ipsec_next_header[0x8]; +@@ -1510,6 +1514,21 @@ struct mlx5_ifc_macsec_cap_bits { + u8 reserved_at_40[0x7c0]; + }; + ++struct mlx5_ifc_psp_cap_bits { ++ u8 reserved_at_0[0x1]; ++ u8 psp_crypto_offload[0x1]; ++ u8 reserved_at_2[0x1]; ++ u8 psp_crypto_esp_aes_gcm_256_encrypt[0x1]; ++ u8 psp_crypto_esp_aes_gcm_128_encrypt[0x1]; ++ u8 psp_crypto_esp_aes_gcm_256_decrypt[0x1]; ++ u8 psp_crypto_esp_aes_gcm_128_decrypt[0x1]; ++ u8 reserved_at_7[0x4]; ++ u8 log_max_num_of_psp_spi[0x5]; ++ u8 reserved_at_10[0x10]; ++ ++ u8 reserved_at_20[0x7e0]; ++}; ++ + enum { + MLX5_WQ_TYPE_LINKED_LIST = 0x0, + MLX5_WQ_TYPE_CYCLIC = 0x1, +@@ -1875,7 +1894,9 @@ struct mlx5_ifc_cmd_hca_cap_bits { + + u8 reserved_at_2a0[0x7]; + u8 mkey_pcie_tph[0x1]; +- u8 reserved_at_2a8[0x3]; ++ u8 reserved_at_2a8[0x2]; ++ ++ u8 psp[0x1]; + u8 shampo[0x1]; + u8 reserved_at_2ac[0x4]; + u8 max_wqe_sz_rq[0x10]; +@@ -3802,6 +3823,7 @@ union mlx5_ifc_hca_cap_union_bits { + struct mlx5_ifc_macsec_cap_bits macsec_cap; + struct mlx5_ifc_crypto_cap_bits crypto_cap; + struct mlx5_ifc_ipsec_cap_bits ipsec_cap; ++ struct mlx5_ifc_psp_cap_bits psp_cap; + u8 reserved_at_0[0x8000]; + }; + +@@ -3831,6 +3853,7 @@ enum { + enum { + MLX5_FLOW_CONTEXT_ENCRYPT_DECRYPT_TYPE_IPSEC = 0x0, + MLX5_FLOW_CONTEXT_ENCRYPT_DECRYPT_TYPE_MACSEC = 0x1, ++ MLX5_FLOW_CONTEXT_ENCRYPT_DECRYPT_TYPE_PSP = 0x2, + }; + + struct mlx5_ifc_vlan_bits { +@@ -7158,6 +7181,8 @@ enum mlx5_reformat_ctx_type { + MLX5_REFORMAT_TYPE_DEL_ESP_TRANSPORT_OVER_UDP = 0xa, + MLX5_REFORMAT_TYPE_ADD_ESP_TRANSPORT_OVER_IPV6 = 0xb, + MLX5_REFORMAT_TYPE_ADD_ESP_TRANSPORT_OVER_UDPV6 = 0xc, ++ MLX5_REFORMAT_TYPE_ADD_PSP_TUNNEL = 0xd, ++ MLX5_REFORMAT_TYPE_DEL_PSP_TUNNEL = 0xe, + MLX5_REFORMAT_TYPE_INSERT_HDR = 0xf, + 
MLX5_REFORMAT_TYPE_REMOVE_HDR = 0x10, + MLX5_REFORMAT_TYPE_ADD_MACSEC = 0x11, +@@ -7284,6 +7309,7 @@ enum { + MLX5_ACTION_IN_FIELD_IPSEC_SYNDROME = 0x5D, + MLX5_ACTION_IN_FIELD_OUT_EMD_47_32 = 0x6F, + MLX5_ACTION_IN_FIELD_OUT_EMD_31_0 = 0x70, ++ MLX5_ACTION_IN_FIELD_PSP_SYNDROME = 0x71, + }; + + struct mlx5_ifc_alloc_modify_header_context_out_bits { +@@ -13078,6 +13104,7 @@ enum { + MLX5_GENERAL_OBJECT_TYPE_ENCRYPTION_KEY_PURPOSE_TLS = 0x1, + MLX5_GENERAL_OBJECT_TYPE_ENCRYPTION_KEY_PURPOSE_IPSEC = 0x2, + MLX5_GENERAL_OBJECT_TYPE_ENCRYPTION_KEY_PURPOSE_MACSEC = 0x4, ++ MLX5_GENERAL_OBJECT_TYPE_ENCRYPTION_KEY_PURPOSE_PSP = 0x6, + }; + + struct mlx5_ifc_tls_static_params_bits { +@@ -13495,4 +13522,64 @@ enum mlx5e_pcie_cong_event_mod_field { + MLX5_PCIE_CONG_EVENT_MOD_THRESH = BIT(2), + }; + ++struct mlx5_ifc_psp_rotate_key_in_bits { ++ u8 opcode[0x10]; ++ u8 uid[0x10]; ++ ++ u8 reserved_at_20[0x10]; ++ u8 op_mod[0x10]; ++ ++ u8 reserved_at_40[0x40]; ++}; ++ ++struct mlx5_ifc_psp_rotate_key_out_bits { ++ u8 status[0x8]; ++ u8 reserved_at_8[0x18]; ++ ++ u8 syndrome[0x20]; ++ ++ u8 reserved_at_40[0x40]; ++}; ++ ++enum mlx5_psp_gen_spi_in_key_size { ++ MLX5_PSP_GEN_SPI_IN_KEY_SIZE_128 = 0x0, ++ MLX5_PSP_GEN_SPI_IN_KEY_SIZE_256 = 0x1, ++}; ++ ++struct mlx5_ifc_key_spi_bits { ++ u8 spi[0x20]; ++ ++ u8 reserved_at_20[0x60]; ++ ++ u8 key[8][0x20]; ++}; ++ ++struct mlx5_ifc_psp_gen_spi_in_bits { ++ u8 opcode[0x10]; ++ u8 uid[0x10]; ++ ++ u8 reserved_at_20[0x10]; ++ u8 op_mod[0x10]; ++ ++ u8 reserved_at_40[0x20]; ++ ++ u8 key_size[0x2]; ++ u8 reserved_at_62[0xe]; ++ u8 num_of_spi[0x10]; ++}; ++ ++struct mlx5_ifc_psp_gen_spi_out_bits { ++ u8 status[0x8]; ++ u8 reserved_at_8[0x18]; ++ ++ u8 syndrome[0x20]; ++ ++ u8 reserved_at_40[0x10]; ++ u8 num_of_spi[0x10]; ++ ++ u8 reserved_at_60[0x20]; ++ ++ struct mlx5_ifc_key_spi_bits key_spi[]; ++}; ++ + #endif /* MLX5_IFC_H */ +-- +2.50.1 (Apple Git-155) + diff --git 
a/SOURCES/1525-net-mlx5-extract-mtctr-register-read-logic-into-helper-funct.patch b/SOURCES/1525-net-mlx5-extract-mtctr-register-read-logic-into-helper-funct.patch new file mode 100644 index 000000000..1842d738e --- /dev/null +++ b/SOURCES/1525-net-mlx5-extract-mtctr-register-read-logic-into-helper-funct.patch @@ -0,0 +1,96 @@ +From e06059ef23979150acdcb02c9d2f8d8d864317fa Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:57 -0400 +Subject: [PATCH] net/mlx5: Extract MTCTR register read logic into helper + function + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 96c345c3c54c31abf8ba04c241b8fe26fa0ab022 +Author: Carolina Jubran +Date: Tue Aug 12 17:17:07 2025 +0300 + + net/mlx5: Extract MTCTR register read logic into helper function + + Refactor the MTCTR register reading logic into a dedicated helper to + lay the groundwork for the next patch. + + Signed-off-by: Carolina Jubran + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1755008228-88881-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c +index 214d732d18e9..9b49bdc339ad 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c +@@ -247,27 +247,24 @@ static bool mlx5_is_ptm_source_time_available(struct mlx5_core_dev *dev) + return !!MLX5_GET(mtptm_reg, out, psta); + } + +-static int mlx5_mtctr_syncdevicetime(ktime_t *device_time, +- struct system_counterval_t *sys_counterval, +- void *ctx) ++static int mlx5_mtctr_read(struct mlx5_core_dev *mdev, ++ bool real_time_mode, ++ struct system_counterval_t *sys_counterval, ++ u64 *device) + { + u32 out[MLX5_ST_SZ_DW(mtctr_reg)] = {0}; + u32 in[MLX5_ST_SZ_DW(mtctr_reg)] = {0}; +- struct mlx5_core_dev *mdev = ctx; +- bool real_time_mode; +- u64 host, device; 
++ u64 host; + int err; + +- real_time_mode = mlx5_real_time_mode(mdev); +- + MLX5_SET(mtctr_reg, in, first_clock_timestamp_request, + MLX5_MTCTR_REQUEST_PTM_ROOT_CLOCK); + MLX5_SET(mtctr_reg, in, second_clock_timestamp_request, + real_time_mode ? MLX5_MTCTR_REQUEST_REAL_TIME_CLOCK : +- MLX5_MTCTR_REQUEST_FREE_RUNNING_COUNTER); ++ MLX5_MTCTR_REQUEST_FREE_RUNNING_COUNTER); + +- err = mlx5_core_access_reg(mdev, in, sizeof(in), out, sizeof(out), MLX5_REG_MTCTR, +- 0, 0); ++ err = mlx5_core_access_reg(mdev, in, sizeof(in), out, sizeof(out), ++ MLX5_REG_MTCTR, 0, 0); + if (err) + return err; + +@@ -281,8 +278,26 @@ static int mlx5_mtctr_syncdevicetime(ktime_t *device_time, + .cs_id = CSID_X86_ART, + .use_nsecs = true, + }; ++ *device = MLX5_GET64(mtctr_reg, out, second_clock_timestamp); ++ ++ return 0; ++} ++ ++static int mlx5_mtctr_syncdevicetime(ktime_t *device_time, ++ struct system_counterval_t *sys_counterval, ++ void *ctx) ++{ ++ struct mlx5_core_dev *mdev = ctx; ++ bool real_time_mode; ++ u64 device; ++ int err; ++ ++ real_time_mode = mlx5_real_time_mode(mdev); ++ ++ err = mlx5_mtctr_read(mdev, real_time_mode, sys_counterval, &device); ++ if (err) ++ return err; + +- device = MLX5_GET64(mtctr_reg, out, second_clock_timestamp); + if (real_time_mode) + *device_time = ns_to_ktime(REAL_TIME_TO_NS(device >> 32, device & U32_MAX)); + else +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1526-net-mlx5-support-getcyclesx-and-getcrosscycles.patch b/SOURCES/1526-net-mlx5-support-getcyclesx-and-getcrosscycles.patch new file mode 100644 index 000000000..a7f2fb69f --- /dev/null +++ b/SOURCES/1526-net-mlx5-support-getcyclesx-and-getcrosscycles.patch @@ -0,0 +1,148 @@ +From f3f5c68e707d79c44f1ad12f7449fa6a93c3eb6e Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:57 -0400 +Subject: [PATCH] net/mlx5: Support getcyclesx and getcrosscycles +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +JIRA: 
https://redhat.atlassian.net/browse/RHEL-169055 + +commit a3fb485505caeadb559029900f5f37a332ae54e0 +Author: Carolina Jubran +Date: Tue Aug 12 17:17:08 2025 +0300 + + net/mlx5: Support getcyclesx and getcrosscycles + + Implement the getcyclesx64 and getcrosscycles callbacks in ptp_info to + expose the device’s raw free-running counter. + + Signed-off-by: Carolina Jubran + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1755008228-88881-4-git-send-email-tariqt@nvidia.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c +index 9b49bdc339ad..7ad3baca99de 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c +@@ -306,6 +306,23 @@ static int mlx5_mtctr_syncdevicetime(ktime_t *device_time, + return 0; + } + ++static int ++mlx5_mtctr_syncdevicecyclestime(ktime_t *device_time, ++ struct system_counterval_t *sys_counterval, ++ void *ctx) ++{ ++ struct mlx5_core_dev *mdev = ctx; ++ u64 device; ++ int err; ++ ++ err = mlx5_mtctr_read(mdev, false, sys_counterval, &device); ++ if (err) ++ return err; ++ *device_time = ns_to_ktime(device); ++ ++ return 0; ++} ++ + static int mlx5_ptp_getcrosststamp(struct ptp_clock_info *ptp, + struct system_device_crosststamp *cts) + { +@@ -330,6 +347,32 @@ static int mlx5_ptp_getcrosststamp(struct ptp_clock_info *ptp, + mlx5_clock_unlock(clock); + return err; + } ++ ++static int mlx5_ptp_getcrosscycles(struct ptp_clock_info *ptp, ++ struct system_device_crosststamp *cts) ++{ ++ struct mlx5_clock *clock = ++ container_of(ptp, struct mlx5_clock, ptp_info); ++ struct system_time_snapshot history_begin = {0}; ++ struct mlx5_core_dev *mdev; ++ int err; ++ ++ mlx5_clock_lock(clock); ++ mdev = mlx5_clock_mdev_get(clock); ++ ++ if (!mlx5_is_ptm_source_time_available(mdev)) { ++ err = -EBUSY; ++ goto unlock; ++ } 
++ ++ ktime_get_snapshot(&history_begin); ++ ++ err = get_device_system_crosststamp(mlx5_mtctr_syncdevicecyclestime, ++ mdev, &history_begin, cts); ++unlock: ++ mlx5_clock_unlock(clock); ++ return err; ++} + #endif /* CONFIG_X86 */ + + static u64 mlx5_read_time(struct mlx5_core_dev *dev, +@@ -528,6 +571,24 @@ static int mlx5_ptp_gettimex(struct ptp_clock_info *ptp, struct timespec64 *ts, + return 0; + } + ++static int mlx5_ptp_getcyclesx(struct ptp_clock_info *ptp, ++ struct timespec64 *ts, ++ struct ptp_system_timestamp *sts) ++{ ++ struct mlx5_clock *clock = container_of(ptp, struct mlx5_clock, ++ ptp_info); ++ struct mlx5_core_dev *mdev; ++ u64 cycles; ++ ++ mlx5_clock_lock(clock); ++ mdev = mlx5_clock_mdev_get(clock); ++ ++ cycles = mlx5_read_time(mdev, sts, false); ++ *ts = ns_to_timespec64(cycles); ++ mlx5_clock_unlock(clock); ++ return 0; ++} ++ + static int mlx5_ptp_adjtime_real_time(struct mlx5_core_dev *mdev, s64 delta) + { + u32 in[MLX5_ST_SZ_DW(mtutc_reg)] = {}; +@@ -1244,6 +1305,7 @@ static void mlx5_init_timer_max_freq_adjustment(struct mlx5_core_dev *mdev) + static void mlx5_init_timer_clock(struct mlx5_core_dev *mdev) + { + struct mlx5_clock *clock = mdev->clock; ++ bool expose_cycles; + + /* Configure the PHC */ + clock->ptp_info = mlx5_ptp_clock_info; +@@ -1251,12 +1313,22 @@ static void mlx5_init_timer_clock(struct mlx5_core_dev *mdev) + if (MLX5_CAP_MCAM_REG(mdev, mtutc)) + mlx5_init_timer_max_freq_adjustment(mdev); + ++ expose_cycles = !MLX5_CAP_GEN(mdev, disciplined_fr_counter) || ++ !mlx5_real_time_mode(mdev); ++ + #ifdef CONFIG_X86 + if (MLX5_CAP_MCAM_REG3(mdev, mtptm) && +- MLX5_CAP_MCAM_REG3(mdev, mtctr) && boot_cpu_has(X86_FEATURE_ART)) ++ MLX5_CAP_MCAM_REG3(mdev, mtctr) && boot_cpu_has(X86_FEATURE_ART)) { + clock->ptp_info.getcrosststamp = mlx5_ptp_getcrosststamp; ++ if (expose_cycles) ++ clock->ptp_info.getcrosscycles = ++ mlx5_ptp_getcrosscycles; ++ } + #endif /* CONFIG_X86 */ + ++ if (expose_cycles) ++ clock->ptp_info.getcyclesx64 = 
mlx5_ptp_getcyclesx; ++ + mlx5_timecounter_init(mdev); + mlx5_init_clock_info(mdev); + mlx5_init_overflow_period(mdev); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1527-net-mlx5-add-rs-fec-histogram-infrastructure.patch b/SOURCES/1527-net-mlx5-add-rs-fec-histogram-infrastructure.patch new file mode 100644 index 000000000..fd9c20042 --- /dev/null +++ b/SOURCES/1527-net-mlx5-add-rs-fec-histogram-infrastructure.patch @@ -0,0 +1,116 @@ +From 90c470d8b4225f84cf2065766d96aa95d8ff3af6 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:57 -0400 +Subject: [PATCH] net/mlx5: Add RS FEC histogram infrastructure + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit ff97bc38be343e4530e2f140b40cbdce2e09152f +Author: Carolina Jubran +Date: Wed Sep 3 10:30:00 2025 +0300 + + net/mlx5: Add RS FEC histogram infrastructure + + Define the Ports Phy Histogram Configuration Register (PPHCR) to expose + RS-FEC histogram bin ranges, and expose a new counter group in the Ports + Performance Counters Register (PPCNT) to report the corresponding + histogram values. 
+ + Co-developed-by: Yael Chemla + Signed-off-by: Yael Chemla + Signed-off-by: Carolina Jubran + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1756884600-520195-1-git-send-email-tariqt@nvidia.com + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h +index 04ceff52e19b..6644864b1972 100644 +--- a/include/linux/mlx5/device.h ++++ b/include/linux/mlx5/device.h +@@ -1515,6 +1515,7 @@ enum { + MLX5_PHYSICAL_LAYER_RECOVERY_GROUP = 0x1a, + MLX5_INFINIBAND_PORT_COUNTERS_GROUP = 0x20, + MLX5_INFINIBAND_EXTENDED_PORT_COUNTERS_GROUP = 0x21, ++ MLX5_RS_FEC_HISTOGRAM_GROUP = 0x23, + }; + + enum { +diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h +index 10fe492e1fed..89d020a4f572 100644 +--- a/include/linux/mlx5/driver.h ++++ b/include/linux/mlx5/driver.h +@@ -130,6 +130,7 @@ enum { + MLX5_REG_PDDR = 0x5031, + MLX5_REG_PMLP = 0x5002, + MLX5_REG_PPLM = 0x5023, ++ MLX5_REG_PPHCR = 0x503E, + MLX5_REG_PCAM = 0x507f, + MLX5_REG_NODE_DESC = 0x6001, + MLX5_REG_HOST_ENDIANNESS = 0x7004, +diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h +index 5934c24d8dcb..961e9c76c6c5 100644 +--- a/include/linux/mlx5/mlx5_ifc.h ++++ b/include/linux/mlx5/mlx5_ifc.h +@@ -4900,6 +4900,11 @@ union mlx5_ifc_field_select_802_1_r_roce_auto_bits { + u8 reserved_at_0[0x20]; + }; + ++struct mlx5_ifc_rs_histogram_cntrs_bits { ++ u8 hist[16][0x40]; ++ u8 reserved_at_400[0x2c0]; ++}; ++ + union mlx5_ifc_eth_cntrs_grp_data_layout_auto_bits { + struct mlx5_ifc_eth_802_3_cntrs_grp_data_layout_bits eth_802_3_cntrs_grp_data_layout; + struct mlx5_ifc_eth_2863_cntrs_grp_data_layout_bits eth_2863_cntrs_grp_data_layout; +@@ -4914,6 +4919,7 @@ union mlx5_ifc_eth_cntrs_grp_data_layout_auto_bits { + struct mlx5_ifc_phys_layer_cntrs_bits phys_layer_cntrs; + struct mlx5_ifc_phys_layer_statistical_cntrs_bits phys_layer_statistical_cntrs; + struct 
mlx5_ifc_phys_layer_recovery_cntrs_bits phys_layer_recovery_cntrs; ++ struct mlx5_ifc_rs_histogram_cntrs_bits rs_histogram_cntrs; + u8 reserved_at_0[0x7c0]; + }; + +@@ -11737,6 +11743,28 @@ struct mlx5_ifc_mtctr_reg_bits { + u8 second_clock_timestamp[0x40]; + }; + ++struct mlx5_ifc_bin_range_layout_bits { ++ u8 reserved_at_0[0xa]; ++ u8 high_val[0x6]; ++ u8 reserved_at_10[0xa]; ++ u8 low_val[0x6]; ++}; ++ ++struct mlx5_ifc_pphcr_reg_bits { ++ u8 active_hist_type[0x4]; ++ u8 reserved_at_4[0x4]; ++ u8 local_port[0x8]; ++ u8 reserved_at_10[0x10]; ++ ++ u8 reserved_at_20[0x8]; ++ u8 num_of_bins[0x8]; ++ u8 reserved_at_30[0x10]; ++ ++ u8 reserved_at_40[0x40]; ++ ++ struct mlx5_ifc_bin_range_layout_bits bin_range[16]; ++}; ++ + union mlx5_ifc_ports_control_registers_document_bits { + struct mlx5_ifc_bufferx_reg_bits bufferx_reg; + struct mlx5_ifc_eth_2819_cntrs_grp_data_layout_bits eth_2819_cntrs_grp_data_layout; +@@ -11803,6 +11831,7 @@ union mlx5_ifc_ports_control_registers_document_bits { + struct mlx5_ifc_mtmp_reg_bits mtmp_reg; + struct mlx5_ifc_mtptm_reg_bits mtptm_reg; + struct mlx5_ifc_mtctr_reg_bits mtctr_reg; ++ struct mlx5_ifc_pphcr_reg_bits pphcr_reg; + u8 reserved_at_0[0x60e0]; + }; + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1528-net-mlx5-implement-cqe-compress-type-via-devlink-params.patch b/SOURCES/1528-net-mlx5-implement-cqe-compress-type-via-devlink-params.patch new file mode 100644 index 000000000..559cf099e --- /dev/null +++ b/SOURCES/1528-net-mlx5-implement-cqe-compress-type-via-devlink-params.patch @@ -0,0 +1,399 @@ +From 40ba4249f330045380f077e35929e6bb548f4c1e Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:57 -0400 +Subject: [PATCH] net/mlx5: Implement cqe_compress_type via devlink params + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit bf2da4799fdb6eb58d9c9541b7dc1096c260499d +Author: Saeed Mahameed +Date: Sat Sep 6 18:29:44 2025 -0700 + + net/mlx5: Implement cqe_compress_type via devlink 
params + + Selects which algorithm should be used by the NIC in order to decide rate of + CQE compression depending on PCIe bus conditions. + + Supported values: + + 1) balanced, merges fewer CQEs, resulting in a moderate compression ratio + but maintaining a balance between bandwidth savings and performance + 2) aggressive, merges more CQEs into a single entry, achieving a higher + compression rate and maximizing performance, particularly under high + traffic loads. + + Signed-off-by: Saeed Mahameed + Reviewed-by: Jiri Pirko + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/20250907012953.301746-3-saeed@kernel.org + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst +index 7febe0aecd53..2edc842b620d 100644 +--- a/Documentation/networking/devlink/mlx5.rst ++++ b/Documentation/networking/devlink/mlx5.rst +@@ -117,6 +117,16 @@ parameters. + - driverinit + - Control the size (in packets) of the hairpin queues. + ++ * - ``cqe_compress_type`` ++ - string ++ - permanent ++ - Configure which mechanism/algorithm should be used by the NIC that will ++ affect the rate (aggressiveness) of compressed CQEs depending on PCIe bus ++ conditions and other internal NIC factors. This mode affects all queues ++ that enable compression. 
++ * ``balanced`` : Merges fewer CQEs, resulting in a moderate compression ratio but maintaining a balance between bandwidth savings and performance ++ * ``aggressive`` : Merges more CQEs into a single entry, achieving a higher compression rate and maximizing performance, particularly under high traffic loads ++ + The ``mlx5`` driver supports reloading via ``DEVLINK_CMD_RELOAD`` + + Info versions +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile +index a65ab661375a..d77696f46eb5 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile ++++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile +@@ -17,7 +17,7 @@ mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \ + fs_counters.o fs_ft_pool.o rl.o lag/debugfs.o lag/lag.o dev.o events.o wq.o lib/gid.o \ + lib/devcom.o lib/pci_vsc.o lib/dm.o lib/fs_ttc.o diag/fs_tracepoint.o \ + diag/fw_tracer.o diag/crdump.o devlink.o diag/rsc_dump.o diag/reporter_vnic.o \ +- fw_reset.o qos.o lib/tout.o lib/aso.o wc.o fs_pool.o ++ fw_reset.o qos.o lib/tout.o lib/aso.o wc.o fs_pool.o lib/nv_param.o + + # + # Netdev basic +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +index 8c53fe5aa306..0c0f7231cb2a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +@@ -10,6 +10,7 @@ + #include "esw/qos.h" + #include "sf/dev/dev.h" + #include "sf/sf.h" ++#include "lib/nv_param.h" + + static int mlx5_devlink_flash_update(struct devlink *devlink, + struct devlink_flash_update_params *params, +@@ -895,8 +896,14 @@ int mlx5_devlink_params_register(struct devlink *devlink) + if (err) + goto max_uc_list_err; + ++ err = mlx5_nv_param_register_dl_params(devlink); ++ if (err) ++ goto nv_param_err; ++ + return 0; + ++nv_param_err: ++ mlx5_devlink_max_uc_list_params_unregister(devlink); + max_uc_list_err: + 
mlx5_devlink_auxdev_params_unregister(devlink); + auxdev_reg_err: +@@ -907,6 +914,7 @@ int mlx5_devlink_params_register(struct devlink *devlink) + + void mlx5_devlink_params_unregister(struct devlink *devlink) + { ++ mlx5_nv_param_unregister_dl_params(devlink); + mlx5_devlink_max_uc_list_params_unregister(devlink); + mlx5_devlink_auxdev_params_unregister(devlink); + devl_params_unregister(devlink, mlx5_devlink_params, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h +index 961f75da6227..74bcdfa70361 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h +@@ -22,6 +22,7 @@ enum mlx5_devlink_param_id { + MLX5_DEVLINK_PARAM_ID_ESW_MULTIPORT, + MLX5_DEVLINK_PARAM_ID_HAIRPIN_NUM_QUEUES, + MLX5_DEVLINK_PARAM_ID_HAIRPIN_QUEUE_SIZE, ++ MLX5_DEVLINK_PARAM_ID_CQE_COMPRESSION_TYPE + }; + + struct mlx5_trap_ctx { +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.c +new file mode 100644 +index 000000000000..20a39483be04 +--- /dev/null ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.c +@@ -0,0 +1,245 @@ ++// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB ++/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
*/ ++ ++#include "nv_param.h" ++#include "mlx5_core.h" ++ ++enum { ++ MLX5_CLASS_0_CTRL_ID_NV_SW_OFFLOAD_CONFIG = 0x10a, ++}; ++ ++struct mlx5_ifc_configuration_item_type_class_global_bits { ++ u8 type_class[0x8]; ++ u8 parameter_index[0x18]; ++}; ++ ++union mlx5_ifc_config_item_type_auto_bits { ++ struct mlx5_ifc_configuration_item_type_class_global_bits ++ configuration_item_type_class_global; ++ u8 reserved_at_0[0x20]; ++}; ++ ++struct mlx5_ifc_config_item_bits { ++ u8 valid[0x2]; ++ u8 priority[0x2]; ++ u8 header_type[0x2]; ++ u8 ovr_en[0x1]; ++ u8 rd_en[0x1]; ++ u8 access_mode[0x2]; ++ u8 reserved_at_a[0x1]; ++ u8 writer_id[0x5]; ++ u8 version[0x4]; ++ u8 reserved_at_14[0x2]; ++ u8 host_id_valid[0x1]; ++ u8 length[0x9]; ++ ++ union mlx5_ifc_config_item_type_auto_bits type; ++ ++ u8 reserved_at_40[0x10]; ++ u8 crc16[0x10]; ++}; ++ ++struct mlx5_ifc_mnvda_reg_bits { ++ struct mlx5_ifc_config_item_bits configuration_item_header; ++ ++ u8 configuration_item_data[64][0x20]; ++}; ++ ++struct mlx5_ifc_nv_sw_offload_conf_bits { ++ u8 ip_over_vxlan_port[0x10]; ++ u8 tunnel_ecn_copy_offload_disable[0x1]; ++ u8 pci_atomic_mode[0x3]; ++ u8 sr_enable[0x1]; ++ u8 ptp_cyc2realtime[0x1]; ++ u8 vector_calc_disable[0x1]; ++ u8 uctx_en[0x1]; ++ u8 prio_tag_required_en[0x1]; ++ u8 esw_fdb_ipv4_ttl_modify_enable[0x1]; ++ u8 mkey_by_name[0x1]; ++ u8 ip_over_vxlan_en[0x1]; ++ u8 one_qp_per_recovery[0x1]; ++ u8 cqe_compression[0x3]; ++ u8 tunnel_udp_entropy_proto_disable[0x1]; ++ u8 reserved_at_21[0x1]; ++ u8 ar_enable[0x1]; ++ u8 log_max_outstanding_wqe[0x5]; ++ u8 vf_migration[0x2]; ++ u8 log_tx_psn_win[0x6]; ++ u8 lro_log_timeout3[0x4]; ++ u8 lro_log_timeout2[0x4]; ++ u8 lro_log_timeout1[0x4]; ++ u8 lro_log_timeout0[0x4]; ++}; ++ ++#define MNVDA_HDR_SZ \ ++ (MLX5_ST_SZ_BYTES(mnvda_reg) - \ ++ MLX5_BYTE_OFF(mnvda_reg, configuration_item_data)) ++ ++#define MLX5_SET_CFG_ITEM_TYPE(_cls_name, _mnvda_ptr, _field, _val) \ ++ MLX5_SET(mnvda_reg, _mnvda_ptr, \ ++ 
configuration_item_header.type.configuration_item_type_class_##_cls_name._field, \ ++ _val) ++ ++#define MLX5_SET_CFG_HDR_LEN(_mnvda_ptr, _cls_name) \ ++ MLX5_SET(mnvda_reg, _mnvda_ptr, configuration_item_header.length, \ ++ MLX5_ST_SZ_BYTES(_cls_name)) ++ ++#define MLX5_GET_CFG_HDR_LEN(_mnvda_ptr) \ ++ MLX5_GET(mnvda_reg, _mnvda_ptr, configuration_item_header.length) ++ ++static int mlx5_nv_param_read(struct mlx5_core_dev *dev, void *mnvda, ++ size_t len) ++{ ++ u32 param_idx, type_class; ++ u32 header_len; ++ void *cls_ptr; ++ int err; ++ ++ if (WARN_ON(len > MLX5_ST_SZ_BYTES(mnvda_reg)) || len < MNVDA_HDR_SZ) ++ return -EINVAL; /* A caller bug */ ++ ++ err = mlx5_core_access_reg(dev, mnvda, len, mnvda, len, MLX5_REG_MNVDA, ++ 0, 0); ++ if (!err) ++ return 0; ++ ++ cls_ptr = MLX5_ADDR_OF(mnvda_reg, mnvda, ++ configuration_item_header.type.configuration_item_type_class_global); ++ ++ type_class = MLX5_GET(configuration_item_type_class_global, cls_ptr, ++ type_class); ++ param_idx = MLX5_GET(configuration_item_type_class_global, cls_ptr, ++ parameter_index); ++ header_len = MLX5_GET_CFG_HDR_LEN(mnvda); ++ ++ mlx5_core_warn(dev, "Failed to read mnvda reg: type_class 0x%x, param_idx 0x%x, header_len %u, err %d\n", ++ type_class, param_idx, header_len, err); ++ ++ return -EOPNOTSUPP; ++} ++ ++static int mlx5_nv_param_write(struct mlx5_core_dev *dev, void *mnvda, ++ size_t len) ++{ ++ if (WARN_ON(len > MLX5_ST_SZ_BYTES(mnvda_reg)) || len < MNVDA_HDR_SZ) ++ return -EINVAL; ++ ++ if (WARN_ON(MLX5_GET_CFG_HDR_LEN(mnvda) == 0)) ++ return -EINVAL; ++ ++ return mlx5_core_access_reg(dev, mnvda, len, mnvda, len, MLX5_REG_MNVDA, ++ 0, 1); ++} ++ ++static int ++mlx5_nv_param_read_sw_offload_conf(struct mlx5_core_dev *dev, void *mnvda, ++ size_t len) ++{ ++ MLX5_SET_CFG_ITEM_TYPE(global, mnvda, type_class, 0); ++ MLX5_SET_CFG_ITEM_TYPE(global, mnvda, parameter_index, ++ MLX5_CLASS_0_CTRL_ID_NV_SW_OFFLOAD_CONFIG); ++ MLX5_SET_CFG_HDR_LEN(mnvda, nv_sw_offload_conf); ++ ++ return 
mlx5_nv_param_read(dev, mnvda, len); ++} ++ ++static const char *const ++ cqe_compress_str[] = { "balanced", "aggressive" }; ++ ++static int ++mlx5_nv_param_devlink_cqe_compress_get(struct devlink *devlink, u32 id, ++ struct devlink_param_gset_ctx *ctx) ++{ ++ struct mlx5_core_dev *dev = devlink_priv(devlink); ++ u32 mnvda[MLX5_ST_SZ_DW(mnvda_reg)] = {}; ++ u8 value = U8_MAX; ++ void *data; ++ int err; ++ ++ err = mlx5_nv_param_read_sw_offload_conf(dev, mnvda, sizeof(mnvda)); ++ if (err) ++ return err; ++ ++ data = MLX5_ADDR_OF(mnvda_reg, mnvda, configuration_item_data); ++ value = MLX5_GET(nv_sw_offload_conf, data, cqe_compression); ++ ++ if (value >= ARRAY_SIZE(cqe_compress_str)) ++ return -EOPNOTSUPP; ++ ++ strscpy(ctx->val.vstr, cqe_compress_str[value], sizeof(ctx->val.vstr)); ++ return 0; ++} ++ ++static int ++mlx5_nv_param_devlink_cqe_compress_validate(struct devlink *devlink, u32 id, ++ union devlink_param_value val, ++ struct netlink_ext_ack *extack) ++{ ++ int i; ++ ++ for (i = 0; i < ARRAY_SIZE(cqe_compress_str); i++) { ++ if (!strcmp(val.vstr, cqe_compress_str[i])) ++ return 0; ++ } ++ ++ NL_SET_ERR_MSG_MOD(extack, ++ "Invalid value, supported values are balanced/aggressive"); ++ return -EOPNOTSUPP; ++} ++ ++static int ++mlx5_nv_param_devlink_cqe_compress_set(struct devlink *devlink, u32 id, ++ struct devlink_param_gset_ctx *ctx, ++ struct netlink_ext_ack *extack) ++{ ++ struct mlx5_core_dev *dev = devlink_priv(devlink); ++ u32 mnvda[MLX5_ST_SZ_DW(mnvda_reg)] = {}; ++ int err = 0; ++ void *data; ++ u8 value; ++ ++ if (!strcmp(ctx->val.vstr, "aggressive")) ++ value = 1; ++ else /* balanced: can't be anything else already validated above */ ++ value = 0; ++ ++ err = mlx5_nv_param_read_sw_offload_conf(dev, mnvda, sizeof(mnvda)); ++ if (err) { ++ NL_SET_ERR_MSG_MOD(extack, ++ "Failed to read sw_offload_conf mnvda reg"); ++ return err; ++ } ++ ++ data = MLX5_ADDR_OF(mnvda_reg, mnvda, configuration_item_data); ++ MLX5_SET(nv_sw_offload_conf, data, 
cqe_compression, value); ++ ++ return mlx5_nv_param_write(dev, mnvda, sizeof(mnvda)); ++} ++ ++static const struct devlink_param mlx5_nv_param_devlink_params[] = { ++ DEVLINK_PARAM_DRIVER(MLX5_DEVLINK_PARAM_ID_CQE_COMPRESSION_TYPE, ++ "cqe_compress_type", DEVLINK_PARAM_TYPE_STRING, ++ BIT(DEVLINK_PARAM_CMODE_PERMANENT), ++ mlx5_nv_param_devlink_cqe_compress_get, ++ mlx5_nv_param_devlink_cqe_compress_set, ++ mlx5_nv_param_devlink_cqe_compress_validate), ++}; ++ ++int mlx5_nv_param_register_dl_params(struct devlink *devlink) ++{ ++ if (!mlx5_core_is_pf(devlink_priv(devlink))) ++ return 0; ++ ++ return devl_params_register(devlink, mlx5_nv_param_devlink_params, ++ ARRAY_SIZE(mlx5_nv_param_devlink_params)); ++} ++ ++void mlx5_nv_param_unregister_dl_params(struct devlink *devlink) ++{ ++ if (!mlx5_core_is_pf(devlink_priv(devlink))) ++ return; ++ ++ devl_params_unregister(devlink, mlx5_nv_param_devlink_params, ++ ARRAY_SIZE(mlx5_nv_param_devlink_params)); ++} ++ +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.h +new file mode 100644 +index 000000000000..9f4922ff7745 +--- /dev/null ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.h +@@ -0,0 +1,14 @@ ++/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */ ++/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
*/ ++ ++#ifndef __MLX5_NV_PARAM_H ++#define __MLX5_NV_PARAM_H ++ ++#include ++#include "devlink.h" ++ ++int mlx5_nv_param_register_dl_params(struct devlink *devlink); ++void mlx5_nv_param_unregister_dl_params(struct devlink *devlink); ++ ++#endif ++ +diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h +index 89d020a4f572..c5106eb8b413 100644 +--- a/include/linux/mlx5/driver.h ++++ b/include/linux/mlx5/driver.h +@@ -137,6 +137,7 @@ enum { + MLX5_REG_MTCAP = 0x9009, + MLX5_REG_MTMP = 0x900A, + MLX5_REG_MCIA = 0x9014, ++ MLX5_REG_MNVDA = 0x9024, + MLX5_REG_MFRL = 0x9028, + MLX5_REG_MLCR = 0x902b, + MLX5_REG_MRTC = 0x902d, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1529-net-mlx5-implement-devlink-enable-sriov-parameter.patch b/SOURCES/1529-net-mlx5-implement-devlink-enable-sriov-parameter.patch new file mode 100644 index 000000000..6a032aa7c --- /dev/null +++ b/SOURCES/1529-net-mlx5-implement-devlink-enable-sriov-parameter.patch @@ -0,0 +1,308 @@ +From 9392c0b3d4463777bea3276ddbd9ef0dded68854 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:57 -0400 +Subject: [PATCH] net/mlx5: Implement devlink enable_sriov parameter + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 95a0af146dff5437acb4ea27eacc05aa22c7bb54 +Author: Vlad Dumitrescu +Date: Sat Sep 6 18:29:45 2025 -0700 + + net/mlx5: Implement devlink enable_sriov parameter + + Example usage: + devlink dev param set pci/0000:01:00.0 name enable_sriov value {true, false} cmode permanent + devlink dev reload pci/0000:01:00.0 action fw_activate + echo 1 >/sys/bus/pci/devices/0000:01:00.0/remove + echo 1 >/sys/bus/pci/rescan + grep ^ /sys/bus/pci/devices/0000:01:00.0/sriov_* + + Signed-off-by: Vlad Dumitrescu + Tested-by: Kamal Heib + Reviewed-by: Jiri Pirko + Signed-off-by: Saeed Mahameed + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/20250907012953.301746-4-saeed@kernel.org + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff 
--git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst +index 2edc842b620d..c3610f7c1d4b 100644 +--- a/Documentation/networking/devlink/mlx5.rst ++++ b/Documentation/networking/devlink/mlx5.rst +@@ -15,23 +15,31 @@ Parameters + * - Name + - Mode + - Validation ++ - Notes + * - ``enable_roce`` + - driverinit +- - Type: Boolean +- +- If the device supports RoCE disablement, RoCE enablement state controls ++ - Boolean ++ - If the device supports RoCE disablement, RoCE enablement state controls + device support for RoCE capability. Otherwise, the control occurs in the + driver stack. When RoCE is disabled at the driver level, only raw + ethernet QPs are supported. + * - ``io_eq_size`` + - driverinit + - The range is between 64 and 4096. ++ - + * - ``event_eq_size`` + - driverinit + - The range is between 64 and 4096. ++ - + * - ``max_macs`` + - driverinit + - The range is between 1 and 2^31. Only power of 2 values are supported. ++ - ++ * - ``enable_sriov`` ++ - permanent ++ - Boolean ++ - Applies to each physical function (PF) independently, if the device ++ supports it. Otherwise, it applies symmetrically to all PFs. + + The ``mlx5`` driver also implements the following driver-specific + parameters. 
+diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.c +index 20a39483be04..ed2129843ec7 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.c +@@ -5,7 +5,11 @@ + #include "mlx5_core.h" + + enum { ++ MLX5_CLASS_0_CTRL_ID_NV_GLOBAL_PCI_CONF = 0x80, ++ MLX5_CLASS_0_CTRL_ID_NV_GLOBAL_PCI_CAP = 0x81, + MLX5_CLASS_0_CTRL_ID_NV_SW_OFFLOAD_CONFIG = 0x10a, ++ ++ MLX5_CLASS_3_CTRL_ID_NV_PF_PCI_CONF = 0x80, + }; + + struct mlx5_ifc_configuration_item_type_class_global_bits { +@@ -13,9 +17,18 @@ struct mlx5_ifc_configuration_item_type_class_global_bits { + u8 parameter_index[0x18]; + }; + ++struct mlx5_ifc_configuration_item_type_class_per_host_pf_bits { ++ u8 type_class[0x8]; ++ u8 pf_index[0x6]; ++ u8 pci_bus_index[0x8]; ++ u8 parameter_index[0xa]; ++}; ++ + union mlx5_ifc_config_item_type_auto_bits { + struct mlx5_ifc_configuration_item_type_class_global_bits + configuration_item_type_class_global; ++ struct mlx5_ifc_configuration_item_type_class_per_host_pf_bits ++ configuration_item_type_class_per_host_pf; + u8 reserved_at_0[0x20]; + }; + +@@ -45,6 +58,45 @@ struct mlx5_ifc_mnvda_reg_bits { + u8 configuration_item_data[64][0x20]; + }; + ++struct mlx5_ifc_nv_global_pci_conf_bits { ++ u8 sriov_valid[0x1]; ++ u8 reserved_at_1[0x10]; ++ u8 per_pf_total_vf[0x1]; ++ u8 reserved_at_12[0xe]; ++ ++ u8 sriov_en[0x1]; ++ u8 reserved_at_21[0xf]; ++ u8 total_vfs[0x10]; ++ ++ u8 reserved_at_40[0x20]; ++}; ++ ++struct mlx5_ifc_nv_global_pci_cap_bits { ++ u8 max_vfs_per_pf_valid[0x1]; ++ u8 reserved_at_1[0x13]; ++ u8 per_pf_total_vf_supported[0x1]; ++ u8 reserved_at_15[0xb]; ++ ++ u8 sriov_support[0x1]; ++ u8 reserved_at_21[0xf]; ++ u8 max_vfs_per_pf[0x10]; ++ ++ u8 reserved_at_40[0x60]; ++}; ++ ++struct mlx5_ifc_nv_pf_pci_conf_bits { ++ u8 reserved_at_0[0x9]; ++ u8 pf_total_vf_en[0x1]; ++ u8 reserved_at_a[0x16]; ++ ++ u8 
reserved_at_20[0x20]; ++ ++ u8 reserved_at_40[0x10]; ++ u8 total_vf[0x10]; ++ ++ u8 reserved_at_60[0x20]; ++}; ++ + struct mlx5_ifc_nv_sw_offload_conf_bits { + u8 ip_over_vxlan_port[0x10]; + u8 tunnel_ecn_copy_offload_disable[0x1]; +@@ -216,7 +268,154 @@ mlx5_nv_param_devlink_cqe_compress_set(struct devlink *devlink, u32 id, + return mlx5_nv_param_write(dev, mnvda, sizeof(mnvda)); + } + ++static int mlx5_nv_param_read_global_pci_conf(struct mlx5_core_dev *dev, ++ void *mnvda, size_t len) ++{ ++ MLX5_SET_CFG_ITEM_TYPE(global, mnvda, type_class, 0); ++ MLX5_SET_CFG_ITEM_TYPE(global, mnvda, parameter_index, ++ MLX5_CLASS_0_CTRL_ID_NV_GLOBAL_PCI_CONF); ++ MLX5_SET_CFG_HDR_LEN(mnvda, nv_global_pci_conf); ++ ++ return mlx5_nv_param_read(dev, mnvda, len); ++} ++ ++static int mlx5_nv_param_read_global_pci_cap(struct mlx5_core_dev *dev, ++ void *mnvda, size_t len) ++{ ++ MLX5_SET_CFG_ITEM_TYPE(global, mnvda, type_class, 0); ++ MLX5_SET_CFG_ITEM_TYPE(global, mnvda, parameter_index, ++ MLX5_CLASS_0_CTRL_ID_NV_GLOBAL_PCI_CAP); ++ MLX5_SET_CFG_HDR_LEN(mnvda, nv_global_pci_cap); ++ ++ return mlx5_nv_param_read(dev, mnvda, len); ++} ++ ++static int mlx5_nv_param_read_per_host_pf_conf(struct mlx5_core_dev *dev, ++ void *mnvda, size_t len) ++{ ++ MLX5_SET_CFG_ITEM_TYPE(per_host_pf, mnvda, type_class, 3); ++ MLX5_SET_CFG_ITEM_TYPE(per_host_pf, mnvda, parameter_index, ++ MLX5_CLASS_3_CTRL_ID_NV_PF_PCI_CONF); ++ MLX5_SET_CFG_HDR_LEN(mnvda, nv_pf_pci_conf); ++ ++ return mlx5_nv_param_read(dev, mnvda, len); ++} ++ ++static int mlx5_devlink_enable_sriov_get(struct devlink *devlink, u32 id, ++ struct devlink_param_gset_ctx *ctx) ++{ ++ struct mlx5_core_dev *dev = devlink_priv(devlink); ++ u32 mnvda[MLX5_ST_SZ_DW(mnvda_reg)] = {}; ++ bool sriov_en = false; ++ void *data; ++ int err; ++ ++ err = mlx5_nv_param_read_global_pci_cap(dev, mnvda, sizeof(mnvda)); ++ if (err) ++ return err; ++ ++ data = MLX5_ADDR_OF(mnvda_reg, mnvda, configuration_item_data); ++ if (!MLX5_GET(nv_global_pci_cap, 
data, sriov_support)) { ++ ctx->val.vbool = false; ++ return 0; ++ } ++ ++ memset(mnvda, 0, sizeof(mnvda)); ++ err = mlx5_nv_param_read_global_pci_conf(dev, mnvda, sizeof(mnvda)); ++ if (err) ++ return err; ++ ++ data = MLX5_ADDR_OF(mnvda_reg, mnvda, configuration_item_data); ++ sriov_en = MLX5_GET(nv_global_pci_conf, data, sriov_en); ++ if (!MLX5_GET(nv_global_pci_conf, data, per_pf_total_vf)) { ++ ctx->val.vbool = sriov_en; ++ return 0; ++ } ++ ++ /* SRIOV is per PF */ ++ memset(mnvda, 0, sizeof(mnvda)); ++ err = mlx5_nv_param_read_per_host_pf_conf(dev, mnvda, sizeof(mnvda)); ++ if (err) ++ return err; ++ ++ data = MLX5_ADDR_OF(mnvda_reg, mnvda, configuration_item_data); ++ ctx->val.vbool = sriov_en && ++ MLX5_GET(nv_pf_pci_conf, data, pf_total_vf_en); ++ return 0; ++} ++ ++static int mlx5_devlink_enable_sriov_set(struct devlink *devlink, u32 id, ++ struct devlink_param_gset_ctx *ctx, ++ struct netlink_ext_ack *extack) ++{ ++ struct mlx5_core_dev *dev = devlink_priv(devlink); ++ u32 mnvda[MLX5_ST_SZ_DW(mnvda_reg)] = {}; ++ bool per_pf_support; ++ void *cap, *data; ++ int err; ++ ++ err = mlx5_nv_param_read_global_pci_cap(dev, mnvda, sizeof(mnvda)); ++ if (err) { ++ NL_SET_ERR_MSG_MOD(extack, ++ "Failed to read global PCI capability"); ++ return err; ++ } ++ ++ cap = MLX5_ADDR_OF(mnvda_reg, mnvda, configuration_item_data); ++ per_pf_support = MLX5_GET(nv_global_pci_cap, cap, ++ per_pf_total_vf_supported); ++ ++ if (!MLX5_GET(nv_global_pci_cap, cap, sriov_support)) { ++ NL_SET_ERR_MSG_MOD(extack, ++ "SRIOV is not supported on this device"); ++ return -EOPNOTSUPP; ++ } ++ ++ if (!per_pf_support) { ++ /* We don't allow global SRIOV setting on per PF devlink */ ++ NL_SET_ERR_MSG_MOD(extack, ++ "SRIOV is not per PF on this device"); ++ return -EOPNOTSUPP; ++ } ++ ++ memset(mnvda, 0, sizeof(mnvda)); ++ err = mlx5_nv_param_read_global_pci_conf(dev, mnvda, sizeof(mnvda)); ++ if (err) { ++ NL_SET_ERR_MSG_MOD(extack, ++ "Unable to read global PCI configuration"); ++ return 
err; ++ } ++ ++ data = MLX5_ADDR_OF(mnvda_reg, mnvda, configuration_item_data); ++ ++ /* setup per PF sriov mode */ ++ MLX5_SET(nv_global_pci_conf, data, sriov_valid, 1); ++ MLX5_SET(nv_global_pci_conf, data, sriov_en, 1); ++ MLX5_SET(nv_global_pci_conf, data, per_pf_total_vf, 1); ++ ++ err = mlx5_nv_param_write(dev, mnvda, sizeof(mnvda)); ++ if (err) { ++ NL_SET_ERR_MSG_MOD(extack, ++ "Unable to write global PCI configuration"); ++ return err; ++ } ++ ++ /* enable/disable sriov on this PF */ ++ memset(mnvda, 0, sizeof(mnvda)); ++ err = mlx5_nv_param_read_per_host_pf_conf(dev, mnvda, sizeof(mnvda)); ++ if (err) { ++ NL_SET_ERR_MSG_MOD(extack, ++ "Unable to read per host PF configuration"); ++ return err; ++ } ++ MLX5_SET(nv_pf_pci_conf, data, pf_total_vf_en, ctx->val.vbool); ++ return mlx5_nv_param_write(dev, mnvda, sizeof(mnvda)); ++} ++ + static const struct devlink_param mlx5_nv_param_devlink_params[] = { ++ DEVLINK_PARAM_GENERIC(ENABLE_SRIOV, BIT(DEVLINK_PARAM_CMODE_PERMANENT), ++ mlx5_devlink_enable_sriov_get, ++ mlx5_devlink_enable_sriov_set, NULL), + DEVLINK_PARAM_DRIVER(MLX5_DEVLINK_PARAM_ID_CQE_COMPRESSION_TYPE, + "cqe_compress_type", DEVLINK_PARAM_TYPE_STRING, + BIT(DEVLINK_PARAM_CMODE_PERMANENT), +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1530-net-mlx5-implement-devlink-total-vfs-parameter.patch b/SOURCES/1530-net-mlx5-implement-devlink-total-vfs-parameter.patch new file mode 100644 index 000000000..c0616076a --- /dev/null +++ b/SOURCES/1530-net-mlx5-implement-devlink-total-vfs-parameter.patch @@ -0,0 +1,218 @@ +From dd2496e03913db85e628821c349281e8cad5ef4f Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:57 -0400 +Subject: [PATCH] net/mlx5: Implement devlink total_vfs parameter + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit a4c49611cf4f7018ee80f02bded12fd4002ef95c +Author: Vlad Dumitrescu +Date: Sat Sep 6 18:29:46 2025 -0700 + + net/mlx5: Implement devlink total_vfs parameter + + Some devices support 
both symmetric (same value for all PFs) and + asymmetric, while others only support symmetric configuration. This + implementation prefers asymmetric, since it is closer to the devlink + model (per function settings), but falls back to symmetric when needed. + + Example usage: + devlink dev param set pci/0000:01:00.0 name total_vfs value cmode permanent + devlink dev reload pci/0000:01:00.0 action fw_activate + echo 1 >/sys/bus/pci/devices/0000:01:00.0/remove + echo 1 >/sys/bus/pci/rescan + cat /sys/bus/pci/devices/0000:01:00.0/sriov_totalvfs + + Signed-off-by: Vlad Dumitrescu + Reviewed-by: Jiri Pirko + Tested-by: Kamal Heib + Signed-off-by: Saeed Mahameed + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/20250907012953.301746-5-saeed@kernel.org + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst +index c3610f7c1d4b..07b1424cbfbb 100644 +--- a/Documentation/networking/devlink/mlx5.rst ++++ b/Documentation/networking/devlink/mlx5.rst +@@ -40,6 +40,28 @@ Parameters + - Boolean + - Applies to each physical function (PF) independently, if the device + supports it. Otherwise, it applies symmetrically to all PFs. ++ * - ``total_vfs`` ++ - permanent ++ - The range is between 1 and a device-specific max. ++ - Applies to each physical function (PF) independently, if the device ++ supports it. Otherwise, it applies symmetrically to all PFs. ++ ++Note: permanent parameters such as ``enable_sriov`` and ``total_vfs`` require FW reset to take effect ++ ++.. 
code-block:: bash ++ ++ # setup parameters ++ devlink dev param set pci/0000:01:00.0 name enable_sriov value true cmode permanent ++ devlink dev param set pci/0000:01:00.0 name total_vfs value 8 cmode permanent ++ ++ # Fw reset ++ devlink dev reload pci/0000:01:00.0 action fw_activate ++ ++ # for PCI related config such as sriov PCI reset/rescan is required: ++ echo 1 >/sys/bus/pci/devices/0000:01:00.0/remove ++ echo 1 >/sys/bus/pci/rescan ++ grep ^ /sys/bus/pci/devices/0000:01:00.0/sriov_* ++ + + The ``mlx5`` driver also implements the following driver-specific + parameters. +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.c +index ed2129843ec7..383d8cfe4c0a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.c +@@ -412,10 +412,142 @@ static int mlx5_devlink_enable_sriov_set(struct devlink *devlink, u32 id, + return mlx5_nv_param_write(dev, mnvda, sizeof(mnvda)); + } + ++static int mlx5_devlink_total_vfs_get(struct devlink *devlink, u32 id, ++ struct devlink_param_gset_ctx *ctx) ++{ ++ struct mlx5_core_dev *dev = devlink_priv(devlink); ++ u32 mnvda[MLX5_ST_SZ_DW(mnvda_reg)] = {}; ++ void *data; ++ int err; ++ ++ data = MLX5_ADDR_OF(mnvda_reg, mnvda, configuration_item_data); ++ ++ err = mlx5_nv_param_read_global_pci_cap(dev, mnvda, sizeof(mnvda)); ++ if (err) ++ return err; ++ ++ if (!MLX5_GET(nv_global_pci_cap, data, sriov_support)) { ++ ctx->val.vu32 = 0; ++ return 0; ++ } ++ ++ memset(mnvda, 0, sizeof(mnvda)); ++ err = mlx5_nv_param_read_global_pci_conf(dev, mnvda, sizeof(mnvda)); ++ if (err) ++ return err; ++ ++ if (!MLX5_GET(nv_global_pci_conf, data, per_pf_total_vf)) { ++ ctx->val.vu32 = MLX5_GET(nv_global_pci_conf, data, total_vfs); ++ return 0; ++ } ++ ++ /* SRIOV is per PF */ ++ memset(mnvda, 0, sizeof(mnvda)); ++ err = mlx5_nv_param_read_per_host_pf_conf(dev, mnvda, sizeof(mnvda)); ++ if (err) ++ return 
err; ++ ++ ctx->val.vu32 = MLX5_GET(nv_pf_pci_conf, data, total_vf); ++ ++ return 0; ++} ++ ++static int mlx5_devlink_total_vfs_set(struct devlink *devlink, u32 id, ++ struct devlink_param_gset_ctx *ctx, ++ struct netlink_ext_ack *extack) ++{ ++ struct mlx5_core_dev *dev = devlink_priv(devlink); ++ u32 mnvda[MLX5_ST_SZ_DW(mnvda_reg)]; ++ bool per_pf_support; ++ void *data; ++ int err; ++ ++ err = mlx5_nv_param_read_global_pci_cap(dev, mnvda, sizeof(mnvda)); ++ if (err) { ++ NL_SET_ERR_MSG_MOD(extack, "Failed to read global pci cap"); ++ return err; ++ } ++ ++ data = MLX5_ADDR_OF(mnvda_reg, mnvda, configuration_item_data); ++ if (!MLX5_GET(nv_global_pci_cap, data, sriov_support)) { ++ NL_SET_ERR_MSG_MOD(extack, "Not configurable on this device"); ++ return -EOPNOTSUPP; ++ } ++ ++ per_pf_support = MLX5_GET(nv_global_pci_cap, data, ++ per_pf_total_vf_supported); ++ if (!per_pf_support) { ++ /* We don't allow global SRIOV setting on per PF devlink */ ++ NL_SET_ERR_MSG_MOD(extack, ++ "SRIOV is not per PF on this device"); ++ return -EOPNOTSUPP; ++ } ++ ++ memset(mnvda, 0, sizeof(mnvda)); ++ err = mlx5_nv_param_read_global_pci_conf(dev, mnvda, sizeof(mnvda)); ++ if (err) ++ return err; ++ ++ MLX5_SET(nv_global_pci_conf, data, sriov_valid, 1); ++ MLX5_SET(nv_global_pci_conf, data, per_pf_total_vf, per_pf_support); ++ ++ if (!per_pf_support) { ++ MLX5_SET(nv_global_pci_conf, data, total_vfs, ctx->val.vu32); ++ return mlx5_nv_param_write(dev, mnvda, sizeof(mnvda)); ++ } ++ ++ /* SRIOV is per PF */ ++ err = mlx5_nv_param_write(dev, mnvda, sizeof(mnvda)); ++ if (err) ++ return err; ++ ++ memset(mnvda, 0, sizeof(mnvda)); ++ err = mlx5_nv_param_read_per_host_pf_conf(dev, mnvda, sizeof(mnvda)); ++ if (err) ++ return err; ++ ++ data = MLX5_ADDR_OF(mnvda_reg, mnvda, configuration_item_data); ++ MLX5_SET(nv_pf_pci_conf, data, total_vf, ctx->val.vu32); ++ return mlx5_nv_param_write(dev, mnvda, sizeof(mnvda)); ++} ++ ++static int mlx5_devlink_total_vfs_validate(struct devlink 
*devlink, u32 id, ++ union devlink_param_value val, ++ struct netlink_ext_ack *extack) ++{ ++ struct mlx5_core_dev *dev = devlink_priv(devlink); ++ u32 cap[MLX5_ST_SZ_DW(mnvda_reg)]; ++ void *data; ++ u16 max; ++ int err; ++ ++ data = MLX5_ADDR_OF(mnvda_reg, cap, configuration_item_data); ++ ++ err = mlx5_nv_param_read_global_pci_cap(dev, cap, sizeof(cap)); ++ if (err) ++ return err; ++ ++ if (!MLX5_GET(nv_global_pci_cap, data, max_vfs_per_pf_valid)) ++ return 0; /* optimistic, but set might fail later */ ++ ++ max = MLX5_GET(nv_global_pci_cap, data, max_vfs_per_pf); ++ if (val.vu16 > max) { ++ NL_SET_ERR_MSG_FMT_MOD(extack, ++ "Max allowed by device is %u", max); ++ return -EINVAL; ++ } ++ ++ return 0; ++} ++ + static const struct devlink_param mlx5_nv_param_devlink_params[] = { + DEVLINK_PARAM_GENERIC(ENABLE_SRIOV, BIT(DEVLINK_PARAM_CMODE_PERMANENT), + mlx5_devlink_enable_sriov_get, + mlx5_devlink_enable_sriov_set, NULL), ++ DEVLINK_PARAM_GENERIC(TOTAL_VFS, BIT(DEVLINK_PARAM_CMODE_PERMANENT), ++ mlx5_devlink_total_vfs_get, ++ mlx5_devlink_total_vfs_set, ++ mlx5_devlink_total_vfs_validate), + DEVLINK_PARAM_DRIVER(MLX5_DEVLINK_PARAM_ID_CQE_COMPRESSION_TYPE, + "cqe_compress_type", DEVLINK_PARAM_TYPE_STRING, + BIT(DEVLINK_PARAM_CMODE_PERMANENT), +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1531-net-mlx5e-make-pcie-congestion-event-thresholds-configurable.patch b/SOURCES/1531-net-mlx5e-make-pcie-congestion-event-thresholds-configurable.patch new file mode 100644 index 000000000..dc80a495d --- /dev/null +++ b/SOURCES/1531-net-mlx5e-make-pcie-congestion-event-thresholds-configurable.patch @@ -0,0 +1,359 @@ +From 21028531e0ca0857334d5e11cea7c7b27adcd477 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:57 -0400 +Subject: [PATCH] net/mlx5e: Make PCIe congestion event thresholds configurable + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit f4053490a6f651476124903dbe0777e3c24ac8cb +Author: Dragos Tatulea +Date: Sun Sep 7 
12:39:35 2025 +0300 + + net/mlx5e: Make PCIe congestion event thresholds configurable + + Add devlink driverinit parameters for configuring the thresholds for + PCIe congestion events. These parameters are registered only when the + firmware supports this feature. + + Update the mlx5 devlink docs as well on these new params. + + Signed-off-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/1757237976-531416-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst +index 07b1424cbfbb..60cc9fedf1ef 100644 +--- a/Documentation/networking/devlink/mlx5.rst ++++ b/Documentation/networking/devlink/mlx5.rst +@@ -146,6 +146,58 @@ parameters. + - u32 + - driverinit + - Control the size (in packets) of the hairpin queues. ++ * - ``pcie_cong_inbound_high`` ++ - u16 ++ - driverinit ++ - High threshold configuration for PCIe congestion events. The firmware ++ will send an event once device side inbound PCIe traffic went ++ above the configured high threshold for a long enough period (at least ++ 200ms). ++ ++ See pci_bw_inbound_high ethtool stat. ++ ++ Units are 0.01 %. Accepted values are in range [0, 10000]. ++ pcie_cong_inbound_low < pcie_cong_inbound_high. ++ Default value: 9000 (Corresponds to 90%). ++ * - ``pcie_cong_inbound_low`` ++ - u16 ++ - driverinit ++ - Low threshold configuration for PCIe congestion events. The firmware ++ will send an event once device side inbound PCIe traffic went ++ below the configured low threshold, only after having been previously in ++ a congested state. ++ ++ See pci_bw_inbound_low ethtool stat. ++ ++ Units are 0.01 %. Accepted values are in range [0, 10000]. ++ pcie_cong_inbound_low < pcie_cong_inbound_high. ++ Default value: 7500. 
++ * - ``pcie_cong_outbound_high`` ++ - u16 ++ - driverinit ++ - High threshold configuration for PCIe congestion events. The firmware ++ will send an event once device side outbound PCIe traffic went ++ above the configured high threshold for a long enough period (at least ++ 200ms). ++ ++ See pci_bw_outbound_high ethtool stat. ++ ++ Units are 0.01 %. Accepted values are in range [0, 10000]. ++ pcie_cong_outbound_low < pcie_cong_outbound_high. ++ Default value: 9000 (Corresponds to 90%). ++ * - ``pcie_cong_outbound_low`` ++ - u16 ++ - driverinit ++ - Low threshold configuration for PCIe congestion events. The firmware ++ will send an event once device side outbound PCIe traffic went ++ below the configured low threshold, only after having been previously in ++ a congested state. ++ ++ See pci_bw_outbound_low ethtool stat. ++ ++ Units are 0.01 %. Accepted values are in range [0, 10000]. ++ pcie_cong_outbound_low < pcie_cong_outbound_high. ++ Default value: 7500. + + * - ``cqe_compress_type`` + - string +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +index 0c0f7231cb2a..e900451643a3 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +@@ -651,6 +651,105 @@ static void mlx5_devlink_eth_params_unregister(struct devlink *devlink) + ARRAY_SIZE(mlx5_devlink_eth_params)); + } + ++#define MLX5_PCIE_CONG_THRESH_MAX 10000 ++#define MLX5_PCIE_CONG_THRESH_DEF_LOW 7500 ++#define MLX5_PCIE_CONG_THRESH_DEF_HIGH 9000 ++ ++static int ++mlx5_devlink_pcie_cong_thresh_validate(struct devlink *devl, u32 id, ++ union devlink_param_value val, ++ struct netlink_ext_ack *extack) ++{ ++ if (val.vu16 > MLX5_PCIE_CONG_THRESH_MAX) { ++ NL_SET_ERR_MSG_FMT_MOD(extack, "Value %u > max supported (%u)", ++ val.vu16, MLX5_PCIE_CONG_THRESH_MAX); ++ ++ return -EINVAL; ++ } ++ ++ switch (id) { ++ case MLX5_DEVLINK_PARAM_ID_PCIE_CONG_IN_LOW: ++ case 
MLX5_DEVLINK_PARAM_ID_PCIE_CONG_IN_HIGH: ++ case MLX5_DEVLINK_PARAM_ID_PCIE_CONG_OUT_LOW: ++ case MLX5_DEVLINK_PARAM_ID_PCIE_CONG_OUT_HIGH: ++ break; ++ default: ++ return -EOPNOTSUPP; ++ } ++ ++ return 0; ++} ++ ++static void mlx5_devlink_pcie_cong_init_values(struct devlink *devlink) ++{ ++ union devlink_param_value value; ++ u32 id; ++ ++ value.vu16 = MLX5_PCIE_CONG_THRESH_DEF_LOW; ++ id = MLX5_DEVLINK_PARAM_ID_PCIE_CONG_IN_LOW; ++ devl_param_driverinit_value_set(devlink, id, value); ++ ++ value.vu16 = MLX5_PCIE_CONG_THRESH_DEF_HIGH; ++ id = MLX5_DEVLINK_PARAM_ID_PCIE_CONG_IN_HIGH; ++ devl_param_driverinit_value_set(devlink, id, value); ++ ++ value.vu16 = MLX5_PCIE_CONG_THRESH_DEF_LOW; ++ id = MLX5_DEVLINK_PARAM_ID_PCIE_CONG_OUT_LOW; ++ devl_param_driverinit_value_set(devlink, id, value); ++ ++ value.vu16 = MLX5_PCIE_CONG_THRESH_DEF_HIGH; ++ id = MLX5_DEVLINK_PARAM_ID_PCIE_CONG_OUT_HIGH; ++ devl_param_driverinit_value_set(devlink, id, value); ++} ++ ++static const struct devlink_param mlx5_devlink_pcie_cong_params[] = { ++ DEVLINK_PARAM_DRIVER(MLX5_DEVLINK_PARAM_ID_PCIE_CONG_IN_LOW, ++ "pcie_cong_inbound_low", DEVLINK_PARAM_TYPE_U16, ++ BIT(DEVLINK_PARAM_CMODE_DRIVERINIT), NULL, NULL, ++ mlx5_devlink_pcie_cong_thresh_validate), ++ DEVLINK_PARAM_DRIVER(MLX5_DEVLINK_PARAM_ID_PCIE_CONG_IN_HIGH, ++ "pcie_cong_inbound_high", DEVLINK_PARAM_TYPE_U16, ++ BIT(DEVLINK_PARAM_CMODE_DRIVERINIT), NULL, NULL, ++ mlx5_devlink_pcie_cong_thresh_validate), ++ DEVLINK_PARAM_DRIVER(MLX5_DEVLINK_PARAM_ID_PCIE_CONG_OUT_LOW, ++ "pcie_cong_outbound_low", DEVLINK_PARAM_TYPE_U16, ++ BIT(DEVLINK_PARAM_CMODE_DRIVERINIT), NULL, NULL, ++ mlx5_devlink_pcie_cong_thresh_validate), ++ DEVLINK_PARAM_DRIVER(MLX5_DEVLINK_PARAM_ID_PCIE_CONG_OUT_HIGH, ++ "pcie_cong_outbound_high", DEVLINK_PARAM_TYPE_U16, ++ BIT(DEVLINK_PARAM_CMODE_DRIVERINIT), NULL, NULL, ++ mlx5_devlink_pcie_cong_thresh_validate), ++}; ++ ++static int mlx5_devlink_pcie_cong_params_register(struct devlink *devlink) ++{ ++ struct 
mlx5_core_dev *dev = devlink_priv(devlink); ++ int err; ++ ++ if (!mlx5_pcie_cong_event_supported(dev)) ++ return 0; ++ ++ err = devl_params_register(devlink, mlx5_devlink_pcie_cong_params, ++ ARRAY_SIZE(mlx5_devlink_pcie_cong_params)); ++ if (err) ++ return err; ++ ++ mlx5_devlink_pcie_cong_init_values(devlink); ++ ++ return 0; ++} ++ ++static void mlx5_devlink_pcie_cong_params_unregister(struct devlink *devlink) ++{ ++ struct mlx5_core_dev *dev = devlink_priv(devlink); ++ ++ if (!mlx5_pcie_cong_event_supported(dev)) ++ return; ++ ++ devl_params_unregister(devlink, mlx5_devlink_pcie_cong_params, ++ ARRAY_SIZE(mlx5_devlink_pcie_cong_params)); ++} ++ + static int mlx5_devlink_enable_rdma_validate(struct devlink *devlink, u32 id, + union devlink_param_value val, + struct netlink_ext_ack *extack) +@@ -896,6 +995,10 @@ int mlx5_devlink_params_register(struct devlink *devlink) + if (err) + goto max_uc_list_err; + ++ err = mlx5_devlink_pcie_cong_params_register(devlink); ++ if (err) ++ goto pcie_cong_err; ++ + err = mlx5_nv_param_register_dl_params(devlink); + if (err) + goto nv_param_err; +@@ -903,6 +1006,8 @@ int mlx5_devlink_params_register(struct devlink *devlink) + return 0; + + nv_param_err: ++ mlx5_devlink_pcie_cong_params_unregister(devlink); ++pcie_cong_err: + mlx5_devlink_max_uc_list_params_unregister(devlink); + max_uc_list_err: + mlx5_devlink_auxdev_params_unregister(devlink); +@@ -915,6 +1020,7 @@ int mlx5_devlink_params_register(struct devlink *devlink) + void mlx5_devlink_params_unregister(struct devlink *devlink) + { + mlx5_nv_param_unregister_dl_params(devlink); ++ mlx5_devlink_pcie_cong_params_unregister(devlink); + mlx5_devlink_max_uc_list_params_unregister(devlink); + mlx5_devlink_auxdev_params_unregister(devlink); + devl_params_unregister(devlink, mlx5_devlink_params, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h +index 74bcdfa70361..c9555119a661 100644 +--- 
a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h +@@ -22,6 +22,10 @@ enum mlx5_devlink_param_id { + MLX5_DEVLINK_PARAM_ID_ESW_MULTIPORT, + MLX5_DEVLINK_PARAM_ID_HAIRPIN_NUM_QUEUES, + MLX5_DEVLINK_PARAM_ID_HAIRPIN_QUEUE_SIZE, ++ MLX5_DEVLINK_PARAM_ID_PCIE_CONG_IN_LOW, ++ MLX5_DEVLINK_PARAM_ID_PCIE_CONG_IN_HIGH, ++ MLX5_DEVLINK_PARAM_ID_PCIE_CONG_OUT_LOW, ++ MLX5_DEVLINK_PARAM_ID_PCIE_CONG_OUT_HIGH, + MLX5_DEVLINK_PARAM_ID_CQE_COMPRESSION_TYPE + }; + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c b/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c +index 0ed017569a19..0cf142f71c09 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c +@@ -1,6 +1,7 @@ + // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB + // Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. + ++#include "../devlink.h" + #include "en.h" + #include "pcie_cong_event.h" + +@@ -41,13 +42,6 @@ struct mlx5e_pcie_cong_event { + struct mlx5e_pcie_cong_stats stats; + }; + +-/* In units of 0.01 % */ +-static const struct mlx5e_pcie_cong_thresh default_thresh_config = { +- .inbound_high = 9000, +- .inbound_low = 7500, +- .outbound_high = 9000, +- .outbound_low = 7500, +-}; + + static const struct counter_desc mlx5e_pcie_cong_stats_desc[] = { + { MLX5E_DECLARE_STAT(struct mlx5e_pcie_cong_stats, +@@ -249,8 +243,60 @@ static int mlx5e_pcie_cong_event_handler(struct notifier_block *nb, + return NOTIFY_OK; + } + ++static int ++mlx5e_pcie_cong_get_thresh_config(struct mlx5_core_dev *dev, ++ struct mlx5e_pcie_cong_thresh *config) ++{ ++ u32 ids[4] = { ++ MLX5_DEVLINK_PARAM_ID_PCIE_CONG_IN_LOW, ++ MLX5_DEVLINK_PARAM_ID_PCIE_CONG_IN_HIGH, ++ MLX5_DEVLINK_PARAM_ID_PCIE_CONG_OUT_LOW, ++ MLX5_DEVLINK_PARAM_ID_PCIE_CONG_OUT_HIGH, ++ }; ++ struct devlink *devlink = priv_to_devlink(dev); ++ union devlink_param_value val[4]; ++ ++ for (int 
i = 0; i < 4; i++) { ++ u32 id = ids[i]; ++ int err; ++ ++ err = devl_param_driverinit_value_get(devlink, id, &val[i]); ++ if (err) ++ return err; ++ } ++ ++ config->inbound_low = val[0].vu16; ++ config->inbound_high = val[1].vu16; ++ config->outbound_low = val[2].vu16; ++ config->outbound_high = val[3].vu16; ++ ++ return 0; ++} ++ ++static int ++mlx5e_thresh_config_validate(struct mlx5_core_dev *mdev, ++ const struct mlx5e_pcie_cong_thresh *config) ++{ ++ int err = 0; ++ ++ if (config->inbound_low >= config->inbound_high) { ++ err = -EINVAL; ++ mlx5_core_err(mdev, "PCIe inbound congestion threshold configuration invalid: low (%u) >= high (%u).\n", ++ config->inbound_low, config->inbound_high); ++ } ++ ++ if (config->outbound_low >= config->outbound_high) { ++ err = -EINVAL; ++ mlx5_core_err(mdev, "PCIe outbound congestion threshold configuration invalid: low (%u) >= high (%u).\n", ++ config->outbound_low, config->outbound_high); ++ } ++ ++ return err; ++} ++ + int mlx5e_pcie_cong_event_init(struct mlx5e_priv *priv) + { ++ struct mlx5e_pcie_cong_thresh thresh_config = {}; + struct mlx5e_pcie_cong_event *cong_event; + struct mlx5_core_dev *mdev = priv->mdev; + int err; +@@ -258,6 +304,16 @@ int mlx5e_pcie_cong_event_init(struct mlx5e_priv *priv) + if (!mlx5_pcie_cong_event_supported(mdev)) + return 0; + ++ err = mlx5e_pcie_cong_get_thresh_config(mdev, &thresh_config); ++ if (WARN_ON(err)) ++ return err; ++ ++ err = mlx5e_thresh_config_validate(mdev, &thresh_config); ++ if (err) { ++ mlx5_core_err(mdev, "PCIe congestion event feature disabled\n"); ++ return err; ++ } ++ + cong_event = kvzalloc_node(sizeof(*cong_event), GFP_KERNEL, + mdev->priv.numa_node); + if (!cong_event) +@@ -269,7 +325,7 @@ int mlx5e_pcie_cong_event_init(struct mlx5e_priv *priv) + + cong_event->priv = priv; + +- err = mlx5_cmd_pcie_cong_event_set(mdev, &default_thresh_config, ++ err = mlx5_cmd_pcie_cong_event_set(mdev, &thresh_config, + &cong_event->obj_id); + if (err) { + mlx5_core_warn(mdev, 
"Error creating a PCIe congestion event object\n"); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1532-net-mlx5e-add-stale-counter-for-pcie-congestion-events.patch b/SOURCES/1532-net-mlx5e-add-stale-counter-for-pcie-congestion-events.patch new file mode 100644 index 000000000..7f68c244e --- /dev/null +++ b/SOURCES/1532-net-mlx5e-add-stale-counter-for-pcie-congestion-events.patch @@ -0,0 +1,89 @@ +From 3b6651757c73fb5891d4e1eb65f6e8ea0cb6188c Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:58 -0400 +Subject: [PATCH] net/mlx5e: Add stale counter for PCIe congestion events + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit cdc492746e3f6d73a9e6a6a9962c9f1f7b7961b5 +Author: Dragos Tatulea +Date: Sun Sep 7 12:39:36 2025 +0300 + + net/mlx5e: Add stale counter for PCIe congestion events + + This ethtool counter is meant to help with observing how many times the + congestion event was triggered but on query there was no state change. + + This would help to indicate when a work item was scheduled to run too + late and in the meantime the congestion state changed back to previous + state. + + While at it, do a driveby typo fix in documentation for + pci_bw_inbound_high. + + Signed-off-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/1757237976-531416-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst +index 754c81436408..cc498895f92e 100644 +--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst ++++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst +@@ -1348,7 +1348,7 @@ Device Counters + is in a congested state. + If pci_bw_inbound_high == pci_bw_inbound_low then the device is not congested. 
+ If pci_bw_inbound_high > pci_bw_inbound_low then the device is congested. +- - Tnformative ++ - Informative + + * - `pci_bw_inbound_low` + - The number of times the device crossed the low inbound PCIe bandwidth +@@ -1373,3 +1373,8 @@ Device Counters + If pci_bw_outbound_high == pci_bw_outbound_low then the device is not congested. + If pci_bw_outbound_high > pci_bw_outbound_low then the device is congested. + - Informative ++ ++ * - `pci_bw_stale_event` ++ - The number of times the device fired a PCIe congestion event but on query ++ there was no change in state. ++ - Informative +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c b/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c +index 0cf142f71c09..2eb666a46f39 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c +@@ -24,6 +24,7 @@ struct mlx5e_pcie_cong_stats { + u32 pci_bw_inbound_low; + u32 pci_bw_outbound_high; + u32 pci_bw_outbound_low; ++ u32 pci_bw_stale_event; + }; + + struct mlx5e_pcie_cong_event { +@@ -52,6 +53,8 @@ static const struct counter_desc mlx5e_pcie_cong_stats_desc[] = { + pci_bw_outbound_high) }, + { MLX5E_DECLARE_STAT(struct mlx5e_pcie_cong_stats, + pci_bw_outbound_low) }, ++ { MLX5E_DECLARE_STAT(struct mlx5e_pcie_cong_stats, ++ pci_bw_stale_event) }, + }; + + #define NUM_PCIE_CONG_COUNTERS ARRAY_SIZE(mlx5e_pcie_cong_stats_desc) +@@ -212,8 +215,10 @@ static void mlx5e_pcie_cong_event_work(struct work_struct *work) + } + + changes = cong_event->state ^ new_cong_state; +- if (!changes) ++ if (!changes) { ++ cong_event->stats.pci_bw_stale_event++; + return; ++ } + + cong_event->state = new_cong_state; + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1533-net-mlx5-fix-typo-in-pci-irq-c-comment.patch b/SOURCES/1533-net-mlx5-fix-typo-in-pci-irq-c-comment.patch new file mode 100644 index 000000000..31d44ecc4 --- /dev/null +++ 
b/SOURCES/1533-net-mlx5-fix-typo-in-pci-irq-c-comment.patch @@ -0,0 +1,42 @@ +From db599277be7ea60432782677ca6ef98c7d73b692 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:58 -0400 +Subject: [PATCH] net/mlx5: fix typo in pci_irq.c comment +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit c5e389cc6b36701098d31fa3438c553c7fe7c1bb +Author: Alok Tiwari +Date: Fri Sep 12 06:50:44 2025 -0700 + + net/mlx5: fix typo in pci_irq.c comment + + Fix a typo in a comment in pci_irq.c: + "ssigned" → "assigned" + + Signed-off-by: Alok Tiwari + Reviewed-by: Tariq Toukan + Link: https://patch.msgid.link/20250912135050.3921116-1-alok.a.tiwari@oracle.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c +index 692ef9c2f729..e18a850c615c 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c +@@ -54,7 +54,7 @@ static int mlx5_core_func_to_vport(const struct mlx5_core_dev *dev, + + /** + * mlx5_get_default_msix_vec_count - Get the default number of MSI-X vectors +- * to be ssigned to each VF. ++ * to be assigned to each VF. 
+ * @dev: PF to work on + * @num_vfs: Number of enabled VFs + */ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1534-net-mlx5-refactor-devcom-to-use-match-attributes.patch b/SOURCES/1534-net-mlx5-refactor-devcom-to-use-match-attributes.patch new file mode 100644 index 000000000..88500c31e --- /dev/null +++ b/SOURCES/1534-net-mlx5-refactor-devcom-to-use-match-attributes.patch @@ -0,0 +1,330 @@ +From d09bbd0ecb3ccf4257ae9b07e3eb0819e1867c71 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:58 -0400 +Subject: [PATCH] net/mlx5: Refactor devcom to use match attributes + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit f05a82fbcc645dceeed242d80bccb9dad2ca3383 +Author: Shay Drory +Date: Mon Sep 15 15:41:07 2025 +0300 + + net/mlx5: Refactor devcom to use match attributes + + Refactor the devcom interface to use a match attribute structure instead + of passing raw keys. This change lays the groundwork for extending + devcom matching logic with additional fields like net namespace, + improving its flexibility and robustness. + + No functional changes. 
+ + Signed-off-by: Shay Drory + Reviewed-by: Mark Bloch + Reviewed-by: Simon Horman + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1757940070-618661-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index 4cc80cda8a09..b09291decca5 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -232,9 +232,13 @@ static int mlx5e_devcom_event_mpv(int event, void *my_data, void *event_data) + + static int mlx5e_devcom_init_mpv(struct mlx5e_priv *priv, u64 *data) + { ++ struct mlx5_devcom_match_attr attr = { ++ .key.val = *data, ++ }; ++ + priv->devcom = mlx5_devcom_register_component(priv->mdev->priv.devc, + MLX5_DEVCOM_MPV, +- *data, ++ &attr, + mlx5e_devcom_event_mpv, + priv); + if (IS_ERR(priv->devcom)) +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +index fef418e1ed1a..ffa5749df587 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +@@ -5387,12 +5387,13 @@ void mlx5e_tc_ht_cleanup(struct rhashtable *tc_ht) + int mlx5e_tc_esw_init(struct mlx5_rep_uplink_priv *uplink_priv) + { + const size_t sz_enc_opts = sizeof(struct tunnel_match_enc_opts); ++ struct mlx5_devcom_match_attr attr = {}; + struct netdev_phys_item_id ppid; + struct mlx5e_rep_priv *rpriv; + struct mapping_ctx *mapping; + struct mlx5_eswitch *esw; + struct mlx5e_priv *priv; +- u64 mapping_id, key; ++ u64 mapping_id; + int err = 0; + + rpriv = container_of(uplink_priv, struct mlx5e_rep_priv, uplink_priv); +@@ -5448,8 +5449,8 @@ int mlx5e_tc_esw_init(struct mlx5_rep_uplink_priv *uplink_priv) + + err = dev_get_port_parent_id(priv->netdev, &ppid, false); + if (!err) { +- memcpy(&key, &ppid.id, sizeof(key)); +- 
mlx5_esw_offloads_devcom_init(esw, key); ++ memcpy(&attr.key.val, &ppid.id, sizeof(attr.key.val)); ++ mlx5_esw_offloads_devcom_init(esw, &attr); + } + + return 0; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +index 4fe285ce32aa..df3756d7e52e 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +@@ -433,7 +433,8 @@ int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int num_vfs); + void mlx5_eswitch_disable_sriov(struct mlx5_eswitch *esw, bool clear_vf); + void mlx5_eswitch_disable_locked(struct mlx5_eswitch *esw); + void mlx5_eswitch_disable(struct mlx5_eswitch *esw); +-void mlx5_esw_offloads_devcom_init(struct mlx5_eswitch *esw, u64 key); ++void mlx5_esw_offloads_devcom_init(struct mlx5_eswitch *esw, ++ const struct mlx5_devcom_match_attr *attr); + void mlx5_esw_offloads_devcom_cleanup(struct mlx5_eswitch *esw); + bool mlx5_esw_offloads_devcom_is_ready(struct mlx5_eswitch *esw); + int mlx5_eswitch_set_vport_mac(struct mlx5_eswitch *esw, +@@ -928,7 +929,9 @@ static inline void mlx5_eswitch_cleanup(struct mlx5_eswitch *esw) {} + static inline int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int num_vfs) { return 0; } + static inline void mlx5_eswitch_disable_sriov(struct mlx5_eswitch *esw, bool clear_vf) {} + static inline void mlx5_eswitch_disable(struct mlx5_eswitch *esw) {} +-static inline void mlx5_esw_offloads_devcom_init(struct mlx5_eswitch *esw, u64 key) {} ++static inline void ++mlx5_esw_offloads_devcom_init(struct mlx5_eswitch *esw, ++ const struct mlx5_devcom_match_attr *attr) {} + static inline void mlx5_esw_offloads_devcom_cleanup(struct mlx5_eswitch *esw) {} + static inline bool mlx5_esw_offloads_devcom_is_ready(struct mlx5_eswitch *esw) { return false; } + static inline bool mlx5_eswitch_is_funcs_handler(struct mlx5_core_dev *dev) { return false; } +diff --git 
a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +index d57f86d297ab..bc9838dc5bf8 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +@@ -3104,7 +3104,8 @@ static int mlx5_esw_offloads_devcom_event(int event, + return err; + } + +-void mlx5_esw_offloads_devcom_init(struct mlx5_eswitch *esw, u64 key) ++void mlx5_esw_offloads_devcom_init(struct mlx5_eswitch *esw, ++ const struct mlx5_devcom_match_attr *attr) + { + int i; + +@@ -3123,7 +3124,7 @@ void mlx5_esw_offloads_devcom_init(struct mlx5_eswitch *esw, u64 key) + esw->num_peers = 0; + esw->devcom = mlx5_devcom_register_component(esw->dev->priv.devc, + MLX5_DEVCOM_ESW_OFFLOADS, +- key, ++ attr, + mlx5_esw_offloads_devcom_event, + esw); + if (IS_ERR(esw->devcom)) +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c +index 7ad3baca99de..8f2ad45bec9f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c +@@ -1435,14 +1435,20 @@ static int mlx5_clock_alloc(struct mlx5_core_dev *mdev, bool shared) + static void mlx5_shared_clock_register(struct mlx5_core_dev *mdev, u64 key) + { + struct mlx5_core_dev *peer_dev, *next = NULL; ++ struct mlx5_devcom_match_attr attr = { ++ .key.val = key, ++ }; ++ struct mlx5_devcom_comp_dev *compd; + struct mlx5_devcom_comp_dev *pos; + +- mdev->clock_state->compdev = mlx5_devcom_register_component(mdev->priv.devc, +- MLX5_DEVCOM_SHARED_CLOCK, +- key, NULL, mdev); +- if (IS_ERR(mdev->clock_state->compdev)) ++ compd = mlx5_devcom_register_component(mdev->priv.devc, ++ MLX5_DEVCOM_SHARED_CLOCK, ++ &attr, NULL, mdev); ++ if (IS_ERR(compd)) + return; + ++ mdev->clock_state->compdev = compd; ++ + mlx5_devcom_comp_lock(mdev->clock_state->compdev); + 
mlx5_devcom_for_each_peer_entry(mdev->clock_state->compdev, peer_dev, pos) { + if (peer_dev->clock) { +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c +index 7b0766c89f4c..1ab9de316deb 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c +@@ -22,11 +22,15 @@ struct mlx5_devcom_dev { + struct kref ref; + }; + ++struct mlx5_devcom_key { ++ union mlx5_devcom_match_key key; ++}; ++ + struct mlx5_devcom_comp { + struct list_head comp_list; + enum mlx5_devcom_component id; +- u64 key; + struct list_head comp_dev_list_head; ++ struct mlx5_devcom_key key; + mlx5_devcom_event_handler_t handler; + struct kref ref; + bool ready; +@@ -108,7 +112,8 @@ void mlx5_devcom_unregister_device(struct mlx5_devcom_dev *devc) + } + + static struct mlx5_devcom_comp * +-mlx5_devcom_comp_alloc(u64 id, u64 key, mlx5_devcom_event_handler_t handler) ++mlx5_devcom_comp_alloc(u64 id, const struct mlx5_devcom_match_attr *attr, ++ mlx5_devcom_event_handler_t handler) + { + struct mlx5_devcom_comp *comp; + +@@ -117,7 +122,7 @@ mlx5_devcom_comp_alloc(u64 id, u64 key, mlx5_devcom_event_handler_t handler) + return ERR_PTR(-ENOMEM); + + comp->id = id; +- comp->key = key; ++ comp->key.key = attr->key; + comp->handler = handler; + init_rwsem(&comp->sem); + lockdep_register_key(&comp->lock_key); +@@ -180,21 +185,27 @@ devcom_free_comp_dev(struct mlx5_devcom_comp_dev *devcom) + static bool + devcom_component_equal(struct mlx5_devcom_comp *devcom, + enum mlx5_devcom_component id, +- u64 key) ++ const struct mlx5_devcom_match_attr *attr) + { +- return devcom->id == id && devcom->key == key; ++ if (devcom->id != id) ++ return false; ++ ++ if (memcmp(&devcom->key.key, &attr->key, sizeof(devcom->key.key))) ++ return false; ++ ++ return true; + } + + static struct mlx5_devcom_comp * + devcom_component_get(struct mlx5_devcom_dev *devc, + enum mlx5_devcom_component id, 
+- u64 key, ++ const struct mlx5_devcom_match_attr *attr, + mlx5_devcom_event_handler_t handler) + { + struct mlx5_devcom_comp *comp; + + devcom_for_each_component(comp) { +- if (devcom_component_equal(comp, id, key)) { ++ if (devcom_component_equal(comp, id, attr)) { + if (handler == comp->handler) { + kref_get(&comp->ref); + return comp; +@@ -212,7 +223,7 @@ devcom_component_get(struct mlx5_devcom_dev *devc, + struct mlx5_devcom_comp_dev * + mlx5_devcom_register_component(struct mlx5_devcom_dev *devc, + enum mlx5_devcom_component id, +- u64 key, ++ const struct mlx5_devcom_match_attr *attr, + mlx5_devcom_event_handler_t handler, + void *data) + { +@@ -223,14 +234,14 @@ mlx5_devcom_register_component(struct mlx5_devcom_dev *devc, + return ERR_PTR(-EINVAL); + + mutex_lock(&comp_list_lock); +- comp = devcom_component_get(devc, id, key, handler); ++ comp = devcom_component_get(devc, id, attr, handler); + if (IS_ERR(comp)) { + devcom = ERR_PTR(-EINVAL); + goto out_unlock; + } + + if (!comp) { +- comp = mlx5_devcom_comp_alloc(id, key, handler); ++ comp = mlx5_devcom_comp_alloc(id, attr, handler); + if (IS_ERR(comp)) { + devcom = ERR_CAST(comp); + goto out_unlock; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h +index c79699b94a02..f350d2395707 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h +@@ -6,6 +6,14 @@ + + #include + ++union mlx5_devcom_match_key { ++ u64 val; ++}; ++ ++struct mlx5_devcom_match_attr { ++ union mlx5_devcom_match_key key; ++}; ++ + enum mlx5_devcom_component { + MLX5_DEVCOM_ESW_OFFLOADS, + MLX5_DEVCOM_MPV, +@@ -25,7 +33,7 @@ void mlx5_devcom_unregister_device(struct mlx5_devcom_dev *devc); + struct mlx5_devcom_comp_dev * + mlx5_devcom_register_component(struct mlx5_devcom_dev *devc, + enum mlx5_devcom_component id, +- u64 key, ++ const struct mlx5_devcom_match_attr *attr, + 
mlx5_devcom_event_handler_t handler, + void *data); + void mlx5_devcom_unregister_component(struct mlx5_devcom_comp_dev *devcom); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c +index eeb0b7ea05f1..d4015328ba65 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c +@@ -210,13 +210,15 @@ static void sd_cleanup(struct mlx5_core_dev *dev) + static int sd_register(struct mlx5_core_dev *dev) + { + struct mlx5_devcom_comp_dev *devcom, *pos; ++ struct mlx5_devcom_match_attr attr = {}; + struct mlx5_core_dev *peer, *primary; + struct mlx5_sd *sd, *primary_sd; + int err, i; + + sd = mlx5_get_sd(dev); ++ attr.key.val = sd->group_id; + devcom = mlx5_devcom_register_component(dev->priv.devc, MLX5_DEVCOM_SD_GROUP, +- sd->group_id, NULL, dev); ++ &attr, NULL, dev); + if (IS_ERR(devcom)) + return PTR_ERR(devcom); + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c +index 6175aa0bbbb7..d741ec582c2c 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c +@@ -975,6 +975,10 @@ static void mlx5_pci_close(struct mlx5_core_dev *dev) + + static void mlx5_register_hca_devcom_comp(struct mlx5_core_dev *dev) + { ++ struct mlx5_devcom_match_attr attr = { ++ .key.val = mlx5_query_nic_system_image_guid(dev), ++ }; ++ + /* This component is use to sync adding core_dev to lag_dev and to sync + * changes of mlx5_adev_devices between LAG layer and other layers. 
+ */ +@@ -983,8 +987,7 @@ static void mlx5_register_hca_devcom_comp(struct mlx5_core_dev *dev) + + dev->priv.hca_devcom_comp = + mlx5_devcom_register_component(dev->priv.devc, MLX5_DEVCOM_HCA_PORTS, +- mlx5_query_nic_system_image_guid(dev), +- NULL, dev); ++ &attr, NULL, dev); + if (IS_ERR(dev->priv.hca_devcom_comp)) + mlx5_core_err(dev, "Failed to register devcom HCA component\n"); + } +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1535-net-mlx5-lag-move-devcom-registration-to-lag-layer.patch b/SOURCES/1535-net-mlx5-lag-move-devcom-registration-to-lag-layer.patch new file mode 100644 index 000000000..ee0323272 --- /dev/null +++ b/SOURCES/1535-net-mlx5-lag-move-devcom-registration-to-lag-layer.patch @@ -0,0 +1,145 @@ +From 105bbeb8900edbce45170c99c9b9456925b2ea85 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:58 -0400 +Subject: [PATCH] net/mlx5: Lag, move devcom registration to LAG layer + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 5a977b5833b7a261bfa6094595ffa73c1071588c +Author: Shay Drory +Date: Mon Sep 15 15:41:08 2025 +0300 + + net/mlx5: Lag, move devcom registration to LAG layer + + Move the devcom registration for the HCA_PORTS component from the core + initialization path into the LAG logic. This better reflects the logical + ownership of this component and ensures proper alignment with the LAG + lifecycle. 
+ + Signed-off-by: Shay Drory + Reviewed-by: Mark Bloch + Reviewed-by: Simon Horman + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1757940070-618661-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c +index d058cbb4a00c..ccb22ed13f84 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c +@@ -1404,6 +1404,34 @@ static int __mlx5_lag_dev_add_mdev(struct mlx5_core_dev *dev) + return 0; + } + ++static void mlx5_lag_unregister_hca_devcom_comp(struct mlx5_core_dev *dev) ++{ ++ mlx5_devcom_unregister_component(dev->priv.hca_devcom_comp); ++} ++ ++static int mlx5_lag_register_hca_devcom_comp(struct mlx5_core_dev *dev) ++{ ++ struct mlx5_devcom_match_attr attr = { ++ .key.val = mlx5_query_nic_system_image_guid(dev), ++ }; ++ ++ /* This component is use to sync adding core_dev to lag_dev and to sync ++ * changes of mlx5_adev_devices between LAG layer and other layers. 
++ */ ++ dev->priv.hca_devcom_comp = ++ mlx5_devcom_register_component(dev->priv.devc, ++ MLX5_DEVCOM_HCA_PORTS, ++ &attr, NULL, dev); ++ if (IS_ERR(dev->priv.hca_devcom_comp)) { ++ mlx5_core_err(dev, ++ "Failed to register devcom HCA component, err: %ld\n", ++ PTR_ERR(dev->priv.hca_devcom_comp)); ++ return PTR_ERR(dev->priv.hca_devcom_comp); ++ } ++ ++ return 0; ++} ++ + void mlx5_lag_remove_mdev(struct mlx5_core_dev *dev) + { + struct mlx5_lag *ldev; +@@ -1425,6 +1453,7 @@ void mlx5_lag_remove_mdev(struct mlx5_core_dev *dev) + } + mlx5_ldev_remove_mdev(ldev, dev); + mutex_unlock(&ldev->lock); ++ mlx5_lag_unregister_hca_devcom_comp(dev); + mlx5_ldev_put(ldev); + } + +@@ -1435,7 +1464,7 @@ void mlx5_lag_add_mdev(struct mlx5_core_dev *dev) + if (!mlx5_lag_is_supported(dev)) + return; + +- if (IS_ERR_OR_NULL(dev->priv.hca_devcom_comp)) ++ if (mlx5_lag_register_hca_devcom_comp(dev)) + return; + + recheck: +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c +index d741ec582c2c..00fe79878c4f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c +@@ -973,30 +973,6 @@ static void mlx5_pci_close(struct mlx5_core_dev *dev) + mlx5_pci_disable_device(dev); + } + +-static void mlx5_register_hca_devcom_comp(struct mlx5_core_dev *dev) +-{ +- struct mlx5_devcom_match_attr attr = { +- .key.val = mlx5_query_nic_system_image_guid(dev), +- }; +- +- /* This component is use to sync adding core_dev to lag_dev and to sync +- * changes of mlx5_adev_devices between LAG layer and other layers. 
+- */ +- if (!mlx5_lag_is_supported(dev)) +- return; +- +- dev->priv.hca_devcom_comp = +- mlx5_devcom_register_component(dev->priv.devc, MLX5_DEVCOM_HCA_PORTS, +- &attr, NULL, dev); +- if (IS_ERR(dev->priv.hca_devcom_comp)) +- mlx5_core_err(dev, "Failed to register devcom HCA component\n"); +-} +- +-static void mlx5_unregister_hca_devcom_comp(struct mlx5_core_dev *dev) +-{ +- mlx5_devcom_unregister_component(dev->priv.hca_devcom_comp); +-} +- + static int mlx5_init_once(struct mlx5_core_dev *dev) + { + int err; +@@ -1005,7 +981,6 @@ static int mlx5_init_once(struct mlx5_core_dev *dev) + if (IS_ERR(dev->priv.devc)) + mlx5_core_warn(dev, "failed to register devcom device %ld\n", + PTR_ERR(dev->priv.devc)); +- mlx5_register_hca_devcom_comp(dev); + + err = mlx5_query_board_id(dev); + if (err) { +@@ -1143,7 +1118,6 @@ static int mlx5_init_once(struct mlx5_core_dev *dev) + err_irq_cleanup: + mlx5_irq_table_cleanup(dev); + err_devcom: +- mlx5_unregister_hca_devcom_comp(dev); + mlx5_devcom_unregister_device(dev->priv.devc); + + return err; +@@ -1174,7 +1148,6 @@ static void mlx5_cleanup_once(struct mlx5_core_dev *dev) + mlx5_events_cleanup(dev); + mlx5_eq_table_cleanup(dev); + mlx5_irq_table_cleanup(dev); +- mlx5_unregister_hca_devcom_comp(dev); + mlx5_devcom_unregister_device(dev->priv.devc); + } + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1536-net-mlx5-add-net-namespace-support-to-devcom.patch b/SOURCES/1536-net-mlx5-add-net-namespace-support-to-devcom.patch new file mode 100644 index 000000000..50b517e53 --- /dev/null +++ b/SOURCES/1536-net-mlx5-add-net-namespace-support-to-devcom.patch @@ -0,0 +1,149 @@ +From 18400abe1a3e9946faafbe52a16ac08dfbd4a63c Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:58 -0400 +Subject: [PATCH] net/mlx5: Add net namespace support to devcom + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 95f73447c269e196dc149bece65b2d83cdb42b08 +Author: Shay Drory +Date: Mon Sep 15 15:41:09 2025 +0300 + + 
net/mlx5: Add net namespace support to devcom + + Extend the devcom framework to support namespace-aware components. + + The existing devcom matching logic was based solely on numeric keys, + limiting its use to the global (init_net) scope or requiring clients to + ignore namespaces altogether, both of which are incorrect in + multi-namespace environments. + + This patch introduces namespace support by allowing devcom clients to + provide a namespace match attribute. The devcom pairing mechanism is + updated to compare the namespace, enabling proper isolation and + interaction of components across different net namespaces. + + With this change, components that require namespace aware pairing, such + as SD groups or LAG, can now work correctly in multi-namespace + scenarios. In particular, this opens the way to support hardware LAG + within a net namespace. + + Signed-off-by: Shay Drory + Reviewed-by: Mark Bloch + Reviewed-by: Parav Pandit + Reviewed-by: Simon Horman + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1757940070-618661-4-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +index ffa5749df587..1ddefeeeca01 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +@@ -66,6 +66,7 @@ + #include "lib/devcom.h" + #include "lib/geneve.h" + #include "lib/fs_chains.h" ++#include "lib/mlx5.h" + #include "diag/en_tc_tracepoint.h" + #include + #include "lag/lag.h" +@@ -5450,6 +5451,8 @@ int mlx5e_tc_esw_init(struct mlx5_rep_uplink_priv *uplink_priv) + err = dev_get_port_parent_id(priv->netdev, &ppid, false); + if (!err) { + memcpy(&attr.key.val, &ppid.id, sizeof(attr.key.val)); ++ attr.flags = MLX5_DEVCOM_MATCH_FLAGS_NS; ++ attr.net = mlx5_core_net(esw->dev); + mlx5_esw_offloads_devcom_init(esw, &attr); + } + +diff --git 
a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c +index 1ab9de316deb..faa2833602c8 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c +@@ -4,6 +4,7 @@ + #include + #include + #include "lib/devcom.h" ++#include "lib/mlx5.h" + #include "mlx5_core.h" + + static LIST_HEAD(devcom_dev_list); +@@ -23,7 +24,9 @@ struct mlx5_devcom_dev { + }; + + struct mlx5_devcom_key { ++ u32 flags; + union mlx5_devcom_match_key key; ++ possible_net_t net; + }; + + struct mlx5_devcom_comp { +@@ -123,6 +126,9 @@ mlx5_devcom_comp_alloc(u64 id, const struct mlx5_devcom_match_attr *attr, + + comp->id = id; + comp->key.key = attr->key; ++ comp->key.flags = attr->flags; ++ if (attr->flags & MLX5_DEVCOM_MATCH_FLAGS_NS) ++ write_pnet(&comp->key.net, attr->net); + comp->handler = handler; + init_rwsem(&comp->sem); + lockdep_register_key(&comp->lock_key); +@@ -190,9 +196,16 @@ devcom_component_equal(struct mlx5_devcom_comp *devcom, + if (devcom->id != id) + return false; + ++ if (devcom->key.flags != attr->flags) ++ return false; ++ + if (memcmp(&devcom->key.key, &attr->key, sizeof(devcom->key.key))) + return false; + ++ if (devcom->key.flags & MLX5_DEVCOM_MATCH_FLAGS_NS && ++ !net_eq(read_pnet(&devcom->key.net), attr->net)) ++ return false; ++ + return true; + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h +index f350d2395707..609c85f47917 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h +@@ -6,12 +6,18 @@ + + #include + ++enum mlx5_devom_match_flags { ++ MLX5_DEVCOM_MATCH_FLAGS_NS = BIT(0), ++}; ++ + union mlx5_devcom_match_key { + u64 val; + }; + + struct mlx5_devcom_match_attr { ++ u32 flags; + union mlx5_devcom_match_key key; ++ struct net *net; + }; + + enum mlx5_devcom_component { +diff --git 
a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c +index d4015328ba65..f5c2701f6e87 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c +@@ -217,6 +217,8 @@ static int sd_register(struct mlx5_core_dev *dev) + + sd = mlx5_get_sd(dev); + attr.key.val = sd->group_id; ++ attr.flags = MLX5_DEVCOM_MATCH_FLAGS_NS; ++ attr.net = mlx5_core_net(dev); + devcom = mlx5_devcom_register_component(dev->priv.devc, MLX5_DEVCOM_SD_GROUP, + &attr, NULL, dev); + if (IS_ERR(devcom)) +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1537-net-mlx5-lag-add-net-namespace-support.patch b/SOURCES/1537-net-mlx5-lag-add-net-namespace-support.patch new file mode 100644 index 000000000..4eb22caf8 --- /dev/null +++ b/SOURCES/1537-net-mlx5-lag-add-net-namespace-support.patch @@ -0,0 +1,131 @@ +From 897f77f80756b9168173416e0e6b1dbc0308a4f2 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:58 -0400 +Subject: [PATCH] net/mlx5: Lag, add net namespace support +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit d654d3fc2066c40586fa3b0538c0bf093e20b817 +Author: Shay Drory +Date: Mon Sep 15 15:41:10 2025 +0300 + + net/mlx5: Lag, add net namespace support + + Update the LAG implementation to support net namespace isolation. + Recent devcom changes added namespace-aware client matching. Align LAG + with this model so that hardware LAG forms only between mlx5 interfaces + that share the same network namespace. This avoids cross-namespace + interference and matches user expectations when devices are placed in + different netns. + + Make LAG netns-aware by storing the device’s namespace in mlx5_lag and + registering the devcom client with that namespace. As a result, only + peers in the same netns are eligible to form a LAG. 
+ Adjust reload handling so LAG teardown/re-evaluation happens in the + correct namespace context. Remove the blanket restriction that prevented + devlink reload when LAG was active. Remove the reload restriction here + allowing devlink reload in LAG mode is part of delivering complete netns + aware LAG support: + + With per-netns devcom registration, reload no longer risks + cross-namespace coupling. The devcom client is torn down and + re-registered in the device’s current netns, and LAG is re-evaluated + within that scope. The change is trivial and self-contained, and keeping + it in this patch avoids splitting a feature that is functionally one + unit. + + Only devices in same netns can form hardware LAG. + devlink reload no longer fails just because LAG is active. + LAG is torn down/re-created as needed within the correct namespace. + No change for setups that don’t use namespaces. + + Signed-off-by: Shay Drory + Reviewed-by: Mark Bloch + Reviewed-by: Parav Pandit + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1757940070-618661-5-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +index e900451643a3..18ef8404e5e6 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +@@ -204,11 +204,6 @@ static int mlx5_devlink_reload_down(struct devlink *devlink, bool netns_change, + return 0; + } + +- if (mlx5_lag_is_active(dev)) { +- NL_SET_ERR_MSG_MOD(extack, "reload is unsupported in Lag mode"); +- return -EOPNOTSUPP; +- } +- + if (mlx5_core_is_mp_slave(dev)) { + NL_SET_ERR_MSG_MOD(extack, "reload is unsupported for multi port slave"); + return -EOPNOTSUPP; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c +index ccb22ed13f84..59c00c911275 100644 +--- 
a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c +@@ -35,6 +35,7 @@ + #include + #include + #include ++#include "lib/mlx5.h" + #include "lib/devcom.h" + #include "mlx5_core.h" + #include "eswitch.h" +@@ -231,9 +232,13 @@ static void mlx5_do_bond_work(struct work_struct *work); + static void mlx5_ldev_free(struct kref *ref) + { + struct mlx5_lag *ldev = container_of(ref, struct mlx5_lag, ref); ++ struct net *net; ++ ++ if (ldev->nb.notifier_call) { ++ net = read_pnet(&ldev->net); ++ unregister_netdevice_notifier_net(net, &ldev->nb); ++ } + +- if (ldev->nb.notifier_call) +- unregister_netdevice_notifier_net(&init_net, &ldev->nb); + mlx5_lag_mp_cleanup(ldev); + cancel_delayed_work_sync(&ldev->bond_work); + destroy_workqueue(ldev->wq); +@@ -271,7 +276,8 @@ static struct mlx5_lag *mlx5_lag_dev_alloc(struct mlx5_core_dev *dev) + INIT_DELAYED_WORK(&ldev->bond_work, mlx5_do_bond_work); + + ldev->nb.notifier_call = mlx5_lag_netdev_event; +- if (register_netdevice_notifier_net(&init_net, &ldev->nb)) { ++ write_pnet(&ldev->net, mlx5_core_net(dev)); ++ if (register_netdevice_notifier_net(read_pnet(&ldev->net), &ldev->nb)) { + ldev->nb.notifier_call = NULL; + mlx5_core_err(dev, "Failed to register LAG netdev notifier\n"); + } +@@ -1413,6 +1419,8 @@ static int mlx5_lag_register_hca_devcom_comp(struct mlx5_core_dev *dev) + { + struct mlx5_devcom_match_attr attr = { + .key.val = mlx5_query_nic_system_image_guid(dev), ++ .flags = MLX5_DEVCOM_MATCH_FLAGS_NS, ++ .net = mlx5_core_net(dev), + }; + + /* This component is use to sync adding core_dev to lag_dev and to sync +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h +index c2f256bb2bc2..4918eee2b3da 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h +@@ -67,6 +67,7 @@ struct mlx5_lag { + struct workqueue_struct *wq; + struct delayed_work 
bond_work; + struct notifier_block nb; ++ possible_net_t net; + struct lag_mp lag_mp; + struct mlx5_lag_port_sel port_sel; + /* Protect lag fields/state changes */ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1538-net-mlx5-remove-vlan-insertion-fields-from-wqe-ether-segment.patch b/SOURCES/1538-net-mlx5-remove-vlan-insertion-fields-from-wqe-ether-segment.patch new file mode 100644 index 000000000..28aaade42 --- /dev/null +++ b/SOURCES/1538-net-mlx5-remove-vlan-insertion-fields-from-wqe-ether-segment.patch @@ -0,0 +1,56 @@ +From e83fb12bba4582bf6f10d745f8abe0a64d49bab6 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:58 -0400 +Subject: [PATCH] net/mlx5: Remove VLAN insertion fields from WQE Ether segment + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit de2be98541dbe0de58d2dccf7fa19dfc9d9a8260 +Author: Carolina Jubran +Date: Thu Sep 11 10:10:17 2025 +0300 + + net/mlx5: Remove VLAN insertion fields from WQE Ether segment + + Now that the driver no longer uses VLAN TX insertion via the WQE + Ethernet segment, the related fields and flags can be removed. 
+ + Signed-off-by: Carolina Jubran + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1757574619-604874-2-git-send-email-tariqt@nvidia.com + Reviewed-by: Simon Horman + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h +index fc7eeff99a8a..5546c7bd2c83 100644 +--- a/include/linux/mlx5/qp.h ++++ b/include/linux/mlx5/qp.h +@@ -237,13 +237,11 @@ enum { + }; + + enum { +- MLX5_ETH_WQE_SVLAN = 1 << 0, + MLX5_ETH_WQE_TRAILER_HDR_OUTER_IP_ASSOC = 1 << 26, + MLX5_ETH_WQE_TRAILER_HDR_OUTER_L4_ASSOC = 1 << 27, + MLX5_ETH_WQE_TRAILER_HDR_INNER_IP_ASSOC = 3 << 26, + MLX5_ETH_WQE_TRAILER_HDR_INNER_L4_ASSOC = 1 << 28, + MLX5_ETH_WQE_INSERT_TRAILER = 1 << 30, +- MLX5_ETH_WQE_INSERT_VLAN = 1 << 15, + }; + + enum { +@@ -275,10 +273,6 @@ struct mlx5_wqe_eth_seg { + DECLARE_FLEX_ARRAY(u8, data); + }; + } inline_hdr; +- struct { +- __be16 type; +- __be16 vlan_tci; +- } insert; + __be32 trailer; + }; + }; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1539-net-mlx5-refactor-macsec-wqe-metadata-shifts.patch b/SOURCES/1539-net-mlx5-refactor-macsec-wqe-metadata-shifts.patch new file mode 100644 index 000000000..f1f46b9e5 --- /dev/null +++ b/SOURCES/1539-net-mlx5-refactor-macsec-wqe-metadata-shifts.patch @@ -0,0 +1,152 @@ +From 7fa18c8f4a6bc955619d779d0d26bee794b04c3b Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:58 -0400 +Subject: [PATCH] net/mlx5: Refactor MACsec WQE metadata shifts +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit cce65f32443b61db2370a67d2e92d16b773fe8a4 +Author: Carolina Jubran +Date: Thu Sep 11 10:10:18 2025 +0300 + + net/mlx5: Refactor MACsec WQE metadata shifts + + Introduce MLX5_ETH_WQE_FT_META_SHIFT as a shared base offset for + features that use the lower 8 bits of the WQE flow_table_metadata + field, currently used for timestamping, 
IPsec, and MACsec. + + Define MLX5_ETH_WQE_FT_META_MACSEC_FS_ID_MASK so that fs_id occupies + bits 2–5, making it clear that fs_id occupies bits in the metadata. + + Set MLX5_ETH_WQE_FT_META_MACSEC_MASK as the OR of the MACsec flag and + MLX5_ETH_WQE_FT_META_MACSEC_FS_ID_MASK, corresponding to the original + 0x3E mask. + + Update the fs_id macro to right-shift the MACsec flag by + MLX5_ETH_WQE_FT_META_SHIFT and update the RoCE modify-header action to + use it. + + Introduce the helper macro MLX5_MACSEC_TX_METADATA(fs_id) to compose + the full shifted MACsec metadata value. + + These changes make it explicit exactly which metadata bits carry MACsec + information, simplifying future feature exclusions when multiple + features share the WQE flowtable metadata. + + In addition, drop the incorrect “RX flow steering” comment, since this + applies to TX flow steering. + + Signed-off-by: Carolina Jubran + Reviewed-by: Jianbo Liu + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1757574619-604874-3-git-send-email-tariqt@nvidia.com + Reviewed-by: Simon Horman + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c +index 6ab02f3fc291..528b04d4de41 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c +@@ -1676,7 +1676,7 @@ void mlx5e_macsec_tx_build_eseg(struct mlx5e_macsec *macsec, + if (!fs_id) + return; + +- eseg->flow_table_metadata = cpu_to_be32(MLX5_ETH_WQE_FT_META_MACSEC | fs_id << 2); ++ eseg->flow_table_metadata = cpu_to_be32(MLX5_MACSEC_TX_METADATA(fs_id)); + } + + void mlx5e_macsec_offload_handle_rx_skb(struct net_device *netdev, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/macsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/macsec_fs.c +index 762d55ba9e51..9ec450603176 100644 +--- 
a/drivers/net/ethernet/mellanox/mlx5/core/lib/macsec_fs.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/macsec_fs.c +@@ -45,11 +45,7 @@ + #define MLX5_SECTAG_HEADER_SIZE_WITHOUT_SCI 0x8 + #define MLX5_SECTAG_HEADER_SIZE_WITH_SCI (MLX5_SECTAG_HEADER_SIZE_WITHOUT_SCI + MACSEC_SCI_LEN) + +-/* MACsec RX flow steering */ +-#define MLX5_ETH_WQE_FT_META_MACSEC_MASK 0x3E +- + /* MACsec fs_id handling for steering */ +-#define macsec_fs_set_tx_fs_id(fs_id) (MLX5_ETH_WQE_FT_META_MACSEC | (fs_id) << 2) + #define macsec_fs_set_rx_fs_id(fs_id) ((fs_id) | BIT(30)) + + struct mlx5_sectag_header { +@@ -597,7 +593,7 @@ static int macsec_fs_tx_setup_fte(struct mlx5_macsec_fs *macsec_fs, + MLX5_SET(fte_match_param, spec->match_criteria, misc_parameters_2.metadata_reg_a, + MLX5_ETH_WQE_FT_META_MACSEC_MASK); + MLX5_SET(fte_match_param, spec->match_value, misc_parameters_2.metadata_reg_a, +- macsec_fs_set_tx_fs_id(id)); ++ MLX5_MACSEC_TX_METADATA(id)); + + *fs_id = id; + flow_act->crypto.type = MLX5_FLOW_CONTEXT_ENCRYPT_DECRYPT_TYPE_MACSEC; +@@ -2219,8 +2215,10 @@ static int mlx5_macsec_fs_add_roce_rule_tx(struct mlx5_macsec_fs *macsec_fs, u32 + + MLX5_SET(set_action_in, action, action_type, MLX5_ACTION_TYPE_SET); + MLX5_SET(set_action_in, action, field, MLX5_ACTION_IN_FIELD_METADATA_REG_A); +- MLX5_SET(set_action_in, action, data, macsec_fs_set_tx_fs_id(fs_id)); +- MLX5_SET(set_action_in, action, offset, 0); ++ MLX5_SET(set_action_in, action, data, ++ mlx5_macsec_fs_set_tx_fs_id(fs_id)); ++ MLX5_SET(set_action_in, action, offset, ++ MLX5_ETH_WQE_FT_META_MACSEC_SHIFT); + MLX5_SET(set_action_in, action, length, 32); + + modify_hdr = mlx5_modify_header_alloc(mdev, MLX5_FLOW_NAMESPACE_RDMA_TX_MACSEC, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/macsec_fs.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/macsec_fs.h +index 34b80c3ef6a5..15acaff43641 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/macsec_fs.h ++++ 
b/drivers/net/ethernet/mellanox/mlx5/core/lib/macsec_fs.h +@@ -12,6 +12,21 @@ + #define MLX5_MACSEC_METADATA_MARKER(metadata) ((((metadata) >> 30) & 0x3) == 0x1) + #define MLX5_MACSEC_RX_METADAT_HANDLE(metadata) ((metadata) & MLX5_MACSEC_RX_FS_ID_MASK) + ++/* MACsec TX flow steering */ ++#define MLX5_ETH_WQE_FT_META_MACSEC_MASK \ ++ (MLX5_ETH_WQE_FT_META_MACSEC | MLX5_ETH_WQE_FT_META_MACSEC_FS_ID_MASK) ++#define MLX5_ETH_WQE_FT_META_MACSEC_SHIFT MLX5_ETH_WQE_FT_META_SHIFT ++ ++/* MACsec fs_id handling for steering */ ++#define mlx5_macsec_fs_set_tx_fs_id(fs_id) \ ++ (((MLX5_ETH_WQE_FT_META_MACSEC) >> MLX5_ETH_WQE_FT_META_MACSEC_SHIFT) \ ++ | ((fs_id) << 2)) ++ ++#define MLX5_MACSEC_TX_METADATA(fs_id) \ ++ (mlx5_macsec_fs_set_tx_fs_id(fs_id) << \ ++ MLX5_ETH_WQE_FT_META_MACSEC_SHIFT) ++ ++/* MACsec fs_id uses 4 bits, supports up to 16 interfaces */ + #define MLX5_MACSEC_NUM_OF_SUPPORTED_INTERFACES 16 + + struct mlx5_macsec_fs; +diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h +index 5546c7bd2c83..b21be7630575 100644 +--- a/include/linux/mlx5/qp.h ++++ b/include/linux/mlx5/qp.h +@@ -251,9 +251,14 @@ enum { + MLX5_ETH_WQE_SWP_OUTER_L4_UDP = 1 << 5, + }; + ++/* Base shift for metadata bits used by timestamping, IPsec, and MACsec */ ++#define MLX5_ETH_WQE_FT_META_SHIFT 0 ++ + enum { +- MLX5_ETH_WQE_FT_META_IPSEC = BIT(0), +- MLX5_ETH_WQE_FT_META_MACSEC = BIT(1), ++ MLX5_ETH_WQE_FT_META_IPSEC = BIT(0) << MLX5_ETH_WQE_FT_META_SHIFT, ++ MLX5_ETH_WQE_FT_META_MACSEC = BIT(1) << MLX5_ETH_WQE_FT_META_SHIFT, ++ MLX5_ETH_WQE_FT_META_MACSEC_FS_ID_MASK = ++ GENMASK(5, 2) << MLX5_ETH_WQE_FT_META_SHIFT, + }; + + struct mlx5_wqe_eth_seg { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1540-net-mlx5e-prevent-wqe-metadata-conflicts-between-timestampin.patch b/SOURCES/1540-net-mlx5e-prevent-wqe-metadata-conflicts-between-timestampin.patch new file mode 100644 index 000000000..d3d939d36 --- /dev/null +++ 
b/SOURCES/1540-net-mlx5e-prevent-wqe-metadata-conflicts-between-timestampin.patch @@ -0,0 +1,79 @@ +From 3cc63348aa2158c664d92a20ee1ca8a6ed9fada3 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:59 -0400 +Subject: [PATCH] net/mlx5e: Prevent WQE metadata conflicts between + timestamping and offloads + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 2ac207381c37eebc49559634ce5642119784bc7c +Author: Carolina Jubran +Date: Thu Sep 11 10:10:19 2025 +0300 + + net/mlx5e: Prevent WQE metadata conflicts between timestamping and offloads + + Update the WQE metadata assignment to avoid overriding existing + metadata when setting the sysport timestamp ID. Since timestamp IDs are + limited to 256 values, they use only the lower 8 bits of the metadata + field. + + To avoid conflicts, move IPsec and MACsec metadata ID to bits 8 and 9, + and shift the MACsec fs_id accordingly. This ensures safe coexistence + of timestamping and offload features that use the same metadata field. 
+ + Signed-off-by: Carolina Jubran + Reviewed-by: Jianbo Liu + Reviewed-by: Patrisious Haddad + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1757574619-604874-4-git-send-email-tariqt@nvidia.com + Reviewed-by: Simon Horman + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c +index e6a301ba3254..7ffc1cc7aa7d 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c +@@ -653,7 +653,7 @@ static void mlx5e_cqe_ts_id_eseg(struct mlx5e_ptpsq *ptpsq, struct sk_buff *skb, + struct mlx5_wqe_eth_seg *eseg) + { + if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)) +- eseg->flow_table_metadata = ++ eseg->flow_table_metadata |= + cpu_to_be32(mlx5e_ptp_metadata_fifo_peek(&ptpsq->metadata_freelist)); + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/macsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/macsec_fs.c +index 9ec450603176..e6be2f01daf4 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/macsec_fs.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/macsec_fs.c +@@ -2219,7 +2219,7 @@ static int mlx5_macsec_fs_add_roce_rule_tx(struct mlx5_macsec_fs *macsec_fs, u32 + mlx5_macsec_fs_set_tx_fs_id(fs_id)); + MLX5_SET(set_action_in, action, offset, + MLX5_ETH_WQE_FT_META_MACSEC_SHIFT); +- MLX5_SET(set_action_in, action, length, 32); ++ MLX5_SET(set_action_in, action, length, 8); + + modify_hdr = mlx5_modify_header_alloc(mdev, MLX5_FLOW_NAMESPACE_RDMA_TX_MACSEC, + 1, action); +diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h +index b21be7630575..d67aedc6ea68 100644 +--- a/include/linux/mlx5/qp.h ++++ b/include/linux/mlx5/qp.h +@@ -251,8 +251,9 @@ enum { + MLX5_ETH_WQE_SWP_OUTER_L4_UDP = 1 << 5, + }; + +-/* Base shift for metadata bits used by timestamping, IPsec, and MACsec */ +-#define 
MLX5_ETH_WQE_FT_META_SHIFT 0 ++/* Metadata bits 0-7 are used by timestamping */ ++/* Base shift for metadata bits used by IPsec and MACsec */ ++#define MLX5_ETH_WQE_FT_META_SHIFT 8 + + enum { + MLX5_ETH_WQE_FT_META_IPSEC = BIT(0) << MLX5_ETH_WQE_FT_META_SHIFT, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1541-net-mlx5-fix-typo-of-mlx5-eq-doorbel-offset.patch b/SOURCES/1541-net-mlx5-fix-typo-of-mlx5-eq-doorbel-offset.patch new file mode 100644 index 000000000..750841555 --- /dev/null +++ b/SOURCES/1541-net-mlx5-fix-typo-of-mlx5-eq-doorbel-offset.patch @@ -0,0 +1,50 @@ +From 509e1b64bb19abbd0d572d08c99d67f04a341c06 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:59 -0400 +Subject: [PATCH] net/mlx5: Fix typo of MLX5_EQ_DOORBEL_OFFSET + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 917449e7c3cdc7a0dfe429de997e39098d9cdd20 +Author: Cosmin Ratiu +Date: Tue Sep 16 17:11:35 2025 +0300 + + net/mlx5: Fix typo of MLX5_EQ_DOORBEL_OFFSET + + Also convert it to a simple define. + + Signed-off-by: Cosmin Ratiu + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c +index 1ab77159409d..f3c714ebd9cb 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c +@@ -32,9 +32,7 @@ enum { + MLX5_EQ_STATE_ALWAYS_ARMED = 0xb, + }; + +-enum { +- MLX5_EQ_DOORBEL_OFFSET = 0x40, +-}; ++#define MLX5_EQ_DOORBELL_OFFSET 0x40 + + /* budget must be smaller than MLX5_NUM_SPARE_EQE to guarantee that we update + * the ci before we polled all the entries in the EQ. 
MLX5_NUM_SPARE_EQE is +@@ -322,7 +320,7 @@ create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, + eq->eqn = MLX5_GET(create_eq_out, out, eq_number); + eq->irqn = pci_irq_vector(dev->pdev, vecidx); + eq->dev = dev; +- eq->doorbell = priv->uar->map + MLX5_EQ_DOORBEL_OFFSET; ++ eq->doorbell = priv->uar->map + MLX5_EQ_DOORBELL_OFFSET; + + err = mlx5_debug_eq_add(dev, eq); + if (err) +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1542-net-mlx5-remove-unused-offset-field-from-mlx5-sq-bfreg.patch b/SOURCES/1542-net-mlx5-remove-unused-offset-field-from-mlx5-sq-bfreg.patch new file mode 100644 index 000000000..201944208 --- /dev/null +++ b/SOURCES/1542-net-mlx5-remove-unused-offset-field-from-mlx5-sq-bfreg.patch @@ -0,0 +1,92 @@ +From 7fe95ab6c8a811f699662290775101f52aec8242 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:59 -0400 +Subject: [PATCH] net/mlx5: Remove unused 'offset' field from mlx5_sq_bfreg + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 05dfe654b5932322e297aba11dd6f3f26eea6ecb +Author: Cosmin Ratiu +Date: Tue Sep 16 17:11:36 2025 +0300 + + net/mlx5: Remove unused 'offset' field from mlx5_sq_bfreg + + The 'offset' field was introduced in the original commit [1] and never + used until commit [2], which added an unnecessary use. + + Remove the field and refactor the write-combining test to use a local + variable instead. 
+ + [1] commit a6d51b68611e ("net/mlx5: Introduce blue flame register + allocator") + [2] commit d98995b4bf98 ("net/mlx5: Reimplement write combining test") + Signed-off-by: Cosmin Ratiu + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/wc.c b/drivers/net/ethernet/mellanox/mlx5/core/wc.c +index 2f0316616fa4..276594586404 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/wc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/wc.c +@@ -255,7 +255,8 @@ static void mlx5_wc_destroy_sq(struct mlx5_wc_sq *sq) + mlx5_wq_destroy(&sq->wq_ctrl); + } + +-static void mlx5_wc_post_nop(struct mlx5_wc_sq *sq, bool signaled) ++static void mlx5_wc_post_nop(struct mlx5_wc_sq *sq, unsigned int *offset, ++ bool signaled) + { + int buf_size = (1 << MLX5_CAP_GEN(sq->cq.mdev, log_bf_reg_size)) / 2; + struct mlx5_wqe_ctrl_seg *ctrl; +@@ -288,10 +289,10 @@ static void mlx5_wc_post_nop(struct mlx5_wc_sq *sq, bool signaled) + */ + wmb(); + +- __iowrite64_copy(sq->bfreg.map + sq->bfreg.offset, mmio_wqe, ++ __iowrite64_copy(sq->bfreg.map + *offset, mmio_wqe, + sizeof(mmio_wqe) / 8); + +- sq->bfreg.offset ^= buf_size; ++ *offset ^= buf_size; + } + + static int mlx5_wc_poll_cq(struct mlx5_wc_sq *sq) +@@ -332,6 +333,7 @@ static int mlx5_wc_poll_cq(struct mlx5_wc_sq *sq) + + static void mlx5_core_test_wc(struct mlx5_core_dev *mdev) + { ++ unsigned int offset = 0; + unsigned long expires; + struct mlx5_wc_sq *sq; + int i, err; +@@ -358,9 +360,9 @@ static void mlx5_core_test_wc(struct mlx5_core_dev *mdev) + goto err_create_sq; + + for (i = 0; i < TEST_WC_NUM_WQES - 1; i++) +- mlx5_wc_post_nop(sq, false); ++ mlx5_wc_post_nop(sq, &offset, false); + +- mlx5_wc_post_nop(sq, true); ++ mlx5_wc_post_nop(sq, &offset, true); + + expires = jiffies + TEST_WC_POLLING_MAX_TIME_JIFFIES; + do { +diff --git a/include/linux/mlx5/driver.h 
b/include/linux/mlx5/driver.h +index c5106eb8b413..c94d8828aa67 100644 +--- a/include/linux/mlx5/driver.h ++++ b/include/linux/mlx5/driver.h +@@ -434,7 +434,6 @@ struct mlx5_sq_bfreg { + struct mlx5_uars_page *up; + bool wc; + u32 index; +- unsigned int offset; + }; + + struct mlx5_core_health { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1543-net-mlx5e-remove-unused-xsk-param-of-mlx5e-build-xdpsq-param.patch b/SOURCES/1543-net-mlx5e-remove-unused-xsk-param-of-mlx5e-build-xdpsq-param.patch new file mode 100644 index 000000000..b149ac622 --- /dev/null +++ b/SOURCES/1543-net-mlx5e-remove-unused-xsk-param-of-mlx5e-build-xdpsq-param.patch @@ -0,0 +1,78 @@ +From 66b4853aad07eca8de84bdaa5e730cad6ec93dd7 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:59 -0400 +Subject: [PATCH] net/mlx5e: Remove unused 'xsk' param of + mlx5e_build_xdpsq_param + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 913d28f8a71cd8e38d6d788c70643f5a71507400 +Author: Cosmin Ratiu +Date: Tue Sep 16 17:11:37 2025 +0300 + + net/mlx5e: Remove unused 'xsk' param of mlx5e_build_xdpsq_param + + This was added in commit [1], but its only use removed in commit [2]. + The parameter is unused, so remove it from the function parameter list. 
+ + [1] commit 9ded70fa1d81 ("net/mlx5e: Don't prefill WQEs in XDP SQ in the + multi buffer mode") + [2] commit 1a9304859b3a ("net/mlx5: XDP, Enable TX side XDP multi-buffer + support") + Signed-off-by: Cosmin Ratiu + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c +index 3cca06a74cf9..31e7f59bc19b 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c +@@ -1229,7 +1229,6 @@ static void mlx5e_build_async_icosq_param(struct mlx5_core_dev *mdev, + + void mlx5e_build_xdpsq_param(struct mlx5_core_dev *mdev, + struct mlx5e_params *params, +- struct mlx5e_xsk_param *xsk, + struct mlx5e_sq_param *param) + { + void *sqc = param->sqc; +@@ -1256,7 +1255,7 @@ int mlx5e_build_channel_param(struct mlx5_core_dev *mdev, + async_icosq_log_wq_sz = mlx5e_build_async_icosq_log_wq_sz(mdev); + + mlx5e_build_sq_param(mdev, params, &cparam->txq_sq); +- mlx5e_build_xdpsq_param(mdev, params, NULL, &cparam->xdp_sq); ++ mlx5e_build_xdpsq_param(mdev, params, &cparam->xdp_sq); + mlx5e_build_icosq_param(mdev, icosq_log_wq_sz, &cparam->icosq); + mlx5e_build_async_icosq_param(mdev, async_icosq_log_wq_sz, &cparam->async_icosq); + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.h b/drivers/net/ethernet/mellanox/mlx5/core/en/params.h +index 488ccdbc1e2c..e3edf79dde5f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.h +@@ -132,7 +132,6 @@ void mlx5e_build_tx_cq_param(struct mlx5_core_dev *mdev, + struct mlx5e_cq_param *param); + void mlx5e_build_xdpsq_param(struct mlx5_core_dev *mdev, + struct mlx5e_params *params, +- struct mlx5e_xsk_param *xsk, + struct mlx5e_sq_param *param); + int mlx5e_build_channel_param(struct 
mlx5_core_dev *mdev, + struct mlx5e_params *params, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c +index d743e823362a..dbd88eb5c082 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c +@@ -54,7 +54,7 @@ static void mlx5e_build_xsk_cparam(struct mlx5_core_dev *mdev, + struct mlx5e_channel_param *cparam) + { + mlx5e_build_rq_param(mdev, params, xsk, &cparam->rq); +- mlx5e_build_xdpsq_param(mdev, params, xsk, &cparam->xdp_sq); ++ mlx5e_build_xdpsq_param(mdev, params, &cparam->xdp_sq); + } + + static int mlx5e_init_xsk_rq(struct mlx5e_channel *c, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1544-net-mlx5-store-the-global-doorbell-in-mlx5-priv.patch b/SOURCES/1544-net-mlx5-store-the-global-doorbell-in-mlx5-priv.patch new file mode 100644 index 000000000..cf5518c2e --- /dev/null +++ b/SOURCES/1544-net-mlx5-store-the-global-doorbell-in-mlx5-priv.patch @@ -0,0 +1,371 @@ +From efdec7d57b38f8166d2270d527a7eb2189382893 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:20:59 -0400 +Subject: [PATCH] net/mlx5: Store the global doorbell in mlx5_priv + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit aa4595d0ada65d5d44fa924a42a87c175d9d88e3 +Author: Cosmin Ratiu +Date: Tue Sep 16 17:11:38 2025 +0300 + + net/mlx5: Store the global doorbell in mlx5_priv + + The global doorbell is used for more than just Ethernet resources, so + move it out of mlx5e_hw_objs into a common place (mlx5_priv), to avoid + non-Ethernet modules (e.g. HWS, ASO) depending on Ethernet structs. + + Use this opportunity to consolidate it with the 'uar' pointer already + there, which was used as an RX doorbell. Underneath the 'uar' pointer is + identical to 'bfreg->up', so store a single resource and use that + instead. 
+ + For CQ doorbells, care is taken to always use bfreg->up->index instead + of bfreg->index, which may refer to a subsequent UAR page from the same + ALLOC_UAR batch on some NICs. + + This paves the way for cleanly supporting multiple doorbells in the + Ethernet driver. + + Signed-off-by: Cosmin Ratiu + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c +index 78cd3151f2ed..f23eb22e98ff 100644 +--- a/drivers/infiniband/hw/mlx5/cq.c ++++ b/drivers/infiniband/hw/mlx5/cq.c +@@ -645,7 +645,7 @@ int mlx5_ib_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags) + { + struct mlx5_core_dev *mdev = to_mdev(ibcq->device)->mdev; + struct mlx5_ib_cq *cq = to_mcq(ibcq); +- void __iomem *uar_page = mdev->priv.uar->map; ++ void __iomem *uar_page = mdev->priv.bfreg.up->map; + unsigned long irq_flags; + int ret = 0; + +@@ -920,7 +920,7 @@ static int create_cq_kernel(struct mlx5_ib_dev *dev, struct mlx5_ib_cq *cq, + cq->buf.frag_buf.page_shift - + MLX5_ADAPTER_PAGE_SHIFT); + +- *index = dev->mdev->priv.uar->index; ++ *index = dev->mdev->priv.bfreg.up->index; + + return 0; + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c b/drivers/net/ethernet/mellanox/mlx5/core/cq.c +index 1fd403713baf..35039a95dcfd 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c +@@ -145,7 +145,7 @@ int mlx5_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq, + mlx5_core_dbg(dev, "failed adding CP 0x%x to debug file system\n", + cq->cqn); + +- cq->uar = dev->priv.uar; ++ cq->uar = dev->priv.bfreg.up; + cq->irqn = eq->core.irqn; + + return 0; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c +index 31e7f59bc19b..b6b4ae7c59fa 100644 +--- 
a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c +@@ -810,7 +810,7 @@ static void mlx5e_build_common_cq_param(struct mlx5_core_dev *mdev, + { + void *cqc = param->cqc; + +- MLX5_SET(cqc, cqc, uar_page, mdev->priv.uar->index); ++ MLX5_SET(cqc, cqc, uar_page, mdev->priv.bfreg.up->index); + if (MLX5_CAP_GEN(mdev, cqe_128_always) && cache_line_size() >= 128) + MLX5_SET(cqc, cqc, cqe_sz, CQE_STRIDE_128_PAD); + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c +index 71fb20f63bc3..3729a41dc558 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c +@@ -333,7 +333,7 @@ static int mlx5e_ptp_alloc_txqsq(struct mlx5e_ptp *c, int txq_ix, + sq->mdev = mdev; + sq->ch_ix = MLX5E_PTP_CHANNEL_IX; + sq->txq_ix = txq_ix; +- sq->uar_map = mdev->mlx5e_res.hw_objs.bfreg.map; ++ sq->uar_map = mdev->priv.bfreg.map; + sq->min_inline_mode = params->tx_min_inline_mode; + sq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu); + sq->stats = &c->priv->ptp_stats.sq[tc]; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +index 6ed3a32b7e22..e9e36358c39d 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +@@ -163,17 +163,11 @@ int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev, bool create_tises) + goto err_dealloc_transport_domain; + } + +- err = mlx5_alloc_bfreg(mdev, &res->bfreg, false, false); +- if (err) { +- mlx5_core_err(mdev, "alloc bfreg failed, %d\n", err); +- goto err_destroy_mkey; +- } +- + if (create_tises) { + err = mlx5e_create_tises(mdev, res->tisn); + if (err) { + mlx5_core_err(mdev, "alloc tises failed, %d\n", err); +- goto err_destroy_bfreg; ++ goto err_destroy_mkey; + } + res->tisn_valid = true; + } +@@ -190,8 +184,6 @@ int 
mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev, bool create_tises) + + return 0; + +-err_destroy_bfreg: +- mlx5_free_bfreg(mdev, &res->bfreg); + err_destroy_mkey: + mlx5_core_destroy_mkey(mdev, res->mkey); + err_dealloc_transport_domain: +@@ -209,7 +201,6 @@ void mlx5e_destroy_mdev_resources(struct mlx5_core_dev *mdev) + mdev->mlx5e_res.dek_priv = NULL; + if (res->tisn_valid) + mlx5e_destroy_tises(mdev, res->tisn); +- mlx5_free_bfreg(mdev, &res->bfreg); + mlx5_core_destroy_mkey(mdev, res->mkey); + mlx5_core_dealloc_transport_domain(mdev, res->td.tdn); + mlx5_core_dealloc_pd(mdev, res->pdn); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index b09291decca5..0395316f7031 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -1533,7 +1533,7 @@ static int mlx5e_alloc_xdpsq(struct mlx5e_channel *c, + sq->pdev = c->pdev; + sq->mkey_be = c->mkey_be; + sq->channel = c; +- sq->uar_map = mdev->mlx5e_res.hw_objs.bfreg.map; ++ sq->uar_map = mdev->priv.bfreg.map; + sq->min_inline_mode = params->tx_min_inline_mode; + sq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu) - ETH_FCS_LEN; + sq->xsk_pool = xsk_pool; +@@ -1618,7 +1618,7 @@ static int mlx5e_alloc_icosq(struct mlx5e_channel *c, + int err; + + sq->channel = c; +- sq->uar_map = mdev->mlx5e_res.hw_objs.bfreg.map; ++ sq->uar_map = mdev->priv.bfreg.map; + sq->reserved_room = param->stop_room; + + param->wq.db_numa_node = cpu_to_node(c->cpu); +@@ -1703,7 +1703,7 @@ static int mlx5e_alloc_txqsq(struct mlx5e_channel *c, + sq->priv = c->priv; + sq->ch_ix = c->ix; + sq->txq_ix = txq_ix; +- sq->uar_map = mdev->mlx5e_res.hw_objs.bfreg.map; ++ sq->uar_map = mdev->priv.bfreg.map; + sq->min_inline_mode = params->tx_min_inline_mode; + sq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu); + sq->max_sq_mpw_wqebbs = mlx5e_get_max_sq_aligned_wqebbs(mdev); +@@ -1779,7 +1779,7 @@ static int 
mlx5e_create_sq(struct mlx5_core_dev *mdev, + MLX5_SET(sqc, sqc, flush_in_error_en, 1); + + MLX5_SET(wq, wq, wq_type, MLX5_WQ_TYPE_CYCLIC); +- MLX5_SET(wq, wq, uar_page, mdev->mlx5e_res.hw_objs.bfreg.index); ++ MLX5_SET(wq, wq, uar_page, mdev->priv.bfreg.index); + MLX5_SET(wq, wq, log_wq_pg_sz, csp->wq_ctrl->buf.page_shift - + MLX5_ADAPTER_PAGE_SHIFT); + MLX5_SET64(wq, wq, dbr_addr, csp->wq_ctrl->db.dma); +@@ -2261,7 +2261,7 @@ static int mlx5e_create_cq(struct mlx5e_cq *cq, struct mlx5e_cq_param *param) + MLX5_SET(cqc, cqc, cq_period_mode, mlx5e_cq_period_mode(param->cq_period_mode)); + + MLX5_SET(cqc, cqc, c_eqn_or_apu_element, eqn); +- MLX5_SET(cqc, cqc, uar_page, mdev->priv.uar->index); ++ MLX5_SET(cqc, cqc, uar_page, mdev->priv.bfreg.up->index); + MLX5_SET(cqc, cqc, log_page_size, cq->wq_ctrl.buf.page_shift - + MLX5_ADAPTER_PAGE_SHIFT); + MLX5_SET64(cqc, cqc, dbr_addr, cq->wq_ctrl.db.dma); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c +index f3c714ebd9cb..25499da177bc 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c +@@ -307,7 +307,7 @@ create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, + + eqc = MLX5_ADDR_OF(create_eq_in, in, eq_context_entry); + MLX5_SET(eqc, eqc, log_eq_size, eq->fbc.log_sz); +- MLX5_SET(eqc, eqc, uar_page, priv->uar->index); ++ MLX5_SET(eqc, eqc, uar_page, priv->bfreg.up->index); + MLX5_SET(eqc, eqc, intr, vecidx); + MLX5_SET(eqc, eqc, log_page_size, + eq->frag_buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT); +@@ -320,7 +320,7 @@ create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, + eq->eqn = MLX5_GET(create_eq_out, out, eq_number); + eq->irqn = pci_irq_vector(dev->pdev, vecidx); + eq->dev = dev; +- eq->doorbell = priv->uar->map + MLX5_EQ_DOORBELL_OFFSET; ++ eq->doorbell = priv->bfreg.up->map + MLX5_EQ_DOORBELL_OFFSET; + + err = mlx5_debug_eq_add(dev, eq); + if (err) +diff --git 
a/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c +index 58bd749b5e4d..129725159a93 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c +@@ -100,7 +100,7 @@ static int create_aso_cq(struct mlx5_aso_cq *cq, void *cqc_data) + + MLX5_SET(cqc, cqc, cq_period_mode, MLX5_CQ_PERIOD_MODE_START_FROM_EQE); + MLX5_SET(cqc, cqc, c_eqn_or_apu_element, eqn); +- MLX5_SET(cqc, cqc, uar_page, mdev->priv.uar->index); ++ MLX5_SET(cqc, cqc, uar_page, mdev->priv.bfreg.up->index); + MLX5_SET(cqc, cqc, log_page_size, cq->wq_ctrl.buf.page_shift - + MLX5_ADAPTER_PAGE_SHIFT); + MLX5_SET64(cqc, cqc, dbr_addr, cq->wq_ctrl.db.dma); +@@ -129,7 +129,7 @@ static int mlx5_aso_create_cq(struct mlx5_core_dev *mdev, int numa_node, + return -ENOMEM; + + MLX5_SET(cqc, cqc_data, log_cq_size, 1); +- MLX5_SET(cqc, cqc_data, uar_page, mdev->priv.uar->index); ++ MLX5_SET(cqc, cqc_data, uar_page, mdev->priv.bfreg.up->index); + if (MLX5_CAP_GEN(mdev, cqe_128_always) && cache_line_size() >= 128) + MLX5_SET(cqc, cqc_data, cqe_sz, CQE_STRIDE_128_PAD); + +@@ -163,7 +163,7 @@ static int mlx5_aso_alloc_sq(struct mlx5_core_dev *mdev, int numa_node, + struct mlx5_wq_param param; + int err; + +- sq->uar_map = mdev->mlx5e_res.hw_objs.bfreg.map; ++ sq->uar_map = mdev->priv.bfreg.map; + + param.db_numa_node = numa_node; + param.buf_numa_node = numa_node; +@@ -203,7 +203,7 @@ static int create_aso_sq(struct mlx5_core_dev *mdev, int pdn, + MLX5_SET(sqc, sqc, ts_format, ts_format); + + MLX5_SET(wq, wq, wq_type, MLX5_WQ_TYPE_CYCLIC); +- MLX5_SET(wq, wq, uar_page, mdev->mlx5e_res.hw_objs.bfreg.index); ++ MLX5_SET(wq, wq, uar_page, mdev->priv.bfreg.index); + MLX5_SET(wq, wq, log_wq_pg_sz, sq->wq_ctrl.buf.page_shift - + MLX5_ADAPTER_PAGE_SHIFT); + MLX5_SET64(wq, wq, dbr_addr, sq->wq_ctrl.db.dma); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c 
+index 00fe79878c4f..c48f3d9765f7 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c +@@ -1316,10 +1316,9 @@ static int mlx5_load(struct mlx5_core_dev *dev) + { + int err; + +- dev->priv.uar = mlx5_get_uars_page(dev); +- if (IS_ERR(dev->priv.uar)) { +- mlx5_core_err(dev, "Failed allocating uar, aborting\n"); +- err = PTR_ERR(dev->priv.uar); ++ err = mlx5_alloc_bfreg(dev, &dev->priv.bfreg, false, false); ++ if (err) { ++ mlx5_core_err(dev, "Failed allocating bfreg, %d\n", err); + return err; + } + +@@ -1430,7 +1429,7 @@ static int mlx5_load(struct mlx5_core_dev *dev) + err_irq_table: + mlx5_pagealloc_stop(dev); + mlx5_events_stop(dev); +- mlx5_put_uars_page(dev, dev->priv.uar); ++ mlx5_free_bfreg(dev, &dev->priv.bfreg); + return err; + } + +@@ -1455,7 +1454,7 @@ static void mlx5_unload(struct mlx5_core_dev *dev) + mlx5_irq_table_destroy(dev); + mlx5_pagealloc_stop(dev); + mlx5_events_stop(dev); +- mlx5_put_uars_page(dev, dev->priv.uar); ++ mlx5_free_bfreg(dev, &dev->priv.bfreg); + } + + int mlx5_init_one_devl_locked(struct mlx5_core_dev *dev) +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/send.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/send.c +index b0595c9b09e4..24ef7d66fa8a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/send.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/send.c +@@ -690,7 +690,7 @@ static int hws_send_ring_alloc_sq(struct mlx5_core_dev *mdev, + size_t buf_sz; + int err; + +- sq->uar_map = mdev->mlx5e_res.hw_objs.bfreg.map; ++ sq->uar_map = mdev->priv.bfreg.map; + sq->mdev = mdev; + + param.db_numa_node = numa_node; +@@ -764,7 +764,7 @@ static int hws_send_ring_create_sq(struct mlx5_core_dev *mdev, u32 pdn, + MLX5_SET(sqc, sqc, ts_format, ts_format); + + MLX5_SET(wq, wq, wq_type, MLX5_WQ_TYPE_CYCLIC); +- MLX5_SET(wq, wq, uar_page, mdev->mlx5e_res.hw_objs.bfreg.index); ++ MLX5_SET(wq, wq, uar_page, 
mdev->priv.bfreg.index); + MLX5_SET(wq, wq, log_wq_pg_sz, sq->wq_ctrl.buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT); + MLX5_SET64(wq, wq, dbr_addr, sq->wq_ctrl.db.dma); + +@@ -940,7 +940,7 @@ static int hws_send_ring_create_cq(struct mlx5_core_dev *mdev, + (__be64 *)MLX5_ADDR_OF(create_cq_in, in, pas)); + + MLX5_SET(cqc, cqc, c_eqn_or_apu_element, eqn); +- MLX5_SET(cqc, cqc, uar_page, mdev->priv.uar->index); ++ MLX5_SET(cqc, cqc, uar_page, mdev->priv.bfreg.up->index); + MLX5_SET(cqc, cqc, log_page_size, cq->wq_ctrl.buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT); + MLX5_SET64(cqc, cqc, dbr_addr, cq->wq_ctrl.db.dma); + +@@ -963,7 +963,7 @@ static int hws_send_ring_open_cq(struct mlx5_core_dev *mdev, + if (!cqc_data) + return -ENOMEM; + +- MLX5_SET(cqc, cqc_data, uar_page, mdev->priv.uar->index); ++ MLX5_SET(cqc, cqc_data, uar_page, mdev->priv.bfreg.up->index); + MLX5_SET(cqc, cqc_data, log_cq_size, ilog2(queue->num_entries)); + + err = hws_send_ring_alloc_cq(mdev, numa_node, queue, cqc_data, cq); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/wc.c b/drivers/net/ethernet/mellanox/mlx5/core/wc.c +index 276594586404..999d6216648a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/wc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/wc.c +@@ -94,7 +94,7 @@ static int create_wc_cq(struct mlx5_wc_cq *cq, void *cqc_data) + + MLX5_SET(cqc, cqc, cq_period_mode, MLX5_CQ_PERIOD_MODE_START_FROM_EQE); + MLX5_SET(cqc, cqc, c_eqn_or_apu_element, eqn); +- MLX5_SET(cqc, cqc, uar_page, mdev->priv.uar->index); ++ MLX5_SET(cqc, cqc, uar_page, mdev->priv.bfreg.up->index); + MLX5_SET(cqc, cqc, log_page_size, cq->wq_ctrl.buf.page_shift - + MLX5_ADAPTER_PAGE_SHIFT); + MLX5_SET64(cqc, cqc, dbr_addr, cq->wq_ctrl.db.dma); +@@ -116,7 +116,7 @@ static int mlx5_wc_create_cq(struct mlx5_core_dev *mdev, struct mlx5_wc_cq *cq) + return -ENOMEM; + + MLX5_SET(cqc, cqc, log_cq_size, TEST_WC_LOG_CQ_SZ); +- MLX5_SET(cqc, cqc, uar_page, mdev->priv.uar->index); ++ MLX5_SET(cqc, cqc, uar_page, 
mdev->priv.bfreg.up->index); + if (MLX5_CAP_GEN(mdev, cqe_128_always) && cache_line_size() >= 128) + MLX5_SET(cqc, cqc, cqe_sz, CQE_STRIDE_128_PAD); + +diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h +index c94d8828aa67..5f2cdfa2588d 100644 +--- a/include/linux/mlx5/driver.h ++++ b/include/linux/mlx5/driver.h +@@ -612,7 +612,7 @@ struct mlx5_priv { + struct mlx5_ft_pool *ft_pool; + + struct mlx5_bfreg_data bfregs; +- struct mlx5_uars_page *uar; ++ struct mlx5_sq_bfreg bfreg; + #ifdef CONFIG_MLX5_SF + struct mlx5_vhca_state_notifier *vhca_state_notifier; + struct mlx5_sf_dev_table *sf_dev_table; +@@ -658,7 +658,6 @@ struct mlx5e_resources { + u32 pdn; + struct mlx5_td td; + u32 mkey; +- struct mlx5_sq_bfreg bfreg; + #define MLX5_MAX_NUM_TC 8 + u32 tisn[MLX5_MAX_PORTS][MLX5_MAX_NUM_TC]; + bool tisn_valid; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1545-net-mlx5e-prepare-for-using-multiple-tx-doorbells.patch b/SOURCES/1545-net-mlx5e-prepare-for-using-multiple-tx-doorbells.patch new file mode 100644 index 000000000..d82cd425a --- /dev/null +++ b/SOURCES/1545-net-mlx5e-prepare-for-using-multiple-tx-doorbells.patch @@ -0,0 +1,194 @@ +From 8296ee6bac850052245bd6cdbe58555e8c05e891 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:23:40 -0400 +Subject: [PATCH] net/mlx5e: Prepare for using multiple TX doorbells + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 +Conflicts: +Context diff due to the missing of the following commit: +8f7b00307bf1 ("net/mlx5e: Convert mlx5 netdevs to instance locking") + +commit 673d7ab7563e1268ac4ca62914b2b99d16219500 +Author: Cosmin Ratiu +Date: Tue Sep 16 17:11:39 2025 +0300 + + net/mlx5e: Prepare for using multiple TX doorbells + + The driver allocates a single doorbell per device and uses + it for all Send Queues (SQs). This can become a bottleneck due to the + high number of concurrent MMIO accesses when ringing the same doorbell + from many channels. 
+ + This patch makes the doorbells used by channel queues configurable. + + mlx5e_channel_pick_doorbell() is added to select the doorbell to be used + for a given channel, picking the default for now. + + When opening a channel, the selected doorbell is saved to the channel + struct and used whenever channel-related queues are created. + + Finally, 'uar_page' is added to 'struct mlx5e_create_sq_param' to + control which doorbell to use when allocating an SQ, since that can + happen outside channel context (e.g. for PTP). + + Signed-off-by: Cosmin Ratiu + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h +index 5e150e083829..fdd15d674b69 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h +@@ -788,6 +788,7 @@ struct mlx5e_channel { + int vec_ix; + int sd_ix; + int cpu; ++ struct mlx5_sq_bfreg *bfreg; + /* Sync between icosq recovery and XSK enable/disable. 
*/ + struct mutex icosq_recovery_lock; + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.h b/drivers/net/ethernet/mellanox/mlx5/core/en/params.h +index e3edf79dde5f..00617c65fe3c 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.h +@@ -51,6 +51,7 @@ struct mlx5e_create_sq_param { + u32 tisn; + u8 tis_lst_sz; + u8 min_inline_mode; ++ u32 uar_page; + }; + + /* Striding RQ dynamic parameters */ +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c +index 3729a41dc558..1cc962232ea8 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c +@@ -333,7 +333,7 @@ static int mlx5e_ptp_alloc_txqsq(struct mlx5e_ptp *c, int txq_ix, + sq->mdev = mdev; + sq->ch_ix = MLX5E_PTP_CHANNEL_IX; + sq->txq_ix = txq_ix; +- sq->uar_map = mdev->priv.bfreg.map; ++ sq->uar_map = c->bfreg->map; + sq->min_inline_mode = params->tx_min_inline_mode; + sq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu); + sq->stats = &c->priv->ptp_stats.sq[tc]; +@@ -471,6 +471,7 @@ static int mlx5e_ptp_open_txqsq(struct mlx5e_ptp *c, u32 tisn, + csp.wq_ctrl = &txqsq->wq_ctrl; + csp.min_inline_mode = txqsq->min_inline_mode; + csp.ts_cqe_to_dest_cqn = ptpsq->ts_cq.mcq.cqn; ++ csp.uar_page = c->bfreg->index; + + err = mlx5e_create_sq_rdy(c->mdev, sqp, &csp, 0, &txqsq->sqn); + if (err) +@@ -885,6 +886,7 @@ int mlx5e_ptp_open(struct mlx5e_priv *priv, struct mlx5e_params *params, + c->num_tc = mlx5e_get_dcb_num_tc(params); + c->stats = &priv->ptp_stats.ch; + c->lag_port = lag_port; ++ c->bfreg = &mdev->priv.bfreg; + + err = mlx5e_ptp_set_state(c, params); + if (err) +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h +index 883c044852f1..1b3c9648220b 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h ++++ 
b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h +@@ -66,6 +66,7 @@ struct mlx5e_ptp { + struct mlx5_core_dev *mdev; + struct hwtstamp_config *tstamp; + DECLARE_BITMAP(state, MLX5E_PTP_STATE_NUM_STATES); ++ struct mlx5_sq_bfreg *bfreg; + }; + + static inline bool mlx5e_use_ptpsq(struct sk_buff *skb) +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index 0395316f7031..3c617c2bd471 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -1533,7 +1533,7 @@ static int mlx5e_alloc_xdpsq(struct mlx5e_channel *c, + sq->pdev = c->pdev; + sq->mkey_be = c->mkey_be; + sq->channel = c; +- sq->uar_map = mdev->priv.bfreg.map; ++ sq->uar_map = c->bfreg->map; + sq->min_inline_mode = params->tx_min_inline_mode; + sq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu) - ETH_FCS_LEN; + sq->xsk_pool = xsk_pool; +@@ -1618,7 +1618,7 @@ static int mlx5e_alloc_icosq(struct mlx5e_channel *c, + int err; + + sq->channel = c; +- sq->uar_map = mdev->priv.bfreg.map; ++ sq->uar_map = c->bfreg->map; + sq->reserved_room = param->stop_room; + + param->wq.db_numa_node = cpu_to_node(c->cpu); +@@ -1703,7 +1703,7 @@ static int mlx5e_alloc_txqsq(struct mlx5e_channel *c, + sq->priv = c->priv; + sq->ch_ix = c->ix; + sq->txq_ix = txq_ix; +- sq->uar_map = mdev->priv.bfreg.map; ++ sq->uar_map = c->bfreg->map; + sq->min_inline_mode = params->tx_min_inline_mode; + sq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu); + sq->max_sq_mpw_wqebbs = mlx5e_get_max_sq_aligned_wqebbs(mdev); +@@ -1779,7 +1779,7 @@ static int mlx5e_create_sq(struct mlx5_core_dev *mdev, + MLX5_SET(sqc, sqc, flush_in_error_en, 1); + + MLX5_SET(wq, wq, wq_type, MLX5_WQ_TYPE_CYCLIC); +- MLX5_SET(wq, wq, uar_page, mdev->priv.bfreg.index); ++ MLX5_SET(wq, wq, uar_page, csp->uar_page); + MLX5_SET(wq, wq, log_wq_pg_sz, csp->wq_ctrl->buf.page_shift - + MLX5_ADAPTER_PAGE_SHIFT); + MLX5_SET64(wq, wq, dbr_addr, 
csp->wq_ctrl->db.dma); +@@ -1883,6 +1883,7 @@ int mlx5e_open_txqsq(struct mlx5e_channel *c, u32 tisn, int txq_ix, + csp.cqn = sq->cq.mcq.cqn; + csp.wq_ctrl = &sq->wq_ctrl; + csp.min_inline_mode = sq->min_inline_mode; ++ csp.uar_page = c->bfreg->index; + err = mlx5e_create_sq_rdy(c->mdev, param, &csp, qos_queue_group_id, &sq->sqn); + if (err) + goto err_free_txqsq; +@@ -2040,6 +2041,7 @@ static int mlx5e_open_icosq(struct mlx5e_channel *c, struct mlx5e_params *params + csp.cqn = sq->cq.mcq.cqn; + csp.wq_ctrl = &sq->wq_ctrl; + csp.min_inline_mode = params->tx_min_inline_mode; ++ csp.uar_page = c->bfreg->index; + err = mlx5e_create_sq_rdy(c->mdev, param, &csp, 0, &sq->sqn); + if (err) + goto err_free_icosq; +@@ -2100,6 +2102,7 @@ int mlx5e_open_xdpsq(struct mlx5e_channel *c, struct mlx5e_params *params, + csp.cqn = sq->cq.mcq.cqn; + csp.wq_ctrl = &sq->wq_ctrl; + csp.min_inline_mode = sq->min_inline_mode; ++ csp.uar_page = c->bfreg->index; + set_bit(MLX5E_SQ_STATE_ENABLED, &sq->state); + + err = mlx5e_create_sq_rdy(c->mdev, param, &csp, 0, &sq->sqn); +@@ -2728,6 +2731,11 @@ void mlx5e_trigger_napi_sched(struct napi_struct *napi) + local_bh_enable(); + } + ++static void mlx5e_channel_pick_doorbell(struct mlx5e_channel *c) ++{ ++ c->bfreg = &c->mdev->priv.bfreg; ++} ++ + static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix, + struct mlx5e_params *params, + struct xsk_buff_pool *xsk_pool, +@@ -2782,6 +2790,8 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix, + c->aff_mask = irq_get_effective_affinity_mask(irq); + c->lag_port = mlx5e_enumerate_lag_port(mdev, ix); + ++ mlx5e_channel_pick_doorbell(c); ++ + netif_napi_add_config(netdev, &c->napi, mlx5e_napi_poll, ix); + netif_napi_set_irq(&c->napi, irq); + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1546-net-mlx5e-prepare-for-using-different-cq-doorbells.patch b/SOURCES/1546-net-mlx5e-prepare-for-using-different-cq-doorbells.patch new file mode 100644 index 000000000..343f88702 --- /dev/null +++ 
b/SOURCES/1546-net-mlx5e-prepare-for-using-different-cq-doorbells.patch @@ -0,0 +1,168 @@ +From b414407f3762b30a2690b20186179814bef40a8b Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:28 -0400 +Subject: [PATCH] net/mlx5e: Prepare for using different CQ doorbells + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit a315b723e87ba4e4573e1e5c759d512f38bdc0b3 +Author: Cosmin Ratiu +Date: Tue Sep 16 17:11:40 2025 +0300 + + net/mlx5e: Prepare for using different CQ doorbells + + Completion queues (CQs) in mlx5 use the same global doorbell, which may + become contended when accessed concurrently from many cores. + + This patch prepares the CQ management code for supporting different + doorbells per CQ. This will be used in downstream patches to allow + separate doorbells to be used by channels CQs. + + The main change is moving the 'uar' pointer from struct mlx5_core_cq to + struct mlx5e_cq, as the uar page to be used is better off stored + directly there. Other users of mlx5_core_cq also store the UAR to be + used separately and therefore the pointer being removed is dead weight + for them. As evidence, in this patch there are two users which set the + mcq.uar pointer but didn't use it, Software Steering and old Innova CQ + creation code. Instead, they rang the doorbell directly from another + pointer. + + The 'uar' pointer added to struct mlx5e_cq remains in a hot cacheline + (as before), because it may get accessed for each packet. 
+ + Signed-off-by: Cosmin Ratiu + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c b/drivers/net/ethernet/mellanox/mlx5/core/cq.c +index 35039a95dcfd..e9f319a9bdd6 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c +@@ -145,7 +145,6 @@ int mlx5_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq, + mlx5_core_dbg(dev, "failed adding CP 0x%x to debug file system\n", + cq->cqn); + +- cq->uar = dev->priv.bfreg.up; + cq->irqn = eq->core.irqn; + + return 0; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h +index fdd15d674b69..28c8bfb47a2a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h +@@ -344,6 +344,7 @@ struct mlx5e_cq { + /* data path - accessed per napi poll */ + u16 event_ctr; + struct napi_struct *napi; ++ struct mlx5_uars_page *uar; + struct mlx5_core_cq mcq; + struct mlx5e_ch_stats *ch_stats; + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h +index e837c21d3d21..8189d5e1ef49 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h +@@ -309,10 +309,7 @@ mlx5e_notify_hw(struct mlx5_wq_cyc *wq, u16 pc, void __iomem *uar_map, + + static inline void mlx5e_cq_arm(struct mlx5e_cq *cq) + { +- struct mlx5_core_cq *mcq; +- +- mcq = &cq->mcq; +- mlx5_cq_arm(mcq, MLX5_CQ_DB_REQ_NOT, mcq->uar->map, cq->wq.cc); ++ mlx5_cq_arm(&cq->mcq, MLX5_CQ_DB_REQ_NOT, cq->uar->map, cq->wq.cc); + } + + static inline struct mlx5e_sq_dma * +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index 3c617c2bd471..ca4f1c69df54 100644 +--- 
a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -2173,6 +2173,7 @@ static void mlx5e_close_xdpredirect_sq(struct mlx5e_xdpsq *xdpsq) + static int mlx5e_alloc_cq_common(struct mlx5_core_dev *mdev, + struct net_device *netdev, + struct workqueue_struct *workqueue, ++ struct mlx5_uars_page *uar, + struct mlx5e_cq_param *param, + struct mlx5e_cq *cq) + { +@@ -2204,6 +2205,7 @@ static int mlx5e_alloc_cq_common(struct mlx5_core_dev *mdev, + cq->mdev = mdev; + cq->netdev = netdev; + cq->workqueue = workqueue; ++ cq->uar = uar; + + return 0; + } +@@ -2219,7 +2221,8 @@ static int mlx5e_alloc_cq(struct mlx5_core_dev *mdev, + param->wq.db_numa_node = ccp->node; + param->eq_ix = ccp->ix; + +- err = mlx5e_alloc_cq_common(mdev, ccp->netdev, ccp->wq, param, cq); ++ err = mlx5e_alloc_cq_common(mdev, ccp->netdev, ccp->wq, ++ mdev->priv.bfreg.up, param, cq); + + cq->napi = ccp->napi; + cq->ch_stats = ccp->ch_stats; +@@ -2264,7 +2267,7 @@ static int mlx5e_create_cq(struct mlx5e_cq *cq, struct mlx5e_cq_param *param) + MLX5_SET(cqc, cqc, cq_period_mode, mlx5e_cq_period_mode(param->cq_period_mode)); + + MLX5_SET(cqc, cqc, c_eqn_or_apu_element, eqn); +- MLX5_SET(cqc, cqc, uar_page, mdev->priv.bfreg.up->index); ++ MLX5_SET(cqc, cqc, uar_page, cq->uar->index); + MLX5_SET(cqc, cqc, log_page_size, cq->wq_ctrl.buf.page_shift - + MLX5_ADAPTER_PAGE_SHIFT); + MLX5_SET64(cqc, cqc, dbr_addr, cq->wq_ctrl.db.dma); +@@ -3563,7 +3566,8 @@ static int mlx5e_alloc_drop_cq(struct mlx5e_priv *priv, + param->wq.buf_numa_node = dev_to_node(mlx5_core_dma_dev(mdev)); + param->wq.db_numa_node = dev_to_node(mlx5_core_dma_dev(mdev)); + +- return mlx5e_alloc_cq_common(priv->mdev, priv->netdev, priv->wq, param, cq); ++ return mlx5e_alloc_cq_common(priv->mdev, priv->netdev, priv->wq, ++ mdev->priv.bfreg.up, param, cq); + } + + int mlx5e_open_drop_rq(struct mlx5e_priv *priv, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c +index c4de6bf8d1b6..cb1319974f83 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c +@@ -475,7 +475,6 @@ static int mlx5_fpga_conn_create_cq(struct mlx5_fpga_conn *conn, int cq_size) + *conn->cq.mcq.arm_db = 0; + conn->cq.mcq.vector = 0; + conn->cq.mcq.comp = mlx5_fpga_conn_cq_complete; +- conn->cq.mcq.uar = fdev->conn_res.uar; + tasklet_setup(&conn->cq.tasklet, mlx5_fpga_conn_cq_tasklet); + + mlx5_fpga_dbg(fdev, "Created CQ #0x%x\n", conn->cq.mcq.cqn); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_send.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_send.c +index 4fd4e8483382..077a77fde670 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_send.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_send.c +@@ -1131,7 +1131,6 @@ static struct mlx5dr_cq *dr_create_cq(struct mlx5_core_dev *mdev, + *cq->mcq.arm_db = cpu_to_be32(2 << 28); + + cq->mcq.vector = 0; +- cq->mcq.uar = uar; + cq->mdev = mdev; + + return cq; +diff --git a/include/linux/mlx5/cq.h b/include/linux/mlx5/cq.h +index 991526039ccb..7ef2c7c7d803 100644 +--- a/include/linux/mlx5/cq.h ++++ b/include/linux/mlx5/cq.h +@@ -41,7 +41,6 @@ struct mlx5_core_cq { + int cqe_sz; + __be32 *set_ci_db; + __be32 *arm_db; +- struct mlx5_uars_page *uar; + refcount_t refcount; + struct completion free; + unsigned vector; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1547-net-mlx5e-use-multiple-tx-doorbells.patch b/SOURCES/1547-net-mlx5e-use-multiple-tx-doorbells.patch new file mode 100644 index 000000000..eaef79383 --- /dev/null +++ b/SOURCES/1547-net-mlx5e-use-multiple-tx-doorbells.patch @@ -0,0 +1,142 @@ +From d549d7b310aa69dabf312c34ac10967ba9df1c21 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:28 -0400 +Subject: [PATCH] net/mlx5e: Use multiple TX doorbells + +JIRA: 
https://redhat.atlassian.net/browse/RHEL-169055 + +commit 71fb4832d50b01f0af2d257360c239879ce93a8e +Author: Cosmin Ratiu +Date: Tue Sep 16 17:11:41 2025 +0300 + + net/mlx5e: Use multiple TX doorbells + + First, allocate more doorbells in mlx5e_create_mdev_resources: + - one doorbell remains 'global' and will be used by all non-channel + associated SQs (e.g. ASO, HWS, PTP, ...). + - allocate additional 'num_doorbells' doorbells. This defaults to + minimum between 8 and max number of channels. + + mlx5e_channel_pick_doorbell() now spreads out channel SQs across + available doorbells. + + Signed-off-by: Cosmin Ratiu + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +index e9e36358c39d..d13cebbc763a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +@@ -143,6 +143,7 @@ static int mlx5e_create_tises(struct mlx5_core_dev *mdev, u32 tisn[MLX5_MAX_PORT + int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev, bool create_tises) + { + struct mlx5e_hw_objs *res = &mdev->mlx5e_res.hw_objs; ++ unsigned int num_doorbells, i; + int err; + + err = mlx5_core_alloc_pd(mdev, &res->pdn); +@@ -163,11 +164,30 @@ int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev, bool create_tises) + goto err_dealloc_transport_domain; + } + ++ num_doorbells = min(MLX5_DEFAULT_NUM_DOORBELLS, ++ mlx5e_get_max_num_channels(mdev)); ++ res->bfregs = kcalloc(num_doorbells, sizeof(*res->bfregs), GFP_KERNEL); ++ if (!res->bfregs) { ++ err = -ENOMEM; ++ goto err_destroy_mkey; ++ } ++ ++ for (i = 0; i < num_doorbells; i++) { ++ err = mlx5_alloc_bfreg(mdev, res->bfregs + i, false, false); ++ if (err) { ++ mlx5_core_warn(mdev, ++ "could only allocate %d/%d doorbells, err %d.\n", ++ i, num_doorbells, err); ++ 
break; ++ } ++ } ++ res->num_bfregs = i; ++ + if (create_tises) { + err = mlx5e_create_tises(mdev, res->tisn); + if (err) { + mlx5_core_err(mdev, "alloc tises failed, %d\n", err); +- goto err_destroy_mkey; ++ goto err_destroy_bfregs; + } + res->tisn_valid = true; + } +@@ -184,6 +204,10 @@ int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev, bool create_tises) + + return 0; + ++err_destroy_bfregs: ++ for (i = 0; i < res->num_bfregs; i++) ++ mlx5_free_bfreg(mdev, res->bfregs + i); ++ kfree(res->bfregs); + err_destroy_mkey: + mlx5_core_destroy_mkey(mdev, res->mkey); + err_dealloc_transport_domain: +@@ -201,6 +225,9 @@ void mlx5e_destroy_mdev_resources(struct mlx5_core_dev *mdev) + mdev->mlx5e_res.dek_priv = NULL; + if (res->tisn_valid) + mlx5e_destroy_tises(mdev, res->tisn); ++ for (unsigned int i = 0; i < res->num_bfregs; i++) ++ mlx5_free_bfreg(mdev, res->bfregs + i); ++ kfree(res->bfregs); + mlx5_core_destroy_mkey(mdev, res->mkey); + mlx5_core_dealloc_transport_domain(mdev, res->td.tdn); + mlx5_core_dealloc_pd(mdev, res->pdn); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index ca4f1c69df54..9727a40e1961 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -2736,7 +2736,16 @@ void mlx5e_trigger_napi_sched(struct napi_struct *napi) + + static void mlx5e_channel_pick_doorbell(struct mlx5e_channel *c) + { +- c->bfreg = &c->mdev->priv.bfreg; ++ struct mlx5e_hw_objs *hw_objs = &c->mdev->mlx5e_res.hw_objs; ++ ++ /* No dedicated Ethernet doorbells, use the global one. */ ++ if (hw_objs->num_bfregs == 0) { ++ c->bfreg = &c->mdev->priv.bfreg; ++ return; ++ } ++ ++ /* Round-robin between doorbells. 
*/ ++ c->bfreg = hw_objs->bfregs + c->vec_ix % hw_objs->num_bfregs; + } + + static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix, +diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h +index 5f2cdfa2588d..5405ca1038f9 100644 +--- a/include/linux/mlx5/driver.h ++++ b/include/linux/mlx5/driver.h +@@ -658,6 +658,8 @@ struct mlx5e_resources { + u32 pdn; + struct mlx5_td td; + u32 mkey; ++ struct mlx5_sq_bfreg *bfregs; ++ unsigned int num_bfregs; + #define MLX5_MAX_NUM_TC 8 + u32 tisn[MLX5_MAX_PORTS][MLX5_MAX_NUM_TC]; + bool tisn_valid; +@@ -802,6 +804,8 @@ struct mlx5_db { + int index; + }; + ++#define MLX5_DEFAULT_NUM_DOORBELLS 8 ++ + enum { + MLX5_COMP_EQ_SIZE = 1024, + }; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1548-net-mlx5e-use-multiple-cq-doorbells.patch b/SOURCES/1548-net-mlx5e-use-multiple-cq-doorbells.patch new file mode 100644 index 000000000..0a79d9997 --- /dev/null +++ b/SOURCES/1548-net-mlx5e-use-multiple-cq-doorbells.patch @@ -0,0 +1,106 @@ +From dd12546e17527a5b54fd9c3072bfca12613e0cbb Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:28 -0400 +Subject: [PATCH] net/mlx5e: Use multiple CQ doorbells + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 325db9c6f69b1408ccd2c6e237fc07697a9f210f +Author: Cosmin Ratiu +Date: Tue Sep 16 17:11:42 2025 +0300 + + net/mlx5e: Use multiple CQ doorbells + + Channel doorbells are now also used by all channel CQs. + + A new 'uar' parameter is added to 'struct mlx5e_create_cq_param', + which is then used in mlx5e_alloc_cq. + + A single UAR page has two TX doorbells and a single CQ doorbell, so + every consecutive pair of 'struct mlx5_sq_bfreg' (TX doorbells) + uses the same underlying 'struct mlx5_uars_page' (CQ doorbell). + So by using c->bfreg->up, CQs from every consecutive channel pair will + share the same CQ doorbell. + + Non-channel associated CQs keep using the global CQ doorbell. 
+ + Signed-off-by: Cosmin Ratiu + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h +index 28c8bfb47a2a..9098e526fbb6 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h +@@ -1062,6 +1062,7 @@ struct mlx5e_create_cq_param { + struct mlx5e_ch_stats *ch_stats; + int node; + int ix; ++ struct mlx5_uars_page *uar; + }; + + struct mlx5e_cq_param; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c +index b6b4ae7c59fa..596440c8c364 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c +@@ -611,6 +611,7 @@ void mlx5e_build_create_cq_param(struct mlx5e_create_cq_param *ccp, struct mlx5e + .ch_stats = c->stats, + .node = cpu_to_node(c->cpu), + .ix = c->vec_ix, ++ .uar = c->bfreg->up, + }; + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c +index 1cc962232ea8..d1e0f974b8a3 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c +@@ -563,6 +563,7 @@ static int mlx5e_ptp_open_tx_cqs(struct mlx5e_ptp *c, + ccp.ch_stats = c->stats; + ccp.napi = &c->napi; + ccp.ix = MLX5E_PTP_CHANNEL_IX; ++ ccp.uar = c->bfreg->up; + + cq_param = &cparams->txq_sq_param.cqp; + +@@ -612,6 +613,7 @@ static int mlx5e_ptp_open_rx_cq(struct mlx5e_ptp *c, + ccp.ch_stats = c->stats; + ccp.napi = &c->napi; + ccp.ix = MLX5E_PTP_CHANNEL_IX; ++ ccp.uar = c->bfreg->up; + + cq_param = &cparams->rq_param.cqp; + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c b/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c +index 140606fcd23b..5099a1c47f4f 100644 +--- 
a/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c +@@ -76,6 +76,7 @@ static int mlx5e_open_trap_rq(struct mlx5e_priv *priv, struct mlx5e_trap *t) + ccp.ch_stats = t->stats; + ccp.napi = &t->napi; + ccp.ix = 0; ++ ccp.uar = mdev->priv.bfreg.up; + err = mlx5e_open_cq(priv->mdev, trap_moder, &rq_param->cqp, &ccp, &rq->cq); + if (err) + return err; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index 9727a40e1961..26d35a2653dc 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -2222,7 +2222,7 @@ static int mlx5e_alloc_cq(struct mlx5_core_dev *mdev, + param->eq_ix = ccp->ix; + + err = mlx5e_alloc_cq_common(mdev, ccp->netdev, ccp->wq, +- mdev->priv.bfreg.up, param, cq); ++ ccp->uar, param, cq); + + cq->napi = ccp->napi; + cq->ch_stats = ccp->ch_stats; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1549-net-mlx5e-use-the-num-doorbells-devlink-param.patch b/SOURCES/1549-net-mlx5e-use-the-num-doorbells-devlink-param.patch new file mode 100644 index 000000000..f5caf4c48 --- /dev/null +++ b/SOURCES/1549-net-mlx5e-use-the-num-doorbells-devlink-param.patch @@ -0,0 +1,140 @@ +From 2568919a7375e71e88c7ea83576fbea2a5a76f5b Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:28 -0400 +Subject: [PATCH] net/mlx5e: Use the 'num_doorbells' devlink param + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 11bbcfb7668c6f4d97260f7caaefea22678bc31e +Author: Cosmin Ratiu +Date: Tue Sep 16 17:11:44 2025 +0300 + + net/mlx5e: Use the 'num_doorbells' devlink param + + Use the new devlink param to control how many doorbells mlx5e devices + allocate and use. The maximum number of doorbells configurable is capped + to the maximum number of channels. This only applies to the Ethernet + part, the RDMA devices using mlx5 manage their own doorbells. 
+ + Signed-off-by: Cosmin Ratiu + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst +index 60cc9fedf1ef..41c9b716699e 100644 +--- a/Documentation/networking/devlink/mlx5.rst ++++ b/Documentation/networking/devlink/mlx5.rst +@@ -62,6 +62,15 @@ Note: permanent parameters such as ``enable_sriov`` and ``total_vfs`` require FW + echo 1 >/sys/bus/pci/rescan + grep ^ /sys/bus/pci/devices/0000:01:00.0/sriov_* + ++ * - ``num_doorbells`` ++ - driverinit ++ - This controls the number of channel doorbells used by the netdev. In all ++ cases, an additional doorbell is allocated and used for non-channel ++ communication (e.g. for PTP, HWS, etc.). Supported values are: ++ ++ - 0: No channel-specific doorbells, use the global one for everything. ++ - [1, max_num_channels]: Spread netdev channels equally across these ++ doorbells. + + The ``mlx5`` driver also implements the following driver-specific + parameters. 
+diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +index 18ef8404e5e6..e8ce011f2464 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +@@ -530,6 +530,25 @@ mlx5_devlink_hairpin_queue_size_validate(struct devlink *devlink, u32 id, + return 0; + } + ++static int mlx5_devlink_num_doorbells_validate(struct devlink *devlink, u32 id, ++ union devlink_param_value val, ++ struct netlink_ext_ack *extack) ++{ ++ struct mlx5_core_dev *mdev = devlink_priv(devlink); ++ u32 val32 = val.vu32; ++ u32 max_num_channels; ++ ++ max_num_channels = mlx5e_get_max_num_channels(mdev); ++ if (val32 > max_num_channels) { ++ NL_SET_ERR_MSG_FMT_MOD(extack, ++ "Requested num_doorbells (%u) exceeds maximum number of channels (%u)", ++ val32, max_num_channels); ++ return -EINVAL; ++ } ++ ++ return 0; ++} ++ + static void mlx5_devlink_hairpin_params_init_values(struct devlink *devlink) + { + struct mlx5_core_dev *dev = devlink_priv(devlink); +@@ -609,6 +628,9 @@ static const struct devlink_param mlx5_devlink_eth_params[] = { + "hairpin_queue_size", DEVLINK_PARAM_TYPE_U32, + BIT(DEVLINK_PARAM_CMODE_DRIVERINIT), NULL, NULL, + mlx5_devlink_hairpin_queue_size_validate), ++ DEVLINK_PARAM_GENERIC(NUM_DOORBELLS, ++ BIT(DEVLINK_PARAM_CMODE_DRIVERINIT), NULL, NULL, ++ mlx5_devlink_num_doorbells_validate), + }; + + static int mlx5_devlink_eth_params_register(struct devlink *devlink) +@@ -632,6 +654,10 @@ static int mlx5_devlink_eth_params_register(struct devlink *devlink) + + mlx5_devlink_hairpin_params_init_values(devlink); + ++ value.vu32 = MLX5_DEFAULT_NUM_DOORBELLS; ++ devl_param_driverinit_value_set(devlink, ++ DEVLINK_PARAM_GENERIC_ID_NUM_DOORBELLS, ++ value); + return 0; + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +index d13cebbc763a..96b744ceaf13 100644 +--- 
a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +@@ -30,6 +30,7 @@ + * SOFTWARE. + */ + ++#include "devlink.h" + #include "en.h" + #include "lib/crypto.h" + +@@ -140,6 +141,18 @@ static int mlx5e_create_tises(struct mlx5_core_dev *mdev, u32 tisn[MLX5_MAX_PORT + return err; + } + ++static unsigned int ++mlx5e_get_devlink_param_num_doorbells(struct mlx5_core_dev *dev) ++{ ++ const u32 param_id = DEVLINK_PARAM_GENERIC_ID_NUM_DOORBELLS; ++ struct devlink *devlink = priv_to_devlink(dev); ++ union devlink_param_value val; ++ int err; ++ ++ err = devl_param_driverinit_value_get(devlink, param_id, &val); ++ return err ? MLX5_DEFAULT_NUM_DOORBELLS : val.vu32; ++} ++ + int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev, bool create_tises) + { + struct mlx5e_hw_objs *res = &mdev->mlx5e_res.hw_objs; +@@ -164,7 +177,7 @@ int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev, bool create_tises) + goto err_dealloc_transport_domain; + } + +- num_doorbells = min(MLX5_DEFAULT_NUM_DOORBELLS, ++ num_doorbells = min(mlx5e_get_devlink_param_num_doorbells(mdev), + mlx5e_get_max_num_channels(mdev)); + res->bfregs = kcalloc(num_doorbells, sizeof(*res->bfregs), GFP_KERNEL); + if (!res->bfregs) { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1550-net-mlx5e-use-unsigned-for-mlx5e-get-max-num-channels.patch b/SOURCES/1550-net-mlx5e-use-unsigned-for-mlx5e-get-max-num-channels.patch new file mode 100644 index 000000000..c890fad2a --- /dev/null +++ b/SOURCES/1550-net-mlx5e-use-unsigned-for-mlx5e-get-max-num-channels.patch @@ -0,0 +1,50 @@ +From d9239efd2beaaadb2ff5f66de134e261d9967d9f Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 09:46:23 -0400 +Subject: [PATCH] net/mlx5e: Use unsigned for mlx5e_get_max_num_channels +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 
57a94d4b22b0c6cc5d601e6b6238d78fb923d991 +Author: Cosmin Ratiu +Date: Wed Feb 18 09:29:04 2026 +0200 + + net/mlx5e: Use unsigned for mlx5e_get_max_num_channels + + The max number of channels is always an unsigned int, use the correct + type to fix compilation errors done with strict type checking, e.g.: + + error: call to ‘__compiletime_assert_1110’ declared with attribute + error: min(mlx5e_get_devlink_param_num_doorbells(mdev), + mlx5e_get_max_num_channels(mdev)) signedness error + + Fixes: 74a8dadac17e ("net/mlx5e: Preparations for supporting larger number of channels") + Signed-off-by: Cosmin Ratiu + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Reviewed-by: Jacob Keller + Link: https://patch.msgid.link/20260218072904.1764634-7-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h +index 9098e526fbb6..964df2d545b0 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h +@@ -178,7 +178,8 @@ static inline u16 mlx5_min_rx_wqes(int wq_type, u32 wq_size) + } + + /* Use this function to get max num channels (rxqs/txqs) only to create netdev */ +-static inline int mlx5e_get_max_num_channels(struct mlx5_core_dev *mdev) ++static inline unsigned int ++mlx5e_get_max_num_channels(struct mlx5_core_dev *mdev) + { + return is_kdump_kernel() ? 
+ MLX5E_MIN_NUM_CHANNELS : +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1551-net-mlx5-add-uar-access-and-odp-page-fault-counters.patch b/SOURCES/1551-net-mlx5-add-uar-access-and-odp-page-fault-counters.patch new file mode 100644 index 000000000..52b8f9c91 --- /dev/null +++ b/SOURCES/1551-net-mlx5-add-uar-access-and-odp-page-fault-counters.patch @@ -0,0 +1,58 @@ +From e4d710c6703746ce3e88be7f9b008c5017da77a9 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:28 -0400 +Subject: [PATCH] net/mlx5: Add uar access and odp page fault counters + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit a3d076b0567e729d5f21a95525c4d096b1f59e79 +Author: Akiva Goldberger +Date: Wed Sep 17 16:27:58 2025 +0300 + + net/mlx5: Add uar access and odp page fault counters + + Add bar_uar_access, odp_local_triggered_page_fault, and + odp_remote_triggered_page_fault counters to the query_vnic_env command. + Additionally, add corresponding capabilities bits to the HCA CAP. + + Signed-off-by: Akiva Goldberger + Reviewed-by: Moshe Shemesh + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1758115678-643464-1-git-send-email-tariqt@nvidia.com + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h +index 961e9c76c6c5..0ef2af28d424 100644 +--- a/include/linux/mlx5/mlx5_ifc.h ++++ b/include/linux/mlx5/mlx5_ifc.h +@@ -1957,7 +1957,9 @@ struct mlx5_ifc_cmd_hca_cap_bits { + u8 log_max_rqt[0x5]; + u8 reserved_at_390[0x3]; + u8 log_max_rqt_size[0x5]; +- u8 reserved_at_398[0x3]; ++ u8 reserved_at_398[0x1]; ++ u8 vnic_env_cnt_bar_uar_access[0x1]; ++ u8 vnic_env_cnt_odp_page_fault[0x1]; + u8 log_max_tis_per_sq[0x5]; + + u8 ext_stride_num_range[0x1]; +@@ -4018,7 +4020,13 @@ struct mlx5_ifc_vnic_diagnostic_statistics_bits { + + u8 handled_pkt_steering_fail[0x40]; + +- u8 reserved_at_360[0xc80]; ++ u8 bar_uar_access[0x20]; ++ ++ u8 odp_local_triggered_page_fault[0x20]; ++ ++ 
u8 odp_remote_triggered_page_fault[0x20]; ++ ++ u8 reserved_at_3c0[0xc20]; + }; + + struct mlx5_ifc_traffic_counter_bits { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1552-net-mlx5-change-ttc-rules-to-match-on-undecrypted-esp-packet.patch b/SOURCES/1552-net-mlx5-change-ttc-rules-to-match-on-undecrypted-esp-packet.patch new file mode 100644 index 000000000..ee16ce977 --- /dev/null +++ b/SOURCES/1552-net-mlx5-change-ttc-rules-to-match-on-undecrypted-esp-packet.patch @@ -0,0 +1,345 @@ +From 97fb90a6ac2428b12dbb43857e127ffdf661358c Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:29 -0400 +Subject: [PATCH] net/mlx5: Change TTC rules to match on undecrypted ESP + packets + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 9f24f0c4d4ddbd207e655697e78ef67a0374a481 +Author: Jianbo Liu +Date: Thu Sep 18 10:19:20 2025 +0300 + + net/mlx5: Change TTC rules to match on undecrypted ESP packets + + The TTC (Traffic Type Classifier) table classifies the traffic and + steers packet to TIRs, where RSS works based on the hash calculated + from the selected packet fields. For AH/ESP packets, SPI and IP + addresses are the fields used to calculate the hash value for RSS. So, + it's hard to distribute packets to different receiving queues as there + is usually only one SPI in that direction. + + IPSec hardware offloads, crypto offload and full (packet) offload were + introduced later. For crypto offload, hardware does encryption, + decryption and authentication, kernel does the others. Kernel always + sends/receives formatted ESP packets with plaintext data instead of + the ciphertext data, all other fields are unmodified. For full + offload, hardware will take care of almost everything, kernel just + sends/receives packets without any IPSec headers. + + Currently, all packets with ESP protocols are forwarded to IPSec + offload tables if IPSec rules are configured. 
In a downstream patch, + the decrypted packets will be recirculated to TTC table, in order to + use RSS, which does the hash on L4 fields after IPSec headers are + stripped by full offload. So those packets handled by crypto offload + must filtered out, as they still have the ESP headers, but apparently + no need to be decrypted again. To do that, ipsec_next_header is added + for the packet matching, as it is valid only after passing through + IPSec decryption. + + Signed-off-by: Jianbo Liu + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1758179963-649455-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h +index ac65e3191480..3fc093ec1f50 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h +@@ -132,7 +132,8 @@ struct mlx5e_ptp_fs; + + void mlx5e_set_ttc_params(struct mlx5e_flow_steering *fs, + struct mlx5e_rx_res *rx_res, +- struct ttc_params *ttc_params, bool tunnel); ++ struct ttc_params *ttc_params, bool tunnel, ++ bool ipsec_rss); + + void mlx5e_destroy_ttc_table(struct mlx5e_flow_steering *fs); + int mlx5e_create_ttc_table(struct mlx5e_flow_steering *fs, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c +index 537e732085b2..701147d5e2e1 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c +@@ -912,7 +912,8 @@ static void mlx5e_set_inner_ttc_params(struct mlx5e_flow_steering *fs, + + void mlx5e_set_ttc_params(struct mlx5e_flow_steering *fs, + struct mlx5e_rx_res *rx_res, +- struct ttc_params *ttc_params, bool tunnel) ++ struct ttc_params *ttc_params, bool tunnel, ++ bool ipsec_rss) + + { + struct mlx5_flow_table_attr *ft_attr = &ttc_params->ft_attr; +@@ -923,6 +924,9 @@ 
void mlx5e_set_ttc_params(struct mlx5e_flow_steering *fs, + ft_attr->level = MLX5E_TTC_FT_LEVEL; + ft_attr->prio = MLX5E_NIC_PRIO; + ++ ttc_params->ipsec_rss = ipsec_rss && ++ MLX5_CAP_NIC_RX_FT_FIELD_SUPPORT_2(fs->mdev, ipsec_next_header); ++ + for (tt = 0; tt < MLX5_NUM_TT; tt++) { + ttc_params->dests[tt].type = MLX5_FLOW_DESTINATION_TYPE_TIR; + ttc_params->dests[tt].tir_num = +@@ -1289,7 +1293,7 @@ int mlx5e_create_ttc_table(struct mlx5e_flow_steering *fs, + { + struct ttc_params ttc_params = {}; + +- mlx5e_set_ttc_params(fs, rx_res, &ttc_params, true); ++ mlx5e_set_ttc_params(fs, rx_res, &ttc_params, true, true); + fs->ttc = mlx5_create_ttc_table(fs->mdev, &ttc_params); + return PTR_ERR_OR_ZERO(fs->ttc); + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +index 5766be2c0153..2ce31ebd70d3 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +@@ -971,7 +971,7 @@ static int mlx5e_create_rep_ttc_table(struct mlx5e_priv *priv) + MLX5_FLOW_NAMESPACE_KERNEL), false); + + /* The inner_ttc in the ttc params is intentionally not set */ +- mlx5e_set_ttc_params(priv->fs, priv->rx_res, &ttc_params, false); ++ mlx5e_set_ttc_params(priv->fs, priv->rx_res, &ttc_params, false, false); + + if (rep->vport != MLX5_VPORT_UPLINK) + /* To give uplik rep TTC a lower level for chaining from root ft */ +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_ttc.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_ttc.c +index ca9ecec358b2..850fff4548c8 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_ttc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_ttc.c +@@ -9,7 +9,7 @@ + #include "mlx5_core.h" + #include "lib/fs_ttc.h" + +-#define MLX5_TTC_MAX_NUM_GROUPS 4 ++#define MLX5_TTC_MAX_NUM_GROUPS 5 + #define MLX5_TTC_GROUP_TCPUDP_SIZE (MLX5_TT_IPV6_UDP + 1) + + struct mlx5_fs_ttc_groups { +@@ -31,6 +31,7 @@ static int 
mlx5_fs_ttc_table_size(const struct mlx5_fs_ttc_groups *groups) + /* L3/L4 traffic type classifier */ + struct mlx5_ttc_table { + int num_groups; ++ const struct mlx5_fs_ttc_groups *groups; + struct mlx5_flow_table *t; + struct mlx5_flow_group **g; + struct mlx5_ttc_rule rules[MLX5_NUM_TT]; +@@ -163,6 +164,8 @@ static struct mlx5_etype_proto ttc_tunnel_rules[] = { + enum TTC_GROUP_TYPE { + TTC_GROUPS_DEFAULT = 0, + TTC_GROUPS_USE_L4_TYPE = 1, ++ TTC_GROUPS_DEFAULT_ESP = 2, ++ TTC_GROUPS_USE_L4_TYPE_ESP = 3, + }; + + static const struct mlx5_fs_ttc_groups ttc_groups[] = { +@@ -184,6 +187,27 @@ static const struct mlx5_fs_ttc_groups ttc_groups[] = { + BIT(0), + }, + }, ++ [TTC_GROUPS_DEFAULT_ESP] = { ++ .num_groups = 4, ++ .group_size = { ++ MLX5_TTC_GROUP_TCPUDP_SIZE + BIT(1) + ++ MLX5_NUM_TUNNEL_TT, ++ BIT(1), /* ESP */ ++ BIT(1), ++ BIT(0), ++ }, ++ }, ++ [TTC_GROUPS_USE_L4_TYPE_ESP] = { ++ .use_l4_type = true, ++ .num_groups = 5, ++ .group_size = { ++ MLX5_TTC_GROUP_TCPUDP_SIZE, ++ BIT(1) + MLX5_NUM_TUNNEL_TT, ++ BIT(1), /* ESP */ ++ BIT(1), ++ BIT(0), ++ }, ++ }, + }; + + static const struct mlx5_fs_ttc_groups inner_ttc_groups[] = { +@@ -207,6 +231,23 @@ static const struct mlx5_fs_ttc_groups inner_ttc_groups[] = { + }, + }; + ++static const struct mlx5_fs_ttc_groups * ++mlx5_ttc_get_fs_groups(bool use_l4_type, bool ipsec_rss) ++{ ++ if (!ipsec_rss) ++ return use_l4_type ? &ttc_groups[TTC_GROUPS_USE_L4_TYPE] : ++ &ttc_groups[TTC_GROUPS_DEFAULT]; ++ ++ return use_l4_type ? 
&ttc_groups[TTC_GROUPS_USE_L4_TYPE_ESP] : ++ &ttc_groups[TTC_GROUPS_DEFAULT_ESP]; ++} ++ ++bool mlx5_ttc_has_esp_flow_group(struct mlx5_ttc_table *ttc) ++{ ++ return ttc->groups == &ttc_groups[TTC_GROUPS_DEFAULT_ESP] || ++ ttc->groups == &ttc_groups[TTC_GROUPS_USE_L4_TYPE_ESP]; ++} ++ + u8 mlx5_get_proto_by_tunnel_type(enum mlx5_tunnel_types tt) + { + return ttc_tunnel_rules[tt].proto; +@@ -279,7 +320,7 @@ static void mlx5_fs_ttc_set_match_proto(void *headers_c, void *headers_v, + static struct mlx5_flow_handle * + mlx5_generate_ttc_rule(struct mlx5_core_dev *dev, struct mlx5_flow_table *ft, + struct mlx5_flow_destination *dest, u16 etype, u8 proto, +- bool use_l4_type) ++ bool use_l4_type, bool ipsec_rss) + { + int match_ipv_outer = + MLX5_CAP_FLOWTABLE_NIC_RX(dev, +@@ -316,6 +357,14 @@ mlx5_generate_ttc_rule(struct mlx5_core_dev *dev, struct mlx5_flow_table *ft, + MLX5_SET(fte_match_param, spec->match_value, outer_headers.ethertype, etype); + } + ++ if (ipsec_rss && proto == IPPROTO_ESP) { ++ MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria, ++ misc_parameters_2.ipsec_next_header); ++ MLX5_SET(fte_match_param, spec->match_value, ++ misc_parameters_2.ipsec_next_header, 0); ++ spec->match_criteria_enable |= MLX5_MATCH_MISC_PARAMETERS_2; ++ } ++ + rule = mlx5_add_flow_rules(ft, spec, &flow_act, dest, 1); + if (IS_ERR(rule)) { + err = PTR_ERR(rule); +@@ -347,7 +396,8 @@ static int mlx5_generate_ttc_table_rules(struct mlx5_core_dev *dev, + rule->rule = mlx5_generate_ttc_rule(dev, ft, &params->dests[tt], + ttc_rules[tt].etype, + ttc_rules[tt].proto, +- use_l4_type); ++ use_l4_type, ++ params->ipsec_rss); + if (IS_ERR(rule->rule)) { + err = PTR_ERR(rule->rule); + rule->rule = NULL; +@@ -370,7 +420,7 @@ static int mlx5_generate_ttc_table_rules(struct mlx5_core_dev *dev, + &params->tunnel_dests[tt], + ttc_tunnel_rules[tt].etype, + ttc_tunnel_rules[tt].proto, +- use_l4_type); ++ use_l4_type, false); + if (IS_ERR(trules[tt])) { + err = PTR_ERR(trules[tt]); + trules[tt] = 
NULL; +@@ -385,10 +435,38 @@ static int mlx5_generate_ttc_table_rules(struct mlx5_core_dev *dev, + return err; + } + ++static int mlx5_create_ttc_table_ipsec_groups(struct mlx5_ttc_table *ttc, ++ u32 *in, int *next_ix) ++{ ++ u8 *mc = MLX5_ADDR_OF(create_flow_group_in, in, match_criteria); ++ const struct mlx5_fs_ttc_groups *groups = ttc->groups; ++ int ix = *next_ix; ++ ++ /* undecrypted ESP group */ ++ MLX5_SET_CFG(in, match_criteria_enable, ++ MLX5_MATCH_OUTER_HEADERS | MLX5_MATCH_MISC_PARAMETERS_2); ++ MLX5_SET_TO_ONES(fte_match_param, mc, ++ misc_parameters_2.ipsec_next_header); ++ MLX5_SET_CFG(in, start_flow_index, ix); ++ ix += groups->group_size[ttc->num_groups]; ++ MLX5_SET_CFG(in, end_flow_index, ix - 1); ++ ttc->g[ttc->num_groups] = mlx5_create_flow_group(ttc->t, in); ++ if (IS_ERR(ttc->g[ttc->num_groups])) ++ goto err; ++ ttc->num_groups++; ++ ++ *next_ix = ix; ++ ++ return 0; ++ ++err: ++ return PTR_ERR(ttc->g[ttc->num_groups]); ++} ++ + static int mlx5_create_ttc_table_groups(struct mlx5_ttc_table *ttc, +- bool use_ipv, +- const struct mlx5_fs_ttc_groups *groups) ++ bool use_ipv) + { ++ const struct mlx5_fs_ttc_groups *groups = ttc->groups; + int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in); + int ix = 0; + u32 *in; +@@ -436,8 +514,18 @@ static int mlx5_create_ttc_table_groups(struct mlx5_ttc_table *ttc, + goto err; + ttc->num_groups++; + ++ if (mlx5_ttc_has_esp_flow_group(ttc)) { ++ err = mlx5_create_ttc_table_ipsec_groups(ttc, in, &ix); ++ if (err) ++ goto err; ++ ++ MLX5_SET(fte_match_param, mc, ++ misc_parameters_2.ipsec_next_header, 0); ++ } ++ + /* L3 Group */ + MLX5_SET(fte_match_param, mc, outer_headers.ip_protocol, 0); ++ MLX5_SET_CFG(in, match_criteria_enable, MLX5_MATCH_OUTER_HEADERS); + MLX5_SET_CFG(in, start_flow_index, ix); + ix += groups->group_size[ttc->num_groups]; + MLX5_SET_CFG(in, end_flow_index, ix - 1); +@@ -709,7 +797,6 @@ struct mlx5_ttc_table *mlx5_create_ttc_table(struct mlx5_core_dev *dev, + bool match_ipv_outer = + 
MLX5_CAP_FLOWTABLE_NIC_RX(dev, + ft_field_support.outer_ip_version); +- const struct mlx5_fs_ttc_groups *groups; + struct mlx5_flow_namespace *ns; + struct mlx5_ttc_table *ttc; + bool use_l4_type; +@@ -738,11 +825,10 @@ struct mlx5_ttc_table *mlx5_create_ttc_table(struct mlx5_core_dev *dev, + return ERR_PTR(-EOPNOTSUPP); + } + +- groups = use_l4_type ? &ttc_groups[TTC_GROUPS_USE_L4_TYPE] : +- &ttc_groups[TTC_GROUPS_DEFAULT]; ++ ttc->groups = mlx5_ttc_get_fs_groups(use_l4_type, params->ipsec_rss); + + WARN_ON_ONCE(params->ft_attr.max_fte); +- params->ft_attr.max_fte = mlx5_fs_ttc_table_size(groups); ++ params->ft_attr.max_fte = mlx5_fs_ttc_table_size(ttc->groups); + ttc->t = mlx5_create_flow_table(ns, &params->ft_attr); + if (IS_ERR(ttc->t)) { + err = PTR_ERR(ttc->t); +@@ -750,7 +836,7 @@ struct mlx5_ttc_table *mlx5_create_ttc_table(struct mlx5_core_dev *dev, + return ERR_PTR(err); + } + +- err = mlx5_create_ttc_table_groups(ttc, match_ipv_outer, groups); ++ err = mlx5_create_ttc_table_groups(ttc, match_ipv_outer); + if (err) + goto destroy_ft; + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_ttc.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_ttc.h +index ab9434fe3ae6..aead62441550 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_ttc.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_ttc.h +@@ -47,6 +47,7 @@ struct ttc_params { + bool inner_ttc; + DECLARE_BITMAP(ignore_tunnel_dests, MLX5_NUM_TUNNEL_TT); + struct mlx5_flow_destination tunnel_dests[MLX5_NUM_TUNNEL_TT]; ++ bool ipsec_rss; + }; + + const char *mlx5_ttc_get_name(enum mlx5_traffic_types tt); +@@ -70,4 +71,6 @@ int mlx5_ttc_fwd_default_dest(struct mlx5_ttc_table *ttc, + bool mlx5_tunnel_inner_ft_supported(struct mlx5_core_dev *mdev); + u8 mlx5_get_proto_by_tunnel_type(enum mlx5_tunnel_types tt); + ++bool mlx5_ttc_has_esp_flow_group(struct mlx5_ttc_table *ttc); ++ + #endif /* __MLX5_FS_TTC_H__ */ +-- +2.50.1 (Apple Git-155) + diff --git 
a/SOURCES/1553-net-mlx5e-recirculate-decrypted-packets-into-ttc-table.patch b/SOURCES/1553-net-mlx5e-recirculate-decrypted-packets-into-ttc-table.patch new file mode 100644 index 000000000..f1bf6ca98 --- /dev/null +++ b/SOURCES/1553-net-mlx5e-recirculate-decrypted-packets-into-ttc-table.patch @@ -0,0 +1,121 @@ +From 07129dc1f84040e1a13f973ccd55d90763e0715e Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:29 -0400 +Subject: [PATCH] net/mlx5e: Recirculate decrypted packets into TTC table + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit c69ac57199eac5259a715314a5edeb4c30925934 +Author: Jianbo Liu +Date: Thu Sep 18 10:19:21 2025 +0300 + + net/mlx5e: Recirculate decrypted packets into TTC table + + In the commit 5e466345291a ("net/mlx5e: IPsec: Add IPsec steering in + local NIC RX"), the decrypted packets are handled in RX error flow + table. There is only one rule in the table, which forwards packets to + the default ESP TIR. + + This patch updates the design to allow RSS after decryption. For ESP + traffic, SPI and IP addresses are the fields selected for RSS hash, + and it's common that only one SPI is configured in RX direction, so + RSS can't work properly as all the packets are hashed to one key in + this case. To take advantage of RSS and improve performance, the + decrypted packets need to be forwarded back to TTC table, where RSS + can work based on the decrypted packet types. 
+ + Signed-off-by: Jianbo Liu + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1758179963-649455-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c +index 65dc3529283b..417c8b654880 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c +@@ -585,6 +585,20 @@ static int ipsec_miss_create(struct mlx5_core_dev *mdev, + return err; + } + ++static struct mlx5_flow_destination ++ipsec_rx_decrypted_pkt_def_dest(struct mlx5_ttc_table *ttc, u32 family) ++{ ++ struct mlx5_flow_destination dest; ++ ++ if (!mlx5_ttc_has_esp_flow_group(ttc)) ++ return mlx5_ttc_get_default_dest(ttc, family2tt(family)); ++ ++ dest.ft = mlx5_get_ttc_flow_table(ttc); ++ dest.type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE; ++ ++ return dest; ++} ++ + static void ipsec_rx_update_default_dest(struct mlx5e_ipsec_rx *rx, + struct mlx5_flow_destination *old_dest, + struct mlx5_flow_destination *new_dest) +@@ -598,10 +612,10 @@ static void handle_ipsec_rx_bringup(struct mlx5e_ipsec *ipsec, u32 family) + { + struct mlx5e_ipsec_rx *rx = ipsec_rx(ipsec, family, XFRM_DEV_OFFLOAD_PACKET); + struct mlx5_flow_namespace *ns = mlx5e_fs_get_ns(ipsec->fs, false); ++ struct mlx5_ttc_table *ttc = mlx5e_fs_get_ttc(ipsec->fs, false); + struct mlx5_flow_destination old_dest, new_dest; + +- old_dest = mlx5_ttc_get_default_dest(mlx5e_fs_get_ttc(ipsec->fs, false), +- family2tt(family)); ++ old_dest = ipsec_rx_decrypted_pkt_def_dest(ttc, family); + + mlx5_ipsec_fs_roce_rx_create(ipsec->mdev, ipsec->roce, ns, &old_dest, family, + MLX5E_ACCEL_FS_ESP_FT_ROCE_LEVEL, MLX5E_NIC_PRIO); +@@ -614,12 +628,12 @@ static void handle_ipsec_rx_bringup(struct mlx5e_ipsec *ipsec, u32 family) + static void handle_ipsec_rx_cleanup(struct 
mlx5e_ipsec *ipsec, u32 family) + { + struct mlx5e_ipsec_rx *rx = ipsec_rx(ipsec, family, XFRM_DEV_OFFLOAD_PACKET); ++ struct mlx5_ttc_table *ttc = mlx5e_fs_get_ttc(ipsec->fs, false); + struct mlx5_flow_destination old_dest, new_dest; + + old_dest.ft = mlx5_ipsec_fs_roce_ft_get(ipsec->roce, family); + old_dest.type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE; +- new_dest = mlx5_ttc_get_default_dest(mlx5e_fs_get_ttc(ipsec->fs, false), +- family2tt(family)); ++ new_dest = ipsec_rx_decrypted_pkt_def_dest(ttc, family); + ipsec_rx_update_default_dest(rx, &old_dest, &new_dest); + + mlx5_ipsec_fs_roce_rx_destroy(ipsec->roce, family, ipsec->mdev); +@@ -764,7 +778,7 @@ static int ipsec_rx_status_pass_dest_get(struct mlx5e_ipsec *ipsec, + if (rx == ipsec->rx_esw) + return mlx5_esw_ipsec_rx_status_pass_dest_get(ipsec, dest); + +- *dest = mlx5_ttc_get_default_dest(attr->ttc, family2tt(attr->family)); ++ *dest = ipsec_rx_decrypted_pkt_def_dest(attr->ttc, attr->family); + err = mlx5_ipsec_fs_roce_rx_create(ipsec->mdev, ipsec->roce, attr->ns, dest, + attr->family, MLX5E_ACCEL_FS_ESP_FT_ROCE_LEVEL, + attr->prio); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/ipsec_fs_roce.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/ipsec_fs_roce.c +index b7d4b1a2baf2..d524f0220513 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/ipsec_fs_roce.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/ipsec_fs_roce.c +@@ -164,6 +164,8 @@ ipsec_fs_roce_rx_rule_setup(struct mlx5_core_dev *mdev, + roce->rule = rule; + + memset(spec, 0, sizeof(*spec)); ++ if (default_dst->type == MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE) ++ flow_act.flags |= FLOW_ACT_IGNORE_FLOW_LEVEL; + rule = mlx5_add_flow_rules(roce->ft, spec, &flow_act, default_dst, 1); + if (IS_ERR(rule)) { + err = PTR_ERR(rule); +@@ -178,6 +180,8 @@ ipsec_fs_roce_rx_rule_setup(struct mlx5_core_dev *mdev, + goto out; + + flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST; ++ if (default_dst->type == 
MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE) ++ flow_act.flags &= ~FLOW_ACT_IGNORE_FLOW_LEVEL; + dst.type = MLX5_FLOW_DESTINATION_TYPE_TABLE_TYPE; + dst.ft = roce->ft_rdma; + rule = mlx5_add_flow_rules(roce->nic_master_ft, NULL, &flow_act, &dst, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1554-net-mlx5e-add-flow-groups-for-the-packets-decrypted-by-crypt.patch b/SOURCES/1554-net-mlx5e-add-flow-groups-for-the-packets-decrypted-by-crypt.patch new file mode 100644 index 000000000..a244f9f9d --- /dev/null +++ b/SOURCES/1554-net-mlx5e-add-flow-groups-for-the-packets-decrypted-by-crypt.patch @@ -0,0 +1,275 @@ +From 7f11b0bbd0d16e938d09619d971b88504bdd4dd1 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:29 -0400 +Subject: [PATCH] net/mlx5e: Add flow groups for the packets decrypted by + crypto offload + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit d8693cac22c7fa7ef13c836327f1720d3fe414f6 +Author: Jianbo Liu +Date: Thu Sep 18 10:19:22 2025 +0300 + + net/mlx5e: Add flow groups for the packets decrypted by crypto offload + + When using IPsec crypto offload, the hardware decrypts the packet + payload but preserves the ESP header. This prevents the standard RSS + mechanism from accessing the inner L4 (TCP/UDP) headers. As a result, + the RSS hash is calculated only on the outer L3 IP headers, causing + all traffic for a given IPsec tunnel to be directed to a single queue, + leading to poor traffic distribution. + + Newer firmware introduces the ability to match on l4_type_ext, which + exposes the L4 protocol type following an ESP header. This allows the + driver to create steering rules that can identify the inner protocols + of decrypted packets. + + This commit leverages this new capability to improve traffic + distribution. It adds two new flow groups to steer decrypted packets + to dedicated TIRs that was configured to perform RSS on the inner L4 + headers. 
+ + These groups are inserted after the standard L4 group and before the + group that handles undecrypted ESP packets added in this series. The + first new group matches decrypted packets based on the outer IP + version (or ethertype) and l4_type_ext. The second new group matches + decrypted tunneled packets based on the inner IP version and + l4_type_ext. Eight new traffic types are also defined to support this + functionality. + + Signed-off-by: Jianbo Liu + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1758179963-649455-4-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h +index 3fc093ec1f50..eb142f358470 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h +@@ -57,7 +57,7 @@ struct mlx5e_l2_table { + bool promisc_enabled; + }; + +-#define MLX5E_NUM_INDIR_TIRS (MLX5_NUM_TT - 1) ++#define MLX5E_NUM_INDIR_TIRS (MLX5_NUM_INDIR_TIRS) + + #define MLX5_HASH_IP (MLX5_HASH_FIELD_SEL_SRC_IP |\ + MLX5_HASH_FIELD_SEL_DST_IP) +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c +index 701147d5e2e1..b18ef92837c1 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c +@@ -901,6 +901,9 @@ static void mlx5e_set_inner_ttc_params(struct mlx5e_flow_steering *fs, + ft_attr->prio = MLX5E_NIC_PRIO; + + for (tt = 0; tt < MLX5_NUM_TT; tt++) { ++ if (mlx5_ttc_is_decrypted_esp_tt(tt)) ++ continue; ++ + ttc_params->dests[tt].type = MLX5_FLOW_DESTINATION_TYPE_TIR; + ttc_params->dests[tt].tir_num = + tt == MLX5_TT_ANY ? 
+@@ -910,6 +913,13 @@ static void mlx5e_set_inner_ttc_params(struct mlx5e_flow_steering *fs, + } + } + ++static bool mlx5e_ipsec_rss_supported(struct mlx5_core_dev *mdev) ++{ ++ return MLX5_CAP_NIC_RX_FT_FIELD_SUPPORT_2(mdev, ipsec_next_header) && ++ MLX5_CAP_NIC_RX_FT_FIELD_SUPPORT_2(mdev, outer_l4_type_ext) && ++ MLX5_CAP_NIC_RX_FT_FIELD_SUPPORT_2(mdev, inner_l4_type_ext); ++} ++ + void mlx5e_set_ttc_params(struct mlx5e_flow_steering *fs, + struct mlx5e_rx_res *rx_res, + struct ttc_params *ttc_params, bool tunnel, +@@ -925,9 +935,12 @@ void mlx5e_set_ttc_params(struct mlx5e_flow_steering *fs, + ft_attr->prio = MLX5E_NIC_PRIO; + + ttc_params->ipsec_rss = ipsec_rss && +- MLX5_CAP_NIC_RX_FT_FIELD_SUPPORT_2(fs->mdev, ipsec_next_header); ++ mlx5e_ipsec_rss_supported(fs->mdev); + + for (tt = 0; tt < MLX5_NUM_TT; tt++) { ++ if (mlx5_ttc_is_decrypted_esp_tt(tt)) ++ continue; ++ + ttc_params->dests[tt].type = MLX5_FLOW_DESTINATION_TYPE_TIR; + ttc_params->dests[tt].tir_num = + tt == MLX5_TT_ANY ? +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +index 1ddefeeeca01..e1599817c3b2 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +@@ -838,6 +838,9 @@ static void mlx5e_hairpin_set_ttc_params(struct mlx5e_hairpin *hp, + + ttc_params->ns_type = MLX5_FLOW_NAMESPACE_KERNEL; + for (tt = 0; tt < MLX5_NUM_TT; tt++) { ++ if (mlx5_ttc_is_decrypted_esp_tt(tt)) ++ continue; ++ + ttc_params->dests[tt].type = MLX5_FLOW_DESTINATION_TYPE_TIR; + ttc_params->dests[tt].tir_num = + tt == MLX5_TT_ANY ? 
+diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_ttc.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_ttc.c +index 850fff4548c8..3cd5de6f714f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_ttc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_ttc.c +@@ -9,7 +9,7 @@ + #include "mlx5_core.h" + #include "lib/fs_ttc.h" + +-#define MLX5_TTC_MAX_NUM_GROUPS 5 ++#define MLX5_TTC_MAX_NUM_GROUPS 7 + #define MLX5_TTC_GROUP_TCPUDP_SIZE (MLX5_TT_IPV6_UDP + 1) + + struct mlx5_fs_ttc_groups { +@@ -188,10 +188,12 @@ static const struct mlx5_fs_ttc_groups ttc_groups[] = { + }, + }, + [TTC_GROUPS_DEFAULT_ESP] = { +- .num_groups = 4, ++ .num_groups = 6, + .group_size = { + MLX5_TTC_GROUP_TCPUDP_SIZE + BIT(1) + + MLX5_NUM_TUNNEL_TT, ++ BIT(2), /* decrypted outer L4 */ ++ BIT(2), /* decrypted inner L4 */ + BIT(1), /* ESP */ + BIT(1), + BIT(0), +@@ -199,10 +201,12 @@ static const struct mlx5_fs_ttc_groups ttc_groups[] = { + }, + [TTC_GROUPS_USE_L4_TYPE_ESP] = { + .use_l4_type = true, +- .num_groups = 5, ++ .num_groups = 7, + .group_size = { + MLX5_TTC_GROUP_TCPUDP_SIZE, + BIT(1) + MLX5_NUM_TUNNEL_TT, ++ BIT(2), /* decrypted outer L4 */ ++ BIT(2), /* decrypted inner L4 */ + BIT(1), /* ESP */ + BIT(1), + BIT(0), +@@ -391,6 +395,9 @@ static int mlx5_generate_ttc_table_rules(struct mlx5_core_dev *dev, + for (tt = 0; tt < MLX5_NUM_TT; tt++) { + struct mlx5_ttc_rule *rule = &rules[tt]; + ++ if (mlx5_ttc_is_decrypted_esp_tt(tt)) ++ continue; ++ + if (test_bit(tt, params->ignore_dests)) + continue; + rule->rule = mlx5_generate_ttc_rule(dev, ft, &params->dests[tt], +@@ -436,15 +443,55 @@ static int mlx5_generate_ttc_table_rules(struct mlx5_core_dev *dev, + } + + static int mlx5_create_ttc_table_ipsec_groups(struct mlx5_ttc_table *ttc, ++ bool use_ipv, + u32 *in, int *next_ix) + { + u8 *mc = MLX5_ADDR_OF(create_flow_group_in, in, match_criteria); + const struct mlx5_fs_ttc_groups *groups = ttc->groups; + int ix = *next_ix; + ++ MLX5_SET(fte_match_param, mc,
outer_headers.ip_protocol, 0); ++ ++ /* decrypted ESP outer group */ ++ MLX5_SET_CFG(in, match_criteria_enable, MLX5_MATCH_OUTER_HEADERS); ++ MLX5_SET_TO_ONES(fte_match_param, mc, outer_headers.l4_type_ext); ++ MLX5_SET_CFG(in, start_flow_index, ix); ++ ix += groups->group_size[ttc->num_groups]; ++ MLX5_SET_CFG(in, end_flow_index, ix - 1); ++ ttc->g[ttc->num_groups] = mlx5_create_flow_group(ttc->t, in); ++ if (IS_ERR(ttc->g[ttc->num_groups])) ++ goto err; ++ ttc->num_groups++; ++ ++ MLX5_SET(fte_match_param, mc, outer_headers.l4_type_ext, 0); ++ ++ /* decrypted ESP inner group */ ++ MLX5_SET_CFG(in, match_criteria_enable, MLX5_MATCH_INNER_HEADERS); ++ if (use_ipv) ++ MLX5_SET(fte_match_param, mc, outer_headers.ip_version, 0); ++ else ++ MLX5_SET(fte_match_param, mc, outer_headers.ethertype, 0); ++ MLX5_SET_TO_ONES(fte_match_param, mc, inner_headers.ip_version); ++ MLX5_SET_TO_ONES(fte_match_param, mc, inner_headers.l4_type_ext); ++ MLX5_SET_CFG(in, start_flow_index, ix); ++ ix += groups->group_size[ttc->num_groups]; ++ MLX5_SET_CFG(in, end_flow_index, ix - 1); ++ ttc->g[ttc->num_groups] = mlx5_create_flow_group(ttc->t, in); ++ if (IS_ERR(ttc->g[ttc->num_groups])) ++ goto err; ++ ttc->num_groups++; ++ ++ MLX5_SET(fte_match_param, mc, inner_headers.ip_version, 0); ++ MLX5_SET(fte_match_param, mc, inner_headers.l4_type_ext, 0); ++ + /* undecrypted ESP group */ + MLX5_SET_CFG(in, match_criteria_enable, + MLX5_MATCH_OUTER_HEADERS | MLX5_MATCH_MISC_PARAMETERS_2); ++ if (use_ipv) ++ MLX5_SET_TO_ONES(fte_match_param, mc, outer_headers.ip_version); ++ else ++ MLX5_SET_TO_ONES(fte_match_param, mc, outer_headers.ethertype); ++ MLX5_SET_TO_ONES(fte_match_param, mc, outer_headers.ip_protocol); + MLX5_SET_TO_ONES(fte_match_param, mc, + misc_parameters_2.ipsec_next_header); + MLX5_SET_CFG(in, start_flow_index, ix); +@@ -515,7 +562,7 @@ static int mlx5_create_ttc_table_groups(struct mlx5_ttc_table *ttc, + ttc->num_groups++; + + if (mlx5_ttc_has_esp_flow_group(ttc)) { +- err = 
mlx5_create_ttc_table_ipsec_groups(ttc, in, &ix); ++ err = mlx5_create_ttc_table_ipsec_groups(ttc, use_ipv, in, &ix); + if (err) + goto err; + +@@ -615,6 +662,9 @@ static int mlx5_generate_inner_ttc_table_rules(struct mlx5_core_dev *dev, + for (tt = 0; tt < MLX5_NUM_TT; tt++) { + struct mlx5_ttc_rule *rule = &rules[tt]; + ++ if (mlx5_ttc_is_decrypted_esp_tt(tt)) ++ continue; ++ + if (test_bit(tt, params->ignore_dests)) + continue; + rule->rule = mlx5_generate_inner_ttc_rule(dev, ft, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_ttc.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_ttc.h +index aead62441550..cae6a8ba0491 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_ttc.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_ttc.h +@@ -18,6 +18,14 @@ enum mlx5_traffic_types { + MLX5_TT_IPV4, + MLX5_TT_IPV6, + MLX5_TT_ANY, ++ MLX5_TT_DECRYPTED_ESP_OUTER_IPV4_TCP, ++ MLX5_TT_DECRYPTED_ESP_OUTER_IPV6_TCP, ++ MLX5_TT_DECRYPTED_ESP_OUTER_IPV4_UDP, ++ MLX5_TT_DECRYPTED_ESP_OUTER_IPV6_UDP, ++ MLX5_TT_DECRYPTED_ESP_INNER_IPV4_TCP, ++ MLX5_TT_DECRYPTED_ESP_INNER_IPV6_TCP, ++ MLX5_TT_DECRYPTED_ESP_INNER_IPV4_UDP, ++ MLX5_TT_DECRYPTED_ESP_INNER_IPV6_UDP, + MLX5_NUM_TT, + MLX5_NUM_INDIR_TIRS = MLX5_TT_ANY, + }; +@@ -72,5 +80,10 @@ bool mlx5_tunnel_inner_ft_supported(struct mlx5_core_dev *mdev); + u8 mlx5_get_proto_by_tunnel_type(enum mlx5_tunnel_types tt); + + bool mlx5_ttc_has_esp_flow_group(struct mlx5_ttc_table *ttc); ++static inline bool mlx5_ttc_is_decrypted_esp_tt(enum mlx5_traffic_types tt) ++{ ++ return tt >= MLX5_TT_DECRYPTED_ESP_OUTER_IPV4_TCP && ++ tt <= MLX5_TT_DECRYPTED_ESP_INNER_IPV6_UDP; ++} + + #endif /* __MLX5_FS_TTC_H__ */ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1555-net-mlx5e-add-flow-rules-for-the-decrypted-esp-packets.patch b/SOURCES/1555-net-mlx5e-add-flow-rules-for-the-decrypted-esp-packets.patch new file mode 100644 index 000000000..63242f3cb --- /dev/null +++ 
b/SOURCES/1555-net-mlx5e-add-flow-rules-for-the-decrypted-esp-packets.patch @@ -0,0 +1,402 @@ +From 8d599bd39e82ff0badd6f3b005532853e11fbc27 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:29 -0400 +Subject: [PATCH] net/mlx5e: Add flow rules for the decrypted ESP packets + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 72ed3ebf95a73b3c822ab7efb6a46114672179c5 +Author: Jianbo Liu +Date: Thu Sep 18 10:19:23 2025 +0300 + + net/mlx5e: Add flow rules for the decrypted ESP packets + + The previous commit introduced two new flow groups to enable L4 RSS + for decrypted IPsec traffic. This commit implements the logic to + populate these groups with the necessary steering rules. + + The rules are created dynamically whenever the first IPSec offload + rule is configured via the xfrm subsystem and the decryption tables + for RX are created. Each rule matches a specific decrypted traffic + type based on its ip version (or ethertype) and outer/inner + l4_type_ext, directing it to the appropriate L4 RSS-enabled TIR. + + The lifecycle of these steering rules is tied directly to the RX + tables. They are deleted when the RX tables are destroyed. 
+ + Signed-off-by: Jianbo Liu + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1758179963-649455-5-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c +index 417c8b654880..ef2878f0c20e 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c +@@ -61,6 +61,7 @@ struct mlx5e_ipsec_rx { + struct mlx5_flow_table *pol_miss_ft; + struct mlx5_flow_handle *pol_miss_rule; + u8 allow_tunnel_mode : 1; ++ u8 ttc_rules_added : 1; + }; + + /* IPsec RX flow steering */ +@@ -683,10 +684,13 @@ static void ipsec_mpv_work_handler(struct work_struct *_work) + complete(&work->master_priv->ipsec->comp); + } + +-static void ipsec_rx_ft_disconnect(struct mlx5e_ipsec *ipsec, u32 family) ++static void ipsec_rx_ft_disconnect(struct mlx5e_ipsec *ipsec, ++ struct mlx5e_ipsec_rx *rx, u32 family) + { + struct mlx5_ttc_table *ttc = mlx5e_fs_get_ttc(ipsec->fs, false); + ++ if (rx->ttc_rules_added) ++ mlx5_ttc_destroy_ipsec_rules(ttc); + mlx5_ttc_fwd_default_dest(ttc, family2tt(family)); + } + +@@ -721,7 +725,7 @@ static void rx_destroy(struct mlx5_core_dev *mdev, struct mlx5e_ipsec *ipsec, + { + /* disconnect */ + if (rx != ipsec->rx_esw) +- ipsec_rx_ft_disconnect(ipsec, family); ++ ipsec_rx_ft_disconnect(ipsec, rx, family); + + mlx5_del_flow_rules(rx->sa.rule); + mlx5_destroy_flow_group(rx->sa.group); +@@ -821,10 +825,16 @@ static void ipsec_rx_ft_connect(struct mlx5e_ipsec *ipsec, + struct mlx5e_ipsec_rx_create_attr *attr) + { + struct mlx5_flow_destination dest = {}; ++ struct mlx5_ttc_table *ttc, *inner_ttc; + + dest.type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE; + dest.ft = rx->ft.sa; +- mlx5_ttc_fwd_dest(attr->ttc, family2tt(attr->family), &dest); ++ if (mlx5_ttc_fwd_dest(attr->ttc, 
family2tt(attr->family), &dest)) ++ return; ++ ++ ttc = mlx5e_fs_get_ttc(ipsec->fs, false); ++ inner_ttc = mlx5e_fs_get_ttc(ipsec->fs, true); ++ rx->ttc_rules_added = !mlx5_ttc_create_ipsec_rules(ttc, inner_ttc); + } + + static int ipsec_rx_chains_create_miss(struct mlx5e_ipsec *ipsec, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_ttc.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_ttc.c +index 3cd5de6f714f..7adad784ad46 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_ttc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_ttc.c +@@ -32,10 +32,13 @@ static int mlx5_fs_ttc_table_size(const struct mlx5_fs_ttc_groups *groups) + struct mlx5_ttc_table { + int num_groups; + const struct mlx5_fs_ttc_groups *groups; ++ struct mlx5_core_dev *mdev; + struct mlx5_flow_table *t; + struct mlx5_flow_group **g; + struct mlx5_ttc_rule rules[MLX5_NUM_TT]; + struct mlx5_flow_handle *tunnel_rules[MLX5_NUM_TUNNEL_TT]; ++ u32 refcnt; ++ struct mutex mutex; /* Protect adding rules for ipsec crypto offload */ + }; + + struct mlx5_flow_table *mlx5_get_ttc_flow_table(struct mlx5_ttc_table *ttc) +@@ -302,6 +305,31 @@ static u8 mlx5_etype_to_ipv(u16 ethertype) + return 0; + } + ++static void mlx5_fs_ttc_set_match_ipv_outer(struct mlx5_core_dev *mdev, ++ struct mlx5_flow_spec *spec, ++ u16 etype) ++{ ++ int match_ipv_outer = ++ MLX5_CAP_FLOWTABLE_NIC_RX(mdev, ++ ft_field_support.outer_ip_version); ++ u8 ipv; ++ ++ ipv = mlx5_etype_to_ipv(etype); ++ if (match_ipv_outer && ipv) { ++ MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria, ++ outer_headers.ip_version); ++ MLX5_SET(fte_match_param, spec->match_value, ++ outer_headers.ip_version, ipv); ++ } else { ++ MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria, ++ outer_headers.ethertype); ++ MLX5_SET(fte_match_param, spec->match_value, ++ outer_headers.ethertype, etype); ++ } ++ ++ spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS; ++} ++ + static void mlx5_fs_ttc_set_match_proto(void *headers_c, 
void *headers_v, + u8 proto, bool use_l4_type) + { +@@ -326,14 +354,10 @@ mlx5_generate_ttc_rule(struct mlx5_core_dev *dev, struct mlx5_flow_table *ft, + struct mlx5_flow_destination *dest, u16 etype, u8 proto, + bool use_l4_type, bool ipsec_rss) + { +- int match_ipv_outer = +- MLX5_CAP_FLOWTABLE_NIC_RX(dev, +- ft_field_support.outer_ip_version); + MLX5_DECLARE_FLOW_ACT(flow_act); + struct mlx5_flow_handle *rule; + struct mlx5_flow_spec *spec; + int err = 0; +- u8 ipv; + + spec = kvzalloc(sizeof(*spec), GFP_KERNEL); + if (!spec) +@@ -350,16 +374,8 @@ mlx5_generate_ttc_rule(struct mlx5_core_dev *dev, struct mlx5_flow_table *ft, + proto, use_l4_type); + } + +- ipv = mlx5_etype_to_ipv(etype); +- if (match_ipv_outer && ipv) { +- spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS; +- MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria, outer_headers.ip_version); +- MLX5_SET(fte_match_param, spec->match_value, outer_headers.ip_version, ipv); +- } else if (etype) { +- spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS; +- MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria, outer_headers.ethertype); +- MLX5_SET(fte_match_param, spec->match_value, outer_headers.ethertype, etype); +- } ++ if (etype) ++ mlx5_fs_ttc_set_match_ipv_outer(dev, spec, etype); + + if (ipsec_rss && proto == IPPROTO_ESP) { + MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria, +@@ -838,6 +854,7 @@ void mlx5_destroy_ttc_table(struct mlx5_ttc_table *ttc) + + kfree(ttc->g); + mlx5_destroy_flow_table(ttc->t); ++ mutex_destroy(&ttc->mutex); + kvfree(ttc); + } + +@@ -894,6 +911,9 @@ struct mlx5_ttc_table *mlx5_create_ttc_table(struct mlx5_core_dev *dev, + if (err) + goto destroy_ft; + ++ ttc->mdev = dev; ++ mutex_init(&ttc->mutex); ++ + return ttc; + + destroy_ft: +@@ -927,3 +947,194 @@ int mlx5_ttc_fwd_default_dest(struct mlx5_ttc_table *ttc, + + return mlx5_ttc_fwd_dest(ttc, type, &dest); + } ++ ++static void _mlx5_ttc_destroy_ipsec_rules(struct mlx5_ttc_table *ttc) ++{ ++ enum 
mlx5_traffic_types i; ++ ++ for (i = MLX5_TT_DECRYPTED_ESP_OUTER_IPV4_TCP; ++ i <= MLX5_TT_DECRYPTED_ESP_INNER_IPV6_UDP; i++) { ++ if (!ttc->rules[i].rule) ++ continue; ++ ++ mlx5_del_flow_rules(ttc->rules[i].rule); ++ ttc->rules[i].rule = NULL; ++ } ++} ++ ++void mlx5_ttc_destroy_ipsec_rules(struct mlx5_ttc_table *ttc) ++{ ++ if (!mlx5_ttc_has_esp_flow_group(ttc)) ++ return; ++ ++ mutex_lock(&ttc->mutex); ++ if (--ttc->refcnt) ++ goto unlock; ++ ++ _mlx5_ttc_destroy_ipsec_rules(ttc); ++unlock: ++ mutex_unlock(&ttc->mutex); ++} ++ ++static int mlx5_ttc_get_tt_attrs(enum mlx5_traffic_types type, ++ u16 *etype, int *l4_type_ext, ++ enum mlx5_traffic_types *tir_tt) ++{ ++ switch (type) { ++ case MLX5_TT_DECRYPTED_ESP_OUTER_IPV4_TCP: ++ case MLX5_TT_DECRYPTED_ESP_INNER_IPV4_TCP: ++ *etype = ETH_P_IP; ++ *l4_type_ext = MLX5_PACKET_L4_TYPE_EXT_TCP; ++ *tir_tt = MLX5_TT_IPV4_TCP; ++ break; ++ case MLX5_TT_DECRYPTED_ESP_OUTER_IPV6_TCP: ++ case MLX5_TT_DECRYPTED_ESP_INNER_IPV6_TCP: ++ *etype = ETH_P_IPV6; ++ *l4_type_ext = MLX5_PACKET_L4_TYPE_EXT_TCP; ++ *tir_tt = MLX5_TT_IPV6_TCP; ++ break; ++ case MLX5_TT_DECRYPTED_ESP_OUTER_IPV4_UDP: ++ case MLX5_TT_DECRYPTED_ESP_INNER_IPV4_UDP: ++ *etype = ETH_P_IP; ++ *l4_type_ext = MLX5_PACKET_L4_TYPE_EXT_UDP; ++ *tir_tt = MLX5_TT_IPV4_UDP; ++ break; ++ case MLX5_TT_DECRYPTED_ESP_OUTER_IPV6_UDP: ++ case MLX5_TT_DECRYPTED_ESP_INNER_IPV6_UDP: ++ *etype = ETH_P_IPV6; ++ *l4_type_ext = MLX5_PACKET_L4_TYPE_EXT_UDP; ++ *tir_tt = MLX5_TT_IPV6_UDP; ++ break; ++ default: ++ return -EINVAL; ++ } ++ ++ return 0; ++} ++ ++static struct mlx5_flow_handle * ++mlx5_ttc_create_ipsec_outer_rule(struct mlx5_ttc_table *ttc, ++ enum mlx5_traffic_types type) ++{ ++ struct mlx5_flow_destination dest; ++ MLX5_DECLARE_FLOW_ACT(flow_act); ++ enum mlx5_traffic_types tir_tt; ++ struct mlx5_flow_handle *rule; ++ struct mlx5_flow_spec *spec; ++ int l4_type_ext; ++ u16 etype; ++ int err; ++ ++ err = mlx5_ttc_get_tt_attrs(type, &etype, &l4_type_ext, &tir_tt); ++ if 
(err) ++ return ERR_PTR(err); ++ ++ spec = kvzalloc(sizeof(*spec), GFP_KERNEL); ++ if (!spec) ++ return ERR_PTR(-ENOMEM); ++ ++ mlx5_fs_ttc_set_match_ipv_outer(ttc->mdev, spec, etype); ++ ++ MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria, ++ outer_headers.l4_type_ext); ++ MLX5_SET(fte_match_param, spec->match_value, ++ outer_headers.l4_type_ext, l4_type_ext); ++ ++ dest = mlx5_ttc_get_default_dest(ttc, tir_tt); ++ ++ rule = mlx5_add_flow_rules(ttc->t, spec, &flow_act, &dest, 1); ++ if (IS_ERR(rule)) { ++ err = PTR_ERR(rule); ++ mlx5_core_err(ttc->mdev, "%s: add rule failed\n", __func__); ++ } ++ ++ kvfree(spec); ++ return err ? ERR_PTR(err) : rule; ++} ++ ++static struct mlx5_flow_handle * ++mlx5_ttc_create_ipsec_inner_rule(struct mlx5_ttc_table *ttc, ++ struct mlx5_ttc_table *inner_ttc, ++ enum mlx5_traffic_types type) ++{ ++ struct mlx5_flow_destination dest; ++ MLX5_DECLARE_FLOW_ACT(flow_act); ++ enum mlx5_traffic_types tir_tt; ++ struct mlx5_flow_handle *rule; ++ struct mlx5_flow_spec *spec; ++ int l4_type_ext; ++ u16 etype; ++ int err; ++ ++ err = mlx5_ttc_get_tt_attrs(type, &etype, &l4_type_ext, &tir_tt); ++ if (err) ++ return ERR_PTR(err); ++ ++ spec = kvzalloc(sizeof(*spec), GFP_KERNEL); ++ if (!spec) ++ return ERR_PTR(-ENOMEM); ++ ++ MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria, ++ inner_headers.ip_version); ++ MLX5_SET(fte_match_param, spec->match_value, ++ inner_headers.ip_version, mlx5_etype_to_ipv(etype)); ++ MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria, ++ inner_headers.l4_type_ext); ++ MLX5_SET(fte_match_param, spec->match_value, ++ inner_headers.l4_type_ext, l4_type_ext); ++ ++ dest = mlx5_ttc_get_default_dest(inner_ttc, tir_tt); ++ ++ spec->match_criteria_enable = MLX5_MATCH_INNER_HEADERS; ++ ++ rule = mlx5_add_flow_rules(ttc->t, spec, &flow_act, &dest, 1); ++ if (IS_ERR(rule)) { ++ err = PTR_ERR(rule); ++ mlx5_core_err(ttc->mdev, "%s: add rule failed\n", __func__); ++ } ++ ++ kvfree(spec); ++ return err ? 
ERR_PTR(err) : rule; ++} ++ ++int mlx5_ttc_create_ipsec_rules(struct mlx5_ttc_table *ttc, ++ struct mlx5_ttc_table *inner_ttc) ++{ ++ struct mlx5_flow_handle *rule; ++ enum mlx5_traffic_types i; ++ ++ if (!mlx5_ttc_has_esp_flow_group(ttc)) ++ return 0; ++ ++ mutex_lock(&ttc->mutex); ++ if (ttc->refcnt) ++ goto skip; ++ ++ for (i = MLX5_TT_DECRYPTED_ESP_OUTER_IPV4_TCP; ++ i <= MLX5_TT_DECRYPTED_ESP_OUTER_IPV6_UDP; i++) { ++ rule = mlx5_ttc_create_ipsec_outer_rule(ttc, i); ++ if (IS_ERR(rule)) ++ goto err_out; ++ ++ ttc->rules[i].rule = rule; ++ } ++ ++ for (i = MLX5_TT_DECRYPTED_ESP_INNER_IPV4_TCP; ++ i <= MLX5_TT_DECRYPTED_ESP_INNER_IPV6_UDP; i++) { ++ rule = mlx5_ttc_create_ipsec_inner_rule(ttc, inner_ttc, i); ++ if (IS_ERR(rule)) ++ goto err_out; ++ ++ ttc->rules[i].rule = rule; ++ } ++ ++skip: ++ ttc->refcnt++; ++ mutex_unlock(&ttc->mutex); ++ return 0; ++ ++err_out: ++ _mlx5_ttc_destroy_ipsec_rules(ttc); ++ mutex_unlock(&ttc->mutex); ++ return PTR_ERR(rule); ++} +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_ttc.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_ttc.h +index cae6a8ba0491..95f6e56724a2 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_ttc.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_ttc.h +@@ -80,6 +80,9 @@ bool mlx5_tunnel_inner_ft_supported(struct mlx5_core_dev *mdev); + u8 mlx5_get_proto_by_tunnel_type(enum mlx5_tunnel_types tt); + + bool mlx5_ttc_has_esp_flow_group(struct mlx5_ttc_table *ttc); ++int mlx5_ttc_create_ipsec_rules(struct mlx5_ttc_table *ttc, ++ struct mlx5_ttc_table *inner_ttc); ++void mlx5_ttc_destroy_ipsec_rules(struct mlx5_ttc_table *ttc); + static inline bool mlx5_ttc_is_decrypted_esp_tt(enum mlx5_traffic_types tt) + { + return tt >= MLX5_TT_DECRYPTED_ESP_OUTER_IPV4_TCP && +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1556-net-mlx5-remove-dead-code-from-total-vfs-setter.patch b/SOURCES/1556-net-mlx5-remove-dead-code-from-total-vfs-setter.patch new file mode 100644 index 
000000000..fff9d8feb --- /dev/null +++ b/SOURCES/1556-net-mlx5-remove-dead-code-from-total-vfs-setter.patch @@ -0,0 +1,70 @@ +From 2817cd94836b1f849a6902fcab3c06edada5ad66 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:29 -0400 +Subject: [PATCH] net/mlx5: Remove dead code from total_vfs setter + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 6a46e4faa8fd848acec81bd6149a44c3b9b17de6 +Author: Vlad Dumitrescu +Date: Thu Sep 18 15:05:07 2025 -0700 + + net/mlx5: Remove dead code from total_vfs setter + + The mlx5_devlink_total_vfs_set function branches based on per_pf_support + twice. Remove the second branch as the first one exits the function when + per_pf_support is false. + + Accidentally added as part of commit a4c49611cf4f ("net/mlx5: Implement + devlink total_vfs parameter"). + + Reported-by: Dan Carpenter + Closes: https://lore.kernel.org/linux-rdma/aMQWenzpdjhAX4fm@stanley.mountain/ + Signed-off-by: Vlad Dumitrescu + Link: https://patch.msgid.link/a6142a60-1948-439a-b0ae-ff1df26a37f8@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.c +index 383d8cfe4c0a..459a0b4d08e6 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.c +@@ -458,7 +458,6 @@ static int mlx5_devlink_total_vfs_set(struct devlink *devlink, u32 id, + { + struct mlx5_core_dev *dev = devlink_priv(devlink); + u32 mnvda[MLX5_ST_SZ_DW(mnvda_reg)]; +- bool per_pf_support; + void *data; + int err; + +@@ -474,9 +473,7 @@ static int mlx5_devlink_total_vfs_set(struct devlink *devlink, u32 id, + return -EOPNOTSUPP; + } + +- per_pf_support = MLX5_GET(nv_global_pci_cap, data, +- per_pf_total_vf_supported); +- if (!per_pf_support) { ++ if (!MLX5_GET(nv_global_pci_cap, data, per_pf_total_vf_supported)) { + /* We don't allow global SRIOV setting on 
per PF devlink */ + NL_SET_ERR_MSG_MOD(extack, + "SRIOV is not per PF on this device"); +@@ -489,14 +486,8 @@ static int mlx5_devlink_total_vfs_set(struct devlink *devlink, u32 id, + return err; + + MLX5_SET(nv_global_pci_conf, data, sriov_valid, 1); +- MLX5_SET(nv_global_pci_conf, data, per_pf_total_vf, per_pf_support); +- +- if (!per_pf_support) { +- MLX5_SET(nv_global_pci_conf, data, total_vfs, ctx->val.vu32); +- return mlx5_nv_param_write(dev, mnvda, sizeof(mnvda)); +- } ++ MLX5_SET(nv_global_pci_conf, data, per_pf_total_vf, 1); + +- /* SRIOV is per PF */ + err = mlx5_nv_param_write(dev, mnvda, sizeof(mnvda)); + if (err) + return err; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1557-net-mlx5-use-pe-format-specifier-for-error-pointers.patch b/SOURCES/1557-net-mlx5-use-pe-format-specifier-for-error-pointers.patch new file mode 100644 index 000000000..3dddec0ad --- /dev/null +++ b/SOURCES/1557-net-mlx5-use-pe-format-specifier-for-error-pointers.patch @@ -0,0 +1,553 @@ +From a21dd06612a0f800db77e4067232edc09af2ae4e Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:30 -0400 +Subject: [PATCH] net/mlx5: Use %pe format specifier for error pointers + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit b89cd87b77d48ff14aea743cdba4193e76feb588 +Author: Gal Pressman +Date: Thu Sep 18 13:43:47 2025 +0300 + + net/mlx5: Use %pe format specifier for error pointers + + Using the coccinelle test introduced in previous commit + (scripts/coccinelle/misc/ptr_err_to_pe.cocci), convert error logging + throughout the mlx5 driver to use the %pe format specifier instead of + PTR_ERR() with integer format specifiers. 
+ + Signed-off-by: Gal Pressman + Reviewed-by: Alexei Lazar + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/1758192227-701925-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c +index 73f5b62b8c7f..a17f82321c25 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c +@@ -138,8 +138,8 @@ void mlx5_reporter_vnic_create(struct mlx5_core_dev *dev) + dev); + if (IS_ERR(health->vnic_reporter)) + mlx5_core_warn(dev, +- "Failed to create vnic reporter, err = %ld\n", +- PTR_ERR(health->vnic_reporter)); ++ "Failed to create vnic reporter, err = %pe\n", ++ health->vnic_reporter); + } + + void mlx5_reporter_vnic_destroy(struct mlx5_core_dev *dev) +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c +index b4f3bd7d346e..195863b2c013 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c +@@ -138,8 +138,8 @@ void mlx5e_hv_vhca_stats_create(struct mlx5e_priv *priv) + if (IS_ERR_OR_NULL(agent)) { + if (IS_ERR(agent)) + netdev_warn(priv->netdev, +- "Failed to create hv vhca stats agent, err = %ld\n", +- PTR_ERR(agent)); ++ "Failed to create hv vhca stats agent, err = %pe\n", ++ agent); + + kvfree(priv->stats_agent.buf); + return; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/bridge.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/bridge.c +index 0f5d7ea8956f..9d1c677814e0 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/bridge.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/bridge.c +@@ -488,8 +488,8 @@ static int mlx5_esw_bridge_switchdev_event(struct notifier_block *nb, + 
fdb_info, + br_offloads); + if (IS_ERR(work)) { +- WARN_ONCE(1, "Failed to init switchdev work, err=%ld", +- PTR_ERR(work)); ++ WARN_ONCE(1, "Failed to init switchdev work, err=%pe", ++ work); + return notifier_from_errno(PTR_ERR(work)); + } + +@@ -527,7 +527,8 @@ void mlx5e_rep_bridge_init(struct mlx5e_priv *priv) + br_offloads = mlx5_esw_bridge_init(esw); + rtnl_unlock(); + if (IS_ERR(br_offloads)) { +- esw_warn(mdev, "Failed to init esw bridge (err=%ld)\n", PTR_ERR(br_offloads)); ++ esw_warn(mdev, "Failed to init esw bridge (err=%pe)\n", ++ br_offloads); + return; + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c +index eb1cace5910c..b1415992ffa2 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c +@@ -672,8 +672,8 @@ void mlx5e_reporter_rx_create(struct mlx5e_priv *priv) + &mlx5_rx_reporter_ops, + priv); + if (IS_ERR(reporter)) { +- netdev_warn(priv->netdev, "Failed to create rx reporter, err = %ld\n", +- PTR_ERR(reporter)); ++ netdev_warn(priv->netdev, "Failed to create rx reporter, err = %pe\n", ++ reporter); + return; + } + priv->rx_reporter = reporter; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c +index 8907c5378f54..f10b9c5bf55b 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c +@@ -565,8 +565,8 @@ void mlx5e_reporter_tx_create(struct mlx5e_priv *priv) + priv); + if (IS_ERR(reporter)) { + netdev_warn(priv->netdev, +- "Failed to create tx reporter, err = %ld\n", +- PTR_ERR(reporter)); ++ "Failed to create tx reporter, err = %pe\n", ++ reporter); + return; + } + priv->tx_reporter = reporter; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/ct_fs_hmfs.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/ct_fs_hmfs.c 
+index 01d522b02947..d3db6146fcad 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/ct_fs_hmfs.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/ct_fs_hmfs.c +@@ -136,8 +136,8 @@ mlx5_ct_fs_hmfs_matcher_get(struct mlx5_ct_fs *fs, struct mlx5_flow_spec *spec, + hws_bwc_matcher = mlx5_ct_fs_hmfs_matcher_create(fs, tbl, spec, ipv4, tcp, gre); + if (IS_ERR(hws_bwc_matcher)) { + netdev_warn(fs->netdev, +- "ct_fs_hmfs: failed to create bwc matcher (nat %d, ipv4 %d, tcp %d, gre %d), err: %ld\n", +- nat, ipv4, tcp, gre, PTR_ERR(hws_bwc_matcher)); ++ "ct_fs_hmfs: failed to create bwc matcher (nat %d, ipv4 %d, tcp %d, gre %d), err: %pe\n", ++ nat, ipv4, tcp, gre, hws_bwc_matcher); + + hmfs_matcher = ERR_CAST(hws_bwc_matcher); + goto out_unlock; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/ct_fs_smfs.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/ct_fs_smfs.c +index 0c97c5899904..4d6924b644c9 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/ct_fs_smfs.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/ct_fs_smfs.c +@@ -148,8 +148,8 @@ mlx5_ct_fs_smfs_matcher_get(struct mlx5_ct_fs *fs, bool nat, bool ipv4, bool tcp + dr_matcher = mlx5_ct_fs_smfs_matcher_create(fs, tbl, ipv4, tcp, gre, prio); + if (IS_ERR(dr_matcher)) { + netdev_warn(fs->netdev, +- "ct_fs_smfs: failed to create matcher (nat %d, ipv4 %d, tcp %d, gre %d), err: %ld\n", +- nat, ipv4, tcp, gre, PTR_ERR(dr_matcher)); ++ "ct_fs_smfs: failed to create matcher (nat %d, ipv4 %d, tcp %d, gre %d), err: %pe\n", ++ nat, ipv4, tcp, gre, dr_matcher); + + smfs_matcher = ERR_CAST(dr_matcher); + goto out_unlock; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/int_port.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/int_port.c +index 8afcec0c5d3c..896f718483c3 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/int_port.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/int_port.c +@@ -93,8 +93,8 @@ mlx5e_int_port_create_rx_rule(struct mlx5_eswitch *esw, + 
flow_rule = mlx5_add_flow_rules(esw->offloads.ft_offloads, spec, + &flow_act, dest, 1); + if (IS_ERR(flow_rule)) +- mlx5_core_warn(esw->dev, "ft offloads: Failed to add internal vport rx rule err %ld\n", +- PTR_ERR(flow_rule)); ++ mlx5_core_warn(esw->dev, "ft offloads: Failed to add internal vport rx rule err %pe\n", ++ flow_rule); + + kvfree(spec); + +@@ -322,8 +322,8 @@ mlx5e_tc_int_port_init(struct mlx5e_priv *priv) + sizeof(u32) * 2, + (1 << ESW_VPORT_BITS) - 1, true); + if (IS_ERR(int_port_priv->metadata_mapping)) { +- mlx5_core_warn(priv->mdev, "Can't allocate metadata mapping of int port offload, err=%ld\n", +- PTR_ERR(int_port_priv->metadata_mapping)); ++ mlx5_core_warn(priv->mdev, "Can't allocate metadata mapping of int port offload, err=%pe\n", ++ int_port_priv->metadata_mapping); + goto err_mapping; + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c +index a0fc76a1bc08..0735d10f2bac 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c +@@ -172,8 +172,8 @@ void mlx5e_tc_encap_flows_add(struct mlx5e_priv *priv, + &reformat_params, + MLX5_FLOW_NAMESPACE_FDB); + if (IS_ERR(e->pkt_reformat)) { +- mlx5_core_warn(priv->mdev, "Failed to offload cached encapsulation header, %lu\n", +- PTR_ERR(e->pkt_reformat)); ++ mlx5_core_warn(priv->mdev, "Failed to offload cached encapsulation header, %pe\n", ++ e->pkt_reformat); + return; + } + e->flags |= MLX5_ENCAP_ENTRY_VALID; +@@ -1845,8 +1845,8 @@ static int mlx5e_tc_tun_fib_event(struct notifier_block *nb, unsigned long event + queue_work(priv->wq, &fib_work->work); + } else if (IS_ERR(fib_work)) { + NL_SET_ERR_MSG_MOD(info->extack, "Failed to init fib work"); +- mlx5_core_warn(priv->mdev, "Failed to init fib work, %ld\n", +- PTR_ERR(fib_work)); ++ mlx5_core_warn(priv->mdev, "Failed to init fib work, %pe\n", ++ fib_work); + } + + break; +diff --git 
a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c +index 4f83e3172767..1febdc5b81f9 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c +@@ -138,7 +138,7 @@ struct mlx5_flow_handle *mlx5e_accel_fs_add_sk(struct mlx5e_flow_steering *fs, + flow = mlx5_add_flow_rules(ft->t, spec, &flow_act, &dest, 1); + + if (IS_ERR(flow)) +- fs_err(fs, "mlx5_add_flow_rules() failed, flow is %ld\n", PTR_ERR(flow)); ++ fs_err(fs, "mlx5_add_flow_rules() failed, flow is %pe\n", flow); + + out: + kvfree(spec); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c +index ef2878f0c20e..6ccfc2af07b7 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c +@@ -1729,8 +1729,8 @@ static int setup_modify_header(struct mlx5e_ipsec *ipsec, int type, u32 val, u8 + + modify_hdr = mlx5_modify_header_alloc(mdev, ns_type, num_of_actions, action); + if (IS_ERR(modify_hdr)) { +- mlx5_core_err(mdev, "Failed to allocate modify_header %ld\n", +- PTR_ERR(modify_hdr)); ++ mlx5_core_err(mdev, "Failed to allocate modify_header %pe\n", ++ modify_hdr); + return PTR_ERR(modify_hdr); + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +index 96b744ceaf13..30424ccad584 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +@@ -210,8 +210,8 @@ int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev, bool create_tises) + + mdev->mlx5e_res.dek_priv = mlx5_crypto_dek_init(mdev); + if (IS_ERR(mdev->mlx5e_res.dek_priv)) { +- mlx5_core_err(mdev, "crypto dek init failed, %ld\n", +- PTR_ERR(mdev->mlx5e_res.dek_priv)); ++ mlx5_core_err(mdev, "crypto dek init 
failed, %pe\n", ++ mdev->mlx5e_res.dek_priv); + mdev->mlx5e_res.dek_priv = NULL; + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +index 2ce31ebd70d3..d19d743a88ae 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +@@ -1443,8 +1443,8 @@ static void mlx5e_rep_vnic_reporter_create(struct mlx5e_priv *priv, + rpriv); + if (IS_ERR(reporter)) { + mlx5_core_err(priv->mdev, +- "Failed to create representor vnic reporter, err = %ld\n", +- PTR_ERR(reporter)); ++ "Failed to create representor vnic reporter, err = %pe\n", ++ reporter); + return; + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_lgcy.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_lgcy.c +index 7dd1dc3f77c7..c9a1654d83a2 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_lgcy.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_lgcy.c +@@ -87,8 +87,8 @@ int esw_acl_egress_lgcy_setup(struct mlx5_eswitch *esw, + drop_counter = mlx5_fc_create(esw->dev, false); + if (IS_ERR(drop_counter)) { + esw_warn(esw->dev, +- "vport[%d] configure egress drop rule counter err(%ld)\n", +- vport->vport, PTR_ERR(drop_counter)); ++ "vport[%d] configure egress drop rule counter err(%pe)\n", ++ vport->vport, drop_counter); + drop_counter = NULL; + } + vport->egress.legacy.drop_counter = drop_counter; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge.c +index 76e35c827da0..60e10047770f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge.c +@@ -81,7 +81,8 @@ mlx5_esw_bridge_table_create(int max_fte, u32 level, struct mlx5_eswitch *esw) + ft_attr.prio = FDB_BR_OFFLOAD; + fdb = mlx5_create_flow_table(ns, &ft_attr); + if (IS_ERR(fdb)) +- esw_warn(dev, "Failed to create bridge FDB Table 
(err=%ld)\n", PTR_ERR(fdb)); ++ esw_warn(dev, "Failed to create bridge FDB Table (err=%pe)\n", ++ fdb); + + return fdb; + } +@@ -121,8 +122,8 @@ mlx5_esw_bridge_ingress_vlan_proto_fg_create(unsigned int from, unsigned int to, + kvfree(in); + if (IS_ERR(fg)) + esw_warn(esw->dev, +- "Failed to create VLAN(proto=%x) flow group for bridge ingress table (err=%ld)\n", +- vlan_proto, PTR_ERR(fg)); ++ "Failed to create VLAN(proto=%x) flow group for bridge ingress table (err=%pe)\n", ++ vlan_proto, fg); + + return fg; + } +@@ -180,8 +181,8 @@ mlx5_esw_bridge_ingress_vlan_proto_filter_fg_create(unsigned int from, unsigned + fg = mlx5_create_flow_group(ingress_ft, in); + if (IS_ERR(fg)) + esw_warn(esw->dev, +- "Failed to create bridge ingress table VLAN filter flow group (err=%ld)\n", +- PTR_ERR(fg)); ++ "Failed to create bridge ingress table VLAN filter flow group (err=%pe)\n", ++ fg); + kvfree(in); + return fg; + } +@@ -237,8 +238,8 @@ mlx5_esw_bridge_ingress_mac_fg_create(struct mlx5_eswitch *esw, struct mlx5_flow + fg = mlx5_create_flow_group(ingress_ft, in); + if (IS_ERR(fg)) + esw_warn(esw->dev, +- "Failed to create MAC flow group for bridge ingress table (err=%ld)\n", +- PTR_ERR(fg)); ++ "Failed to create MAC flow group for bridge ingress table (err=%pe)\n", ++ fg); + + kvfree(in); + return fg; +@@ -274,8 +275,8 @@ mlx5_esw_bridge_egress_vlan_proto_fg_create(unsigned int from, unsigned int to, + fg = mlx5_create_flow_group(egress_ft, in); + if (IS_ERR(fg)) + esw_warn(esw->dev, +- "Failed to create VLAN flow group for bridge egress table (err=%ld)\n", +- PTR_ERR(fg)); ++ "Failed to create VLAN flow group for bridge egress table (err=%pe)\n", ++ fg); + kvfree(in); + return fg; + } +@@ -324,8 +325,8 @@ mlx5_esw_bridge_egress_mac_fg_create(struct mlx5_eswitch *esw, struct mlx5_flow_ + fg = mlx5_create_flow_group(egress_ft, in); + if (IS_ERR(fg)) + esw_warn(esw->dev, +- "Failed to create bridge egress table MAC flow group (err=%ld)\n", +- PTR_ERR(fg)); ++ "Failed to create 
bridge egress table MAC flow group (err=%pe)\n", ++ fg); + kvfree(in); + return fg; + } +@@ -354,8 +355,8 @@ mlx5_esw_bridge_egress_miss_fg_create(struct mlx5_eswitch *esw, struct mlx5_flow + fg = mlx5_create_flow_group(egress_ft, in); + if (IS_ERR(fg)) + esw_warn(esw->dev, +- "Failed to create bridge egress table miss flow group (err=%ld)\n", +- PTR_ERR(fg)); ++ "Failed to create bridge egress table miss flow group (err=%pe)\n", ++ fg); + kvfree(in); + return fg; + } +@@ -501,8 +502,8 @@ mlx5_esw_bridge_egress_table_init(struct mlx5_esw_bridge_offloads *br_offloads, + if (mlx5_esw_bridge_pkt_reformat_vlan_pop_supported(esw)) { + miss_fg = mlx5_esw_bridge_egress_miss_fg_create(esw, egress_ft); + if (IS_ERR(miss_fg)) { +- esw_warn(esw->dev, "Failed to create miss flow group (err=%ld)\n", +- PTR_ERR(miss_fg)); ++ esw_warn(esw->dev, "Failed to create miss flow group (err=%pe)\n", ++ miss_fg); + miss_fg = NULL; + goto skip_miss_flow; + } +@@ -510,8 +511,8 @@ mlx5_esw_bridge_egress_table_init(struct mlx5_esw_bridge_offloads *br_offloads, + miss_pkt_reformat = mlx5_esw_bridge_pkt_reformat_vlan_pop_create(esw); + if (IS_ERR(miss_pkt_reformat)) { + esw_warn(esw->dev, +- "Failed to alloc packet reformat REMOVE_HEADER (err=%ld)\n", +- PTR_ERR(miss_pkt_reformat)); ++ "Failed to alloc packet reformat REMOVE_HEADER (err=%pe)\n", ++ miss_pkt_reformat); + miss_pkt_reformat = NULL; + mlx5_destroy_flow_group(miss_fg); + miss_fg = NULL; +@@ -522,8 +523,8 @@ mlx5_esw_bridge_egress_table_init(struct mlx5_esw_bridge_offloads *br_offloads, + br_offloads->skip_ft, + miss_pkt_reformat); + if (IS_ERR(miss_handle)) { +- esw_warn(esw->dev, "Failed to create miss flow (err=%ld)\n", +- PTR_ERR(miss_handle)); ++ esw_warn(esw->dev, "Failed to create miss flow (err=%pe)\n", ++ miss_handle); + miss_handle = NULL; + mlx5_packet_reformat_dealloc(esw->dev, miss_pkt_reformat); + miss_pkt_reformat = NULL; +@@ -1048,8 +1049,8 @@ mlx5_esw_bridge_vlan_push_create(u16 vlan_proto, struct 
mlx5_esw_bridge_vlan *vl + &reformat_params, + MLX5_FLOW_NAMESPACE_FDB); + if (IS_ERR(pkt_reformat)) { +- esw_warn(esw->dev, "Failed to alloc packet reformat INSERT_HEADER (err=%ld)\n", +- PTR_ERR(pkt_reformat)); ++ esw_warn(esw->dev, "Failed to alloc packet reformat INSERT_HEADER (err=%pe)\n", ++ pkt_reformat); + return PTR_ERR(pkt_reformat); + } + +@@ -1076,8 +1077,8 @@ mlx5_esw_bridge_vlan_pop_create(struct mlx5_esw_bridge_vlan *vlan, struct mlx5_e + + pkt_reformat = mlx5_esw_bridge_pkt_reformat_vlan_pop_create(esw); + if (IS_ERR(pkt_reformat)) { +- esw_warn(esw->dev, "Failed to alloc packet reformat REMOVE_HEADER (err=%ld)\n", +- PTR_ERR(pkt_reformat)); ++ esw_warn(esw->dev, "Failed to alloc packet reformat REMOVE_HEADER (err=%pe)\n", ++ pkt_reformat); + return PTR_ERR(pkt_reformat); + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/vporttbl.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/vporttbl.c +index 749c3957a128..407062096a82 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/vporttbl.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/vporttbl.c +@@ -45,8 +45,8 @@ esw_vport_tbl_create(struct mlx5_eswitch *esw, struct mlx5_flow_namespace *ns, + ft_attr.flags = vport_ns->flags; + fdb = mlx5_create_auto_grouped_flow_table(ns, &ft_attr); + if (IS_ERR(fdb)) { +- esw_warn(esw->dev, "Failed to create per vport FDB Table err %ld\n", +- PTR_ERR(fdb)); ++ esw_warn(esw->dev, "Failed to create per vport FDB Table err %pe\n", ++ fdb); + } + + return fdb; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +index 10eca910a2db..e2ffb87b94cb 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +@@ -257,8 +257,8 @@ __esw_fdb_set_vport_rule(struct mlx5_eswitch *esw, u16 vport, bool rx_rule, + &flow_act, &dest, 1); + if (IS_ERR(flow_rule)) { + esw_warn(esw->dev, +- "FDB: Failed to add flow rule: dmac_v(%pM) dmac_c(%pM) 
-> vport(%d), err(%ld)\n", +- dmac_v, dmac_c, vport, PTR_ERR(flow_rule)); ++ "FDB: Failed to add flow rule: dmac_v(%pM) dmac_c(%pM) -> vport(%d), err(%pe)\n", ++ dmac_v, dmac_c, vport, flow_rule); + flow_rule = NULL; + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +index bc9838dc5bf8..b8ec55929ab1 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +@@ -1016,8 +1016,8 @@ mlx5_eswitch_add_send_to_vport_rule(struct mlx5_eswitch *on_esw, + flow_rule = mlx5_add_flow_rules(mlx5_eswitch_get_slow_fdb(on_esw), + spec, &flow_act, &dest, 1); + if (IS_ERR(flow_rule)) +- esw_warn(on_esw->dev, "FDB: Failed to add send to vport rule err %ld\n", +- PTR_ERR(flow_rule)); ++ esw_warn(on_esw->dev, "FDB: Failed to add send to vport rule err %pe\n", ++ flow_rule); + out: + kvfree(spec); + return flow_rule; +@@ -1065,8 +1065,8 @@ mlx5_eswitch_add_send_to_vport_meta_rule(struct mlx5_eswitch *esw, u16 vport_num + flow_rule = mlx5_add_flow_rules(mlx5_eswitch_get_slow_fdb(esw), + spec, &flow_act, &dest, 1); + if (IS_ERR(flow_rule)) +- esw_warn(esw->dev, "FDB: Failed to add send to vport meta rule vport %d, err %ld\n", +- vport_num, PTR_ERR(flow_rule)); ++ esw_warn(esw->dev, "FDB: Failed to add send to vport meta rule vport %d, err %pe\n", ++ vport_num, flow_rule); + + kvfree(spec); + return flow_rule; +@@ -2159,7 +2159,9 @@ mlx5_eswitch_create_vport_rx_rule(struct mlx5_eswitch *esw, u16 vport, + flow_rule = mlx5_add_flow_rules(esw->offloads.ft_offloads, spec, + &flow_act, dest, 1); + if (IS_ERR(flow_rule)) { +- esw_warn(esw->dev, "fs offloads: Failed to add vport rx rule err %ld\n", PTR_ERR(flow_rule)); ++ esw_warn(esw->dev, ++ "fs offloads: Failed to add vport rx rule err %pe\n", ++ flow_rule); + goto out; + } + +@@ -2178,8 +2180,8 @@ static int esw_create_vport_rx_drop_rule(struct mlx5_eswitch *esw) + &flow_act, 
NULL, 0); + if (IS_ERR(flow_rule)) { + esw_warn(esw->dev, +- "fs offloads: Failed to add vport rx drop rule err %ld\n", +- PTR_ERR(flow_rule)); ++ "fs offloads: Failed to add vport rx drop rule err %pe\n", ++ flow_rule); + return PTR_ERR(flow_rule); + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c +index b63c5a221eb9..aeeb136f5ebc 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c +@@ -718,15 +718,15 @@ void mlx5_fw_reporters_create(struct mlx5_core_dev *dev) + + health->fw_reporter = devl_health_reporter_create(devlink, fw_ops, dev); + if (IS_ERR(health->fw_reporter)) +- mlx5_core_warn(dev, "Failed to create fw reporter, err = %ld\n", +- PTR_ERR(health->fw_reporter)); ++ mlx5_core_warn(dev, "Failed to create fw reporter, err = %pe\n", ++ health->fw_reporter); + + health->fw_fatal_reporter = devl_health_reporter_create(devlink, + fw_fatal_ops, + dev); + if (IS_ERR(health->fw_fatal_reporter)) +- mlx5_core_warn(dev, "Failed to create fw fatal reporter, err = %ld\n", +- PTR_ERR(health->fw_fatal_reporter)); ++ mlx5_core_warn(dev, "Failed to create fw fatal reporter, err = %pe\n", ++ health->fw_fatal_reporter); + } + + static void mlx5_fw_reporters_destroy(struct mlx5_core_dev *dev) +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/irq_affinity.c b/drivers/net/ethernet/mellanox/mlx5/core/irq_affinity.c +index 82d3c2568244..14d339eceb92 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/irq_affinity.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/irq_affinity.c +@@ -150,8 +150,8 @@ mlx5_irq_affinity_request(struct mlx5_core_dev *dev, struct mlx5_irq_pool *pool, + if (IS_ERR(new_irq)) { + if (!least_loaded_irq) { + /* We failed to create an IRQ and we didn't find an IRQ */ +- mlx5_core_err(pool->dev, "Didn't find a matching IRQ. err = %ld\n", +- PTR_ERR(new_irq)); ++ mlx5_core_err(pool->dev, "Didn't find a matching IRQ. 
err = %pe\n", ++ new_irq); + mutex_unlock(&pool->lock); + return new_irq; + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c +index 8f2ad45bec9f..d0ba83d77cd1 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c +@@ -1365,9 +1365,9 @@ static void mlx5_init_clock_dev(struct mlx5_core_dev *mdev) + clock->ptp = ptp_clock_register(&clock->ptp_info, + clock->shared ? NULL : &mdev->pdev->dev); + if (IS_ERR(clock->ptp)) { +- mlx5_core_warn(mdev, "%sptp_clock_register failed %ld\n", ++ mlx5_core_warn(mdev, "%sptp_clock_register failed %pe\n", + clock->shared ? "shared clock " : "", +- PTR_ERR(clock->ptp)); ++ clock->ptp); + clock->ptp = NULL; + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c +index c48f3d9765f7..77f587f97a2d 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c +@@ -979,8 +979,8 @@ static int mlx5_init_once(struct mlx5_core_dev *dev) + + dev->priv.devc = mlx5_devcom_register_device(dev); + if (IS_ERR(dev->priv.devc)) +- mlx5_core_warn(dev, "failed to register devcom device %ld\n", +- PTR_ERR(dev->priv.devc)); ++ mlx5_core_warn(dev, "failed to register devcom device %pe\n", ++ dev->priv.devc); + + err = mlx5_query_board_id(dev); + if (err) { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1558-net-mlx5-expose-uar-access-and-odp-page-fault-counters.patch b/SOURCES/1558-net-mlx5-expose-uar-access-and-odp-page-fault-counters.patch new file mode 100644 index 000000000..69c562bba --- /dev/null +++ b/SOURCES/1558-net-mlx5-expose-uar-access-and-odp-page-fault-counters.patch @@ -0,0 +1,84 @@ +From fecdf816a498c9d63e80e7253fdc1f84cfd02001 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:30 -0400 +Subject: [PATCH] net/mlx5: Expose uar access and odp page fault counters + 
+JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit e835faaed2f80ee8652f59a54703edceab04f0d9 +Author: Akiva Goldberger +Date: Thu Sep 25 13:45:30 2025 +0300 + + net/mlx5: Expose uar access and odp page fault counters + + Add three counters to vnic health reporter: + bar_uar_access, odp_local_triggered_page_fault, and + odp_remote_triggered_page_fault. + + - bar_uar_access + number of WRITE or READ access operations to the UAR on the PCIe + BAR. + - odp_local_triggered_page_fault + number of locally-triggered page-faults due to ODP. + - odp_remote_triggered_page_fault + number of remotly-triggered page-faults due to ODP. + + Example access: + $ devlink health diagnose pci/0000:08:00.0 reporter vnic + vNIC env counters: + total_error_queues: 0 send_queue_priority_update_flow: 0 + comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0 + invalid_command: 0 quota_exceeded_command: 0 + nic_receive_steering_discard: 0 icm_consumption: 1032 + bar_uar_access: 1279 odp_local_triggered_page_fault: 20 + odp_remote_triggered_page_fault: 34 + + Signed-off-by: Akiva Goldberger + Reviewed-by: Moshe Shemesh + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/1758797130-829564-1-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst +index 41c9b716699e..0e5f9c76e514 100644 +--- a/Documentation/networking/devlink/mlx5.rst ++++ b/Documentation/networking/devlink/mlx5.rst +@@ -385,6 +385,12 @@ Description of the vnic counters: + amount of Interconnect Host Memory (ICM) consumed by the vnic in + granularity of 4KB. ICM is host memory allocated by SW upon HCA request + and is used for storing data structures that control HCA operation. ++- bar_uar_access ++ number of WRITE or READ access operations to the UAR on the PCIe BAR. 
++- odp_local_triggered_page_fault ++ number of locally-triggered page-faults due to ODP. ++- odp_remote_triggered_page_fault ++ number of remotly-triggered page-faults due to ODP. + + User commands examples: + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c +index a17f82321c25..7cae0c6e5e8a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c +@@ -107,6 +107,15 @@ void mlx5_reporter_vnic_diagnose_counters(struct mlx5_core_dev *dev, + } + if (MLX5_CAP_GEN(dev, nic_cap_reg)) + mlx5_reporter_vnic_diagnose_counter_icm(dev, fmsg, vport_num, other_vport); ++ if (MLX5_CAP_GEN(dev, vnic_env_cnt_bar_uar_access)) ++ devlink_fmsg_u32_pair_put(fmsg, "bar_uar_access", ++ VNIC_ENV_GET(&vnic, bar_uar_access)); ++ if (MLX5_CAP_GEN(dev, vnic_env_cnt_odp_page_fault)) { ++ devlink_fmsg_u32_pair_put(fmsg, "odp_local_triggered_page_fault", ++ VNIC_ENV_GET(&vnic, odp_local_triggered_page_fault)); ++ devlink_fmsg_u32_pair_put(fmsg, "odp_remote_triggered_page_fault", ++ VNIC_ENV_GET(&vnic, odp_remote_triggered_page_fault)); ++ } + + devlink_fmsg_obj_nest_end(fmsg); + devlink_fmsg_pair_nest_end(fmsg); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1559-net-mlx5-add-ifc-bit-for-tir-sq-order-capability.patch b/SOURCES/1559-net-mlx5-add-ifc-bit-for-tir-sq-order-capability.patch new file mode 100644 index 000000000..8ecb7beed --- /dev/null +++ b/SOURCES/1559-net-mlx5-add-ifc-bit-for-tir-sq-order-capability.patch @@ -0,0 +1,46 @@ +From 0601b67c4234f8bb9e709a59115a2865e5c14437 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:30 -0400 +Subject: [PATCH] net/mlx5: Add IFC bit for TIR/SQ order capability + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 1ddf1636e0e058adf2231486da0419243eb49539 +Author: Tariq Toukan +Date: Mon Sep 22 09:06:30 2025 +0300 + + net/mlx5: Add IFC bit 
for TIR/SQ order capability + + Before this cap, firmware requested a certain creation order between TIR + objects and SQs of the same transport domain to properly support the + self loopback prevention feature. If order is not preserved, explicit + modify_tir operations are necessary after the opening of the SQs. + + When set, this cap bit indicates that this firmware requirement / + limitation no longer holds. + + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1758521191-814350-2-git-send-email-tariqt@nvidia.com + Reviewed-by: Carolina Jubran + Reviewed-by: Dragos Tatulea + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h +index 0ef2af28d424..3d0322b1bd5a 100644 +--- a/include/linux/mlx5/mlx5_ifc.h ++++ b/include/linux/mlx5/mlx5_ifc.h +@@ -1894,7 +1894,8 @@ struct mlx5_ifc_cmd_hca_cap_bits { + + u8 reserved_at_2a0[0x7]; + u8 mkey_pcie_tph[0x1]; +- u8 reserved_at_2a8[0x2]; ++ u8 reserved_at_2a8[0x1]; ++ u8 tis_tir_td_order[0x1]; + + u8 psp[0x1]; + u8 shampo[0x1]; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1560-net-mlx5-ifc-add-balance-id-and-lag-per-mp-group-bits.patch b/SOURCES/1560-net-mlx5-ifc-add-balance-id-and-lag-per-mp-group-bits.patch new file mode 100644 index 000000000..9ee995f7e --- /dev/null +++ b/SOURCES/1560-net-mlx5-ifc-add-balance-id-and-lag-per-mp-group-bits.patch @@ -0,0 +1,58 @@ +From fde9ec2ca86396c0145e4498a5337df946ade79a Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:30 -0400 +Subject: [PATCH] net/mlx5: IFC add balance ID and LAG per MP group bits + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 137d1a6355131457723b51a34192320d93d15654 +Author: Mark Bloch +Date: Mon Sep 22 09:06:31 2025 +0300 + + net/mlx5: IFC add balance ID and LAG per MP group bits + + Add interface definitions for load balance ID and LAG per multiplane group + functionality. 
This patch introduces the hardware capability bits needed + to support balance ID in multiplane LAG configurations. + + The new fields include: + - load_balance_id: 4-bit field for balance identifier. + - lag_per_mp_group: capability bit for LAG per multiplane group support. + + These interface additions are prerequisites for implementing balance ID + support in the MLX5 driver. + + Signed-off-by: Mark Bloch + Reviewed-by: Shay Drori + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1758521191-814350-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h +index 3d0322b1bd5a..5e2bc469ca64 100644 +--- a/include/linux/mlx5/mlx5_ifc.h ++++ b/include/linux/mlx5/mlx5_ifc.h +@@ -2234,12 +2234,16 @@ struct mlx5_ifc_cmd_hca_cap_2_bits { + u8 reserved_at_440[0x8]; + u8 max_num_eqs_24b[0x18]; + +- u8 reserved_at_460[0x160]; ++ u8 reserved_at_460[0x144]; ++ u8 load_balance_id[0x4]; ++ u8 reserved_at_5a8[0x18]; + + u8 query_adjacent_functions_id[0x1]; + u8 ingress_egress_esw_vport_connect[0x1]; + u8 function_id_type_vhca_id[0x1]; +- u8 reserved_at_5c3[0xd]; ++ u8 reserved_at_5c3[0x1]; ++ u8 lag_per_mp_group[0x1]; ++ u8 reserved_at_5c5[0xb]; + u8 delegate_vhca_management_profiles[0x10]; + + u8 delegated_vhca_max[0x10]; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1561-net-mlx5-stop-polling-for-command-response-if-interface-goes.patch b/SOURCES/1561-net-mlx5-stop-polling-for-command-response-if-interface-goes.patch new file mode 100644 index 000000000..513849859 --- /dev/null +++ b/SOURCES/1561-net-mlx5-stop-polling-for-command-response-if-interface-goes.patch @@ -0,0 +1,56 @@ +From 0064b668708144129c1860bcc8a92fb74d68f8e1 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:30 -0400 +Subject: [PATCH] net/mlx5: Stop polling for command response if interface goes + down + +JIRA: 
https://redhat.atlassian.net/browse/RHEL-169055 + +commit b1f0349bd6d320c382df2e7f6fc2ac95c85f2b18 +Author: Moshe Shemesh +Date: Mon Sep 29 00:02:07 2025 +0300 + + net/mlx5: Stop polling for command response if interface goes down + + Stop polling on firmware response to command in polling mode if the + command interface got down. This situation can occur, for example, if a + firmware fatal error is detected during polling. + + This change halts the polling process when the command interface goes + down, preventing unnecessary waits. + + Fixes: b898ce7bccf1 ("net/mlx5: cmdif, Avoid skipping reclaim pages if FW is not accessible") + Signed-off-by: Moshe Shemesh + Reviewed-by: Shay Drori + Signed-off-by: Tariq Toukan + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +index e395ef5f356e..722282cebce9 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +@@ -294,6 +294,10 @@ static void poll_timeout(struct mlx5_cmd_work_ent *ent) + return; + } + cond_resched(); ++ if (mlx5_cmd_is_down(dev)) { ++ ent->ret = -ENXIO; ++ return; ++ } + } while (time_before(jiffies, poll_end)); + + ent->ret = -ETIMEDOUT; +@@ -1070,7 +1074,7 @@ static void cmd_work_handler(struct work_struct *work) + poll_timeout(ent); + /* make sure we read the descriptor after ownership is SW */ + rmb(); +- mlx5_cmd_comp_handler(dev, 1ULL << ent->idx, (ent->ret == -ETIMEDOUT)); ++ mlx5_cmd_comp_handler(dev, 1ULL << ent->idx, !!ent->ret); + } + } + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1562-net-mlx5-pagealloc-fix-reclaim-race-during-command-interface.patch b/SOURCES/1562-net-mlx5-pagealloc-fix-reclaim-race-during-command-interface.patch new file mode 100644 index 000000000..a7e868c7e --- /dev/null +++ b/SOURCES/1562-net-mlx5-pagealloc-fix-reclaim-race-during-command-interface.patch @@ -0,0 +1,59 @@ +From 
0c5ea60367085f9ccb735f4c05ee9e35faebf640 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:30 -0400 +Subject: [PATCH] net/mlx5: pagealloc: Fix reclaim race during command + interface teardown + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 79a0e32b32ac4e4f9e4bb22be97f371c8c116c88 +Author: Shay Drory +Date: Mon Sep 29 00:02:08 2025 +0300 + + net/mlx5: pagealloc: Fix reclaim race during command interface teardown + + The reclaim_pages_cmd() function sends a command to the firmware to + reclaim pages if the command interface is active. + + A race condition can occur if the command interface goes down (e.g., due + to a PCI error) while the mlx5_cmd_do() call is in flight. In this + case, mlx5_cmd_do() will return an error. The original code would + propagate this error immediately, bypassing the software-based page + reclamation logic that is supposed to run when the command interface is + down. + + Fix this by checking whether mlx5_cmd_do() returns -ENXIO, which mark + that command interface is down. If this is the case, fall through to + the software reclamation path. If the command failed for any another + reason, or finished successfully, return as before. 
+ + Fixes: b898ce7bccf1 ("net/mlx5: cmdif, Avoid skipping reclaim pages if FW is not accessible") + Signed-off-by: Shay Drory + Reviewed-by: Moshe Shemesh + Signed-off-by: Tariq Toukan + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c +index 9bc9bd83c232..cd68c4b2c0bf 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c +@@ -489,9 +489,12 @@ static int reclaim_pages_cmd(struct mlx5_core_dev *dev, + u32 func_id; + u32 npages; + u32 i = 0; ++ int err; + +- if (!mlx5_cmd_is_down(dev)) +- return mlx5_cmd_do(dev, in, in_size, out, out_size); ++ err = mlx5_cmd_do(dev, in, in_size, out, out_size); ++ /* If FW is gone (-ENXIO), proceed to forceful reclaim */ ++ if (err != -ENXIO) ++ return err; + + /* No hard feelings, we want our pages back! */ + npages = MLX5_GET(manage_pages_in, in, input_num_entries); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1563-net-mlx5-fw-reset-add-reset-timeout-work.patch b/SOURCES/1563-net-mlx5-fw-reset-add-reset-timeout-work.patch new file mode 100644 index 000000000..d292f8731 --- /dev/null +++ b/SOURCES/1563-net-mlx5-fw-reset-add-reset-timeout-work.patch @@ -0,0 +1,99 @@ +From 4f09ee86bf6f20ac7470200bfb81f70ef9f10779 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:30 -0400 +Subject: [PATCH] net/mlx5: fw reset, add reset timeout work + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 5cfbe7ebfa42fd3c517a701dab5bd73524da9088 +Author: Moshe Shemesh +Date: Mon Sep 29 00:02:09 2025 +0300 + + net/mlx5: fw reset, add reset timeout work + + Add sync reset timeout to stop poll_sync_reset in case there was no + reset done or abort event within timeout. Otherwise poll sync reset will + just continue and in case of fw fatal error no health reporting will be + done. 
+ + Fixes: 38b9f903f22b ("net/mlx5: Handle sync reset request event") + Signed-off-by: Moshe Shemesh + Reviewed-by: Shay Drori + Signed-off-by: Tariq Toukan + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c +index 22995131824a..89e399606877 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c +@@ -27,6 +27,7 @@ struct mlx5_fw_reset { + struct work_struct reset_reload_work; + struct work_struct reset_now_work; + struct work_struct reset_abort_work; ++ struct delayed_work reset_timeout_work; + unsigned long reset_flags; + u8 reset_method; + struct timer_list timer; +@@ -259,6 +260,8 @@ static int mlx5_sync_reset_clear_reset_requested(struct mlx5_core_dev *dev, bool + return -EALREADY; + } + ++ if (current_work() != &fw_reset->reset_timeout_work.work) ++ cancel_delayed_work(&fw_reset->reset_timeout_work); + mlx5_stop_sync_reset_poll(dev); + if (poll_health) + mlx5_start_health_poll(dev); +@@ -330,6 +333,11 @@ static int mlx5_sync_reset_set_reset_requested(struct mlx5_core_dev *dev) + } + mlx5_stop_health_poll(dev, true); + mlx5_start_sync_reset_poll(dev); ++ ++ if (!test_bit(MLX5_FW_RESET_FLAGS_DROP_NEW_REQUESTS, ++ &fw_reset->reset_flags)) ++ schedule_delayed_work(&fw_reset->reset_timeout_work, ++ msecs_to_jiffies(mlx5_tout_ms(dev, PCI_SYNC_UPDATE))); + return 0; + } + +@@ -739,6 +747,19 @@ static void mlx5_sync_reset_events_handle(struct mlx5_fw_reset *fw_reset, struct + } + } + ++static void mlx5_sync_reset_timeout_work(struct work_struct *work) ++{ ++ struct delayed_work *dwork = container_of(work, struct delayed_work, ++ work); ++ struct mlx5_fw_reset *fw_reset = ++ container_of(dwork, struct mlx5_fw_reset, reset_timeout_work); ++ struct mlx5_core_dev *dev = fw_reset->dev; ++ ++ if (mlx5_sync_reset_clear_reset_requested(dev, true)) ++ return; ++ mlx5_core_warn(dev, 
"PCI Sync FW Update Reset Timeout.\n"); ++} ++ + static int fw_reset_event_notifier(struct notifier_block *nb, unsigned long action, void *data) + { + struct mlx5_fw_reset *fw_reset = mlx5_nb_cof(nb, struct mlx5_fw_reset, nb); +@@ -822,6 +843,7 @@ void mlx5_drain_fw_reset(struct mlx5_core_dev *dev) + cancel_work_sync(&fw_reset->reset_reload_work); + cancel_work_sync(&fw_reset->reset_now_work); + cancel_work_sync(&fw_reset->reset_abort_work); ++ cancel_delayed_work(&fw_reset->reset_timeout_work); + } + + static const struct devlink_param mlx5_fw_reset_devlink_params[] = { +@@ -865,6 +887,8 @@ int mlx5_fw_reset_init(struct mlx5_core_dev *dev) + INIT_WORK(&fw_reset->reset_reload_work, mlx5_sync_reset_reload_work); + INIT_WORK(&fw_reset->reset_now_work, mlx5_sync_reset_now_event); + INIT_WORK(&fw_reset->reset_abort_work, mlx5_sync_reset_abort_event); ++ INIT_DELAYED_WORK(&fw_reset->reset_timeout_work, ++ mlx5_sync_reset_timeout_work); + + init_completion(&fw_reset->done); + return 0; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1564-net-mlx5-improve-write-combining-test-reliability-for-arm64-.patch b/SOURCES/1564-net-mlx5-improve-write-combining-test-reliability-for-arm64-.patch new file mode 100644 index 000000000..9a3c52cc5 --- /dev/null +++ b/SOURCES/1564-net-mlx5-improve-write-combining-test-reliability-for-arm64-.patch @@ -0,0 +1,164 @@ +From f373b5ca04e4165f2d83fcb1b275b2b8d567ed67 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:31 -0400 +Subject: [PATCH] net/mlx5: Improve write-combining test reliability for ARM64 + Grace CPUs + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit fd8c8216648cd8c047bd3bcad65424ed44b5b450 +Author: Patrisious Haddad +Date: Mon Sep 29 00:08:08 2025 +0300 + + net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs + + Write combining is an optimization feature in CPUs that is frequently + used by modern devices to generate 32 or 64 byte TLPs at the PCIe level. 
+ These large TLPs allow certain optimizations in the driver to HW + communication that improve performance. As WC is unpredictable and + optional the HW designs all tolerate cases where combining doesn't + happen and simply experience a performance degradation. + + Unfortunately many virtualization environments on all architectures have + done things that completely disable WC inside the VM with no generic way + to detect this. For example WC was fully blocked in ARM64 KVM until + commit 8c47ce3e1d2c ("KVM: arm64: Set io memory s2 pte as normalnc for + vfio pci device"). + + Trying to use WC when it is known not to work has a measurable + performance cost (~5%). Long ago mlx5 developed an boot time algorithm + to test if WC is available or not by using unique mlx5 HW features to + measure how many large TLPs the device is receiving. The SW generates a + large number of combining opportunities and if any succeed then WC is + declared working. + + In mlx5 the WC optimization feature is never used by the kernel except + for the boot time test. The WC is only used by userspace in rdma-core. + + Sadly modern ARM CPUs, especially NVIDIA Grace, have a combining + implementation that is very unreliable compared to pretty much + everything prior. This is being fixed architecturally in new CPUs with a + new ST64B instruction, but current shipping devices suffer this problem. + + Unreliable means the SW can present thousands of combining opportunities + and the HW will not combine for any of them, which creates a performance + degradation, and critically fails the mlx5 boot test. However, the CPU + is very sensitive to the instruction sequence used, with the better + options being sufficiently good that the performance loss from the + unreliable CPU is not measurable. + + Broadly there are several options, from worst to best: + 1) A C loop doing a u64 memcpy. 
+ This was used prior to commit ef302283ddfc + ("IB/mlx5: Use __iowrite64_copy() for write combining stores") + and failed almost all the time on Grace CPUs. + + 2) ARM64 assembly with consecutive 8 byte stores. This was implemented + as an arch-generic __iowriteXX_copy() family of functions suitable + for performance use in drivers for WC. commit ead79118dae6 + ("arm64/io: Provide a WC friendly __iowriteXX_copy()") provided the + ARM implementation. + + 3) ARM64 assembly with consecutive 16 byte stores. This was rejected + from kernel use over fears of virtualization failures. Common ARM + VMMs will crash if STP is used against emulated memory. + + 4) A single NEON store instruction. Userspace has used this option for a + very long time, it performs well. + + 5) For future silicon the new ST64B instruction is guaranteed to + generate a 64 byte TLP 100% of the time + + The past upgrade from #1 to #2 was thought to be sufficient to solve + this problem. However, more testing on more systems shows that #3 is + still problematic at a low frequency and the kernel test fails. + + Thus, make the mlx5 use the same instructions as userspace during the + boot time WC self test. This way the WC test matches the userspace and + will properly detect the ability of HW to support the WC workload that + userspace will generate. While #4 still has imperfect combining + performance, it is substantially better than #2, and does actually give + a performance win to applications. Self-test failures with #2 are like + 3/10 boots, on some systems, #4 has never seen a boot failure. + + There is no real general use case for a NEON based WC flow in the + kernel. This is not suitable for any performance path work as getting + into/out of a NEON context is fairly expensive compared to the gain of + WC. Future CPUs are going to fix this issue by using an new ARM + instruction and __iowriteXX_copy() will be updated to use that + automatically, probably using the ALTERNATES mechanism. 
+ + Since this problem is constrained to mlx5's unique situation of needing + a non-performance code path to duplicate what mlx5 userspace is doing as + a matter of self-testing, implement it as a one line inline assembly in + the driver directly. + + Lastly, this was concluded from the discussion with ARM maintainers + which confirms that this is the best approach for the solution: + https://lore.kernel.org/r/aHqN_hpJl84T1Usi@arm.com + + Signed-off-by: Patrisious Haddad + Reviewed-by: Michael Guralnik + Reviewed-by: Moshe Shemesh + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1759093688-841357-1-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/wc.c b/drivers/net/ethernet/mellanox/mlx5/core/wc.c +index 999d6216648a..c281153bd411 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/wc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/wc.c +@@ -7,6 +7,10 @@ + #include "mlx5_core.h" + #include "wq.h" + ++#if IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && IS_ENABLED(CONFIG_ARM64) ++#include ++#endif ++ + #define TEST_WC_NUM_WQES 255 + #define TEST_WC_LOG_CQ_SZ (order_base_2(TEST_WC_NUM_WQES)) + #define TEST_WC_SQ_LOG_WQ_SZ TEST_WC_LOG_CQ_SZ +@@ -255,6 +259,27 @@ static void mlx5_wc_destroy_sq(struct mlx5_wc_sq *sq) + mlx5_wq_destroy(&sq->wq_ctrl); + } + ++static void mlx5_iowrite64_copy(struct mlx5_wc_sq *sq, __be32 mmio_wqe[16], ++ size_t mmio_wqe_size, unsigned int offset) ++{ ++#if IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && IS_ENABLED(CONFIG_ARM64) ++ if (cpu_has_neon()) { ++ kernel_neon_begin(); ++ asm volatile ++ (".arch_extension simd;\n\t" ++ "ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [%0]\n\t" ++ "st1 {v0.16b, v1.16b, v2.16b, v3.16b}, [%1]" ++ : ++ : "r"(mmio_wqe), "r"(sq->bfreg.map + offset) ++ : "memory", "v0", "v1", "v2", "v3"); ++ kernel_neon_end(); ++ return; ++ } ++#endif ++ __iowrite64_copy(sq->bfreg.map + offset, mmio_wqe, ++ mmio_wqe_size / 8); ++} ++ + 
static void mlx5_wc_post_nop(struct mlx5_wc_sq *sq, unsigned int *offset, + bool signaled) + { +@@ -289,8 +314,7 @@ static void mlx5_wc_post_nop(struct mlx5_wc_sq *sq, unsigned int *offset, + */ + wmb(); + +- __iowrite64_copy(sq->bfreg.map + *offset, mmio_wqe, +- sizeof(mmio_wqe) / 8); ++ mlx5_iowrite64_copy(sq, mmio_wqe, sizeof(mmio_wqe), *offset); + + *offset ^= buf_size; + } +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1565-net-mlx5-hws-generalize-complex-matchers.patch b/SOURCES/1565-net-mlx5-hws-generalize-complex-matchers.patch new file mode 100644 index 000000000..2c6d9e62e --- /dev/null +++ b/SOURCES/1565-net-mlx5-hws-generalize-complex-matchers.patch @@ -0,0 +1,2553 @@ +From 64ec2870c12373c3f1c66188d1ae32408554f918 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:31 -0400 +Subject: [PATCH] net/mlx5: HWS, Generalize complex matchers + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 906154caa7d3d750d47cd18f9349b75b77e12854 +Author: Vlad Dogaru +Date: Mon Sep 29 00:25:17 2025 +0300 + + net/mlx5: HWS, Generalize complex matchers + + The existing solution of complex matchers splits the match parameters + across two, and exactly two, matchers. For some rather extreme cases + (e.g. IPv6-in-IPv6 tunnels), even two matchers are not enough. + + Generalize complex matchers to up to 4 submatchers, and allow easy + extension to more if needed. This resulted in rewriting a large part + of the high-level complex matchers logic, but the original concepts + were rock solid and still hold. + + Key characteristics of the new implementation: + + * Rework complex matchers to include multiple submatchers. All + submatchers but the first are isolated, in keeping with the existing + paradigm of handing off to specialized matchers that are not otherwise + reachable by regular rules. + + * Similarly, rework complex rules to allow splitting them into more than + two simple rules. 
Rules continue to be refcounted to allow for + multiple complex rules matching on identical parts of the match + params. + + * Rely on the match tag, as opposed to the entire match_param, to hash + subrules. This results in lower memory usage. + + * Prefer to split the original user-supplied match parameters rather + than the internal field descriptors. This avoids the awkward + transition back and forth between the two formats. + + * Allow splitting multi-dword fields across matchers. The only + restrictions that the new implementation impose are: a) any fragment + of an IP address must be accompanied by a match on the IP version; and + b) a single lower dword of an IPv6 address cannot be present in a + submatcher as it would be interpreted as an IPv4 address. + + * Employ a greedy algorithm to split the match params, as opposed to + complete search. The results are not optimal, but the algorithm is now + linear compared to exponential. Consequently, we see complex matcher + creation time drops two orders of magnitude in our tests. 
+ + Signed-off-by: Vlad Dogaru + Signed-off-by: Yevgeny Kliteynik + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1759094723-843774-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +index adeccc588e5d..6ef0c4be27e1 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c +@@ -51,9 +51,6 @@ static void hws_bwc_matcher_init_attr(struct mlx5hws_bwc_matcher *bwc_matcher, + u8 size_log_rx, u8 size_log_tx, + struct mlx5hws_matcher_attr *attr) + { +- struct mlx5hws_bwc_matcher *first_matcher = +- bwc_matcher->complex_first_bwc_matcher; +- + memset(attr, 0, sizeof(*attr)); + + attr->priority = priority; +@@ -66,9 +63,6 @@ static void hws_bwc_matcher_init_attr(struct mlx5hws_bwc_matcher *bwc_matcher, + attr->size[MLX5HWS_MATCHER_SIZE_TYPE_TX].rule.num_log = size_log_tx; + attr->resizable = true; + attr->max_num_of_at_attach = MLX5HWS_BWC_MATCHER_ATTACH_AT_NUM; +- +- attr->isolated_matcher_end_ft_id = +- first_matcher ? 
first_matcher->matcher->end_ft_id : 0; + } + + static int +@@ -171,10 +165,16 @@ hws_bwc_matcher_move_all_simple(struct mlx5hws_bwc_matcher *bwc_matcher) + + static int hws_bwc_matcher_move_all(struct mlx5hws_bwc_matcher *bwc_matcher) + { +- if (!bwc_matcher->complex) ++ switch (bwc_matcher->matcher_type) { ++ case MLX5HWS_BWC_MATCHER_SIMPLE: + return hws_bwc_matcher_move_all_simple(bwc_matcher); +- +- return mlx5hws_bwc_matcher_move_all_complex(bwc_matcher); ++ case MLX5HWS_BWC_MATCHER_COMPLEX_FIRST: ++ return mlx5hws_bwc_matcher_complex_move_first(bwc_matcher); ++ case MLX5HWS_BWC_MATCHER_COMPLEX_SUBMATCHER: ++ return mlx5hws_bwc_matcher_complex_move(bwc_matcher); ++ default: ++ return -EINVAL; ++ } + } + + static int hws_bwc_matcher_move(struct mlx5hws_bwc_matcher *bwc_matcher) +@@ -249,6 +249,7 @@ int mlx5hws_bwc_matcher_create_simple(struct mlx5hws_bwc_matcher *bwc_matcher, + bwc_matcher->tx_size.size_log, + &attr); + ++ bwc_matcher->matcher_type = MLX5HWS_BWC_MATCHER_SIMPLE; + bwc_matcher->priority = priority; + + bwc_matcher->size_of_at_array = MLX5HWS_BWC_MATCHER_ATTACH_AT_NUM; +@@ -393,7 +394,7 @@ int mlx5hws_bwc_matcher_destroy(struct mlx5hws_bwc_matcher *bwc_matcher) + "BWC matcher destroy: matcher still has %u RX and %u TX rules\n", + rx_rules, tx_rules); + +- if (bwc_matcher->complex) ++ if (bwc_matcher->matcher_type == MLX5HWS_BWC_MATCHER_COMPLEX_FIRST) + mlx5hws_bwc_matcher_destroy_complex(bwc_matcher); + else + mlx5hws_bwc_matcher_destroy_simple(bwc_matcher); +@@ -651,7 +652,8 @@ int mlx5hws_bwc_rule_destroy_simple(struct mlx5hws_bwc_rule *bwc_rule) + + int mlx5hws_bwc_rule_destroy(struct mlx5hws_bwc_rule *bwc_rule) + { +- bool is_complex = !!bwc_rule->bwc_matcher->complex; ++ bool is_complex = bwc_rule->bwc_matcher->matcher_type == ++ MLX5HWS_BWC_MATCHER_COMPLEX_FIRST; + int ret = 0; + + if (is_complex) +@@ -1147,7 +1149,7 @@ mlx5hws_bwc_rule_create(struct mlx5hws_bwc_matcher *bwc_matcher, + + bwc_queue_idx = hws_bwc_gen_queue_idx(ctx); + +- if 
(bwc_matcher->complex) ++ if (bwc_matcher->matcher_type == MLX5HWS_BWC_MATCHER_COMPLEX_FIRST) + ret = mlx5hws_bwc_rule_create_complex(bwc_rule, + params, + flow_source, +@@ -1216,10 +1218,9 @@ int mlx5hws_bwc_rule_action_update(struct mlx5hws_bwc_rule *bwc_rule, + return -EINVAL; + } + +- /* For complex rule, the update should happen on the second matcher */ +- if (bwc_rule->isolated_bwc_rule) +- return hws_bwc_rule_action_update(bwc_rule->isolated_bwc_rule, +- rule_actions); +- else +- return hws_bwc_rule_action_update(bwc_rule, rule_actions); ++ /* For complex rules, the update should happen on the last subrule. */ ++ while (bwc_rule->next_subrule) ++ bwc_rule = bwc_rule->next_subrule; ++ ++ return hws_bwc_rule_action_update(bwc_rule, rule_actions); + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h +index af391d70c14f..b905511f5c53 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h +@@ -18,6 +18,21 @@ + + #define MLX5HWS_BWC_POLLING_TIMEOUT 60 + ++enum mlx5hws_bwc_matcher_type { ++ /* Standalone bwc matcher. */ ++ MLX5HWS_BWC_MATCHER_SIMPLE, ++ /* The first matcher of a complex matcher. When rules are inserted into ++ * a matcher of this type, they are split into subrules and inserted ++ * into their corresponding submatchers. ++ */ ++ MLX5HWS_BWC_MATCHER_COMPLEX_FIRST, ++ /* A submatcher that is part of a complex matcher. For most purposes ++ * these are treated as simple matchers, except when it comes to moving ++ * rules during resize. 
++ */ ++ MLX5HWS_BWC_MATCHER_COMPLEX_SUBMATCHER, ++}; ++ + struct mlx5hws_bwc_matcher_complex_data; + + struct mlx5hws_bwc_matcher_size { +@@ -31,9 +46,9 @@ struct mlx5hws_bwc_matcher { + struct mlx5hws_match_template *mt; + struct mlx5hws_action_template **at; + struct mlx5hws_bwc_matcher_complex_data *complex; +- struct mlx5hws_bwc_matcher *complex_first_bwc_matcher; + u8 num_of_at; + u8 size_of_at_array; ++ enum mlx5hws_bwc_matcher_type matcher_type; + u32 priority; + struct mlx5hws_bwc_matcher_size rx_size; + struct mlx5hws_bwc_matcher_size tx_size; +@@ -43,8 +58,8 @@ struct mlx5hws_bwc_matcher { + struct mlx5hws_bwc_rule { + struct mlx5hws_bwc_matcher *bwc_matcher; + struct mlx5hws_rule *rule; +- struct mlx5hws_bwc_rule *isolated_bwc_rule; +- struct mlx5hws_bwc_complex_rule_hash_node *complex_hash_node; ++ struct mlx5hws_bwc_rule *next_subrule; ++ struct mlx5hws_bwc_complex_subrule_data *subrule_data; + u32 flow_source; + u16 bwc_queue_idx; + bool skip_rx; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.c +index 14e79579c719..660630f18ce9 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.c +@@ -3,25 +3,27 @@ + + #include "internal.h" + +-#define HWS_CLEAR_MATCH_PARAM(mask, field) \ +- MLX5_SET(fte_match_param, (mask)->match_buf, field, 0) +- +-#define HWS_SZ_MATCH_PARAM (MLX5_ST_SZ_DW_MATCH_PARAM * 4) +- +-static const struct rhashtable_params hws_refcount_hash = { +- .key_len = sizeof_field(struct mlx5hws_bwc_complex_rule_hash_node, +- match_buf), +- .key_offset = offsetof(struct mlx5hws_bwc_complex_rule_hash_node, +- match_buf), +- .head_offset = offsetof(struct mlx5hws_bwc_complex_rule_hash_node, +- hash_node), +- .automatic_shrinking = true, +- .min_size = 1, ++/* We chain submatchers by applying three rules on a subrule: modify header (to ++ * set register 
C6), jump to table (to the next submatcher) and the mandatory ++ * last rule. ++ */ ++#define HWS_NUM_CHAIN_ACTIONS 3 ++ ++static const struct rhashtable_params hws_rules_hash_params = { ++ .key_len = sizeof_field(struct mlx5hws_bwc_complex_subrule_data, ++ match_tag), ++ .key_offset = ++ offsetof(struct mlx5hws_bwc_complex_subrule_data, match_tag), ++ .head_offset = ++ offsetof(struct mlx5hws_bwc_complex_subrule_data, hash_node), ++ .automatic_shrinking = true, .min_size = 1, + }; + +-bool mlx5hws_bwc_match_params_is_complex(struct mlx5hws_context *ctx, +- u8 match_criteria_enable, +- struct mlx5hws_match_parameters *mask) ++static bool ++hws_match_params_exceeds_definer(struct mlx5hws_context *ctx, ++ u8 match_criteria_enable, ++ struct mlx5hws_match_parameters *mask, ++ bool allow_jumbo) + { + struct mlx5hws_definer match_layout = {0}; + struct mlx5hws_match_template *mt; +@@ -36,11 +38,11 @@ bool mlx5hws_bwc_match_params_is_complex(struct mlx5hws_context *ctx, + mask->match_sz, + match_criteria_enable); + if (!mt) { +- mlx5hws_err(ctx, "BWC: failed creating match template\n"); ++ mlx5hws_err(ctx, "Complex matcher: failed creating match template\n"); + return false; + } + +- ret = mlx5hws_definer_calc_layout(ctx, mt, &match_layout); ++ ret = mlx5hws_definer_calc_layout(ctx, mt, &match_layout, allow_jumbo); + if (ret) { + /* The only case that we're interested in is E2BIG, + * which means that the match parameters need to be +@@ -64,825 +66,481 @@ bool mlx5hws_bwc_match_params_is_complex(struct mlx5hws_context *ctx, + return is_complex; + } + +-static void +-hws_bwc_matcher_complex_params_clear_fld(struct mlx5hws_context *ctx, +- enum mlx5hws_definer_fname fname, ++bool mlx5hws_bwc_match_params_is_complex(struct mlx5hws_context *ctx, ++ u8 match_criteria_enable, + struct mlx5hws_match_parameters *mask) + { +- struct mlx5hws_cmd_query_caps *caps = ctx->caps; +- +- switch (fname) { +- case MLX5HWS_DEFINER_FNAME_ETH_TYPE_O: +- case MLX5HWS_DEFINER_FNAME_ETH_TYPE_I: 
+- case MLX5HWS_DEFINER_FNAME_ETH_L3_TYPE_O: +- case MLX5HWS_DEFINER_FNAME_ETH_L3_TYPE_I: +- case MLX5HWS_DEFINER_FNAME_IP_VERSION_O: +- case MLX5HWS_DEFINER_FNAME_IP_VERSION_I: +- /* Because of the strict requirements for IP address matching +- * that require ethtype/ip_version matching as well, don't clear +- * these fields - have them in both parts of the complex matcher +- */ +- break; +- case MLX5HWS_DEFINER_FNAME_ETH_SMAC_47_16_O: +- HWS_CLEAR_MATCH_PARAM(mask, outer_headers.smac_47_16); +- break; +- case MLX5HWS_DEFINER_FNAME_ETH_SMAC_47_16_I: +- HWS_CLEAR_MATCH_PARAM(mask, inner_headers.smac_47_16); +- break; +- case MLX5HWS_DEFINER_FNAME_ETH_SMAC_15_0_O: +- HWS_CLEAR_MATCH_PARAM(mask, outer_headers.smac_15_0); +- break; +- case MLX5HWS_DEFINER_FNAME_ETH_SMAC_15_0_I: +- HWS_CLEAR_MATCH_PARAM(mask, inner_headers.smac_15_0); +- break; +- case MLX5HWS_DEFINER_FNAME_ETH_DMAC_47_16_O: +- HWS_CLEAR_MATCH_PARAM(mask, outer_headers.dmac_47_16); +- break; +- case MLX5HWS_DEFINER_FNAME_ETH_DMAC_47_16_I: +- HWS_CLEAR_MATCH_PARAM(mask, inner_headers.dmac_47_16); +- break; +- case MLX5HWS_DEFINER_FNAME_ETH_DMAC_15_0_O: +- HWS_CLEAR_MATCH_PARAM(mask, outer_headers.dmac_15_0); +- break; +- case MLX5HWS_DEFINER_FNAME_ETH_DMAC_15_0_I: +- HWS_CLEAR_MATCH_PARAM(mask, inner_headers.dmac_15_0); +- break; +- case MLX5HWS_DEFINER_FNAME_VLAN_TYPE_O: +- HWS_CLEAR_MATCH_PARAM(mask, outer_headers.cvlan_tag); +- HWS_CLEAR_MATCH_PARAM(mask, outer_headers.svlan_tag); +- break; +- case MLX5HWS_DEFINER_FNAME_VLAN_TYPE_I: +- HWS_CLEAR_MATCH_PARAM(mask, inner_headers.cvlan_tag); +- HWS_CLEAR_MATCH_PARAM(mask, inner_headers.svlan_tag); +- break; +- case MLX5HWS_DEFINER_FNAME_VLAN_FIRST_PRIO_O: +- HWS_CLEAR_MATCH_PARAM(mask, outer_headers.first_prio); +- break; +- case MLX5HWS_DEFINER_FNAME_VLAN_FIRST_PRIO_I: +- HWS_CLEAR_MATCH_PARAM(mask, inner_headers.first_prio); +- break; +- case MLX5HWS_DEFINER_FNAME_VLAN_CFI_O: +- HWS_CLEAR_MATCH_PARAM(mask, outer_headers.first_cfi); +- break; +- case 
MLX5HWS_DEFINER_FNAME_VLAN_CFI_I: +- HWS_CLEAR_MATCH_PARAM(mask, inner_headers.first_cfi); +- break; +- case MLX5HWS_DEFINER_FNAME_VLAN_ID_O: +- HWS_CLEAR_MATCH_PARAM(mask, outer_headers.first_vid); +- break; +- case MLX5HWS_DEFINER_FNAME_VLAN_ID_I: +- HWS_CLEAR_MATCH_PARAM(mask, inner_headers.first_vid); +- break; +- case MLX5HWS_DEFINER_FNAME_VLAN_SECOND_TYPE_O: +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters.outer_second_cvlan_tag); +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters.outer_second_svlan_tag); +- break; +- case MLX5HWS_DEFINER_FNAME_VLAN_SECOND_TYPE_I: +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters.inner_second_cvlan_tag); +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters.inner_second_svlan_tag); +- break; +- case MLX5HWS_DEFINER_FNAME_VLAN_SECOND_PRIO_O: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.outer_second_prio); +- break; +- case MLX5HWS_DEFINER_FNAME_VLAN_SECOND_PRIO_I: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.inner_second_prio); +- break; +- case MLX5HWS_DEFINER_FNAME_VLAN_SECOND_CFI_O: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.outer_second_cfi); +- break; +- case MLX5HWS_DEFINER_FNAME_VLAN_SECOND_CFI_I: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.inner_second_cfi); +- break; +- case MLX5HWS_DEFINER_FNAME_VLAN_SECOND_ID_O: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.outer_second_vid); +- break; +- case MLX5HWS_DEFINER_FNAME_VLAN_SECOND_ID_I: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.inner_second_vid); +- break; +- case MLX5HWS_DEFINER_FNAME_IPV4_IHL_O: +- HWS_CLEAR_MATCH_PARAM(mask, outer_headers.ipv4_ihl); +- break; +- case MLX5HWS_DEFINER_FNAME_IPV4_IHL_I: +- HWS_CLEAR_MATCH_PARAM(mask, inner_headers.ipv4_ihl); +- break; +- case MLX5HWS_DEFINER_FNAME_IP_DSCP_O: +- HWS_CLEAR_MATCH_PARAM(mask, outer_headers.ip_dscp); +- break; +- case MLX5HWS_DEFINER_FNAME_IP_DSCP_I: +- HWS_CLEAR_MATCH_PARAM(mask, inner_headers.ip_dscp); +- break; +- case MLX5HWS_DEFINER_FNAME_IP_ECN_O: +- HWS_CLEAR_MATCH_PARAM(mask, 
outer_headers.ip_ecn); +- break; +- case MLX5HWS_DEFINER_FNAME_IP_ECN_I: +- HWS_CLEAR_MATCH_PARAM(mask, inner_headers.ip_ecn); +- break; +- case MLX5HWS_DEFINER_FNAME_IP_TTL_O: +- HWS_CLEAR_MATCH_PARAM(mask, outer_headers.ttl_hoplimit); +- break; +- case MLX5HWS_DEFINER_FNAME_IP_TTL_I: +- HWS_CLEAR_MATCH_PARAM(mask, inner_headers.ttl_hoplimit); +- break; +- case MLX5HWS_DEFINER_FNAME_IPV4_DST_O: +- HWS_CLEAR_MATCH_PARAM(mask, +- outer_headers.dst_ipv4_dst_ipv6.ipv6_simple_layout.ipv6_31_0); +- break; +- case MLX5HWS_DEFINER_FNAME_IPV4_SRC_O: +- HWS_CLEAR_MATCH_PARAM(mask, +- outer_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_31_0); +- break; +- case MLX5HWS_DEFINER_FNAME_IPV4_DST_I: +- HWS_CLEAR_MATCH_PARAM(mask, +- inner_headers.dst_ipv4_dst_ipv6.ipv6_simple_layout.ipv6_31_0); +- break; +- case MLX5HWS_DEFINER_FNAME_IPV4_SRC_I: +- HWS_CLEAR_MATCH_PARAM(mask, +- inner_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_31_0); +- break; +- case MLX5HWS_DEFINER_FNAME_IP_FRAG_O: +- HWS_CLEAR_MATCH_PARAM(mask, outer_headers.frag); +- break; +- case MLX5HWS_DEFINER_FNAME_IP_FRAG_I: +- HWS_CLEAR_MATCH_PARAM(mask, inner_headers.frag); +- break; +- case MLX5HWS_DEFINER_FNAME_IPV6_FLOW_LABEL_O: +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters.outer_ipv6_flow_label); +- break; +- case MLX5HWS_DEFINER_FNAME_IPV6_FLOW_LABEL_I: +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters.inner_ipv6_flow_label); +- break; +- case MLX5HWS_DEFINER_FNAME_IPV6_DST_127_96_O: +- case MLX5HWS_DEFINER_FNAME_IPV6_DST_95_64_O: +- case MLX5HWS_DEFINER_FNAME_IPV6_DST_63_32_O: +- case MLX5HWS_DEFINER_FNAME_IPV6_DST_31_0_O: +- HWS_CLEAR_MATCH_PARAM(mask, +- outer_headers.dst_ipv4_dst_ipv6.ipv6_simple_layout.ipv6_127_96); +- HWS_CLEAR_MATCH_PARAM(mask, +- outer_headers.dst_ipv4_dst_ipv6.ipv6_simple_layout.ipv6_95_64); +- HWS_CLEAR_MATCH_PARAM(mask, +- outer_headers.dst_ipv4_dst_ipv6.ipv6_simple_layout.ipv6_63_32); +- HWS_CLEAR_MATCH_PARAM(mask, +- 
outer_headers.dst_ipv4_dst_ipv6.ipv6_simple_layout.ipv6_31_0); +- break; +- case MLX5HWS_DEFINER_FNAME_IPV6_SRC_127_96_O: +- case MLX5HWS_DEFINER_FNAME_IPV6_SRC_95_64_O: +- case MLX5HWS_DEFINER_FNAME_IPV6_SRC_63_32_O: +- case MLX5HWS_DEFINER_FNAME_IPV6_SRC_31_0_O: +- HWS_CLEAR_MATCH_PARAM(mask, +- outer_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_127_96); +- HWS_CLEAR_MATCH_PARAM(mask, +- outer_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_95_64); +- HWS_CLEAR_MATCH_PARAM(mask, +- outer_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_63_32); +- HWS_CLEAR_MATCH_PARAM(mask, +- outer_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_31_0); +- break; +- case MLX5HWS_DEFINER_FNAME_IPV6_DST_127_96_I: +- case MLX5HWS_DEFINER_FNAME_IPV6_DST_95_64_I: +- case MLX5HWS_DEFINER_FNAME_IPV6_DST_63_32_I: +- case MLX5HWS_DEFINER_FNAME_IPV6_DST_31_0_I: +- HWS_CLEAR_MATCH_PARAM(mask, +- inner_headers.dst_ipv4_dst_ipv6.ipv6_simple_layout.ipv6_127_96); +- HWS_CLEAR_MATCH_PARAM(mask, +- inner_headers.dst_ipv4_dst_ipv6.ipv6_simple_layout.ipv6_95_64); +- HWS_CLEAR_MATCH_PARAM(mask, +- inner_headers.dst_ipv4_dst_ipv6.ipv6_simple_layout.ipv6_63_32); +- HWS_CLEAR_MATCH_PARAM(mask, +- inner_headers.dst_ipv4_dst_ipv6.ipv6_simple_layout.ipv6_31_0); +- break; +- case MLX5HWS_DEFINER_FNAME_IPV6_SRC_127_96_I: +- case MLX5HWS_DEFINER_FNAME_IPV6_SRC_95_64_I: +- case MLX5HWS_DEFINER_FNAME_IPV6_SRC_63_32_I: +- case MLX5HWS_DEFINER_FNAME_IPV6_SRC_31_0_I: +- HWS_CLEAR_MATCH_PARAM(mask, +- inner_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_127_96); +- HWS_CLEAR_MATCH_PARAM(mask, +- inner_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_95_64); +- HWS_CLEAR_MATCH_PARAM(mask, +- inner_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_63_32); +- HWS_CLEAR_MATCH_PARAM(mask, +- inner_headers.src_ipv4_src_ipv6.ipv6_simple_layout.ipv6_31_0); +- break; +- case MLX5HWS_DEFINER_FNAME_IP_PROTOCOL_O: +- HWS_CLEAR_MATCH_PARAM(mask, outer_headers.ip_protocol); +- break; +- case 
MLX5HWS_DEFINER_FNAME_IP_PROTOCOL_I: +- HWS_CLEAR_MATCH_PARAM(mask, inner_headers.ip_protocol); +- break; +- case MLX5HWS_DEFINER_FNAME_L4_SPORT_O: +- HWS_CLEAR_MATCH_PARAM(mask, outer_headers.tcp_sport); +- HWS_CLEAR_MATCH_PARAM(mask, outer_headers.udp_sport); +- break; +- case MLX5HWS_DEFINER_FNAME_L4_SPORT_I: +- HWS_CLEAR_MATCH_PARAM(mask, inner_headers.tcp_dport); +- HWS_CLEAR_MATCH_PARAM(mask, inner_headers.udp_dport); +- break; +- case MLX5HWS_DEFINER_FNAME_L4_DPORT_O: +- HWS_CLEAR_MATCH_PARAM(mask, outer_headers.tcp_dport); +- HWS_CLEAR_MATCH_PARAM(mask, outer_headers.udp_dport); +- break; +- case MLX5HWS_DEFINER_FNAME_L4_DPORT_I: +- HWS_CLEAR_MATCH_PARAM(mask, inner_headers.tcp_dport); +- HWS_CLEAR_MATCH_PARAM(mask, inner_headers.udp_dport); +- break; +- case MLX5HWS_DEFINER_FNAME_TCP_FLAGS_O: +- HWS_CLEAR_MATCH_PARAM(mask, outer_headers.tcp_flags); +- break; +- case MLX5HWS_DEFINER_FNAME_TCP_ACK_NUM: +- case MLX5HWS_DEFINER_FNAME_TCP_SEQ_NUM: +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters_3.outer_tcp_seq_num); +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters_3.outer_tcp_ack_num); +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters_3.inner_tcp_seq_num); +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters_3.inner_tcp_ack_num); +- break; +- case MLX5HWS_DEFINER_FNAME_GTP_TEID: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_3.gtpu_teid); +- break; +- case MLX5HWS_DEFINER_FNAME_GTP_MSG_TYPE: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_3.gtpu_msg_type); +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_3.gtpu_msg_flags); +- break; +- case MLX5HWS_DEFINER_FNAME_GTPU_FIRST_EXT_DW0: +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters_3.gtpu_first_ext_dw_0); +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_3.gtpu_dw_0); +- break; +- case MLX5HWS_DEFINER_FNAME_GTPU_DW2: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_3.gtpu_dw_2); +- break; +- case MLX5HWS_DEFINER_FNAME_FLEX_PARSER_0: +- case MLX5HWS_DEFINER_FNAME_FLEX_PARSER_1: +- case 
MLX5HWS_DEFINER_FNAME_FLEX_PARSER_2: +- case MLX5HWS_DEFINER_FNAME_FLEX_PARSER_3: +- case MLX5HWS_DEFINER_FNAME_FLEX_PARSER_4: +- case MLX5HWS_DEFINER_FNAME_FLEX_PARSER_5: +- case MLX5HWS_DEFINER_FNAME_FLEX_PARSER_6: +- case MLX5HWS_DEFINER_FNAME_FLEX_PARSER_7: +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters_2.outer_first_mpls_over_gre); +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters_2.outer_first_mpls_over_udp); +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters_3.geneve_tlv_option_0_data); +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters_4.prog_sample_field_id_0); +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters_4.prog_sample_field_value_0); +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters_4.prog_sample_field_value_1); +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters_4.prog_sample_field_id_2); +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters_4.prog_sample_field_value_2); +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters_4.prog_sample_field_id_3); +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters_4.prog_sample_field_value_3); +- break; +- case MLX5HWS_DEFINER_FNAME_VXLAN_VNI: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.vxlan_vni); +- break; +- case MLX5HWS_DEFINER_FNAME_VXLAN_GPE_FLAGS: +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters_3.outer_vxlan_gpe_flags); +- break; +- case MLX5HWS_DEFINER_FNAME_VXLAN_GPE_RSVD0: +- break; +- case MLX5HWS_DEFINER_FNAME_VXLAN_GPE_PROTO: +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters_3.outer_vxlan_gpe_next_protocol); +- break; +- case MLX5HWS_DEFINER_FNAME_VXLAN_GPE_VNI: +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters_3.outer_vxlan_gpe_vni); +- break; +- case MLX5HWS_DEFINER_FNAME_GENEVE_OPT_LEN: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.geneve_opt_len); +- break; +- case MLX5HWS_DEFINER_FNAME_GENEVE_OAM: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.geneve_oam); +- break; +- case MLX5HWS_DEFINER_FNAME_GENEVE_PROTO: +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters.geneve_protocol_type); +- break; +- case 
MLX5HWS_DEFINER_FNAME_GENEVE_VNI: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.geneve_vni); +- break; +- case MLX5HWS_DEFINER_FNAME_SOURCE_QP: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.source_sqn); +- break; +- case MLX5HWS_DEFINER_FNAME_SOURCE_GVMI: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.source_port); +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters.source_eswitch_owner_vhca_id); +- break; +- case MLX5HWS_DEFINER_FNAME_REG_0: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_2.metadata_reg_c_0); +- break; +- case MLX5HWS_DEFINER_FNAME_REG_1: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_2.metadata_reg_c_1); +- break; +- case MLX5HWS_DEFINER_FNAME_REG_2: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_2.metadata_reg_c_2); +- break; +- case MLX5HWS_DEFINER_FNAME_REG_3: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_2.metadata_reg_c_3); +- break; +- case MLX5HWS_DEFINER_FNAME_REG_4: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_2.metadata_reg_c_4); +- break; +- case MLX5HWS_DEFINER_FNAME_REG_5: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_2.metadata_reg_c_5); +- break; +- case MLX5HWS_DEFINER_FNAME_REG_7: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_2.metadata_reg_c_7); +- break; +- case MLX5HWS_DEFINER_FNAME_REG_A: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_2.metadata_reg_a); +- break; +- case MLX5HWS_DEFINER_FNAME_GRE_C: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.gre_c_present); +- break; +- case MLX5HWS_DEFINER_FNAME_GRE_K: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.gre_k_present); +- break; +- case MLX5HWS_DEFINER_FNAME_GRE_S: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.gre_s_present); +- break; +- case MLX5HWS_DEFINER_FNAME_GRE_PROTOCOL: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.gre_protocol); +- break; +- case MLX5HWS_DEFINER_FNAME_GRE_OPT_KEY: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters.gre_key.key); +- break; +- case MLX5HWS_DEFINER_FNAME_ICMP_DW1: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_3.icmp_header_data); +- 
HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_3.icmp_type); +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_3.icmp_code); +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters_3.icmpv6_header_data); +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_3.icmpv6_type); +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_3.icmpv6_code); +- break; +- case MLX5HWS_DEFINER_FNAME_MPLS0_O: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_2.outer_first_mpls); +- break; +- case MLX5HWS_DEFINER_FNAME_MPLS0_I: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_2.inner_first_mpls); +- break; +- case MLX5HWS_DEFINER_FNAME_TNL_HDR_0: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_5.tunnel_header_0); +- break; +- case MLX5HWS_DEFINER_FNAME_TNL_HDR_1: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_5.tunnel_header_1); +- break; +- case MLX5HWS_DEFINER_FNAME_TNL_HDR_2: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_5.tunnel_header_2); +- break; +- case MLX5HWS_DEFINER_FNAME_TNL_HDR_3: +- HWS_CLEAR_MATCH_PARAM(mask, misc_parameters_5.tunnel_header_3); +- break; +- case MLX5HWS_DEFINER_FNAME_FLEX_PARSER0_OK: +- case MLX5HWS_DEFINER_FNAME_FLEX_PARSER1_OK: +- case MLX5HWS_DEFINER_FNAME_FLEX_PARSER2_OK: +- case MLX5HWS_DEFINER_FNAME_FLEX_PARSER3_OK: +- case MLX5HWS_DEFINER_FNAME_FLEX_PARSER4_OK: +- case MLX5HWS_DEFINER_FNAME_FLEX_PARSER5_OK: +- case MLX5HWS_DEFINER_FNAME_FLEX_PARSER6_OK: +- case MLX5HWS_DEFINER_FNAME_FLEX_PARSER7_OK: +- /* assuming this is flex parser for geneve option */ +- if ((fname == MLX5HWS_DEFINER_FNAME_FLEX_PARSER0_OK && +- ctx->caps->flex_parser_id_geneve_tlv_option_0 != 0) || +- (fname == MLX5HWS_DEFINER_FNAME_FLEX_PARSER1_OK && +- ctx->caps->flex_parser_id_geneve_tlv_option_0 != 1) || +- (fname == MLX5HWS_DEFINER_FNAME_FLEX_PARSER2_OK && +- ctx->caps->flex_parser_id_geneve_tlv_option_0 != 2) || +- (fname == MLX5HWS_DEFINER_FNAME_FLEX_PARSER3_OK && +- ctx->caps->flex_parser_id_geneve_tlv_option_0 != 3) || +- (fname == MLX5HWS_DEFINER_FNAME_FLEX_PARSER4_OK && +- 
ctx->caps->flex_parser_id_geneve_tlv_option_0 != 4) || +- (fname == MLX5HWS_DEFINER_FNAME_FLEX_PARSER5_OK && +- ctx->caps->flex_parser_id_geneve_tlv_option_0 != 5) || +- (fname == MLX5HWS_DEFINER_FNAME_FLEX_PARSER6_OK && +- ctx->caps->flex_parser_id_geneve_tlv_option_0 != 6) || +- (fname == MLX5HWS_DEFINER_FNAME_FLEX_PARSER7_OK && +- ctx->caps->flex_parser_id_geneve_tlv_option_0 != 7)) { +- mlx5hws_err(ctx, +- "Complex params: unsupported field %s (%d), flex parser ID for geneve is %d\n", +- mlx5hws_definer_fname_to_str(fname), fname, +- caps->flex_parser_id_geneve_tlv_option_0); +- break; +- } +- HWS_CLEAR_MATCH_PARAM(mask, +- misc_parameters.geneve_tlv_option_0_exist); +- break; +- case MLX5HWS_DEFINER_FNAME_REG_6: +- default: +- mlx5hws_err(ctx, "Complex params: unsupported field %s (%d)\n", +- mlx5hws_definer_fname_to_str(fname), fname); +- break; +- } ++ return hws_match_params_exceeds_definer(ctx, match_criteria_enable, ++ mask, true); + } + +-static bool +-hws_bwc_matcher_complex_params_comb_is_valid(struct mlx5hws_definer_fc *fc, +- int fc_sz, +- u32 combination_num) ++static int ++hws_get_last_set_dword_idx(const struct mlx5hws_match_parameters *mask) + { +- bool m1[MLX5HWS_DEFINER_FNAME_MAX] = {0}; +- bool m2[MLX5HWS_DEFINER_FNAME_MAX] = {0}; +- bool is_first_matcher; + int i; + +- for (i = 0; i < fc_sz; i++) { +- is_first_matcher = !(combination_num & BIT(i)); +- if (is_first_matcher) +- m1[fc[i].fname] = true; +- else +- m2[fc[i].fname] = true; +- } +- +- /* Not all the fields can be split into separate matchers. +- * Some should be together on the same matcher. +- * For example, IPv6 parts - the whole IPv6 address should be on the +- * same matcher in order for us to deduce if it's IPv6 or IPv4 address. 
+- */ +- if (m1[MLX5HWS_DEFINER_FNAME_IP_FRAG_O] && +- (m2[MLX5HWS_DEFINER_FNAME_ETH_SMAC_15_0_O] || +- m2[MLX5HWS_DEFINER_FNAME_ETH_SMAC_47_16_O] || +- m2[MLX5HWS_DEFINER_FNAME_ETH_DMAC_15_0_O] || +- m2[MLX5HWS_DEFINER_FNAME_ETH_DMAC_47_16_O])) +- return false; +- +- if (m2[MLX5HWS_DEFINER_FNAME_IP_FRAG_O] && +- (m1[MLX5HWS_DEFINER_FNAME_ETH_SMAC_15_0_O] || +- m1[MLX5HWS_DEFINER_FNAME_ETH_SMAC_47_16_O] || +- m1[MLX5HWS_DEFINER_FNAME_ETH_DMAC_15_0_O] || +- m1[MLX5HWS_DEFINER_FNAME_ETH_DMAC_47_16_O])) +- return false; ++ for (i = mask->match_sz / 4 - 1; i >= 0; i--) ++ if (mask->match_buf[i]) ++ return i; + +- if (m1[MLX5HWS_DEFINER_FNAME_IP_FRAG_I] && +- (m2[MLX5HWS_DEFINER_FNAME_ETH_SMAC_47_16_I] || +- m2[MLX5HWS_DEFINER_FNAME_ETH_SMAC_15_0_I] || +- m2[MLX5HWS_DEFINER_FNAME_ETH_DMAC_47_16_I] || +- m2[MLX5HWS_DEFINER_FNAME_ETH_DMAC_15_0_I])) +- return false; ++ return -1; ++} + +- if (m2[MLX5HWS_DEFINER_FNAME_IP_FRAG_I] && +- (m1[MLX5HWS_DEFINER_FNAME_ETH_SMAC_47_16_I] || +- m1[MLX5HWS_DEFINER_FNAME_ETH_SMAC_15_0_I] || +- m1[MLX5HWS_DEFINER_FNAME_ETH_DMAC_47_16_I] || +- m1[MLX5HWS_DEFINER_FNAME_ETH_DMAC_15_0_I])) +- return false; ++static bool hws_match_mask_is_empty(const struct mlx5hws_match_parameters *mask) ++{ ++ return hws_get_last_set_dword_idx(mask) == -1; ++} + +- /* Don't split outer IPv6 dest address. */ +- if ((m1[MLX5HWS_DEFINER_FNAME_IPV6_DST_127_96_O] || +- m1[MLX5HWS_DEFINER_FNAME_IPV6_DST_95_64_O] || +- m1[MLX5HWS_DEFINER_FNAME_IPV6_DST_63_32_O] || +- m1[MLX5HWS_DEFINER_FNAME_IPV6_DST_31_0_O]) && +- (m2[MLX5HWS_DEFINER_FNAME_IPV6_DST_127_96_O] || +- m2[MLX5HWS_DEFINER_FNAME_IPV6_DST_95_64_O] || +- m2[MLX5HWS_DEFINER_FNAME_IPV6_DST_63_32_O] || +- m2[MLX5HWS_DEFINER_FNAME_IPV6_DST_31_0_O])) +- return false; ++static bool hws_dword_is_inner_ipaddr_off(int dword_off) ++{ ++ /* IPv4 and IPv6 addresses share the same entry via a union, and the ++ * source and dest addresses are contiguous in the fte_match_param. So ++ * we need to check 8 words. 
++ */ ++ static const int inner_ip_dword_off = ++ __mlx5_dw_off(fte_match_param, inner_headers.src_ipv4_src_ipv6); + +- /* Don't split outer IPv6 source address. */ +- if ((m1[MLX5HWS_DEFINER_FNAME_IPV6_SRC_127_96_O] || +- m1[MLX5HWS_DEFINER_FNAME_IPV6_SRC_95_64_O] || +- m1[MLX5HWS_DEFINER_FNAME_IPV6_SRC_63_32_O] || +- m1[MLX5HWS_DEFINER_FNAME_IPV6_SRC_31_0_O]) && +- (m2[MLX5HWS_DEFINER_FNAME_IPV6_SRC_127_96_O] || +- m2[MLX5HWS_DEFINER_FNAME_IPV6_SRC_95_64_O] || +- m2[MLX5HWS_DEFINER_FNAME_IPV6_SRC_63_32_O] || +- m2[MLX5HWS_DEFINER_FNAME_IPV6_SRC_31_0_O])) +- return false; ++ return dword_off >= inner_ip_dword_off && ++ dword_off < inner_ip_dword_off + 8; ++} + +- /* Don't split inner IPv6 dest address. */ +- if ((m1[MLX5HWS_DEFINER_FNAME_IPV6_DST_127_96_I] || +- m1[MLX5HWS_DEFINER_FNAME_IPV6_DST_95_64_I] || +- m1[MLX5HWS_DEFINER_FNAME_IPV6_DST_63_32_I] || +- m1[MLX5HWS_DEFINER_FNAME_IPV6_DST_31_0_I]) && +- (m2[MLX5HWS_DEFINER_FNAME_IPV6_DST_127_96_I] || +- m2[MLX5HWS_DEFINER_FNAME_IPV6_DST_95_64_I] || +- m2[MLX5HWS_DEFINER_FNAME_IPV6_DST_63_32_I] || +- m2[MLX5HWS_DEFINER_FNAME_IPV6_DST_31_0_I])) +- return false; ++static bool hws_dword_is_outer_ipaddr_off(int dword_off) ++{ ++ static const int outer_ip_dword_off = ++ __mlx5_dw_off(fte_match_param, outer_headers.src_ipv4_src_ipv6); + +- /* Don't split inner IPv6 source address. */ +- if ((m1[MLX5HWS_DEFINER_FNAME_IPV6_SRC_127_96_I] || +- m1[MLX5HWS_DEFINER_FNAME_IPV6_SRC_95_64_I] || +- m1[MLX5HWS_DEFINER_FNAME_IPV6_SRC_63_32_I] || +- m1[MLX5HWS_DEFINER_FNAME_IPV6_SRC_31_0_I]) && +- (m2[MLX5HWS_DEFINER_FNAME_IPV6_SRC_127_96_I] || +- m2[MLX5HWS_DEFINER_FNAME_IPV6_SRC_95_64_I] || +- m2[MLX5HWS_DEFINER_FNAME_IPV6_SRC_63_32_I] || +- m2[MLX5HWS_DEFINER_FNAME_IPV6_SRC_31_0_I])) +- return false; ++ return dword_off >= outer_ip_dword_off && ++ dword_off < outer_ip_dword_off + 8; ++} + +- /* Don't split GRE parameters. 
*/ +- if ((m1[MLX5HWS_DEFINER_FNAME_GRE_C] || +- m1[MLX5HWS_DEFINER_FNAME_GRE_K] || +- m1[MLX5HWS_DEFINER_FNAME_GRE_S] || +- m1[MLX5HWS_DEFINER_FNAME_GRE_PROTOCOL]) && +- (m2[MLX5HWS_DEFINER_FNAME_GRE_C] || +- m2[MLX5HWS_DEFINER_FNAME_GRE_K] || +- m2[MLX5HWS_DEFINER_FNAME_GRE_S] || +- m2[MLX5HWS_DEFINER_FNAME_GRE_PROTOCOL])) +- return false; ++static void hws_add_dword_to_mask(struct mlx5hws_match_parameters *mask, ++ const struct mlx5hws_match_parameters *orig, ++ int dword_idx, bool *added_inner_ipv, ++ bool *added_outer_ipv) ++{ ++ mask->match_buf[dword_idx] |= orig->match_buf[dword_idx]; + +- /* Don't split TCP ack/seq numbers. */ +- if ((m1[MLX5HWS_DEFINER_FNAME_TCP_ACK_NUM] || +- m1[MLX5HWS_DEFINER_FNAME_TCP_SEQ_NUM]) && +- (m2[MLX5HWS_DEFINER_FNAME_TCP_ACK_NUM] || +- m2[MLX5HWS_DEFINER_FNAME_TCP_SEQ_NUM])) +- return false; ++ *added_inner_ipv = false; ++ *added_outer_ipv = false; + +- /* Don't split flex parser. */ +- if ((m1[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_0] || +- m1[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_1] || +- m1[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_2] || +- m1[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_3] || +- m1[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_4] || +- m1[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_5] || +- m1[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_6] || +- m1[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_7]) && +- (m2[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_0] || +- m2[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_1] || +- m2[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_2] || +- m2[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_3] || +- m2[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_4] || +- m2[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_5] || +- m2[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_6] || +- m2[MLX5HWS_DEFINER_FNAME_FLEX_PARSER_7])) +- return false; ++ /* Any IP address fragment must be accompanied by a match on IP version. ++ * Use the `added_ipv` variables to keep track if we added IP versions ++ * specifically for this dword, so that we can roll them back if the ++ * match params become too large to fit into a definer. 
++ */ ++ if (hws_dword_is_inner_ipaddr_off(dword_idx) && ++ !MLX5_GET(fte_match_param, mask->match_buf, ++ inner_headers.ip_version)) { ++ MLX5_SET(fte_match_param, mask->match_buf, ++ inner_headers.ip_version, 0xf); ++ *added_inner_ipv = true; ++ } ++ if (hws_dword_is_outer_ipaddr_off(dword_idx) && ++ !MLX5_GET(fte_match_param, mask->match_buf, ++ outer_headers.ip_version)) { ++ MLX5_SET(fte_match_param, mask->match_buf, ++ outer_headers.ip_version, 0xf); ++ *added_outer_ipv = true; ++ } ++} + +- return true; ++static void hws_remove_dword_from_mask(struct mlx5hws_match_parameters *mask, ++ int dword_idx, bool added_inner_ipv, ++ bool added_outer_ipv) ++{ ++ mask->match_buf[dword_idx] = 0; ++ if (added_inner_ipv) ++ MLX5_SET(fte_match_param, mask->match_buf, ++ inner_headers.ip_version, 0); ++ if (added_outer_ipv) ++ MLX5_SET(fte_match_param, mask->match_buf, ++ outer_headers.ip_version, 0); + } + +-static void +-hws_bwc_matcher_complex_params_comb_create(struct mlx5hws_context *ctx, +- struct mlx5hws_match_parameters *m, +- struct mlx5hws_match_parameters *m1, +- struct mlx5hws_match_parameters *m2, +- struct mlx5hws_definer_fc *fc, +- int fc_sz, +- u32 combination_num) ++/* Avoid leaving a single lower dword in `mask` if there are others present in ++ * `orig`. Splitting IPv6 addresses like this causes them to be interpreted as ++ * IPv4. ++ */ ++static void hws_avoid_ipv6_split_of(struct mlx5hws_match_parameters *orig, ++ struct mlx5hws_match_parameters *mask, ++ int off) + { +- bool is_first_matcher; +- int i; ++ /* Masks are allocated to a full fte_match_param, but it can't hurt to ++ * double check. ++ */ ++ if (orig->match_sz <= off + 3 || mask->match_sz <= off + 3) ++ return; + +- memcpy(m1->match_buf, m->match_buf, m->match_sz); +- memcpy(m2->match_buf, m->match_buf, m->match_sz); ++ /* Lower dword is not set, nothing to do. 
*/ ++ if (!mask->match_buf[off + 3]) ++ return; + +- for (i = 0; i < fc_sz; i++) { +- is_first_matcher = !(combination_num & BIT(i)); +- hws_bwc_matcher_complex_params_clear_fld(ctx, +- fc[i].fname, +- is_first_matcher ? +- m2 : m1); +- } ++ /* Higher dwords also present in `mask`, no ambiguity. */ ++ if (mask->match_buf[off] || mask->match_buf[off + 1] || ++ mask->match_buf[off + 2]) ++ return; ++ ++ /* There are no higher dwords in `orig`, i.e. we match on IPv4. */ ++ if (!orig->match_buf[off] && !orig->match_buf[off + 1] && ++ !orig->match_buf[off + 2]) ++ return; + +- MLX5_SET(fte_match_param, m2->match_buf, +- misc_parameters_2.metadata_reg_c_6, -1); ++ /* Put the lower dword back in `orig`. It is always safe to do this, the ++ * dword will just be picked up in the next submask. ++ */ ++ orig->match_buf[off + 3] = mask->match_buf[off + 3]; ++ mask->match_buf[off + 3] = 0; + } + +-static void +-hws_bwc_matcher_complex_params_destroy(struct mlx5hws_match_parameters *mask_1, +- struct mlx5hws_match_parameters *mask_2) ++static void hws_avoid_ipv6_split(struct mlx5hws_match_parameters *orig, ++ struct mlx5hws_match_parameters *mask) + { +- kfree(mask_1->match_buf); +- kfree(mask_2->match_buf); ++ hws_avoid_ipv6_split_of(orig, mask, ++ __mlx5_dw_off(fte_match_param, ++ outer_headers.src_ipv4_src_ipv6)); ++ hws_avoid_ipv6_split_of(orig, mask, ++ __mlx5_dw_off(fte_match_param, ++ outer_headers.dst_ipv4_dst_ipv6)); ++ hws_avoid_ipv6_split_of(orig, mask, ++ __mlx5_dw_off(fte_match_param, ++ inner_headers.src_ipv4_src_ipv6)); ++ hws_avoid_ipv6_split_of(orig, mask, ++ __mlx5_dw_off(fte_match_param, ++ inner_headers.dst_ipv4_dst_ipv6)); + } + +-static int +-hws_bwc_matcher_complex_params_create(struct mlx5hws_context *ctx, +- u8 match_criteria, +- struct mlx5hws_match_parameters *mask, +- struct mlx5hws_match_parameters *mask_1, +- struct mlx5hws_match_parameters *mask_2) ++/* Build a subset of the `orig` match parameters into `mask`. 
This subset is ++ * guaranteed to fit in a single definer and as such is a candidate for being a ++ * part of a complex matcher. Upon successful execution, the match params that ++ * go into `mask` are cleared from `orig`. ++ */ ++static int hws_get_simple_params(struct mlx5hws_context *ctx, u8 match_criteria, ++ struct mlx5hws_match_parameters *orig, ++ struct mlx5hws_match_parameters *mask) + { +- struct mlx5hws_definer_fc *fc; +- u32 num_of_combinations; +- int fc_sz = 0; +- int res = 0; +- u32 i; +- +- if (MLX5_GET(fte_match_param, mask->match_buf, +- misc_parameters_2.metadata_reg_c_6)) { +- mlx5hws_err(ctx, "Complex matcher: REG_C_6 matching is reserved\n"); +- res = -EINVAL; +- goto out; +- } ++ bool added_inner_ipv, added_outer_ipv; ++ int dword_idx; ++ u32 *backup; ++ int ret; + +- mask_1->match_buf = kzalloc(MLX5_ST_SZ_BYTES(fte_match_param), +- GFP_KERNEL); +- mask_2->match_buf = kzalloc(MLX5_ST_SZ_BYTES(fte_match_param), +- GFP_KERNEL); +- if (!mask_1->match_buf || !mask_2->match_buf) { +- mlx5hws_err(ctx, "Complex matcher: failed to allocate match_param\n"); +- res = -ENOMEM; +- goto free_params; +- } ++ dword_idx = hws_get_last_set_dword_idx(orig); ++ /* Nothing to do, we consumed all of the match params before. */ ++ if (dword_idx == -1) ++ return 0; + +- mask_1->match_sz = mask->match_sz; +- mask_2->match_sz = mask->match_sz; ++ backup = kzalloc(MLX5_ST_SZ_BYTES(fte_match_param), GFP_KERNEL); ++ if (!backup) ++ return -ENOMEM; + +- fc = mlx5hws_definer_conv_match_params_to_compressed_fc(ctx, +- match_criteria, +- mask->match_buf, +- &fc_sz); +- if (!fc) { +- res = -ENOMEM; +- goto free_params; +- } ++ while (1) { ++ dword_idx = hws_get_last_set_dword_idx(orig); ++ /* Nothing to do, we consumed all of the original match params ++ * into this subset, which still fits into a single matcher.
++ */ ++ if (dword_idx == -1) { ++ ret = 0; ++ goto free_backup; ++ } + +- if (fc_sz >= sizeof(num_of_combinations) * BITS_PER_BYTE) { +- mlx5hws_err(ctx, +- "Complex matcher: too many match parameters (%d)\n", +- fc_sz); +- res = -EINVAL; +- goto free_fc; ++ memcpy(backup, mask->match_buf, mask->match_sz); ++ ++ /* Try to add this dword to the current subset. */ ++ hws_add_dword_to_mask(mask, orig, dword_idx, &added_inner_ipv, ++ &added_outer_ipv); ++ ++ if (hws_match_params_exceeds_definer(ctx, match_criteria, mask, ++ false)) { ++ /* We just added a match param that makes the definer ++ * too large. Revert and return what we had before. ++ * Note that we can't just zero out the affected fields, ++ * because it's possible that the dword we're looking at ++ * wasn't zero before (e.g. it included auto-added ++ * matches in IP version). This is why we employ the ++ * rather cumbersome memcpy for backing up. ++ */ ++ memcpy(mask->match_buf, backup, mask->match_sz); ++ /* Possible future improvement: We can't add any more ++ * dwords, but it may be possible to squeeze in ++ * individual bytes, as definers have special slots for ++ * those. ++ * ++ * For now, keep the code simple. This results in an ++ * extra submatcher in some cases, but it's good enough. ++ */ ++ ret = 0; ++ break; ++ } ++ ++ /* The current subset of match params still fits in a single ++ * definer. Remove the dword from the original mask. ++ * ++ * Also remove any explicit match on IP version if we just ++ * included one here. We will still automatically add it to ++ * accompany any IP address fragment, but do not need to ++ * consider it by itself. ++ */ ++ hws_remove_dword_from_mask(orig, dword_idx, added_inner_ipv, ++ added_outer_ipv); + } + +- /* We have list of all the match fields from the match parameter. +- * Now try all the possibilities of splitting them into two match +- * buffers and look for the supported combination.
++ /* Make sure we have not picked up a single lower dword of an IPv6 ++ * address, as the firmware will erroneously treat it as an IPv4 ++ * address. + */ +- num_of_combinations = 1 << fc_sz; ++ hws_avoid_ipv6_split(orig, mask); + +- /* Start from combination at index 1 - we know that 0 is unsupported */ +- for (i = 1; i < num_of_combinations; i++) { +- if (!hws_bwc_matcher_complex_params_comb_is_valid(fc, fc_sz, i)) +- continue; ++free_backup: ++ kfree(backup); + +- hws_bwc_matcher_complex_params_comb_create(ctx, +- mask, mask_1, mask_2, +- fc, fc_sz, i); +- /* We now have two separate sets of match params. +- * Check if each of them can be used in its own matcher. ++ return ret; ++} ++ ++static int ++hws_bwc_matcher_split_mask(struct mlx5hws_context *ctx, u8 match_criteria, ++ const struct mlx5hws_match_parameters *mask, ++ struct mlx5hws_match_parameters *submasks, ++ int *num_submasks) ++{ ++ struct mlx5hws_match_parameters mask_copy; ++ int ret, i = 0; ++ ++ mask_copy.match_sz = MLX5_ST_SZ_BYTES(fte_match_param); ++ mask_copy.match_buf = kzalloc(mask_copy.match_sz, GFP_KERNEL); ++ if (!mask_copy.match_buf) ++ return -ENOMEM; ++ ++ memcpy(mask_copy.match_buf, mask->match_buf, mask->match_sz); ++ ++ while (!hws_match_mask_is_empty(&mask_copy)) { ++ if (i >= MLX5HWS_BWC_COMPLEX_MAX_SUBMATCHERS) { ++ mlx5hws_err(ctx, ++ "Complex matcher: mask too large for %d matchers\n", ++ MLX5HWS_BWC_COMPLEX_MAX_SUBMATCHERS); ++ ret = -E2BIG; ++ goto free_copy; ++ } ++ /* All but the first matcher need to match on register C6 to ++ * connect pieces of the complex rule together. 
+ */ +- if (!mlx5hws_bwc_match_params_is_complex(ctx, +- match_criteria, +- mask_1) && +- !mlx5hws_bwc_match_params_is_complex(ctx, +- match_criteria, +- mask_2)) +- break; ++ if (i > 0) { ++ MLX5_SET(fte_match_param, submasks[i].match_buf, ++ misc_parameters_2.metadata_reg_c_6, -1); ++ match_criteria |= MLX5HWS_DEFINER_MATCH_CRITERIA_MISC2; ++ } ++ ret = hws_get_simple_params(ctx, match_criteria, &mask_copy, ++ &submasks[i]); ++ if (ret < 0) ++ goto free_copy; ++ i++; + } + +- if (i == num_of_combinations) { +- /* We've scanned all the combinations, but to no avail */ +- mlx5hws_err(ctx, "Complex matcher: couldn't find match params combination\n"); +- res = -EINVAL; +- goto free_fc; +- } ++ *num_submasks = i; ++ ret = 0; + +- kfree(fc); +- return 0; ++free_copy: ++ kfree(mask_copy.match_buf); + +-free_fc: +- kfree(fc); +-free_params: +- hws_bwc_matcher_complex_params_destroy(mask_1, mask_2); +-out: +- return res; ++ return ret; + } + +-static int +-hws_bwc_isolated_table_create(struct mlx5hws_bwc_matcher *bwc_matcher, +- struct mlx5hws_table *table) ++static struct mlx5hws_table * ++hws_isolated_table_create(const struct mlx5hws_bwc_matcher *cmatcher) + { ++ struct mlx5hws_bwc_complex_submatcher *first_subm; + struct mlx5hws_cmd_ft_modify_attr ft_attr = {0}; +- struct mlx5hws_context *ctx = table->ctx; + struct mlx5hws_table_attr tbl_attr = {0}; +- struct mlx5hws_table *isolated_tbl; +- int ret = 0; ++ struct mlx5hws_table *orig_tbl; ++ struct mlx5hws_context *ctx; ++ struct mlx5hws_table *tbl; ++ int ret; + +- tbl_attr.type = table->type; +- tbl_attr.level = table->level; ++ first_subm = &cmatcher->complex->submatchers[0]; ++ orig_tbl = first_subm->tbl; ++ ctx = orig_tbl->ctx; + +- bwc_matcher->complex->isolated_tbl = +- mlx5hws_table_create(ctx, &tbl_attr); +- isolated_tbl = bwc_matcher->complex->isolated_tbl; +- if (!isolated_tbl) +- return -EINVAL; ++ tbl_attr.type = orig_tbl->type; ++ tbl_attr.level = orig_tbl->level; ++ tbl = mlx5hws_table_create(ctx, 
&tbl_attr); ++ if (!tbl) ++ return ERR_PTR(-EINVAL); + +- /* Set the default miss of the isolated table to +- * point to the end anchor of the original matcher. ++ /* Set the default miss of the isolated table to point ++ * to the end anchor of the original matcher. + */ +- mlx5hws_cmd_set_attr_connect_miss_tbl(ctx, +- isolated_tbl->fw_ft_type, +- isolated_tbl->type, +- &ft_attr); +- ft_attr.table_miss_id = bwc_matcher->matcher->end_ft_id; +- +- ret = mlx5hws_cmd_flow_table_modify(ctx->mdev, +- &ft_attr, +- isolated_tbl->ft_id); ++ mlx5hws_cmd_set_attr_connect_miss_tbl(ctx, tbl->fw_ft_type, ++ tbl->type, &ft_attr); ++ ft_attr.table_miss_id = first_subm->bwc_matcher->matcher->end_ft_id; ++ ++ ret = mlx5hws_cmd_flow_table_modify(ctx->mdev, &ft_attr, tbl->ft_id); + if (ret) { +- mlx5hws_err(ctx, "Failed setting isolated tbl default miss\n"); ++ mlx5hws_err(ctx, "Complex matcher: failed to set isolated tbl default miss\n"); + goto destroy_tbl; + } + +- return 0; ++ return tbl; + + destroy_tbl: +- mlx5hws_table_destroy(isolated_tbl); +- return ret; ++ mlx5hws_table_destroy(tbl); ++ ++ return ERR_PTR(ret); + } + +-static void hws_bwc_isolated_table_destroy(struct mlx5hws_table *isolated_tbl) ++static int hws_submatcher_init_first(struct mlx5hws_bwc_matcher *cmatcher, ++ struct mlx5hws_table *table, u32 priority, ++ u8 match_criteria, ++ struct mlx5hws_match_parameters *mask) + { +- /* This table is isolated - no table is pointing to it, no need to +- * disconnect it from anywhere, it won't affect any other table's miss. ++ enum mlx5hws_action_type action_types[HWS_NUM_CHAIN_ACTIONS]; ++ struct mlx5hws_bwc_complex_submatcher *subm; ++ int ret; ++ ++ subm = &cmatcher->complex->submatchers[0]; ++ ++ /* The first submatcher lives in the original table and does not have an ++ * associated jump to table action. It also points to the outer complex ++ * matcher. 
+ */ +- mlx5hws_table_destroy(isolated_tbl); ++ subm->tbl = table; ++ subm->action_tbl = NULL; ++ subm->bwc_matcher = cmatcher; ++ ++ action_types[0] = MLX5HWS_ACTION_TYP_MODIFY_HDR; ++ action_types[1] = MLX5HWS_ACTION_TYP_TBL; ++ action_types[2] = MLX5HWS_ACTION_TYP_LAST; ++ ++ ret = mlx5hws_bwc_matcher_create_simple(subm->bwc_matcher, subm->tbl, ++ priority, match_criteria, mask, ++ action_types); ++ if (ret) ++ return ret; ++ ++ subm->bwc_matcher->matcher_type = MLX5HWS_BWC_MATCHER_COMPLEX_FIRST; ++ ++ ret = rhashtable_init(&subm->rules_hash, &hws_rules_hash_params); ++ if (ret) ++ goto destroy_matcher; ++ mutex_init(&subm->hash_lock); ++ ida_init(&subm->chain_ida); ++ ++ return 0; ++ ++destroy_matcher: ++ mlx5hws_bwc_matcher_destroy_simple(subm->bwc_matcher); ++ ++ return ret; + } + +-static int +-hws_bwc_isolated_matcher_create(struct mlx5hws_bwc_matcher *bwc_matcher, +- struct mlx5hws_table *table, +- u8 match_criteria_enable, +- struct mlx5hws_match_parameters *mask) ++static int hws_submatcher_init(struct mlx5hws_bwc_matcher *cmatcher, int idx, ++ struct mlx5hws_table *table, u32 priority, ++ u8 match_criteria, ++ struct mlx5hws_match_parameters *mask) + { +- struct mlx5hws_table *isolated_tbl = bwc_matcher->complex->isolated_tbl; +- struct mlx5hws_bwc_matcher *isolated_bwc_matcher; +- struct mlx5hws_context *ctx = table->ctx; ++ enum mlx5hws_action_type action_types[HWS_NUM_CHAIN_ACTIONS]; ++ struct mlx5hws_bwc_complex_submatcher *subm; ++ bool is_last; + int ret; + +- isolated_bwc_matcher = kzalloc(sizeof(*bwc_matcher), GFP_KERNEL); +- if (!isolated_bwc_matcher) +- return -ENOMEM; ++ if (!idx) ++ return hws_submatcher_init_first(cmatcher, table, priority, ++ match_criteria, mask); ++ ++ subm = &cmatcher->complex->submatchers[idx]; ++ is_last = idx == cmatcher->complex->num_submatchers - 1; ++ ++ subm->tbl = hws_isolated_table_create(cmatcher); ++ if (IS_ERR(subm->tbl)) ++ return PTR_ERR(subm->tbl); ++ ++ subm->action_tbl = ++ 
mlx5hws_action_create_dest_table(subm->tbl->ctx, subm->tbl, ++ MLX5HWS_ACTION_FLAG_HWS_FDB); ++ if (!subm->action_tbl) { ++ ret = -EINVAL; ++ goto destroy_tbl; ++ } ++ ++ subm->bwc_matcher = kzalloc(sizeof(*subm->bwc_matcher), GFP_KERNEL); ++ if (!subm->bwc_matcher) { ++ ret = -ENOMEM; ++ goto destroy_action; ++ } + +- bwc_matcher->complex->isolated_bwc_matcher = isolated_bwc_matcher; ++ /* Every matcher other than the first also matches on register C6 to ++ * bind subrules together in the complex rule using the chain ids. ++ */ ++ match_criteria |= MLX5HWS_DEFINER_MATCH_CRITERIA_MISC2; + +- /* Isolated BWC matcher needs access to the first BWC matcher */ +- isolated_bwc_matcher->complex_first_bwc_matcher = bwc_matcher; ++ action_types[0] = MLX5HWS_ACTION_TYP_MODIFY_HDR; ++ action_types[1] = MLX5HWS_ACTION_TYP_TBL; ++ action_types[2] = MLX5HWS_ACTION_TYP_LAST; + +- /* Isolated matcher needs to match on REG_C_6, +- * so make sure its criteria bit is on. ++ /* Every matcher other than the last sets register C6 and jumps to the ++ * next submatcher's table. The final submatcher will use the ++ * user-supplied actions and will attach an action template at rule ++ * insertion time. + */ +- match_criteria_enable |= MLX5HWS_DEFINER_MATCH_CRITERIA_MISC2; +- +- ret = mlx5hws_bwc_matcher_create_simple(isolated_bwc_matcher, +- isolated_tbl, +- 0, +- match_criteria_enable, +- mask, +- NULL); +- if (ret) { +- mlx5hws_err(ctx, "Complex matcher: failed creating isolated BWC matcher\n"); ++ ret = mlx5hws_bwc_matcher_create_simple(subm->bwc_matcher, subm->tbl, ++ priority, match_criteria, mask, ++ is_last ?
NULL : action_types); ++ if (ret) + goto free_matcher; +- } ++ ++ subm->bwc_matcher->matcher_type = ++ MLX5HWS_BWC_MATCHER_COMPLEX_SUBMATCHER; ++ ++ ret = rhashtable_init(&subm->rules_hash, &hws_rules_hash_params); ++ if (ret) ++ goto destroy_matcher; ++ mutex_init(&subm->hash_lock); ++ ida_init(&subm->chain_ida); + + return 0; + ++destroy_matcher: ++ mlx5hws_bwc_matcher_destroy_simple(subm->bwc_matcher); + free_matcher: +- kfree(bwc_matcher->complex->isolated_bwc_matcher); ++ kfree(subm->bwc_matcher); ++destroy_action: ++ mlx5hws_action_destroy(subm->action_tbl); ++destroy_tbl: ++ mlx5hws_table_destroy(subm->tbl); ++ + return ret; + } + +-static void +-hws_bwc_isolated_matcher_destroy(struct mlx5hws_bwc_matcher *bwc_matcher) ++static void hws_submatcher_destroy(struct mlx5hws_bwc_matcher *cmatcher, ++ int idx) + { +- mlx5hws_bwc_matcher_destroy_simple(bwc_matcher); +- kfree(bwc_matcher); ++ struct mlx5hws_bwc_complex_submatcher *subm; ++ ++ subm = &cmatcher->complex->submatchers[idx]; ++ ++ ida_destroy(&subm->chain_ida); ++ mutex_destroy(&subm->hash_lock); ++ rhashtable_destroy(&subm->rules_hash); ++ ++ if (subm->bwc_matcher) { ++ mlx5hws_bwc_matcher_destroy_simple(subm->bwc_matcher); ++ if (idx) ++ kfree(subm->bwc_matcher); ++ } ++ ++ /* We own all of the isolated tables, but not the original one. 
*/ ++ if (idx) { ++ mlx5hws_action_destroy(subm->action_tbl); ++ mlx5hws_table_destroy(subm->tbl); ++ } + } + + static int +-hws_bwc_isolated_actions_create(struct mlx5hws_bwc_matcher *bwc_matcher, +- struct mlx5hws_table *table) ++hws_complex_data_actions_init(struct mlx5hws_bwc_matcher_complex_data *cdata) + { +- struct mlx5hws_table *isolated_tbl = bwc_matcher->complex->isolated_tbl; ++ struct mlx5hws_context *ctx = cdata->submatchers[0].tbl->ctx; + u8 modify_hdr_action[MLX5_ST_SZ_BYTES(set_action_in)] = {0}; +- struct mlx5hws_context *ctx = table->ctx; + struct mlx5hws_action_mh_pattern ptrn; + int ret = 0; + +- /* Create action to jump to isolated table */ +- +- bwc_matcher->complex->action_go_to_tbl = +- mlx5hws_action_create_dest_table(ctx, +- isolated_tbl, +- MLX5HWS_ACTION_FLAG_HWS_FDB); +- if (!bwc_matcher->complex->action_go_to_tbl) { +- mlx5hws_err(ctx, "Complex matcher: failed to create go-to-tbl action\n"); +- return -EINVAL; +- } +- + /* Create modify header action to set REG_C_6 */ +- + MLX5_SET(set_action_in, modify_hdr_action, + action_type, MLX5_MODIFICATION_TYPE_SET); + MLX5_SET(set_action_in, modify_hdr_action, +@@ -895,19 +553,18 @@ hws_bwc_isolated_actions_create(struct mlx5hws_bwc_matcher *bwc_matcher, + ptrn.data = (void *)modify_hdr_action; + ptrn.sz = MLX5HWS_ACTION_DOUBLE_SIZE; + +- bwc_matcher->complex->action_metadata = ++ cdata->action_metadata = + mlx5hws_action_create_modify_header(ctx, 1, &ptrn, 0, + MLX5HWS_ACTION_FLAG_HWS_FDB); +- if (!bwc_matcher->complex->action_metadata) { +- ret = -EINVAL; +- goto destroy_action_go_to_tbl; ++ if (!cdata->action_metadata) { ++ mlx5hws_err(ctx, "Complex matcher: failed to create set reg C6 action\n"); ++ return -EINVAL; + } + + /* Create last action */ +- +- bwc_matcher->complex->action_last = ++ cdata->action_last = + mlx5hws_action_create_last(ctx, MLX5HWS_ACTION_FLAG_HWS_FDB); +- if (!bwc_matcher->complex->action_last) { ++ if (!cdata->action_last) { + mlx5hws_err(ctx, "Complex matcher: 
failed to create last action\n"); + ret = -EINVAL; + goto destroy_action_metadata; +@@ -916,196 +573,130 @@ hws_bwc_isolated_actions_create(struct mlx5hws_bwc_matcher *bwc_matcher, + return 0; + + destroy_action_metadata: +- mlx5hws_action_destroy(bwc_matcher->complex->action_metadata); +-destroy_action_go_to_tbl: +- mlx5hws_action_destroy(bwc_matcher->complex->action_go_to_tbl); ++ mlx5hws_action_destroy(cdata->action_metadata); ++ + return ret; + } + + static void +-hws_bwc_isolated_actions_destroy(struct mlx5hws_bwc_matcher *bwc_matcher) ++hws_complex_data_actions_destroy(struct mlx5hws_bwc_matcher_complex_data *cdata) + { +- mlx5hws_action_destroy(bwc_matcher->complex->action_last); +- mlx5hws_action_destroy(bwc_matcher->complex->action_metadata); +- mlx5hws_action_destroy(bwc_matcher->complex->action_go_to_tbl); ++ mlx5hws_action_destroy(cdata->action_last); ++ mlx5hws_action_destroy(cdata->action_metadata); + } + + int mlx5hws_bwc_matcher_create_complex(struct mlx5hws_bwc_matcher *bwc_matcher, + struct mlx5hws_table *table, +- u32 priority, +- u8 match_criteria_enable, ++ u32 priority, u8 match_criteria_enable, + struct mlx5hws_match_parameters *mask) + { +- enum mlx5hws_action_type complex_init_action_types[3]; +- struct mlx5hws_bwc_matcher *isolated_bwc_matcher; +- struct mlx5hws_match_parameters mask_1 = {0}; +- struct mlx5hws_match_parameters mask_2 = {0}; ++ struct mlx5hws_match_parameters ++ submasks[MLX5HWS_BWC_COMPLEX_MAX_SUBMATCHERS] = {0}; ++ struct mlx5hws_bwc_matcher_complex_data *cdata; + struct mlx5hws_context *ctx = table->ctx; +- int ret; +- +- ret = hws_bwc_matcher_complex_params_create(table->ctx, +- match_criteria_enable, +- mask, &mask_1, &mask_2); +- if (ret) +- goto err; +- +- bwc_matcher->complex = +- kzalloc(sizeof(*bwc_matcher->complex), GFP_KERNEL); +- if (!bwc_matcher->complex) { +- ret = -ENOMEM; +- goto free_masks; +- } ++ int num_submatchers; ++ int i, ret; + +- ret = rhashtable_init(&bwc_matcher->complex->refcount_hash, +- 
&hws_refcount_hash); +- if (ret) { +- mlx5hws_err(ctx, "Complex matcher: failed to initialize rhashtable\n"); +- goto free_complex; ++ for (i = 0; i < ARRAY_SIZE(submasks); i++) { ++ submasks[i].match_sz = MLX5_ST_SZ_BYTES(fte_match_param); ++ submasks[i].match_buf = kzalloc(submasks[i].match_sz, ++ GFP_KERNEL); ++ if (!submasks[i].match_buf) { ++ ret = -ENOMEM; ++ goto free_submasks; ++ } + } + +- mutex_init(&bwc_matcher->complex->hash_lock); +- ida_init(&bwc_matcher->complex->metadata_ida); +- +- /* Create initial action template for the first matcher. +- * Usually the initial AT is just dummy, but in case of complex +- * matcher we know exactly which actions should it have. +- */ +- +- complex_init_action_types[0] = MLX5HWS_ACTION_TYP_MODIFY_HDR; +- complex_init_action_types[1] = MLX5HWS_ACTION_TYP_TBL; +- complex_init_action_types[2] = MLX5HWS_ACTION_TYP_LAST; +- +- /* Create the first matcher */ +- +- ret = mlx5hws_bwc_matcher_create_simple(bwc_matcher, +- table, +- priority, +- match_criteria_enable, +- &mask_1, +- complex_init_action_types); ++ ret = hws_bwc_matcher_split_mask(ctx, match_criteria_enable, mask, ++ submasks, &num_submatchers); + if (ret) +- goto destroy_ida; +- +- /* Create isolated table to hold the second isolated matcher */ ++ goto free_submasks; + +- ret = hws_bwc_isolated_table_create(bwc_matcher, table); +- if (ret) { +- mlx5hws_err(ctx, "Complex matcher: failed creating isolated table\n"); +- goto destroy_first_matcher; ++ cdata = kzalloc(sizeof(*cdata), GFP_KERNEL); ++ if (!cdata) { ++ ret = -ENOMEM; ++ goto free_submasks; + } + +- /* Now create the second BWC matcher - the isolated one */ ++ bwc_matcher->complex = cdata; ++ cdata->num_submatchers = num_submatchers; + +- ret = hws_bwc_isolated_matcher_create(bwc_matcher, table, +- match_criteria_enable, &mask_2); +- if (ret) { +- mlx5hws_err(ctx, "Complex matcher: failed creating isolated matcher\n"); +- goto destroy_isolated_tbl; ++ for (i = 0; i < num_submatchers; i++) { ++ ret = 
hws_submatcher_init(bwc_matcher, i, table, priority, ++ match_criteria_enable, &submasks[i]); ++ if (ret) ++ goto destroy_submatchers; + } + +- /* Create action for isolated matcher's rules */ +- +- ret = hws_bwc_isolated_actions_create(bwc_matcher, table); +- if (ret) { +- mlx5hws_err(ctx, "Complex matcher: failed creating isolated actions\n"); +- goto destroy_isolated_matcher; +- } ++ ret = hws_complex_data_actions_init(cdata); ++ if (ret) ++ goto destroy_submatchers; + +- hws_bwc_matcher_complex_params_destroy(&mask_1, &mask_2); +- return 0; ++ ret = 0; ++ goto free_submasks; + +-destroy_isolated_matcher: +- isolated_bwc_matcher = bwc_matcher->complex->isolated_bwc_matcher; +- hws_bwc_isolated_matcher_destroy(isolated_bwc_matcher); +-destroy_isolated_tbl: +- hws_bwc_isolated_table_destroy(bwc_matcher->complex->isolated_tbl); +-destroy_first_matcher: +- mlx5hws_bwc_matcher_destroy_simple(bwc_matcher); +-destroy_ida: +- ida_destroy(&bwc_matcher->complex->metadata_ida); +- mutex_destroy(&bwc_matcher->complex->hash_lock); +- rhashtable_destroy(&bwc_matcher->complex->refcount_hash); +-free_complex: +- kfree(bwc_matcher->complex); ++destroy_submatchers: ++ while (i--) ++ hws_submatcher_destroy(bwc_matcher, i); ++ kfree(cdata); + bwc_matcher->complex = NULL; +-free_masks: +- hws_bwc_matcher_complex_params_destroy(&mask_1, &mask_2); +-err: ++ ++free_submasks: ++ for (i = 0; i < ARRAY_SIZE(submasks); i++) ++ kfree(submasks[i].match_buf); ++ + return ret; + } + + void + mlx5hws_bwc_matcher_destroy_complex(struct mlx5hws_bwc_matcher *bwc_matcher) + { +- struct mlx5hws_bwc_matcher *isolated_bwc_matcher = +- bwc_matcher->complex->isolated_bwc_matcher; +- +- hws_bwc_isolated_actions_destroy(bwc_matcher); +- hws_bwc_isolated_matcher_destroy(isolated_bwc_matcher); +- hws_bwc_isolated_table_destroy(bwc_matcher->complex->isolated_tbl); +- mlx5hws_bwc_matcher_destroy_simple(bwc_matcher); +- ida_destroy(&bwc_matcher->complex->metadata_ida); +- 
mutex_destroy(&bwc_matcher->complex->hash_lock); +- rhashtable_destroy(&bwc_matcher->complex->refcount_hash); ++ int i; ++ ++ hws_complex_data_actions_destroy(bwc_matcher->complex); ++ for (i = 0; i < bwc_matcher->complex->num_submatchers; i++) ++ hws_submatcher_destroy(bwc_matcher, i); + kfree(bwc_matcher->complex); + bwc_matcher->complex = NULL; + } + +-static void +-hws_bwc_matcher_complex_hash_lock(struct mlx5hws_bwc_matcher *bwc_matcher) +-{ +- mutex_lock(&bwc_matcher->complex->hash_lock); +-} +- +-static void +-hws_bwc_matcher_complex_hash_unlock(struct mlx5hws_bwc_matcher *bwc_matcher) +-{ +- mutex_unlock(&bwc_matcher->complex->hash_lock); +-} +- + static int +-hws_bwc_rule_complex_hash_node_get(struct mlx5hws_bwc_rule *bwc_rule, +- struct mlx5hws_match_parameters *params) ++hws_complex_get_subrule_data(struct mlx5hws_bwc_rule *bwc_rule, ++ struct mlx5hws_bwc_complex_submatcher *subm, ++ u32 *match_params) ++__must_hold(&subm->hash_lock) + { +- struct mlx5hws_bwc_matcher *bwc_matcher = bwc_rule->bwc_matcher; +- struct mlx5hws_bwc_complex_rule_hash_node *node, *old_node; +- struct rhashtable *refcount_hash; +- int ret, i; +- +- bwc_rule->complex_hash_node = NULL; ++ struct mlx5hws_bwc_matcher *bwc_matcher = subm->bwc_matcher; ++ struct mlx5hws_bwc_complex_subrule_data *sr_data, *old_data; ++ struct mlx5hws_match_template *mt; ++ int ret; + +- node = kzalloc(sizeof(*node), GFP_KERNEL); +- if (unlikely(!node)) ++ sr_data = kzalloc(sizeof(*sr_data), GFP_KERNEL); ++ if (!sr_data) + return -ENOMEM; + +- ret = ida_alloc(&bwc_matcher->complex->metadata_ida, GFP_KERNEL); ++ ret = ida_alloc(&subm->chain_ida, GFP_KERNEL); + if (ret < 0) +- goto err_free_node; +- node->tag = ret; ++ goto free_sr_data; ++ sr_data->chain_id = ret; + +- refcount_set(&node->refcount, 1); ++ refcount_set(&sr_data->refcount, 1); + +- /* Clear match buffer - turn off all the unrelated fields +- * in accordance with the match params mask for the first +- * matcher out of the two parts of the 
complex matcher. +- * The resulting mask is the key for the hash. +- */ +- for (i = 0; i < MLX5_ST_SZ_DW_MATCH_PARAM; i++) +- node->match_buf[i] = params->match_buf[i] & +- bwc_matcher->mt->match_param[i]; +- +- refcount_hash = &bwc_matcher->complex->refcount_hash; +- old_node = rhashtable_lookup_get_insert_fast(refcount_hash, +- &node->hash_node, +- hws_refcount_hash); +- if (IS_ERR(old_node)) { +- ret = PTR_ERR(old_node); +- goto err_free_ida; ++ mt = bwc_matcher->matcher->mt; ++ mlx5hws_definer_create_tag(match_params, mt->fc, mt->fc_sz, ++ (u8 *)&sr_data->match_tag); ++ ++ old_data = rhashtable_lookup_get_insert_fast(&subm->rules_hash, ++ &sr_data->hash_node, ++ hws_rules_hash_params); ++ if (IS_ERR(old_data)) { ++ ret = PTR_ERR(old_data); ++ goto free_ida; + } + +- if (old_node) { ++ if (old_data) { + /* Rule with the same tag already exists - update refcount */ +- refcount_inc(&old_node->refcount); ++ refcount_inc(&old_data->refcount); + /* Let the new rule use the same tag as the existing rule. + * Note that we don't have any indication for the rule creation + * process that a rule with similar matching params already +@@ -1114,247 +705,281 @@ hws_bwc_rule_complex_hash_node_get(struct mlx5hws_bwc_rule *bwc_rule, + * There's some performance advantage in skipping such cases, + * so this is left for future optimizations. 
+ */ +- ida_free(&bwc_matcher->complex->metadata_ida, node->tag); +- kfree(node); +- node = old_node; ++ bwc_rule->subrule_data = old_data; ++ ret = 0; ++ goto free_ida; + } + +- bwc_rule->complex_hash_node = node; ++ bwc_rule->subrule_data = sr_data; + return 0; + +-err_free_ida: +- ida_free(&bwc_matcher->complex->metadata_ida, node->tag); +-err_free_node: +- kfree(node); ++free_ida: ++ ida_free(&subm->chain_ida, sr_data->chain_id); ++free_sr_data: ++ kfree(sr_data); ++ + return ret; + } + + static void +-hws_bwc_rule_complex_hash_node_put(struct mlx5hws_bwc_rule *bwc_rule, +- bool *is_last_rule) ++hws_complex_put_subrule_data(struct mlx5hws_bwc_rule *bwc_rule, ++ struct mlx5hws_bwc_complex_submatcher *subm, ++ bool *is_last_rule) ++__must_hold(&subm->hash_lock) + { +- struct mlx5hws_bwc_matcher *bwc_matcher = bwc_rule->bwc_matcher; +- struct mlx5hws_bwc_complex_rule_hash_node *node; ++ struct mlx5hws_bwc_complex_subrule_data *sr_data; + + if (is_last_rule) + *is_last_rule = false; + +- node = bwc_rule->complex_hash_node; +- if (refcount_dec_and_test(&node->refcount)) { +- rhashtable_remove_fast(&bwc_matcher->complex->refcount_hash, +- &node->hash_node, +- hws_refcount_hash); +- ida_free(&bwc_matcher->complex->metadata_ida, node->tag); +- kfree(node); ++ sr_data = bwc_rule->subrule_data; ++ if (refcount_dec_and_test(&sr_data->refcount)) { ++ rhashtable_remove_fast(&subm->rules_hash, ++ &sr_data->hash_node, ++ hws_rules_hash_params); ++ ida_free(&subm->chain_ida, sr_data->chain_id); ++ kfree(sr_data); + if (is_last_rule) + *is_last_rule = true; + } + +- bwc_rule->complex_hash_node = NULL; ++ bwc_rule->subrule_data = NULL; + } + +-int mlx5hws_bwc_rule_create_complex(struct mlx5hws_bwc_rule *bwc_rule, +- struct mlx5hws_match_parameters *params, +- u32 flow_source, +- struct mlx5hws_rule_action rule_actions[], +- u16 bwc_queue_idx) ++static int hws_complex_subrule_create(struct mlx5hws_bwc_matcher *cmatcher, ++ struct mlx5hws_bwc_rule *subrule, ++ u32 *match_params, 
u32 flow_source, ++ int bwc_queue_idx, int subm_idx, ++ struct mlx5hws_rule_action *actions, ++ u32 *chain_id) + { +- struct mlx5hws_bwc_matcher *bwc_matcher = bwc_rule->bwc_matcher; +- struct mlx5hws_context *ctx = bwc_matcher->matcher->tbl->ctx; ++ struct mlx5hws_rule_action chain_actions[HWS_NUM_CHAIN_ACTIONS] = {0}; + u8 modify_hdr_action[MLX5_ST_SZ_BYTES(set_action_in)] = {0}; +- struct mlx5hws_rule_action rule_actions_1[3] = {0}; +- struct mlx5hws_bwc_matcher *isolated_bwc_matcher; +- u32 *match_buf_2; +- u32 metadata_val; +- int ret = 0; ++ struct mlx5hws_bwc_matcher_complex_data *cdata; ++ struct mlx5hws_bwc_complex_submatcher *subm; ++ int ret; + +- isolated_bwc_matcher = bwc_matcher->complex->isolated_bwc_matcher; +- bwc_rule->isolated_bwc_rule = +- mlx5hws_bwc_rule_alloc(isolated_bwc_matcher); +- if (unlikely(!bwc_rule->isolated_bwc_rule)) +- return -ENOMEM; ++ cdata = cmatcher->complex; ++ subm = &cdata->submatchers[subm_idx]; + +- hws_bwc_matcher_complex_hash_lock(bwc_matcher); ++ mutex_lock(&subm->hash_lock); + +- /* Get a new hash node for this complex rule. +- * If this is a unique set of match params for the first matcher, +- * we will get a new hash node with newly allocated IDA. +- * Otherwise we will get an existing node with IDA and updated refcount. 
+- */ +- ret = hws_bwc_rule_complex_hash_node_get(bwc_rule, params); +- if (unlikely(ret)) { +- mlx5hws_err(ctx, "Complex rule: failed getting RHT node for this rule\n"); +- goto free_isolated_rule; ++ ret = hws_complex_get_subrule_data(subrule, subm, match_params); ++ if (ret) ++ goto unlock; ++ ++ *chain_id = subrule->subrule_data->chain_id; ++ ++ if (!actions) { ++ MLX5_SET(set_action_in, modify_hdr_action, data, *chain_id); ++ chain_actions[0].action = cdata->action_metadata; ++ chain_actions[0].modify_header.data = modify_hdr_action; ++ chain_actions[1].action = ++ cdata->submatchers[subm_idx + 1].action_tbl; ++ chain_actions[2].action = cdata->action_last; ++ actions = chain_actions; + } + +- /* No need to clear match buffer's fields in accordance to what +- * will actually be matched on first and second matchers. +- * Both matchers were created with the appropriate masks +- * and each of them holds the appropriate field copy array, +- * so rule creation will use only the fields that will be copied +- * in accordance with setters in field copy array. +- * We do, however, need to temporary allocate match buffer +- * for the second (isolated) rule in order to not modify +- * user's match params buffer. 
+- */ +- +- match_buf_2 = kmemdup(params->match_buf, +- MLX5_ST_SZ_BYTES(fte_match_param), +- GFP_KERNEL); +- if (unlikely(!match_buf_2)) { +- mlx5hws_err(ctx, "Complex rule: failed allocating match_buf\n"); +- ret = -ENOMEM; +- goto hash_node_put; +- } ++ ret = mlx5hws_bwc_rule_create_simple(subrule, match_params, actions, ++ flow_source, bwc_queue_idx); ++ if (ret) ++ goto put_subrule_data; + +- /* On 2nd matcher, use unique 32-bit ID as a matching tag */ +- metadata_val = bwc_rule->complex_hash_node->tag; +- MLX5_SET(fte_match_param, match_buf_2, +- misc_parameters_2.metadata_reg_c_6, metadata_val); +- +- /* Isolated rule's rule_actions contain all the original actions */ +- ret = mlx5hws_bwc_rule_create_simple(bwc_rule->isolated_bwc_rule, +- match_buf_2, +- rule_actions, +- flow_source, +- bwc_queue_idx); +- kfree(match_buf_2); +- if (unlikely(ret)) { +- mlx5hws_err(ctx, +- "Complex rule: failed creating isolated BWC rule (%d)\n", +- ret); +- goto hash_node_put; +- } ++ ret = 0; ++ goto unlock; + +- /* First rule's rule_actions contain setting metadata and +- * jump to isolated table that contains the second matcher. +- * Set metadata value to a unique value for this rule. 
+- */ ++put_subrule_data: ++ hws_complex_put_subrule_data(subrule, subm, NULL); ++unlock: ++ mutex_unlock(&subm->hash_lock); + +- MLX5_SET(set_action_in, modify_hdr_action, +- action_type, MLX5_MODIFICATION_TYPE_SET); +- MLX5_SET(set_action_in, modify_hdr_action, +- field, MLX5_MODI_META_REG_C_6); +- MLX5_SET(set_action_in, modify_hdr_action, +- length, 0); /* zero means length of 32 */ +- MLX5_SET(set_action_in, modify_hdr_action, +- offset, 0); +- MLX5_SET(set_action_in, modify_hdr_action, +- data, metadata_val); ++ return ret; ++} + +- rule_actions_1[0].action = bwc_matcher->complex->action_metadata; +- rule_actions_1[0].modify_header.offset = 0; +- rule_actions_1[0].modify_header.data = modify_hdr_action; ++static int hws_complex_subrule_destroy(struct mlx5hws_bwc_rule *bwc_rule, ++ struct mlx5hws_bwc_matcher *cmatcher, ++ int subm_idx) ++{ ++ struct mlx5hws_bwc_matcher_complex_data *cdata; ++ struct mlx5hws_bwc_complex_submatcher *subm; ++ struct mlx5hws_context *ctx; ++ bool is_last_rule; ++ int ret = 0; + +- rule_actions_1[1].action = bwc_matcher->complex->action_go_to_tbl; +- rule_actions_1[2].action = bwc_matcher->complex->action_last; ++ cdata = cmatcher->complex; ++ subm = &cdata->submatchers[subm_idx]; ++ ctx = subm->tbl->ctx; + +- ret = mlx5hws_bwc_rule_create_simple(bwc_rule, +- params->match_buf, +- rule_actions_1, +- flow_source, +- bwc_queue_idx); ++ mutex_lock(&subm->hash_lock); + +- if (unlikely(ret)) { ++ hws_complex_put_subrule_data(bwc_rule, subm, &is_last_rule); ++ bwc_rule->rule->skip_delete = !is_last_rule; ++ ret = mlx5hws_bwc_rule_destroy_simple(bwc_rule); ++ if (unlikely(ret)) + mlx5hws_err(ctx, +- "Complex rule: failed creating first BWC rule (%d)\n", +- ret); +- goto destroy_isolated_rule; +- } ++ "Complex rule: failed to delete subrule %d (%d)\n", ++ subm_idx, ret); + +- hws_bwc_matcher_complex_hash_unlock(bwc_matcher); ++ if (subm_idx) ++ mlx5hws_bwc_rule_free(bwc_rule); + +- return 0; ++ mutex_unlock(&subm->hash_lock); + 
+-destroy_isolated_rule: +- mlx5hws_bwc_rule_destroy_simple(bwc_rule->isolated_bwc_rule); +-hash_node_put: +- hws_bwc_rule_complex_hash_node_put(bwc_rule, NULL); +-free_isolated_rule: +- hws_bwc_matcher_complex_hash_unlock(bwc_matcher); +- mlx5hws_bwc_rule_free(bwc_rule->isolated_bwc_rule); + return ret; + } + +-int mlx5hws_bwc_rule_destroy_complex(struct mlx5hws_bwc_rule *bwc_rule) ++int mlx5hws_bwc_rule_create_complex(struct mlx5hws_bwc_rule *bwc_rule, ++ struct mlx5hws_match_parameters *params, ++ u32 flow_source, ++ struct mlx5hws_rule_action rule_actions[], ++ u16 bwc_queue_idx) + { +- struct mlx5hws_context *ctx = bwc_rule->bwc_matcher->matcher->tbl->ctx; +- struct mlx5hws_bwc_rule *isolated_bwc_rule; +- int ret_isolated, ret; +- bool is_last_rule; ++ struct mlx5hws_bwc_rule ++ *subrules[MLX5HWS_BWC_COMPLEX_MAX_SUBMATCHERS] = {0}; ++ struct mlx5hws_bwc_matcher *cmatcher = bwc_rule->bwc_matcher; ++ struct mlx5hws_bwc_matcher_complex_data *cdata; ++ struct mlx5hws_rule_action *subrule_actions; ++ struct mlx5hws_bwc_complex_submatcher *subm; ++ struct mlx5hws_bwc_rule *subrule; ++ u32 *match_params; ++ u32 chain_id; ++ int i, ret; + +- hws_bwc_matcher_complex_hash_lock(bwc_rule->bwc_matcher); ++ cdata = cmatcher->complex; ++ if (!cdata) ++ return -EINVAL; + +- hws_bwc_rule_complex_hash_node_put(bwc_rule, &is_last_rule); +- bwc_rule->rule->skip_delete = !is_last_rule; ++ /* Duplicate user data because we will modify it to set register C6 ++ * values. For the same reason, make sure that we allocate a full ++ * match_param even if the user gave us fewer bytes. We need to ensure ++ * there is space for the match on C6. 
++ */ ++ match_params = kzalloc(MLX5_ST_SZ_BYTES(fte_match_param), GFP_KERNEL); ++ if (!match_params) ++ return -ENOMEM; + +- ret = mlx5hws_bwc_rule_destroy_simple(bwc_rule); +- if (unlikely(ret)) +- mlx5hws_err(ctx, "BWC complex rule: failed destroying first rule\n"); ++ memcpy(match_params, params->match_buf, params->match_sz); ++ ++ ret = hws_complex_subrule_create(cmatcher, bwc_rule, match_params, ++ flow_source, bwc_queue_idx, 0, ++ NULL, &chain_id); ++ if (ret) ++ goto free_match_params; ++ subrules[0] = bwc_rule; ++ ++ for (i = 1; i < cdata->num_submatchers; i++) { ++ subm = &cdata->submatchers[i]; ++ subrule = mlx5hws_bwc_rule_alloc(subm->bwc_matcher); ++ if (!subrule) { ++ ret = -ENOMEM; ++ goto destroy_subrules; ++ } ++ ++ /* Match on the previous subrule's chain_id. This is how ++ * subrules are connected in steering. ++ */ ++ MLX5_SET(fte_match_param, match_params, ++ misc_parameters_2.metadata_reg_c_6, chain_id); ++ ++ /* The last subrule uses the complex rule's user-specified ++ * actions. Everything else uses the chaining rules based on the ++ * next table and chain_id. ++ */ ++ subrule_actions = ++ i == cdata->num_submatchers - 1 ? 
rule_actions : NULL; ++ ++ ret = hws_complex_subrule_create(cmatcher, subrule, ++ match_params, flow_source, ++ bwc_queue_idx, i, ++ subrule_actions, &chain_id); ++ if (ret) { ++ mlx5hws_bwc_rule_free(subrule); ++ goto destroy_subrules; ++ } ++ ++ subrules[i] = subrule; ++ } ++ ++ for (i = 0; i < cdata->num_submatchers - 1; i++) ++ subrules[i]->next_subrule = subrules[i + 1]; + +- isolated_bwc_rule = bwc_rule->isolated_bwc_rule; +- ret_isolated = mlx5hws_bwc_rule_destroy_simple(isolated_bwc_rule); +- if (unlikely(ret_isolated)) +- mlx5hws_err(ctx, "BWC complex rule: failed destroying second (isolated) rule\n"); ++ kfree(match_params); + +- hws_bwc_matcher_complex_hash_unlock(bwc_rule->bwc_matcher); ++ return 0; + +- mlx5hws_bwc_rule_free(isolated_bwc_rule); ++destroy_subrules: ++ while (i--) ++ hws_complex_subrule_destroy(subrules[i], cmatcher, i); ++free_match_params: ++ kfree(match_params); + +- return ret || ret_isolated; ++ return ret; + } + +-static void +-hws_bwc_matcher_clear_hash_rtcs(struct mlx5hws_bwc_matcher *bwc_matcher) ++int mlx5hws_bwc_rule_destroy_complex(struct mlx5hws_bwc_rule *bwc_rule) + { +- struct mlx5hws_bwc_complex_rule_hash_node *node; +- struct rhashtable_iter iter; ++ struct mlx5hws_bwc_matcher *bwc_matcher = bwc_rule->bwc_matcher; ++ struct mlx5hws_bwc_rule ++ *subrules[MLX5HWS_BWC_COMPLEX_MAX_SUBMATCHERS] = {0}; ++ struct mlx5hws_bwc_matcher_complex_data *cdata; ++ int i, err, ret_val; ++ ++ cdata = bwc_matcher->complex; ++ ++ /* Construct a list of all the subrules we need to destroy. */ ++ subrules[0] = bwc_rule; ++ for (i = 1; i < cdata->num_submatchers; i++) ++ subrules[i] = subrules[i - 1]->next_subrule; ++ ++ ret_val = 0; ++ for (i = 0; i < cdata->num_submatchers; i++) { ++ err = hws_complex_subrule_destroy(subrules[i], bwc_matcher, i); ++ /* If something goes wrong, plow along to destroy all of the ++ * subrules but return an error upstack. 
++ */ ++ if (unlikely(err)) ++ ret_val = err; ++ } + +- rhashtable_walk_enter(&bwc_matcher->complex->refcount_hash, &iter); +- rhashtable_walk_start(&iter); ++ return ret_val; ++} + +- while ((node = rhashtable_walk_next(&iter)) != NULL) { +- if (IS_ERR(node)) ++static void ++hws_bwc_matcher_init_move(struct mlx5hws_bwc_matcher *bwc_matcher) ++{ ++ struct mlx5hws_context *ctx = bwc_matcher->matcher->tbl->ctx; ++ u16 bwc_queues = mlx5hws_bwc_queues(ctx); ++ struct mlx5hws_bwc_rule *bwc_rule; ++ struct list_head *rules_list; ++ int i; ++ ++ for (i = 0; i < bwc_queues; i++) { ++ rules_list = &bwc_matcher->rules[i]; ++ if (list_empty(rules_list)) + continue; +- node->rtc_valid = false; +- } + +- rhashtable_walk_stop(&iter); +- rhashtable_walk_exit(&iter); ++ list_for_each_entry(bwc_rule, rules_list, list_node) { ++ if (!bwc_rule->subrule_data) ++ continue; ++ bwc_rule->subrule_data->was_moved = false; ++ } ++ } + } + +-int +-mlx5hws_bwc_matcher_move_all_complex(struct mlx5hws_bwc_matcher *bwc_matcher) ++int mlx5hws_bwc_matcher_complex_move(struct mlx5hws_bwc_matcher *bwc_matcher) + { + struct mlx5hws_context *ctx = bwc_matcher->matcher->tbl->ctx; + struct mlx5hws_matcher *matcher = bwc_matcher->matcher; + u16 bwc_queues = mlx5hws_bwc_queues(ctx); + struct mlx5hws_bwc_rule *tmp_bwc_rule; + struct mlx5hws_rule_attr rule_attr; +- struct mlx5hws_table *isolated_tbl; + int move_error = 0, poll_error = 0; + struct mlx5hws_rule *tmp_rule; + struct list_head *rules_list; + u32 expected_completions = 1; +- u32 end_ft_id; +- int i, ret; ++ int i, ret = 0; + +- /* We are rehashing the matcher that is the first part of the complex +- * matcher. Need to update the isolated matcher to point to the end_ft +- * of this new matcher. This needs to be done before moving any rules +- * to prevent possible steering loops. 
+- */ +- isolated_tbl = bwc_matcher->complex->isolated_tbl; +- end_ft_id = bwc_matcher->matcher->resize_dst->end_ft_id; +- ret = mlx5hws_matcher_update_end_ft_isolated(isolated_tbl, end_ft_id); +- if (ret) { +- mlx5hws_err(ctx, +- "Failed updating end_ft of isolated matcher (%d)\n", +- ret); +- return ret; +- } +- +- hws_bwc_matcher_clear_hash_rtcs(bwc_matcher); ++ hws_bwc_matcher_init_move(bwc_matcher); + + mlx5hws_bwc_rule_fill_attr(bwc_matcher, 0, 0, &rule_attr); + +@@ -1369,15 +994,15 @@ mlx5hws_bwc_matcher_move_all_complex(struct mlx5hws_bwc_matcher *bwc_matcher) + /* Check if a rule with similar tag has already + * been moved. + */ +- if (tmp_bwc_rule->complex_hash_node->rtc_valid) { +- /* This rule is a duplicate of rule with similar +- * tag that has already been moved earlier. +- * Just update this rule's RTCs. ++ if (tmp_bwc_rule->subrule_data->was_moved) { ++ /* This rule is a duplicate of rule with ++ * identical tag that has already been moved ++ * earlier. Just update this rule's RTCs. + */ + tmp_bwc_rule->rule->rtc_0 = +- tmp_bwc_rule->complex_hash_node->rtc_0; ++ tmp_bwc_rule->subrule_data->rtc_0; + tmp_bwc_rule->rule->rtc_1 = +- tmp_bwc_rule->complex_hash_node->rtc_1; ++ tmp_bwc_rule->subrule_data->rtc_1; + tmp_bwc_rule->rule->matcher = + tmp_bwc_rule->rule->matcher->resize_dst; + continue; +@@ -1425,12 +1050,12 @@ mlx5hws_bwc_matcher_move_all_complex(struct mlx5hws_bwc_matcher *bwc_matcher) + /* Done moving the rule to the new matcher, + * now update RTCs for all the duplicated rules. 
+ */ +- tmp_bwc_rule->complex_hash_node->rtc_0 = ++ tmp_bwc_rule->subrule_data->rtc_0 = + tmp_bwc_rule->rule->rtc_0; +- tmp_bwc_rule->complex_hash_node->rtc_1 = ++ tmp_bwc_rule->subrule_data->rtc_1 = + tmp_bwc_rule->rule->rtc_1; + +- tmp_bwc_rule->complex_hash_node->rtc_valid = true; ++ tmp_bwc_rule->subrule_data->was_moved = true; + } + } + +@@ -1442,3 +1067,35 @@ mlx5hws_bwc_matcher_move_all_complex(struct mlx5hws_bwc_matcher *bwc_matcher) + + return ret; + } ++ ++int ++mlx5hws_bwc_matcher_complex_move_first(struct mlx5hws_bwc_matcher *bwc_matcher) ++{ ++ struct mlx5hws_context *ctx = bwc_matcher->matcher->tbl->ctx; ++ struct mlx5hws_bwc_matcher_complex_data *cdata; ++ struct mlx5hws_table *isolated_tbl; ++ u32 end_ft_id; ++ int i, ret; ++ ++ cdata = bwc_matcher->complex; ++ ++ /* We are rehashing the first submatcher. We need to update the ++ * subsequent submatchers to point to the end_ft of this new matcher. ++ * This needs to be done before moving any rules to prevent possible ++ * steering loops. 
++ */ ++ end_ft_id = bwc_matcher->matcher->resize_dst->end_ft_id; ++ for (i = 1; i < cdata->num_submatchers; i++) { ++ isolated_tbl = cdata->submatchers[i].tbl; ++ ret = mlx5hws_matcher_update_end_ft_isolated(isolated_tbl, ++ end_ft_id); ++ if (ret) { ++ mlx5hws_err(ctx, ++ "Complex matcher: failed updating end_ft of isolated matcher (%d)\n", ++ ret); ++ return ret; ++ } ++ } ++ ++ return mlx5hws_bwc_matcher_complex_move(bwc_matcher); ++} +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.h +index a6887c7e39d5..d07de631ce9f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc_complex.h +@@ -4,25 +4,60 @@ + #ifndef HWS_BWC_COMPLEX_H_ + #define HWS_BWC_COMPLEX_H_ + +-struct mlx5hws_bwc_complex_rule_hash_node { +- u32 match_buf[MLX5_ST_SZ_DW_MATCH_PARAM]; +- u32 tag; ++#define MLX5HWS_BWC_COMPLEX_MAX_SUBMATCHERS 4 ++ ++/* A matcher can't contain two rules with the same match tag, but it is possible ++ * that two different complex rules' subrules have the same match tag. In that ++ * case, those subrules correspond to a single rule, and we need to refcount. ++ */ ++struct mlx5hws_bwc_complex_subrule_data { ++ struct mlx5hws_rule_match_tag match_tag; + refcount_t refcount; +- bool rtc_valid; ++ /* The chain_id is what glues individual subrules into larger complex ++ * rules. It is the value that this subrule writes to register C6, and ++ * that the next subrule matches against. ++ */ ++ u32 chain_id; + u32 rtc_0; + u32 rtc_1; ++ /* During rehash we iterate through all the subrules to move them. But ++ * two or more subrules can share the same physical rule in the ++ * submatcher, so we use `was_moved` to keep track if a given rule was ++ * already moved. 
++ */ ++ bool was_moved; + struct rhash_head hash_node; + }; + ++struct mlx5hws_bwc_complex_submatcher { ++ /* Isolated table that the matcher lives in. Not set for the first ++ * matcher, which lives in the original table. ++ */ ++ struct mlx5hws_table *tbl; ++ /* Match a rule with this action to go to `tbl`. This is set in all ++ * submatchers but the first. ++ */ ++ struct mlx5hws_action *action_tbl; ++ /* This submatcher's simple matcher. The first submatcher points to the ++ * outer (complex) matcher. ++ */ ++ struct mlx5hws_bwc_matcher *bwc_matcher; ++ struct rhashtable rules_hash; ++ struct ida chain_ida; ++ struct mutex hash_lock; /* Protect the hash and ida. */ ++}; ++ + struct mlx5hws_bwc_matcher_complex_data { +- struct mlx5hws_table *isolated_tbl; +- struct mlx5hws_bwc_matcher *isolated_bwc_matcher; ++ struct mlx5hws_bwc_complex_submatcher ++ submatchers[MLX5HWS_BWC_COMPLEX_MAX_SUBMATCHERS]; ++ int num_submatchers; ++ /* Actions used by all but the last submatcher to point to the next ++ * submatcher in the chain. The last submatcher uses the action template ++ * from the complex matcher, to perform the actions that the user ++ * originally requested. 
++ */ + struct mlx5hws_action *action_metadata; +- struct mlx5hws_action *action_go_to_tbl; + struct mlx5hws_action *action_last; +- struct rhashtable refcount_hash; +- struct mutex hash_lock; /* Protect the refcount rhashtable */ +- struct ida metadata_ida; + }; + + bool mlx5hws_bwc_match_params_is_complex(struct mlx5hws_context *ctx, +@@ -37,7 +72,10 @@ int mlx5hws_bwc_matcher_create_complex(struct mlx5hws_bwc_matcher *bwc_matcher, + + void mlx5hws_bwc_matcher_destroy_complex(struct mlx5hws_bwc_matcher *bwc_matcher); + +-int mlx5hws_bwc_matcher_move_all_complex(struct mlx5hws_bwc_matcher *bwc_matcher); ++int mlx5hws_bwc_matcher_complex_move(struct mlx5hws_bwc_matcher *bwc_matcher); ++ ++int ++mlx5hws_bwc_matcher_complex_move_first(struct mlx5hws_bwc_matcher *bwc_matcher); + + int mlx5hws_bwc_rule_create_complex(struct mlx5hws_bwc_rule *bwc_rule, + struct mlx5hws_match_parameters *params, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c +index c4bb6967f74d..82fd122d4284 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c +@@ -1831,80 +1831,6 @@ hws_definer_conv_match_params_to_hl(struct mlx5hws_context *ctx, + return ret; + } + +-struct mlx5hws_definer_fc * +-mlx5hws_definer_conv_match_params_to_compressed_fc(struct mlx5hws_context *ctx, +- u8 match_criteria_enable, +- u32 *match_param, +- int *fc_sz) +-{ +- struct mlx5hws_definer_fc *compressed_fc = NULL; +- struct mlx5hws_definer_conv_data cd = {0}; +- struct mlx5hws_definer_fc *fc; +- int ret; +- +- fc = hws_definer_alloc_fc(ctx, MLX5HWS_DEFINER_FNAME_MAX); +- if (!fc) +- return NULL; +- +- cd.fc = fc; +- cd.ctx = ctx; +- +- if (match_criteria_enable & MLX5HWS_DEFINER_MATCH_CRITERIA_OUTER) { +- ret = hws_definer_conv_outer(&cd, match_param); +- if (ret) +- goto err_free_fc; +- } +- +- if (match_criteria_enable & 
MLX5HWS_DEFINER_MATCH_CRITERIA_INNER) { +- ret = hws_definer_conv_inner(&cd, match_param); +- if (ret) +- goto err_free_fc; +- } +- +- if (match_criteria_enable & MLX5HWS_DEFINER_MATCH_CRITERIA_MISC) { +- ret = hws_definer_conv_misc(&cd, match_param); +- if (ret) +- goto err_free_fc; +- } +- +- if (match_criteria_enable & MLX5HWS_DEFINER_MATCH_CRITERIA_MISC2) { +- ret = hws_definer_conv_misc2(&cd, match_param); +- if (ret) +- goto err_free_fc; +- } +- +- if (match_criteria_enable & MLX5HWS_DEFINER_MATCH_CRITERIA_MISC3) { +- ret = hws_definer_conv_misc3(&cd, match_param); +- if (ret) +- goto err_free_fc; +- } +- +- if (match_criteria_enable & MLX5HWS_DEFINER_MATCH_CRITERIA_MISC4) { +- ret = hws_definer_conv_misc4(&cd, match_param); +- if (ret) +- goto err_free_fc; +- } +- +- if (match_criteria_enable & MLX5HWS_DEFINER_MATCH_CRITERIA_MISC5) { +- ret = hws_definer_conv_misc5(&cd, match_param); +- if (ret) +- goto err_free_fc; +- } +- +- /* Allocate fc array on mt */ +- compressed_fc = hws_definer_alloc_compressed_fc(fc); +- if (!compressed_fc) { +- mlx5hws_err(ctx, +- "Convert to compressed fc: failed to set field copy to match template\n"); +- goto err_free_fc; +- } +- *fc_sz = hws_definer_get_fc_size(fc); +- +-err_free_fc: +- kfree(fc); +- return compressed_fc; +-} +- + static int + hws_definer_find_byte_in_tag(struct mlx5hws_definer *definer, + u32 hl_byte_off, +@@ -2067,7 +1993,7 @@ hws_definer_copy_sel_ctrl(struct mlx5hws_definer_sel_ctrl *ctrl, + static int + hws_definer_find_best_match_fit(struct mlx5hws_context *ctx, + struct mlx5hws_definer *definer, +- u8 *hl) ++ u8 *hl, bool allow_jumbo) + { + struct mlx5hws_definer_sel_ctrl ctrl = {0}; + bool found; +@@ -2084,6 +2010,9 @@ hws_definer_find_best_match_fit(struct mlx5hws_context *ctx, + return 0; + } + ++ if (!allow_jumbo) ++ return -E2BIG; ++ + /* Try to create a full/limited jumbo definer */ + ctrl.allowed_full_dw = ctx->caps->full_dw_jumbo_support ? 
DW_SELECTORS : + DW_SELECTORS_MATCH; +@@ -2160,7 +2089,8 @@ int mlx5hws_definer_compare(struct mlx5hws_definer *definer_a, + int + mlx5hws_definer_calc_layout(struct mlx5hws_context *ctx, + struct mlx5hws_match_template *mt, +- struct mlx5hws_definer *match_definer) ++ struct mlx5hws_definer *match_definer, ++ bool allow_jumbo) + { + u8 *match_hl; + int ret; +@@ -2182,7 +2112,8 @@ mlx5hws_definer_calc_layout(struct mlx5hws_context *ctx, + } + + /* Find the match definer layout for header layout match union */ +- ret = hws_definer_find_best_match_fit(ctx, match_definer, match_hl); ++ ret = hws_definer_find_best_match_fit(ctx, match_definer, match_hl, ++ allow_jumbo); + if (ret) { + if (ret == -E2BIG) + mlx5hws_dbg(ctx, +@@ -2370,7 +2301,7 @@ int mlx5hws_definer_mt_init(struct mlx5hws_context *ctx, + struct mlx5hws_definer match_layout = {0}; + int ret; + +- ret = mlx5hws_definer_calc_layout(ctx, mt, &match_layout); ++ ret = mlx5hws_definer_calc_layout(ctx, mt, &match_layout, true); + if (ret) { + mlx5hws_err(ctx, "Failed to calculate matcher definer layout\n"); + return ret; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.h +index 62da55389331..141f3eb2e307 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.h +@@ -823,13 +823,8 @@ void mlx5hws_definer_free(struct mlx5hws_context *ctx, + + int mlx5hws_definer_calc_layout(struct mlx5hws_context *ctx, + struct mlx5hws_match_template *mt, +- struct mlx5hws_definer *match_definer); +- +-struct mlx5hws_definer_fc * +-mlx5hws_definer_conv_match_params_to_compressed_fc(struct mlx5hws_context *ctx, +- u8 match_criteria_enable, +- u32 *match_param, +- int *fc_sz); ++ struct mlx5hws_definer *match_definer, ++ bool allow_jumbo); + + const char *mlx5hws_definer_fname_to_str(enum mlx5hws_definer_fname fname); + +-- +2.50.1 (Apple Git-155) + diff 
--git a/SOURCES/1566-net-mlx5e-prevent-entering-switchdev-mode-with-inconsistent-.patch b/SOURCES/1566-net-mlx5e-prevent-entering-switchdev-mode-with-inconsistent-.patch new file mode 100644 index 000000000..17a13e4e4 --- /dev/null +++ b/SOURCES/1566-net-mlx5e-prevent-entering-switchdev-mode-with-inconsistent-.patch @@ -0,0 +1,105 @@ +From e372fb3a61e3df224c1a4e95424e8124dbd9dbef Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:31 -0400 +Subject: [PATCH] net/mlx5e: Prevent entering switchdev mode with inconsistent + netns + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 06fdc45f16c392dc3394c67e7c17ae63935715d3 +Author: Jianbo Liu +Date: Mon Sep 29 00:25:18 2025 +0300 + + net/mlx5e: Prevent entering switchdev mode with inconsistent netns + + When a PF enters switchdev mode, its netdevice becomes the uplink + representor but remains in its current network namespace. All other + representors (VFs, SFs) are created in the netns of the devlink + instance. + + If the PF's netns has been moved and differs from the devlink's netns, + enabling switchdev mode would create a state where the OVS control + plane (ovs-vsctl) cannot manage the switch because the PF uplink + representor and the other representors are split across different + namespaces. + + To prevent this inconsistent configuration, block the request to enter + switchdev mode if the PF netdevice's netns does not match the netns of + its devlink instance. + + As part of this change, the PF's netns is first marked as immutable. + This prevents race conditions where the netns could be changed after + the check is performed but before the mode transition is complete, and + it aligns the PF's behavior with that of the final uplink representor. 
+ + Signed-off-by: Jianbo Liu + Reviewed-by: Cosmin Ratiu + Reviewed-by: Jiri Pirko + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1759094723-843774-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +index b8ec55929ab1..52c3de24bea3 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +@@ -3774,6 +3774,29 @@ void mlx5_eswitch_unblock_mode(struct mlx5_core_dev *dev) + up_write(&esw->mode_lock); + } + ++/* Returns false only when uplink netdev exists and its netns is different from ++ * devlink's netns. True for all others so entering switchdev mode is allowed. ++ */ ++static bool mlx5_devlink_netdev_netns_immutable_set(struct devlink *devlink, ++ bool immutable) ++{ ++ struct mlx5_core_dev *mdev = devlink_priv(devlink); ++ struct net_device *netdev; ++ bool ret; ++ ++ netdev = mlx5_uplink_netdev_get(mdev); ++ if (!netdev) ++ return true; ++ ++ rtnl_lock(); ++ netdev->netns_immutable = immutable; ++ ret = net_eq(dev_net(netdev), devlink_net(devlink)); ++ rtnl_unlock(); ++ ++ mlx5_uplink_netdev_put(mdev, netdev); ++ return ret; ++} ++ + int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode, + struct netlink_ext_ack *extack) + { +@@ -3816,6 +3839,14 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode, + esw->eswitch_operation_in_progress = true; + up_write(&esw->mode_lock); + ++ if (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV && ++ !mlx5_devlink_netdev_netns_immutable_set(devlink, true)) { ++ NL_SET_ERR_MSG_MOD(extack, ++ "Can't change E-Switch mode to switchdev when netdev net namespace has diverged from the devlink's."); ++ err = -EINVAL; ++ goto skip; ++ } ++ + if (mode == DEVLINK_ESWITCH_MODE_LEGACY) + esw->dev->priv.flags |= 
MLX5_PRIV_FLAGS_SWITCH_LEGACY; + mlx5_eswitch_disable_locked(esw); +@@ -3834,6 +3865,8 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode, + } + + skip: ++ if (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV && err) ++ mlx5_devlink_netdev_netns_immutable_set(devlink, false); + down_write(&esw->mode_lock); + esw->eswitch_operation_in_progress = false; + unlock: +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1567-net-mlx5-improve-qos-error-messages-with-actual-depth-values.patch b/SOURCES/1567-net-mlx5-improve-qos-error-messages-with-actual-depth-values.patch new file mode 100644 index 000000000..80e4b700f --- /dev/null +++ b/SOURCES/1567-net-mlx5-improve-qos-error-messages-with-actual-depth-values.patch @@ -0,0 +1,68 @@ +From 12a0a2705638c0500f2b3b6f06925dd54e277143 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:31 -0400 +Subject: [PATCH] net/mlx5: Improve QoS error messages with actual depth values + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 33dbaa54ef431b416c1ddb2c25b9b201634edcfa +Author: Carolina Jubran +Date: Mon Sep 29 00:25:19 2025 +0300 + + net/mlx5: Improve QoS error messages with actual depth values + + Enhance error messages in MLX5 QoS scheduling depth validation by + including the actual values that caused the validation to fail. 
+ + Suggested-by: Paolo Abeni + Signed-off-by: Carolina Jubran + Reviewed-by: Cosmin Ratiu + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1759094723-843774-4-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +index 5f2d6c35f1ad..56e6f54b1e2e 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +@@ -971,8 +971,9 @@ esw_qos_vport_tc_enable(struct mlx5_vport *vport, enum sched_node_type type, + max_level = 1 << MLX5_CAP_QOS(vport_node->esw->dev, + log_esw_max_sched_depth); + if (new_level > max_level) { +- NL_SET_ERR_MSG_MOD(extack, +- "TC arbitration on leafs is not supported beyond max scheduling depth"); ++ NL_SET_ERR_MSG_FMT_MOD(extack, ++ "TC arbitration on leafs is not supported beyond max depth %d", ++ max_level); + return -EOPNOTSUPP; + } + } +@@ -1444,8 +1445,9 @@ static int esw_qos_node_enable_tc_arbitration(struct mlx5_esw_sched_node *node, + new_level = node->level + 1; + max_level = 1 << MLX5_CAP_QOS(node->esw->dev, log_esw_max_sched_depth); + if (new_level > max_level) { +- NL_SET_ERR_MSG_MOD(extack, +- "TC arbitration on nodes is not supported beyond max scheduling depth"); ++ NL_SET_ERR_MSG_FMT_MOD(extack, ++ "TC arbitration on nodes is not supported beyond max depth %d", ++ max_level); + return -EOPNOTSUPP; + } + +@@ -1997,8 +1999,9 @@ mlx5_esw_qos_node_validate_set_parent(struct mlx5_esw_sched_node *node, + + max_level = 1 << MLX5_CAP_QOS(node->esw->dev, log_esw_max_sched_depth); + if (new_level > max_level) { +- NL_SET_ERR_MSG_MOD(extack, +- "Node hierarchy depth exceeds the maximum supported level"); ++ NL_SET_ERR_MSG_FMT_MOD(extack, ++ "Node hierarchy depth %d exceeds the maximum supported level %d", ++ new_level, max_level); + return -EOPNOTSUPP; + } + +-- +2.50.1 (Apple Git-155) + diff --git 
a/SOURCES/1568-net-mlx5e-remove-unused-mdev-param-from-rss-indir-init.patch b/SOURCES/1568-net-mlx5e-remove-unused-mdev-param-from-rss-indir-init.patch new file mode 100644 index 000000000..e613d4428 --- /dev/null +++ b/SOURCES/1568-net-mlx5e-remove-unused-mdev-param-from-rss-indir-init.patch @@ -0,0 +1,104 @@ +From 59a3d8fddda6c882efa6b79e0292733f9cf1f627 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:31 -0400 +Subject: [PATCH] net/mlx5e: Remove unused mdev param from RSS indir init + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit a3f69641cbbc36015eb50ad6170caeb26f9022de +Author: Carolina Jubran +Date: Mon Sep 29 00:25:20 2025 +0300 + + net/mlx5e: Remove unused mdev param from RSS indir init + + The mdev parameter is not used in mlx5e_rss_params_indir_init, so drop + it from the function and update all callers accordingly. + + No functional changes. + + Signed-off-by: Carolina Jubran + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1759094723-843774-5-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rss.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rss.c +index c68ba0e58fa6..6422eeabc334 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rss.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rss.c +@@ -91,7 +91,7 @@ void mlx5e_rss_params_indir_modify_actual_size(struct mlx5e_rss *rss, u32 num_ch + rss->indir.actual_table_size = mlx5e_rqt_size(rss->mdev, num_channels); + } + +-int mlx5e_rss_params_indir_init(struct mlx5e_rss_params_indir *indir, struct mlx5_core_dev *mdev, ++int mlx5e_rss_params_indir_init(struct mlx5e_rss_params_indir *indir, + u32 actual_table_size, u32 max_table_size) + { + indir->table = kvmalloc_array(max_table_size, sizeof(*indir->table), GFP_KERNEL); +@@ -139,7 +139,8 @@ static struct mlx5e_rss *mlx5e_rss_init_copy(const struct mlx5e_rss *from) + if (!rss) + return 
ERR_PTR(-ENOMEM); + +- err = mlx5e_rss_params_indir_init(&rss->indir, from->mdev, from->indir.actual_table_size, ++ err = mlx5e_rss_params_indir_init(&rss->indir, ++ from->indir.actual_table_size, + from->indir.max_table_size); + if (err) + goto err_free_rss; +@@ -363,6 +364,7 @@ struct mlx5e_rss *mlx5e_rss_init(struct mlx5_core_dev *mdev, bool inner_ft_suppo + enum mlx5e_rss_init_type type, unsigned int nch, + unsigned int max_nch) + { ++ u32 rqt_max_size, rqt_size; + struct mlx5e_rss *rss; + int err; + +@@ -370,9 +372,9 @@ struct mlx5e_rss *mlx5e_rss_init(struct mlx5_core_dev *mdev, bool inner_ft_suppo + if (!rss) + return ERR_PTR(-ENOMEM); + +- err = mlx5e_rss_params_indir_init(&rss->indir, mdev, +- mlx5e_rqt_size(mdev, nch), +- mlx5e_rqt_size(mdev, max_nch)); ++ rqt_size = mlx5e_rqt_size(mdev, nch); ++ rqt_max_size = mlx5e_rqt_size(mdev, max_nch); ++ err = mlx5e_rss_params_indir_init(&rss->indir, rqt_size, rqt_max_size); + if (err) + goto err_free_rss; + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rss.h b/drivers/net/ethernet/mellanox/mlx5/core/en/rss.h +index c6c1b2847cf5..616097c8770e 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rss.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rss.h +@@ -18,7 +18,7 @@ mlx5e_rss_get_default_tt_config(enum mlx5_traffic_types tt); + + struct mlx5e_rss; + +-int mlx5e_rss_params_indir_init(struct mlx5e_rss_params_indir *indir, struct mlx5_core_dev *mdev, ++int mlx5e_rss_params_indir_init(struct mlx5e_rss_params_indir *indir, + u32 actual_table_size, u32 max_table_size); + void mlx5e_rss_params_indir_cleanup(struct mlx5e_rss_params_indir *indir); + void mlx5e_rss_params_indir_modify_actual_size(struct mlx5e_rss *rss, u32 num_channels); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +index e1599817c3b2..7a34a502f97f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +@@ 
-758,11 +758,11 @@ static int mlx5e_hairpin_create_indirect_rqt(struct mlx5e_hairpin *hp) + struct mlx5e_priv *priv = hp->func_priv; + struct mlx5_core_dev *mdev = priv->mdev; + struct mlx5e_rss_params_indir indir; ++ u32 rqt_size; + int err; + +- err = mlx5e_rss_params_indir_init(&indir, mdev, +- mlx5e_rqt_size(mdev, hp->num_channels), +- mlx5e_rqt_size(mdev, hp->num_channels)); ++ rqt_size = mlx5e_rqt_size(mdev, hp->num_channels); ++ err = mlx5e_rss_params_indir_init(&indir, rqt_size, rqt_size); + if (err) + return err; + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1569-net-mlx5e-introduce-mlx5e-rss-init-params.patch b/SOURCES/1569-net-mlx5e-introduce-mlx5e-rss-init-params.patch new file mode 100644 index 000000000..fb79d67da --- /dev/null +++ b/SOURCES/1569-net-mlx5e-introduce-mlx5e-rss-init-params.patch @@ -0,0 +1,288 @@ +From 382207217359bed3c6dd288669d8eabc4726fa49 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:31 -0400 +Subject: [PATCH] net/mlx5e: Introduce mlx5e_rss_init_params + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit fc92cddd7a833d51ef857eca672214cab755ceaa +Author: Carolina Jubran +Date: Mon Sep 29 00:25:21 2025 +0300 + + net/mlx5e: Introduce mlx5e_rss_init_params + + Introduce a dedicated structure to group RSS initialization parameters + that are only used during RSS creation, and drop the "init" prefix + from pkt_merge_param. + + No functional changes. 
+ + Signed-off-by: Carolina Jubran + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1759094723-843774-6-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rss.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rss.c +index 6422eeabc334..c3eeeec62129 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rss.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rss.c +@@ -193,10 +193,10 @@ mlx5e_rss_get_tt_config(struct mlx5e_rss *rss, enum mlx5_traffic_types tt) + return rss_tt; + } + +-static int mlx5e_rss_create_tir(struct mlx5e_rss *rss, +- enum mlx5_traffic_types tt, +- const struct mlx5e_packet_merge_param *init_pkt_merge_param, +- bool inner) ++static int ++mlx5e_rss_create_tir(struct mlx5e_rss *rss, enum mlx5_traffic_types tt, ++ const struct mlx5e_packet_merge_param *pkt_merge_param, ++ bool inner) + { + struct mlx5e_rss_params_traffic_type rss_tt; + struct mlx5e_tir_builder *builder; +@@ -229,7 +229,7 @@ static int mlx5e_rss_create_tir(struct mlx5e_rss *rss, + rqtn = mlx5e_rqt_get_rqtn(&rss->rqt); + mlx5e_tir_builder_build_rqt(builder, rss->mdev->mlx5e_res.hw_objs.td.tdn, + rqtn, rss->inner_ft_support); +- mlx5e_tir_builder_build_packet_merge(builder, init_pkt_merge_param); ++ mlx5e_tir_builder_build_packet_merge(builder, pkt_merge_param); + rss_tt = mlx5e_rss_get_tt_config(rss, tt); + mlx5e_tir_builder_build_rss(builder, &rss->hash, &rss_tt, inner); + +@@ -265,15 +265,16 @@ static void mlx5e_rss_destroy_tir(struct mlx5e_rss *rss, enum mlx5_traffic_types + *tir_p = NULL; + } + +-static int mlx5e_rss_create_tirs(struct mlx5e_rss *rss, +- const struct mlx5e_packet_merge_param *init_pkt_merge_param, +- bool inner) ++static int ++mlx5e_rss_create_tirs(struct mlx5e_rss *rss, ++ const struct mlx5e_packet_merge_param *pkt_merge_param, ++ bool inner) + { + enum mlx5_traffic_types tt, max_tt; + int err; + + for (tt = 0; tt < MLX5E_NUM_INDIR_TIRS; tt++) { 
+- err = mlx5e_rss_create_tir(rss, tt, init_pkt_merge_param, inner); ++ err = mlx5e_rss_create_tir(rss, tt, pkt_merge_param, inner); + if (err) + goto err_destroy_tirs; + } +@@ -359,10 +360,9 @@ static int mlx5e_rss_init_no_tirs(struct mlx5e_rss *rss) + rss->drop_rqn, rss->indir.max_table_size); + } + +-struct mlx5e_rss *mlx5e_rss_init(struct mlx5_core_dev *mdev, bool inner_ft_support, u32 drop_rqn, +- const struct mlx5e_packet_merge_param *init_pkt_merge_param, +- enum mlx5e_rss_init_type type, unsigned int nch, +- unsigned int max_nch) ++struct mlx5e_rss * ++mlx5e_rss_init(struct mlx5_core_dev *mdev, bool inner_ft_support, u32 drop_rqn, ++ const struct mlx5e_rss_init_params *init_params) + { + u32 rqt_max_size, rqt_size; + struct mlx5e_rss *rss; +@@ -372,8 +372,8 @@ struct mlx5e_rss *mlx5e_rss_init(struct mlx5_core_dev *mdev, bool inner_ft_suppo + if (!rss) + return ERR_PTR(-ENOMEM); + +- rqt_size = mlx5e_rqt_size(mdev, nch); +- rqt_max_size = mlx5e_rqt_size(mdev, max_nch); ++ rqt_size = mlx5e_rqt_size(mdev, init_params->nch); ++ rqt_max_size = mlx5e_rqt_size(mdev, init_params->max_nch); + err = mlx5e_rss_params_indir_init(&rss->indir, rqt_size, rqt_max_size); + if (err) + goto err_free_rss; +@@ -386,15 +386,18 @@ struct mlx5e_rss *mlx5e_rss_init(struct mlx5_core_dev *mdev, bool inner_ft_suppo + if (err) + goto err_free_indir; + +- if (type == MLX5E_RSS_INIT_NO_TIRS) ++ if (init_params->type == MLX5E_RSS_INIT_NO_TIRS) + goto out; + +- err = mlx5e_rss_create_tirs(rss, init_pkt_merge_param, false); ++ err = mlx5e_rss_create_tirs(rss, init_params->pkt_merge_param, ++ false); + if (err) + goto err_destroy_rqt; + + if (inner_ft_support) { +- err = mlx5e_rss_create_tirs(rss, init_pkt_merge_param, true); ++ err = mlx5e_rss_create_tirs(rss, ++ init_params->pkt_merge_param, ++ true); + if (err) + goto err_destroy_tirs; + } +@@ -470,10 +473,10 @@ bool mlx5e_rss_valid_tir(struct mlx5e_rss *rss, enum mlx5_traffic_types tt, bool + /* Fill the "tirn" output parameter. 
+ * Create the requested TIR if it's its first usage. + */ +-int mlx5e_rss_obtain_tirn(struct mlx5e_rss *rss, +- enum mlx5_traffic_types tt, +- const struct mlx5e_packet_merge_param *init_pkt_merge_param, +- bool inner, u32 *tirn) ++int ++mlx5e_rss_obtain_tirn(struct mlx5e_rss *rss, enum mlx5_traffic_types tt, ++ const struct mlx5e_packet_merge_param *pkt_merge_param, ++ bool inner, u32 *tirn) + { + struct mlx5e_tir *tir; + +@@ -481,7 +484,7 @@ int mlx5e_rss_obtain_tirn(struct mlx5e_rss *rss, + if (!tir) { /* TIR doesn't exist, create one */ + int err; + +- err = mlx5e_rss_create_tir(rss, tt, init_pkt_merge_param, inner); ++ err = mlx5e_rss_create_tir(rss, tt, pkt_merge_param, inner); + if (err) + return err; + tir = rss_get_tir(rss, tt, inner); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rss.h b/drivers/net/ethernet/mellanox/mlx5/core/en/rss.h +index 616097c8770e..80225709675b 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rss.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rss.h +@@ -13,6 +13,13 @@ enum mlx5e_rss_init_type { + MLX5E_RSS_INIT_TIRS + }; + ++struct mlx5e_rss_init_params { ++ enum mlx5e_rss_init_type type; ++ const struct mlx5e_packet_merge_param *pkt_merge_param; ++ unsigned int nch; ++ unsigned int max_nch; ++}; ++ + struct mlx5e_rss_params_traffic_type + mlx5e_rss_get_default_tt_config(enum mlx5_traffic_types tt); + +@@ -22,10 +29,9 @@ int mlx5e_rss_params_indir_init(struct mlx5e_rss_params_indir *indir, + u32 actual_table_size, u32 max_table_size); + void mlx5e_rss_params_indir_cleanup(struct mlx5e_rss_params_indir *indir); + void mlx5e_rss_params_indir_modify_actual_size(struct mlx5e_rss *rss, u32 num_channels); +-struct mlx5e_rss *mlx5e_rss_init(struct mlx5_core_dev *mdev, bool inner_ft_support, u32 drop_rqn, +- const struct mlx5e_packet_merge_param *init_pkt_merge_param, +- enum mlx5e_rss_init_type type, unsigned int nch, +- unsigned int max_nch); ++struct mlx5e_rss * ++mlx5e_rss_init(struct mlx5_core_dev *mdev, bool 
inner_ft_support, u32 drop_rqn, ++ const struct mlx5e_rss_init_params *init_params); + int mlx5e_rss_cleanup(struct mlx5e_rss *rss); + + void mlx5e_rss_refcnt_inc(struct mlx5e_rss *rss); +@@ -37,10 +43,10 @@ u32 mlx5e_rss_get_tirn(struct mlx5e_rss *rss, enum mlx5_traffic_types tt, + bool inner); + bool mlx5e_rss_valid_tir(struct mlx5e_rss *rss, enum mlx5_traffic_types tt, bool inner); + u32 mlx5e_rss_get_rqtn(struct mlx5e_rss *rss); +-int mlx5e_rss_obtain_tirn(struct mlx5e_rss *rss, +- enum mlx5_traffic_types tt, +- const struct mlx5e_packet_merge_param *init_pkt_merge_param, +- bool inner, u32 *tirn); ++int ++mlx5e_rss_obtain_tirn(struct mlx5e_rss *rss, enum mlx5_traffic_types tt, ++ const struct mlx5e_packet_merge_param *pkt_merge_param, ++ bool inner, u32 *tirn); + + void mlx5e_rss_enable(struct mlx5e_rss *rss, u32 *rqns, u32 *vhca_ids, unsigned int num_rqns); + void mlx5e_rss_disable(struct mlx5e_rss *rss); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c +index a2acbfee2b77..74dda61e92bc 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c +@@ -54,17 +54,25 @@ static int mlx5e_rx_res_rss_init_def(struct mlx5e_rx_res *res, + unsigned int init_nch) + { + bool inner_ft_support = res->features & MLX5E_RX_RES_FEATURE_INNER_FT; ++ struct mlx5e_rss_init_params init_params; + struct mlx5e_rss *rss; + + if (WARN_ON(res->rss[0])) + return -EINVAL; + ++ init_params = (struct mlx5e_rss_init_params) { ++ .type = MLX5E_RSS_INIT_TIRS, ++ .pkt_merge_param = &res->pkt_merge_param, ++ .nch = init_nch, ++ .max_nch = res->max_nch, ++ }; ++ + rss = mlx5e_rss_init(res->mdev, inner_ft_support, res->drop_rqn, +- &res->pkt_merge_param, MLX5E_RSS_INIT_TIRS, init_nch, res->max_nch); ++ &init_params); + if (IS_ERR(rss)) + return PTR_ERR(rss); + +- mlx5e_rss_set_indir_uniform(rss, init_nch); ++ mlx5e_rss_set_indir_uniform(rss, init_params.nch); + + 
res->rss[0] = rss; + +@@ -74,18 +82,25 @@ static int mlx5e_rx_res_rss_init_def(struct mlx5e_rx_res *res, + int mlx5e_rx_res_rss_init(struct mlx5e_rx_res *res, u32 rss_idx, unsigned int init_nch) + { + bool inner_ft_support = res->features & MLX5E_RX_RES_FEATURE_INNER_FT; ++ struct mlx5e_rss_init_params init_params; + struct mlx5e_rss *rss; + + if (WARN_ON_ONCE(res->rss[rss_idx])) + return -ENOSPC; + ++ init_params = (struct mlx5e_rss_init_params) { ++ .type = MLX5E_RSS_INIT_NO_TIRS, ++ .pkt_merge_param = &res->pkt_merge_param, ++ .nch = init_nch, ++ .max_nch = res->max_nch, ++ }; ++ + rss = mlx5e_rss_init(res->mdev, inner_ft_support, res->drop_rqn, +- &res->pkt_merge_param, MLX5E_RSS_INIT_NO_TIRS, init_nch, +- res->max_nch); ++ &init_params); + if (IS_ERR(rss)) + return PTR_ERR(rss); + +- mlx5e_rss_set_indir_uniform(rss, init_nch); ++ mlx5e_rss_set_indir_uniform(rss, init_params.nch); + if (res->rss_active) { + u32 *vhca_ids = get_vhca_ids(res, 0); + +@@ -438,7 +453,7 @@ static void mlx5e_rx_res_ptp_destroy(struct mlx5e_rx_res *res) + struct mlx5e_rx_res * + mlx5e_rx_res_create(struct mlx5_core_dev *mdev, enum mlx5e_rx_res_features features, + unsigned int max_nch, u32 drop_rqn, +- const struct mlx5e_packet_merge_param *init_pkt_merge_param, ++ const struct mlx5e_packet_merge_param *pkt_merge_param, + unsigned int init_nch) + { + bool multi_vhca = features & MLX5E_RX_RES_FEATURE_MULTI_VHCA; +@@ -454,7 +469,7 @@ mlx5e_rx_res_create(struct mlx5_core_dev *mdev, enum mlx5e_rx_res_features featu + res->max_nch = max_nch; + res->drop_rqn = drop_rqn; + +- res->pkt_merge_param = *init_pkt_merge_param; ++ res->pkt_merge_param = *pkt_merge_param; + init_rwsem(&res->pkt_merge_param_sem); + + err = mlx5e_rx_res_rss_init_def(res, init_nch); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h +index 1d049e2aa264..65a857c215e1 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h ++++ 
b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h +@@ -27,7 +27,7 @@ enum mlx5e_rx_res_features { + struct mlx5e_rx_res * + mlx5e_rx_res_create(struct mlx5_core_dev *mdev, enum mlx5e_rx_res_features features, + unsigned int max_nch, u32 drop_rqn, +- const struct mlx5e_packet_merge_param *init_pkt_merge_param, ++ const struct mlx5e_packet_merge_param *pkt_merge_param, + unsigned int init_nch); + void mlx5e_rx_res_destroy(struct mlx5e_rx_res *res); + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1570-net-mlx5e-introduce-mlx5e-rss-params-for-rss-configuration.patch b/SOURCES/1570-net-mlx5e-introduce-mlx5e-rss-params-for-rss-configuration.patch new file mode 100644 index 000000000..0243d1aa6 --- /dev/null +++ b/SOURCES/1570-net-mlx5e-introduce-mlx5e-rss-params-for-rss-configuration.patch @@ -0,0 +1,247 @@ +From fe06434012210b2e9fc9dc042408201768b9419c Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:31 -0400 +Subject: [PATCH] net/mlx5e: Introduce mlx5e_rss_params for RSS configuration + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit c40a94ccfdc76fa26c620d1748ebda35c2153dd9 +Author: Carolina Jubran +Date: Mon Sep 29 00:25:22 2025 +0300 + + net/mlx5e: Introduce mlx5e_rss_params for RSS configuration + + Group RSS-related parameters into a dedicated mlx5e_rss_params + struct. Pass this struct instead of individual arguments when + initializing RSS. + + No functional changes. 
+ + Signed-off-by: Carolina Jubran + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1759094723-843774-7-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rss.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rss.c +index c3eeeec62129..c96cbc4b0dbf 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rss.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rss.c +@@ -75,15 +75,14 @@ struct mlx5e_rss { + struct mlx5e_tir *inner_tir[MLX5E_NUM_INDIR_TIRS]; + struct mlx5e_rqt rqt; + struct mlx5_core_dev *mdev; /* primary */ +- u32 drop_rqn; +- bool inner_ft_support; ++ struct mlx5e_rss_params params; + bool enabled; + refcount_t refcnt; + }; + + bool mlx5e_rss_get_inner_ft_support(struct mlx5e_rss *rss) + { +- return rss->inner_ft_support; ++ return rss->params.inner_ft_support; + } + + void mlx5e_rss_params_indir_modify_actual_size(struct mlx5e_rss *rss, u32 num_channels) +@@ -198,6 +197,7 @@ mlx5e_rss_create_tir(struct mlx5e_rss *rss, enum mlx5_traffic_types tt, + const struct mlx5e_packet_merge_param *pkt_merge_param, + bool inner) + { ++ bool rss_inner = rss->params.inner_ft_support; + struct mlx5e_rss_params_traffic_type rss_tt; + struct mlx5e_tir_builder *builder; + struct mlx5e_tir **tir_p; +@@ -205,7 +205,7 @@ mlx5e_rss_create_tir(struct mlx5e_rss *rss, enum mlx5_traffic_types tt, + u32 rqtn; + int err; + +- if (inner && !rss->inner_ft_support) { ++ if (inner && !rss_inner) { + mlx5e_rss_warn(rss->mdev, + "Cannot create inner indirect TIR[%d], RSS inner FT is not supported.\n", + tt); +@@ -228,7 +228,7 @@ mlx5e_rss_create_tir(struct mlx5e_rss *rss, enum mlx5_traffic_types tt, + + rqtn = mlx5e_rqt_get_rqtn(&rss->rqt); + mlx5e_tir_builder_build_rqt(builder, rss->mdev->mlx5e_res.hw_objs.td.tdn, +- rqtn, rss->inner_ft_support); ++ rqtn, rss_inner); + mlx5e_tir_builder_build_packet_merge(builder, pkt_merge_param); + rss_tt = 
mlx5e_rss_get_tt_config(rss, tt); + mlx5e_tir_builder_build_rss(builder, &rss->hash, &rss_tt, inner); +@@ -337,7 +337,7 @@ static int mlx5e_rss_update_tirs(struct mlx5e_rss *rss) + tt, err); + } + +- if (!rss->inner_ft_support) ++ if (!rss->params.inner_ft_support) + continue; + + err = mlx5e_rss_update_tir(rss, tt, true); +@@ -357,11 +357,13 @@ static int mlx5e_rss_init_no_tirs(struct mlx5e_rss *rss) + refcount_set(&rss->refcnt, 1); + + return mlx5e_rqt_init_direct(&rss->rqt, rss->mdev, true, +- rss->drop_rqn, rss->indir.max_table_size); ++ rss->params.drop_rqn, ++ rss->indir.max_table_size); + } + + struct mlx5e_rss * +-mlx5e_rss_init(struct mlx5_core_dev *mdev, bool inner_ft_support, u32 drop_rqn, ++mlx5e_rss_init(struct mlx5_core_dev *mdev, ++ const struct mlx5e_rss_params *params, + const struct mlx5e_rss_init_params *init_params) + { + u32 rqt_max_size, rqt_size; +@@ -379,8 +381,7 @@ mlx5e_rss_init(struct mlx5_core_dev *mdev, bool inner_ft_support, u32 drop_rqn, + goto err_free_rss; + + rss->mdev = mdev; +- rss->inner_ft_support = inner_ft_support; +- rss->drop_rqn = drop_rqn; ++ rss->params = *params; + + err = mlx5e_rss_init_no_tirs(rss); + if (err) +@@ -394,7 +395,7 @@ mlx5e_rss_init(struct mlx5_core_dev *mdev, bool inner_ft_support, u32 drop_rqn, + if (err) + goto err_destroy_rqt; + +- if (inner_ft_support) { ++ if (params->inner_ft_support) { + err = mlx5e_rss_create_tirs(rss, + init_params->pkt_merge_param, + true); +@@ -423,7 +424,7 @@ int mlx5e_rss_cleanup(struct mlx5e_rss *rss) + + mlx5e_rss_destroy_tirs(rss, false); + +- if (rss->inner_ft_support) ++ if (rss->params.inner_ft_support) + mlx5e_rss_destroy_tirs(rss, true); + + mlx5e_rqt_destroy(&rss->rqt); +@@ -453,7 +454,7 @@ u32 mlx5e_rss_get_tirn(struct mlx5e_rss *rss, enum mlx5_traffic_types tt, + { + struct mlx5e_tir *tir; + +- WARN_ON(inner && !rss->inner_ft_support); ++ WARN_ON(inner && !rss->params.inner_ft_support); + tir = rss_get_tir(rss, tt, inner); + WARN_ON(!tir); + +@@ -517,10 +518,11 @@ 
void mlx5e_rss_disable(struct mlx5e_rss *rss) + int err; + + rss->enabled = false; +- err = mlx5e_rqt_redirect_direct(&rss->rqt, rss->drop_rqn, NULL); ++ err = mlx5e_rqt_redirect_direct(&rss->rqt, rss->params.drop_rqn, NULL); + if (err) + mlx5e_rss_warn(rss->mdev, "Failed to redirect RQT %#x to drop RQ %#x: err = %d\n", +- mlx5e_rqt_get_rqtn(&rss->rqt), rss->drop_rqn, err); ++ mlx5e_rqt_get_rqtn(&rss->rqt), ++ rss->params.drop_rqn, err); + } + + int mlx5e_rss_packet_merge_set_param(struct mlx5e_rss *rss, +@@ -553,7 +555,7 @@ int mlx5e_rss_packet_merge_set_param(struct mlx5e_rss *rss, + } + + inner_tir: +- if (!rss->inner_ft_support) ++ if (!rss->params.inner_ft_support) + continue; + + tir = rss_get_tir(rss, tt, true); +@@ -686,7 +688,7 @@ int mlx5e_rss_set_hash_fields(struct mlx5e_rss *rss, enum mlx5_traffic_types tt, + return err; + } + +- if (!(rss->inner_ft_support)) ++ if (!(rss->params.inner_ft_support)) + return 0; + + err = mlx5e_rss_update_tir(rss, tt, true); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rss.h b/drivers/net/ethernet/mellanox/mlx5/core/en/rss.h +index 80225709675b..5fb03cd0a411 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rss.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rss.h +@@ -20,6 +20,11 @@ struct mlx5e_rss_init_params { + unsigned int max_nch; + }; + ++struct mlx5e_rss_params { ++ bool inner_ft_support; ++ u32 drop_rqn; ++}; ++ + struct mlx5e_rss_params_traffic_type + mlx5e_rss_get_default_tt_config(enum mlx5_traffic_types tt); + +@@ -30,7 +35,8 @@ int mlx5e_rss_params_indir_init(struct mlx5e_rss_params_indir *indir, + void mlx5e_rss_params_indir_cleanup(struct mlx5e_rss_params_indir *indir); + void mlx5e_rss_params_indir_modify_actual_size(struct mlx5e_rss *rss, u32 num_channels); + struct mlx5e_rss * +-mlx5e_rss_init(struct mlx5_core_dev *mdev, bool inner_ft_support, u32 drop_rqn, ++mlx5e_rss_init(struct mlx5_core_dev *mdev, ++ const struct mlx5e_rss_params *params, + const struct mlx5e_rss_init_params 
*init_params); + int mlx5e_rss_cleanup(struct mlx5e_rss *rss); + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c +index 74dda61e92bc..ac26a32845d0 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c +@@ -55,6 +55,7 @@ static int mlx5e_rx_res_rss_init_def(struct mlx5e_rx_res *res, + { + bool inner_ft_support = res->features & MLX5E_RX_RES_FEATURE_INNER_FT; + struct mlx5e_rss_init_params init_params; ++ struct mlx5e_rss_params rss_params; + struct mlx5e_rss *rss; + + if (WARN_ON(res->rss[0])) +@@ -67,8 +68,12 @@ static int mlx5e_rx_res_rss_init_def(struct mlx5e_rx_res *res, + .max_nch = res->max_nch, + }; + +- rss = mlx5e_rss_init(res->mdev, inner_ft_support, res->drop_rqn, +- &init_params); ++ rss_params = (struct mlx5e_rss_params) { ++ .inner_ft_support = inner_ft_support, ++ .drop_rqn = res->drop_rqn, ++ }; ++ ++ rss = mlx5e_rss_init(res->mdev, &rss_params, &init_params); + if (IS_ERR(rss)) + return PTR_ERR(rss); + +@@ -83,6 +88,7 @@ int mlx5e_rx_res_rss_init(struct mlx5e_rx_res *res, u32 rss_idx, unsigned int in + { + bool inner_ft_support = res->features & MLX5E_RX_RES_FEATURE_INNER_FT; + struct mlx5e_rss_init_params init_params; ++ struct mlx5e_rss_params rss_params; + struct mlx5e_rss *rss; + + if (WARN_ON_ONCE(res->rss[rss_idx])) +@@ -95,8 +101,12 @@ int mlx5e_rx_res_rss_init(struct mlx5e_rx_res *res, u32 rss_idx, unsigned int in + .max_nch = res->max_nch, + }; + +- rss = mlx5e_rss_init(res->mdev, inner_ft_support, res->drop_rqn, +- &init_params); ++ rss_params = (struct mlx5e_rss_params) { ++ .inner_ft_support = inner_ft_support, ++ .drop_rqn = res->drop_rqn, ++ }; ++ ++ rss = mlx5e_rss_init(res->mdev, &rss_params, &init_params); + if (IS_ERR(rss)) + return PTR_ERR(rss); + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1571-net-mlx5e-use-extack-in-set-rxfh-callback.patch 
b/SOURCES/1571-net-mlx5e-use-extack-in-set-rxfh-callback.patch new file mode 100644 index 000000000..21c34d6cd --- /dev/null +++ b/SOURCES/1571-net-mlx5e-use-extack-in-set-rxfh-callback.patch @@ -0,0 +1,85 @@ +From 96f1787f529352d9da96d303daa083686caa2c71 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:31 -0400 +Subject: [PATCH] net/mlx5e: Use extack in set rxfh callback + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit a833538d1d8db96b78bac04eec9be51b297f1d23 +Author: Gal Pressman +Date: Mon Sep 29 00:25:23 2025 +0300 + + net/mlx5e: Use extack in set rxfh callback + + The ->set/create/modify_rxfh() callbacks now pass a valid extack instead + of NULL through netlink [1]. In case of an error, reflect it through + extack instead of a dmesg print. + + [1] + commit c0ae03588bbb ("ethtool: rss: initial RSS_SET (indirection table handling)") + + Signed-off-by: Gal Pressman + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1759094723-843774-8-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +index 81e819f8722c..64f315089b04 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +@@ -1494,7 +1494,8 @@ static int mlx5e_get_rxfh(struct net_device *netdev, struct ethtool_rxfh_param * + } + + static int mlx5e_rxfh_hfunc_check(struct mlx5e_priv *priv, +- const struct ethtool_rxfh_param *rxfh) ++ const struct ethtool_rxfh_param *rxfh, ++ struct netlink_ext_ack *extack) + { + unsigned int count; + +@@ -1504,8 +1505,10 @@ static int mlx5e_rxfh_hfunc_check(struct mlx5e_priv *priv, + unsigned int xor8_max_channels = mlx5e_rqt_max_num_channels_allowed_for_xor8(); + + if (count > xor8_max_channels) { +- netdev_err(priv->netdev, "%s: Cannot set RSS hash function to 
XOR, current number of channels (%d) exceeds the maximum allowed for XOR8 RSS hfunc (%d)\n", +- __func__, count, xor8_max_channels); ++ NL_SET_ERR_MSG_FMT_MOD( ++ extack, ++ "Number of channels (%u) exceeds the max for XOR8 RSS (%u)", ++ count, xor8_max_channels); + return -EINVAL; + } + } +@@ -1524,7 +1527,7 @@ static int mlx5e_set_rxfh(struct net_device *dev, + + mutex_lock(&priv->state_lock); + +- err = mlx5e_rxfh_hfunc_check(priv, rxfh); ++ err = mlx5e_rxfh_hfunc_check(priv, rxfh, extack); + if (err) + goto unlock; + +@@ -1550,7 +1553,7 @@ static int mlx5e_create_rxfh_context(struct net_device *dev, + + mutex_lock(&priv->state_lock); + +- err = mlx5e_rxfh_hfunc_check(priv, rxfh); ++ err = mlx5e_rxfh_hfunc_check(priv, rxfh, extack); + if (err) + goto unlock; + +@@ -1590,7 +1593,7 @@ static int mlx5e_modify_rxfh_context(struct net_device *dev, + + mutex_lock(&priv->state_lock); + +- err = mlx5e_rxfh_hfunc_check(priv, rxfh); ++ err = mlx5e_rxfh_hfunc_check(priv, rxfh, extack); + if (err) + goto unlock; + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1572-net-mlx5-prevent-tunnel-mode-conflicts-between-fdb-and-nic-i.patch b/SOURCES/1572-net-mlx5-prevent-tunnel-mode-conflicts-between-fdb-and-nic-i.patch new file mode 100644 index 000000000..ebfe8eed8 --- /dev/null +++ b/SOURCES/1572-net-mlx5-prevent-tunnel-mode-conflicts-between-fdb-and-nic-i.patch @@ -0,0 +1,134 @@ +From 80f8dd4a894d8b6f78de86efd7bb9db3fce351c7 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:31 -0400 +Subject: [PATCH] net/mlx5: Prevent tunnel mode conflicts between FDB and NIC + IPsec tables +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 7593439c13933164f701eed9c83d89358f203469 +Author: Carolina Jubran +Date: Sun Oct 5 11:29:57 2025 +0300 + + net/mlx5: Prevent tunnel mode conflicts between FDB and NIC IPsec tables + + When creating IPsec flow tables with 
tunnel mode enabled, the driver + uses mlx5_eswitch_block_encap() to prevent tunnel encapsulation + conflicts across different domains (NIC_RX/NIC_TX and FDB), since the + firmware doesn’t allow both at the same time. + + Currently, the driver attempts to reserve tunnel mode unconditionally + for both NIC and FDB IPsec tables. This can lead to conflicting tunnel + mode setups, for example, if a flow table was created in the FDB + domain with tunnel offload enabled, and we later try to create another + one in the NIC, or vice versa. + + To resolve this, adjust the blocking logic so that tunnel mode is only + reserved by NIC flows. This ensures that tunnel offload is exclusively + used in either the NIC or the FDB, and avoids unintended offload + conflicts. + + Fixes: 1762f132d542 ("net/mlx5e: Support IPsec packet offload for RX in switchdev mode") + Fixes: c6c2bf5db4ea ("net/mlx5e: Support IPsec packet offload for TX in switchdev mode") + Signed-off-by: Carolina Jubran + Reviewed-by: Jianbo Liu + Reviewed-by: Leon Romanovsky + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1759652999-858513-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c +index 6ccfc2af07b7..0bc080274584 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c +@@ -1069,7 +1069,9 @@ static int rx_create(struct mlx5_core_dev *mdev, struct mlx5e_ipsec *ipsec, + + /* Create FT */ + if (mlx5_ipsec_device_caps(mdev) & MLX5_IPSEC_CAP_TUNNEL) +- rx->allow_tunnel_mode = mlx5_eswitch_block_encap(mdev); ++ rx->allow_tunnel_mode = ++ mlx5_eswitch_block_encap(mdev, rx == ipsec->rx_esw); ++ + if (rx->allow_tunnel_mode) + flags = MLX5_FLOW_TABLE_TUNNEL_EN_REFORMAT; + ft = ipsec_ft_create(attr.ns, attr.sa_level, attr.prio, 1, 2, flags); 
+@@ -1310,7 +1312,9 @@ static int tx_create(struct mlx5e_ipsec *ipsec, struct mlx5e_ipsec_tx *tx, + goto err_status_rule; + + if (mlx5_ipsec_device_caps(mdev) & MLX5_IPSEC_CAP_TUNNEL) +- tx->allow_tunnel_mode = mlx5_eswitch_block_encap(mdev); ++ tx->allow_tunnel_mode = ++ mlx5_eswitch_block_encap(mdev, tx == ipsec->tx_esw); ++ + if (tx->allow_tunnel_mode) + flags = MLX5_FLOW_TABLE_TUNNEL_EN_REFORMAT; + ft = ipsec_ft_create(tx->ns, attr.sa_level, attr.prio, 1, 4, flags); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +index df3756d7e52e..16eb99aba2a7 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +@@ -879,7 +879,7 @@ void mlx5_eswitch_offloads_single_fdb_del_one(struct mlx5_eswitch *master_esw, + struct mlx5_eswitch *slave_esw); + int mlx5_eswitch_reload_ib_reps(struct mlx5_eswitch *esw); + +-bool mlx5_eswitch_block_encap(struct mlx5_core_dev *dev); ++bool mlx5_eswitch_block_encap(struct mlx5_core_dev *dev, bool from_fdb); + void mlx5_eswitch_unblock_encap(struct mlx5_core_dev *dev); + + int mlx5_eswitch_block_mode(struct mlx5_core_dev *dev); +@@ -974,7 +974,8 @@ mlx5_eswitch_reload_ib_reps(struct mlx5_eswitch *esw) + return 0; + } + +-static inline bool mlx5_eswitch_block_encap(struct mlx5_core_dev *dev) ++static inline bool ++mlx5_eswitch_block_encap(struct mlx5_core_dev *dev, bool from_fdb) + { + return true; + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +index 52c3de24bea3..4cf995be127d 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +@@ -4006,23 +4006,25 @@ int mlx5_devlink_eswitch_inline_mode_get(struct devlink *devlink, u8 *mode) + return esw_inline_mode_to_devlink(esw->offloads.inline_mode, mode); + } + +-bool 
mlx5_eswitch_block_encap(struct mlx5_core_dev *dev) ++bool mlx5_eswitch_block_encap(struct mlx5_core_dev *dev, bool from_fdb) + { + struct mlx5_eswitch *esw = dev->priv.eswitch; ++ enum devlink_eswitch_encap_mode encap; ++ bool allow_tunnel = false; + + if (!mlx5_esw_allowed(esw)) + return true; + + down_write(&esw->mode_lock); +- if (esw->mode != MLX5_ESWITCH_LEGACY && +- esw->offloads.encap != DEVLINK_ESWITCH_ENCAP_MODE_NONE) { +- up_write(&esw->mode_lock); +- return false; ++ encap = esw->offloads.encap; ++ if (esw->mode == MLX5_ESWITCH_LEGACY || ++ (encap == DEVLINK_ESWITCH_ENCAP_MODE_NONE && !from_fdb)) { ++ allow_tunnel = true; ++ esw->offloads.num_block_encap++; + } +- +- esw->offloads.num_block_encap++; + up_write(&esw->mode_lock); +- return true; ++ ++ return allow_tunnel; + } + + void mlx5_eswitch_unblock_encap(struct mlx5_core_dev *dev) +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1573-net-mlx5e-prevent-tunnel-reformat-when-tunnel-mode-not-allow.patch b/SOURCES/1573-net-mlx5e-prevent-tunnel-reformat-when-tunnel-mode-not-allow.patch new file mode 100644 index 000000000..fc0cf5b05 --- /dev/null +++ b/SOURCES/1573-net-mlx5e-prevent-tunnel-reformat-when-tunnel-mode-not-allow.patch @@ -0,0 +1,188 @@ +From 9822a1419dfc189e8c1029d8c8df460c5300de89 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:31 -0400 +Subject: [PATCH] net/mlx5e: Prevent tunnel reformat when tunnel mode not + allowed + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 22239eb258bc1e6ccdb2d3502fce1cc2b2a88386 +Author: Carolina Jubran +Date: Sun Oct 5 11:29:58 2025 +0300 + + net/mlx5e: Prevent tunnel reformat when tunnel mode not allowed + + When configuring IPsec packet offload in tunnel mode, the driver tries + to create tunnel reformat objects unconditionally. This is incorrect, + because tunnel mode is only permitted under specific encapsulation + settings, and that decision is already made when the flow table is + created. 
+ + The offending commit attempted to block this case in the state add + path, but the check there happens too late and does not prevent the + reformat from being configured. + + Fix by taking short reservations for both the eswitch mode and the + encap at the start of state setup. This preserves the block ordering + (mode --> encap) used later: the mode is blocked during RX/TX get, and + the encap is blocked during flow-table creation. This lets us fail + early if either reservation cannot be obtained, it means a mode + transition is underway or a conflicting configuration already owns + encap. If both succeed, the flow-table path later takes the ownership + and the reservations are released on exit. + + Fixes: 146c196b60e4 ("net/mlx5e: Create IPsec table with tunnel support only when encap is disabled") + Signed-off-by: Carolina Jubran + Reviewed-by: Jianbo Liu + Reviewed-by: Leon Romanovsky + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1759652999-858513-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +index 00e77c71e201..0a4fb8c92268 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +@@ -772,6 +772,7 @@ static int mlx5e_xfrm_add_state(struct net_device *dev, + struct netlink_ext_ack *extack) + { + struct mlx5e_ipsec_sa_entry *sa_entry = NULL; ++ bool allow_tunnel_mode = false; + struct mlx5e_ipsec *ipsec; + struct mlx5e_priv *priv; + gfp_t gfp; +@@ -803,6 +804,20 @@ static int mlx5e_xfrm_add_state(struct net_device *dev, + goto err_xfrm; + } + ++ if (mlx5_eswitch_block_mode(priv->mdev)) ++ goto unblock_ipsec; ++ ++ if (x->props.mode == XFRM_MODE_TUNNEL && ++ x->xso.type == XFRM_DEV_OFFLOAD_PACKET) { ++ allow_tunnel_mode = mlx5e_ipsec_fs_tunnel_allowed(sa_entry); ++ if (!allow_tunnel_mode) 
{ ++ NL_SET_ERR_MSG_MOD(extack, ++ "Packet offload tunnel mode is disabled due to encap settings"); ++ err = -EINVAL; ++ goto unblock_mode; ++ } ++ } ++ + /* check esn */ + if (x->props.flags & XFRM_STATE_ESN) + mlx5e_ipsec_update_esn_state(sa_entry); +@@ -817,7 +832,7 @@ static int mlx5e_xfrm_add_state(struct net_device *dev, + + err = mlx5_ipsec_create_work(sa_entry); + if (err) +- goto unblock_ipsec; ++ goto unblock_encap; + + err = mlx5e_ipsec_create_dwork(sa_entry); + if (err) +@@ -832,14 +847,6 @@ static int mlx5e_xfrm_add_state(struct net_device *dev, + if (err) + goto err_hw_ctx; + +- if (x->props.mode == XFRM_MODE_TUNNEL && +- x->xso.type == XFRM_DEV_OFFLOAD_PACKET && +- !mlx5e_ipsec_fs_tunnel_enabled(sa_entry)) { +- NL_SET_ERR_MSG_MOD(extack, "Packet offload tunnel mode is disabled due to encap settings"); +- err = -EINVAL; +- goto err_add_rule; +- } +- + /* We use *_bh() variant because xfrm_timer_handler(), which runs + * in softirq context, can reach our state delete logic and we need + * xa_erase_bh() there. 
+@@ -855,8 +862,7 @@ static int mlx5e_xfrm_add_state(struct net_device *dev, + queue_delayed_work(ipsec->wq, &sa_entry->dwork->dwork, + MLX5_IPSEC_RESCHED); + +- if (x->xso.type == XFRM_DEV_OFFLOAD_PACKET && +- x->props.mode == XFRM_MODE_TUNNEL) { ++ if (allow_tunnel_mode) { + xa_lock_bh(&ipsec->sadb); + __xa_set_mark(&ipsec->sadb, sa_entry->ipsec_obj_id, + MLX5E_IPSEC_TUNNEL_SA); +@@ -865,6 +871,11 @@ static int mlx5e_xfrm_add_state(struct net_device *dev, + + out: + x->xso.offload_handle = (unsigned long)sa_entry; ++ if (allow_tunnel_mode) ++ mlx5_eswitch_unblock_encap(priv->mdev); ++ ++ mlx5_eswitch_unblock_mode(priv->mdev); ++ + return 0; + + err_add_rule: +@@ -877,6 +888,11 @@ static int mlx5e_xfrm_add_state(struct net_device *dev, + if (sa_entry->work) + kfree(sa_entry->work->data); + kfree(sa_entry->work); ++unblock_encap: ++ if (allow_tunnel_mode) ++ mlx5_eswitch_unblock_encap(priv->mdev); ++unblock_mode: ++ mlx5_eswitch_unblock_mode(priv->mdev); + unblock_ipsec: + mlx5_eswitch_unblock_ipsec(priv->mdev); + err_xfrm: +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h +index 23703f28386a..5d7c15abfcaf 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h +@@ -319,7 +319,7 @@ void mlx5e_accel_ipsec_fs_del_rule(struct mlx5e_ipsec_sa_entry *sa_entry); + int mlx5e_accel_ipsec_fs_add_pol(struct mlx5e_ipsec_pol_entry *pol_entry); + void mlx5e_accel_ipsec_fs_del_pol(struct mlx5e_ipsec_pol_entry *pol_entry); + void mlx5e_accel_ipsec_fs_modify(struct mlx5e_ipsec_sa_entry *sa_entry); +-bool mlx5e_ipsec_fs_tunnel_enabled(struct mlx5e_ipsec_sa_entry *sa_entry); ++bool mlx5e_ipsec_fs_tunnel_allowed(struct mlx5e_ipsec_sa_entry *sa_entry); + + int mlx5_ipsec_create_sa_ctx(struct mlx5e_ipsec_sa_entry *sa_entry); + void mlx5_ipsec_free_sa_ctx(struct mlx5e_ipsec_sa_entry *sa_entry); +diff --git 
a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c +index 0bc080274584..bf1d2769d4f1 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c +@@ -2850,18 +2850,24 @@ void mlx5e_accel_ipsec_fs_modify(struct mlx5e_ipsec_sa_entry *sa_entry) + memcpy(sa_entry, &sa_entry_shadow, sizeof(*sa_entry)); + } + +-bool mlx5e_ipsec_fs_tunnel_enabled(struct mlx5e_ipsec_sa_entry *sa_entry) ++bool mlx5e_ipsec_fs_tunnel_allowed(struct mlx5e_ipsec_sa_entry *sa_entry) + { +- struct mlx5_accel_esp_xfrm_attrs *attrs = &sa_entry->attrs; +- struct mlx5e_ipsec_rx *rx; +- struct mlx5e_ipsec_tx *tx; ++ struct mlx5e_ipsec *ipsec = sa_entry->ipsec; ++ struct xfrm_state *x = sa_entry->x; ++ bool from_fdb; + +- rx = ipsec_rx(sa_entry->ipsec, attrs->addrs.family, attrs->type); +- tx = ipsec_tx(sa_entry->ipsec, attrs->type); +- if (sa_entry->attrs.dir == XFRM_DEV_OFFLOAD_OUT) +- return tx->allow_tunnel_mode; ++ if (x->xso.dir == XFRM_DEV_OFFLOAD_OUT) { ++ struct mlx5e_ipsec_tx *tx = ipsec_tx(ipsec, x->xso.type); ++ ++ from_fdb = (tx == ipsec->tx_esw); ++ } else { ++ struct mlx5e_ipsec_rx *rx = ipsec_rx(ipsec, x->props.family, ++ x->xso.type); ++ ++ from_fdb = (rx == ipsec->rx_esw); ++ } + +- return rx->allow_tunnel_mode; ++ return mlx5_eswitch_block_encap(ipsec->mdev, from_fdb); + } + + void mlx5e_ipsec_handle_mpv_event(int event, struct mlx5e_priv *slave_priv, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1574-net-mlx5-fix-pre-2-40-binutils-assembler-error.patch b/SOURCES/1574-net-mlx5-fix-pre-2-40-binutils-assembler-error.patch new file mode 100644 index 000000000..b2bfef9fc --- /dev/null +++ b/SOURCES/1574-net-mlx5-fix-pre-2-40-binutils-assembler-error.patch @@ -0,0 +1,49 @@ +From 832ffc37e1d39ab4bb0b9980dd9319d3951aeb3b Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:32 -0400 +Subject: [PATCH] net/mlx5: fix 
pre-2.40 binutils assembler error + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit e475fa420e6c53a5023e89dbf0d51bd027b5a776 +Author: Arnd Bergmann +Date: Mon Oct 6 13:56:34 2025 +0200 + + net/mlx5: fix pre-2.40 binutils assembler error + + Old binutils versions require a slightly stricter syntax for the .arch_extension + directive and fail with the extra semicolon: + + /tmp/cclfMnj9.s:656: Error: unknown architectural extension `simd;' + + Drop the semicolon to make it work with all supported toolchain version. + + Link: https://lore.kernel.org/all/20251001163655.GA370262@ax162/ + Reported-by: Paolo Abeni + Reported-by: Naresh Kamboju + Suggested-by: Nathan Chancellor + Fixes: fd8c8216648c ("net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs") + Signed-off-by: Arnd Bergmann + Reviewed-by: Nathan Chancellor + Reviewed-by: Patrisious Haddad + Link: https://patch.msgid.link/20251006115640.497169-1-arnd@kernel.org + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/wc.c b/drivers/net/ethernet/mellanox/mlx5/core/wc.c +index c281153bd411..05e5fd777d4f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/wc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/wc.c +@@ -266,7 +266,7 @@ static void mlx5_iowrite64_copy(struct mlx5_wc_sq *sq, __be32 mmio_wqe[16], + if (cpu_has_neon()) { + kernel_neon_begin(); + asm volatile +- (".arch_extension simd;\n\t" ++ (".arch_extension simd\n\t" + "ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [%0]\n\t" + "st1 {v0.16b, v1.16b, v2.16b, v3.16b}, [%1]" + : +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1575-net-mlx5e-return-1-instead-of-0-in-invalid-case-in-mlx5e-mpw.patch b/SOURCES/1575-net-mlx5e-return-1-instead-of-0-in-invalid-case-in-mlx5e-mpw.patch new file mode 100644 index 000000000..b2f11ea56 --- /dev/null +++ b/SOURCES/1575-net-mlx5e-return-1-instead-of-0-in-invalid-case-in-mlx5e-mpw.patch @@ -0,0 +1,67 @@ +From 
7291fd4461e2a5d18ff4d95f7f7da90258d715ac Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:32 -0400 +Subject: [PATCH] net/mlx5e: Return 1 instead of 0 in invalid case in + mlx5e_mpwrq_umr_entry_size() + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit aaf043a5688114703ae2c1482b92e7e0754d684e +Author: Nathan Chancellor +Date: Tue Oct 14 13:46:49 2025 -0700 + + net/mlx5e: Return 1 instead of 0 in invalid case in mlx5e_mpwrq_umr_entry_size() + + When building with Clang 20 or newer, there are some objtool warnings + from unexpected fallthroughs to other functions: + + vmlinux.o: warning: objtool: mlx5e_mpwrq_mtts_per_wqe() falls through to next function mlx5e_mpwrq_max_num_entries() + vmlinux.o: warning: objtool: mlx5e_mpwrq_max_log_rq_size() falls through to next function mlx5e_get_linear_rq_headroom() + + LLVM 20 contains an (admittedly problematic [1]) optimization [2] to + convert divide by zero into the equivalent of __builtin_unreachable(), + which invokes undefined behavior and destroys code generation when it is + encountered in a control flow graph. + + mlx5e_mpwrq_umr_entry_size() returns 0 in the default case of an + unrecognized mlx5e_mpwrq_umr_mode value. mlx5e_mpwrq_mtts_per_wqe(), + which is inlined into mlx5e_mpwrq_max_log_rq_size(), uses the result of + mlx5e_mpwrq_umr_entry_size() in a divide operation without checking for + zero, so LLVM is able to infer there will be a divide by zero in this + case and invokes undefined behavior. While there is some proposed work + to isolate this undefined behavior and avoid the destructive code + generation that results in these objtool warnings, code should still be + defensive against divide by zero. + + As the WARN_ONCE() implies that an invalid value should be handled + gracefully, return 1 instead of 0 in the default case so that the + results of this division operation is always valid. 
+ + Fixes: 168723c1f8d6 ("net/mlx5e: xsk: Use umr_mode to calculate striding RQ parameters") + Link: https://lore.kernel.org/CAGG=3QUk8-Ak7YKnRziO4=0z=1C_7+4jF+6ZeDQ9yF+kuTOHOQ@mail.gmail.com/ [1] + Link: https://github.com/llvm/llvm-project/commit/37932643abab699e8bb1def08b7eb4eae7ff1448 [2] + Closes: https://github.com/ClangBuiltLinux/linux/issues/2131 + Closes: https://github.com/ClangBuiltLinux/linux/issues/2132 + Signed-off-by: Nathan Chancellor + Reviewed-by: Tariq Toukan + Link: https://patch.msgid.link/20251014-mlx5e-avoid-zero-div-from-mlx5e_mpwrq_umr_entry_size-v1-1-dc186b8819ef@kernel.org + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c +index 596440c8c364..c948a80a0e9a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c +@@ -99,7 +99,7 @@ u8 mlx5e_mpwrq_umr_entry_size(enum mlx5e_mpwrq_umr_mode mode) + return sizeof(struct mlx5_ksm) * 4; + } + WARN_ONCE(1, "MPWRQ UMR mode %d is not known\n", mode); +- return 0; ++ return 1; + } + + u8 mlx5e_mpwrq_log_wqe_sz(struct mlx5_core_dev *mdev, u8 page_shift, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1576-net-mlx5e-rx-fix-generating-skb-from-non-linear-xdp-buff-for.patch b/SOURCES/1576-net-mlx5e-rx-fix-generating-skb-from-non-linear-xdp-buff-for.patch new file mode 100644 index 000000000..b64beae25 --- /dev/null +++ b/SOURCES/1576-net-mlx5e-rx-fix-generating-skb-from-non-linear-xdp-buff-for.patch @@ -0,0 +1,70 @@ +From e0455875e08fcaf37107e9ad673671f24ad54203 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:32 -0400 +Subject: [PATCH] net/mlx5e: RX, Fix generating skb from non-linear xdp_buff + for legacy RQ + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit afd5ba577c10639f62e8120df67dc70ea4b61176 +Author: Amery Hung +Date: Thu Oct 16 22:55:39 2025 +0300 + + 
net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for legacy RQ + + XDP programs can release xdp_buff fragments when calling + bpf_xdp_adjust_tail(). The driver currently assumes the number of + fragments to be unchanged and may generate skb with wrong truesize or + containing invalid frags. Fix the bug by generating skb according to + xdp_buff after the XDP program runs. + + Fixes: ea5d49bdae8b ("net/mlx5e: Add XDP multi buffer support to the non-linear legacy RQ") + Reviewed-by: Dragos Tatulea + Signed-off-by: Amery Hung + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1760644540-899148-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +index 3301d5495134..753c20d9dc22 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +@@ -1772,14 +1772,27 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi + } + + prog = rcu_dereference(rq->xdp_prog); +- if (prog && mlx5e_xdp_handle(rq, prog, mxbuf)) { +- if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) { +- struct mlx5e_wqe_frag_info *pwi; ++ if (prog) { ++ u8 nr_frags_free, old_nr_frags = sinfo->nr_frags; ++ ++ if (mlx5e_xdp_handle(rq, prog, mxbuf)) { ++ if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, ++ rq->flags)) { ++ struct mlx5e_wqe_frag_info *pwi; ++ ++ wi -= old_nr_frags - sinfo->nr_frags; ++ ++ for (pwi = head_wi; pwi < wi; pwi++) ++ pwi->frag_page->frags++; ++ } ++ return NULL; /* page/packet was consumed by XDP */ ++ } + +- for (pwi = head_wi; pwi < wi; pwi++) +- pwi->frag_page->frags++; ++ nr_frags_free = old_nr_frags - sinfo->nr_frags; ++ if (unlikely(nr_frags_free)) { ++ wi -= nr_frags_free; ++ truesize -= nr_frags_free * frag_info->frag_stride; + } +- return NULL; /* page/packet was consumed by XDP */ + } + + skb = 
mlx5e_build_linear_skb( +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1577-net-mlx5e-rx-fix-generating-skb-from-non-linear-xdp-buff-for.patch b/SOURCES/1577-net-mlx5e-rx-fix-generating-skb-from-non-linear-xdp-buff-for.patch new file mode 100644 index 000000000..433ac0b33 --- /dev/null +++ b/SOURCES/1577-net-mlx5e-rx-fix-generating-skb-from-non-linear-xdp-buff-for.patch @@ -0,0 +1,122 @@ +From 7062cbfda71609deef84add064a6f2f936a52326 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:32 -0400 +Subject: [PATCH] net/mlx5e: RX, Fix generating skb from non-linear xdp_buff + for striding RQ + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 87bcef158ac1faca1bd7e0104588e8e2956d10be +Author: Amery Hung +Date: Thu Oct 16 22:55:40 2025 +0300 + + net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for striding RQ + + XDP programs can change the layout of an xdp_buff through + bpf_xdp_adjust_tail() and bpf_xdp_adjust_head(). Therefore, the driver + cannot assume the size of the linear data area nor fragments. Fix the + bug in mlx5 by generating skb according to xdp_buff after XDP programs + run. + + Currently, when handling multi-buf XDP, the mlx5 driver assumes the + layout of an xdp_buff to be unchanged. That is, the linear data area + continues to be empty and fragments remain the same. This may cause + the driver to generate erroneous skb or triggering a kernel + warning. When an XDP program added linear data through + bpf_xdp_adjust_head(), the linear data will be ignored as + mlx5e_build_linear_skb() builds an skb without linear data and then + pull data from fragments to fill the linear data area. When an XDP + program has shrunk the non-linear data through bpf_xdp_adjust_tail(), + the delta passed to __pskb_pull_tail() may exceed the actual nonlinear + data size and trigger the BUG_ON in it. + + To fix the issue, first record the original number of fragments. 
If the + number of fragments changes after the XDP program runs, rewind the end + fragment pointer by the difference and recalculate the truesize. Then, + build the skb with the linear data area matching the xdp_buff. Finally, + only pull data in if there is non-linear data and fill the linear part + up to 256 bytes. + + Fixes: f52ac7028bec ("net/mlx5e: RX, Add XDP multi-buffer support in Striding RQ") + Signed-off-by: Amery Hung + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1760644540-899148-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +index 753c20d9dc22..21be5dcf47d5 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +@@ -2016,6 +2016,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w + u32 byte_cnt = cqe_bcnt; + struct skb_shared_info *sinfo; + unsigned int truesize = 0; ++ u32 pg_consumed_bytes; + struct bpf_prog *prog; + struct sk_buff *skb; + u32 linear_frame_sz; +@@ -2068,7 +2069,8 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w + + while (byte_cnt) { + /* Non-linear mode, hence non-XSK, which always uses PAGE_SIZE. 
*/ +- u32 pg_consumed_bytes = min_t(u32, PAGE_SIZE - frag_offset, byte_cnt); ++ pg_consumed_bytes = ++ min_t(u32, PAGE_SIZE - frag_offset, byte_cnt); + + if (test_bit(MLX5E_RQ_STATE_SHAMPO, &rq->state)) + truesize += pg_consumed_bytes; +@@ -2084,10 +2086,15 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w + } + + if (prog) { ++ u8 nr_frags_free, old_nr_frags = sinfo->nr_frags; ++ u32 len; ++ + if (mlx5e_xdp_handle(rq, prog, mxbuf)) { + if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) { + struct mlx5e_frag_page *pfp; + ++ frag_page -= old_nr_frags - sinfo->nr_frags; ++ + for (pfp = head_page; pfp < frag_page; pfp++) + pfp->frags++; + +@@ -2098,9 +2105,19 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w + return NULL; /* page/packet was consumed by XDP */ + } + ++ nr_frags_free = old_nr_frags - sinfo->nr_frags; ++ if (unlikely(nr_frags_free)) { ++ frag_page -= nr_frags_free; ++ truesize -= (nr_frags_free - 1) * PAGE_SIZE + ++ ALIGN(pg_consumed_bytes, ++ BIT(rq->mpwqe.log_stride_sz)); ++ } ++ ++ len = mxbuf->xdp.data_end - mxbuf->xdp.data; ++ + skb = mlx5e_build_linear_skb( + rq, mxbuf->xdp.data_hard_start, linear_frame_sz, +- mxbuf->xdp.data - mxbuf->xdp.data_hard_start, 0, ++ mxbuf->xdp.data - mxbuf->xdp.data_hard_start, len, + mxbuf->xdp.data - mxbuf->xdp.data_meta); + if (unlikely(!skb)) { + mlx5e_page_release_fragmented(rq->page_pool, +@@ -2125,8 +2142,11 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w + do + pagep->frags++; + while (++pagep < frag_page); ++ ++ headlen = min_t(u16, MLX5E_RX_MAX_HEAD - len, ++ skb->data_len); ++ __pskb_pull_tail(skb, headlen); + } +- __pskb_pull_tail(skb, headlen); + } else { + dma_addr_t addr; + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1578-net-mlx5-add-pphcr-to-pcam-supported-registers-mask.patch b/SOURCES/1578-net-mlx5-add-pphcr-to-pcam-supported-registers-mask.patch new file mode 100644 index 
000000000..758c1a0ba --- /dev/null +++ b/SOURCES/1578-net-mlx5-add-pphcr-to-pcam-supported-registers-mask.patch @@ -0,0 +1,43 @@ +From 1ab436443ca3370eb1dbe023658aaa6afc08020a Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:32 -0400 +Subject: [PATCH] net/mlx5: Add PPHCR to PCAM supported registers mask + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit bb65e0c141f879cdf54db11ae446ee3605fb54d5 +Author: Alexei Lazar +Date: Wed Oct 22 15:29:39 2025 +0300 + + net/mlx5: Add PPHCR to PCAM supported registers mask + + Add the PPHCR bit to the port_access_reg_cap_mask field of PCAM + register to indicate that the device supports the PPHCR register + and the RS-FEC histogram feature. + + Signed-off-by: Alexei Lazar + Reviewed-by: Yael Chemla + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1761136182-918470-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h +index 5e2bc469ca64..2207404a125c 100644 +--- a/include/linux/mlx5/mlx5_ifc.h ++++ b/include/linux/mlx5/mlx5_ifc.h +@@ -10832,7 +10832,9 @@ struct mlx5_ifc_pcam_regs_5000_to_507f_bits { + u8 port_access_reg_cap_mask_127_to_96[0x20]; + u8 port_access_reg_cap_mask_95_to_64[0x20]; + +- u8 port_access_reg_cap_mask_63_to_36[0x1c]; ++ u8 port_access_reg_cap_mask_63[0x1]; ++ u8 pphcr[0x1]; ++ u8 port_access_reg_cap_mask_61_to_36[0x1a]; + u8 pplm[0x1]; + u8 port_access_reg_cap_mask_34_to_32[0x3]; + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1579-net-mlx5-refactor-devcom-to-return-null-on-failure.patch b/SOURCES/1579-net-mlx5-refactor-devcom-to-return-null-on-failure.patch new file mode 100644 index 000000000..1b1bd841a --- /dev/null +++ b/SOURCES/1579-net-mlx5-refactor-devcom-to-return-null-on-failure.patch @@ -0,0 +1,302 @@ +From 0c2f2f4265dbabbfa5f3e8cf8225fcd6eef57bc8 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:32 
-0400 +Subject: [PATCH] net/mlx5: Refactor devcom to return NULL on failure + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 8f82f89550daafc8ca3ba74c389ae1b4afdd75c8 +Author: Patrisious Haddad +Date: Wed Oct 22 15:29:41 2025 +0300 + + net/mlx5: Refactor devcom to return NULL on failure + + Devcom device and component registration isn't always critical to the + functionality of the caller, hence the registration can fail and we can + continue working with an ERR_PTR value saved inside a variable. + + In order to avoid that make sure all devcom failures return NULL. + + Signed-off-by: Patrisious Haddad + Reviewed-by: Leon Romanovsky + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1761136182-918470-4-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index 26d35a2653dc..7ab4fb83dff2 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -241,8 +241,8 @@ static int mlx5e_devcom_init_mpv(struct mlx5e_priv *priv, u64 *data) + &attr, + mlx5e_devcom_event_mpv, + priv); +- if (IS_ERR(priv->devcom)) +- return PTR_ERR(priv->devcom); ++ if (!priv->devcom) ++ return -EINVAL; + + if (mlx5_core_is_mp_master(priv->mdev)) { + mlx5_devcom_send_event(priv->devcom, MPV_DEVCOM_MASTER_UP, +@@ -255,7 +255,7 @@ static int mlx5e_devcom_init_mpv(struct mlx5e_priv *priv, u64 *data) + + static void mlx5e_devcom_cleanup_mpv(struct mlx5e_priv *priv) + { +- if (IS_ERR_OR_NULL(priv->devcom)) ++ if (!priv->devcom) + return; + + if (mlx5_core_is_mp_master(priv->mdev)) { +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +index 4cf995be127d..34749814f19b 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c ++++ 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +@@ -3129,7 +3129,7 @@ void mlx5_esw_offloads_devcom_init(struct mlx5_eswitch *esw, + attr, + mlx5_esw_offloads_devcom_event, + esw); +- if (IS_ERR(esw->devcom)) ++ if (!esw->devcom) + return; + + mlx5_devcom_send_event(esw->devcom, +@@ -3140,7 +3140,7 @@ void mlx5_esw_offloads_devcom_init(struct mlx5_eswitch *esw, + + void mlx5_esw_offloads_devcom_cleanup(struct mlx5_eswitch *esw) + { +- if (IS_ERR_OR_NULL(esw->devcom)) ++ if (!esw->devcom) + return; + + mlx5_devcom_send_event(esw->devcom, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c +index 59c00c911275..3db0387bf6dc 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c +@@ -1430,11 +1430,10 @@ static int mlx5_lag_register_hca_devcom_comp(struct mlx5_core_dev *dev) + mlx5_devcom_register_component(dev->priv.devc, + MLX5_DEVCOM_HCA_PORTS, + &attr, NULL, dev); +- if (IS_ERR(dev->priv.hca_devcom_comp)) { ++ if (!dev->priv.hca_devcom_comp) { + mlx5_core_err(dev, +- "Failed to register devcom HCA component, err: %ld\n", +- PTR_ERR(dev->priv.hca_devcom_comp)); +- return PTR_ERR(dev->priv.hca_devcom_comp); ++ "Failed to register devcom HCA component."); ++ return -EINVAL; + } + + return 0; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c +index d0ba83d77cd1..29e7fa09c32c 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c +@@ -1444,7 +1444,7 @@ static void mlx5_shared_clock_register(struct mlx5_core_dev *mdev, u64 key) + compd = mlx5_devcom_register_component(mdev->priv.devc, + MLX5_DEVCOM_SHARED_CLOCK, + &attr, NULL, mdev); +- if (IS_ERR(compd)) ++ if (!compd) + return; + + mdev->clock_state->compdev = compd; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c 
b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c +index faa2833602c8..e749618229bc 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c +@@ -76,20 +76,18 @@ mlx5_devcom_dev_alloc(struct mlx5_core_dev *dev) + struct mlx5_devcom_dev * + mlx5_devcom_register_device(struct mlx5_core_dev *dev) + { +- struct mlx5_devcom_dev *devc; ++ struct mlx5_devcom_dev *devc = NULL; + + mutex_lock(&dev_list_lock); + + if (devcom_dev_exists(dev)) { +- devc = ERR_PTR(-EEXIST); ++ mlx5_core_err(dev, "devcom device already exists"); + goto out; + } + + devc = mlx5_devcom_dev_alloc(dev); +- if (!devc) { +- devc = ERR_PTR(-ENOMEM); ++ if (!devc) + goto out; +- } + + list_add_tail(&devc->list, &devcom_dev_list); + out: +@@ -110,8 +108,10 @@ mlx5_devcom_dev_release(struct kref *ref) + + void mlx5_devcom_unregister_device(struct mlx5_devcom_dev *devc) + { +- if (!IS_ERR_OR_NULL(devc)) +- kref_put(&devc->ref, mlx5_devcom_dev_release); ++ if (!devc) ++ return; ++ ++ kref_put(&devc->ref, mlx5_devcom_dev_release); + } + + static struct mlx5_devcom_comp * +@@ -122,7 +122,7 @@ mlx5_devcom_comp_alloc(u64 id, const struct mlx5_devcom_match_attr *attr, + + comp = kzalloc(sizeof(*comp), GFP_KERNEL); + if (!comp) +- return ERR_PTR(-ENOMEM); ++ return NULL; + + comp->id = id; + comp->key.key = attr->key; +@@ -160,7 +160,7 @@ devcom_alloc_comp_dev(struct mlx5_devcom_dev *devc, + + devcom = kzalloc(sizeof(*devcom), GFP_KERNEL); + if (!devcom) +- return ERR_PTR(-ENOMEM); ++ return NULL; + + kref_get(&devc->ref); + devcom->devc = devc; +@@ -240,31 +240,28 @@ mlx5_devcom_register_component(struct mlx5_devcom_dev *devc, + mlx5_devcom_event_handler_t handler, + void *data) + { +- struct mlx5_devcom_comp_dev *devcom; ++ struct mlx5_devcom_comp_dev *devcom = NULL; + struct mlx5_devcom_comp *comp; + +- if (IS_ERR_OR_NULL(devc)) +- return ERR_PTR(-EINVAL); ++ if (!devc) ++ return NULL; + + mutex_lock(&comp_list_lock); + comp = 
devcom_component_get(devc, id, attr, handler); +- if (IS_ERR(comp)) { +- devcom = ERR_PTR(-EINVAL); ++ if (IS_ERR(comp)) + goto out_unlock; +- } + + if (!comp) { + comp = mlx5_devcom_comp_alloc(id, attr, handler); +- if (IS_ERR(comp)) { +- devcom = ERR_CAST(comp); ++ if (!comp) + goto out_unlock; +- } ++ + list_add_tail(&comp->comp_list, &devcom_comp_list); + } + mutex_unlock(&comp_list_lock); + + devcom = devcom_alloc_comp_dev(devc, comp, data); +- if (IS_ERR(devcom)) ++ if (!devcom) + kref_put(&comp->ref, mlx5_devcom_comp_release); + + return devcom; +@@ -276,8 +273,10 @@ mlx5_devcom_register_component(struct mlx5_devcom_dev *devc, + + void mlx5_devcom_unregister_component(struct mlx5_devcom_comp_dev *devcom) + { +- if (!IS_ERR_OR_NULL(devcom)) +- devcom_free_comp_dev(devcom); ++ if (!devcom) ++ return; ++ ++ devcom_free_comp_dev(devcom); + } + + int mlx5_devcom_comp_get_size(struct mlx5_devcom_comp_dev *devcom) +@@ -296,7 +295,7 @@ int mlx5_devcom_send_event(struct mlx5_devcom_comp_dev *devcom, + int err = 0; + void *data; + +- if (IS_ERR_OR_NULL(devcom)) ++ if (!devcom) + return -ENODEV; + + comp = devcom->comp; +@@ -338,7 +337,7 @@ void mlx5_devcom_comp_set_ready(struct mlx5_devcom_comp_dev *devcom, bool ready) + + bool mlx5_devcom_comp_is_ready(struct mlx5_devcom_comp_dev *devcom) + { +- if (IS_ERR_OR_NULL(devcom)) ++ if (!devcom) + return false; + + return READ_ONCE(devcom->comp->ready); +@@ -348,7 +347,7 @@ bool mlx5_devcom_for_each_peer_begin(struct mlx5_devcom_comp_dev *devcom) + { + struct mlx5_devcom_comp *comp; + +- if (IS_ERR_OR_NULL(devcom)) ++ if (!devcom) + return false; + + comp = devcom->comp; +@@ -421,21 +420,21 @@ void *mlx5_devcom_get_next_peer_data_rcu(struct mlx5_devcom_comp_dev *devcom, + + void mlx5_devcom_comp_lock(struct mlx5_devcom_comp_dev *devcom) + { +- if (IS_ERR_OR_NULL(devcom)) ++ if (!devcom) + return; + down_write(&devcom->comp->sem); + } + + void mlx5_devcom_comp_unlock(struct mlx5_devcom_comp_dev *devcom) + { +- if 
(IS_ERR_OR_NULL(devcom)) ++ if (!devcom) + return; + up_write(&devcom->comp->sem); + } + + int mlx5_devcom_comp_trylock(struct mlx5_devcom_comp_dev *devcom) + { +- if (IS_ERR_OR_NULL(devcom)) ++ if (!devcom) + return 0; + return down_write_trylock(&devcom->comp->sem); + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c +index f5c2701f6e87..8e17daae48af 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c +@@ -221,8 +221,8 @@ static int sd_register(struct mlx5_core_dev *dev) + attr.net = mlx5_core_net(dev); + devcom = mlx5_devcom_register_component(dev->priv.devc, MLX5_DEVCOM_SD_GROUP, + &attr, NULL, dev); +- if (IS_ERR(devcom)) +- return PTR_ERR(devcom); ++ if (!devcom) ++ return -EINVAL; + + sd->devcom = devcom; + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c +index 77f587f97a2d..81930a461e62 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c +@@ -978,9 +978,8 @@ static int mlx5_init_once(struct mlx5_core_dev *dev) + int err; + + dev->priv.devc = mlx5_devcom_register_device(dev); +- if (IS_ERR(dev->priv.devc)) +- mlx5_core_warn(dev, "failed to register devcom device %pe\n", +- dev->priv.devc); ++ if (!dev->priv.devc) ++ mlx5_core_warn(dev, "failed to register devcom device\n"); + + err = mlx5_query_board_id(dev); + if (err) { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1580-net-mlx5-fix-ipsec-cleanup-over-mpv-device.patch b/SOURCES/1580-net-mlx5-fix-ipsec-cleanup-over-mpv-device.patch new file mode 100644 index 000000000..8e937283c --- /dev/null +++ b/SOURCES/1580-net-mlx5-fix-ipsec-cleanup-over-mpv-device.patch @@ -0,0 +1,201 @@ +From 62aa720413bd9440055eeee9a13abe6090797204 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:33 -0400 +Subject: [PATCH] net/mlx5: Fix IPsec cleanup 
over MPV device + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 664f76be38a18c61151d0ef248c7e2f3afb4f3c7 +Author: Patrisious Haddad +Date: Wed Oct 22 15:29:42 2025 +0300 + + net/mlx5: Fix IPsec cleanup over MPV device + + When we do mlx5e_detach_netdev() we eventually disable blocking events + notifier, among those events are IPsec MPV events from IB to core. + + So before disabling those blocking events, make sure to also unregister + the devcom device and mark all this device operations as complete, + in order to prevent the other device from using invalid netdev + during future devcom events which could cause the trace below. + + BUG: kernel NULL pointer dereference, address: 0000000000000010 + PGD 146427067 P4D 146427067 PUD 146488067 PMD 0 + Oops: Oops: 0000 [#1] SMP + CPU: 1 UID: 0 PID: 7735 Comm: devlink Tainted: GW 6.12.0-rc6_for_upstream_min_debug_2024_11_08_00_46 #1 + Tainted: [W]=WARN + Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 + RIP: 0010:mlx5_devcom_comp_set_ready+0x5/0x40 [mlx5_core] + Code: 00 01 48 83 05 23 32 1e 00 01 41 b8 ed ff ff ff e9 60 ff ff ff 48 83 05 00 32 1e 00 01 eb e3 66 0f 1f 44 00 00 0f 1f 44 00 00 <48> 8b 47 10 48 83 05 5f 32 1e 00 01 48 8b 50 40 48 85 d2 74 05 40 + RSP: 0018:ffff88811a5c35f8 EFLAGS: 00010206 + RAX: ffff888106e8ab80 RBX: ffff888107d7e200 RCX: ffff88810d6f0a00 + RDX: ffff88810d6f0a00 RSI: 0000000000000001 RDI: 0000000000000000 + RBP: ffff88811a17e620 R08: 0000000000000040 R09: 0000000000000000 + R10: ffff88811a5c3618 R11: 0000000de85d51bd R12: ffff88811a17e600 + R13: ffff88810d6f0a00 R14: 0000000000000000 R15: ffff8881034bda80 + FS: 00007f27bdf89180(0000) GS:ffff88852c880000(0000) knlGS:0000000000000000 + CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 + CR2: 0000000000000010 CR3: 000000010f159005 CR4: 0000000000372eb0 + DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 + DR3: 0000000000000000 DR6: 
00000000fffe0ff0 DR7: 0000000000000400 + Call Trace: + + ? __die+0x20/0x60 + ? page_fault_oops+0x150/0x3e0 + ? exc_page_fault+0x74/0x130 + ? asm_exc_page_fault+0x22/0x30 + ? mlx5_devcom_comp_set_ready+0x5/0x40 [mlx5_core] + mlx5e_devcom_event_mpv+0x42/0x60 [mlx5_core] + mlx5_devcom_send_event+0x8c/0x170 [mlx5_core] + blocking_event+0x17b/0x230 [mlx5_core] + notifier_call_chain+0x35/0xa0 + blocking_notifier_call_chain+0x3d/0x60 + mlx5_blocking_notifier_call_chain+0x22/0x30 [mlx5_core] + mlx5_core_mp_event_replay+0x12/0x20 [mlx5_core] + mlx5_ib_bind_slave_port+0x228/0x2c0 [mlx5_ib] + mlx5_ib_stage_init_init+0x664/0x9d0 [mlx5_ib] + ? idr_alloc_cyclic+0x50/0xb0 + ? __kmalloc_cache_noprof+0x167/0x340 + ? __kmalloc_noprof+0x1a7/0x430 + __mlx5_ib_add+0x34/0xd0 [mlx5_ib] + mlx5r_probe+0xe9/0x310 [mlx5_ib] + ? kernfs_add_one+0x107/0x150 + ? __mlx5_ib_add+0xd0/0xd0 [mlx5_ib] + auxiliary_bus_probe+0x3e/0x90 + really_probe+0xc5/0x3a0 + ? driver_probe_device+0x90/0x90 + __driver_probe_device+0x80/0x160 + driver_probe_device+0x1e/0x90 + __device_attach_driver+0x7d/0x100 + bus_for_each_drv+0x80/0xd0 + __device_attach+0xbc/0x1f0 + bus_probe_device+0x86/0xa0 + device_add+0x62d/0x830 + __auxiliary_device_add+0x3b/0xa0 + ? auxiliary_device_init+0x41/0x90 + add_adev+0xd1/0x150 [mlx5_core] + mlx5_rescan_drivers_locked+0x21c/0x300 [mlx5_core] + esw_mode_change+0x6c/0xc0 [mlx5_core] + mlx5_devlink_eswitch_mode_set+0x21e/0x640 [mlx5_core] + devlink_nl_eswitch_set_doit+0x60/0xe0 + genl_family_rcv_msg_doit+0xd0/0x120 + genl_rcv_msg+0x180/0x2b0 + ? devlink_get_from_attrs_lock+0x170/0x170 + ? devlink_nl_eswitch_get_doit+0x290/0x290 + ? devlink_nl_pre_doit_port_optional+0x50/0x50 + ? genl_family_rcv_msg_dumpit+0xf0/0xf0 + netlink_rcv_skb+0x54/0x100 + genl_rcv+0x24/0x40 + netlink_unicast+0x1fc/0x2d0 + netlink_sendmsg+0x1e4/0x410 + __sock_sendmsg+0x38/0x60 + ? sockfd_lookup_light+0x12/0x60 + __sys_sendto+0x105/0x160 + ? 
__sys_recvmsg+0x4e/0x90 + __x64_sys_sendto+0x20/0x30 + do_syscall_64+0x4c/0x100 + entry_SYSCALL_64_after_hwframe+0x4b/0x53 + RIP: 0033:0x7f27bc91b13a + Code: bb 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 8b 05 fa 96 2c 00 45 89 c9 4c 63 d1 48 63 ff 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 f3 c3 0f 1f 40 00 41 55 41 54 4d 89 c5 55 + RSP: 002b:00007fff369557e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c + RAX: ffffffffffffffda RBX: 0000000009c54b10 RCX: 00007f27bc91b13a + RDX: 0000000000000038 RSI: 0000000009c54b10 RDI: 0000000000000006 + RBP: 0000000009c54920 R08: 00007f27bd0030e0 R09: 000000000000000c + R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 + R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001 + + Modules linked in: mlx5_vdpa vringh vhost_iotlb vdpa xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi ib_umad scsi_transport_iscsi ib_ipoib rdma_cm iw_cm ib_cm mlx5_fwctl mlx5_ib ib_uverbs ib_core mlx5_core + CR2: 0000000000000010 + + Fixes: 82f9378c443c ("net/mlx5: Handle IPsec steering upon master unbind/bind") + Signed-off-by: Patrisious Haddad + Reviewed-by: Leon Romanovsky + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1761136182-918470-5-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h +index 5d7c15abfcaf..f8eaaf37963b 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h +@@ -342,6 +342,7 @@ void mlx5e_ipsec_build_accel_xfrm_attrs(struct mlx5e_ipsec_sa_entry *sa_entry, + void mlx5e_ipsec_handle_mpv_event(int event, struct mlx5e_priv *slave_priv, + struct mlx5e_priv *master_priv); + void 
mlx5e_ipsec_send_event(struct mlx5e_priv *priv, int event); ++void mlx5e_ipsec_disable_events(struct mlx5e_priv *priv); + + static inline struct mlx5_core_dev * + mlx5e_ipsec_sa2dev(struct mlx5e_ipsec_sa_entry *sa_entry) +@@ -387,6 +388,10 @@ static inline void mlx5e_ipsec_handle_mpv_event(int event, struct mlx5e_priv *sl + static inline void mlx5e_ipsec_send_event(struct mlx5e_priv *priv, int event) + { + } ++ ++static inline void mlx5e_ipsec_disable_events(struct mlx5e_priv *priv) ++{ ++} + #endif + + #endif /* __MLX5E_IPSEC_H__ */ +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c +index bf1d2769d4f1..feef86fff4bf 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c +@@ -2893,9 +2893,30 @@ void mlx5e_ipsec_handle_mpv_event(int event, struct mlx5e_priv *slave_priv, + + void mlx5e_ipsec_send_event(struct mlx5e_priv *priv, int event) + { +- if (!priv->ipsec) +- return; /* IPsec not supported */ ++ if (!priv->ipsec || mlx5_devcom_comp_get_size(priv->devcom) < 2) ++ return; /* IPsec not supported or no peers */ + + mlx5_devcom_send_event(priv->devcom, event, event, priv); + wait_for_completion(&priv->ipsec->comp); + } ++ ++void mlx5e_ipsec_disable_events(struct mlx5e_priv *priv) ++{ ++ struct mlx5_devcom_comp_dev *tmp = NULL; ++ struct mlx5e_priv *peer_priv; ++ ++ if (!priv->devcom) ++ return; ++ ++ if (!mlx5_devcom_for_each_peer_begin(priv->devcom)) ++ goto out; ++ ++ peer_priv = mlx5_devcom_get_next_peer_data(priv->devcom, &tmp); ++ if (peer_priv) ++ complete_all(&peer_priv->ipsec->comp); ++ ++ mlx5_devcom_for_each_peer_end(priv->devcom); ++out: ++ mlx5_devcom_unregister_component(priv->devcom); ++ priv->devcom = NULL; ++} +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index 7ab4fb83dff2..31a6bfe1ce11 100644 +--- 
a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -265,6 +265,7 @@ static void mlx5e_devcom_cleanup_mpv(struct mlx5e_priv *priv) + } + + mlx5_devcom_unregister_component(priv->devcom); ++ priv->devcom = NULL; + } + + static int blocking_event(struct notifier_block *nb, unsigned long event, void *data) +@@ -6069,6 +6070,7 @@ static void mlx5e_nic_disable(struct mlx5e_priv *priv) + if (mlx5e_monitor_counter_supported(priv)) + mlx5e_monitor_counter_cleanup(priv); + ++ mlx5e_ipsec_disable_events(priv); + mlx5e_disable_blocking_events(priv); + if (priv->en_trap) { + mlx5e_deactivate_trap(priv); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1581-net-mlx5-don-t-zero-user-count-when-destroying-fdb-tables.patch b/SOURCES/1581-net-mlx5-don-t-zero-user-count-when-destroying-fdb-tables.patch new file mode 100644 index 000000000..4dae81030 --- /dev/null +++ b/SOURCES/1581-net-mlx5-don-t-zero-user-count-when-destroying-fdb-tables.patch @@ -0,0 +1,82 @@ +From abfe0d534a4b347055a3e0b2169f2f58b304857d Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:24:33 -0400 +Subject: [PATCH] net/mlx5: Don't zero user_count when destroying FDB tables + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 53110232c95ff56067fd96c75a1a1c53d10dcd98 +Author: Cosmin Ratiu +Date: Sun Oct 26 22:20:19 2025 +0200 + + net/mlx5: Don't zero user_count when destroying FDB tables + + esw->user_count tracks how many TC rules are added on an esw via + mlx5e_configure_flower -> mlx5_esw_get -> atomic64_inc(&esw->user_count) + + esw.user_count was unconditionally set to 0 in + esw_destroy_legacy_fdb_table and esw_destroy_offloads_fdb_tables. + + These two together can lead to the following sequence of events: + 1. echo 1 > /sys/class/net/eth2/device/sriov_numvfs + - mlx5_core_sriov_configure -...-> esw_create_legacy_table -> + atomic64_set(&esw->user_count, 0) + 2. 
tc qdisc add dev eth2 ingress && \ + tc filter replace dev eth2 pref 1 protocol ip chain 0 ingress \ + handle 1 flower action ct nat zone 64000 pipe + - mlx5e_configure_flower -> mlx5_esw_get -> + atomic64_inc(&esw->user_count) + 3. echo 0 > /sys/class/net/eth2/device/sriov_numvfs + - mlx5_core_sriov_configure -..-> esw_destroy_legacy_fdb_table + -> atomic64_set(&esw->user_count, 0) + 4. devlink dev eswitch set pci/0000:08:00.0 mode switchdev + - mlx5_devlink_eswitch_mode_set -> mlx5_esw_try_lock -> + atomic64_read(&esw->user_count) == 0 + - then proceed to a WARN_ON in: + esw_offloads_start -> mlx5_eswitch_enable_locke -> esw_offloads_enable + -> mlx5_esw_offloads_rep_load -> mlx5e_vport_rep_load -> + mlx5e_netdev_change_profile -> mlx5e_detach_netdev -> + mlx5e_cleanup_nic_rx -> mlx5e_tc_nic_cleanup -> + mlx5e_mod_hdr_tbl_destroy + + Fix this by not clearing out the user_count when destroying FDB tables, + so that the check in mlx5_esw_try_lock can prevent the mode change when + there are TC rules configured, as originally intended. 
+ + Fixes: 2318b8bb94a3 ("net/mlx5: E-switch, Destroy legacy fdb table when needed") + Signed-off-by: Cosmin Ratiu + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/1761510019-938772-1-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/legacy.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/legacy.c +index 76382626ad41..929adeb50a98 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/legacy.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/legacy.c +@@ -66,7 +66,6 @@ static void esw_destroy_legacy_fdb_table(struct mlx5_eswitch *esw) + esw->fdb_table.legacy.addr_grp = NULL; + esw->fdb_table.legacy.allmulti_grp = NULL; + esw->fdb_table.legacy.promisc_grp = NULL; +- atomic64_set(&esw->user_count, 0); + } + + static int esw_create_legacy_fdb_table(struct mlx5_eswitch *esw) +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +index 34749814f19b..44a142a041b2 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +@@ -1978,7 +1978,6 @@ static void esw_destroy_offloads_fdb_tables(struct mlx5_eswitch *esw) + /* Holds true only as long as DMFS is the default */ + mlx5_flow_namespace_set_mode(esw->fdb_table.offloads.ns, + MLX5_FLOW_STEERING_MODE_DMFS); +- atomic64_set(&esw->user_count, 0); + } + + static int esw_get_nr_ft_offloads_steering_src_ports(struct mlx5_eswitch *esw) +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1582-net-mlx5e-fix-return-value-in-case-of-module-eeprom-read-err.patch b/SOURCES/1582-net-mlx5e-fix-return-value-in-case-of-module-eeprom-read-err.patch new file mode 100644 index 000000000..4bf5d762e --- /dev/null +++ b/SOURCES/1582-net-mlx5e-fix-return-value-in-case-of-module-eeprom-read-err.patch @@ -0,0 
+1,77 @@ +From a073cd50ef8f5289370d350c9cd61907e6d330c5 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:28:57 -0400 +Subject: [PATCH] net/mlx5e: Fix return value in case of module EEPROM read + error + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit d1c94bc5b90c21b65469d30d4a6bc8ed715c1bfe +Author: Gal Pressman +Date: Tue Nov 4 16:15:36 2025 +0200 + + net/mlx5e: Fix return value in case of module EEPROM read error + + mlx5e_get_module_eeprom_by_page() has weird error handling. + + First, it is treating -EINVAL as a special case, but it is unclear why. + + Second, it tries to fail "gracefully" by returning the number of bytes + read even in case of an error. This results in wrongly returning + success (0 return value) if the error occurs before any bytes were + read. + + Simplify the error handling by returning an error when such occurs. This + also aligns with the error handling we have in mlx5e_get_module_eeprom() + for the old API. + + This fixes the following case where the query fails, but userspace + ethtool wrongly treats it as success and dumps an output: + + # ethtool -m eth2 + netlink warning: mlx5_core: Query module eeprom by page failed, read 0 bytes, err -5 + netlink warning: mlx5_core: Query module eeprom by page failed, read 0 bytes, err -5 + Offset Values + ------ ------ + 0x0000: 00 00 00 00 05 00 04 00 00 00 00 00 05 00 05 00 + 0x0010: 00 00 00 00 05 00 06 00 50 00 00 00 67 65 20 66 + 0x0020: 61 69 6c 65 64 2c 20 72 65 61 64 20 30 20 62 79 + 0x0030: 74 65 73 2c 20 65 72 72 20 2d 35 00 14 00 03 00 + 0x0040: 08 00 01 00 03 00 00 00 08 00 02 00 1a 00 00 00 + 0x0050: 14 00 04 00 08 00 01 00 04 00 00 00 08 00 02 00 + 0x0060: 0e 00 00 00 14 00 05 00 08 00 01 00 05 00 00 00 + 0x0070: 08 00 02 00 1a 00 00 00 14 00 06 00 08 00 01 00 + + Fixes: e109d2b204da ("net/mlx5: Implement get_module_eeprom_by_page()") + Signed-off-by: Gal Pressman + Reviewed-by: Alex Lazar + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman 
+ Link: https://patch.msgid.link/1762265736-1028868-1-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +index 64f315089b04..e7c9f22ac1fc 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +@@ -2124,14 +2124,12 @@ static int mlx5e_get_module_eeprom_by_page(struct net_device *netdev, + if (!size_read) + return i; + +- if (size_read == -EINVAL) +- return -EINVAL; + if (size_read < 0) { + NL_SET_ERR_MSG_FMT_MOD( + extack, + "Query module eeprom by page failed, read %u bytes, err %d", + i, size_read); +- return i; ++ return size_read; + } + + i += size_read; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1583-net-mlx5e-fix-missing-error-assignment-in-mlx5e-xfrm-add-sta.patch b/SOURCES/1583-net-mlx5e-fix-missing-error-assignment-in-mlx5e-xfrm-add-sta.patch new file mode 100644 index 000000000..37a17fecb --- /dev/null +++ b/SOURCES/1583-net-mlx5e-fix-missing-error-assignment-in-mlx5e-xfrm-add-sta.patch @@ -0,0 +1,47 @@ +From 925dc73091f0cfe33befd0576b2855ddefa9d335 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:32:41 -0400 +Subject: [PATCH] net/mlx5e: Fix missing error assignment in + mlx5e_xfrm_add_state() + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 0bcd5b3b50cc1fcbf775479322cc37c15d35a489 +Author: Carolina Jubran +Date: Sun Nov 9 11:37:49 2025 +0200 + + net/mlx5e: Fix missing error assignment in mlx5e_xfrm_add_state() + + Assign the return value of mlx5_eswitch_block_mode() to 'err' before + checking it to avoid returning an uninitialized error code. 
+ + Fixes: 22239eb258bc ("net/mlx5e: Prevent tunnel reformat when tunnel mode not allowed") + Reported-by: kernel test robot + Reported-by: Dan Carpenter + Closes: https://lore.kernel.org/r/202510271649.uwsIxD6O-lkp@intel.com/ + Closes: http://lore.kernel.org/linux-rdma/aPIEK4rLB586FdDt@stanley.mountain/ + Signed-off-by: Carolina Jubran + Reviewed-by: Jianbo Liu + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1762681073-1084058-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +index 0a4fb8c92268..35d9530037a6 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +@@ -804,7 +804,8 @@ static int mlx5e_xfrm_add_state(struct net_device *dev, + goto err_xfrm; + } + +- if (mlx5_eswitch_block_mode(priv->mdev)) ++ err = mlx5_eswitch_block_mode(priv->mdev); ++ if (err) + goto unblock_ipsec; + + if (x->props.mode == XFRM_MODE_TUNNEL && +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1584-net-mlx5e-trim-the-length-of-the-num-doorbell-error.patch b/SOURCES/1584-net-mlx5e-trim-the-length-of-the-num-doorbell-error.patch new file mode 100644 index 000000000..3f43e4681 --- /dev/null +++ b/SOURCES/1584-net-mlx5e-trim-the-length-of-the-num-doorbell-error.patch @@ -0,0 +1,45 @@ +From 8ced9f3ff009ffbdb00178a66be008cdddf28a96 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:32:41 -0400 +Subject: [PATCH] net/mlx5e: Trim the length of the num_doorbell error + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 2dc768c05217e667f987907a3404926e7ba89ff3 +Author: Cosmin Ratiu +Date: Sun Nov 9 11:37:50 2025 +0200 + + net/mlx5e: Trim the length of the num_doorbell error + + When trying to set num_doorbells to a value greater than the max number + of channels, the error message was going 
over the netlink limit of 80 + chars, truncating the most important part of the message, the number of + channels. + + Fix that by trimming the length a bit. + + Fixes: 11bbcfb7668c ("net/mlx5e: Use the 'num_doorbells' devlink param") + Signed-off-by: Cosmin Ratiu + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1762681073-1084058-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +index e8ce011f2464..c204c707b850 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +@@ -541,7 +541,7 @@ static int mlx5_devlink_num_doorbells_validate(struct devlink *devlink, u32 id, + max_num_channels = mlx5e_get_max_num_channels(mdev); + if (val32 > max_num_channels) { + NL_SET_ERR_MSG_FMT_MOD(extack, +- "Requested num_doorbells (%u) exceeds maximum number of channels (%u)", ++ "Requested num_doorbells (%u) exceeds max number of channels (%u)", + val32, max_num_channels); + return -EINVAL; + } +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1585-net-mlx5e-fix-maxrate-wraparound-in-threshold-between-units.patch b/SOURCES/1585-net-mlx5e-fix-maxrate-wraparound-in-threshold-between-units.patch new file mode 100644 index 000000000..4a4184880 --- /dev/null +++ b/SOURCES/1585-net-mlx5e-fix-maxrate-wraparound-in-threshold-between-units.patch @@ -0,0 +1,60 @@ +From 717d4f3fa629f184f2a41c67ab587bb4a2f241e0 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:32:41 -0400 +Subject: [PATCH] net/mlx5e: Fix maxrate wraparound in threshold between units + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit a7bf4d5063c7837096aab2853224eb23628514d9 +Author: Gal Pressman +Date: Sun Nov 9 11:37:51 2025 +0200 + + net/mlx5e: Fix maxrate wraparound in threshold between units + + The previous calculation 
used roundup() which caused an overflow for + rates between 25.5Gbps and 26Gbps. + For example, a rate of 25.6Gbps would result in using 100Mbps units with + value of 256, which would overflow the 8 bits field. + + Simplify the upper_limit_mbps calculation by removing the + unnecessary roundup, and adjust the comparison to use <= to correctly + handle the boundary condition. + + Fixes: d8880795dabf ("net/mlx5e: Implement DCBNL IEEE max rate") + Signed-off-by: Gal Pressman + Reviewed-by: Nimrod Oren + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1762681073-1084058-4-git-send-email-tariqt@nvidia.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c +index b08328fe1aa3..99ee288ed43a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c +@@ -595,18 +595,19 @@ static int mlx5e_dcbnl_ieee_setmaxrate(struct net_device *netdev, + struct mlx5_core_dev *mdev = priv->mdev; + u8 max_bw_value[IEEE_8021QAZ_MAX_TCS]; + u8 max_bw_unit[IEEE_8021QAZ_MAX_TCS]; +- __u64 upper_limit_mbps = roundup(255 * MLX5E_100MB, MLX5E_1GB); ++ __u64 upper_limit_mbps; + int i; + + memset(max_bw_value, 0, sizeof(max_bw_value)); + memset(max_bw_unit, 0, sizeof(max_bw_unit)); ++ upper_limit_mbps = 255 * MLX5E_100MB; + + for (i = 0; i <= mlx5_max_tc(mdev); i++) { + if (!maxrate->tc_maxrate[i]) { + max_bw_unit[i] = MLX5_BW_NO_LIMIT; + continue; + } +- if (maxrate->tc_maxrate[i] < upper_limit_mbps) { ++ if (maxrate->tc_maxrate[i] <= upper_limit_mbps) { + max_bw_value[i] = div_u64(maxrate->tc_maxrate[i], + MLX5E_100MB); + max_bw_value[i] = max_bw_value[i] ? 
max_bw_value[i] : 1; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1586-net-mlx5e-fix-wraparound-in-rate-limiting-for-values-above-2.patch b/SOURCES/1586-net-mlx5e-fix-wraparound-in-rate-limiting-for-values-above-2.patch new file mode 100644 index 000000000..49fbba447 --- /dev/null +++ b/SOURCES/1586-net-mlx5e-fix-wraparound-in-rate-limiting-for-values-above-2.patch @@ -0,0 +1,65 @@ +From 30e8fa6185cde17f7dff2fbfa16760aa3ea8df3a Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:32:41 -0400 +Subject: [PATCH] net/mlx5e: Fix wraparound in rate limiting for values above + 255 Gbps + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 43b27d1bd88a4bce34ec2437d103acfae9655f9e +Author: Gal Pressman +Date: Sun Nov 9 11:37:52 2025 +0200 + + net/mlx5e: Fix wraparound in rate limiting for values above 255 Gbps + + Add validation to reject rates exceeding 255 Gbps that would overflow + the 8 bits max bandwidth field. + + Fixes: d8880795dabf ("net/mlx5e: Implement DCBNL IEEE max rate") + Signed-off-by: Gal Pressman + Reviewed-by: Nimrod Oren + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1762681073-1084058-5-git-send-email-tariqt@nvidia.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c +index 99ee288ed43a..154f8d9eec02 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c +@@ -596,11 +596,13 @@ static int mlx5e_dcbnl_ieee_setmaxrate(struct net_device *netdev, + u8 max_bw_value[IEEE_8021QAZ_MAX_TCS]; + u8 max_bw_unit[IEEE_8021QAZ_MAX_TCS]; + __u64 upper_limit_mbps; ++ __u64 upper_limit_gbps; + int i; + + memset(max_bw_value, 0, sizeof(max_bw_value)); + memset(max_bw_unit, 0, sizeof(max_bw_unit)); + upper_limit_mbps = 255 * MLX5E_100MB; ++ upper_limit_gbps = 255 * MLX5E_1GB; + + for (i = 0; i <= mlx5_max_tc(mdev); i++) { + if 
(!maxrate->tc_maxrate[i]) { +@@ -612,10 +614,16 @@ static int mlx5e_dcbnl_ieee_setmaxrate(struct net_device *netdev, + MLX5E_100MB); + max_bw_value[i] = max_bw_value[i] ? max_bw_value[i] : 1; + max_bw_unit[i] = MLX5_100_MBPS_UNIT; +- } else { ++ } else if (max_bw_value[i] <= upper_limit_gbps) { + max_bw_value[i] = div_u64(maxrate->tc_maxrate[i], + MLX5E_1GB); + max_bw_unit[i] = MLX5_GBPS_UNIT; ++ } else { ++ netdev_err(netdev, ++ "tc_%d maxrate %llu Kbps exceeds limit %llu\n", ++ i, maxrate->tc_maxrate[i], ++ upper_limit_gbps); ++ return -EINVAL; + } + } + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1587-net-mlx5e-fix-potentially-misleading-debug-message.patch b/SOURCES/1587-net-mlx5e-fix-potentially-misleading-debug-message.patch new file mode 100644 index 000000000..33eb5415d --- /dev/null +++ b/SOURCES/1587-net-mlx5e-fix-potentially-misleading-debug-message.patch @@ -0,0 +1,64 @@ +From 59df4ad4169a55ed4d0febf33d6fc026517e9c7e Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:32:41 -0400 +Subject: [PATCH] net/mlx5e: Fix potentially misleading debug message + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 9fcc2b6c10523f7e75db6387946c86fcf19dc97e +Author: Gal Pressman +Date: Sun Nov 9 11:37:53 2025 +0200 + + net/mlx5e: Fix potentially misleading debug message + + Change the debug message to print the correct units instead of always + assuming Gbps, as the value can be in either 100 Mbps or 1 Gbps units. 
+ + Fixes: 5da8bc3effb6 ("net/mlx5e: DCBNL, Add debug messages log") + Signed-off-by: Gal Pressman + Reviewed-by: Nimrod Oren + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1762681073-1084058-6-git-send-email-tariqt@nvidia.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c +index 154f8d9eec02..2ca32fb1961e 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c +@@ -598,6 +598,19 @@ static int mlx5e_dcbnl_ieee_setmaxrate(struct net_device *netdev, + __u64 upper_limit_mbps; + __u64 upper_limit_gbps; + int i; ++ struct { ++ int scale; ++ const char *units_str; ++ } units[] = { ++ [MLX5_100_MBPS_UNIT] = { ++ .scale = 100, ++ .units_str = "Mbps", ++ }, ++ [MLX5_GBPS_UNIT] = { ++ .scale = 1, ++ .units_str = "Gbps", ++ }, ++ }; + + memset(max_bw_value, 0, sizeof(max_bw_value)); + memset(max_bw_unit, 0, sizeof(max_bw_unit)); +@@ -628,8 +641,9 @@ static int mlx5e_dcbnl_ieee_setmaxrate(struct net_device *netdev, + } + + for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) { +- netdev_dbg(netdev, "%s: tc_%d <=> max_bw %d Gbps\n", +- __func__, i, max_bw_value[i]); ++ netdev_dbg(netdev, "%s: tc_%d <=> max_bw %u %s\n", __func__, i, ++ max_bw_value[i] * units[max_bw_unit[i]].scale, ++ units[max_bw_unit[i]].units_str); + } + + return mlx5_modify_port_ets_rate_limit(mdev, max_bw_value, max_bw_unit); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1588-mlx5-fix-default-values-in-create-cq.patch b/SOURCES/1588-mlx5-fix-default-values-in-create-cq.patch new file mode 100644 index 000000000..138fa7f86 --- /dev/null +++ b/SOURCES/1588-mlx5-fix-default-values-in-create-cq.patch @@ -0,0 +1,298 @@ +From 7b0831b2d1ad91773665acd7e75272bd0e67e27c Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:32:41 -0400 +Subject: [PATCH] mlx5: Fix default values in create CQ + 
+JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit e5eba42f01340f73888dfe560be2806057c25913 +Author: Akiva Goldberger +Date: Sun Nov 9 11:49:03 2025 +0200 + + mlx5: Fix default values in create CQ + + Currently, CQs without a completion function are assigned the + mlx5_add_cq_to_tasklet function by default. This is problematic since + only user CQs created through the mlx5_ib driver are intended to use + this function. + + Additionally, all CQs that will use doorbells instead of polling for + completions must call mlx5_cq_arm. However, the default CQ creation flow + leaves a valid value in the CQ's arm_db field, allowing FW to send + interrupts to polling-only CQs in certain corner cases. + + These two factors would allow a polling-only kernel CQ to be triggered + by an EQ interrupt and call a completion function intended only for user + CQs, causing a null pointer exception. + + Some areas in the driver have prevented this issue with one-off fixes + but did not address the root cause. + + This patch fixes the described issue by adding defaults to the create CQ + flow. It adds a default dummy completion function to protect against + null pointer exceptions, and it sets an invalid command sequence number + by default in kernel CQs to prevent the FW from sending an interrupt to + the CQ until it is armed. User CQs are responsible for their own + initialization values. + + Callers of mlx5_core_create_cq are responsible for changing the + completion function and arming the CQ per their needs. 
+ + Fixes: cdd04f4d4d71 ("net/mlx5: Add support to create SQ and CQ for ASO") + Signed-off-by: Akiva Goldberger + Reviewed-by: Moshe Shemesh + Signed-off-by: Tariq Toukan + Acked-by: Leon Romanovsky + Link: https://patch.msgid.link/1762681743-1084694-1-git-send-email-tariqt@nvidia.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c +index f23eb22e98ff..aa11a4b5a264 100644 +--- a/drivers/infiniband/hw/mlx5/cq.c ++++ b/drivers/infiniband/hw/mlx5/cq.c +@@ -1017,15 +1017,18 @@ int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, + if (cq->create_flags & IB_UVERBS_CQ_FLAGS_IGNORE_OVERRUN) + MLX5_SET(cqc, cqc, oi, 1); + ++ if (udata) { ++ cq->mcq.comp = mlx5_add_cq_to_tasklet; ++ cq->mcq.tasklet_ctx.comp = mlx5_ib_cq_comp; ++ } else { ++ cq->mcq.comp = mlx5_ib_cq_comp; ++ } ++ + err = mlx5_core_create_cq(dev->mdev, &cq->mcq, cqb, inlen, out, sizeof(out)); + if (err) + goto err_cqb; + + mlx5_ib_dbg(dev, "cqn 0x%x\n", cq->mcq.cqn); +- if (udata) +- cq->mcq.tasklet_ctx.comp = mlx5_ib_cq_comp; +- else +- cq->mcq.comp = mlx5_ib_cq_comp; + cq->mcq.event = mlx5_ib_cq_event; + + INIT_LIST_HEAD(&cq->wc_list); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c b/drivers/net/ethernet/mellanox/mlx5/core/cq.c +index e9f319a9bdd6..60f7ab1d72e7 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c +@@ -66,8 +66,8 @@ void mlx5_cq_tasklet_cb(struct tasklet_struct *t) + tasklet_schedule(&ctx->task); + } + +-static void mlx5_add_cq_to_tasklet(struct mlx5_core_cq *cq, +- struct mlx5_eqe *eqe) ++void mlx5_add_cq_to_tasklet(struct mlx5_core_cq *cq, ++ struct mlx5_eqe *eqe) + { + unsigned long flags; + struct mlx5_eq_tasklet *tasklet_ctx = cq->tasklet_ctx.priv; +@@ -95,7 +95,15 @@ static void mlx5_add_cq_to_tasklet(struct mlx5_core_cq *cq, + if (schedule_tasklet) + tasklet_schedule(&tasklet_ctx->task); + } 
++EXPORT_SYMBOL(mlx5_add_cq_to_tasklet); + ++static void mlx5_core_cq_dummy_cb(struct mlx5_core_cq *cq, struct mlx5_eqe *eqe) ++{ ++ mlx5_core_err(cq->eq->core.dev, ++ "CQ default completion callback, CQ #%u\n", cq->cqn); ++} ++ ++#define MLX5_CQ_INIT_CMD_SN cpu_to_be32(2 << 28) + /* Callers must verify outbox status in case of err */ + int mlx5_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq, + u32 *in, int inlen, u32 *out, int outlen) +@@ -121,10 +129,19 @@ int mlx5_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq, + cq->arm_sn = 0; + cq->eq = eq; + cq->uid = MLX5_GET(create_cq_in, in, uid); ++ ++ /* Kernel CQs must set the arm_db address prior to calling ++ * this function, allowing for the proper value to be ++ * initialized. User CQs are responsible for their own ++ * initialization since they do not use the arm_db field. ++ */ ++ if (cq->arm_db) ++ *cq->arm_db = MLX5_CQ_INIT_CMD_SN; ++ + refcount_set(&cq->refcount, 1); + init_completion(&cq->free); + if (!cq->comp) +- cq->comp = mlx5_add_cq_to_tasklet; ++ cq->comp = mlx5_core_cq_dummy_cb; + /* assuming CQ will be deleted before the EQ */ + cq->tasklet_ctx.priv = &eq->tasklet_ctx; + INIT_LIST_HEAD(&cq->tasklet_ctx.list); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index 31a6bfe1ce11..ef655b8abc96 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -2191,7 +2191,6 @@ static int mlx5e_alloc_cq_common(struct mlx5_core_dev *mdev, + mcq->set_ci_db = cq->wq_ctrl.db.db; + mcq->arm_db = cq->wq_ctrl.db.db + 1; + *mcq->set_ci_db = 0; +- *mcq->arm_db = 0; + mcq->vector = param->eq_ix; + mcq->comp = mlx5e_completion_event; + mcq->event = mlx5e_cq_error_event; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c +index cb1319974f83..ccef64fb40b6 100644 +--- 
a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c +@@ -421,6 +421,13 @@ static int mlx5_fpga_conn_create_cq(struct mlx5_fpga_conn *conn, int cq_size) + __be64 *pas; + u32 i; + ++ conn->cq.mcq.cqe_sz = 64; ++ conn->cq.mcq.set_ci_db = conn->cq.wq_ctrl.db.db; ++ conn->cq.mcq.arm_db = conn->cq.wq_ctrl.db.db + 1; ++ *conn->cq.mcq.set_ci_db = 0; ++ conn->cq.mcq.vector = 0; ++ conn->cq.mcq.comp = mlx5_fpga_conn_cq_complete; ++ + cq_size = roundup_pow_of_two(cq_size); + MLX5_SET(cqc, temp_cqc, log_cq_size, ilog2(cq_size)); + +@@ -468,15 +475,7 @@ static int mlx5_fpga_conn_create_cq(struct mlx5_fpga_conn *conn, int cq_size) + if (err) + goto err_cqwq; + +- conn->cq.mcq.cqe_sz = 64; +- conn->cq.mcq.set_ci_db = conn->cq.wq_ctrl.db.db; +- conn->cq.mcq.arm_db = conn->cq.wq_ctrl.db.db + 1; +- *conn->cq.mcq.set_ci_db = 0; +- *conn->cq.mcq.arm_db = 0; +- conn->cq.mcq.vector = 0; +- conn->cq.mcq.comp = mlx5_fpga_conn_cq_complete; + tasklet_setup(&conn->cq.tasklet, mlx5_fpga_conn_cq_tasklet); +- + mlx5_fpga_dbg(fdev, "Created CQ #0x%x\n", conn->cq.mcq.cqn); + + goto out; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/send.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/send.c +index 24ef7d66fa8a..7510c46e58a5 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/send.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/send.c +@@ -873,12 +873,6 @@ static int hws_send_ring_open_sq(struct mlx5hws_context *ctx, + return err; + } + +-static void hws_cq_complete(struct mlx5_core_cq *mcq, +- struct mlx5_eqe *eqe) +-{ +- pr_err("CQ completion CQ: #%u\n", mcq->cqn); +-} +- + static int hws_send_ring_alloc_cq(struct mlx5_core_dev *mdev, + int numa_node, + struct mlx5hws_send_engine *queue, +@@ -901,7 +895,6 @@ static int hws_send_ring_alloc_cq(struct mlx5_core_dev *mdev, + mcq->cqe_sz = 64; + mcq->set_ci_db = cq->wq_ctrl.db.db; + mcq->arm_db = cq->wq_ctrl.db.db + 1; +- mcq->comp 
= hws_cq_complete; + + for (i = 0; i < mlx5_cqwq_get_size(&cq->wq); i++) { + cqe = mlx5_cqwq_get_wqe(&cq->wq, i); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_send.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_send.c +index 077a77fde670..d034372fa047 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_send.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_send.c +@@ -1049,12 +1049,6 @@ static int dr_prepare_qp_to_rts(struct mlx5dr_domain *dmn) + return 0; + } + +-static void dr_cq_complete(struct mlx5_core_cq *mcq, +- struct mlx5_eqe *eqe) +-{ +- pr_err("CQ completion CQ: #%u\n", mcq->cqn); +-} +- + static struct mlx5dr_cq *dr_create_cq(struct mlx5_core_dev *mdev, + struct mlx5_uars_page *uar, + size_t ncqe) +@@ -1089,6 +1083,13 @@ static struct mlx5dr_cq *dr_create_cq(struct mlx5_core_dev *mdev, + cqe->op_own = MLX5_CQE_INVALID << 4 | MLX5_CQE_OWNER_MASK; + } + ++ cq->mcq.cqe_sz = 64; ++ cq->mcq.set_ci_db = cq->wq_ctrl.db.db; ++ cq->mcq.arm_db = cq->wq_ctrl.db.db + 1; ++ *cq->mcq.set_ci_db = 0; ++ cq->mcq.vector = 0; ++ cq->mdev = mdev; ++ + inlen = MLX5_ST_SZ_BYTES(create_cq_in) + + sizeof(u64) * cq->wq_ctrl.buf.npages; + in = kvzalloc(inlen, GFP_KERNEL); +@@ -1112,27 +1113,12 @@ static struct mlx5dr_cq *dr_create_cq(struct mlx5_core_dev *mdev, + pas = (__be64 *)MLX5_ADDR_OF(create_cq_in, in, pas); + mlx5_fill_page_frag_array(&cq->wq_ctrl.buf, pas); + +- cq->mcq.comp = dr_cq_complete; +- + err = mlx5_core_create_cq(mdev, &cq->mcq, in, inlen, out, sizeof(out)); + kvfree(in); + + if (err) + goto err_cqwq; + +- cq->mcq.cqe_sz = 64; +- cq->mcq.set_ci_db = cq->wq_ctrl.db.db; +- cq->mcq.arm_db = cq->wq_ctrl.db.db + 1; +- *cq->mcq.set_ci_db = 0; +- +- /* set no-zero value, in order to avoid the HW to run db-recovery on +- * CQ that used in polling mode. 
+- */ +- *cq->mcq.arm_db = cpu_to_be32(2 << 28); +- +- cq->mcq.vector = 0; +- cq->mdev = mdev; +- + return cq; + + err_cqwq: +diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c +index 7ea46522f230..3c05407449c5 100644 +--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c ++++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c +@@ -552,6 +552,8 @@ static int cq_create(struct mlx5_vdpa_net *ndev, u16 idx, u32 num_ent) + vcq->mcq.set_ci_db = vcq->db.db; + vcq->mcq.arm_db = vcq->db.db + 1; + vcq->mcq.cqe_sz = 64; ++ vcq->mcq.comp = mlx5_vdpa_cq_comp; ++ vcq->cqe = num_ent; + + err = cq_frag_buf_alloc(ndev, &vcq->buf, num_ent); + if (err) +@@ -591,10 +593,6 @@ static int cq_create(struct mlx5_vdpa_net *ndev, u16 idx, u32 num_ent) + if (err) + goto err_vec; + +- vcq->mcq.comp = mlx5_vdpa_cq_comp; +- vcq->cqe = num_ent; +- vcq->mcq.set_ci_db = vcq->db.db; +- vcq->mcq.arm_db = vcq->db.db + 1; + mlx5_cq_arm(&mvq->cq.mcq, MLX5_CQ_DB_REQ_NOT, uar_page, mvq->cq.mcq.cons_index); + kfree(in); + return 0; +diff --git a/include/linux/mlx5/cq.h b/include/linux/mlx5/cq.h +index 7ef2c7c7d803..9d47cdc727ad 100644 +--- a/include/linux/mlx5/cq.h ++++ b/include/linux/mlx5/cq.h +@@ -183,6 +183,7 @@ static inline void mlx5_cq_put(struct mlx5_core_cq *cq) + complete(&cq->free); + } + ++void mlx5_add_cq_to_tasklet(struct mlx5_core_cq *cq, struct mlx5_eqe *eqe); + int mlx5_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq, + u32 *in, int inlen, u32 *out, int outlen); + int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1589-net-mlx5-clean-up-only-new-irq-glue-on-request-irq-failure.patch b/SOURCES/1589-net-mlx5-clean-up-only-new-irq-glue-on-request-irq-failure.patch new file mode 100644 index 000000000..aa0030ee6 --- /dev/null +++ b/SOURCES/1589-net-mlx5-clean-up-only-new-irq-glue-on-request-irq-failure.patch @@ -0,0 +1,163 @@ +From b851610ef6ad88f1cdcb487ae25be1fe0781f904 Mon Sep 17 00:00:00 2001 
+From: Kamal Heib +Date: Sun, 19 Apr 2026 18:32:42 -0400 +Subject: [PATCH] net/mlx5: Clean up only new IRQ glue on request_irq() failure + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit d47515af6cccd7484d8b0870376858c9848a18ec +Author: Pradyumn Rahar +Date: Mon Nov 17 14:16:08 2025 +0200 + + net/mlx5: Clean up only new IRQ glue on request_irq() failure + + The mlx5_irq_alloc() function can inadvertently free the entire rmap + and end up in a crash[1] when the other threads tries to access this, + when request_irq() fails due to exhausted IRQ vectors. This commit + modifies the cleanup to remove only the specific IRQ mapping that was + just added. + + This prevents removal of other valid mappings and ensures precise + cleanup of the failed IRQ allocation's associated glue object. + + Note: This error is observed when both fwctl and rds configs are enabled. + + [1] + mlx5_core 0000:05:00.0: Successfully registered panic handler for port 1 + mlx5_core 0000:05:00.0: mlx5_irq_alloc:293:(pid 66740): Failed to + request irq. err = -28 + infiniband mlx5_0: mlx5_ib_test_wc:290:(pid 66740): Error -28 while + trying to test write-combining support + mlx5_core 0000:05:00.0: Successfully unregistered panic handler for port 1 + mlx5_core 0000:06:00.0: Successfully registered panic handler for port 1 + mlx5_core 0000:06:00.0: mlx5_irq_alloc:293:(pid 66740): Failed to + request irq. err = -28 + infiniband mlx5_0: mlx5_ib_test_wc:290:(pid 66740): Error -28 while + trying to test write-combining support + mlx5_core 0000:06:00.0: Successfully unregistered panic handler for port 1 + mlx5_core 0000:03:00.0: mlx5_irq_alloc:293:(pid 28895): Failed to + request irq. err = -28 + mlx5_core 0000:05:00.0: mlx5_irq_alloc:293:(pid 28895): Failed to + request irq. err = -28 + general protection fault, probably for non-canonical address + 0xe277a58fde16f291: 0000 [#1] SMP NOPTI + + RIP: 0010:free_irq_cpu_rmap+0x23/0x7d + Call Trace: + + ? show_trace_log_lvl+0x1d6/0x2f9 + ? 
show_trace_log_lvl+0x1d6/0x2f9 + ? mlx5_irq_alloc.cold+0x5d/0xf3 [mlx5_core] + ? __die_body.cold+0x8/0xa + ? die_addr+0x39/0x53 + ? exc_general_protection+0x1c4/0x3e9 + ? dev_vprintk_emit+0x5f/0x90 + ? asm_exc_general_protection+0x22/0x27 + ? free_irq_cpu_rmap+0x23/0x7d + mlx5_irq_alloc.cold+0x5d/0xf3 [mlx5_core] + irq_pool_request_vector+0x7d/0x90 [mlx5_core] + mlx5_irq_request+0x2e/0xe0 [mlx5_core] + mlx5_irq_request_vector+0xad/0xf7 [mlx5_core] + comp_irq_request_pci+0x64/0xf0 [mlx5_core] + create_comp_eq+0x71/0x385 [mlx5_core] + ? mlx5e_open_xdpsq+0x11c/0x230 [mlx5_core] + mlx5_comp_eqn_get+0x72/0x90 [mlx5_core] + ? xas_load+0x8/0x91 + mlx5_comp_irqn_get+0x40/0x90 [mlx5_core] + mlx5e_open_channel+0x7d/0x3c7 [mlx5_core] + mlx5e_open_channels+0xad/0x250 [mlx5_core] + mlx5e_open_locked+0x3e/0x110 [mlx5_core] + mlx5e_open+0x23/0x70 [mlx5_core] + __dev_open+0xf1/0x1a5 + __dev_change_flags+0x1e1/0x249 + dev_change_flags+0x21/0x5c + do_setlink+0x28b/0xcc4 + ? __nla_parse+0x22/0x3d + ? inet6_validate_link_af+0x6b/0x108 + ? cpumask_next+0x1f/0x35 + ? __snmp6_fill_stats64.constprop.0+0x66/0x107 + ? __nla_validate_parse+0x48/0x1e6 + __rtnl_newlink+0x5ff/0xa57 + ? kmem_cache_alloc_trace+0x164/0x2ce + rtnl_newlink+0x44/0x6e + rtnetlink_rcv_msg+0x2bb/0x362 + ? __netlink_sendskb+0x4c/0x6c + ? netlink_unicast+0x28f/0x2ce + ? rtnl_calcit.isra.0+0x150/0x146 + netlink_rcv_skb+0x5f/0x112 + netlink_unicast+0x213/0x2ce + netlink_sendmsg+0x24f/0x4d9 + __sock_sendmsg+0x65/0x6a + ____sys_sendmsg+0x28f/0x2c9 + ? 
import_iovec+0x17/0x2b + ___sys_sendmsg+0x97/0xe0 + __sys_sendmsg+0x81/0xd8 + do_syscall_64+0x35/0x87 + entry_SYSCALL_64_after_hwframe+0x6e/0x0 + RIP: 0033:0x7fc328603727 + Code: c3 66 90 41 54 41 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 0b ed + ff ff 44 89 e2 48 89 ee 89 df 41 89 c0 b8 2e 00 00 00 0f 05 <48> 3d 00 + f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 44 ed ff ff 48 + RSP: 002b:00007ffe8eb3f1a0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e + RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fc328603727 + RDX: 0000000000000000 RSI: 00007ffe8eb3f1f0 RDI: 000000000000000d + RBP: 00007ffe8eb3f1f0 R08: 0000000000000000 R09: 0000000000000000 + R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000 + R13: 0000000000000000 R14: 00007ffe8eb3f3c8 R15: 00007ffe8eb3f3bc + + ---[ end trace f43ce73c3c2b13a2 ]--- + RIP: 0010:free_irq_cpu_rmap+0x23/0x7d + Code: 0f 1f 80 00 00 00 00 48 85 ff 74 6b 55 48 89 fd 53 66 83 7f 06 00 + 74 24 31 db 48 8b 55 08 0f b7 c3 48 8b 04 c2 48 85 c0 74 09 <8b> 38 31 + f6 e8 c4 0a b8 ff 83 c3 01 66 3b 5d 06 72 de b8 ff ff ff + RSP: 0018:ff384881640eaca0 EFLAGS: 00010282 + RAX: e277a58fde16f291 RBX: 0000000000000000 RCX: 0000000000000000 + RDX: ff2335e2e20b3600 RSI: 0000000000000000 RDI: ff2335e2e20b3400 + RBP: ff2335e2e20b3400 R08: 0000000000000000 R09: 0000000000000000 + R10: 0000000000000000 R11: 00000000ffffffe4 R12: ff384881640ead88 + R13: ff2335c3760751e0 R14: ff2335e2e1672200 R15: ff2335c3760751f8 + FS: 00007fc32ac22480(0000) GS:ff2335e2d6e00000(0000) + knlGS:0000000000000000 + CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 + CR2: 00007f651ab54000 CR3: 00000029f1206003 CR4: 0000000000771ef0 + DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 + DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 + PKRU: 55555554 + Kernel panic - not syncing: Fatal exception + Kernel Offset: 0x1dc00000 from 0xffffffff81000000 (relocation range: + 0xffffffff80000000-0xffffffffbfffffff) + kvm-guest: disable async 
PF for cpu 0 + + Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation") + Signed-off-by: Mohith Kumar Thummaluru + Tested-by: Mohith Kumar Thummaluru + Reviewed-by: Moshe Shemesh + Reviewed-by: Shay Drori + Signed-off-by: Pradyumn Rahar + Reviewed-by: Jacob Keller + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1763381768-1234998-1-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c +index e18a850c615c..aa3b5878e3da 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c +@@ -324,10 +324,8 @@ struct mlx5_irq *mlx5_irq_alloc(struct mlx5_irq_pool *pool, int i, + free_irq(irq->map.virq, &irq->nh); + err_req_irq: + #ifdef CONFIG_RFS_ACCEL +- if (i && rmap && *rmap) { +- free_irq_cpu_rmap(*rmap); +- *rmap = NULL; +- } ++ if (i && rmap && *rmap) ++ irq_cpu_rmap_remove(*rmap, irq->map.virq); + err_irq_rmap: + #endif + if (i && pci_msix_can_alloc_dyn(dev->pdev)) +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1590-net-mlx5e-fix-validation-logic-in-rate-limiting.patch b/SOURCES/1590-net-mlx5e-fix-validation-logic-in-rate-limiting.patch new file mode 100644 index 000000000..eb60ddc1b --- /dev/null +++ b/SOURCES/1590-net-mlx5e-fix-validation-logic-in-rate-limiting.patch @@ -0,0 +1,65 @@ +From 8fc5ca59970d1e8eb7a933ce6cb9a0afbecea0dc Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Sun, 19 Apr 2026 18:32:42 -0400 +Subject: [PATCH] net/mlx5e: Fix validation logic in rate limiting + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit d2099d9f16dbfa1c5266d4230ff7860047bb0b68 +Author: Danielle Costantino +Date: Mon Nov 24 10:00:43 2025 -0800 + + net/mlx5e: Fix validation logic in rate limiting + + The rate limiting validation condition currently checks the output + variable max_bw_value[i] instead of the input 
value + maxrate->tc_maxrate[i]. This causes the validation to compare an + uninitialized or stale value rather than the actual requested rate. + + The condition should check the input rate to properly validate against + the upper limit: + + } else if (maxrate->tc_maxrate[i] <= upper_limit_gbps) { + + This aligns with the pattern used in the first branch, which correctly + checks maxrate->tc_maxrate[i] against upper_limit_mbps. + + The current implementation can lead to unreliable validation behavior: + + - For rates between 25.5 Gbps and 255 Gbps, if max_bw_value[i] is 0 + from initialization, the GBPS path may be taken regardless of whether + the actual rate is within bounds + + - When processing multiple TCs (i > 0), max_bw_value[i] contains the + value computed for the previous TC, affecting the validation logic + + - The overflow check for rates exceeding 255 Gbps may not trigger + consistently depending on previous array values + + This patch ensures the validation correctly examines the requested rate + value for proper bounds checking. + + Fixes: 43b27d1bd88a ("net/mlx5e: Fix wraparound in rate limiting for values above 255 Gbps") + Signed-off-by: Danielle Costantino + Reviewed-by: Gal Pressman + Link: https://patch.msgid.link/20251124180043.2314428-1-dcostantino@meta.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c +index 2ca32fb1961e..84e700777941 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c +@@ -627,7 +627,7 @@ static int mlx5e_dcbnl_ieee_setmaxrate(struct net_device *netdev, + MLX5E_100MB); + max_bw_value[i] = max_bw_value[i] ? 
max_bw_value[i] : 1; + max_bw_unit[i] = MLX5_100_MBPS_UNIT; +- } else if (max_bw_value[i] <= upper_limit_gbps) { ++ } else if (maxrate->tc_maxrate[i] <= upper_limit_gbps) { + max_bw_value[i] = div_u64(maxrate->tc_maxrate[i], + MLX5E_1GB); + max_bw_unit[i] = MLX5_GBPS_UNIT; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1591-rdma-mlx5-enable-data-direct-with-relaxed-ordering.patch b/SOURCES/1591-rdma-mlx5-enable-data-direct-with-relaxed-ordering.patch new file mode 100644 index 000000000..af60de7c6 --- /dev/null +++ b/SOURCES/1591-rdma-mlx5-enable-data-direct-with-relaxed-ordering.patch @@ -0,0 +1,141 @@ +From 29e117eff6eab97cde39f2c88548de7ab7371d22 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:51:52 -0400 +Subject: [PATCH] RDMA/mlx5: Enable Data-Direct with Relaxed Ordering + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit d9e6e85b7beb2aeb8defac2f705b23532ddb25d4 +Author: Yishai Hadas +Date: Wed Aug 13 15:36:01 2025 +0300 + + RDMA/mlx5: Enable Data-Direct with Relaxed Ordering + + Relaxed Ordering can improve performance in certain scenarios. + + Enable it in the Data-Direct use case as well. 
+ + Link: https://patch.msgid.link/r/1221dcdda8061ba5f6bc3519044083c7438b257e.1755088503.git.leon@kernel.org + Signed-off-by: Yishai Hadas + Reviewed-by: Gal Shalom + Signed-off-by: Leon Romanovsky + Signed-off-by: Jason Gunthorpe + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c +index 08e4e6d85f7b..63f3ba258b18 100644 +--- a/drivers/infiniband/hw/mlx5/main.c ++++ b/drivers/infiniband/hw/mlx5/main.c +@@ -3118,6 +3118,7 @@ mlx5_ib_create_data_direct_resources(struct mlx5_ib_dev *dev) + { + int inlen = MLX5_ST_SZ_BYTES(create_mkey_in); + struct mlx5_core_dev *mdev = dev->mdev; ++ bool ro_supp = false; + void *mkc; + u32 mkey; + u32 pdn; +@@ -3146,14 +3147,37 @@ mlx5_ib_create_data_direct_resources(struct mlx5_ib_dev *dev) + MLX5_SET(mkc, mkc, length64, 1); + MLX5_SET(mkc, mkc, qpn, 0xffffff); + err = mlx5_core_create_mkey(mdev, &mkey, in, inlen); +- kvfree(in); + if (err) +- goto err; ++ goto err_mkey; + + dev->ddr.mkey = mkey; + dev->ddr.pdn = pdn; ++ ++ /* create another mkey with RO support */ ++ if (MLX5_CAP_GEN(dev->mdev, relaxed_ordering_write)) { ++ MLX5_SET(mkc, mkc, relaxed_ordering_write, 1); ++ ro_supp = true; ++ } ++ ++ if (MLX5_CAP_GEN(dev->mdev, relaxed_ordering_read)) { ++ MLX5_SET(mkc, mkc, relaxed_ordering_read, 1); ++ ro_supp = true; ++ } ++ ++ if (ro_supp) { ++ err = mlx5_core_create_mkey(mdev, &mkey, in, inlen); ++ /* RO is defined as best effort */ ++ if (!err) { ++ dev->ddr.mkey_ro = mkey; ++ dev->ddr.mkey_ro_valid = true; ++ } ++ } ++ ++ kvfree(in); + return 0; + ++err_mkey: ++ kvfree(in); + err: + mlx5_core_dealloc_pd(mdev, pdn); + return err; +@@ -3162,6 +3186,10 @@ mlx5_ib_create_data_direct_resources(struct mlx5_ib_dev *dev) + static void + mlx5_ib_free_data_direct_resources(struct mlx5_ib_dev *dev) + { ++ ++ if (dev->ddr.mkey_ro_valid) ++ mlx5_core_destroy_mkey(dev->mdev, dev->ddr.mkey_ro); ++ + mlx5_core_destroy_mkey(dev->mdev, dev->ddr.mkey); + 
mlx5_core_dealloc_pd(dev->mdev, dev->ddr.pdn); + } +diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h +index 1b646761d5d5..6ffa394c2e6d 100644 +--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h ++++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h +@@ -850,6 +850,8 @@ struct mlx5_ib_port_resources { + struct mlx5_data_direct_resources { + u32 pdn; + u32 mkey; ++ u32 mkey_ro; ++ u8 mkey_ro_valid :1; + }; + + struct mlx5_ib_resources { +diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c +index 1317f2cb38a4..d3c82ffa300e 100644 +--- a/drivers/infiniband/hw/mlx5/mr.c ++++ b/drivers/infiniband/hw/mlx5/mr.c +@@ -1717,11 +1717,11 @@ reg_user_mr_dmabuf_by_data_direct(struct ib_pd *pd, u64 offset, + goto end; + } + +- /* The device's 'data direct mkey' was created without RO flags to +- * simplify things and allow for a single mkey per device. +- * Since RO is not a must, mask it out accordingly. ++ /* If no device's 'data direct mkey' with RO flags exists ++ * mask it out accordingly. 
+ */ +- access_flags &= ~IB_ACCESS_RELAXED_ORDERING; ++ if (!dev->ddr.mkey_ro_valid) ++ access_flags &= ~IB_ACCESS_RELAXED_ORDERING; + crossed_mr = reg_user_mr_dmabuf(pd, &data_direct_dev->pdev->dev, + offset, length, virt_addr, fd, + access_flags, MLX5_MKC_ACCESS_MODE_KSM, +diff --git a/drivers/infiniband/hw/mlx5/umr.c b/drivers/infiniband/hw/mlx5/umr.c +index 7ef35cddce81..4e562e0dd9e1 100644 +--- a/drivers/infiniband/hw/mlx5/umr.c ++++ b/drivers/infiniband/hw/mlx5/umr.c +@@ -761,7 +761,11 @@ _mlx5r_umr_update_mr_pas(struct mlx5_ib_mr *mr, unsigned int flags, bool dd, + + if (dd) { + cur_ksm->va = cpu_to_be64(rdma_block_iter_dma_address(&biter)); +- cur_ksm->key = cpu_to_be32(dev->ddr.mkey); ++ if (mr->access_flags & IB_ACCESS_RELAXED_ORDERING && ++ dev->ddr.mkey_ro_valid) ++ cur_ksm->key = cpu_to_be32(dev->ddr.mkey_ro); ++ else ++ cur_ksm->key = cpu_to_be32(dev->ddr.mkey); + if (mr->umem->is_dmabuf && + (flags & MLX5_IB_UPD_XLT_ZAP)) { + cur_ksm->va = 0; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1592-rdma-mlx5-better-estimate-max-qp-wr-to-reflect-wqe-count.patch b/SOURCES/1592-rdma-mlx5-better-estimate-max-qp-wr-to-reflect-wqe-count.patch new file mode 100644 index 000000000..adc4c98fe --- /dev/null +++ b/SOURCES/1592-rdma-mlx5-better-estimate-max-qp-wr-to-reflect-wqe-count.patch @@ -0,0 +1,117 @@ +From 27bb9e5c42866bd69208589cc7b070c2abfd980f Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:51:52 -0400 +Subject: [PATCH] RDMA/mlx5: Better estimate max_qp_wr to reflect WQE count +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 1a7c18c485bf17ef408d5ebb7f83e1f8ef329585 +Author: Or Har-Toov +Date: Wed Aug 13 15:39:56 2025 +0300 + + RDMA/mlx5: Better estimate max_qp_wr to reflect WQE count + + The mlx5 driver currently derives max_qp_wr directly from the + log_max_qp_sz HCA capability: + + props->max_qp_wr = 1 << 
MLX5_CAP_GEN(mdev, log_max_qp_sz); + + However, this value represents the number of WQEs in units of Basic + Blocks (see MLX5_SEND_WQE_BB), not actual number of WQEs. Since the size + of a WQE can vary depending on transport type and features (e.g., atomic + operations, UMR, LSO), the actual number of WQEs can be significantly + smaller than the WQEBB count suggests. + + This patch introduces a conservative estimation of the worst-case WQE size + — considering largest segments possible with 1 SGE and no inline data or + special features. It uses this to derive a more accurate max_qp_wr value. + + Fixes: 938fe83c8dcb ("net/mlx5_core: New device capabilities handling") + Link: https://patch.msgid.link/r/7d992c9831c997ed5c33d30973406dc2dcaf5e89.1755088725.git.leon@kernel.org + Reported-by: Chuck Lever + Closes: https://lore.kernel.org/all/20250506142202.GJ2260621@ziepe.ca/ + Signed-off-by: Or Har-Toov + Signed-off-by: Leon Romanovsky + Signed-off-by: Jason Gunthorpe + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c +index 63f3ba258b18..671f03876f18 100644 +--- a/drivers/infiniband/hw/mlx5/main.c ++++ b/drivers/infiniband/hw/mlx5/main.c +@@ -13,6 +13,7 @@ + #include + #include + #include ++#include + #include + #include + #include +@@ -883,6 +884,51 @@ static void fill_esw_mgr_reg_c0(struct mlx5_core_dev *mdev, + resp->reg_c0.mask = mlx5_eswitch_get_vport_metadata_mask(); + } + ++/* ++ * Calculate maximum SQ overhead across all QP types. ++ * Other QP types (REG_UMR, UC, RC, UD/SMI/GSI, XRC_TGT) ++ * have smaller overhead than the types calculated below, ++ * so they are implicitly included. 
++ */ ++static u32 mlx5_ib_calc_max_sq_overhead(void) ++{ ++ u32 max_overhead_xrc, overhead_ud_lso, a, b; ++ ++ /* XRC_INI */ ++ max_overhead_xrc = sizeof(struct mlx5_wqe_xrc_seg); ++ max_overhead_xrc += sizeof(struct mlx5_wqe_ctrl_seg); ++ a = sizeof(struct mlx5_wqe_atomic_seg) + ++ sizeof(struct mlx5_wqe_raddr_seg); ++ b = sizeof(struct mlx5_wqe_umr_ctrl_seg) + ++ sizeof(struct mlx5_mkey_seg) + ++ MLX5_IB_SQ_UMR_INLINE_THRESHOLD / MLX5_IB_UMR_OCTOWORD; ++ max_overhead_xrc += max(a, b); ++ ++ /* UD with LSO */ ++ overhead_ud_lso = sizeof(struct mlx5_wqe_ctrl_seg); ++ overhead_ud_lso += sizeof(struct mlx5_wqe_eth_pad); ++ overhead_ud_lso += sizeof(struct mlx5_wqe_eth_seg); ++ overhead_ud_lso += sizeof(struct mlx5_wqe_datagram_seg); ++ ++ return max(max_overhead_xrc, overhead_ud_lso); ++} ++ ++static u32 mlx5_ib_calc_max_qp_wr(struct mlx5_ib_dev *dev) ++{ ++ struct mlx5_core_dev *mdev = dev->mdev; ++ u32 max_wqe_bb_units = 1 << MLX5_CAP_GEN(mdev, log_max_qp_sz); ++ u32 max_wqe_size; ++ /* max QP overhead + 1 SGE, no inline, no special features */ ++ max_wqe_size = mlx5_ib_calc_max_sq_overhead() + ++ sizeof(struct mlx5_wqe_data_seg); ++ ++ max_wqe_size = roundup_pow_of_two(max_wqe_size); ++ ++ max_wqe_size = ALIGN(max_wqe_size, MLX5_SEND_WQE_BB); ++ ++ return (max_wqe_bb_units * MLX5_SEND_WQE_BB) / max_wqe_size; ++} ++ + static int mlx5_ib_query_device(struct ib_device *ibdev, + struct ib_device_attr *props, + struct ib_udata *uhw) +@@ -1041,7 +1087,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev, + props->max_mr_size = ~0ull; + props->page_size_cap = ~(min_page_size - 1); + props->max_qp = 1 << MLX5_CAP_GEN(mdev, log_max_qp); +- props->max_qp_wr = 1 << MLX5_CAP_GEN(mdev, log_max_qp_sz); ++ props->max_qp_wr = mlx5_ib_calc_max_qp_wr(dev); + max_rq_sg = MLX5_CAP_GEN(mdev, max_wqe_sz_rq) / + sizeof(struct mlx5_wqe_data_seg); + max_sq_desc = min_t(int, MLX5_CAP_GEN(mdev, max_wqe_sz_sq), 512); +-- +2.50.1 (Apple Git-155) + diff --git 
a/SOURCES/1593-rdma-mlx5-fix-vport-loopback-forcing-for-mpv-device.patch b/SOURCES/1593-rdma-mlx5-fix-vport-loopback-forcing-for-mpv-device.patch new file mode 100644 index 000000000..885b3db45 --- /dev/null +++ b/SOURCES/1593-rdma-mlx5-fix-vport-loopback-forcing-for-mpv-device.patch @@ -0,0 +1,118 @@ +From 1a67b8f97f52293f23a8836ed94bb75832c95245 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:51:52 -0400 +Subject: [PATCH] RDMA/mlx5: Fix vport loopback forcing for MPV device + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 08aae7860450c89eebbc6fd4d38416e53c7a33d2 +Author: Patrisious Haddad +Date: Wed Aug 13 15:41:19 2025 +0300 + + RDMA/mlx5: Fix vport loopback forcing for MPV device + + Previously loopback for MPV was supposed to be permanently enabled, + however other driver flows were able to over-ride that configuration and + disable it. + + Add force_lb parameter that indicates that loopback should always be + enabled which prevents all other driver flows from disabling it. 
+ + Fixes: a9a9e68954f2 ("RDMA/mlx5: Fix vport loopback for MPV device") + Link: https://patch.msgid.link/r/cfc6b1f0f99f8100b087483cc14da6025317f901.1755088808.git.leon@kernel.org + Signed-off-by: Patrisious Haddad + Reviewed-by: Mark Bloch + Signed-off-by: Leon Romanovsky + Signed-off-by: Jason Gunthorpe + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c +index 671f03876f18..e9bcd93517a9 100644 +--- a/drivers/infiniband/hw/mlx5/main.c ++++ b/drivers/infiniband/hw/mlx5/main.c +@@ -1839,7 +1839,8 @@ static void deallocate_uars(struct mlx5_ib_dev *dev, + } + + static int mlx5_ib_enable_lb_mp(struct mlx5_core_dev *master, +- struct mlx5_core_dev *slave) ++ struct mlx5_core_dev *slave, ++ struct mlx5_ib_lb_state *lb_state) + { + int err; + +@@ -1851,6 +1852,7 @@ static int mlx5_ib_enable_lb_mp(struct mlx5_core_dev *master, + if (err) + goto out; + ++ lb_state->force_enable = true; + return 0; + + out: +@@ -1859,16 +1861,22 @@ static int mlx5_ib_enable_lb_mp(struct mlx5_core_dev *master, + } + + static void mlx5_ib_disable_lb_mp(struct mlx5_core_dev *master, +- struct mlx5_core_dev *slave) ++ struct mlx5_core_dev *slave, ++ struct mlx5_ib_lb_state *lb_state) + { + mlx5_nic_vport_update_local_lb(slave, false); + mlx5_nic_vport_update_local_lb(master, false); ++ ++ lb_state->force_enable = false; + } + + int mlx5_ib_enable_lb(struct mlx5_ib_dev *dev, bool td, bool qp) + { + int err = 0; + ++ if (dev->lb.force_enable) ++ return 0; ++ + mutex_lock(&dev->lb.mutex); + if (td) + dev->lb.user_td++; +@@ -1890,6 +1898,9 @@ int mlx5_ib_enable_lb(struct mlx5_ib_dev *dev, bool td, bool qp) + + void mlx5_ib_disable_lb(struct mlx5_ib_dev *dev, bool td, bool qp) + { ++ if (dev->lb.force_enable) ++ return; ++ + mutex_lock(&dev->lb.mutex); + if (td) + dev->lb.user_td--; +@@ -3597,7 +3608,7 @@ static void mlx5_ib_unbind_slave_port(struct mlx5_ib_dev *ibdev, + + lockdep_assert_held(&mlx5_ib_multiport_mutex); + +- 
mlx5_ib_disable_lb_mp(ibdev->mdev, mpi->mdev); ++ mlx5_ib_disable_lb_mp(ibdev->mdev, mpi->mdev, &ibdev->lb); + + mlx5_core_mp_event_replay(ibdev->mdev, + MLX5_DRIVER_EVENT_AFFILIATION_REMOVED, +@@ -3694,7 +3705,7 @@ static bool mlx5_ib_bind_slave_port(struct mlx5_ib_dev *ibdev, + MLX5_DRIVER_EVENT_AFFILIATION_DONE, + &key); + +- err = mlx5_ib_enable_lb_mp(ibdev->mdev, mpi->mdev); ++ err = mlx5_ib_enable_lb_mp(ibdev->mdev, mpi->mdev, &ibdev->lb); + if (err) + goto unbind; + +diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h +index 6ffa394c2e6d..4c15e8d4488f 100644 +--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h ++++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h +@@ -1107,6 +1107,7 @@ struct mlx5_ib_lb_state { + u32 user_td; + int qps; + bool enabled; ++ bool force_enable; + }; + + struct mlx5_ib_pf_eq { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1594-rdma-mlx5-fix-page-size-bitmap-calculation-for-ksm-mode.patch b/SOURCES/1594-rdma-mlx5-fix-page-size-bitmap-calculation-for-ksm-mode.patch new file mode 100644 index 000000000..9f8b136ed --- /dev/null +++ b/SOURCES/1594-rdma-mlx5-fix-page-size-bitmap-calculation-for-ksm-mode.patch @@ -0,0 +1,50 @@ +From 383138b02e58ab08c149df8a1d6447113b9a015b Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:51:53 -0400 +Subject: [PATCH] RDMA/mlx5: Fix page size bitmap calculation for KSM mode + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 372fdb5c75b61f038f4abf596abdcf01acbdb7af +Author: Edward Srouji +Date: Sun Aug 24 17:48:39 2025 +0300 + + RDMA/mlx5: Fix page size bitmap calculation for KSM mode + + When using KSM (Key Scatter-gather Memory) access mode, the HW requires + the IOVA to be aligned to the selected page size. + Without this alignment, the HW may not function correctly. + + Currently, mlx5_umem_mkc_find_best_pgsz() does not filter out page sizes + that would result in misaligned IOVAs for KSM mode. 
This can lead to + selecting page sizes that are incompatible with the given IOVA. + + Fix this by filtering the page size bitmap when in KSM mode, keeping + only page sizes to which the IOVA is aligned to. + + Fixes: fcfb03597b7d ("RDMA/mlx5: Align mkc page size capability check to PRM") + Signed-off-by: Edward Srouji + Link: https://patch.msgid.link/20250824144839.154717-1-edwards@nvidia.com + Reviewed-by: Michael Guralnik + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h +index 4c15e8d4488f..b20d3e5efd9e 100644 +--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h ++++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h +@@ -1801,6 +1801,10 @@ mlx5_umem_mkc_find_best_pgsz(struct mlx5_ib_dev *dev, struct ib_umem *umem, + + bitmap = GENMASK_ULL(max_log_entity_size_cap, min_log_entity_size_cap); + ++ /* In KSM mode HW requires IOVA and mkey's page size to be aligned */ ++ if (access_mode == MLX5_MKC_ACCESS_MODE_KSM && iova) ++ bitmap &= GENMASK_ULL(__ffs64(iova), 0); ++ + return ib_umem_find_best_pgsz(umem, bitmap, iova); + } + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1595-rdma-use-pe-format-specifier-for-error-pointers.patch b/SOURCES/1595-rdma-use-pe-format-specifier-for-error-pointers.patch new file mode 100644 index 000000000..ef978005d --- /dev/null +++ b/SOURCES/1595-rdma-use-pe-format-specifier-for-error-pointers.patch @@ -0,0 +1,140 @@ +From 959cd78d8f953ff0e1cf6ee8f5b1746cfb9dd8a7 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:52:28 -0400 +Subject: [PATCH] RDMA: Use %pe format specifier for error pointers + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 +Conflicts: +Include only the mlx5 hunks. 
+ +commit 4b6b6233f50f72353b54295ba594990b19f33223 +Author: Leon Romanovsky +Date: Thu Sep 18 20:53:41 2025 +0300 + + RDMA: Use %pe format specifier for error pointers + + Convert error logging throughout the RDMA subsystem to use + the %pe format specifier instead of PTR_ERR() with integer + format specifiers. + + Link: https://patch.msgid.link/e81ec02df1e474be20417fb62e779776e3f47a50.1758217936.git.leon@kernel.org + Reviewed-by: Zhu Yanjun + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/data_direct.c b/drivers/infiniband/hw/mlx5/data_direct.c +index b9ba84afaae2..b81ac5709b56 100644 +--- a/drivers/infiniband/hw/mlx5/data_direct.c ++++ b/drivers/infiniband/hw/mlx5/data_direct.c +@@ -35,7 +35,7 @@ static int mlx5_data_direct_vpd_get_vuid(struct mlx5_data_direct_dev *dev) + + vpd_data = pci_vpd_alloc(pdev, &vpd_size); + if (IS_ERR(vpd_data)) { +- pci_err(pdev, "Unable to read VPD, err=%ld\n", PTR_ERR(vpd_data)); ++ pci_err(pdev, "Unable to read VPD, err=%pe\n", vpd_data); + return PTR_ERR(vpd_data); + } + +diff --git a/drivers/infiniband/hw/mlx5/gsi.c b/drivers/infiniband/hw/mlx5/gsi.c +index b804f2dd5628..d5487834ed25 100644 +--- a/drivers/infiniband/hw/mlx5/gsi.c ++++ b/drivers/infiniband/hw/mlx5/gsi.c +@@ -131,8 +131,9 @@ int mlx5_ib_create_gsi(struct ib_pd *pd, struct mlx5_ib_qp *mqp, + gsi->cq = ib_alloc_cq(pd->device, gsi, attr->cap.max_send_wr, 0, + IB_POLL_SOFTIRQ); + if (IS_ERR(gsi->cq)) { +- mlx5_ib_warn(dev, "unable to create send CQ for GSI QP. error %ld\n", +- PTR_ERR(gsi->cq)); ++ mlx5_ib_warn(dev, ++ "unable to create send CQ for GSI QP. error %pe\n", ++ gsi->cq); + ret = PTR_ERR(gsi->cq); + goto err_free_wrs; + } +@@ -147,8 +148,9 @@ int mlx5_ib_create_gsi(struct ib_pd *pd, struct mlx5_ib_qp *mqp, + + gsi->rx_qp = ib_create_qp(pd, &hw_init_attr); + if (IS_ERR(gsi->rx_qp)) { +- mlx5_ib_warn(dev, "unable to create hardware GSI QP. 
error %ld\n", +- PTR_ERR(gsi->rx_qp)); ++ mlx5_ib_warn(dev, ++ "unable to create hardware GSI QP. error %pe\n", ++ gsi->rx_qp); + ret = PTR_ERR(gsi->rx_qp); + goto err_destroy_cq; + } +@@ -294,8 +296,9 @@ static void setup_qp(struct mlx5_ib_gsi_qp *gsi, u16 qp_index) + + qp = create_gsi_ud_qp(gsi); + if (IS_ERR(qp)) { +- mlx5_ib_warn(dev, "unable to create hardware UD QP for GSI: %ld\n", +- PTR_ERR(qp)); ++ mlx5_ib_warn(dev, ++ "unable to create hardware UD QP for GSI: %pe\n", ++ qp); + return; + } + +diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c +index e9bcd93517a9..8f2c6e84127f 100644 +--- a/drivers/infiniband/hw/mlx5/main.c ++++ b/drivers/infiniband/hw/mlx5/main.c +@@ -3051,14 +3051,16 @@ int mlx5_ib_dev_res_cq_init(struct mlx5_ib_dev *dev) + pd = ib_alloc_pd(ibdev, 0); + if (IS_ERR(pd)) { + ret = PTR_ERR(pd); +- mlx5_ib_err(dev, "Couldn't allocate PD for res init, err=%d\n", ret); ++ mlx5_ib_err(dev, "Couldn't allocate PD for res init, err=%pe\n", ++ pd); + goto unlock; + } + + cq = ib_create_cq(ibdev, NULL, NULL, NULL, &cq_attr); + if (IS_ERR(cq)) { + ret = PTR_ERR(cq); +- mlx5_ib_err(dev, "Couldn't create CQ for res init, err=%d\n", ret); ++ mlx5_ib_err(dev, "Couldn't create CQ for res init, err=%pe\n", ++ cq); + ib_dealloc_pd(pd); + goto unlock; + } +@@ -3102,7 +3104,9 @@ int mlx5_ib_dev_res_srq_init(struct mlx5_ib_dev *dev) + s0 = ib_create_srq(devr->p0, &attr); + if (IS_ERR(s0)) { + ret = PTR_ERR(s0); +- mlx5_ib_err(dev, "Couldn't create SRQ 0 for res init, err=%d\n", ret); ++ mlx5_ib_err(dev, ++ "Couldn't create SRQ 0 for res init, err=%pe\n", ++ s0); + goto unlock; + } + +@@ -3114,7 +3118,9 @@ int mlx5_ib_dev_res_srq_init(struct mlx5_ib_dev *dev) + s1 = ib_create_srq(devr->p0, &attr); + if (IS_ERR(s1)) { + ret = PTR_ERR(s1); +- mlx5_ib_err(dev, "Couldn't create SRQ 1 for res init, err=%d\n", ret); ++ mlx5_ib_err(dev, ++ "Couldn't create SRQ 1 for res init, err=%pe\n", ++ s1); + ib_destroy_srq(s0); + } + +diff --git 
a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c +index d3c82ffa300e..325fa04cbe8a 100644 +--- a/drivers/infiniband/hw/mlx5/mr.c ++++ b/drivers/infiniband/hw/mlx5/mr.c +@@ -1652,8 +1652,7 @@ reg_user_mr_dmabuf(struct ib_pd *pd, struct device *dma_device, + fd, access_flags); + + if (IS_ERR(umem_dmabuf)) { +- mlx5_ib_dbg(dev, "umem_dmabuf get failed (%ld)\n", +- PTR_ERR(umem_dmabuf)); ++ mlx5_ib_dbg(dev, "umem_dmabuf get failed (%pe)\n", umem_dmabuf); + return ERR_CAST(umem_dmabuf); + } + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1596-rdma-net-mlx5-query-vports-mac-address-from-device.patch b/SOURCES/1596-rdma-net-mlx5-query-vports-mac-address-from-device.patch new file mode 100644 index 000000000..1ffa89202 --- /dev/null +++ b/SOURCES/1596-rdma-net-mlx5-query-vports-mac-address-from-device.patch @@ -0,0 +1,223 @@ +From 1411ab84153ec7c4943797b252cea03fbcbcd772 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:56:33 -0400 +Subject: [PATCH] {rdma,net}/mlx5: Query vports mac address from device + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit eea31f21dce10814e34dc7ef7ed5136269c7bb59 +Author: Adithya Jayachandran +Date: Wed Oct 15 18:40:55 2025 -0700 + + {rdma,net}/mlx5: Query vports mac address from device + + Before this patch during either switchdev or legacy mode enablement we + cleared the mac address of vports between changes. This change allows us + to preserve the vports mac address between eswitch mode changes. + + Vports hold information for VFs/SFs such as the permanent mac address. + VF/SF mac can be set either by iproute vf interface or devlink function + interface. For no obvious reason we reset it to 0 on switchdev/legacy + mode changes, this patch is fixing that, to align with other vport + information that are never reset, e.g GUID,mtu,promisc mode, etc .. 
+ + Signed-off-by: Adithya Jayachandran + Signed-off-by: Saeed Mahameed + Reviewed-by: Mark Bloch + Acked-by: Leon Romanovsky # RDMA + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c +index 8f2c6e84127f..dc2c5cc47860 100644 +--- a/drivers/infiniband/hw/mlx5/main.c ++++ b/drivers/infiniband/hw/mlx5/main.c +@@ -842,7 +842,7 @@ static int mlx5_query_node_guid(struct mlx5_ib_dev *dev, + break; + + case MLX5_VPORT_ACCESS_METHOD_NIC: +- err = mlx5_query_nic_vport_node_guid(dev->mdev, &tmp); ++ err = mlx5_query_nic_vport_node_guid(dev->mdev, 0, false, &tmp); + break; + + default: +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +index e2ffb87b94cb..25af8bd7f077 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +@@ -875,13 +875,10 @@ static int esw_vport_setup(struct mlx5_eswitch *esw, struct mlx5_vport *vport) + vport_num, 1, + vport->info.link_state); + +- /* Host PF has its own mac/guid. */ +- if (vport_num) { +- mlx5_modify_nic_vport_mac_address(esw->dev, vport_num, +- vport->info.mac); +- mlx5_modify_nic_vport_node_guid(esw->dev, vport_num, +- vport->info.node_guid); +- } ++ mlx5_query_nic_vport_mac_address(esw->dev, vport_num, true, ++ vport->info.mac); ++ mlx5_query_nic_vport_node_guid(esw->dev, vport_num, true, ++ &vport->info.node_guid); + + flags = (vport->info.vlan || vport->info.qos) ? + SET_VLAN_STRIP | SET_VLAN_INSERT : 0; +@@ -947,12 +944,6 @@ int mlx5_esw_vport_enable(struct mlx5_eswitch *esw, struct mlx5_vport *vport, + goto err_vhca_mapping; + } + +- /* External controller host PF has factory programmed MAC. +- * Read it from the device. 
+- */ +- if (mlx5_core_is_ecpf(esw->dev) && vport_num == MLX5_VPORT_PF) +- mlx5_query_nic_vport_mac_address(esw->dev, vport_num, true, vport->info.mac); +- + esw_vport_change_handle_locked(vport); + + esw->enabled_vports++; +@@ -2235,6 +2226,9 @@ int mlx5_eswitch_get_vport_config(struct mlx5_eswitch *esw, + ivi->vf = vport - 1; + + mutex_lock(&esw->state_lock); ++ ++ mlx5_query_nic_vport_mac_address(esw->dev, vport, true, ++ evport->info.mac); + ether_addr_copy(ivi->mac, evport->info.mac); + ivi->linkstate = evport->info.link_state; + ivi->vlan = evport->info.vlan; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +index 44a142a041b2..05270d5bc6df 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +@@ -4302,6 +4302,9 @@ int mlx5_devlink_port_fn_hw_addr_get(struct devlink_port *port, + struct mlx5_vport *vport = mlx5_devlink_port_vport_get(port); + + mutex_lock(&esw->state_lock); ++ ++ mlx5_query_nic_vport_mac_address(esw->dev, vport->vport, true, ++ vport->info.mac); + ether_addr_copy(hw_addr, vport->info.mac); + *hw_addr_len = ETH_ALEN; + mutex_unlock(&esw->state_lock); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c +index 2ed2e530b07d..d1483f66cd0c 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c +@@ -78,15 +78,14 @@ int mlx5_modify_vport_admin_state(struct mlx5_core_dev *mdev, u8 opmod, + } + + static int mlx5_query_nic_vport_context(struct mlx5_core_dev *mdev, u16 vport, +- u32 *out) ++ bool other_vport, u32 *out) + { + u32 in[MLX5_ST_SZ_DW(query_nic_vport_context_in)] = {}; + + MLX5_SET(query_nic_vport_context_in, in, opcode, + MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT); + MLX5_SET(query_nic_vport_context_in, in, vport_number, vport); +- if (vport) +- 
MLX5_SET(query_nic_vport_context_in, in, other_vport, 1); ++ MLX5_SET(query_nic_vport_context_in, in, other_vport, other_vport); + + return mlx5_cmd_exec_inout(mdev, query_nic_vport_context, in, out); + } +@@ -97,7 +96,7 @@ int mlx5_query_nic_vport_min_inline(struct mlx5_core_dev *mdev, + u32 out[MLX5_ST_SZ_DW(query_nic_vport_context_out)] = {}; + int err; + +- err = mlx5_query_nic_vport_context(mdev, vport, out); ++ err = mlx5_query_nic_vport_context(mdev, vport, vport > 0, out); + if (!err) + *min_inline = MLX5_GET(query_nic_vport_context_out, out, + nic_vport_context.min_wqe_inline_mode); +@@ -219,7 +218,7 @@ int mlx5_query_nic_vport_mtu(struct mlx5_core_dev *mdev, u16 *mtu) + if (!out) + return -ENOMEM; + +- err = mlx5_query_nic_vport_context(mdev, 0, out); ++ err = mlx5_query_nic_vport_context(mdev, 0, false, out); + if (!err) + *mtu = MLX5_GET(query_nic_vport_context_out, out, + nic_vport_context.mtu); +@@ -429,7 +428,7 @@ int mlx5_query_nic_vport_system_image_guid(struct mlx5_core_dev *mdev, + if (!out) + return -ENOMEM; + +- err = mlx5_query_nic_vport_context(mdev, 0, out); ++ err = mlx5_query_nic_vport_context(mdev, 0, false, out); + if (err) + goto out; + +@@ -451,7 +450,7 @@ int mlx5_query_nic_vport_sd_group(struct mlx5_core_dev *mdev, u8 *sd_group) + if (!out) + return -ENOMEM; + +- err = mlx5_query_nic_vport_context(mdev, 0, out); ++ err = mlx5_query_nic_vport_context(mdev, 0, false, out); + if (err) + goto out; + +@@ -462,7 +461,8 @@ int mlx5_query_nic_vport_sd_group(struct mlx5_core_dev *mdev, u8 *sd_group) + return err; + } + +-int mlx5_query_nic_vport_node_guid(struct mlx5_core_dev *mdev, u64 *node_guid) ++int mlx5_query_nic_vport_node_guid(struct mlx5_core_dev *mdev, ++ u16 vport, bool other_vport, u64 *node_guid) + { + u32 *out; + int outlen = MLX5_ST_SZ_BYTES(query_nic_vport_context_out); +@@ -472,7 +472,7 @@ int mlx5_query_nic_vport_node_guid(struct mlx5_core_dev *mdev, u64 *node_guid) + if (!out) + return -ENOMEM; + +- err = 
mlx5_query_nic_vport_context(mdev, 0, out); ++ err = mlx5_query_nic_vport_context(mdev, vport, other_vport, out); + if (err) + goto out; + +@@ -529,7 +529,7 @@ int mlx5_query_nic_vport_qkey_viol_cntr(struct mlx5_core_dev *mdev, + if (!out) + return -ENOMEM; + +- err = mlx5_query_nic_vport_context(mdev, 0, out); ++ err = mlx5_query_nic_vport_context(mdev, 0, false, out); + if (err) + goto out; + +@@ -804,7 +804,7 @@ int mlx5_query_nic_vport_promisc(struct mlx5_core_dev *mdev, + if (!out) + return -ENOMEM; + +- err = mlx5_query_nic_vport_context(mdev, vport, out); ++ err = mlx5_query_nic_vport_context(mdev, vport, vport > 0, out); + if (err) + goto out; + +@@ -908,7 +908,7 @@ int mlx5_nic_vport_query_local_lb(struct mlx5_core_dev *mdev, bool *status) + if (!out) + return -ENOMEM; + +- err = mlx5_query_nic_vport_context(mdev, 0, out); ++ err = mlx5_query_nic_vport_context(mdev, 0, false, out); + if (err) + goto out; + +diff --git a/include/linux/mlx5/vport.h b/include/linux/mlx5/vport.h +index c87b9507cfa1..f876bfc0669c 100644 +--- a/include/linux/mlx5/vport.h ++++ b/include/linux/mlx5/vport.h +@@ -73,7 +73,8 @@ int mlx5_modify_nic_vport_mtu(struct mlx5_core_dev *mdev, u16 mtu); + int mlx5_query_nic_vport_system_image_guid(struct mlx5_core_dev *mdev, + u64 *system_image_guid); + int mlx5_query_nic_vport_sd_group(struct mlx5_core_dev *mdev, u8 *sd_group); +-int mlx5_query_nic_vport_node_guid(struct mlx5_core_dev *mdev, u64 *node_guid); ++int mlx5_query_nic_vport_node_guid(struct mlx5_core_dev *mdev, ++ u16 vport, bool other_vport, u64 *node_guid); + int mlx5_modify_nic_vport_node_guid(struct mlx5_core_dev *mdev, + u16 vport, u64 node_guid); + int mlx5_query_nic_vport_qkey_viol_cntr(struct mlx5_core_dev *mdev, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1597-net-mlx5-use-common-mlx5-same-hw-devs-function.patch b/SOURCES/1597-net-mlx5-use-common-mlx5-same-hw-devs-function.patch new file mode 100644 index 000000000..20e5e4ac3 --- /dev/null +++ 
b/SOURCES/1597-net-mlx5-use-common-mlx5-same-hw-devs-function.patch @@ -0,0 +1,77 @@ +From 0d9a9d2de5e05ddcfd2a420e7529e5eb93c4a905 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:56:33 -0400 +Subject: [PATCH] net/mlx5: Use common mlx5_same_hw_devs function + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 211de28b1caf51581ba5e0978e83213db4f488c6 +Author: Mark Bloch +Date: Thu Oct 23 12:16:56 2025 +0300 + + net/mlx5: Use common mlx5_same_hw_devs function + + Refactor duplicate hardware device comparison code to use the common + mlx5_same_hw_devs() function instead of reimplementing system GUID + comparison logic in multiple places. + + This cleanup eliminates code duplication in: + - Bridge representor device comparison. + - TC hardware device comparison. + + Using the centralized function improves maintainability and ensures + consistent behavior across the driver. + + Signed-off-by: Mark Bloch + Reviewed-by: Shay Drori + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1761211020-925651-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/bridge.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/bridge.c +index 9d1c677814e0..87a2ad69526d 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/bridge.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/bridge.c +@@ -30,15 +30,11 @@ static bool mlx5_esw_bridge_dev_same_hw(struct net_device *dev, struct mlx5_eswi + { + struct mlx5e_priv *priv = netdev_priv(dev); + struct mlx5_core_dev *mdev, *esw_mdev; +- u64 system_guid, esw_system_guid; + + mdev = priv->mdev; + esw_mdev = esw->dev; + +- system_guid = mlx5_query_nic_system_image_guid(mdev); +- esw_system_guid = mlx5_query_nic_system_image_guid(esw_mdev); +- +- return system_guid == esw_system_guid; ++ return mlx5_same_hw_devs(mdev, esw_mdev); + } + + static struct net_device * +diff --git 
a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +index 7a34a502f97f..45d004c7b0dd 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +@@ -3614,15 +3614,11 @@ static bool same_port_devs(struct mlx5e_priv *priv, struct mlx5e_priv *peer_priv + bool mlx5e_same_hw_devs(struct mlx5e_priv *priv, struct mlx5e_priv *peer_priv) + { + struct mlx5_core_dev *fmdev, *pmdev; +- u64 fsystem_guid, psystem_guid; + + fmdev = priv->mdev; + pmdev = peer_priv->mdev; + +- fsystem_guid = mlx5_query_nic_system_image_guid(fmdev); +- psystem_guid = mlx5_query_nic_system_image_guid(pmdev); +- +- return (fsystem_guid == psystem_guid); ++ return mlx5_same_hw_devs(fmdev, pmdev); + } + + static int +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1598-net-mlx5-add-software-system-image-guid-infrastructure.patch b/SOURCES/1598-net-mlx5-add-software-system-image-guid-infrastructure.patch new file mode 100644 index 000000000..cf0986ae2 --- /dev/null +++ b/SOURCES/1598-net-mlx5-add-software-system-image-guid-infrastructure.patch @@ -0,0 +1,424 @@ +From 8445ffd300ce776e75275484b16cace67ff4a5ac Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:56:33 -0400 +Subject: [PATCH] net/mlx5: Add software system image GUID infrastructure + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 7718f2a8b87af7363d60819ac0ac0da8b2f8ff00 +Author: Mark Bloch +Date: Thu Oct 23 12:16:57 2025 +0300 + + net/mlx5: Add software system image GUID infrastructure + + Replace direct hardware system image GUID usage with a new software + system image GUID function that supports variable-length identifiers. + + Key changes: + - Add mlx5_query_nic_sw_system_image_guid() function with length parameter. + - Update all callsites to use the new function and buffer/length approach. + - Modify mapping contexts to use byte arrays instead of u64 keys. 
+ - Update devcom matching to support variable-length keys. + - Change mlx5_same_hw_devs() to use buffer comparison instead of u64. + + This refactoring prepares the infrastructure for balance ID support, + which requires extending the system image GUID with additional data. + The change maintains backward compatibility while enabling future + enhancements. + + Signed-off-by: Mark Bloch + Reviewed-by: Shay Drori + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1761211020-925651-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/dev.c +index 891bbbbfbbf1..64c04f52990f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/dev.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/dev.c +@@ -564,10 +564,14 @@ int mlx5_rescan_drivers_locked(struct mlx5_core_dev *dev) + + bool mlx5_same_hw_devs(struct mlx5_core_dev *dev, struct mlx5_core_dev *peer_dev) + { +- u64 fsystem_guid, psystem_guid; ++ u8 fsystem_guid[MLX5_SW_IMAGE_GUID_MAX_BYTES]; ++ u8 psystem_guid[MLX5_SW_IMAGE_GUID_MAX_BYTES]; ++ u8 flen; ++ u8 plen; + +- fsystem_guid = mlx5_query_nic_system_image_guid(dev); +- psystem_guid = mlx5_query_nic_system_image_guid(peer_dev); ++ mlx5_query_nic_sw_system_image_guid(dev, fsystem_guid, &flen); ++ mlx5_query_nic_sw_system_image_guid(peer_dev, psystem_guid, &plen); + +- return (fsystem_guid && psystem_guid && fsystem_guid == psystem_guid); ++ return plen && flen && flen == plen && ++ !memcmp(fsystem_guid, psystem_guid, flen); + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/en/devlink.c +index 0b1ac6e5c890..8818f65d1fbc 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/devlink.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/devlink.c +@@ -40,11 +40,8 @@ void mlx5e_destroy_devlink(struct mlx5e_dev *mlx5e_dev) + static void + 
mlx5e_devlink_get_port_parent_id(struct mlx5_core_dev *dev, struct netdev_phys_item_id *ppid) + { +- u64 parent_id; +- +- parent_id = mlx5_query_nic_system_image_guid(dev); +- ppid->id_len = sizeof(parent_id); +- memcpy(ppid->id, &parent_id, sizeof(parent_id)); ++ BUILD_BUG_ON(MLX5_SW_IMAGE_GUID_MAX_BYTES > MAX_PHYS_ITEM_ID_LEN); ++ mlx5_query_nic_sw_system_image_guid(dev, ppid->id, &ppid->id_len); + } + + int mlx5e_devlink_port_register(struct mlx5e_dev *mlx5e_dev, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/mapping.c b/drivers/net/ethernet/mellanox/mlx5/core/en/mapping.c +index 4e72ca8070e2..1de18c7e96ec 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/mapping.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/mapping.c +@@ -6,6 +6,7 @@ + #include + #include + #include ++#include + + #include "mapping.h" + +@@ -24,7 +25,8 @@ struct mapping_ctx { + struct delayed_work dwork; + struct list_head pending_list; + spinlock_t pending_list_lock; /* Guards pending list */ +- u64 id; ++ u8 id[MLX5_SW_IMAGE_GUID_MAX_BYTES]; ++ u8 id_len; + u8 type; + struct list_head list; + refcount_t refcount; +@@ -220,13 +222,15 @@ mapping_create(size_t data_size, u32 max_id, bool delayed_removal) + } + + struct mapping_ctx * +-mapping_create_for_id(u64 id, u8 type, size_t data_size, u32 max_id, bool delayed_removal) ++mapping_create_for_id(u8 *id, u8 id_len, u8 type, size_t data_size, u32 max_id, ++ bool delayed_removal) + { + struct mapping_ctx *ctx; + + mutex_lock(&shared_ctx_lock); + list_for_each_entry(ctx, &shared_ctx_list, list) { +- if (ctx->id == id && ctx->type == type) { ++ if (ctx->type == type && ctx->id_len == id_len && ++ !memcmp(id, ctx->id, id_len)) { + if (refcount_inc_not_zero(&ctx->refcount)) + goto unlock; + break; +@@ -237,7 +241,8 @@ mapping_create_for_id(u64 id, u8 type, size_t data_size, u32 max_id, bool delaye + if (IS_ERR(ctx)) + goto unlock; + +- ctx->id = id; ++ memcpy(ctx->id, id, id_len); ++ ctx->id_len = id_len; + ctx->type = type; + 
list_add(&ctx->list, &shared_ctx_list); + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/mapping.h b/drivers/net/ethernet/mellanox/mlx5/core/en/mapping.h +index 4e2119f0f4c1..e86a103d58b9 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/mapping.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/mapping.h +@@ -27,6 +27,7 @@ void mapping_destroy(struct mapping_ctx *ctx); + /* adds mapping with an id or get an existing mapping with the same id + */ + struct mapping_ctx * +-mapping_create_for_id(u64 id, u8 type, size_t data_size, u32 max_id, bool delayed_removal); ++mapping_create_for_id(u8 *id, u8 id_len, u8 type, size_t data_size, u32 max_id, ++ bool delayed_removal); + + #endif /* __MLX5_MAPPING_H__ */ +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/int_port.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/int_port.c +index 896f718483c3..991f47050643 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/int_port.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/int_port.c +@@ -307,7 +307,8 @@ mlx5e_tc_int_port_init(struct mlx5e_priv *priv) + { + struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; + struct mlx5e_tc_int_port_priv *int_port_priv; +- u64 mapping_id; ++ u8 mapping_id[MLX5_SW_IMAGE_GUID_MAX_BYTES]; ++ u8 id_len; + + if (!mlx5e_tc_int_port_supported(esw)) + return NULL; +@@ -316,9 +317,10 @@ mlx5e_tc_int_port_init(struct mlx5e_priv *priv) + if (!int_port_priv) + return NULL; + +- mapping_id = mlx5_query_nic_system_image_guid(priv->mdev); ++ mlx5_query_nic_sw_system_image_guid(priv->mdev, mapping_id, &id_len); + +- int_port_priv->metadata_mapping = mapping_create_for_id(mapping_id, MAPPING_TYPE_INT_PORT, ++ int_port_priv->metadata_mapping = mapping_create_for_id(mapping_id, id_len, ++ MAPPING_TYPE_INT_PORT, + sizeof(u32) * 2, + (1 << ESW_VPORT_BITS) - 1, true); + if (IS_ERR(int_port_priv->metadata_mapping)) { +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c +index 870d12364f99..fc0e57403d25 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c +@@ -2287,9 +2287,10 @@ mlx5_tc_ct_init(struct mlx5e_priv *priv, struct mlx5_fs_chains *chains, + enum mlx5_flow_namespace_type ns_type, + struct mlx5e_post_act *post_act) + { ++ u8 mapping_id[MLX5_SW_IMAGE_GUID_MAX_BYTES]; + struct mlx5_tc_ct_priv *ct_priv; + struct mlx5_core_dev *dev; +- u64 mapping_id; ++ u8 id_len; + int err; + + dev = priv->mdev; +@@ -2301,16 +2302,18 @@ mlx5_tc_ct_init(struct mlx5e_priv *priv, struct mlx5_fs_chains *chains, + if (!ct_priv) + goto err_alloc; + +- mapping_id = mlx5_query_nic_system_image_guid(dev); ++ mlx5_query_nic_sw_system_image_guid(dev, mapping_id, &id_len); + +- ct_priv->zone_mapping = mapping_create_for_id(mapping_id, MAPPING_TYPE_ZONE, ++ ct_priv->zone_mapping = mapping_create_for_id(mapping_id, id_len, ++ MAPPING_TYPE_ZONE, + sizeof(u16), 0, true); + if (IS_ERR(ct_priv->zone_mapping)) { + err = PTR_ERR(ct_priv->zone_mapping); + goto err_mapping_zone; + } + +- ct_priv->labels_mapping = mapping_create_for_id(mapping_id, MAPPING_TYPE_LABELS, ++ ct_priv->labels_mapping = mapping_create_for_id(mapping_id, id_len, ++ MAPPING_TYPE_LABELS, + sizeof(u32) * 4, 0, true); + if (IS_ERR(ct_priv->labels_mapping)) { + err = PTR_ERR(ct_priv->labels_mapping); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +index 45d004c7b0dd..17ae07b47f7f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +@@ -5233,10 +5233,11 @@ static void mlx5e_tc_nic_destroy_miss_table(struct mlx5e_priv *priv) + int mlx5e_tc_nic_init(struct mlx5e_priv *priv) + { + struct mlx5e_tc_table *tc = mlx5e_fs_get_tc(priv->fs); ++ u8 mapping_id[MLX5_SW_IMAGE_GUID_MAX_BYTES]; + struct mlx5_core_dev *dev = priv->mdev; + struct mapping_ctx 
*chains_mapping; + struct mlx5_chains_attr attr = {}; +- u64 mapping_id; ++ u8 id_len; + int err; + + mlx5e_mod_hdr_tbl_init(&tc->mod_hdr); +@@ -5252,11 +5253,13 @@ int mlx5e_tc_nic_init(struct mlx5e_priv *priv) + lockdep_set_class(&tc->ht.mutex, &tc_ht_lock_key); + lockdep_init_map(&tc->ht.run_work.lockdep_map, "tc_ht_wq_key", &tc_ht_wq_key, 0); + +- mapping_id = mlx5_query_nic_system_image_guid(dev); ++ mlx5_query_nic_sw_system_image_guid(dev, mapping_id, &id_len); + +- chains_mapping = mapping_create_for_id(mapping_id, MAPPING_TYPE_CHAIN, ++ chains_mapping = mapping_create_for_id(mapping_id, id_len, ++ MAPPING_TYPE_CHAIN, + sizeof(struct mlx5_mapped_obj), +- MLX5E_TC_TABLE_CHAIN_TAG_MASK, true); ++ MLX5E_TC_TABLE_CHAIN_TAG_MASK, ++ true); + + if (IS_ERR(chains_mapping)) { + err = PTR_ERR(chains_mapping); +@@ -5387,14 +5390,15 @@ void mlx5e_tc_ht_cleanup(struct rhashtable *tc_ht) + int mlx5e_tc_esw_init(struct mlx5_rep_uplink_priv *uplink_priv) + { + const size_t sz_enc_opts = sizeof(struct tunnel_match_enc_opts); ++ u8 mapping_id[MLX5_SW_IMAGE_GUID_MAX_BYTES]; + struct mlx5_devcom_match_attr attr = {}; + struct netdev_phys_item_id ppid; + struct mlx5e_rep_priv *rpriv; + struct mapping_ctx *mapping; + struct mlx5_eswitch *esw; + struct mlx5e_priv *priv; +- u64 mapping_id; + int err = 0; ++ u8 id_len; + + rpriv = container_of(uplink_priv, struct mlx5e_rep_priv, uplink_priv); + priv = netdev_priv(rpriv->netdev); +@@ -5412,9 +5416,9 @@ int mlx5e_tc_esw_init(struct mlx5_rep_uplink_priv *uplink_priv) + + uplink_priv->tc_psample = mlx5e_tc_sample_init(esw, uplink_priv->post_act); + +- mapping_id = mlx5_query_nic_system_image_guid(esw->dev); ++ mlx5_query_nic_sw_system_image_guid(esw->dev, mapping_id, &id_len); + +- mapping = mapping_create_for_id(mapping_id, MAPPING_TYPE_TUNNEL, ++ mapping = mapping_create_for_id(mapping_id, id_len, MAPPING_TYPE_TUNNEL, + sizeof(struct tunnel_match_key), + TUNNEL_INFO_BITS_MASK, true); + +@@ -5427,8 +5431,10 @@ int 
mlx5e_tc_esw_init(struct mlx5_rep_uplink_priv *uplink_priv) + /* Two last values are reserved for stack devices slow path table mark + * and bridge ingress push mark. + */ +- mapping = mapping_create_for_id(mapping_id, MAPPING_TYPE_TUNNEL_ENC_OPTS, +- sz_enc_opts, ENC_OPTS_BITS_MASK - 2, true); ++ mapping = mapping_create_for_id(mapping_id, id_len, ++ MAPPING_TYPE_TUNNEL_ENC_OPTS, ++ sz_enc_opts, ENC_OPTS_BITS_MASK - 2, ++ true); + if (IS_ERR(mapping)) { + err = PTR_ERR(mapping); + goto err_enc_opts_mapping; +@@ -5449,7 +5455,7 @@ int mlx5e_tc_esw_init(struct mlx5_rep_uplink_priv *uplink_priv) + + err = dev_get_port_parent_id(priv->netdev, &ppid, false); + if (!err) { +- memcpy(&attr.key.val, &ppid.id, sizeof(attr.key.val)); ++ memcpy(&attr.key.buf, &ppid.id, ppid.id_len); + attr.flags = MLX5_DEVCOM_MATCH_FLAGS_NS; + attr.net = mlx5_core_net(esw->dev); + mlx5_esw_offloads_devcom_init(esw, &attr); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c +index cf88a106d80d..89a58dee50b3 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c +@@ -7,11 +7,7 @@ + static void + mlx5_esw_get_port_parent_id(struct mlx5_core_dev *dev, struct netdev_phys_item_id *ppid) + { +- u64 parent_id; +- +- parent_id = mlx5_query_nic_system_image_guid(dev); +- ppid->id_len = sizeof(parent_id); +- memcpy(ppid->id, &parent_id, sizeof(parent_id)); ++ mlx5_query_nic_sw_system_image_guid(dev, ppid->id, &ppid->id_len); + } + + static bool mlx5_esw_devlink_port_supported(struct mlx5_eswitch *esw, u16 vport_num) +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +index 05270d5bc6df..8eb08d2276be 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +@@ -3556,10 +3556,11 
@@ bool mlx5_esw_offloads_controller_valid(const struct mlx5_eswitch *esw, u32 cont + + int esw_offloads_enable(struct mlx5_eswitch *esw) + { ++ u8 mapping_id[MLX5_SW_IMAGE_GUID_MAX_BYTES]; + struct mapping_ctx *reg_c0_obj_pool; + struct mlx5_vport *vport; + unsigned long i; +- u64 mapping_id; ++ u8 id_len; + int err; + + mutex_init(&esw->offloads.termtbl_mutex); +@@ -3581,9 +3582,10 @@ int esw_offloads_enable(struct mlx5_eswitch *esw) + if (err) + goto err_vport_metadata; + +- mapping_id = mlx5_query_nic_system_image_guid(esw->dev); ++ mlx5_query_nic_sw_system_image_guid(esw->dev, mapping_id, &id_len); + +- reg_c0_obj_pool = mapping_create_for_id(mapping_id, MAPPING_TYPE_CHAIN, ++ reg_c0_obj_pool = mapping_create_for_id(mapping_id, id_len, ++ MAPPING_TYPE_CHAIN, + sizeof(struct mlx5_mapped_obj), + ESW_REG_C0_USER_DATA_METADATA_MASK, + true); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c +index 3db0387bf6dc..1ac933cd8f02 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c +@@ -1418,10 +1418,12 @@ static void mlx5_lag_unregister_hca_devcom_comp(struct mlx5_core_dev *dev) + static int mlx5_lag_register_hca_devcom_comp(struct mlx5_core_dev *dev) + { + struct mlx5_devcom_match_attr attr = { +- .key.val = mlx5_query_nic_system_image_guid(dev), + .flags = MLX5_DEVCOM_MATCH_FLAGS_NS, + .net = mlx5_core_net(dev), + }; ++ u8 len __always_unused; ++ ++ mlx5_query_nic_sw_system_image_guid(dev, attr.key.buf, &len); + + /* This component is use to sync adding core_dev to lag_dev and to sync + * changes of mlx5_adev_devices between LAG layer and other layers. 
+diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h +index 609c85f47917..91e5ae529d5c 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h +@@ -10,8 +10,10 @@ enum mlx5_devom_match_flags { + MLX5_DEVCOM_MATCH_FLAGS_NS = BIT(0), + }; + ++#define MLX5_DEVCOM_MATCH_KEY_MAX 32 + union mlx5_devcom_match_key { + u64 val; ++ u8 buf[MLX5_DEVCOM_MATCH_KEY_MAX]; + }; + + struct mlx5_devcom_match_attr { +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h +index 082259b56816..acef7d0ffa09 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h +@@ -444,6 +444,8 @@ int mlx5_init_one_light(struct mlx5_core_dev *dev); + void mlx5_uninit_one_light(struct mlx5_core_dev *dev); + void mlx5_unload_one_light(struct mlx5_core_dev *dev); + ++void mlx5_query_nic_sw_system_image_guid(struct mlx5_core_dev *mdev, u8 *buf, ++ u8 *len); + int mlx5_vport_set_other_func_cap(struct mlx5_core_dev *dev, const void *hca_cap, u16 vport, + u16 opmod); + #define mlx5_vport_get_other_func_general_cap(dev, vport, out) \ +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c +index d1483f66cd0c..8f23d2e9d284 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c +@@ -1190,6 +1190,21 @@ u64 mlx5_query_nic_system_image_guid(struct mlx5_core_dev *mdev) + } + EXPORT_SYMBOL_GPL(mlx5_query_nic_system_image_guid); + ++void mlx5_query_nic_sw_system_image_guid(struct mlx5_core_dev *mdev, u8 *buf, ++ u8 *len) ++{ ++ u64 fw_system_image_guid; ++ ++ *len = 0; ++ ++ fw_system_image_guid = mlx5_query_nic_system_image_guid(mdev); ++ if (!fw_system_image_guid) ++ return; ++ ++ memcpy(buf, &fw_system_image_guid, 
sizeof(fw_system_image_guid)); ++ *len += sizeof(fw_system_image_guid); ++} ++ + static bool mlx5_vport_use_vhca_id_as_func_id(struct mlx5_core_dev *dev, + u16 vport_num, u16 *vhca_id) + { +diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h +index 5405ca1038f9..dcf262aa9ea6 100644 +--- a/include/linux/mlx5/driver.h ++++ b/include/linux/mlx5/driver.h +@@ -1379,4 +1379,7 @@ static inline struct net *mlx5_core_net(struct mlx5_core_dev *dev) + { + return devlink_net(priv_to_devlink(dev)); + } ++ ++#define MLX5_SW_IMAGE_GUID_MAX_BYTES 8 ++ + #endif /* MLX5_DRIVER_H */ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1599-net-mlx5-refactor-ptp-clock-devcom-pairing.patch b/SOURCES/1599-net-mlx5-refactor-ptp-clock-devcom-pairing.patch new file mode 100644 index 000000000..e75dda44b --- /dev/null +++ b/SOURCES/1599-net-mlx5-refactor-ptp-clock-devcom-pairing.patch @@ -0,0 +1,84 @@ +From deaf4963449dd436d854c56969edc939d10a0311 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:56:34 -0400 +Subject: [PATCH] net/mlx5: Refactor PTP clock devcom pairing + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit cd36818c34ac5ff7f6a50ce88822c7bbb5ac9e0d +Author: Mark Bloch +Date: Thu Oct 23 12:16:58 2025 +0300 + + net/mlx5: Refactor PTP clock devcom pairing + + Refactor PTP clock device component pairing to use the clock identity + buffer instead of casting it to a u64 key. This change leverages the new + software system image GUID infrastructure. + + Changes include: + - Pass identity buffer to mlx5_shared_clock_register(). + - Use memcpy for identity buffer in devcom matching attributes. + - Remove intermediate u64 key conversion. + - Add BUILD_BUG_ON to ensure identity size fits in match key. 
+ + Signed-off-by: Mark Bloch + Reviewed-by: Shay Drori + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1761211020-925651-4-git-send-email-tariqt@nvidia.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c +index 29e7fa09c32c..0ba0ef8bae42 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c +@@ -1432,15 +1432,17 @@ static int mlx5_clock_alloc(struct mlx5_core_dev *mdev, bool shared) + return 0; + } + +-static void mlx5_shared_clock_register(struct mlx5_core_dev *mdev, u64 key) ++static void mlx5_shared_clock_register(struct mlx5_core_dev *mdev, ++ u8 identity[MLX5_RT_CLOCK_IDENTITY_SIZE]) + { + struct mlx5_core_dev *peer_dev, *next = NULL; +- struct mlx5_devcom_match_attr attr = { +- .key.val = key, +- }; ++ struct mlx5_devcom_match_attr attr = {}; + struct mlx5_devcom_comp_dev *compd; + struct mlx5_devcom_comp_dev *pos; + ++ BUILD_BUG_ON(MLX5_RT_CLOCK_IDENTITY_SIZE > MLX5_DEVCOM_MATCH_KEY_MAX); ++ memcpy(attr.key.buf, identity, MLX5_RT_CLOCK_IDENTITY_SIZE); ++ + compd = mlx5_devcom_register_component(mdev->priv.devc, + MLX5_DEVCOM_SHARED_CLOCK, + &attr, NULL, mdev); +@@ -1594,7 +1596,6 @@ int mlx5_init_clock(struct mlx5_core_dev *mdev) + { + u8 identity[MLX5_RT_CLOCK_IDENTITY_SIZE]; + struct mlx5_clock_dev_state *clock_state; +- u64 key; + int err; + + if (!MLX5_CAP_GEN(mdev, device_frequency_khz)) { +@@ -1610,12 +1611,10 @@ int mlx5_init_clock(struct mlx5_core_dev *mdev) + mdev->clock_state = clock_state; + + if (MLX5_CAP_MCAM_REG3(mdev, mrtcq) && mlx5_real_time_mode(mdev)) { +- if (mlx5_clock_identity_get(mdev, identity)) { ++ if (mlx5_clock_identity_get(mdev, identity)) + mlx5_core_warn(mdev, "failed to get rt clock identity, create ptp dev per function\n"); +- } else { +- memcpy(&key, &identity, sizeof(key)); +- mlx5_shared_clock_register(mdev, 
key); +- } ++ else ++ mlx5_shared_clock_register(mdev, identity); + } + + if (!mdev->clock) { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1600-net-mlx5-refactor-hca-cap-2-setting.patch b/SOURCES/1600-net-mlx5-refactor-hca-cap-2-setting.patch new file mode 100644 index 000000000..0e4074296 --- /dev/null +++ b/SOURCES/1600-net-mlx5-refactor-hca-cap-2-setting.patch @@ -0,0 +1,78 @@ +From 91356ee2e31410460f681a78ec9a2dd33fa38c32 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:56:34 -0400 +Subject: [PATCH] net/mlx5: Refactor HCA cap 2 setting + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 075e85a1261e4653c2068e68a8c91da6c7bc4e60 +Author: Mark Bloch +Date: Thu Oct 23 12:16:59 2025 +0300 + + net/mlx5: Refactor HCA cap 2 setting + + Refactor HCA capability 2 setting logic to be more structured and + conditional. Move the sw_vhca_id_valid setting inside proper conditional + checks and prepare the function for additional capability settings. + + The refactoring: + - Always copy current capabilities to set_hca_cap buffer. + - Apply sw_vhca_id_valid setting only when conditions are met. + - Improve code readability and maintainability. + + This cleanup prepares the handle_hca_cap_2() function for the upcoming + balance ID capability setting. 
+ + Signed-off-by: Mark Bloch + Reviewed-by: Moshe Shemesh + Reviewed-by: Shay Drori + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1761211020-925651-5-git-send-email-tariqt@nvidia.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c +index 81930a461e62..4f2969a56ff2 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c +@@ -553,6 +553,7 @@ EXPORT_SYMBOL(mlx5_is_roce_on); + + static int handle_hca_cap_2(struct mlx5_core_dev *dev, void *set_ctx) + { ++ bool do_set = false; + void *set_hca_cap; + int err; + +@@ -563,17 +564,22 @@ static int handle_hca_cap_2(struct mlx5_core_dev *dev, void *set_ctx) + if (err) + return err; + +- if (!MLX5_CAP_GEN_2_MAX(dev, sw_vhca_id_valid) || +- !(dev->priv.sw_vhca_id > 0)) +- return 0; +- + set_hca_cap = MLX5_ADDR_OF(set_hca_cap_in, set_ctx, + capability); + memcpy(set_hca_cap, dev->caps.hca[MLX5_CAP_GENERAL_2]->cur, + MLX5_ST_SZ_BYTES(cmd_hca_cap_2)); +- MLX5_SET(cmd_hca_cap_2, set_hca_cap, sw_vhca_id_valid, 1); + +- return set_caps(dev, set_ctx, MLX5_CAP_GENERAL_2); ++ if (MLX5_CAP_GEN_2_MAX(dev, sw_vhca_id_valid) && ++ dev->priv.sw_vhca_id > 0) { ++ MLX5_SET(cmd_hca_cap_2, set_hca_cap, sw_vhca_id_valid, 1); ++ do_set = true; ++ } ++ ++ /* some FW versions that support querying MLX5_CAP_GENERAL_2 ++ * capabilities but don't support setting them. ++ * Skip unnecessary update to hca_cap_2 when no changes were introduced ++ */ ++ return do_set ? 
set_caps(dev, set_ctx, MLX5_CAP_GENERAL_2) : 0; + } + + static int handle_hca_cap(struct mlx5_core_dev *dev, void *set_ctx) +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1601-net-mlx5-add-balance-id-support-for-lag-multiplane-groups.patch b/SOURCES/1601-net-mlx5-add-balance-id-support-for-lag-multiplane-groups.patch new file mode 100644 index 000000000..80310021e --- /dev/null +++ b/SOURCES/1601-net-mlx5-add-balance-id-support-for-lag-multiplane-groups.patch @@ -0,0 +1,86 @@ +From 826353d76400033b9de3a11f625bd5d28db13e04 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:56:34 -0400 +Subject: [PATCH] net/mlx5: Add balance ID support for LAG multiplane groups + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 20d78ead947783b039b02ca4b8c551b4d1894759 +Author: Mark Bloch +Date: Thu Oct 23 12:17:00 2025 +0300 + + net/mlx5: Add balance ID support for LAG multiplane groups + + Implement balance ID support for multiplane LAG configurations. This + feature enables per-multiplane group load balancing by extending the + software system image GUID with a balance ID component. + + Key implementations: + - Enable lag_per_mp_group capability when supported by hardware. + - Append load_balance_id to software system image GUID when conditions + are met. + - Increase MLX5_SW_IMAGE_GUID_MAX_BYTES from 8 to 9 to accommodate the + extra byte. + + The balance ID is appended to the system image GUID only when both + load_balance_id and lag_per_mp_group capabilities are available, ensuring + backward compatibility while enabling enhanced LAG functionality. + + This enhancement allows for more granular load balancing control in complex + multi-plane LAG deployments, improving network performance and flexibility. 
+ + Signed-off-by: Mark Bloch + Reviewed-by: Moshe Shemesh + Reviewed-by: Shay Drori + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1761211020-925651-6-git-send-email-tariqt@nvidia.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c +index 4f2969a56ff2..b0d8d9888629 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c +@@ -575,6 +575,11 @@ static int handle_hca_cap_2(struct mlx5_core_dev *dev, void *set_ctx) + do_set = true; + } + ++ if (MLX5_CAP_GEN_2_MAX(dev, lag_per_mp_group)) { ++ MLX5_SET(cmd_hca_cap_2, set_hca_cap, lag_per_mp_group, 1); ++ do_set = true; ++ } ++ + /* some FW versions that support querying MLX5_CAP_GENERAL_2 + * capabilities but don't support setting them. + * Skip unnecessary update to hca_cap_2 when no changes were introduced +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c +index 8f23d2e9d284..306affbcfd3b 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c +@@ -1203,6 +1203,10 @@ void mlx5_query_nic_sw_system_image_guid(struct mlx5_core_dev *mdev, u8 *buf, + + memcpy(buf, &fw_system_image_guid, sizeof(fw_system_image_guid)); + *len += sizeof(fw_system_image_guid); ++ ++ if (MLX5_CAP_GEN_2(mdev, load_balance_id) && ++ MLX5_CAP_GEN_2(mdev, lag_per_mp_group)) ++ buf[(*len)++] = MLX5_CAP_GEN_2(mdev, load_balance_id); + } + + static bool mlx5_vport_use_vhca_id_as_func_id(struct mlx5_core_dev *dev, +diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h +index dcf262aa9ea6..046396269ccf 100644 +--- a/include/linux/mlx5/driver.h ++++ b/include/linux/mlx5/driver.h +@@ -1380,6 +1380,6 @@ static inline struct net *mlx5_core_net(struct mlx5_core_dev *dev) + return devlink_net(priv_to_devlink(dev)); + } + 
+-#define MLX5_SW_IMAGE_GUID_MAX_BYTES 8 ++#define MLX5_SW_IMAGE_GUID_MAX_BYTES 9 + + #endif /* MLX5_DRIVER_H */ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1602-net-mlx5e-remove-redundant-tstamp-pointer-from-channel-struc.patch b/SOURCES/1602-net-mlx5e-remove-redundant-tstamp-pointer-from-channel-struc.patch new file mode 100644 index 000000000..8020111f8 --- /dev/null +++ b/SOURCES/1602-net-mlx5e-remove-redundant-tstamp-pointer-from-channel-struc.patch @@ -0,0 +1,139 @@ +From 7af23ae57d22fb171661939465f9532c0b0efdaf Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:56:34 -0400 +Subject: [PATCH] net/mlx5e: Remove redundant tstamp pointer from channel + structures + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 7ea4376b3972d89385599307d1ad4f20eb763a05 +Author: Carolina Jubran +Date: Thu Oct 30 12:25:05 2025 +0200 + + net/mlx5e: Remove redundant tstamp pointer from channel structures + + Remove the tstamp pointer field from mlx5e_channel, mlx5e_ptp, and + mlx5e_trap structures, since it was only used to reference the tstamp + field in the priv structure. Instead, directly use the tstamp field + from priv when initializing RQ structures. + + Also remove the unused hwtstamp_config field from mlx5_clock structure + as part of the cleanup. 
+ + Signed-off-by: Carolina Jubran + Reviewed-by: Cosmin Ratiu + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1761819910-1011051-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h +index 964df2d545b0..32598ea81e40 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h +@@ -784,7 +784,6 @@ struct mlx5e_channel { + /* control */ + struct mlx5e_priv *priv; + struct mlx5_core_dev *mdev; +- struct hwtstamp_config *tstamp; + DECLARE_BITMAP(state, MLX5E_CHANNEL_NUM_STATES); + int ix; + int vec_ix; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c +index d1e0f974b8a3..7685494cc57b 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c +@@ -881,7 +881,6 @@ int mlx5e_ptp_open(struct mlx5e_priv *priv, struct mlx5e_params *params, + + c->priv = priv; + c->mdev = priv->mdev; +- c->tstamp = &priv->tstamp; + c->pdev = mlx5_core_dma_dev(priv->mdev); + c->netdev = priv->netdev; + c->mkey_be = cpu_to_be32(priv->mdev->mlx5e_res.hw_objs.mkey); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h +index 1b3c9648220b..1c0e0a86a9ac 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h +@@ -64,7 +64,6 @@ struct mlx5e_ptp { + /* control */ + struct mlx5e_priv *priv; + struct mlx5_core_dev *mdev; +- struct hwtstamp_config *tstamp; + DECLARE_BITMAP(state, MLX5E_PTP_STATE_NUM_STATES); + struct mlx5_sq_bfreg *bfreg; + }; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c b/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c +index 5099a1c47f4f..6d0a0d6e8d5f 100644 +--- 
a/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c +@@ -144,7 +144,6 @@ static struct mlx5e_trap *mlx5e_open_trap(struct mlx5e_priv *priv) + + t->priv = priv; + t->mdev = priv->mdev; +- t->tstamp = &priv->tstamp; + t->pdev = mlx5_core_dma_dev(priv->mdev); + t->netdev = priv->netdev; + t->mkey_be = cpu_to_be32(priv->mdev->mlx5e_res.hw_objs.mkey); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/trap.h b/drivers/net/ethernet/mellanox/mlx5/core/en/trap.h +index aa3f17658c6d..394e917ea2b0 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/trap.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/trap.h +@@ -22,7 +22,6 @@ struct mlx5e_trap { + /* control */ + struct mlx5e_priv *priv; + struct mlx5_core_dev *mdev; +- struct hwtstamp_config *tstamp; + DECLARE_BITMAP(state, MLX5E_CHANNEL_NUM_STATES); + + struct mlx5e_params params; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c +index dbd88eb5c082..dc5a4afa4974 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c +@@ -71,7 +71,7 @@ static int mlx5e_init_xsk_rq(struct mlx5e_channel *c, + rq->pdev = c->pdev; + rq->netdev = c->netdev; + rq->priv = c->priv; +- rq->tstamp = c->tstamp; ++ rq->tstamp = &c->priv->tstamp; + rq->clock = mdev->clock; + rq->icosq = &c->icosq; + rq->ix = c->ix; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index ef655b8abc96..4b2407b38e7b 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -734,7 +734,7 @@ static int mlx5e_init_rxq_rq(struct mlx5e_channel *c, struct mlx5e_params *param + rq->pdev = c->pdev; + rq->netdev = c->netdev; + rq->priv = c->priv; +- rq->tstamp = c->tstamp; ++ rq->tstamp = &c->priv->tstamp; + rq->clock = 
mdev->clock; + rq->icosq = &c->icosq; + rq->ix = c->ix; +@@ -2788,7 +2788,6 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix, + + c->priv = priv; + c->mdev = mdev; +- c->tstamp = &priv->tstamp; + c->ix = ix; + c->vec_ix = vec_ix; + c->sd_ix = mlx5_sd_ch_ix_get_dev_ix(mdev, ix); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.h +index c18a652c0faa..aff3aed62c74 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.h +@@ -54,7 +54,6 @@ struct mlx5_timer { + + struct mlx5_clock { + seqlock_t lock; +- struct hwtstamp_config hwtstamp_config; + struct ptp_clock *ptp; + struct ptp_clock_info ptp_info; + struct mlx5_pps pps_info; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1603-net-mlx5e-remove-unnecessary-tstamp-local-variable-in-mlx5i-.patch b/SOURCES/1603-net-mlx5e-remove-unnecessary-tstamp-local-variable-in-mlx5i-.patch new file mode 100644 index 000000000..50caee578 --- /dev/null +++ b/SOURCES/1603-net-mlx5e-remove-unnecessary-tstamp-local-variable-in-mlx5i-.patch @@ -0,0 +1,58 @@ +From bca2504af0dcf4f1fe144801c35aadc2d8a92410 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:56:34 -0400 +Subject: [PATCH] net/mlx5e: Remove unnecessary tstamp local variable in + mlx5i_complete_rx_cqe + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit bf791659743b1a8e20f5810b1ac893b7b24f650e +Author: Carolina Jubran +Date: Thu Oct 30 12:25:06 2025 +0200 + + net/mlx5e: Remove unnecessary tstamp local variable in mlx5i_complete_rx_cqe + + Remove the tstamp local variable in mlx5i_complete_rx_cqe() and directly + pass the tstamp field from priv to mlx5e_rx_hw_stamp(). The local variable + was only used once and provided no additional value. 
+ + Signed-off-by: Carolina Jubran + Reviewed-by: Cosmin Ratiu + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1761819910-1011051-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +index 21be5dcf47d5..ed1fb4096271 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +@@ -2605,7 +2605,6 @@ static inline void mlx5i_complete_rx_cqe(struct mlx5e_rq *rq, + u32 cqe_bcnt, + struct sk_buff *skb) + { +- struct hwtstamp_config *tstamp; + struct mlx5e_rq_stats *stats; + struct net_device *netdev; + struct mlx5e_priv *priv; +@@ -2629,7 +2628,6 @@ static inline void mlx5i_complete_rx_cqe(struct mlx5e_rq *rq, + } + + priv = mlx5i_epriv(netdev); +- tstamp = &priv->tstamp; + stats = &priv->channel_stats[rq->ix]->rq; + + flags_rqpn = be32_to_cpu(cqe->flags_rqpn); +@@ -2665,7 +2663,7 @@ static inline void mlx5i_complete_rx_cqe(struct mlx5e_rq *rq, + stats->csum_none++; + } + +- if (unlikely(mlx5e_rx_hw_stamp(tstamp))) ++ if (unlikely(mlx5e_rx_hw_stamp(&priv->tstamp))) + skb_hwtstamps(skb)->hwtstamp = mlx5e_cqe_ts_to_ns(rq->ptp_cyc2time, + rq->clock, get_cqe_ts(cqe)); + skb_record_rx_queue(skb, rq->ix); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1604-net-mlx5e-rename-hwstamp-functions-to-hwtstamp.patch b/SOURCES/1604-net-mlx5e-rename-hwstamp-functions-to-hwtstamp.patch new file mode 100644 index 000000000..24eda6e68 --- /dev/null +++ b/SOURCES/1604-net-mlx5e-rename-hwstamp-functions-to-hwtstamp.patch @@ -0,0 +1,93 @@ +From 2698fb06119fde802d822433049c18d489b6d0be Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:56:34 -0400 +Subject: [PATCH] net/mlx5e: Rename hwstamp functions to hwtstamp + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit fee182371a59414d43633a5ea6f1cda160418a16 +Author: Carolina Jubran 
+Date: Thu Oct 30 12:25:07 2025 +0200 + + net/mlx5e: Rename hwstamp functions to hwtstamp + + Rename mlx5e_hwstamp_set/get() functions to mlx5e_hwtstamp_set/get() + to better reflect that these functions handle hardware timestamping, + not just hardware stamping. + + Signed-off-by: Carolina Jubran + Reviewed-by: Cosmin Ratiu + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1761819910-1011051-4-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h +index 32598ea81e40..f042de60688d 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h +@@ -1022,8 +1022,8 @@ void mlx5e_self_test(struct net_device *ndev, struct ethtool_test *etest, + u64 *buf); + void mlx5e_set_rx_mode_work(struct work_struct *work); + +-int mlx5e_hwstamp_set(struct mlx5e_priv *priv, struct ifreq *ifr); +-int mlx5e_hwstamp_get(struct mlx5e_priv *priv, struct ifreq *ifr); ++int mlx5e_hwtstamp_set(struct mlx5e_priv *priv, struct ifreq *ifr); ++int mlx5e_hwtstamp_get(struct mlx5e_priv *priv, struct ifreq *ifr); + int mlx5e_modify_rx_cqe_compression_locked(struct mlx5e_priv *priv, bool val, bool rx_filter); + + int mlx5e_vlan_rx_add_vid(struct net_device *dev, __always_unused __be16 proto, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index 4b2407b38e7b..9a6d82b5ccc7 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -4725,7 +4725,7 @@ static int mlx5e_hwstamp_config_ptp_rx(struct mlx5e_priv *priv, bool ptp_rx) + &new_params.ptp_rx, true); + } + +-int mlx5e_hwstamp_set(struct mlx5e_priv *priv, struct ifreq *ifr) ++int mlx5e_hwtstamp_set(struct mlx5e_priv *priv, struct ifreq *ifr) + { + struct hwtstamp_config config; + bool rx_cqe_compress_def; +@@ 
-4803,7 +4803,7 @@ int mlx5e_hwstamp_set(struct mlx5e_priv *priv, struct ifreq *ifr) + return err; + } + +-int mlx5e_hwstamp_get(struct mlx5e_priv *priv, struct ifreq *ifr) ++int mlx5e_hwtstamp_get(struct mlx5e_priv *priv, struct ifreq *ifr) + { + struct hwtstamp_config *cfg = &priv->tstamp; + +@@ -4819,9 +4819,9 @@ static int mlx5e_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd) + + switch (cmd) { + case SIOCSHWTSTAMP: +- return mlx5e_hwstamp_set(priv, ifr); ++ return mlx5e_hwtstamp_set(priv, ifr); + case SIOCGHWTSTAMP: +- return mlx5e_hwstamp_get(priv, ifr); ++ return mlx5e_hwtstamp_get(priv, ifr); + default: + return -EOPNOTSUPP; + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c +index 0979d672d47f..f3a249b59482 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c +@@ -560,9 +560,9 @@ int mlx5i_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd) + + switch (cmd) { + case SIOCSHWTSTAMP: +- return mlx5e_hwstamp_set(priv, ifr); ++ return mlx5e_hwtstamp_set(priv, ifr); + case SIOCGHWTSTAMP: +- return mlx5e_hwstamp_get(priv, ifr); ++ return mlx5e_hwtstamp_get(priv, ifr); + default: + return -EOPNOTSUPP; + } +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1605-net-mlx5e-rename-timestamp-fields-to-hwtstamp-config.patch b/SOURCES/1605-net-mlx5e-rename-timestamp-fields-to-hwtstamp-config.patch new file mode 100644 index 000000000..e87bf9951 --- /dev/null +++ b/SOURCES/1605-net-mlx5e-rename-timestamp-fields-to-hwtstamp-config.patch @@ -0,0 +1,194 @@ +From e9557d650bf01676fa2ad210556ab69f3d2f552f Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:56:34 -0400 +Subject: [PATCH] net/mlx5e: Rename timestamp fields to hwtstamp_config + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 91baaf96f5d0b764aec462dd50a8433f5c8d621f +Author: Carolina Jubran +Date: Thu Oct 30 
12:25:08 2025 +0200 + + net/mlx5e: Rename timestamp fields to hwtstamp_config + + Rename hardware timestamp-related fields from 'tstamp' to + 'hwtstamp_config' throughout the MLX5 driver. The new name is more + descriptive as it clearly indicates these fields contain hardware + timestamp configuration. + + Signed-off-by: Carolina Jubran + Reviewed-by: Cosmin Ratiu + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1761819910-1011051-5-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h +index f042de60688d..a853ec4e529a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h +@@ -696,7 +696,7 @@ struct mlx5e_rq { + struct mlx5e_rq_stats *stats; + struct mlx5e_cq cq; + struct mlx5e_cq_decomp cqd; +- struct hwtstamp_config *tstamp; ++ struct hwtstamp_config *hwtstamp_config; + struct mlx5_clock *clock; + struct mlx5e_icosq *icosq; + struct mlx5e_priv *priv; +@@ -917,7 +917,7 @@ struct mlx5e_priv { + u8 max_opened_tc; + bool tx_ptp_opened; + bool rx_ptp_opened; +- struct hwtstamp_config tstamp; ++ struct hwtstamp_config hwtstamp_config; + u16 q_counter[MLX5_SD_MAX_GROUP_SZ]; + u16 drop_rq_q_counter; + struct notifier_block events_nb; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c +index 7685494cc57b..92b57e3aaa85 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c +@@ -698,7 +698,7 @@ static int mlx5e_init_ptp_rq(struct mlx5e_ptp *c, struct mlx5e_params *params, + rq->netdev = priv->netdev; + rq->priv = priv; + rq->clock = mdev->clock; +- rq->tstamp = &priv->tstamp; ++ rq->hwtstamp_config = &priv->hwtstamp_config; + rq->mdev = mdev; + rq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu); + rq->stats = &c->priv->ptp_stats.rq; 
+diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c +index b1415992ffa2..0686fbdd5a05 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c +@@ -318,7 +318,8 @@ mlx5e_rx_reporter_diagnose_common_ptp_config(struct mlx5e_priv *priv, struct mlx + struct devlink_fmsg *fmsg) + { + mlx5e_health_fmsg_named_obj_nest_start(fmsg, "PTP"); +- devlink_fmsg_u32_pair_put(fmsg, "filter_type", priv->tstamp.rx_filter); ++ devlink_fmsg_u32_pair_put(fmsg, "filter_type", ++ priv->hwtstamp_config.rx_filter); + mlx5e_rx_reporter_diagnose_generic_rq(&ptp_ch->rq, fmsg); + mlx5e_health_fmsg_named_obj_nest_end(fmsg); + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c b/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c +index 6d0a0d6e8d5f..1b1c89014b70 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c +@@ -47,7 +47,7 @@ static void mlx5e_init_trap_rq(struct mlx5e_trap *t, struct mlx5e_params *params + rq->netdev = priv->netdev; + rq->priv = priv; + rq->clock = mdev->clock; +- rq->tstamp = &priv->tstamp; ++ rq->hwtstamp_config = &priv->hwtstamp_config; + rq->mdev = mdev; + rq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu); + rq->stats = &priv->trap_stats.rq; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c +index 1f9d012231d8..027c55187378 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c +@@ -178,7 +178,7 @@ static int mlx5e_xdp_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp) + { + const struct mlx5e_xdp_buff *_ctx = (void *)ctx; + +- if (unlikely(!mlx5e_rx_hw_stamp(_ctx->rq->tstamp))) ++ if (unlikely(!mlx5e_rx_hw_stamp(_ctx->rq->hwtstamp_config))) + return -ENODATA; + + *timestamp = 
mlx5e_cqe_ts_to_ns(_ctx->rq->ptp_cyc2time, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c +index dc5a4afa4974..5981c71cae2d 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c +@@ -71,7 +71,7 @@ static int mlx5e_init_xsk_rq(struct mlx5e_channel *c, + rq->pdev = c->pdev; + rq->netdev = c->netdev; + rq->priv = c->priv; +- rq->tstamp = &c->priv->tstamp; ++ rq->hwtstamp_config = &c->priv->hwtstamp_config; + rq->clock = mdev->clock; + rq->icosq = &c->icosq; + rq->ix = c->ix; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +index e7c9f22ac1fc..5a0f5589b894 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +@@ -2275,7 +2275,7 @@ static int set_pflag_rx_cqe_compress(struct net_device *netdev, + if (!MLX5_CAP_GEN(mdev, cqe_compression)) + return -EOPNOTSUPP; + +- rx_filter = priv->tstamp.rx_filter != HWTSTAMP_FILTER_NONE; ++ rx_filter = priv->hwtstamp_config.rx_filter != HWTSTAMP_FILTER_NONE; + err = mlx5e_modify_rx_cqe_compression_locked(priv, enable, rx_filter); + if (err) + return err; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index 9a6d82b5ccc7..2fb121be52d8 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -734,7 +734,7 @@ static int mlx5e_init_rxq_rq(struct mlx5e_channel *c, struct mlx5e_params *param + rq->pdev = c->pdev; + rq->netdev = c->netdev; + rq->priv = c->priv; +- rq->tstamp = &c->priv->tstamp; ++ rq->hwtstamp_config = &c->priv->hwtstamp_config; + rq->clock = mdev->clock; + rq->icosq = &c->icosq; + rq->ix = c->ix; +@@ -3429,8 +3429,8 @@ int mlx5e_safe_reopen_channels(struct mlx5e_priv *priv) + + void 
mlx5e_timestamp_init(struct mlx5e_priv *priv) + { +- priv->tstamp.tx_type = HWTSTAMP_TX_OFF; +- priv->tstamp.rx_filter = HWTSTAMP_FILTER_NONE; ++ priv->hwtstamp_config.tx_type = HWTSTAMP_TX_OFF; ++ priv->hwtstamp_config.rx_filter = HWTSTAMP_FILTER_NONE; + } + + static void mlx5e_modify_admin_state(struct mlx5_core_dev *mdev, +@@ -4790,7 +4790,7 @@ int mlx5e_hwtstamp_set(struct mlx5e_priv *priv, struct ifreq *ifr) + if (err) + goto err_unlock; + +- memcpy(&priv->tstamp, &config, sizeof(config)); ++ memcpy(&priv->hwtstamp_config, &config, sizeof(config)); + mutex_unlock(&priv->state_lock); + + /* might need to fix some features */ +@@ -4805,7 +4805,7 @@ int mlx5e_hwtstamp_set(struct mlx5e_priv *priv, struct ifreq *ifr) + + int mlx5e_hwtstamp_get(struct mlx5e_priv *priv, struct ifreq *ifr) + { +- struct hwtstamp_config *cfg = &priv->tstamp; ++ struct hwtstamp_config *cfg = &priv->hwtstamp_config; + + if (!MLX5_CAP_GEN(priv->mdev, device_frequency_khz)) + return -EOPNOTSUPP; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +index ed1fb4096271..0afdc68896c3 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +@@ -1582,7 +1582,7 @@ static inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe, + stats->lro_bytes += cqe_bcnt; + } + +- if (unlikely(mlx5e_rx_hw_stamp(rq->tstamp))) ++ if (unlikely(mlx5e_rx_hw_stamp(rq->hwtstamp_config))) + skb_hwtstamps(skb)->hwtstamp = mlx5e_cqe_ts_to_ns(rq->ptp_cyc2time, + rq->clock, get_cqe_ts(cqe)); + skb_record_rx_queue(skb, rq->ix); +@@ -2663,7 +2663,7 @@ static inline void mlx5i_complete_rx_cqe(struct mlx5e_rq *rq, + stats->csum_none++; + } + +- if (unlikely(mlx5e_rx_hw_stamp(&priv->tstamp))) ++ if (unlikely(mlx5e_rx_hw_stamp(&priv->hwtstamp_config))) + skb_hwtstamps(skb)->hwtstamp = mlx5e_cqe_ts_to_ns(rq->ptp_cyc2time, + rq->clock, get_cqe_ts(cqe)); + skb_record_rx_queue(skb, rq->ix); +-- +2.50.1 (Apple 
Git-155) + diff --git a/SOURCES/1606-net-mlx5e-convert-to-new-hwtstamp-get-set-interface.patch b/SOURCES/1606-net-mlx5e-convert-to-new-hwtstamp-get-set-interface.patch new file mode 100644 index 000000000..25cb07f09 --- /dev/null +++ b/SOURCES/1606-net-mlx5e-convert-to-new-hwtstamp-get-set-interface.patch @@ -0,0 +1,322 @@ +From 3b75686a21a3077a62a49db325f7265cd9f56376 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:56:34 -0400 +Subject: [PATCH] net/mlx5e: Convert to new hwtstamp_get/set interface + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 1c7fe48a90158486a66665f1401d0b90bd390ef0 +Author: Carolina Jubran +Date: Thu Oct 30 12:25:10 2025 +0200 + + net/mlx5e: Convert to new hwtstamp_get/set interface + + Migrate from the legacy ioctl hardware timestamping interface to the + ndo_hwtstamp_get/set operations. + + Signed-off-by: Carolina Jubran + Reviewed-by: Cosmin Ratiu + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1761819910-1011051-7-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h +index a853ec4e529a..208870fe79c1 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h +@@ -696,7 +696,7 @@ struct mlx5e_rq { + struct mlx5e_rq_stats *stats; + struct mlx5e_cq cq; + struct mlx5e_cq_decomp cqd; +- struct hwtstamp_config *hwtstamp_config; ++ struct kernel_hwtstamp_config *hwtstamp_config; + struct mlx5_clock *clock; + struct mlx5e_icosq *icosq; + struct mlx5e_priv *priv; +@@ -917,7 +917,7 @@ struct mlx5e_priv { + u8 max_opened_tc; + bool tx_ptp_opened; + bool rx_ptp_opened; +- struct hwtstamp_config hwtstamp_config; ++ struct kernel_hwtstamp_config hwtstamp_config; + u16 q_counter[MLX5_SD_MAX_GROUP_SZ]; + u16 drop_rq_q_counter; + struct notifier_block events_nb; +@@ -1022,8 +1022,11 @@ void 
mlx5e_self_test(struct net_device *ndev, struct ethtool_test *etest, + u64 *buf); + void mlx5e_set_rx_mode_work(struct work_struct *work); + +-int mlx5e_hwtstamp_set(struct mlx5e_priv *priv, struct ifreq *ifr); +-int mlx5e_hwtstamp_get(struct mlx5e_priv *priv, struct ifreq *ifr); ++int mlx5e_hwtstamp_set(struct mlx5e_priv *priv, ++ struct kernel_hwtstamp_config *config, ++ struct netlink_ext_ack *extack); ++int mlx5e_hwtstamp_get(struct mlx5e_priv *priv, ++ struct kernel_hwtstamp_config *config); + int mlx5e_modify_rx_cqe_compression_locked(struct mlx5e_priv *priv, bool val, bool rx_filter); + + int mlx5e_vlan_rx_add_vid(struct net_device *dev, __always_unused __be16 proto, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h +index 8189d5e1ef49..07945e182b4f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h +@@ -92,7 +92,7 @@ int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget); + void mlx5e_free_rx_descs(struct mlx5e_rq *rq); + void mlx5e_free_rx_missing_descs(struct mlx5e_rq *rq); + +-static inline bool mlx5e_rx_hw_stamp(struct hwtstamp_config *config) ++static inline bool mlx5e_rx_hw_stamp(struct kernel_hwtstamp_config *config) + { + return config->rx_filter == HWTSTAMP_FILTER_ALL; + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index 2fb121be52d8..2390975c1252 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -4725,22 +4725,23 @@ static int mlx5e_hwstamp_config_ptp_rx(struct mlx5e_priv *priv, bool ptp_rx) + &new_params.ptp_rx, true); + } + +-int mlx5e_hwtstamp_set(struct mlx5e_priv *priv, struct ifreq *ifr) ++int mlx5e_hwtstamp_set(struct mlx5e_priv *priv, ++ struct kernel_hwtstamp_config *config, ++ struct netlink_ext_ack *extack) + { +- struct hwtstamp_config config; + bool 
rx_cqe_compress_def; + bool ptp_rx; + int err; + + if (!MLX5_CAP_GEN(priv->mdev, device_frequency_khz) || +- (mlx5_clock_get_ptp_index(priv->mdev) == -1)) ++ (mlx5_clock_get_ptp_index(priv->mdev) == -1)) { ++ NL_SET_ERR_MSG_MOD(extack, ++ "Timestamps are not supported on this device"); + return -EOPNOTSUPP; +- +- if (copy_from_user(&config, ifr->ifr_data, sizeof(config))) +- return -EFAULT; ++ } + + /* TX HW timestamp */ +- switch (config.tx_type) { ++ switch (config->tx_type) { + case HWTSTAMP_TX_OFF: + case HWTSTAMP_TX_ON: + break; +@@ -4752,7 +4753,7 @@ int mlx5e_hwtstamp_set(struct mlx5e_priv *priv, struct ifreq *ifr) + rx_cqe_compress_def = priv->channels.params.rx_cqe_compress_def; + + /* RX HW timestamp */ +- switch (config.rx_filter) { ++ switch (config->rx_filter) { + case HWTSTAMP_FILTER_NONE: + ptp_rx = false; + break; +@@ -4771,7 +4772,7 @@ int mlx5e_hwtstamp_set(struct mlx5e_priv *priv, struct ifreq *ifr) + case HWTSTAMP_FILTER_PTP_V2_SYNC: + case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ: + case HWTSTAMP_FILTER_NTP_ALL: +- config.rx_filter = HWTSTAMP_FILTER_ALL; ++ config->rx_filter = HWTSTAMP_FILTER_ALL; + /* ptp_rx is set if both HW TS is set and CQE + * compression is set + */ +@@ -4784,47 +4785,50 @@ int mlx5e_hwtstamp_set(struct mlx5e_priv *priv, struct ifreq *ifr) + + if (!mlx5e_profile_feature_cap(priv->profile, PTP_RX)) + err = mlx5e_hwstamp_config_no_ptp_rx(priv, +- config.rx_filter != HWTSTAMP_FILTER_NONE); ++ config->rx_filter != HWTSTAMP_FILTER_NONE); + else + err = mlx5e_hwstamp_config_ptp_rx(priv, ptp_rx); + if (err) + goto err_unlock; + +- memcpy(&priv->hwtstamp_config, &config, sizeof(config)); ++ priv->hwtstamp_config = *config; + mutex_unlock(&priv->state_lock); + + /* might need to fix some features */ + netdev_update_features(priv->netdev); + +- return copy_to_user(ifr->ifr_data, &config, +- sizeof(config)) ? 
-EFAULT : 0; ++ return 0; + err_unlock: + mutex_unlock(&priv->state_lock); + return err; + } + +-int mlx5e_hwtstamp_get(struct mlx5e_priv *priv, struct ifreq *ifr) ++static int mlx5e_hwtstamp_set_ndo(struct net_device *netdev, ++ struct kernel_hwtstamp_config *config, ++ struct netlink_ext_ack *extack) + { +- struct hwtstamp_config *cfg = &priv->hwtstamp_config; ++ struct mlx5e_priv *priv = netdev_priv(netdev); ++ ++ return mlx5e_hwtstamp_set(priv, config, extack); ++} + ++int mlx5e_hwtstamp_get(struct mlx5e_priv *priv, ++ struct kernel_hwtstamp_config *config) ++{ + if (!MLX5_CAP_GEN(priv->mdev, device_frequency_khz)) + return -EOPNOTSUPP; + +- return copy_to_user(ifr->ifr_data, cfg, sizeof(*cfg)) ? -EFAULT : 0; ++ *config = priv->hwtstamp_config; ++ ++ return 0; + } + +-static int mlx5e_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd) ++static int mlx5e_hwtstamp_get_ndo(struct net_device *dev, ++ struct kernel_hwtstamp_config *config) + { + struct mlx5e_priv *priv = netdev_priv(dev); + +- switch (cmd) { +- case SIOCSHWTSTAMP: +- return mlx5e_hwtstamp_set(priv, ifr); +- case SIOCGHWTSTAMP: +- return mlx5e_hwtstamp_get(priv, ifr); +- default: +- return -EOPNOTSUPP; +- } ++ return mlx5e_hwtstamp_get(priv, config); + } + + #ifdef CONFIG_MLX5_ESWITCH +@@ -5268,13 +5272,14 @@ const struct net_device_ops mlx5e_netdev_ops = { + .ndo_set_features = mlx5e_set_features, + .ndo_fix_features = mlx5e_fix_features, + .ndo_change_mtu = mlx5e_change_nic_mtu, +- .ndo_eth_ioctl = mlx5e_ioctl, + .ndo_set_tx_maxrate = mlx5e_set_tx_maxrate, + .ndo_features_check = mlx5e_features_check, + .ndo_tx_timeout = mlx5e_tx_timeout, + .ndo_bpf = mlx5e_xdp, + .ndo_xdp_xmit = mlx5e_xdp_xmit, + .ndo_xsk_wakeup = mlx5e_xsk_wakeup, ++ .ndo_hwtstamp_get = mlx5e_hwtstamp_get_ndo, ++ .ndo_hwtstamp_set = mlx5e_hwtstamp_set_ndo, + #ifdef CONFIG_MLX5_EN_ARFS + .ndo_rx_flow_steer = mlx5e_rx_flow_steer, + #endif +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c +index f3a249b59482..dd8daf5af7a6 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c +@@ -44,6 +44,23 @@ static int mlx5i_open(struct net_device *netdev); + static int mlx5i_close(struct net_device *netdev); + static int mlx5i_change_mtu(struct net_device *netdev, int new_mtu); + ++int mlx5i_hwtstamp_set(struct net_device *dev, ++ struct kernel_hwtstamp_config *config, ++ struct netlink_ext_ack *extack) ++{ ++ struct mlx5e_priv *epriv = mlx5i_epriv(dev); ++ ++ return mlx5e_hwtstamp_set(epriv, config, extack); ++} ++ ++int mlx5i_hwtstamp_get(struct net_device *dev, ++ struct kernel_hwtstamp_config *config) ++{ ++ struct mlx5e_priv *epriv = mlx5i_epriv(dev); ++ ++ return mlx5e_hwtstamp_get(epriv, config); ++} ++ + static const struct net_device_ops mlx5i_netdev_ops = { + .ndo_open = mlx5i_open, + .ndo_stop = mlx5i_close, +@@ -51,7 +68,8 @@ static const struct net_device_ops mlx5i_netdev_ops = { + .ndo_init = mlx5i_dev_init, + .ndo_uninit = mlx5i_dev_cleanup, + .ndo_change_mtu = mlx5i_change_mtu, +- .ndo_eth_ioctl = mlx5i_ioctl, ++ .ndo_hwtstamp_get = mlx5i_hwtstamp_get, ++ .ndo_hwtstamp_set = mlx5i_hwtstamp_set, + }; + + /* IPoIB mlx5 netdev profile */ +@@ -554,20 +572,6 @@ int mlx5i_dev_init(struct net_device *dev) + return 0; + } + +-int mlx5i_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd) +-{ +- struct mlx5e_priv *priv = mlx5i_epriv(dev); +- +- switch (cmd) { +- case SIOCSHWTSTAMP: +- return mlx5e_hwtstamp_set(priv, ifr); +- case SIOCGHWTSTAMP: +- return mlx5e_hwtstamp_get(priv, ifr); +- default: +- return -EOPNOTSUPP; +- } +-} +- + void mlx5i_dev_cleanup(struct net_device *dev) + { + struct mlx5e_priv *priv = mlx5i_epriv(dev); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h +index 2ab6437a1c49..d67d5a72bb41 100644 +--- 
a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h +@@ -88,7 +88,11 @@ struct net_device *mlx5i_pkey_get_netdev(struct net_device *netdev, u32 qpn); + /* Shared ndo functions */ + int mlx5i_dev_init(struct net_device *dev); + void mlx5i_dev_cleanup(struct net_device *dev); +-int mlx5i_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd); ++int mlx5i_hwtstamp_set(struct net_device *dev, ++ struct kernel_hwtstamp_config *config, ++ struct netlink_ext_ack *extack); ++int mlx5i_hwtstamp_get(struct net_device *dev, ++ struct kernel_hwtstamp_config *config); + + /* Parent profile functions */ + int mlx5i_init(struct mlx5_core_dev *mdev, struct net_device *netdev); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib_vlan.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib_vlan.c +index 028a76944d82..04444dad3a0d 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib_vlan.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib_vlan.c +@@ -140,7 +140,6 @@ static int mlx5i_pkey_close(struct net_device *netdev); + static int mlx5i_pkey_dev_init(struct net_device *dev); + static void mlx5i_pkey_dev_cleanup(struct net_device *netdev); + static int mlx5i_pkey_change_mtu(struct net_device *netdev, int new_mtu); +-static int mlx5i_pkey_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd); + + static const struct net_device_ops mlx5i_pkey_netdev_ops = { + .ndo_open = mlx5i_pkey_open, +@@ -149,7 +148,8 @@ static const struct net_device_ops mlx5i_pkey_netdev_ops = { + .ndo_get_stats64 = mlx5i_get_stats, + .ndo_uninit = mlx5i_pkey_dev_cleanup, + .ndo_change_mtu = mlx5i_pkey_change_mtu, +- .ndo_eth_ioctl = mlx5i_pkey_ioctl, ++ .ndo_hwtstamp_get = mlx5i_hwtstamp_get, ++ .ndo_hwtstamp_set = mlx5i_hwtstamp_set, + }; + + /* Child NDOs */ +@@ -184,11 +184,6 @@ static int mlx5i_pkey_dev_init(struct net_device *dev) + return mlx5i_dev_init(dev); + } + +-static int 
mlx5i_pkey_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd) +-{ +- return mlx5i_ioctl(dev, ifr, cmd); +-} +- + static void mlx5i_pkey_dev_cleanup(struct net_device *netdev) + { + mlx5i_parent_put(netdev); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1607-net-mlx5e-enhance-function-structures-for-self-loopback-prev.patch b/SOURCES/1607-net-mlx5e-enhance-function-structures-for-self-loopback-prev.patch new file mode 100644 index 000000000..2d9e52e8d --- /dev/null +++ b/SOURCES/1607-net-mlx5e-enhance-function-structures-for-self-loopback-prev.patch @@ -0,0 +1,136 @@ +From 3c2408d24b13ce27a3265d9cf94bac7817af07f0 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:56:34 -0400 +Subject: [PATCH] net/mlx5e: Enhance function structures for self loopback + prevention application + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 091400a5d411ee7398095ba832361eb12b345f3d +Author: Tariq Toukan +Date: Thu Oct 30 15:32:33 2025 +0200 + + net/mlx5e: Enhance function structures for self loopback prevention application + + The re-application of self loopback prevention attributes in TIRs is + necessary in old firmwares (where tis_tir_td_order cap is cleared) after + recreation of SQs. + + However, this is not needed in new firmware with tis_tir_td_order=1. + + As a preparation patch, enhance the function structures to differentiate + between an explicit loopback prevention configuration apply, and the + re-apply operation required by old firmware. + + Loopback selftests should now call mlx5e_modify_tirs_lb() directly, as + their use case is not related to the firmware limitation. 
+ + Signed-off-by: Tariq Toukan + Reviewed-by: Carolina Jubran + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/1761831159-1013140-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h +index 208870fe79c1..85f940869968 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h +@@ -1152,7 +1152,9 @@ extern const struct ethtool_ops mlx5e_ethtool_ops; + int mlx5e_create_mkey(struct mlx5_core_dev *mdev, u32 pdn, u32 *mkey); + int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev, bool create_tises); + void mlx5e_destroy_mdev_resources(struct mlx5_core_dev *mdev); +-int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool enable_uc_lb, ++int mlx5e_modify_tirs_lb(struct mlx5_core_dev *mdev, bool enable_uc_lb, ++ bool enable_mc_lb); ++int mlx5e_refresh_tirs(struct mlx5_core_dev *mdev, bool enable_uc_lb, + bool enable_mc_lb); + void mlx5e_mkey_set_relaxed_ordering(struct mlx5_core_dev *mdev, void *mkc); + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +index 30424ccad584..376a018b2db1 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +@@ -247,10 +247,9 @@ void mlx5e_destroy_mdev_resources(struct mlx5_core_dev *mdev) + memset(res, 0, sizeof(*res)); + } + +-int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool enable_uc_lb, +- bool enable_mc_lb) ++int mlx5e_modify_tirs_lb(struct mlx5_core_dev *mdev, bool enable_uc_lb, ++ bool enable_mc_lb) + { +- struct mlx5_core_dev *mdev = priv->mdev; + struct mlx5e_tir *tir; + u8 lb_flags = 0; + int err = 0; +@@ -285,7 +284,16 @@ int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool enable_uc_lb, + + kvfree(in); + if (err) +- netdev_err(priv->netdev, "refresh tir(0x%x) failed, %d\n", 
tirn, err); ++ mlx5_core_err(mdev, ++ "modify tir(0x%x) enable_lb uc(%d) mc(%d) failed, %d\n", ++ tirn, ++ enable_uc_lb, enable_mc_lb, err); + + return err; + } ++ ++int mlx5e_refresh_tirs(struct mlx5_core_dev *mdev, bool enable_uc_lb, ++ bool enable_mc_lb) ++{ ++ return mlx5e_modify_tirs_lb(mdev, enable_uc_lb, enable_mc_lb); ++} +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index 2390975c1252..b08aa2c7c837 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -6089,7 +6089,7 @@ static void mlx5e_nic_disable(struct mlx5e_priv *priv) + + static int mlx5e_update_nic_rx(struct mlx5e_priv *priv) + { +- return mlx5e_refresh_tirs(priv, false, false); ++ return mlx5e_refresh_tirs(priv->mdev, false, false); + } + + static const struct mlx5e_profile mlx5e_nic_profile = { +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c +index 2f7a543feca6..fcad464bc4d5 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c +@@ -214,7 +214,7 @@ static int mlx5e_test_loopback_setup(struct mlx5e_priv *priv, + return err; + } + +- err = mlx5e_refresh_tirs(priv, true, false); ++ err = mlx5e_modify_tirs_lb(priv->mdev, true, false); + if (err) + goto out; + +@@ -243,7 +243,7 @@ static void mlx5e_test_loopback_cleanup(struct mlx5e_priv *priv, + mlx5_nic_vport_update_local_lb(priv->mdev, false); + + dev_remove_pack(&lbtp->pt); +- mlx5e_refresh_tirs(priv, false, false); ++ mlx5e_modify_tirs_lb(priv->mdev, false, false); + } + + static int mlx5e_cond_loopback(struct mlx5e_priv *priv) +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c +index dd8daf5af7a6..a5ff11922d8d 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c ++++ 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c +@@ -331,7 +331,7 @@ void mlx5i_destroy_underlay_qp(struct mlx5_core_dev *mdev, u32 qpn) + + int mlx5i_update_nic_rx(struct mlx5e_priv *priv) + { +- return mlx5e_refresh_tirs(priv, true, true); ++ return mlx5e_refresh_tirs(priv->mdev, true, true); + } + + int mlx5i_create_tis(struct mlx5_core_dev *mdev, u32 underlay_qpn, u32 *tisn) +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1608-net-mlx5e-use-tir-api-in-mlx5e-modify-tirs-lb.patch b/SOURCES/1608-net-mlx5e-use-tir-api-in-mlx5e-modify-tirs-lb.patch new file mode 100644 index 000000000..e00c6c310 --- /dev/null +++ b/SOURCES/1608-net-mlx5e-use-tir-api-in-mlx5e-modify-tirs-lb.patch @@ -0,0 +1,152 @@ +From 4ad5e38a3dab61e3f8cfceed6c47b4cd4f8e976f Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:56:34 -0400 +Subject: [PATCH] net/mlx5e: Use TIR API in mlx5e_modify_tirs_lb() + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 5c51a86122b20326229c6c9dff4a92c186cbb6bf +Author: Tariq Toukan +Date: Thu Oct 30 15:32:34 2025 +0200 + + net/mlx5e: Use TIR API in mlx5e_modify_tirs_lb() + + Extend the TIR API and use it in mlx5e_modify_tirs_lb() instead of the + explicit modify_tir code. 
+ + Signed-off-by: Tariq Toukan + Reviewed-by: Carolina Jubran + Reviewed-by: Dragos Tatulea + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/1761831159-1013140-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tir.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tir.c +index 19499072f67f..0b55e77f19c8 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tir.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tir.c +@@ -146,6 +146,31 @@ void mlx5e_tir_builder_build_direct(struct mlx5e_tir_builder *builder) + MLX5_SET(tirc, tirc, rx_hash_fn, MLX5_RX_HASH_FN_INVERTED_XOR8); + } + ++static void mlx5e_tir_context_self_lb_block(void *tirc, bool enable_uc_lb, ++ bool enable_mc_lb) ++{ ++ u8 lb_flags = 0; ++ ++ if (enable_uc_lb) ++ lb_flags = MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST; ++ if (enable_mc_lb) ++ lb_flags |= MLX5_TIRC_SELF_LB_BLOCK_BLOCK_MULTICAST; ++ ++ MLX5_SET(tirc, tirc, self_lb_block, lb_flags); ++} ++ ++void mlx5e_tir_builder_build_self_lb_block(struct mlx5e_tir_builder *builder, ++ bool enable_uc_lb, ++ bool enable_mc_lb) ++{ ++ void *tirc = mlx5e_tir_builder_get_tirc(builder); ++ ++ if (builder->modify) ++ MLX5_SET(modify_tir_in, builder->in, bitmask.self_lb_en, 1); ++ ++ mlx5e_tir_context_self_lb_block(tirc, enable_uc_lb, enable_mc_lb); ++} ++ + void mlx5e_tir_builder_build_tls(struct mlx5e_tir_builder *builder) + { + void *tirc = mlx5e_tir_builder_get_tirc(builder); +@@ -153,9 +178,7 @@ void mlx5e_tir_builder_build_tls(struct mlx5e_tir_builder *builder) + WARN_ON(builder->modify); + + MLX5_SET(tirc, tirc, tls_en, 1); +- MLX5_SET(tirc, tirc, self_lb_block, +- MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST | +- MLX5_TIRC_SELF_LB_BLOCK_BLOCK_MULTICAST); ++ mlx5e_tir_context_self_lb_block(tirc, true, true); + } + + int mlx5e_tir_init(struct mlx5e_tir *tir, struct mlx5e_tir_builder *builder, +diff --git 
a/drivers/net/ethernet/mellanox/mlx5/core/en/tir.h b/drivers/net/ethernet/mellanox/mlx5/core/en/tir.h +index e8df3aaf6562..958eeb959a19 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tir.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tir.h +@@ -35,6 +35,9 @@ void mlx5e_tir_builder_build_rss(struct mlx5e_tir_builder *builder, + const struct mlx5e_rss_params_traffic_type *rss_tt, + bool inner); + void mlx5e_tir_builder_build_direct(struct mlx5e_tir_builder *builder); ++void mlx5e_tir_builder_build_self_lb_block(struct mlx5e_tir_builder *builder, ++ bool enable_uc_lb, ++ bool enable_mc_lb); + void mlx5e_tir_builder_build_tls(struct mlx5e_tir_builder *builder); + + struct mlx5_core_dev; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +index 376a018b2db1..022a0cf7063c 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +@@ -250,44 +250,31 @@ void mlx5e_destroy_mdev_resources(struct mlx5_core_dev *mdev) + int mlx5e_modify_tirs_lb(struct mlx5_core_dev *mdev, bool enable_uc_lb, + bool enable_mc_lb) + { ++ struct mlx5e_tir_builder *builder; + struct mlx5e_tir *tir; +- u8 lb_flags = 0; +- int err = 0; +- u32 tirn = 0; +- int inlen; +- void *in; ++ int err = 0; + +- inlen = MLX5_ST_SZ_BYTES(modify_tir_in); +- in = kvzalloc(inlen, GFP_KERNEL); +- if (!in) ++ builder = mlx5e_tir_builder_alloc(true); ++ if (!builder) + return -ENOMEM; + +- if (enable_uc_lb) +- lb_flags = MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST; +- +- if (enable_mc_lb) +- lb_flags |= MLX5_TIRC_SELF_LB_BLOCK_BLOCK_MULTICAST; +- +- if (lb_flags) +- MLX5_SET(modify_tir_in, in, ctx.self_lb_block, lb_flags); +- +- MLX5_SET(modify_tir_in, in, bitmask.self_lb_en, 1); ++ mlx5e_tir_builder_build_self_lb_block(builder, enable_uc_lb, ++ enable_mc_lb); + + mutex_lock(&mdev->mlx5e_res.hw_objs.td.list_lock); + list_for_each_entry(tir, 
&mdev->mlx5e_res.hw_objs.td.tirs_list, list) { +- tirn = tir->tirn; +- err = mlx5_core_modify_tir(mdev, tirn, in); +- if (err) ++ err = mlx5e_tir_modify(tir, builder); ++ if (err) { ++ mlx5_core_err(mdev, ++ "modify tir(0x%x) enable_lb uc(%d) mc(%d) failed, %d\n", ++ mlx5e_tir_get_tirn(tir), ++ enable_uc_lb, enable_mc_lb, err); + break; ++ } + } + mutex_unlock(&mdev->mlx5e_res.hw_objs.td.list_lock); + +- kvfree(in); +- if (err) +- mlx5_core_err(mdev, +- "modify tir(0x%x) enable_lb uc(%d) mc(%d) failed, %d\n", +- tirn, +- enable_uc_lb, enable_mc_lb, err); ++ mlx5e_tir_builder_free(builder); + + return err; + } +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1609-net-mlx5e-allow-setting-self-loopback-prevention-bits-on-tir.patch b/SOURCES/1609-net-mlx5e-allow-setting-self-loopback-prevention-bits-on-tir.patch new file mode 100644 index 000000000..cddfcfd1d --- /dev/null +++ b/SOURCES/1609-net-mlx5e-allow-setting-self-loopback-prevention-bits-on-tir.patch @@ -0,0 +1,107 @@ +From 7adf722efde27ea91e0879133117958641da4dbe Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:56:35 -0400 +Subject: [PATCH] net/mlx5e: Allow setting self loopback prevention bits on TIR + init + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 99b002018f6a3dc08c789e2962070d6de7cb3bac +Author: Tariq Toukan +Date: Thu Oct 30 15:32:35 2025 +0200 + + net/mlx5e: Allow setting self loopback prevention bits on TIR init + + Until now, IPoIB was creating TIRs without setting self loopback + prevention, then modifying them in activation stage. + + This is a preparation patch, that will be used by IPoIB to init TIRs + properly without the need for following calls of modify_tir. 
+ + Signed-off-by: Tariq Toukan + Reviewed-by: Carolina Jubran + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/1761831159-1013140-4-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rss.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rss.c +index c96cbc4b0dbf..88b0e1050d1a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rss.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rss.c +@@ -231,6 +231,8 @@ mlx5e_rss_create_tir(struct mlx5e_rss *rss, enum mlx5_traffic_types tt, + rqtn, rss_inner); + mlx5e_tir_builder_build_packet_merge(builder, pkt_merge_param); + rss_tt = mlx5e_rss_get_tt_config(rss, tt); ++ mlx5e_tir_builder_build_self_lb_block(builder, rss->params.self_lb_blk, ++ rss->params.self_lb_blk); + mlx5e_tir_builder_build_rss(builder, &rss->hash, &rss_tt, inner); + + err = mlx5e_tir_init(tir, builder, rss->mdev, true); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rss.h b/drivers/net/ethernet/mellanox/mlx5/core/en/rss.h +index 5fb03cd0a411..17664757a561 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rss.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rss.h +@@ -23,6 +23,7 @@ struct mlx5e_rss_init_params { + struct mlx5e_rss_params { + bool inner_ft_support; + u32 drop_rqn; ++ bool self_lb_blk; + }; + + struct mlx5e_rss_params_traffic_type +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c +index ac26a32845d0..55c117b7d8c4 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c +@@ -71,6 +71,8 @@ static int mlx5e_rx_res_rss_init_def(struct mlx5e_rx_res *res, + rss_params = (struct mlx5e_rss_params) { + .inner_ft_support = inner_ft_support, + .drop_rqn = res->drop_rqn, ++ .self_lb_blk = ++ res->features & MLX5E_RX_RES_FEATURE_SELF_LB_BLOCK, + }; + + rss = 
mlx5e_rss_init(res->mdev, &rss_params, &init_params); +@@ -104,6 +106,8 @@ int mlx5e_rx_res_rss_init(struct mlx5e_rx_res *res, u32 rss_idx, unsigned int in + rss_params = (struct mlx5e_rss_params) { + .inner_ft_support = inner_ft_support, + .drop_rqn = res->drop_rqn, ++ .self_lb_blk = ++ res->features & MLX5E_RX_RES_FEATURE_SELF_LB_BLOCK, + }; + + rss = mlx5e_rss_init(res->mdev, &rss_params, &init_params); +@@ -346,6 +350,7 @@ static struct mlx5e_rx_res *mlx5e_rx_res_alloc(struct mlx5_core_dev *mdev, unsig + static int mlx5e_rx_res_channels_init(struct mlx5e_rx_res *res) + { + bool inner_ft_support = res->features & MLX5E_RX_RES_FEATURE_INNER_FT; ++ bool self_lb_blk = res->features & MLX5E_RX_RES_FEATURE_SELF_LB_BLOCK; + struct mlx5e_tir_builder *builder; + int err = 0; + int ix; +@@ -376,6 +381,8 @@ static int mlx5e_rx_res_channels_init(struct mlx5e_rx_res *res) + mlx5e_rqt_get_rqtn(&res->channels[ix].direct_rqt), + inner_ft_support); + mlx5e_tir_builder_build_packet_merge(builder, &res->pkt_merge_param); ++ mlx5e_tir_builder_build_self_lb_block(builder, self_lb_blk, ++ self_lb_blk); + mlx5e_tir_builder_build_direct(builder); + + err = mlx5e_tir_init(&res->channels[ix].direct_tir, builder, res->mdev, true); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h +index 65a857c215e1..675780120a20 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h +@@ -21,6 +21,7 @@ enum mlx5e_rx_res_features { + MLX5E_RX_RES_FEATURE_INNER_FT = BIT(0), + MLX5E_RX_RES_FEATURE_PTP = BIT(1), + MLX5E_RX_RES_FEATURE_MULTI_VHCA = BIT(2), ++ MLX5E_RX_RES_FEATURE_SELF_LB_BLOCK = BIT(3), + }; + + /* Setup */ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1610-net-mlx5-ipoib-set-self-loopback-prevention-in-tir-init.patch b/SOURCES/1610-net-mlx5-ipoib-set-self-loopback-prevention-in-tir-init.patch new file mode 100644 index 000000000..58b371218 --- 
/dev/null +++ b/SOURCES/1610-net-mlx5-ipoib-set-self-loopback-prevention-in-tir-init.patch @@ -0,0 +1,55 @@ +From e1bddea2eaecdc2d6f4985f941f1af987197ba1d Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:56:35 -0400 +Subject: [PATCH] net/mlx5: IPoIB, set self loopback prevention in TIR init + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit a4c81e72f132b93a3b920196621a7b78c71fb7fc +Author: Tariq Toukan +Date: Thu Oct 30 15:32:36 2025 +0200 + + net/mlx5: IPoIB, set self loopback prevention in TIR init + + In IPoIB, the self loopback prevention configuration apply in activation + stage has two roles: fulfill a firmware requirement for old firmware + (tis_tir_td_order=0), and update the proper configuration as it was not + set in init. + + Here we set the proper configuration in init, to allow skipping the + modify_tirs commands on new firmware in a downstream patch. + + Signed-off-by: Tariq Toukan + Reviewed-by: Carolina Jubran + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/1761831159-1013140-5-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c +index a5ff11922d8d..22037785d112 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c +@@ -424,6 +424,7 @@ static void mlx5i_destroy_flow_steering(struct mlx5e_priv *priv) + static int mlx5i_init_rx(struct mlx5e_priv *priv) + { + struct mlx5_core_dev *mdev = priv->mdev; ++ enum mlx5e_rx_res_features features; + int err; + + priv->fs = mlx5e_fs_init(priv->profile, mdev, +@@ -442,7 +443,9 @@ static int mlx5i_init_rx(struct mlx5e_priv *priv) + goto err_destroy_q_counters; + } + +- priv->rx_res = mlx5e_rx_res_create(priv->mdev, 0, priv->max_nch, priv->drop_rq.rqn, ++ features = MLX5E_RX_RES_FEATURE_SELF_LB_BLOCK; ++ priv->rx_res = 
mlx5e_rx_res_create(priv->mdev, features, priv->max_nch, ++ priv->drop_rq.rqn, + &priv->channels.params.packet_merge, + priv->channels.params.num_channels); + if (IS_ERR(priv->rx_res)) { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1611-net-mlx5e-do-not-re-apply-tir-loopback-configuration-if-not-.patch b/SOURCES/1611-net-mlx5e-do-not-re-apply-tir-loopback-configuration-if-not-.patch new file mode 100644 index 000000000..78550e6e0 --- /dev/null +++ b/SOURCES/1611-net-mlx5e-do-not-re-apply-tir-loopback-configuration-if-not-.patch @@ -0,0 +1,52 @@ +From 083b2b315daab48969e31a6226515df7ef33d1b2 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:56:35 -0400 +Subject: [PATCH] net/mlx5e: Do not re-apply TIR loopback configuration if not + necessary + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 477c352adda4ba0bd80c945ab13165161802239e +Author: Tariq Toukan +Date: Thu Oct 30 15:32:37 2025 +0200 + + net/mlx5e: Do not re-apply TIR loopback configuration if not necessary + + On old firmware, (tis_tir_td_order=0), TIR of a transport domain should + either be created after all SQs of the same domain, or TIR.self_lb_en + should be reapplied using MODIFY_TIR, for self loopback filtering to + function correctly. + + This is not necessary anymnore on new FW (tis_tir_td_order=1), thus + there's no need for calling modify_tir operations after creating a new + set of SQs to maintain the self loopback prevention functional. + + Skip these operations. + + This saves O(max_num_channels) MODIFY_TIR firmware commands in + operations like interface up or channels configuration change. 
+ + Signed-off-by: Tariq Toukan + Reviewed-by: Carolina Jubran + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/1761831159-1013140-6-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +index 022a0cf7063c..5a2ac7b6f260 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +@@ -282,5 +282,8 @@ int mlx5e_modify_tirs_lb(struct mlx5_core_dev *mdev, bool enable_uc_lb, + int mlx5e_refresh_tirs(struct mlx5_core_dev *mdev, bool enable_uc_lb, + bool enable_mc_lb) + { ++ if (MLX5_CAP_GEN(mdev, tis_tir_td_order)) ++ return 0; /* refresh not needed */ ++ + return mlx5e_modify_tirs_lb(mdev, enable_uc_lb, enable_mc_lb); + } +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1612-net-mlx5e-pass-old-channels-as-argument-to-mlx5e-switch-priv.patch b/SOURCES/1612-net-mlx5e-pass-old-channels-as-argument-to-mlx5e-switch-priv.patch new file mode 100644 index 000000000..64a7261ea --- /dev/null +++ b/SOURCES/1612-net-mlx5e-pass-old-channels-as-argument-to-mlx5e-switch-priv.patch @@ -0,0 +1,122 @@ +From 7cb440aa7ae915a4b220ce7984cc03306e39c7c6 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:56:35 -0400 +Subject: [PATCH] net/mlx5e: Pass old channels as argument to + mlx5e_switch_priv_channels + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 911e3a37b024163d8329e3560d6fd5f0f0da2558 +Author: Tariq Toukan +Date: Thu Oct 30 15:32:38 2025 +0200 + + net/mlx5e: Pass old channels as argument to mlx5e_switch_priv_channels + + Let the caller function mlx5e_safe_switch_params() maintain a copy + of the old channels, and pass it to mlx5e_switch_priv_channels(). + + This is in preparation for the next patch. 
+ + Signed-off-by: Tariq Toukan + Reviewed-by: Carolina Jubran + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/1761831159-1013140-7-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index b08aa2c7c837..223cbc800144 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -3341,12 +3341,12 @@ static int mlx5e_switch_priv_params(struct mlx5e_priv *priv, + } + + static int mlx5e_switch_priv_channels(struct mlx5e_priv *priv, ++ struct mlx5e_channels *old_chs, + struct mlx5e_channels *new_chs, + mlx5e_fp_preactivate preactivate, + void *context) + { + struct net_device *netdev = priv->netdev; +- struct mlx5e_channels old_chs; + int carrier_ok; + int err = 0; + +@@ -3355,7 +3355,6 @@ static int mlx5e_switch_priv_channels(struct mlx5e_priv *priv, + + mlx5e_deactivate_priv_channels(priv); + +- old_chs = priv->channels; + priv->channels = *new_chs; + + /* New channels are ready to roll, call the preactivate hook if needed +@@ -3364,12 +3363,12 @@ static int mlx5e_switch_priv_channels(struct mlx5e_priv *priv, + if (preactivate) { + err = preactivate(priv, context); + if (err) { +- priv->channels = old_chs; ++ priv->channels = *old_chs; + goto out; + } + } + +- mlx5e_close_channels(&old_chs); ++ mlx5e_close_channels(old_chs); + priv->profile->update_rx(priv); + + mlx5e_selq_apply(&priv->selq); +@@ -3388,16 +3387,20 @@ int mlx5e_safe_switch_params(struct mlx5e_priv *priv, + mlx5e_fp_preactivate preactivate, + void *context, bool reset) + { +- struct mlx5e_channels *new_chs; ++ struct mlx5e_channels *old_chs, *new_chs; + int err; + + reset &= test_bit(MLX5E_STATE_OPENED, &priv->state); + if (!reset) + return mlx5e_switch_priv_params(priv, params, preactivate, context); + ++ old_chs = kzalloc(sizeof(*old_chs), GFP_KERNEL); + new_chs = 
kzalloc(sizeof(*new_chs), GFP_KERNEL); +- if (!new_chs) +- return -ENOMEM; ++ if (!old_chs || !new_chs) { ++ err = -ENOMEM; ++ goto err_free_chs; ++ } ++ + new_chs->params = *params; + + mlx5e_selq_prepare_params(&priv->selq, &new_chs->params); +@@ -3406,11 +3409,15 @@ int mlx5e_safe_switch_params(struct mlx5e_priv *priv, + if (err) + goto err_cancel_selq; + +- err = mlx5e_switch_priv_channels(priv, new_chs, preactivate, context); ++ *old_chs = priv->channels; ++ ++ err = mlx5e_switch_priv_channels(priv, old_chs, new_chs, ++ preactivate, context); + if (err) + goto err_close; + + kfree(new_chs); ++ kfree(old_chs); + return 0; + + err_close: +@@ -3418,7 +3425,9 @@ int mlx5e_safe_switch_params(struct mlx5e_priv *priv, + + err_cancel_selq: + mlx5e_selq_cancel(&priv->selq); ++err_free_chs: + kfree(new_chs); ++ kfree(old_chs); + return err; + } + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1613-net-mlx5e-defer-channels-closure-to-reduce-interface-down-ti.patch b/SOURCES/1613-net-mlx5e-defer-channels-closure-to-reduce-interface-down-ti.patch new file mode 100644 index 000000000..60ffd57b1 --- /dev/null +++ b/SOURCES/1613-net-mlx5e-defer-channels-closure-to-reduce-interface-down-ti.patch @@ -0,0 +1,67 @@ +From 9628d012855f087f2800246493e00999c0dd58ea Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:56:35 -0400 +Subject: [PATCH] net/mlx5e: Defer channels closure to reduce interface down + time + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 3b88a535a8e10d83335f04c60aafbdfd37146a01 +Author: Tariq Toukan +Date: Thu Oct 30 15:32:39 2025 +0200 + + net/mlx5e: Defer channels closure to reduce interface down time + + Cap bit tis_tir_td_order=1 indicates that an old firmware requirement / + limitation no longer exists. When unset, the latency of several firmware + commands significantly increases with the presence of high number of + co-existing channels (both old and new sets). 
Hence, we used to close + unneeded old channels before invoking those firmware commands. + + Today, on capable devices, this is no longer the case. Minimize the + interface down time by deferring the old channels closure, after the + activation of the new ones. + + Perf numbers: + Measured the number of dropped packets in a simple ping flood test, + during a configuration change operation, that switches the number of + channels from 247 to 248. + + Before: 71 packets lost + After: 15 packets lost, ~80% saving. + + Signed-off-by: Tariq Toukan + Reviewed-by: Carolina Jubran + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/1761831159-1013140-8-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index 223cbc800144..261b96e41d7e 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -3368,7 +3368,8 @@ static int mlx5e_switch_priv_channels(struct mlx5e_priv *priv, + } + } + +- mlx5e_close_channels(old_chs); ++ if (!MLX5_CAP_GEN(priv->mdev, tis_tir_td_order)) ++ mlx5e_close_channels(old_chs); + priv->profile->update_rx(priv); + + mlx5e_selq_apply(&priv->selq); +@@ -3416,6 +3417,9 @@ int mlx5e_safe_switch_params(struct mlx5e_priv *priv, + if (err) + goto err_close; + ++ if (MLX5_CAP_GEN(priv->mdev, tis_tir_td_order)) ++ mlx5e_close_channels(old_chs); ++ + kfree(new_chs); + kfree(old_chs); + return 0; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1614-pci-tph-expose-pcie-tph-get-st-table-loc.patch b/SOURCES/1614-pci-tph-expose-pcie-tph-get-st-table-loc.patch new file mode 100644 index 000000000..3807af8e7 --- /dev/null +++ b/SOURCES/1614-pci-tph-expose-pcie-tph-get-st-table-loc.patch @@ -0,0 +1,87 @@ +From 9a458f4698a1ae08042ed9332d3246964288e31f Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 20:36:54 -0400 
+Subject: [PATCH] PCI/TPH: Expose pcie_tph_get_st_table_loc() + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 7b8a8ec20cfce2298f6737089f5d17407ea346b4 +Author: Yishai Hadas +Date: Mon Oct 27 11:34:01 2025 +0200 + + PCI/TPH: Expose pcie_tph_get_st_table_loc() + + Expose pcie_tph_get_st_table_loc() to be used by drivers as will be done + in the next patch from the series. + + Signed-off-by: Yishai Hadas + Signed-off-by: Edward Srouji + Link: https://patch.msgid.link/20251027-st-direct-mode-v1-1-e0ad953866b6@nvidia.com + Acked-by: Bjorn Helgaas + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c +index 6822c2e3f93f..47a160b399f9 100644 +--- a/drivers/pci/tph.c ++++ b/drivers/pci/tph.c +@@ -155,7 +155,16 @@ static u8 get_st_modes(struct pci_dev *pdev) + return reg; + } + +-static u32 get_st_table_loc(struct pci_dev *pdev) ++/** ++ * pcie_tph_get_st_table_loc - Return the device's ST table location ++ * @pdev: PCI device to query ++ * ++ * Return: ++ * PCI_TPH_LOC_NONE - Not present ++ * PCI_TPH_LOC_CAP - Located in the TPH Requester Extended Capability ++ * PCI_TPH_LOC_MSIX - Located in the MSI-X Table ++ */ ++u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev) + { + u32 reg; + +@@ -163,6 +172,7 @@ static u32 get_st_table_loc(struct pci_dev *pdev) + + return FIELD_GET(PCI_TPH_CAP_LOC_MASK, reg); + } ++EXPORT_SYMBOL(pcie_tph_get_st_table_loc); + + /* + * Return the size of ST table. 
If ST table is not in TPH Requester Extended +@@ -174,7 +184,7 @@ u16 pcie_tph_get_st_table_size(struct pci_dev *pdev) + u32 loc; + + /* Check ST table location first */ +- loc = get_st_table_loc(pdev); ++ loc = pcie_tph_get_st_table_loc(pdev); + + /* Convert loc to match with PCI_TPH_LOC_* defined in pci_regs.h */ + loc = FIELD_PREP(PCI_TPH_CAP_LOC_MASK, loc); +@@ -341,7 +351,7 @@ int pcie_tph_set_st_entry(struct pci_dev *pdev, unsigned int index, u16 tag) + */ + set_ctrl_reg_req_en(pdev, PCI_TPH_REQ_DISABLE); + +- loc = get_st_table_loc(pdev); ++ loc = pcie_tph_get_st_table_loc(pdev); + /* Convert loc to match with PCI_TPH_LOC_* */ + loc = FIELD_PREP(PCI_TPH_CAP_LOC_MASK, loc); + +diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h +index 9e4e331b1603..ba28140ce670 100644 +--- a/include/linux/pci-tph.h ++++ b/include/linux/pci-tph.h +@@ -29,6 +29,7 @@ int pcie_tph_get_cpu_st(struct pci_dev *dev, + void pcie_disable_tph(struct pci_dev *pdev); + int pcie_enable_tph(struct pci_dev *pdev, int mode); + u16 pcie_tph_get_st_table_size(struct pci_dev *pdev); ++u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev); + #else + static inline int pcie_tph_set_st_entry(struct pci_dev *pdev, + unsigned int index, u16 tag) +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1615-net-mlx5-add-direct-st-mode-support-for-rdma.patch b/SOURCES/1615-net-mlx5-add-direct-st-mode-support-for-rdma.patch new file mode 100644 index 000000000..da2bb3f19 --- /dev/null +++ b/SOURCES/1615-net-mlx5-add-direct-st-mode-support-for-rdma.patch @@ -0,0 +1,107 @@ +From caf19e68e6d7d37e7aabf04ad3beea4462451166 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:59:28 -0400 +Subject: [PATCH] net/mlx5: Add direct ST mode support for RDMA + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 2d838c11e10e9169cae4f7778345c11b5447ef05 +Author: Yishai Hadas +Date: Mon Oct 27 11:34:02 2025 +0200 + + net/mlx5: Add direct ST mode support for RDMA + + Add support for direct 
ST mode where ST Table Location equals + PCI_TPH_LOC_NONE. + + In that case, no steering table exists, the steering tag itself will be + used directly by the SW, FW, HW from the mkey. + + This enables RDMA users to use the current exposed APIs to work in + direct mode. + + Signed-off-by: Yishai Hadas + Signed-off-by: Edward Srouji + Link: https://patch.msgid.link/20251027-st-direct-mode-v1-2-e0ad953866b6@nvidia.com + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c +index 47fe215f66bf..ef06fe6cbb51 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c +@@ -19,13 +19,16 @@ struct mlx5_st { + struct mutex lock; + struct xa_limit index_limit; + struct xarray idx_xa; /* key == index, value == struct mlx5_st_idx_data */ ++ u8 direct_mode : 1; + }; + + struct mlx5_st *mlx5_st_create(struct mlx5_core_dev *dev) + { + struct pci_dev *pdev = dev->pdev; + struct mlx5_st *st; ++ u8 direct_mode = 0; + u16 num_entries; ++ u32 tbl_loc; + int ret; + + if (!MLX5_CAP_GEN(dev, mkey_pcie_tph)) +@@ -40,10 +43,16 @@ struct mlx5_st *mlx5_st_create(struct mlx5_core_dev *dev) + if (!pdev->tph_cap) + return NULL; + +- num_entries = pcie_tph_get_st_table_size(pdev); +- /* We need a reserved entry for non TPH cases */ +- if (num_entries < 2) +- return NULL; ++ tbl_loc = pcie_tph_get_st_table_loc(pdev); ++ if (tbl_loc == PCI_TPH_LOC_NONE) ++ direct_mode = 1; ++ ++ if (!direct_mode) { ++ num_entries = pcie_tph_get_st_table_size(pdev); ++ /* We need a reserved entry for non TPH cases */ ++ if (num_entries < 2) ++ return NULL; ++ } + + /* The OS doesn't support ST */ + ret = pcie_enable_tph(pdev, PCI_TPH_ST_DS_MODE); +@@ -56,6 +65,10 @@ struct mlx5_st *mlx5_st_create(struct mlx5_core_dev *dev) + + mutex_init(&st->lock); + xa_init_flags(&st->idx_xa, XA_FLAGS_ALLOC); ++ st->direct_mode = direct_mode; ++ if 
(st->direct_mode) ++ return st; ++ + /* entry 0 is reserved for non TPH cases */ + st->index_limit.min = MLX5_MKC_PCIE_TPH_NO_STEERING_TAG_INDEX + 1; + st->index_limit.max = num_entries - 1; +@@ -96,6 +109,11 @@ int mlx5_st_alloc_index(struct mlx5_core_dev *dev, enum tph_mem_type mem_type, + if (ret) + return ret; + ++ if (st->direct_mode) { ++ *st_index = tag; ++ return 0; ++ } ++ + mutex_lock(&st->lock); + + xa_for_each(&st->idx_xa, index, idx_data) { +@@ -145,6 +163,9 @@ int mlx5_st_dealloc_index(struct mlx5_core_dev *dev, u16 st_index) + if (!st) + return -EOPNOTSUPP; + ++ if (st->direct_mode) ++ return 0; ++ + mutex_lock(&st->lock); + idx_data = xa_load(&st->idx_xa, st_index); + if (WARN_ON_ONCE(!idx_data)) { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1616-net-mlx5-add-other-eswitch-hw-capabilities.patch b/SOURCES/1616-net-mlx5-add-other-eswitch-hw-capabilities.patch new file mode 100644 index 000000000..0ee7bae3d --- /dev/null +++ b/SOURCES/1616-net-mlx5-add-other-eswitch-hw-capabilities.patch @@ -0,0 +1,173 @@ +From 7fc60da49f5667617b1c008424e771a0ed46da17 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:59:28 -0400 +Subject: [PATCH] net/mlx5: Add OTHER_ESWITCH HW capabilities + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 6948417b3f1fafbeab85c051f8dba5e305a8f9c4 +Author: Patrisious Haddad +Date: Wed Oct 29 17:42:53 2025 +0200 + + net/mlx5: Add OTHER_ESWITCH HW capabilities + + Add OTHER_ESWITCH capabilities which includes other_eswitch and + eswitch_owner_vhca_id to all steering objects. 
+ + Signed-off-by: Patrisious Haddad + Signed-off-by: Edward Srouji + Link: https://patch.msgid.link/20251029-support-other-eswitch-v1-1-98bb707b5d57@nvidia.com + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h +index 2207404a125c..53cba23d1fcb 100644 +--- a/include/linux/mlx5/mlx5_ifc.h ++++ b/include/linux/mlx5/mlx5_ifc.h +@@ -5250,13 +5250,15 @@ struct mlx5_ifc_set_fte_in_bits { + u8 op_mod[0x10]; + + u8 other_vport[0x1]; +- u8 reserved_at_41[0xf]; ++ u8 other_eswitch[0x1]; ++ u8 reserved_at_42[0xe]; + u8 vport_number[0x10]; + + u8 reserved_at_60[0x20]; + + u8 table_type[0x8]; +- u8 reserved_at_88[0x18]; ++ u8 reserved_at_88[0x8]; ++ u8 eswitch_owner_vhca_id[0x10]; + + u8 reserved_at_a0[0x8]; + u8 table_id[0x18]; +@@ -8808,13 +8810,15 @@ struct mlx5_ifc_destroy_flow_table_in_bits { + u8 op_mod[0x10]; + + u8 other_vport[0x1]; +- u8 reserved_at_41[0xf]; ++ u8 other_eswitch[0x1]; ++ u8 reserved_at_42[0xe]; + u8 vport_number[0x10]; + + u8 reserved_at_60[0x20]; + + u8 table_type[0x8]; +- u8 reserved_at_88[0x18]; ++ u8 reserved_at_88[0x8]; ++ u8 eswitch_owner_vhca_id[0x10]; + + u8 reserved_at_a0[0x8]; + u8 table_id[0x18]; +@@ -8839,13 +8843,15 @@ struct mlx5_ifc_destroy_flow_group_in_bits { + u8 op_mod[0x10]; + + u8 other_vport[0x1]; +- u8 reserved_at_41[0xf]; ++ u8 other_eswitch[0x1]; ++ u8 reserved_at_42[0xe]; + u8 vport_number[0x10]; + + u8 reserved_at_60[0x20]; + + u8 table_type[0x8]; +- u8 reserved_at_88[0x18]; ++ u8 reserved_at_88[0x8]; ++ u8 eswitch_owner_vhca_id[0x10]; + + u8 reserved_at_a0[0x8]; + u8 table_id[0x18]; +@@ -8984,13 +8990,15 @@ struct mlx5_ifc_delete_fte_in_bits { + u8 op_mod[0x10]; + + u8 other_vport[0x1]; +- u8 reserved_at_41[0xf]; ++ u8 other_eswitch[0x1]; ++ u8 reserved_at_42[0xe]; + u8 vport_number[0x10]; + + u8 reserved_at_60[0x20]; + + u8 table_type[0x8]; +- u8 reserved_at_88[0x18]; ++ u8 reserved_at_88[0x8]; ++ u8 eswitch_owner_vhca_id[0x10]; + + u8 
reserved_at_a0[0x8]; + u8 table_id[0x18]; +@@ -9534,13 +9542,15 @@ struct mlx5_ifc_create_flow_table_in_bits { + u8 op_mod[0x10]; + + u8 other_vport[0x1]; +- u8 reserved_at_41[0xf]; ++ u8 other_eswitch[0x1]; ++ u8 reserved_at_42[0xe]; + u8 vport_number[0x10]; + + u8 reserved_at_60[0x20]; + + u8 table_type[0x8]; +- u8 reserved_at_88[0x18]; ++ u8 reserved_at_88[0x8]; ++ u8 eswitch_owner_vhca_id[0x10]; + + u8 reserved_at_a0[0x20]; + +@@ -9579,7 +9589,8 @@ struct mlx5_ifc_create_flow_group_in_bits { + u8 op_mod[0x10]; + + u8 other_vport[0x1]; +- u8 reserved_at_41[0xf]; ++ u8 other_eswitch[0x1]; ++ u8 reserved_at_42[0xe]; + u8 vport_number[0x10]; + + u8 reserved_at_60[0x20]; +@@ -9587,7 +9598,7 @@ struct mlx5_ifc_create_flow_group_in_bits { + u8 table_type[0x8]; + u8 reserved_at_88[0x4]; + u8 group_type[0x4]; +- u8 reserved_at_90[0x10]; ++ u8 eswitch_owner_vhca_id[0x10]; + + u8 reserved_at_a0[0x8]; + u8 table_id[0x18]; +@@ -11877,10 +11888,12 @@ struct mlx5_ifc_set_flow_table_root_in_bits { + u8 op_mod[0x10]; + + u8 other_vport[0x1]; +- u8 reserved_at_41[0xf]; ++ u8 other_eswitch[0x1]; ++ u8 reserved_at_42[0xe]; + u8 vport_number[0x10]; + +- u8 reserved_at_60[0x20]; ++ u8 reserved_at_60[0x10]; ++ u8 eswitch_owner_vhca_id[0x10]; + + u8 table_type[0x8]; + u8 reserved_at_88[0x7]; +@@ -11920,14 +11933,16 @@ struct mlx5_ifc_modify_flow_table_in_bits { + u8 op_mod[0x10]; + + u8 other_vport[0x1]; +- u8 reserved_at_41[0xf]; ++ u8 other_eswitch[0x1]; ++ u8 reserved_at_42[0xe]; + u8 vport_number[0x10]; + + u8 reserved_at_60[0x10]; + u8 modify_field_select[0x10]; + + u8 table_type[0x8]; +- u8 reserved_at_88[0x18]; ++ u8 reserved_at_88[0x8]; ++ u8 eswitch_owner_vhca_id[0x10]; + + u8 reserved_at_a0[0x8]; + u8 table_id[0x18]; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1617-net-mlx5-fs-add-other-eswitch-support-for-steering-tables.patch b/SOURCES/1617-net-mlx5-fs-add-other-eswitch-support-for-steering-tables.patch new file mode 100644 index 000000000..1ee35a745 --- /dev/null 
+++ b/SOURCES/1617-net-mlx5-fs-add-other-eswitch-support-for-steering-tables.patch @@ -0,0 +1,203 @@ +From ca872d4a1b466fec01b471e20d13d29a18f41e84 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:59:28 -0400 +Subject: [PATCH] net/mlx5: fs, Add other_eswitch support for steering tables + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 3b848dec7e821bace785b9e405bf1884c077635a +Author: Patrisious Haddad +Date: Wed Oct 29 17:42:54 2025 +0200 + + net/mlx5: fs, Add other_eswitch support for steering tables + + Add other_eswitch support which allows flow tables creation above vports + that reside on different esw managers. + + The new flag MLX5_FLOW_TABLE_OTHER_ESWITCH indicates if the + esw_owner_vhca_id attribute is supported. + + Note that this is only supported if the Advanced-RDMA cap- + rdma_transport_manager_other_eswitch is set. + And it is the caller responsibility to check that. + + Signed-off-by: Patrisious Haddad + Signed-off-by: Edward Srouji + Link: https://patch.msgid.link/20251029-support-other-eswitch-v1-2-98bb707b5d57@nvidia.com + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c +index 1af76da8b132..ced747bef641 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c +@@ -239,6 +239,10 @@ static int mlx5_cmd_update_root_ft(struct mlx5_flow_root_namespace *ns, + MLX5_SET(set_flow_table_root_in, in, vport_number, ft->vport); + MLX5_SET(set_flow_table_root_in, in, other_vport, + !!(ft->flags & MLX5_FLOW_TABLE_OTHER_VPORT)); ++ MLX5_SET(set_flow_table_root_in, in, eswitch_owner_vhca_id, ++ ft->esw_owner_vhca_id); ++ MLX5_SET(set_flow_table_root_in, in, other_eswitch, ++ !!(ft->flags & MLX5_FLOW_TABLE_OTHER_ESWITCH)); + + err = mlx5_cmd_exec_in(dev, set_flow_table_root, in); + if (!err && +@@ -302,6 +306,10 @@ static int 
mlx5_cmd_create_flow_table(struct mlx5_flow_root_namespace *ns, + MLX5_SET(create_flow_table_in, in, vport_number, ft->vport); + MLX5_SET(create_flow_table_in, in, other_vport, + !!(ft->flags & MLX5_FLOW_TABLE_OTHER_VPORT)); ++ MLX5_SET(create_flow_table_in, in, eswitch_owner_vhca_id, ++ ft->esw_owner_vhca_id); ++ MLX5_SET(create_flow_table_in, in, other_eswitch, ++ !!(ft->flags & MLX5_FLOW_TABLE_OTHER_ESWITCH)); + + MLX5_SET(create_flow_table_in, in, flow_table_context.decap_en, + en_decap); +@@ -360,6 +368,10 @@ static int mlx5_cmd_destroy_flow_table(struct mlx5_flow_root_namespace *ns, + MLX5_SET(destroy_flow_table_in, in, vport_number, ft->vport); + MLX5_SET(destroy_flow_table_in, in, other_vport, + !!(ft->flags & MLX5_FLOW_TABLE_OTHER_VPORT)); ++ MLX5_SET(destroy_flow_table_in, in, eswitch_owner_vhca_id, ++ ft->esw_owner_vhca_id); ++ MLX5_SET(destroy_flow_table_in, in, other_eswitch, ++ !!(ft->flags & MLX5_FLOW_TABLE_OTHER_ESWITCH)); + + err = mlx5_cmd_exec_in(dev, destroy_flow_table, in); + if (!err) +@@ -394,6 +406,10 @@ static int mlx5_cmd_modify_flow_table(struct mlx5_flow_root_namespace *ns, + MLX5_SET(modify_flow_table_in, in, vport_number, ft->vport); + MLX5_SET(modify_flow_table_in, in, other_vport, + !!(ft->flags & MLX5_FLOW_TABLE_OTHER_VPORT)); ++ MLX5_SET(modify_flow_table_in, in, eswitch_owner_vhca_id, ++ ft->esw_owner_vhca_id); ++ MLX5_SET(modify_flow_table_in, in, other_eswitch, ++ !!(ft->flags & MLX5_FLOW_TABLE_OTHER_ESWITCH)); + MLX5_SET(modify_flow_table_in, in, modify_field_select, + MLX5_MODIFY_FLOW_TABLE_MISS_TABLE_ID); + if (next_ft) { +@@ -429,6 +445,10 @@ static int mlx5_cmd_create_flow_group(struct mlx5_flow_root_namespace *ns, + MLX5_SET(create_flow_group_in, in, vport_number, ft->vport); + MLX5_SET(create_flow_group_in, in, other_vport, + !!(ft->flags & MLX5_FLOW_TABLE_OTHER_VPORT)); ++ MLX5_SET(create_flow_group_in, in, eswitch_owner_vhca_id, ++ ft->esw_owner_vhca_id); ++ MLX5_SET(create_flow_group_in, in, other_eswitch, ++ 
!!(ft->flags & MLX5_FLOW_TABLE_OTHER_ESWITCH)); + err = mlx5_cmd_exec_inout(dev, create_flow_group, in, out); + if (!err) + fg->id = MLX5_GET(create_flow_group_out, out, +@@ -451,6 +471,10 @@ static int mlx5_cmd_destroy_flow_group(struct mlx5_flow_root_namespace *ns, + MLX5_SET(destroy_flow_group_in, in, vport_number, ft->vport); + MLX5_SET(destroy_flow_group_in, in, other_vport, + !!(ft->flags & MLX5_FLOW_TABLE_OTHER_VPORT)); ++ MLX5_SET(destroy_flow_group_in, in, eswitch_owner_vhca_id, ++ ft->esw_owner_vhca_id); ++ MLX5_SET(destroy_flow_group_in, in, other_eswitch, ++ !!(ft->flags & MLX5_FLOW_TABLE_OTHER_ESWITCH)); + return mlx5_cmd_exec_in(dev, destroy_flow_group, in); + } + +@@ -559,6 +583,9 @@ static int mlx5_cmd_set_fte(struct mlx5_core_dev *dev, + MLX5_SET(set_fte_in, in, vport_number, ft->vport); + MLX5_SET(set_fte_in, in, other_vport, + !!(ft->flags & MLX5_FLOW_TABLE_OTHER_VPORT)); ++ MLX5_SET(set_fte_in, in, eswitch_owner_vhca_id, ft->esw_owner_vhca_id); ++ MLX5_SET(set_fte_in, in, other_eswitch, ++ !!(ft->flags & MLX5_FLOW_TABLE_OTHER_ESWITCH)); + + in_flow_context = MLX5_ADDR_OF(set_fte_in, in, flow_context); + MLX5_SET(flow_context, in_flow_context, group_id, group_id); +@@ -788,6 +815,10 @@ static int mlx5_cmd_delete_fte(struct mlx5_flow_root_namespace *ns, + MLX5_SET(delete_fte_in, in, vport_number, ft->vport); + MLX5_SET(delete_fte_in, in, other_vport, + !!(ft->flags & MLX5_FLOW_TABLE_OTHER_VPORT)); ++ MLX5_SET(delete_fte_in, in, eswitch_owner_vhca_id, ++ ft->esw_owner_vhca_id); ++ MLX5_SET(delete_fte_in, in, other_eswitch, ++ !!(ft->flags & MLX5_FLOW_TABLE_OTHER_ESWITCH)); + + return mlx5_cmd_exec_in(dev, delete_fte, in); + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +index 2db3ffb0a2b2..87e381c82ed3 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +@@ -939,10 +939,10 @@ static struct mlx5_flow_group 
*alloc_insert_flow_group(struct mlx5_flow_table *f + return fg; + } + +-static struct mlx5_flow_table *alloc_flow_table(int level, u16 vport, +- enum fs_flow_table_type table_type, +- enum fs_flow_table_op_mod op_mod, +- u32 flags) ++static struct mlx5_flow_table * ++alloc_flow_table(struct mlx5_flow_table_attr *ft_attr, u16 vport, ++ enum fs_flow_table_type table_type, ++ enum fs_flow_table_op_mod op_mod) + { + struct mlx5_flow_table *ft; + int ret; +@@ -957,12 +957,13 @@ static struct mlx5_flow_table *alloc_flow_table(int level, u16 vport, + return ERR_PTR(ret); + } + +- ft->level = level; ++ ft->level = ft_attr->level; + ft->node.type = FS_TYPE_FLOW_TABLE; + ft->op_mod = op_mod; + ft->type = table_type; + ft->vport = vport; +- ft->flags = flags; ++ ft->esw_owner_vhca_id = ft_attr->esw_owner_vhca_id; ++ ft->flags = ft_attr->flags; + INIT_LIST_HEAD(&ft->fwd_rules); + mutex_init(&ft->lock); + +@@ -1370,10 +1371,7 @@ static struct mlx5_flow_table *__mlx5_create_flow_table(struct mlx5_flow_namespa + /* The level is related to the + * priority level range. 
+ */ +- ft = alloc_flow_table(ft_attr->level, +- vport, +- root->table_type, +- op_mod, ft_attr->flags); ++ ft = alloc_flow_table(ft_attr, vport, root->table_type, op_mod); + if (IS_ERR(ft)) { + err = PTR_ERR(ft); + goto unlock_root; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h +index 8458ce203dac..0a9a5ef34c21 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h +@@ -205,6 +205,7 @@ struct mlx5_flow_table { + }; + u32 id; + u16 vport; ++ u16 esw_owner_vhca_id; + unsigned int max_fte; + unsigned int level; + enum fs_flow_table_type type; +diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h +index 6ac76a0c3827..6325a7fa0df2 100644 +--- a/include/linux/mlx5/fs.h ++++ b/include/linux/mlx5/fs.h +@@ -71,6 +71,7 @@ enum { + MLX5_FLOW_TABLE_UNMANAGED = BIT(3), + MLX5_FLOW_TABLE_OTHER_VPORT = BIT(4), + MLX5_FLOW_TABLE_UPLINK_VPORT = BIT(5), ++ MLX5_FLOW_TABLE_OTHER_ESWITCH = BIT(6), + }; + + #define LEFTOVERS_RULE_NUM 2 +@@ -208,6 +209,7 @@ struct mlx5_flow_table_attr { + u32 flags; + u16 uid; + u16 vport; ++ u16 esw_owner_vhca_id; + struct mlx5_flow_table *next_ft; + + struct { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1618-net-mlx5-fs-set-non-default-device-per-namespace.patch b/SOURCES/1618-net-mlx5-fs-set-non-default-device-per-namespace.patch new file mode 100644 index 000000000..b1db3c463 --- /dev/null +++ b/SOURCES/1618-net-mlx5-fs-set-non-default-device-per-namespace.patch @@ -0,0 +1,168 @@ +From 8307810e0c08308cacef1229ba11018e55ea7d09 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:59:29 -0400 +Subject: [PATCH] net/mlx5: fs, set non default device per namespace + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 583b4fe1c19d978bb787e0adf9ce469cb7f68455 +Author: Patrisious Haddad +Date: Wed Oct 29 17:42:55 2025 +0200 + + net/mlx5: fs, set non default device per namespace 
+ + Add mlx5_fs_set_root_dev() function which swaps the root namespace + core device with another one for a given table_type. + + It is intended for usage only by RDMA_TRANSPORT tables in case of LAG + configuration, to allow the creation of tables during LAG always + through the LAG master device, which is valid since during LAG the + master is allowed to manage the RDMA_TRANSPORT tables of its slaves. + + In addition move the table_type enum to global include to allow its use + in a downstream patch in the RDMA driver. + + Signed-off-by: Patrisious Haddad + Signed-off-by: Edward Srouji + Link: https://patch.msgid.link/20251029-support-other-eswitch-v1-3-98bb707b5d57@nvidia.com + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +index 87e381c82ed3..5b210c54a592 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +@@ -3308,6 +3308,62 @@ init_rdma_transport_tx_root_ns_one(struct mlx5_flow_steering *steering, + return ret; + } + ++static bool mlx5_fs_ns_is_empty(struct mlx5_flow_namespace *ns) ++{ ++ struct fs_prio *iter_prio; ++ ++ fs_for_each_prio(iter_prio, ns) { ++ if (iter_prio->num_ft) ++ return false; ++ } ++ ++ return true; ++} ++ ++int mlx5_fs_set_root_dev(struct mlx5_core_dev *dev, ++ struct mlx5_core_dev *new_dev, ++ enum fs_flow_table_type table_type) ++{ ++ struct mlx5_flow_root_namespace **root; ++ int total_vports; ++ int i; ++ ++ switch (table_type) { ++ case FS_FT_RDMA_TRANSPORT_TX: ++ root = dev->priv.steering->rdma_transport_tx_root_ns; ++ total_vports = dev->priv.steering->rdma_transport_tx_vports; ++ break; ++ case FS_FT_RDMA_TRANSPORT_RX: ++ root = dev->priv.steering->rdma_transport_rx_root_ns; ++ total_vports = dev->priv.steering->rdma_transport_rx_vports; ++ break; ++ default: ++ WARN_ON_ONCE(true); ++ return -EINVAL; ++ } ++ ++ for (i = 0; i < 
total_vports; i++) { ++ mutex_lock(&root[i]->chain_lock); ++ if (!mlx5_fs_ns_is_empty(&root[i]->ns)) { ++ mutex_unlock(&root[i]->chain_lock); ++ goto err; ++ } ++ root[i]->dev = new_dev; ++ mutex_unlock(&root[i]->chain_lock); ++ } ++ return 0; ++err: ++ while (i--) { ++ mutex_lock(&root[i]->chain_lock); ++ root[i]->dev = dev; ++ mutex_unlock(&root[i]->chain_lock); ++ } ++ /* If you hit this error try destroying all flow tables and try again */ ++ mlx5_core_err(dev, "Failed to set root device for RDMA TRANSPORT\n"); ++ return -EINVAL; ++} ++EXPORT_SYMBOL(mlx5_fs_set_root_dev); ++ + static int init_rdma_transport_rx_root_ns(struct mlx5_flow_steering *steering) + { + struct mlx5_core_dev *dev = steering->dev; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h +index 0a9a5ef34c21..1c6591425260 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h +@@ -103,24 +103,6 @@ enum fs_node_type { + FS_TYPE_FLOW_DEST + }; + +-enum fs_flow_table_type { +- FS_FT_NIC_RX = 0x0, +- FS_FT_NIC_TX = 0x1, +- FS_FT_ESW_EGRESS_ACL = 0x2, +- FS_FT_ESW_INGRESS_ACL = 0x3, +- FS_FT_FDB = 0X4, +- FS_FT_SNIFFER_RX = 0X5, +- FS_FT_SNIFFER_TX = 0X6, +- FS_FT_RDMA_RX = 0X7, +- FS_FT_RDMA_TX = 0X8, +- FS_FT_PORT_SEL = 0X9, +- FS_FT_FDB_RX = 0xa, +- FS_FT_FDB_TX = 0xb, +- FS_FT_RDMA_TRANSPORT_RX = 0xd, +- FS_FT_RDMA_TRANSPORT_TX = 0xe, +- FS_FT_MAX_TYPE = FS_FT_RDMA_TRANSPORT_TX, +-}; +- + enum fs_flow_table_op_mod { + FS_FT_OP_MOD_NORMAL, + FS_FT_OP_MOD_LAG_DEMUX, +diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h +index 6325a7fa0df2..fe721557bd1d 100644 +--- a/include/linux/mlx5/fs.h ++++ b/include/linux/mlx5/fs.h +@@ -128,6 +128,24 @@ enum { + FDB_PER_VPORT, + }; + ++enum fs_flow_table_type { ++ FS_FT_NIC_RX = 0x0, ++ FS_FT_NIC_TX = 0x1, ++ FS_FT_ESW_EGRESS_ACL = 0x2, ++ FS_FT_ESW_INGRESS_ACL = 0x3, ++ FS_FT_FDB = 0X4, ++ FS_FT_SNIFFER_RX = 0X5, ++ 
FS_FT_SNIFFER_TX = 0X6, ++ FS_FT_RDMA_RX = 0X7, ++ FS_FT_RDMA_TX = 0X8, ++ FS_FT_PORT_SEL = 0X9, ++ FS_FT_FDB_RX = 0xa, ++ FS_FT_FDB_TX = 0xb, ++ FS_FT_RDMA_TRANSPORT_RX = 0xd, ++ FS_FT_RDMA_TRANSPORT_TX = 0xe, ++ FS_FT_MAX_TYPE = FS_FT_RDMA_TRANSPORT_TX, ++}; ++ + struct mlx5_pkt_reformat; + struct mlx5_modify_hdr; + struct mlx5_flow_definer; +@@ -355,4 +373,8 @@ u32 mlx5_flow_table_id(struct mlx5_flow_table *ft); + + struct mlx5_flow_root_namespace * + mlx5_get_root_namespace(struct mlx5_core_dev *dev, enum mlx5_flow_namespace_type ns_type); ++ ++int mlx5_fs_set_root_dev(struct mlx5_core_dev *dev, ++ struct mlx5_core_dev *new_dev, ++ enum fs_flow_table_type table_type); + #endif +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1619-net-mlx5-mpfs-add-support-for-dynamic-enable-disable.patch b/SOURCES/1619-net-mlx5-mpfs-add-support-for-dynamic-enable-disable.patch new file mode 100644 index 000000000..c43d1871b --- /dev/null +++ b/SOURCES/1619-net-mlx5-mpfs-add-support-for-dynamic-enable-disable.patch @@ -0,0 +1,247 @@ +From 013c1eb99ebac50e30b766e6e16c0a25cb50c4f1 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:59:29 -0400 +Subject: [PATCH] net/mlx5: MPFS, add support for dynamic enable/disable + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 9902b6381d76ccd2e08e2703390e8c8a3bcda482 +Author: Saeed Mahameed +Date: Fri Nov 7 23:04:03 2025 -0800 + + net/mlx5: MPFS, add support for dynamic enable/disable + + MPFS (Multi PF Switch) is enabled by default in Multi-Host environments, + the driver keeps a list of desired unicast mac addresses of all vports + (vfs/Sfs) and applied to HW via L2_table FW command. + + Add API to dynamically apply the list of MACs to HW when needed for next + patches, to utilize this new API in devlink eswitch active/in-active uAPI. 
+ + Signed-off-by: Saeed Mahameed + Signed-off-by: Adithya Jayachandran + Reviewed-by: Jiri Pirko + Link: https://patch.msgid.link/20251108070404.1551708-3-saeed@kernel.org + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c +index 4450091e181a..99fb7a53add0 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c +@@ -65,13 +65,14 @@ static int del_l2table_entry_cmd(struct mlx5_core_dev *dev, u32 index) + /* UC L2 table hash node */ + struct l2table_node { + struct l2addr_node node; +- u32 index; /* index in HW l2 table */ ++ int index; /* index in HW l2 table */ + int ref_count; + }; + + struct mlx5_mpfs { + struct hlist_head hash[MLX5_L2_ADDR_HASH_SIZE]; + struct mutex lock; /* Synchronize l2 table access */ ++ bool enabled; + u32 size; + unsigned long *bitmap; + }; +@@ -114,6 +115,8 @@ int mlx5_mpfs_init(struct mlx5_core_dev *dev) + return -ENOMEM; + } + ++ mpfs->enabled = true; ++ + dev->priv.mpfs = mpfs; + return 0; + } +@@ -135,7 +138,7 @@ int mlx5_mpfs_add_mac(struct mlx5_core_dev *dev, u8 *mac) + struct mlx5_mpfs *mpfs = dev->priv.mpfs; + struct l2table_node *l2addr; + int err = 0; +- u32 index; ++ int index; + + if (!mpfs) + return 0; +@@ -148,30 +151,34 @@ int mlx5_mpfs_add_mac(struct mlx5_core_dev *dev, u8 *mac) + goto out; + } + +- err = alloc_l2table_index(mpfs, &index); +- if (err) +- goto out; +- + l2addr = l2addr_hash_add(mpfs->hash, mac, struct l2table_node, GFP_KERNEL); + if (!l2addr) { + err = -ENOMEM; +- goto hash_add_err; ++ goto out; + } + +- err = set_l2table_entry_cmd(dev, index, mac); +- if (err) +- goto set_table_entry_err; ++ index = -1; ++ ++ if (mpfs->enabled) { ++ err = alloc_l2table_index(mpfs, &index); ++ if (err) ++ goto hash_del; ++ err = set_l2table_entry_cmd(dev, index, mac); ++ if (err) ++ goto free_l2table_index; ++ mlx5_core_dbg(dev, 
"MPFS entry %pM, set @index (%d)\n", ++ l2addr->node.addr, l2addr->index); ++ } + + l2addr->index = index; + l2addr->ref_count = 1; + + mlx5_core_dbg(dev, "MPFS mac added %pM, index (%d)\n", mac, index); + goto out; +- +-set_table_entry_err: +- l2addr_hash_del(l2addr); +-hash_add_err: ++free_l2table_index: + free_l2table_index(mpfs, index); ++hash_del: ++ l2addr_hash_del(l2addr); + out: + mutex_unlock(&mpfs->lock); + return err; +@@ -183,7 +190,7 @@ int mlx5_mpfs_del_mac(struct mlx5_core_dev *dev, u8 *mac) + struct mlx5_mpfs *mpfs = dev->priv.mpfs; + struct l2table_node *l2addr; + int err = 0; +- u32 index; ++ int index; + + if (!mpfs) + return 0; +@@ -200,12 +207,87 @@ int mlx5_mpfs_del_mac(struct mlx5_core_dev *dev, u8 *mac) + goto unlock; + + index = l2addr->index; +- del_l2table_entry_cmd(dev, index); ++ if (index >= 0) { ++ del_l2table_entry_cmd(dev, index); ++ free_l2table_index(mpfs, index); ++ mlx5_core_dbg(dev, "MPFS entry %pM, deleted @index (%d)\n", ++ mac, index); ++ } + l2addr_hash_del(l2addr); +- free_l2table_index(mpfs, index); + mlx5_core_dbg(dev, "MPFS mac deleted %pM, index (%d)\n", mac, index); + unlock: + mutex_unlock(&mpfs->lock); + return err; + } + EXPORT_SYMBOL(mlx5_mpfs_del_mac); ++ ++int mlx5_mpfs_enable(struct mlx5_core_dev *dev) ++{ ++ struct mlx5_mpfs *mpfs = dev->priv.mpfs; ++ struct l2table_node *l2addr; ++ struct hlist_node *n; ++ int err = 0, i; ++ ++ if (!mpfs) ++ return -ENODEV; ++ ++ mutex_lock(&mpfs->lock); ++ if (mpfs->enabled) ++ goto out; ++ mpfs->enabled = true; ++ mlx5_core_dbg(dev, "MPFS enabling mpfs\n"); ++ ++ mlx5_mpfs_foreach(l2addr, n, mpfs, i) { ++ u32 index; ++ ++ err = alloc_l2table_index(mpfs, &index); ++ if (err) { ++ mlx5_core_err(dev, "Failed to allocated MPFS index for %pM, err(%d)\n", ++ l2addr->node.addr, err); ++ goto out; ++ } ++ ++ err = set_l2table_entry_cmd(dev, index, l2addr->node.addr); ++ if (err) { ++ mlx5_core_err(dev, "Failed to set MPFS l2table entry for %pM index=%d, err(%d)\n", ++ 
l2addr->node.addr, index, err); ++ free_l2table_index(mpfs, index); ++ goto out; ++ } ++ ++ l2addr->index = index; ++ mlx5_core_dbg(dev, "MPFS entry %pM, set @index (%d)\n", ++ l2addr->node.addr, l2addr->index); ++ } ++out: ++ mutex_unlock(&mpfs->lock); ++ return err; ++} ++ ++void mlx5_mpfs_disable(struct mlx5_core_dev *dev) ++{ ++ struct mlx5_mpfs *mpfs = dev->priv.mpfs; ++ struct l2table_node *l2addr; ++ struct hlist_node *n; ++ int i; ++ ++ if (!mpfs) ++ return; ++ ++ mutex_lock(&mpfs->lock); ++ if (!mpfs->enabled) ++ goto unlock; ++ mlx5_mpfs_foreach(l2addr, n, mpfs, i) { ++ if (l2addr->index < 0) ++ continue; ++ del_l2table_entry_cmd(dev, l2addr->index); ++ free_l2table_index(mpfs, l2addr->index); ++ mlx5_core_dbg(dev, "MPFS entry %pM, deleted @index (%d)\n", ++ l2addr->node.addr, l2addr->index); ++ l2addr->index = -1; ++ } ++ mpfs->enabled = false; ++ mlx5_core_dbg(dev, "MPFS disabled\n"); ++unlock: ++ mutex_unlock(&mpfs->lock); ++} +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h +index 4a293542a7aa..9c63838ce1f3 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.h +@@ -45,6 +45,10 @@ struct l2addr_node { + u8 addr[ETH_ALEN]; + }; + ++#define mlx5_mpfs_foreach(hs, tmp, mpfs, i) \ ++ for (i = 0; i < MLX5_L2_ADDR_HASH_SIZE; i++) \ ++ hlist_for_each_entry_safe(hs, tmp, &(mpfs)->hash[i], node.hlist) ++ + #define for_each_l2hash_node(hn, tmp, hash, i) \ + for (i = 0; i < MLX5_L2_ADDR_HASH_SIZE; i++) \ + hlist_for_each_entry_safe(hn, tmp, &(hash)[i], hlist) +@@ -82,11 +86,16 @@ struct l2addr_node { + }) + + #ifdef CONFIG_MLX5_MPFS ++struct mlx5_core_dev; + int mlx5_mpfs_init(struct mlx5_core_dev *dev); + void mlx5_mpfs_cleanup(struct mlx5_core_dev *dev); ++int mlx5_mpfs_enable(struct mlx5_core_dev *dev); ++void mlx5_mpfs_disable(struct mlx5_core_dev *dev); + #else /* #ifndef CONFIG_MLX5_MPFS */ + static inline int 
mlx5_mpfs_init(struct mlx5_core_dev *dev) { return 0; } + static inline void mlx5_mpfs_cleanup(struct mlx5_core_dev *dev) {} ++static inline int mlx5_mpfs_enable(struct mlx5_core_dev *dev) { return 0; } ++static inline void mlx5_mpfs_disable(struct mlx5_core_dev *dev) {} + #endif + + #endif +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1620-net-mlx5-e-switch-support-eswitch-inactive-mode.patch b/SOURCES/1620-net-mlx5-e-switch-support-eswitch-inactive-mode.patch new file mode 100644 index 000000000..cb9b63188 --- /dev/null +++ b/SOURCES/1620-net-mlx5-e-switch-support-eswitch-inactive-mode.patch @@ -0,0 +1,471 @@ +From c62f3f0c11a849afa227a35847a2d50d58a3475d Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:59:29 -0400 +Subject: [PATCH] net/mlx5: E-Switch, support eswitch inactive mode + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 9da611df15aa8d519f9947b88a5c733267cba888 +Author: Saeed Mahameed +Date: Fri Nov 7 23:04:04 2025 -0800 + + net/mlx5: E-Switch, support eswitch inactive mode + + Add support for eswitch switchdev inactive mode + + Inactive mode: Drop all traffic going to FDB, Remove + mpfs l2 rules and disconnect adjacent vports. + + Active mode: Traffic flows through FDB, mpfs table populated, and + adjacent vports are connected. 
+ + Signed-off-by: Saeed Mahameed + Signed-off-by: Adithya Jayachandran + Reviewed-by: Jiri Pirko + Link: https://patch.msgid.link/20251108070404.1551708-4-saeed@kernel.org + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/adj_vport.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/adj_vport.c +index 0091ba697bae..250af09b5af2 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/adj_vport.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/adj_vport.c +@@ -4,13 +4,8 @@ + #include "fs_core.h" + #include "eswitch.h" + +-enum { +- MLX5_ADJ_VPORT_DISCONNECT = 0x0, +- MLX5_ADJ_VPORT_CONNECT = 0x1, +-}; +- +-static int mlx5_esw_adj_vport_modify(struct mlx5_core_dev *dev, +- u16 vport, bool connect) ++int mlx5_esw_adj_vport_modify(struct mlx5_core_dev *dev, u16 vport, ++ bool connect) + { + u32 in[MLX5_ST_SZ_DW(modify_vport_state_in)] = {}; + +@@ -24,7 +19,7 @@ static int mlx5_esw_adj_vport_modify(struct mlx5_core_dev *dev, + MLX5_SET(modify_vport_state_in, in, egress_connect_valid, 1); + MLX5_SET(modify_vport_state_in, in, ingress_connect, connect); + MLX5_SET(modify_vport_state_in, in, egress_connect, connect); +- ++ MLX5_SET(modify_vport_state_in, in, admin_state, connect); + return mlx5_cmd_exec_in(dev, modify_vport_state, in); + } + +@@ -96,7 +91,6 @@ static int mlx5_esw_adj_vport_create(struct mlx5_eswitch *esw, u16 vhca_id, + if (err) + goto acl_ns_remove; + +- mlx5_esw_adj_vport_modify(esw->dev, vport_num, MLX5_ADJ_VPORT_CONNECT); + return 0; + + acl_ns_remove: +@@ -117,8 +111,7 @@ static void mlx5_esw_adj_vport_destroy(struct mlx5_eswitch *esw, + + esw_debug(esw->dev, "Destroying adjacent vport %d for vhca_id 0x%x\n", + vport_num, vport->vhca_id); +- mlx5_esw_adj_vport_modify(esw->dev, vport_num, +- MLX5_ADJ_VPORT_DISCONNECT); ++ + mlx5_esw_offloads_rep_remove(esw, vport); + mlx5_fs_vport_egress_acl_ns_remove(esw->dev->priv.steering, + vport->index); +diff --git 
a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +index 16eb99aba2a7..beaec450a734 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +@@ -264,6 +264,9 @@ struct mlx5_eswitch_fdb { + + struct offloads_fdb { + struct mlx5_flow_namespace *ns; ++ struct mlx5_flow_table *drop_root; ++ struct mlx5_flow_handle *drop_root_rule; ++ struct mlx5_fc *drop_root_fc; + struct mlx5_flow_table *tc_miss_table; + struct mlx5_flow_table *slow_fdb; + struct mlx5_flow_group *send_to_vport_grp; +@@ -392,6 +395,7 @@ struct mlx5_eswitch { + struct mlx5_esw_offload offloads; + u32 last_vport_idx; + int mode; ++ bool offloads_inactive; + u16 manager_vport; + u16 first_host_vport; + u8 num_peers; +@@ -634,6 +638,8 @@ const u32 *mlx5_esw_query_functions(struct mlx5_core_dev *dev); + + void mlx5_esw_adjacent_vhcas_setup(struct mlx5_eswitch *esw); + void mlx5_esw_adjacent_vhcas_cleanup(struct mlx5_eswitch *esw); ++int mlx5_esw_adj_vport_modify(struct mlx5_core_dev *dev, u16 vport, ++ bool connect); + + #define MLX5_DEBUG_ESWITCH_MASK BIT(3) + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +index 8eb08d2276be..8ebca0d17f65 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +@@ -1577,6 +1577,7 @@ esw_chains_create(struct mlx5_eswitch *esw, struct mlx5_flow_table *miss_fdb) + attr.max_grp_num = esw->params.large_group_num; + attr.default_ft = miss_fdb; + attr.mapping = esw->offloads.reg_c0_obj_pool; ++ attr.fs_base_prio = FDB_BYPASS_PATH; + + chains = mlx5_chains_create(dev, &attr); + if (IS_ERR(chains)) { +@@ -2355,6 +2356,131 @@ static void esw_mode_change(struct mlx5_eswitch *esw, u16 mode) + mlx5_devcom_comp_unlock(esw->dev->priv.hca_devcom_comp); + } + ++static void mlx5_esw_fdb_drop_destroy(struct 
mlx5_eswitch *esw) ++{ ++ if (!esw->fdb_table.offloads.drop_root) ++ return; ++ ++ esw_debug(esw->dev, "Destroying FDB drop root table %#x fc %#x\n", ++ esw->fdb_table.offloads.drop_root->id, ++ esw->fdb_table.offloads.drop_root_fc->id); ++ mlx5_del_flow_rules(esw->fdb_table.offloads.drop_root_rule); ++ /* Don't free flow counter here, can be reused on a later activation */ ++ mlx5_destroy_flow_table(esw->fdb_table.offloads.drop_root); ++ esw->fdb_table.offloads.drop_root_rule = NULL; ++ esw->fdb_table.offloads.drop_root = NULL; ++} ++ ++static int mlx5_esw_fdb_drop_create(struct mlx5_eswitch *esw) ++{ ++ struct mlx5_flow_destination drop_fc_dst = {}; ++ struct mlx5_flow_table_attr ft_attr = {}; ++ struct mlx5_flow_destination *dst = NULL; ++ struct mlx5_core_dev *dev = esw->dev; ++ struct mlx5_flow_namespace *root_ns; ++ struct mlx5_flow_act flow_act = {}; ++ struct mlx5_flow_handle *flow_rule; ++ struct mlx5_flow_table *table; ++ int err = 0, dst_num = 0; ++ ++ if (esw->fdb_table.offloads.drop_root) ++ return 0; ++ ++ root_ns = esw->fdb_table.offloads.ns; ++ ++ ft_attr.prio = FDB_DROP_ROOT; ++ ft_attr.max_fte = 1; ++ ft_attr.autogroup.max_num_groups = 1; ++ table = mlx5_create_auto_grouped_flow_table(root_ns, &ft_attr); ++ if (IS_ERR(table)) { ++ esw_warn(dev, "Failed to create fdb drop root table, err %pe\n", ++ table); ++ return PTR_ERR(table); ++ } ++ ++ /* Drop FC reusable, create once on first deactivation of FDB */ ++ if (!esw->fdb_table.offloads.drop_root_fc) { ++ struct mlx5_fc *counter = mlx5_fc_create(dev, 0); ++ ++ err = PTR_ERR_OR_ZERO(counter); ++ if (err) ++ esw_warn(esw->dev, "create fdb drop fc err %d\n", err); ++ else ++ esw->fdb_table.offloads.drop_root_fc = counter; ++ } ++ ++ flow_act.action = MLX5_FLOW_CONTEXT_ACTION_DROP; ++ ++ if (esw->fdb_table.offloads.drop_root_fc) { ++ flow_act.action |= MLX5_FLOW_CONTEXT_ACTION_COUNT; ++ drop_fc_dst.type = MLX5_FLOW_DESTINATION_TYPE_COUNTER; ++ drop_fc_dst.counter = 
esw->fdb_table.offloads.drop_root_fc; ++ dst = &drop_fc_dst; ++ dst_num++; ++ } ++ ++ flow_rule = mlx5_add_flow_rules(table, NULL, &flow_act, dst, dst_num); ++ err = PTR_ERR_OR_ZERO(flow_rule); ++ if (err) { ++ esw_warn(esw->dev, ++ "fs offloads: Failed to add vport rx drop rule err %d\n", ++ err); ++ goto err_flow_rule; ++ } ++ ++ esw->fdb_table.offloads.drop_root = table; ++ esw->fdb_table.offloads.drop_root_rule = flow_rule; ++ esw_debug(esw->dev, "Created FDB drop root table %#x fc %#x\n", ++ table->id, dst ? dst->counter->id : 0); ++ return 0; ++ ++err_flow_rule: ++ /* no need to free drop fc, esw_offloads_steering_cleanup will do it */ ++ mlx5_destroy_flow_table(table); ++ return err; ++} ++ ++static void mlx5_esw_fdb_active(struct mlx5_eswitch *esw) ++{ ++ struct mlx5_vport *vport; ++ unsigned long i; ++ ++ mlx5_esw_fdb_drop_destroy(esw); ++ mlx5_mpfs_enable(esw->dev); ++ ++ mlx5_esw_for_each_vf_vport(esw, i, vport, U16_MAX) { ++ if (!vport->adjacent) ++ continue; ++ esw_debug(esw->dev, "Connecting vport %d to eswitch\n", ++ vport->vport); ++ mlx5_esw_adj_vport_modify(esw->dev, vport->vport, true); ++ } ++ ++ esw->offloads_inactive = false; ++ esw_warn(esw->dev, "MPFS/FDB active\n"); ++} ++ ++static void mlx5_esw_fdb_inactive(struct mlx5_eswitch *esw) ++{ ++ struct mlx5_vport *vport; ++ unsigned long i; ++ ++ mlx5_mpfs_disable(esw->dev); ++ mlx5_esw_fdb_drop_create(esw); ++ ++ mlx5_esw_for_each_vf_vport(esw, i, vport, U16_MAX) { ++ if (!vport->adjacent) ++ continue; ++ esw_debug(esw->dev, "Disconnecting vport %u from eswitch\n", ++ vport->vport); ++ ++ mlx5_esw_adj_vport_modify(esw->dev, vport->vport, false); ++ } ++ ++ esw->offloads_inactive = true; ++ esw_warn(esw->dev, "MPFS/FDB inactive\n"); ++} ++ + static int esw_offloads_start(struct mlx5_eswitch *esw, + struct netlink_ext_ack *extack) + { +@@ -3438,6 +3564,10 @@ static int esw_offloads_steering_init(struct mlx5_eswitch *esw) + + static void esw_offloads_steering_cleanup(struct mlx5_eswitch *esw) + { 
++ mlx5_esw_fdb_drop_destroy(esw); ++ if (esw->fdb_table.offloads.drop_root_fc) ++ mlx5_fc_destroy(esw->dev, esw->fdb_table.offloads.drop_root_fc); ++ esw->fdb_table.offloads.drop_root_fc = NULL; + esw_destroy_vport_rx_drop_rule(esw); + esw_destroy_vport_rx_drop_group(esw); + esw_destroy_vport_rx_group(esw); +@@ -3600,6 +3730,11 @@ int esw_offloads_enable(struct mlx5_eswitch *esw) + if (err) + goto err_steering_init; + ++ if (esw->offloads_inactive) ++ mlx5_esw_fdb_inactive(esw); ++ else ++ mlx5_esw_fdb_active(esw); ++ + /* Representor will control the vport link state */ + mlx5_esw_for_each_vf_vport(esw, i, vport, esw->esw_funcs.num_vfs) + vport->info.link_state = MLX5_VPORT_ADMIN_STATE_DOWN; +@@ -3666,6 +3801,9 @@ void esw_offloads_disable(struct mlx5_eswitch *esw) + esw_offloads_metadata_uninit(esw); + mlx5_rdma_disable_roce(esw->dev); + mlx5_esw_adjacent_vhcas_cleanup(esw); ++ /* must be done after vhcas cleanup to avoid adjacent vports connect */ ++ if (esw->offloads_inactive) ++ mlx5_esw_fdb_active(esw); /* legacy mode always active */ + mutex_destroy(&esw->offloads.termtbl_mutex); + } + +@@ -3676,6 +3814,7 @@ static int esw_mode_from_devlink(u16 mode, u16 *mlx5_mode) + *mlx5_mode = MLX5_ESWITCH_LEGACY; + break; + case DEVLINK_ESWITCH_MODE_SWITCHDEV: ++ case DEVLINK_ESWITCH_MODE_SWITCHDEV_INACTIVE: + *mlx5_mode = MLX5_ESWITCH_OFFLOADS; + break; + default: +@@ -3685,14 +3824,17 @@ static int esw_mode_from_devlink(u16 mode, u16 *mlx5_mode) + return 0; + } + +-static int esw_mode_to_devlink(u16 mlx5_mode, u16 *mode) ++static int esw_mode_to_devlink(struct mlx5_eswitch *esw, u16 *mode) + { +- switch (mlx5_mode) { ++ switch (esw->mode) { + case MLX5_ESWITCH_LEGACY: + *mode = DEVLINK_ESWITCH_MODE_LEGACY; + break; + case MLX5_ESWITCH_OFFLOADS: +- *mode = DEVLINK_ESWITCH_MODE_SWITCHDEV; ++ if (esw->offloads_inactive) ++ *mode = DEVLINK_ESWITCH_MODE_SWITCHDEV_INACTIVE; ++ else ++ *mode = DEVLINK_ESWITCH_MODE_SWITCHDEV; + break; + default: + return -EINVAL; +@@ -3798,6 
+3940,45 @@ static bool mlx5_devlink_netdev_netns_immutable_set(struct devlink *devlink, + return ret; + } + ++/* Returns true when only changing between active and inactive switchdev mode */ ++static bool mlx5_devlink_switchdev_active_mode_change(struct mlx5_eswitch *esw, ++ u16 devlink_mode) ++{ ++ /* current mode is not switchdev */ ++ if (esw->mode != MLX5_ESWITCH_OFFLOADS) ++ return false; ++ ++ /* new mode is not switchdev */ ++ if (devlink_mode != DEVLINK_ESWITCH_MODE_SWITCHDEV && ++ devlink_mode != DEVLINK_ESWITCH_MODE_SWITCHDEV_INACTIVE) ++ return false; ++ ++ /* already inactive: no change in current state */ ++ if (devlink_mode == DEVLINK_ESWITCH_MODE_SWITCHDEV_INACTIVE && ++ esw->offloads_inactive) ++ return false; ++ ++ /* already active: no change in current state */ ++ if (devlink_mode == DEVLINK_ESWITCH_MODE_SWITCHDEV && ++ !esw->offloads_inactive) ++ return false; ++ ++ down_write(&esw->mode_lock); ++ esw->offloads_inactive = !esw->offloads_inactive; ++ esw->eswitch_operation_in_progress = true; ++ up_write(&esw->mode_lock); ++ ++ if (esw->offloads_inactive) ++ mlx5_esw_fdb_inactive(esw); ++ else ++ mlx5_esw_fdb_active(esw); ++ ++ down_write(&esw->mode_lock); ++ esw->eswitch_operation_in_progress = false; ++ up_write(&esw->mode_lock); ++ return true; ++} ++ + int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode, + struct netlink_ext_ack *extack) + { +@@ -3812,12 +3993,16 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode, + if (esw_mode_from_devlink(mode, &mlx5_mode)) + return -EINVAL; + +- if (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV && mlx5_get_sd(esw->dev)) { ++ if (mlx5_mode == MLX5_ESWITCH_OFFLOADS && mlx5_get_sd(esw->dev)) { + NL_SET_ERR_MSG_MOD(extack, + "Can't change E-Switch mode to switchdev when multi-PF netdev (Socket Direct) is configured."); + return -EPERM; + } + ++ /* Avoid try_lock, active/inactive mode change is not restricted */ ++ if (mlx5_devlink_switchdev_active_mode_change(esw, mode)) ++ 
return 0; ++ + mlx5_lag_disable_change(esw->dev); + err = mlx5_esw_try_lock(esw); + if (err < 0) { +@@ -3840,7 +4025,7 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode, + esw->eswitch_operation_in_progress = true; + up_write(&esw->mode_lock); + +- if (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV && ++ if (mlx5_mode == MLX5_ESWITCH_OFFLOADS && + !mlx5_devlink_netdev_netns_immutable_set(devlink, true)) { + NL_SET_ERR_MSG_MOD(extack, + "Can't change E-Switch mode to switchdev when netdev net namespace has diverged from the devlink's."); +@@ -3848,25 +4033,27 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode, + goto skip; + } + +- if (mode == DEVLINK_ESWITCH_MODE_LEGACY) ++ if (mlx5_mode == MLX5_ESWITCH_LEGACY) + esw->dev->priv.flags |= MLX5_PRIV_FLAGS_SWITCH_LEGACY; + mlx5_eswitch_disable_locked(esw); +- if (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV) { ++ if (mlx5_mode == MLX5_ESWITCH_OFFLOADS) { + if (mlx5_devlink_trap_get_num_active(esw->dev)) { + NL_SET_ERR_MSG_MOD(extack, + "Can't change mode while devlink traps are active"); + err = -EOPNOTSUPP; + goto skip; + } ++ esw->offloads_inactive = ++ (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV_INACTIVE); + err = esw_offloads_start(esw, extack); +- } else if (mode == DEVLINK_ESWITCH_MODE_LEGACY) { ++ } else if (mlx5_mode == MLX5_ESWITCH_LEGACY) { + err = esw_offloads_stop(esw, extack); + } else { + err = -EINVAL; + } + + skip: +- if (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV && err) ++ if (mlx5_mode == MLX5_ESWITCH_OFFLOADS && err) + mlx5_devlink_netdev_netns_immutable_set(devlink, false); + down_write(&esw->mode_lock); + esw->eswitch_operation_in_progress = false; +@@ -3885,7 +4072,7 @@ int mlx5_devlink_eswitch_mode_get(struct devlink *devlink, u16 *mode) + if (IS_ERR(esw)) + return PTR_ERR(esw); + +- return esw_mode_to_devlink(esw->mode, mode); ++ return esw_mode_to_devlink(esw, mode); + } + + static int mlx5_esw_vports_inline_set(struct mlx5_eswitch *esw, u8 mlx5_mode, +diff --git 
a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +index 5b210c54a592..2b755a0035ce 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +@@ -3574,6 +3574,11 @@ static int init_fdb_root_ns(struct mlx5_flow_steering *steering) + if (!steering->fdb_root_ns) + return -ENOMEM; + ++ maj_prio = fs_create_prio(&steering->fdb_root_ns->ns, FDB_DROP_ROOT, 1); ++ err = PTR_ERR_OR_ZERO(maj_prio); ++ if (err) ++ goto out_err; ++ + err = create_fdb_bypass(steering); + if (err) + goto out_err; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c +index 99fb7a53add0..4a88a42ae4f7 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c +@@ -167,7 +167,7 @@ int mlx5_mpfs_add_mac(struct mlx5_core_dev *dev, u8 *mac) + if (err) + goto free_l2table_index; + mlx5_core_dbg(dev, "MPFS entry %pM, set @index (%d)\n", +- l2addr->node.addr, l2addr->index); ++ l2addr->node.addr, index); + } + + l2addr->index = index; +diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h +index fe721557bd1d..9cadb1d5e6df 100644 +--- a/include/linux/mlx5/fs.h ++++ b/include/linux/mlx5/fs.h +@@ -117,6 +117,7 @@ enum mlx5_flow_namespace_type { + }; + + enum { ++ FDB_DROP_ROOT, + FDB_BYPASS_PATH, + FDB_CRYPTO_INGRESS, + FDB_TC_OFFLOAD, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1621-net-mlx5-expose-definition-for-1600gbps-link-mode.patch b/SOURCES/1621-net-mlx5-expose-definition-for-1600gbps-link-mode.patch new file mode 100644 index 000000000..406d7a521 --- /dev/null +++ b/SOURCES/1621-net-mlx5-expose-definition-for-1600gbps-link-mode.patch @@ -0,0 +1,39 @@ +From e4e0b1c258bc9c16e4824753b52ae59780d8fc9f Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:59:29 -0400 +Subject: [PATCH] net/mlx5: Expose definition for 1600Gbps link 
mode + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 5422318e27d7a4662701f518e2e51b9f73a331b1 +Author: Tariq Toukan +Date: Tue Nov 11 14:24:48 2025 +0200 + + net/mlx5: Expose definition for 1600Gbps link mode + + This patch exposes new link mode for 1600Gbps, utilizing 8 lanes at + 200Gbps per lane. + + Co-developed-by: Yael Chemla + Reviewed-by: Shahar Shitrit + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1762863888-1092798-1-git-send-email-tariqt@nvidia.com + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/include/linux/mlx5/port.h b/include/linux/mlx5/port.h +index 58770b86f793..1df9d9a57bbc 100644 +--- a/include/linux/mlx5/port.h ++++ b/include/linux/mlx5/port.h +@@ -112,6 +112,7 @@ enum mlx5e_ext_link_mode { + MLX5E_400GAUI_2_400GBASE_CR2_KR2 = 17, + MLX5E_800GAUI_8_800GBASE_CR8_KR8 = 19, + MLX5E_800GAUI_4_800GBASE_CR4_KR4 = 20, ++ MLX5E_1600TAUI_8_1600TBASE_CR8_KR8 = 23, + MLX5E_EXT_LINK_MODES_NUMBER, + }; + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1622-mlx5-extract-grxrings-from-get-rxnfc.patch b/SOURCES/1622-mlx5-extract-grxrings-from-get-rxnfc.patch new file mode 100644 index 000000000..101f8d24e --- /dev/null +++ b/SOURCES/1622-mlx5-extract-grxrings-from-get-rxnfc.patch @@ -0,0 +1,119 @@ +From 632bcac88d530e5d0155819e9828957287db541a Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:59:29 -0400 +Subject: [PATCH] mlx5: extract GRXRINGS from .get_rxnfc + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 945499665f63197801b64fabb0bccf9d15ed09bf +Author: Breno Leitao +Date: Thu Nov 13 08:46:04 2025 -0800 + + mlx5: extract GRXRINGS from .get_rxnfc + + Commit 84eaf4359c36 ("net: ethtool: add get_rx_ring_count callback to + optimize RX ring queries") added specific support for GRXRINGS callback, + simplifying .get_rxnfc. 
+ + Remove the handling of GRXRINGS in .get_rxnfc() by moving it to the new + .get_rx_ring_count() for both the mlx5 ethernet and IPoIB drivers. + + The ETHTOOL_GRXRINGS handling was previously kept in .get_rxnfc() to + support "ethtool -x" when CONFIG_MLX5_EN_RXNFC=n. With the new + dedicated .get_rx_ring_count() callback, this is no longer necessary. + + This simplifies the RX ring count retrieval and aligns mlx5 with the new + ethtool API for querying RX ring parameters. + + Signed-off-by: Breno Leitao + Reviewed-by: Tariq Toukan + Link: https://patch.msgid.link/20251113-mlx_grxrings-v1-2-0017f2af7dd0@debian.org + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +index 5a0f5589b894..61760af93969 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +@@ -2496,21 +2496,18 @@ static int mlx5e_set_rxfh_fields(struct net_device *dev, + return mlx5e_ethtool_set_rxfh_fields(priv, cmd, extack); + } + ++static u32 mlx5e_get_rx_ring_count(struct net_device *dev) ++{ ++ struct mlx5e_priv *priv = netdev_priv(dev); ++ ++ return priv->channels.params.num_channels; ++} ++ + static int mlx5e_get_rxnfc(struct net_device *dev, struct ethtool_rxnfc *info, + u32 *rule_locs) + { + struct mlx5e_priv *priv = netdev_priv(dev); + +- /* ETHTOOL_GRXRINGS is needed by ethtool -x which is not part +- * of rxnfc. We keep this logic out of mlx5e_ethtool_get_rxnfc, +- * to avoid breaking "ethtool -x" when mlx5e_ethtool_get_rxnfc +- * is compiled out via CONFIG_MLX5_EN_RXNFC=n. 
+- */ +- if (info->cmd == ETHTOOL_GRXRINGS) { +- info->data = priv->channels.params.num_channels; +- return 0; +- } +- + return mlx5e_ethtool_get_rxnfc(priv, info, rule_locs); + } + +@@ -2770,6 +2767,7 @@ const struct ethtool_ops mlx5e_ethtool_ops = { + .remove_rxfh_context = mlx5e_remove_rxfh_context, + .get_rxnfc = mlx5e_get_rxnfc, + .set_rxnfc = mlx5e_set_rxnfc, ++ .get_rx_ring_count = mlx5e_get_rx_ring_count, + .get_tunable = mlx5e_get_tunable, + .set_tunable = mlx5e_set_tunable, + .get_pause_stats = mlx5e_get_pause_stats, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c +index 4b3430ac3905..3b2f54ca30a8 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c +@@ -266,21 +266,18 @@ static int mlx5i_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd) + return mlx5e_ethtool_set_rxnfc(priv, cmd); + } + ++static u32 mlx5i_get_rx_ring_count(struct net_device *dev) ++{ ++ struct mlx5e_priv *priv = mlx5i_epriv(dev); ++ ++ return priv->channels.params.num_channels; ++} ++ + static int mlx5i_get_rxnfc(struct net_device *dev, struct ethtool_rxnfc *info, + u32 *rule_locs) + { + struct mlx5e_priv *priv = mlx5i_epriv(dev); + +- /* ETHTOOL_GRXRINGS is needed by ethtool -x which is not part +- * of rxnfc. We keep this logic out of mlx5e_ethtool_get_rxnfc, +- * to avoid breaking "ethtool -x" when mlx5e_ethtool_get_rxnfc +- * is compiled out via CONFIG_MLX5_EN_RXNFC=n. 
+- */ +- if (info->cmd == ETHTOOL_GRXRINGS) { +- info->data = priv->channels.params.num_channels; +- return 0; +- } +- + return mlx5e_ethtool_get_rxnfc(priv, info, rule_locs); + } + +@@ -304,6 +301,7 @@ const struct ethtool_ops mlx5i_ethtool_ops = { + .set_rxfh_fields = mlx5i_set_rxfh_fields, + .get_rxnfc = mlx5i_get_rxnfc, + .set_rxnfc = mlx5i_set_rxnfc, ++ .get_rx_ring_count = mlx5i_get_rx_ring_count, + .get_link_ksettings = mlx5i_get_link_ksettings, + .get_link = ethtool_op_get_link, + }; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1623-net-mlx5-refactor-eeprom-query-error-handling-to-return-stat.patch b/SOURCES/1623-net-mlx5-refactor-eeprom-query-error-handling-to-return-stat.patch new file mode 100644 index 000000000..b642a9927 --- /dev/null +++ b/SOURCES/1623-net-mlx5-refactor-eeprom-query-error-handling-to-return-stat.patch @@ -0,0 +1,234 @@ +From 4da9c7c34673ba1c1c8e3c8b6a85a632db909986 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:59:29 -0400 +Subject: [PATCH] net/mlx5: Refactor EEPROM query error handling to return + status separately + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 2e4c44b12f4da60d3e8dcbc1ccf38bb28a878050 +Author: Gal Pressman +Date: Mon Nov 17 23:42:05 2025 +0200 + + net/mlx5: Refactor EEPROM query error handling to return status separately + + Matthew and Jakub reported [1] issues where inventory automation tools + are calling EEPROM query repeatedly on a port that doesn't have an SFP + connected, resulting in millions of error prints. + + Move MCIA register status extraction from the query functions to the + callers, allowing use of extack reporting instead of a dmesg print when + using the netlink API. 
+ + [1] https://lore.kernel.org/netdev/20251028194011.39877-1-mattc@purestorage.com/ + + Cc: Matthew W Carlis + Signed-off-by: Gal Pressman + Reviewed-by: Jianbo Liu + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1763415729-1238421-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +index 61760af93969..6e12bd196ec9 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +@@ -2026,7 +2026,7 @@ static int mlx5e_get_module_info(struct net_device *netdev, + int size_read = 0; + u8 data[4] = {0}; + +- size_read = mlx5_query_module_eeprom(dev, 0, 2, data); ++ size_read = mlx5_query_module_eeprom(dev, 0, 2, data, NULL); + if (size_read < 2) + return -EIO; + +@@ -2068,6 +2068,7 @@ static int mlx5e_get_module_eeprom(struct net_device *netdev, + struct mlx5_core_dev *mdev = priv->mdev; + int offset = ee->offset; + int size_read; ++ u8 status = 0; + int i = 0; + + if (!ee->len) +@@ -2077,15 +2078,15 @@ static int mlx5e_get_module_eeprom(struct net_device *netdev, + + while (i < ee->len) { + size_read = mlx5_query_module_eeprom(mdev, offset, ee->len - i, +- data + i); +- ++ data + i, &status); + if (!size_read) + /* Done reading */ + return 0; + + if (size_read < 0) { +- netdev_err(priv->netdev, "%s: mlx5_query_eeprom failed:0x%x\n", +- __func__, size_read); ++ netdev_err(netdev, ++ "%s: mlx5_query_eeprom failed:0x%x, status %u\n", ++ __func__, size_read, status); + return size_read; + } + +@@ -2105,6 +2106,7 @@ static int mlx5e_get_module_eeprom_by_page(struct net_device *netdev, + struct mlx5_core_dev *mdev = priv->mdev; + u8 *data = page_data->data; + int size_read; ++ u8 status = 0; + int i = 0; + + if (!page_data->length) +@@ -2118,7 +2120,8 @@ static int mlx5e_get_module_eeprom_by_page(struct net_device *netdev, + 
query.page = page_data->page; + while (i < page_data->length) { + query.size = page_data->length - i; +- size_read = mlx5_query_module_eeprom_by_page(mdev, &query, data + i); ++ size_read = mlx5_query_module_eeprom_by_page(mdev, &query, ++ data + i, &status); + + /* Done reading, return how many bytes was read */ + if (!size_read) +@@ -2127,8 +2130,8 @@ static int mlx5e_get_module_eeprom_by_page(struct net_device *netdev, + if (size_read < 0) { + NL_SET_ERR_MSG_FMT_MOD( + extack, +- "Query module eeprom by page failed, read %u bytes, err %d", +- i, size_read); ++ "Query module eeprom by page failed, read %u bytes, err %d, status %u", ++ i, size_read, status); + return size_read; + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h +index acef7d0ffa09..cfebc110c02f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h +@@ -357,11 +357,11 @@ int mlx5_set_port_fcs(struct mlx5_core_dev *mdev, u8 enable); + void mlx5_query_port_fcs(struct mlx5_core_dev *mdev, bool *supported, + bool *enabled); + int mlx5_query_module_eeprom(struct mlx5_core_dev *dev, +- u16 offset, u16 size, u8 *data); ++ u16 offset, u16 size, u8 *data, u8 *status); + int + mlx5_query_module_eeprom_by_page(struct mlx5_core_dev *dev, + struct mlx5_module_eeprom_query_params *params, +- u8 *data); ++ u8 *data, u8 *status); + + int mlx5_query_port_dcbx_param(struct mlx5_core_dev *mdev, u32 *out); + int mlx5_set_port_dcbx_param(struct mlx5_core_dev *mdev, u32 *in); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c b/drivers/net/ethernet/mellanox/mlx5/core/port.c +index aa9f2b0a77d3..e4b1dfafb41f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c +@@ -289,11 +289,11 @@ int mlx5_query_module_num(struct mlx5_core_dev *dev, int *module_num) + } + + static int mlx5_query_module_id(struct 
mlx5_core_dev *dev, int module_num, +- u8 *module_id) ++ u8 *module_id, u8 *status) + { + u32 in[MLX5_ST_SZ_DW(mcia_reg)] = {}; + u32 out[MLX5_ST_SZ_DW(mcia_reg)]; +- int err, status; ++ int err; + u8 *ptr; + + MLX5_SET(mcia_reg, in, i2c_device_address, MLX5_I2C_ADDR_LOW); +@@ -308,12 +308,12 @@ static int mlx5_query_module_id(struct mlx5_core_dev *dev, int module_num, + if (err) + return err; + +- status = MLX5_GET(mcia_reg, out, status); +- if (status) { +- mlx5_core_err(dev, "query_mcia_reg failed: status: 0x%x\n", +- status); ++ if (MLX5_GET(mcia_reg, out, status)) { ++ if (status) ++ *status = MLX5_GET(mcia_reg, out, status); + return -EIO; + } ++ + ptr = MLX5_ADDR_OF(mcia_reg, out, dword_0); + + *module_id = ptr[0]; +@@ -370,13 +370,14 @@ static int mlx5_mcia_max_bytes(struct mlx5_core_dev *dev) + } + + static int mlx5_query_mcia(struct mlx5_core_dev *dev, +- struct mlx5_module_eeprom_query_params *params, u8 *data) ++ struct mlx5_module_eeprom_query_params *params, ++ u8 *data, u8 *status) + { + u32 in[MLX5_ST_SZ_DW(mcia_reg)] = {}; + u32 out[MLX5_ST_SZ_DW(mcia_reg)]; +- int status, err; + void *ptr; + u16 size; ++ int err; + + size = min_t(int, params->size, mlx5_mcia_max_bytes(dev)); + +@@ -392,12 +393,9 @@ static int mlx5_query_mcia(struct mlx5_core_dev *dev, + if (err) + return err; + +- status = MLX5_GET(mcia_reg, out, status); +- if (status) { +- mlx5_core_err(dev, "query_mcia_reg failed: status: 0x%x\n", +- status); ++ *status = MLX5_GET(mcia_reg, out, status); ++ if (*status) + return -EIO; +- } + + ptr = MLX5_ADDR_OF(mcia_reg, out, dword_0); + memcpy(data, ptr, size); +@@ -406,7 +404,7 @@ static int mlx5_query_mcia(struct mlx5_core_dev *dev, + } + + int mlx5_query_module_eeprom(struct mlx5_core_dev *dev, +- u16 offset, u16 size, u8 *data) ++ u16 offset, u16 size, u8 *data, u8 *status) + { + struct mlx5_module_eeprom_query_params query = {0}; + u8 module_id; +@@ -416,7 +414,8 @@ int mlx5_query_module_eeprom(struct mlx5_core_dev *dev, + if (err) + 
return err; + +- err = mlx5_query_module_id(dev, query.module_number, &module_id); ++ err = mlx5_query_module_id(dev, query.module_number, &module_id, ++ status); + if (err) + return err; + +@@ -441,12 +440,12 @@ int mlx5_query_module_eeprom(struct mlx5_core_dev *dev, + query.size = size; + query.offset = offset; + +- return mlx5_query_mcia(dev, &query, data); ++ return mlx5_query_mcia(dev, &query, data, status); + } + + int mlx5_query_module_eeprom_by_page(struct mlx5_core_dev *dev, + struct mlx5_module_eeprom_query_params *params, +- u8 *data) ++ u8 *data, u8 *status) + { + int err; + +@@ -460,7 +459,7 @@ int mlx5_query_module_eeprom_by_page(struct mlx5_core_dev *dev, + return -EINVAL; + } + +- return mlx5_query_mcia(dev, params, data); ++ return mlx5_query_mcia(dev, params, data, status); + } + + static int mlx5_query_port_pvlc(struct mlx5_core_dev *dev, u32 *pvlc, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1624-net-mlx5e-recover-sq-on-excessive-ptp-tx-timestamp-delta.patch b/SOURCES/1624-net-mlx5e-recover-sq-on-excessive-ptp-tx-timestamp-delta.patch new file mode 100644 index 000000000..52c7d827e --- /dev/null +++ b/SOURCES/1624-net-mlx5e-recover-sq-on-excessive-ptp-tx-timestamp-delta.patch @@ -0,0 +1,126 @@ +From 37ec5748002c5268991e309587e2a16f8a434b25 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:59:29 -0400 +Subject: [PATCH] net/mlx5e: Recover SQ on excessive PTP TX timestamp delta + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 391dad2e686f214932f769847cc8603a7df389eb +Author: Carolina Jubran +Date: Mon Nov 17 23:42:06 2025 +0200 + + net/mlx5e: Recover SQ on excessive PTP TX timestamp delta + + Extend the TX timestamp handler to recover the SQ when the difference + between the port and CQE TX timestamps is abnormally large. + + The current logic aborts timestamp delivery if the delta exceeds + 1/128 seconds, which matches the maximum expected packet interval in + ptp4l. 
A larger delta makes the timestamps unreliable. + + This change adds recovery if the delta exceeds 0.5 seconds. Such a + large gap should not occur in normal operation and indicates that + firmware is stuck or metadata tracking is out of sync, leading to stale + or mismatched timestamps. Recovering the SQ ensures forward progress + and avoids silently dropping invalid timestamps. + + The timestamp handler now takes mlx5e_ptpsq directly to access both CQ + stats and the recovery state. + + Signed-off-by: Carolina Jubran + Reviewed-by: Shahar Shitrit + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1763415729-1238421-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c +index 92b57e3aaa85..bd58c1771ac0 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c +@@ -81,7 +81,7 @@ static struct mlx5e_skb_cb_hwtstamp *mlx5e_skb_cb_get_hwts(struct sk_buff *skb) + } + + static void mlx5e_skb_cb_hwtstamp_tx(struct sk_buff *skb, +- struct mlx5e_ptp_cq_stats *cq_stats) ++ struct mlx5e_ptpsq *ptpsq) + { + struct skb_shared_hwtstamps hwts = {}; + ktime_t diff; +@@ -91,8 +91,17 @@ static void mlx5e_skb_cb_hwtstamp_tx(struct sk_buff *skb, + + /* Maximal allowed diff is 1 / 128 second */ + if (diff > (NSEC_PER_SEC >> 7)) { +- cq_stats->abort++; +- cq_stats->abort_abs_diff_ns += diff; ++ struct mlx5e_txqsq *sq = &ptpsq->txqsq; ++ ++ ptpsq->cq_stats->abort++; ++ ptpsq->cq_stats->abort_abs_diff_ns += diff; ++ if (diff > (NSEC_PER_SEC >> 1) && ++ !test_and_set_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state)) { ++ netdev_warn(sq->channel->netdev, ++ "PTP TX timestamp difference between CQE and port exceeds threshold: %lld ns, recovering SQ %u\n", ++ (s64)diff, sq->sqn); ++ queue_work(sq->priv->wq, &ptpsq->report_unhealthy_work); ++ } + return; + } + +@@ -102,7 
+111,7 @@ static void mlx5e_skb_cb_hwtstamp_tx(struct sk_buff *skb, + + void mlx5e_skb_cb_hwtstamp_handler(struct sk_buff *skb, int hwtstamp_type, + ktime_t hwtstamp, +- struct mlx5e_ptp_cq_stats *cq_stats) ++ struct mlx5e_ptpsq *ptpsq) + { + switch (hwtstamp_type) { + case (MLX5E_SKB_CB_CQE_HWTSTAMP): +@@ -120,7 +129,7 @@ void mlx5e_skb_cb_hwtstamp_handler(struct sk_buff *skb, int hwtstamp_type, + !mlx5e_skb_cb_get_hwts(skb)->port_hwtstamp) + return; + +- mlx5e_skb_cb_hwtstamp_tx(skb, cq_stats); ++ mlx5e_skb_cb_hwtstamp_tx(skb, ptpsq); + memset(skb->cb, 0, sizeof(struct mlx5e_skb_cb_hwtstamp)); + } + +@@ -208,7 +217,7 @@ static void mlx5e_ptp_handle_ts_cqe(struct mlx5e_ptpsq *ptpsq, + + hwtstamp = mlx5e_cqe_ts_to_ns(sq->ptp_cyc2time, sq->clock, get_cqe_ts(cqe)); + mlx5e_skb_cb_hwtstamp_handler(skb, MLX5E_SKB_CB_PORT_HWTSTAMP, +- hwtstamp, ptpsq->cq_stats); ++ hwtstamp, ptpsq); + ptpsq->cq_stats->cqe++; + + mlx5e_ptpsq_mark_ts_cqes_undelivered(ptpsq, hwtstamp); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h +index 1c0e0a86a9ac..2a457a2ed707 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h +@@ -147,7 +147,7 @@ enum { + + void mlx5e_skb_cb_hwtstamp_handler(struct sk_buff *skb, int hwtstamp_type, + ktime_t hwtstamp, +- struct mlx5e_ptp_cq_stats *cq_stats); ++ struct mlx5e_ptpsq *ptpsq); + + void mlx5e_skb_cb_hwtstamp_init(struct sk_buff *skb); + #endif /* __MLX5_EN_PTP_H__ */ +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c +index 7ffc1cc7aa7d..6245d2f82afe 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c +@@ -749,7 +749,7 @@ static void mlx5e_consume_skb(struct mlx5e_txqsq *sq, struct sk_buff *skb, + hwts.hwtstamp = mlx5e_cqe_ts_to_ns(sq->ptp_cyc2time, sq->clock, ts); + if (sq->ptpsq) { + 
mlx5e_skb_cb_hwtstamp_handler(skb, MLX5E_SKB_CB_CQE_HWTSTAMP, +- hwts.hwtstamp, sq->ptpsq->cq_stats); ++ hwts.hwtstamp, sq->ptpsq); + } else { + skb_tstamp_tx(skb, &hwts); + sq->stats->timestamps++; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1625-net-mlx5-remove-redundant-bw-share-minimal-value-assignment.patch b/SOURCES/1625-net-mlx5-remove-redundant-bw-share-minimal-value-assignment.patch new file mode 100644 index 000000000..75d5c77bc --- /dev/null +++ b/SOURCES/1625-net-mlx5-remove-redundant-bw-share-minimal-value-assignment.patch @@ -0,0 +1,49 @@ +From 92bdadb99a3dd8e447494b29a2303172c244fa18 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:59:29 -0400 +Subject: [PATCH] net/mlx5: Remove redundant bw_share minimal value assignment + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit ea3270351c792632db5722ea3ca83b468cebb531 +Author: Carolina Jubran +Date: Mon Nov 17 23:42:07 2025 +0200 + + net/mlx5: Remove redundant bw_share minimal value assignment + + Remove unnecessary logic that sets bw_share to minimal value, when + parent has bw_share configured but nodes don't have min_rate. + + This check is redundant because the parent bandwidth acts as the upper + bound regardless, and the firmware always enforces the topmost + bandwidth constraint. 
+ + Signed-off-by: Carolina Jubran + Reviewed-by: Cosmin Ratiu + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1763415729-1238421-4-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +index 56e6f54b1e2e..4278bcb04c72 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +@@ -341,13 +341,6 @@ static u32 esw_qos_calculate_min_rate_divider(struct mlx5_eswitch *esw, + if (max_guarantee) + return max_t(u32, max_guarantee / fw_max_bw_share, 1); + +- /* If nodes max min_rate divider is 0 but their parent has bw_share +- * configured, then set bw_share for nodes to minimal value. +- */ +- +- if (parent && parent->bw_share) +- return 1; +- + /* If the node nodes has min_rate configured, a divider of 0 sets all + * nodes' bw_share to 0, effectively disabling min guarantees. + */ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1626-net-mlx5-abort-new-commands-if-all-command-slots-are-stalled.patch b/SOURCES/1626-net-mlx5-abort-new-commands-if-all-command-slots-are-stalled.patch new file mode 100644 index 000000000..0b689ab7e --- /dev/null +++ b/SOURCES/1626-net-mlx5-abort-new-commands-if-all-command-slots-are-stalled.patch @@ -0,0 +1,134 @@ +From 3014c5aefcf7f51a5cab8c54d694939d8a5580de Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:59:30 -0400 +Subject: [PATCH] net/mlx5: Abort new commands if all command slots are stalled + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit fbb9933666e31f84c62e9620e9ec4d220ee31ab4 +Author: Saeed Mahameed +Date: Mon Nov 17 23:42:08 2025 +0200 + + net/mlx5: Abort new commands if all command slots are stalled + + In case of a FW issue, FW might be not responding to FW commands, + causing kernel lockout for a long period of time, e.g. 
rtnl_lock held + while ethtool is trying to collect stats waiting for FW to respond to + multiple commands, when all of them will timeout. + + While there's no immediate indication of the FW lockout, we can safely + assume that something is wrong when all command slots are busy and in + a timeout state and no FW completion was received on any of them. + + In such case, start immediately failing new commands. + + Signed-off-by: Saeed Mahameed + Reviewed-by: Moshe Shemesh + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1763415729-1238421-5-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +index 722282cebce9..5b08e5ffe0e2 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +@@ -181,6 +181,7 @@ static int cmd_alloc_index(struct mlx5_cmd *cmd, struct mlx5_cmd_work_ent *ent) + static void cmd_free_index(struct mlx5_cmd *cmd, int idx) + { + lockdep_assert_held(&cmd->alloc_lock); ++ cmd->ent_arr[idx] = NULL; + set_bit(idx, &cmd->vars.bitmask); + } + +@@ -1200,6 +1201,44 @@ static int wait_func(struct mlx5_core_dev *dev, struct mlx5_cmd_work_ent *ent) + return err; + } + ++/* Check if all command slots are stalled (timed out and not recovered). ++ * returns true if all slots timed out on a recent command and have not been ++ * completed by FW yet. (stalled state) ++ * false otherwise (at least one slot is not stalled). ++ * ++ * In such odd situation "all_stalled", this serves as a protection mechanism ++ * to avoid blocking the kernel for long periods of time in case FW is not ++ * responding to commands. 
++ */ ++static bool mlx5_cmd_all_stalled(struct mlx5_core_dev *dev) ++{ ++ struct mlx5_cmd *cmd = &dev->cmd; ++ bool all_stalled = true; ++ unsigned long flags; ++ int i; ++ ++ spin_lock_irqsave(&cmd->alloc_lock, flags); ++ ++ /* at least one command slot is free */ ++ if (bitmap_weight(&cmd->vars.bitmask, cmd->vars.max_reg_cmds) > 0) { ++ all_stalled = false; ++ goto out; ++ } ++ ++ for_each_clear_bit(i, &cmd->vars.bitmask, cmd->vars.max_reg_cmds) { ++ struct mlx5_cmd_work_ent *ent = dev->cmd.ent_arr[i]; ++ ++ if (!test_bit(MLX5_CMD_ENT_STATE_TIMEDOUT, &ent->state)) { ++ all_stalled = false; ++ break; ++ } ++ } ++out: ++ spin_unlock_irqrestore(&cmd->alloc_lock, flags); ++ ++ return all_stalled; ++} ++ + /* Notes: + * 1. Callback functions may not sleep + * 2. page queue commands do not support asynchrous completion +@@ -1230,6 +1269,15 @@ static int mlx5_cmd_invoke(struct mlx5_core_dev *dev, struct mlx5_cmd_msg *in, + if (callback && page_queue) + return -EINVAL; + ++ if (!page_queue && mlx5_cmd_all_stalled(dev)) { ++ mlx5_core_err_rl(dev, ++ "All CMD slots are stalled, aborting command\n"); ++ /* there's no reason to wait and block the whole kernel if FW ++ * isn't currently responding to all slots, fail immediately ++ */ ++ return -EAGAIN; ++ } ++ + ent = cmd_alloc_ent(cmd, in, out, uout, uout_size, + callback, context, page_queue); + if (IS_ERR(ent)) +@@ -1700,6 +1748,13 @@ static void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec, bool force + if (test_bit(i, &vector)) { + ent = cmd->ent_arr[i]; + ++ if (forced && ent->ret == -ETIMEDOUT) ++ set_bit(MLX5_CMD_ENT_STATE_TIMEDOUT, ++ &ent->state); ++ else if (!forced) /* real FW completion */ ++ clear_bit(MLX5_CMD_ENT_STATE_TIMEDOUT, ++ &ent->state); ++ + /* if we already completed the command, ignore it */ + if (!test_and_clear_bit(MLX5_CMD_ENT_STATE_PENDING_COMP, + &ent->state)) { +diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h +index 046396269ccf..7aec53371cf0 100644 +--- 
a/include/linux/mlx5/driver.h ++++ b/include/linux/mlx5/driver.h +@@ -819,6 +819,7 @@ typedef void (*mlx5_cmd_cbk_t)(int status, void *context); + + enum { + MLX5_CMD_ENT_STATE_PENDING_COMP, ++ MLX5_CMD_ENT_STATE_TIMEDOUT, + }; + + struct mlx5_cmd_work_ent { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1627-net-mlx5-use-eopnotsupp-instead-of-enotsupp.patch b/SOURCES/1627-net-mlx5-use-eopnotsupp-instead-of-enotsupp.patch new file mode 100644 index 000000000..ac5b25cb6 --- /dev/null +++ b/SOURCES/1627-net-mlx5-use-eopnotsupp-instead-of-enotsupp.patch @@ -0,0 +1,113 @@ +From c9316c275fc464f96c66f97cfcef95809cb67940 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:59:30 -0400 +Subject: [PATCH] net/mlx5: Use EOPNOTSUPP instead of ENOTSUPP + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 70ca239b612cd154c9828fe4d0093fb9bd02a6c7 +Author: Tariq Toukan +Date: Mon Nov 17 23:42:09 2025 +0200 + + net/mlx5: Use EOPNOTSUPP instead of ENOTSUPP + + Per Documentation/dev-tools/checkpatch.rst, ENOTSUPP is not a standard + error code and should be avoided. EOPNOTSUPP should be used instead. 
+ + Signed-off-by: Tariq Toukan + Reviewed-by: Gal Pressman + Link: https://patch.msgid.link/1763415729-1238421-6-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c +index 080e7eab52c7..7bcf822a89f9 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c +@@ -54,7 +54,7 @@ static int mlx5_query_mtrc_caps(struct mlx5_fw_tracer *tracer) + + if (!MLX5_GET(mtrc_cap, out, trace_to_memory)) { + mlx5_core_dbg(dev, "FWTracer: Device does not support logging traces to memory\n"); +- return -ENOTSUPP; ++ return -EOPNOTSUPP; + } + + tracer->trc_ver = MLX5_GET(mtrc_cap, out, trc_ver); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c +index 79916f1abd14..63bdef5b4ba5 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c +@@ -704,7 +704,7 @@ static int validate_flow(struct mlx5e_priv *priv, + num_tuples += ret; + break; + default: +- return -ENOTSUPP; ++ return -EOPNOTSUPP; + } + if ((fs->flow_type & FLOW_EXT)) { + ret = validate_vlan(fs); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/core.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/core.c +index e5c1012921d2..1ec61164e6b5 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/core.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/core.c +@@ -211,7 +211,7 @@ int mlx5_fpga_device_start(struct mlx5_core_dev *mdev) + max_num_qps = MLX5_CAP_FPGA(mdev, shell_caps.max_num_qps); + if (!max_num_qps) { + mlx5_fpga_err(fdev, "FPGA reports 0 QPs in SHELL_CAPS\n"); +- err = -ENOTSUPP; ++ err = -EOPNOTSUPP; + goto out; + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/vxlan.c 
b/drivers/net/ethernet/mellanox/mlx5/core/lib/vxlan.c +index d55e15c1f380..304912637c35 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/vxlan.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/vxlan.c +@@ -149,7 +149,7 @@ struct mlx5_vxlan *mlx5_vxlan_create(struct mlx5_core_dev *mdev) + struct mlx5_vxlan *vxlan; + + if (!MLX5_CAP_ETH(mdev, tunnel_stateless_vxlan) || !mlx5_core_is_pf(mdev)) +- return ERR_PTR(-ENOTSUPP); ++ return ERR_PTR(-EOPNOTSUPP); + + vxlan = kzalloc(sizeof(*vxlan), GFP_KERNEL); + if (!vxlan) +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_domain.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_domain.c +index 65740bb68b09..e8c67ed9f748 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_domain.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_domain.c +@@ -410,7 +410,7 @@ static int dr_domain_caps_init(struct mlx5_core_dev *mdev, + switch (dmn->type) { + case MLX5DR_DOMAIN_TYPE_NIC_RX: + if (!DR_DOMAIN_SW_STEERING_SUPPORTED(dmn, rx)) +- return -ENOTSUPP; ++ return -EOPNOTSUPP; + + dmn->info.supp_sw_steering = true; + dmn->info.rx.type = DR_DOMAIN_NIC_TYPE_RX; +@@ -419,7 +419,7 @@ static int dr_domain_caps_init(struct mlx5_core_dev *mdev, + break; + case MLX5DR_DOMAIN_TYPE_NIC_TX: + if (!DR_DOMAIN_SW_STEERING_SUPPORTED(dmn, tx)) +- return -ENOTSUPP; ++ return -EOPNOTSUPP; + + dmn->info.supp_sw_steering = true; + dmn->info.tx.type = DR_DOMAIN_NIC_TYPE_TX; +@@ -428,10 +428,10 @@ static int dr_domain_caps_init(struct mlx5_core_dev *mdev, + break; + case MLX5DR_DOMAIN_TYPE_FDB: + if (!dmn->info.caps.eswitch_manager) +- return -ENOTSUPP; ++ return -EOPNOTSUPP; + + if (!DR_DOMAIN_SW_STEERING_SUPPORTED(dmn, fdb)) +- return -ENOTSUPP; ++ return -EOPNOTSUPP; + + dmn->info.rx.type = DR_DOMAIN_NIC_TYPE_RX; + dmn->info.tx.type = DR_DOMAIN_NIC_TYPE_TX; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1628-net-mlx5-initialize-events-outside-devlink-lock.patch 
b/SOURCES/1628-net-mlx5-initialize-events-outside-devlink-lock.patch new file mode 100644 index 000000000..26d68cc69 --- /dev/null +++ b/SOURCES/1628-net-mlx5-initialize-events-outside-devlink-lock.patch @@ -0,0 +1,116 @@ +From 1c5f23db129a2e1115918a8cf4b6cf5062350f2e Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:59:30 -0400 +Subject: [PATCH] net/mlx5: Initialize events outside devlink lock + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit b6b03097f9826db72aeb3f751774c5e9edd9a5b3 +Author: Cosmin Ratiu +Date: Sun Nov 16 22:45:35 2025 +0200 + + net/mlx5: Initialize events outside devlink lock + + Move event init/cleanup outside of mlx5_init_one() / mlx5_uninit_one() + and into the mlx5_mdev_init() / mlx5_mdev_uninit() functions. + + By doing this, we avoid the events being reinitialized on devlink reload + and, more importantly, the events->sw_nh notifier chain becomes + available earlier in the init procedure, which will be used in + subsequent patches. This makes sense because the events struct is pure + software, independent of any HW details. 
+ + Signed-off-by: Cosmin Ratiu + Reviewed-by: Carolina Jubran + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1763325940-1231508-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c +index b0d8d9888629..f73e1b5e13e3 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c +@@ -1010,16 +1010,10 @@ static int mlx5_init_once(struct mlx5_core_dev *dev) + goto err_irq_cleanup; + } + +- err = mlx5_events_init(dev); +- if (err) { +- mlx5_core_err(dev, "failed to initialize events\n"); +- goto err_eq_cleanup; +- } +- + err = mlx5_fw_reset_init(dev); + if (err) { + mlx5_core_err(dev, "failed to initialize fw reset events\n"); +- goto err_events_cleanup; ++ goto err_eq_cleanup; + } + + mlx5_cq_debugfs_init(dev); +@@ -1121,8 +1115,6 @@ static int mlx5_init_once(struct mlx5_core_dev *dev) + mlx5_cleanup_reserved_gids(dev); + mlx5_cq_debugfs_cleanup(dev); + mlx5_fw_reset_cleanup(dev); +-err_events_cleanup: +- mlx5_events_cleanup(dev); + err_eq_cleanup: + mlx5_eq_table_cleanup(dev); + err_irq_cleanup: +@@ -1155,7 +1147,6 @@ static void mlx5_cleanup_once(struct mlx5_core_dev *dev) + mlx5_cleanup_reserved_gids(dev); + mlx5_cq_debugfs_cleanup(dev); + mlx5_fw_reset_cleanup(dev); +- mlx5_events_cleanup(dev); + mlx5_eq_table_cleanup(dev); + mlx5_irq_table_cleanup(dev); + mlx5_devcom_unregister_device(dev->priv.devc); +@@ -1833,6 +1824,24 @@ static int vhca_id_show(struct seq_file *file, void *priv) + + DEFINE_SHOW_ATTRIBUTE(vhca_id); + ++static int mlx5_notifiers_init(struct mlx5_core_dev *dev) ++{ ++ int err; ++ ++ err = mlx5_events_init(dev); ++ if (err) { ++ mlx5_core_err(dev, "failed to initialize events\n"); ++ return err; ++ } ++ ++ return 0; ++} ++ ++static void mlx5_notifiers_cleanup(struct mlx5_core_dev *dev) ++{ ++ mlx5_events_cleanup(dev); ++} ++ + 
int mlx5_mdev_init(struct mlx5_core_dev *dev, int profile_idx) + { + struct mlx5_priv *priv = &dev->priv; +@@ -1888,6 +1897,10 @@ int mlx5_mdev_init(struct mlx5_core_dev *dev, int profile_idx) + if (err) + goto err_hca_caps; + ++ err = mlx5_notifiers_init(dev); ++ if (err) ++ goto err_hca_caps; ++ + /* The conjunction of sw_vhca_id with sw_owner_id will be a global + * unique id per function which uses mlx5_core. + * Those values are supplied to FW as part of the init HCA command to +@@ -1930,6 +1943,7 @@ void mlx5_mdev_uninit(struct mlx5_core_dev *dev) + if (priv->sw_vhca_id > 0) + ida_free(&sw_vhca_ida, dev->priv.sw_vhca_id); + ++ mlx5_notifiers_cleanup(dev); + mlx5_hca_caps_free(dev); + mlx5_adev_cleanup(dev); + mlx5_pagealloc_cleanup(dev); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1629-net-mlx5-move-the-esw-mode-notifier-chain-outside-the-devlin.patch b/SOURCES/1629-net-mlx5-move-the-esw-mode-notifier-chain-outside-the-devlin.patch new file mode 100644 index 000000000..92d29c205 --- /dev/null +++ b/SOURCES/1629-net-mlx5-move-the-esw-mode-notifier-chain-outside-the-devlin.patch @@ -0,0 +1,156 @@ +From 6cd2c32c9bfb5ce6ce06065133fe15058050cef4 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:59:30 -0400 +Subject: [PATCH] net/mlx5: Move the esw mode notifier chain outside the + devlink lock + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 3fee828789b1cf294a8fc83ad8a37f644c174fae +Author: Cosmin Ratiu +Date: Sun Nov 16 22:45:36 2025 +0200 + + net/mlx5: Move the esw mode notifier chain outside the devlink lock + + The esw mode change notifier chain is initialized/cleaned up in + mlx5_init_one() / mlx5_uninit_one() with the devlink lock held. + + Move the notifier head from the eswitch struct into mlx5_priv directly, + and initialize it outside the critical section. This will allow notifier + registration to happen earlier in the init procedure in subsequent + patches. 
+ + Signed-off-by: Cosmin Ratiu + Reviewed-by: Carolina Jubran + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1763325940-1231508-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +index 25af8bd7f077..3adf2b1cd26a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +@@ -1474,7 +1474,7 @@ static void mlx5_esw_mode_change_notify(struct mlx5_eswitch *esw, u16 mode) + + info.new_mode = mode; + +- blocking_notifier_call_chain(&esw->n_head, 0, &info); ++ blocking_notifier_call_chain(&esw->dev->priv.esw_n_head, 0, &info); + } + + static int mlx5_esw_egress_acls_init(struct mlx5_core_dev *dev) +@@ -2050,7 +2050,6 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev) + esw->offloads.encap = DEVLINK_ESWITCH_ENCAP_MODE_BASIC; + else + esw->offloads.encap = DEVLINK_ESWITCH_ENCAP_MODE_NONE; +- BLOCKING_INIT_NOTIFIER_HEAD(&esw->n_head); + + esw_info(dev, + "Total vports %d, per vport: max uc(%d) max mc(%d)\n", +@@ -2379,14 +2378,16 @@ bool mlx5_esw_multipath_prereq(struct mlx5_core_dev *dev0, + dev1->priv.eswitch->mode == MLX5_ESWITCH_OFFLOADS); + } + +-int mlx5_esw_event_notifier_register(struct mlx5_eswitch *esw, struct notifier_block *nb) ++int mlx5_esw_event_notifier_register(struct mlx5_core_dev *dev, ++ struct notifier_block *nb) + { +- return blocking_notifier_chain_register(&esw->n_head, nb); ++ return blocking_notifier_chain_register(&dev->priv.esw_n_head, nb); + } + +-void mlx5_esw_event_notifier_unregister(struct mlx5_eswitch *esw, struct notifier_block *nb) ++void mlx5_esw_event_notifier_unregister(struct mlx5_core_dev *dev, ++ struct notifier_block *nb) + { +- blocking_notifier_chain_unregister(&esw->n_head, nb); ++ blocking_notifier_chain_unregister(&dev->priv.esw_n_head, nb); + } + + /** +diff --git 
a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +index beaec450a734..ad1073f7b79f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +@@ -403,7 +403,6 @@ struct mlx5_eswitch { + struct { + u32 large_group_num; + } params; +- struct blocking_notifier_head n_head; + struct xarray paired; + struct mlx5_devcom_comp_dev *devcom; + u16 enabled_ipsec_vf_count; +@@ -864,8 +863,10 @@ struct mlx5_esw_event_info { + u16 new_mode; + }; + +-int mlx5_esw_event_notifier_register(struct mlx5_eswitch *esw, struct notifier_block *n); +-void mlx5_esw_event_notifier_unregister(struct mlx5_eswitch *esw, struct notifier_block *n); ++int mlx5_esw_event_notifier_register(struct mlx5_core_dev *dev, ++ struct notifier_block *n); ++void mlx5_esw_event_notifier_unregister(struct mlx5_core_dev *dev, ++ struct notifier_block *n); + + bool mlx5_esw_hold(struct mlx5_core_dev *dev); + void mlx5_esw_release(struct mlx5_core_dev *dev); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c +index f73e1b5e13e3..eb1bb6ecbff3 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c +@@ -1834,6 +1834,8 @@ static int mlx5_notifiers_init(struct mlx5_core_dev *dev) + return err; + } + ++ BLOCKING_INIT_NOTIFIER_HEAD(&dev->priv.esw_n_head); ++ + return 0; + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c +index 3304f25cc805..2ece4983d33f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c +@@ -481,7 +481,7 @@ int mlx5_sf_table_init(struct mlx5_core_dev *dev) + xa_init(&table->function_ids); + dev->priv.sf_table = table; + table->esw_nb.notifier_call = mlx5_sf_esw_event; +- err = 
mlx5_esw_event_notifier_register(dev->priv.eswitch, &table->esw_nb); ++ err = mlx5_esw_event_notifier_register(dev, &table->esw_nb); + if (err) + goto reg_err; + +@@ -496,7 +496,7 @@ int mlx5_sf_table_init(struct mlx5_core_dev *dev) + return 0; + + vhca_err: +- mlx5_esw_event_notifier_unregister(dev->priv.eswitch, &table->esw_nb); ++ mlx5_esw_event_notifier_unregister(dev, &table->esw_nb); + reg_err: + mutex_destroy(&table->sf_state_lock); + kfree(table); +@@ -513,7 +513,7 @@ void mlx5_sf_table_cleanup(struct mlx5_core_dev *dev) + + mlx5_blocking_notifier_unregister(dev, &table->mdev_nb); + mlx5_vhca_event_notifier_unregister(table->dev, &table->vhca_nb); +- mlx5_esw_event_notifier_unregister(dev->priv.eswitch, &table->esw_nb); ++ mlx5_esw_event_notifier_unregister(dev, &table->esw_nb); + mutex_destroy(&table->sf_state_lock); + WARN_ON(!xa_empty(&table->function_ids)); + kfree(table); +diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h +index 7aec53371cf0..9a4a5112a59e 100644 +--- a/include/linux/mlx5/driver.h ++++ b/include/linux/mlx5/driver.h +@@ -599,6 +599,7 @@ struct mlx5_priv { + + struct mlx5_flow_steering *steering; + struct mlx5_mpfs *mpfs; ++ struct blocking_notifier_head esw_n_head; + struct mlx5_eswitch *eswitch; + struct mlx5_core_sriov sriov; + struct mlx5_lag *lag; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1630-net-mlx5-move-the-vhca-event-notifier-outside-of-the-devlink.patch b/SOURCES/1630-net-mlx5-move-the-vhca-event-notifier-outside-of-the-devlink.patch new file mode 100644 index 000000000..261f61c09 --- /dev/null +++ b/SOURCES/1630-net-mlx5-move-the-vhca-event-notifier-outside-of-the-devlink.patch @@ -0,0 +1,303 @@ +From 8b70fca3cb47e9d8e587a3bcd5500848d3733c5d Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:59:30 -0400 +Subject: [PATCH] net/mlx5: Move the vhca event notifier outside of the devlink + lock + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 
d3a356db853bc2dfb51034eacafd41aca7dd4c37 +Author: Cosmin Ratiu +Date: Sun Nov 16 22:45:37 2025 +0200 + + net/mlx5: Move the vhca event notifier outside of the devlink lock + + The vhca event notifier consists of an atomic notifier for vhca state + changes (used for SF events), multiple workqueues and a blocking + notifier chain for delivering the vhca state change events for further + processing. + + This patch moves the vhca notifier head outside of mlx5_init_one() / + mlx5_uninit_one() and into the mlx5_mdev_init() / mlx5_mdev_uninit() + functions. + + This allows called notifiers to grab the PF devlink lock which was + previously impossible because it would create a circular lock + dependency. + + mlx5_vhca_event_stop() is now called earlier in the cleanup phase and + flushes the workqueues to ensure that after the call, there are no + pending events. This simplifies the cleanup flow for vhca event + consumers. + + Signed-off-by: Cosmin Ratiu + Reviewed-by: Carolina Jubran + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1763325940-1231508-4-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c +index eb1bb6ecbff3..6adaa1514dad 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c +@@ -1438,12 +1438,12 @@ static void mlx5_unload(struct mlx5_core_dev *dev) + { + mlx5_eswitch_disable(dev->priv.eswitch); + mlx5_devlink_traps_unregister(priv_to_devlink(dev)); ++ mlx5_vhca_event_stop(dev); + mlx5_sf_dev_table_destroy(dev); + mlx5_sriov_detach(dev); + mlx5_lag_remove_mdev(dev); + mlx5_ec_cleanup(dev); + mlx5_sf_hw_table_destroy(dev); +- mlx5_vhca_event_stop(dev); + mlx5_fs_core_cleanup(dev); + mlx5_fpga_device_stop(dev); + mlx5_rsc_dump_cleanup(dev); +@@ -1835,6 +1835,7 @@ static int mlx5_notifiers_init(struct mlx5_core_dev *dev) + } + + 
BLOCKING_INIT_NOTIFIER_HEAD(&dev->priv.esw_n_head); ++ mlx5_vhca_state_notifier_init(dev); + + return 0; + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c +index 99219ea52c4b..a68a8ee24dce 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c +@@ -381,7 +381,6 @@ void mlx5_sf_dev_table_destroy(struct mlx5_core_dev *dev) + + mlx5_sf_dev_destroy_active_works(table); + mlx5_vhca_event_notifier_unregister(dev, &table->nb); +- mlx5_vhca_event_work_queues_flush(dev); + + /* Now that event handler is not running, it is safe to destroy + * the sf device without race. +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c +index 1f613320fe07..a14b1aa5fb5a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c +@@ -389,6 +389,7 @@ void mlx5_sf_hw_table_destroy(struct mlx5_core_dev *dev) + return; + + mlx5_vhca_event_notifier_unregister(dev, &table->vhca_nb); ++ + /* Dealloc SFs whose firmware event has been missed. 
*/ + mlx5_sf_hw_table_dealloc_all(table); + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/vhca_event.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/vhca_event.c +index cda01ba441ae..b04cf6cf8956 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/vhca_event.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/vhca_event.c +@@ -9,15 +9,9 @@ + #define CREATE_TRACE_POINTS + #include "diag/vhca_tracepoint.h" + +-struct mlx5_vhca_state_notifier { +- struct mlx5_core_dev *dev; +- struct mlx5_nb nb; +- struct blocking_notifier_head n_head; +-}; +- + struct mlx5_vhca_event_work { + struct work_struct work; +- struct mlx5_vhca_state_notifier *notifier; ++ struct mlx5_core_dev *dev; + struct mlx5_vhca_state_event event; + }; + +@@ -95,16 +89,14 @@ mlx5_vhca_event_notify(struct mlx5_core_dev *dev, struct mlx5_vhca_state_event * + mlx5_vhca_event_arm(dev, event->function_id); + trace_mlx5_sf_vhca_event(dev, event); + +- blocking_notifier_call_chain(&dev->priv.vhca_state_notifier->n_head, 0, event); ++ blocking_notifier_call_chain(&dev->priv.vhca_state_n_head, 0, event); + } + + static void mlx5_vhca_state_work_handler(struct work_struct *_work) + { + struct mlx5_vhca_event_work *work = container_of(_work, struct mlx5_vhca_event_work, work); +- struct mlx5_vhca_state_notifier *notifier = work->notifier; +- struct mlx5_core_dev *dev = notifier->dev; + +- mlx5_vhca_event_notify(dev, &work->event); ++ mlx5_vhca_event_notify(work->dev, &work->event); + kfree(work); + } + +@@ -116,8 +108,8 @@ void mlx5_vhca_events_work_enqueue(struct mlx5_core_dev *dev, int idx, struct wo + static int + mlx5_vhca_state_change_notifier(struct notifier_block *nb, unsigned long type, void *data) + { +- struct mlx5_vhca_state_notifier *notifier = +- mlx5_nb_cof(nb, struct mlx5_vhca_state_notifier, nb); ++ struct mlx5_core_dev *dev = mlx5_nb_cof(nb, struct mlx5_core_dev, ++ priv.vhca_state_nb); + struct mlx5_vhca_event_work *work; + struct mlx5_eqe *eqe = data; + int wq_idx; +@@ -126,10 
+118,10 @@ mlx5_vhca_state_change_notifier(struct notifier_block *nb, unsigned long type, v + if (!work) + return NOTIFY_DONE; + INIT_WORK(&work->work, &mlx5_vhca_state_work_handler); +- work->notifier = notifier; ++ work->dev = dev; + work->event.function_id = be16_to_cpu(eqe->data.vhca_state.function_id); + wq_idx = work->event.function_id % MLX5_DEV_MAX_WQS; +- mlx5_vhca_events_work_enqueue(notifier->dev, wq_idx, &work->work); ++ mlx5_vhca_events_work_enqueue(dev, wq_idx, &work->work); + return NOTIFY_OK; + } + +@@ -145,9 +137,15 @@ void mlx5_vhca_state_cap_handle(struct mlx5_core_dev *dev, void *set_hca_cap) + MLX5_SET(cmd_hca_cap, set_hca_cap, event_on_vhca_state_teardown_request, 1); + } + ++void mlx5_vhca_state_notifier_init(struct mlx5_core_dev *dev) ++{ ++ BLOCKING_INIT_NOTIFIER_HEAD(&dev->priv.vhca_state_n_head); ++ MLX5_NB_INIT(&dev->priv.vhca_state_nb, mlx5_vhca_state_change_notifier, ++ VHCA_STATE_CHANGE); ++} ++ + int mlx5_vhca_event_init(struct mlx5_core_dev *dev) + { +- struct mlx5_vhca_state_notifier *notifier; + char wq_name[MLX5_CMD_WQ_MAX_NAME]; + struct mlx5_vhca_events *events; + int err, i; +@@ -160,7 +158,6 @@ int mlx5_vhca_event_init(struct mlx5_core_dev *dev) + return -ENOMEM; + + events->dev = dev; +- dev->priv.vhca_events = events; + for (i = 0; i < MLX5_DEV_MAX_WQS; i++) { + snprintf(wq_name, MLX5_CMD_WQ_MAX_NAME, "mlx5_vhca_event%d", i); + events->handler[i].wq = create_singlethread_workqueue(wq_name); +@@ -169,20 +166,10 @@ int mlx5_vhca_event_init(struct mlx5_core_dev *dev) + goto err_create_wq; + } + } ++ dev->priv.vhca_events = events; + +- notifier = kzalloc(sizeof(*notifier), GFP_KERNEL); +- if (!notifier) { +- err = -ENOMEM; +- goto err_notifier; +- } +- +- dev->priv.vhca_state_notifier = notifier; +- notifier->dev = dev; +- BLOCKING_INIT_NOTIFIER_HEAD(&notifier->n_head); +- MLX5_NB_INIT(&notifier->nb, mlx5_vhca_state_change_notifier, VHCA_STATE_CHANGE); + return 0; + +-err_notifier: + err_create_wq: + for (--i; i >= 0; i--) + 
destroy_workqueue(events->handler[i].wq); +@@ -211,8 +198,6 @@ void mlx5_vhca_event_cleanup(struct mlx5_core_dev *dev) + if (!mlx5_vhca_event_supported(dev)) + return; + +- kfree(dev->priv.vhca_state_notifier); +- dev->priv.vhca_state_notifier = NULL; + vhca_events = dev->priv.vhca_events; + for (i = 0; i < MLX5_DEV_MAX_WQS; i++) + destroy_workqueue(vhca_events->handler[i].wq); +@@ -221,34 +206,30 @@ void mlx5_vhca_event_cleanup(struct mlx5_core_dev *dev) + + void mlx5_vhca_event_start(struct mlx5_core_dev *dev) + { +- struct mlx5_vhca_state_notifier *notifier; +- +- if (!dev->priv.vhca_state_notifier) ++ if (!mlx5_vhca_event_supported(dev)) + return; + +- notifier = dev->priv.vhca_state_notifier; +- mlx5_eq_notifier_register(dev, &notifier->nb); ++ mlx5_eq_notifier_register(dev, &dev->priv.vhca_state_nb); + } + + void mlx5_vhca_event_stop(struct mlx5_core_dev *dev) + { +- struct mlx5_vhca_state_notifier *notifier; +- +- if (!dev->priv.vhca_state_notifier) ++ if (!mlx5_vhca_event_supported(dev)) + return; + +- notifier = dev->priv.vhca_state_notifier; +- mlx5_eq_notifier_unregister(dev, &notifier->nb); ++ mlx5_eq_notifier_unregister(dev, &dev->priv.vhca_state_nb); ++ ++ /* Flush workqueues of all pending events. 
*/ ++ mlx5_vhca_event_work_queues_flush(dev); + } + + int mlx5_vhca_event_notifier_register(struct mlx5_core_dev *dev, struct notifier_block *nb) + { +- if (!dev->priv.vhca_state_notifier) +- return -EOPNOTSUPP; +- return blocking_notifier_chain_register(&dev->priv.vhca_state_notifier->n_head, nb); ++ return blocking_notifier_chain_register(&dev->priv.vhca_state_n_head, ++ nb); + } + + void mlx5_vhca_event_notifier_unregister(struct mlx5_core_dev *dev, struct notifier_block *nb) + { +- blocking_notifier_chain_unregister(&dev->priv.vhca_state_notifier->n_head, nb); ++ blocking_notifier_chain_unregister(&dev->priv.vhca_state_n_head, nb); + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/vhca_event.h b/drivers/net/ethernet/mellanox/mlx5/core/sf/vhca_event.h +index 1725ba64f8af..52790423874c 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/vhca_event.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/vhca_event.h +@@ -18,6 +18,7 @@ static inline bool mlx5_vhca_event_supported(const struct mlx5_core_dev *dev) + } + + void mlx5_vhca_state_cap_handle(struct mlx5_core_dev *dev, void *set_hca_cap); ++void mlx5_vhca_state_notifier_init(struct mlx5_core_dev *dev); + int mlx5_vhca_event_init(struct mlx5_core_dev *dev); + void mlx5_vhca_event_cleanup(struct mlx5_core_dev *dev); + void mlx5_vhca_event_start(struct mlx5_core_dev *dev); +@@ -37,6 +38,10 @@ static inline void mlx5_vhca_state_cap_handle(struct mlx5_core_dev *dev, void *s + { + } + ++static inline void mlx5_vhca_state_notifier_init(struct mlx5_core_dev *dev) ++{ ++} ++ + static inline int mlx5_vhca_event_init(struct mlx5_core_dev *dev) + { + return 0; +diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h +index 9a4a5112a59e..88afb2788dc9 100644 +--- a/include/linux/mlx5/driver.h ++++ b/include/linux/mlx5/driver.h +@@ -488,7 +488,6 @@ struct mlx5_devcom_dev; + struct mlx5_fw_reset; + struct mlx5_eq_table; + struct mlx5_irq_table; +-struct mlx5_vhca_state_notifier; + struct 
mlx5_sf_dev_table; + struct mlx5_sf_hw_table; + struct mlx5_sf_table; +@@ -615,7 +614,8 @@ struct mlx5_priv { + struct mlx5_bfreg_data bfregs; + struct mlx5_sq_bfreg bfreg; + #ifdef CONFIG_MLX5_SF +- struct mlx5_vhca_state_notifier *vhca_state_notifier; ++ struct mlx5_nb vhca_state_nb; ++ struct blocking_notifier_head vhca_state_n_head; + struct mlx5_sf_dev_table *sf_dev_table; + struct mlx5_core_dev *parent_mdev; + #endif +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1631-net-mlx5-move-the-sf-hw-table-notifier-outside-the-devlink-l.patch b/SOURCES/1631-net-mlx5-move-the-sf-hw-table-notifier-outside-the-devlink-l.patch new file mode 100644 index 000000000..5dbd74984 --- /dev/null +++ b/SOURCES/1631-net-mlx5-move-the-sf-hw-table-notifier-outside-the-devlink-l.patch @@ -0,0 +1,295 @@ +From 53ae24ef64e0191f8ec9348bd4dc4f98e939bfc4 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:59:30 -0400 +Subject: [PATCH] net/mlx5: Move the SF HW table notifier outside the devlink + lock + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit e63c9c5f0a4802deea81a48c2c40d0af56153e8a +Author: Cosmin Ratiu +Date: Sun Nov 16 22:45:38 2025 +0200 + + net/mlx5: Move the SF HW table notifier outside the devlink lock + + Move the SF HW table notifier registration/unregistration outside of + mlx5_init_one() / mlx5_uninit_one() and into the mlx5_mdev_init() / + mlx5_mdev_uninit() functions. + + This is only done for non-SFs, since SFs do not have a SF HW table + themselves. 
+ + Signed-off-by: Cosmin Ratiu + Reviewed-by: Carolina Jubran + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1763325940-1231508-5-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c +index 6adaa1514dad..91cb5b45300f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c +@@ -1377,12 +1377,6 @@ static int mlx5_load(struct mlx5_core_dev *dev) + + mlx5_vhca_event_start(dev); + +- err = mlx5_sf_hw_table_create(dev); +- if (err) { +- mlx5_core_err(dev, "sf table create failed %d\n", err); +- goto err_vhca; +- } +- + err = mlx5_ec_init(dev); + if (err) { + mlx5_core_err(dev, "Failed to init embedded CPU\n"); +@@ -1411,8 +1405,6 @@ static int mlx5_load(struct mlx5_core_dev *dev) + mlx5_lag_remove_mdev(dev); + mlx5_ec_cleanup(dev); + err_ec: +- mlx5_sf_hw_table_destroy(dev); +-err_vhca: + mlx5_vhca_event_stop(dev); + err_set_hca: + mlx5_fs_core_cleanup(dev); +@@ -1837,11 +1829,20 @@ static int mlx5_notifiers_init(struct mlx5_core_dev *dev) + BLOCKING_INIT_NOTIFIER_HEAD(&dev->priv.esw_n_head); + mlx5_vhca_state_notifier_init(dev); + ++ err = mlx5_sf_hw_notifier_init(dev); ++ if (err) ++ goto err_sf_hw_notifier; ++ + return 0; ++ ++err_sf_hw_notifier: ++ mlx5_events_cleanup(dev); ++ return err; + } + + static void mlx5_notifiers_cleanup(struct mlx5_core_dev *dev) + { ++ mlx5_sf_hw_notifier_cleanup(dev); + mlx5_events_cleanup(dev); + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c +index a14b1aa5fb5a..bd968f3b3855 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c +@@ -30,9 +30,7 @@ enum mlx5_sf_hwc_index { + }; + + struct mlx5_sf_hw_table { +- struct mlx5_core_dev *dev; + struct mutex 
table_lock; /* Serializes sf deletion and vhca state change handler. */ +- struct notifier_block vhca_nb; + struct mlx5_sf_hwc_table hwc[MLX5_SF_HWC_MAX]; + }; + +@@ -71,14 +69,16 @@ mlx5_sf_table_fn_to_hwc(struct mlx5_sf_hw_table *table, u16 fn_id) + return NULL; + } + +-static int mlx5_sf_hw_table_id_alloc(struct mlx5_sf_hw_table *table, u32 controller, ++static int mlx5_sf_hw_table_id_alloc(struct mlx5_core_dev *dev, ++ struct mlx5_sf_hw_table *table, ++ u32 controller, + u32 usr_sfnum) + { + struct mlx5_sf_hwc_table *hwc; + int free_idx = -1; + int i; + +- hwc = mlx5_sf_controller_to_hwc(table->dev, controller); ++ hwc = mlx5_sf_controller_to_hwc(dev, controller); + if (!hwc->sfs) + return -ENOSPC; + +@@ -100,11 +100,13 @@ static int mlx5_sf_hw_table_id_alloc(struct mlx5_sf_hw_table *table, u32 control + return free_idx; + } + +-static void mlx5_sf_hw_table_id_free(struct mlx5_sf_hw_table *table, u32 controller, int id) ++static void mlx5_sf_hw_table_id_free(struct mlx5_core_dev *dev, ++ struct mlx5_sf_hw_table *table, ++ u32 controller, int id) + { + struct mlx5_sf_hwc_table *hwc; + +- hwc = mlx5_sf_controller_to_hwc(table->dev, controller); ++ hwc = mlx5_sf_controller_to_hwc(dev, controller); + hwc->sfs[id].allocated = false; + hwc->sfs[id].pending_delete = false; + } +@@ -120,7 +122,7 @@ int mlx5_sf_hw_table_sf_alloc(struct mlx5_core_dev *dev, u32 controller, u32 usr + return -EOPNOTSUPP; + + mutex_lock(&table->table_lock); +- sw_id = mlx5_sf_hw_table_id_alloc(table, controller, usr_sfnum); ++ sw_id = mlx5_sf_hw_table_id_alloc(dev, table, controller, usr_sfnum); + if (sw_id < 0) { + err = sw_id; + goto exist_err; +@@ -151,7 +153,7 @@ int mlx5_sf_hw_table_sf_alloc(struct mlx5_core_dev *dev, u32 controller, u32 usr + vhca_err: + mlx5_cmd_dealloc_sf(dev, hw_fn_id); + err: +- mlx5_sf_hw_table_id_free(table, controller, sw_id); ++ mlx5_sf_hw_table_id_free(dev, table, controller, sw_id); + exist_err: + mutex_unlock(&table->table_lock); + return err; +@@ -165,7 
+167,7 @@ void mlx5_sf_hw_table_sf_free(struct mlx5_core_dev *dev, u32 controller, u16 id) + mutex_lock(&table->table_lock); + hw_fn_id = mlx5_sf_sw_to_hw_id(dev, controller, id); + mlx5_cmd_dealloc_sf(dev, hw_fn_id); +- mlx5_sf_hw_table_id_free(table, controller, id); ++ mlx5_sf_hw_table_id_free(dev, table, controller, id); + mutex_unlock(&table->table_lock); + } + +@@ -216,10 +218,12 @@ static void mlx5_sf_hw_table_hwc_dealloc_all(struct mlx5_core_dev *dev, + } + } + +-static void mlx5_sf_hw_table_dealloc_all(struct mlx5_sf_hw_table *table) ++static void mlx5_sf_hw_table_dealloc_all(struct mlx5_core_dev *dev, ++ struct mlx5_sf_hw_table *table) + { +- mlx5_sf_hw_table_hwc_dealloc_all(table->dev, &table->hwc[MLX5_SF_HWC_EXTERNAL]); +- mlx5_sf_hw_table_hwc_dealloc_all(table->dev, &table->hwc[MLX5_SF_HWC_LOCAL]); ++ mlx5_sf_hw_table_hwc_dealloc_all(dev, ++ &table->hwc[MLX5_SF_HWC_EXTERNAL]); ++ mlx5_sf_hw_table_hwc_dealloc_all(dev, &table->hwc[MLX5_SF_HWC_LOCAL]); + } + + static int mlx5_sf_hw_table_hwc_init(struct mlx5_sf_hwc_table *hwc, u16 max_fn, u16 base_id) +@@ -301,7 +305,6 @@ int mlx5_sf_hw_table_init(struct mlx5_core_dev *dev) + } + + mutex_init(&table->table_lock); +- table->dev = dev; + dev->priv.sf_hw_table = table; + + base_id = mlx5_sf_start_function_id(dev); +@@ -338,19 +341,22 @@ void mlx5_sf_hw_table_cleanup(struct mlx5_core_dev *dev) + mlx5_sf_hw_table_hwc_cleanup(&table->hwc[MLX5_SF_HWC_LOCAL]); + mutex_destroy(&table->table_lock); + kfree(table); ++ dev->priv.sf_hw_table = NULL; + res_unregister: + mlx5_sf_hw_table_res_unregister(dev); + } + + static int mlx5_sf_hw_vhca_event(struct notifier_block *nb, unsigned long opcode, void *data) + { +- struct mlx5_sf_hw_table *table = container_of(nb, struct mlx5_sf_hw_table, vhca_nb); ++ struct mlx5_core_dev *dev = container_of(nb, struct mlx5_core_dev, ++ priv.sf_hw_table_vhca_nb); ++ struct mlx5_sf_hw_table *table = dev->priv.sf_hw_table; + const struct mlx5_vhca_state_event *event = data; + struct 
mlx5_sf_hwc_table *hwc; + struct mlx5_sf_hw *sf_hw; + u16 sw_id; + +- if (event->new_vhca_state != MLX5_VHCA_STATE_ALLOCATED) ++ if (!table || event->new_vhca_state != MLX5_VHCA_STATE_ALLOCATED) + return 0; + + hwc = mlx5_sf_table_fn_to_hwc(table, event->function_id); +@@ -365,20 +371,28 @@ static int mlx5_sf_hw_vhca_event(struct notifier_block *nb, unsigned long opcode + * Hence recycle the sf hardware id for reuse. + */ + if (sf_hw->allocated && sf_hw->pending_delete) +- mlx5_sf_hw_table_hwc_sf_free(table->dev, hwc, sw_id); ++ mlx5_sf_hw_table_hwc_sf_free(dev, hwc, sw_id); + mutex_unlock(&table->table_lock); + return 0; + } + +-int mlx5_sf_hw_table_create(struct mlx5_core_dev *dev) ++int mlx5_sf_hw_notifier_init(struct mlx5_core_dev *dev) + { +- struct mlx5_sf_hw_table *table = dev->priv.sf_hw_table; +- +- if (!table) ++ if (mlx5_core_is_sf(dev)) + return 0; + +- table->vhca_nb.notifier_call = mlx5_sf_hw_vhca_event; +- return mlx5_vhca_event_notifier_register(dev, &table->vhca_nb); ++ dev->priv.sf_hw_table_vhca_nb.notifier_call = mlx5_sf_hw_vhca_event; ++ return mlx5_vhca_event_notifier_register(dev, ++ &dev->priv.sf_hw_table_vhca_nb); ++} ++ ++void mlx5_sf_hw_notifier_cleanup(struct mlx5_core_dev *dev) ++{ ++ if (mlx5_core_is_sf(dev)) ++ return; ++ ++ mlx5_vhca_event_notifier_unregister(dev, ++ &dev->priv.sf_hw_table_vhca_nb); + } + + void mlx5_sf_hw_table_destroy(struct mlx5_core_dev *dev) +@@ -388,10 +402,8 @@ void mlx5_sf_hw_table_destroy(struct mlx5_core_dev *dev) + if (!table) + return; + +- mlx5_vhca_event_notifier_unregister(dev, &table->vhca_nb); +- + /* Dealloc SFs whose firmware event has been missed. 
*/ +- mlx5_sf_hw_table_dealloc_all(table); ++ mlx5_sf_hw_table_dealloc_all(dev, table); + } + + bool mlx5_sf_hw_table_supported(const struct mlx5_core_dev *dev) +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h b/drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h +index 89559a37997a..3922dacffae8 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h +@@ -12,7 +12,8 @@ + int mlx5_sf_hw_table_init(struct mlx5_core_dev *dev); + void mlx5_sf_hw_table_cleanup(struct mlx5_core_dev *dev); + +-int mlx5_sf_hw_table_create(struct mlx5_core_dev *dev); ++int mlx5_sf_hw_notifier_init(struct mlx5_core_dev *dev); ++void mlx5_sf_hw_notifier_cleanup(struct mlx5_core_dev *dev); + void mlx5_sf_hw_table_destroy(struct mlx5_core_dev *dev); + + int mlx5_sf_table_init(struct mlx5_core_dev *dev); +@@ -44,11 +45,15 @@ static inline void mlx5_sf_hw_table_cleanup(struct mlx5_core_dev *dev) + { + } + +-static inline int mlx5_sf_hw_table_create(struct mlx5_core_dev *dev) ++static inline int mlx5_sf_hw_notifier_init(struct mlx5_core_dev *dev) + { + return 0; + } + ++static inline void mlx5_sf_hw_notifier_cleanup(struct mlx5_core_dev *dev) ++{ ++} ++ + static inline void mlx5_sf_hw_table_destroy(struct mlx5_core_dev *dev) + { + } +diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h +index 88afb2788dc9..d6c5bcebdaca 100644 +--- a/include/linux/mlx5/driver.h ++++ b/include/linux/mlx5/driver.h +@@ -620,6 +620,7 @@ struct mlx5_priv { + struct mlx5_core_dev *parent_mdev; + #endif + #ifdef CONFIG_MLX5_SF_MANAGER ++ struct notifier_block sf_hw_table_vhca_nb; + struct mlx5_sf_hw_table *sf_hw_table; + struct mlx5_sf_table *sf_table; + #endif +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1632-net-mlx5-move-the-sf-table-notifiers-outside-the-devlink-loc.patch b/SOURCES/1632-net-mlx5-move-the-sf-table-notifiers-outside-the-devlink-loc.patch new file mode 100644 index 000000000..041d51210 --- /dev/null +++ 
b/SOURCES/1632-net-mlx5-move-the-sf-table-notifiers-outside-the-devlink-loc.patch @@ -0,0 +1,278 @@ +From 0501aeebaf52fad09275228fd5f63cccc4a0b9ba Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:59:30 -0400 +Subject: [PATCH] net/mlx5: Move the SF table notifiers outside the devlink + lock + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit d4a0acbd94c2a93bf308a9fde9ab6719f5d98c7a +Author: Cosmin Ratiu +Date: Sun Nov 16 22:45:39 2025 +0200 + + net/mlx5: Move the SF table notifiers outside the devlink lock + + Move the SF table notifiers registration/unregistration outside of + mlx5_init_one() / mlx5_uninit_one() and into the mlx5_mdev_init() / + mlx5_mdev_uninit() functions. + + This is only done for non-SFs, since SFs do not have a SF table + themselves and thus don't need notifiers. + + Signed-off-by: Cosmin Ratiu + Reviewed-by: Carolina Jubran + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1763325940-1231508-6-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c +index 91cb5b45300f..240c30d380f5 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c +@@ -1833,8 +1833,14 @@ static int mlx5_notifiers_init(struct mlx5_core_dev *dev) + if (err) + goto err_sf_hw_notifier; + ++ err = mlx5_sf_notifiers_init(dev); ++ if (err) ++ goto err_sf_notifiers; ++ + return 0; + ++err_sf_notifiers: ++ mlx5_sf_hw_notifier_cleanup(dev); + err_sf_hw_notifier: + mlx5_events_cleanup(dev); + return err; +@@ -1842,6 +1848,7 @@ static int mlx5_notifiers_init(struct mlx5_core_dev *dev) + + static void mlx5_notifiers_cleanup(struct mlx5_core_dev *dev) + { ++ mlx5_sf_notifiers_cleanup(dev); + mlx5_sf_hw_notifier_cleanup(dev); + mlx5_events_cleanup(dev); + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c 
b/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c +index 2ece4983d33f..b82323b8449e 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c +@@ -31,9 +31,6 @@ struct mlx5_sf_table { + struct mlx5_core_dev *dev; /* To refer from notifier context. */ + struct xarray function_ids; /* function id based lookup. */ + struct mutex sf_state_lock; /* Serializes sf state among user cmds & vhca event handler. */ +- struct notifier_block esw_nb; +- struct notifier_block vhca_nb; +- struct notifier_block mdev_nb; + }; + + static struct mlx5_sf * +@@ -391,11 +388,16 @@ static bool mlx5_sf_state_update_check(const struct mlx5_sf *sf, u8 new_state) + + static int mlx5_sf_vhca_event(struct notifier_block *nb, unsigned long opcode, void *data) + { +- struct mlx5_sf_table *table = container_of(nb, struct mlx5_sf_table, vhca_nb); ++ struct mlx5_core_dev *dev = container_of(nb, struct mlx5_core_dev, ++ priv.sf_table_vhca_nb); ++ struct mlx5_sf_table *table = dev->priv.sf_table; + const struct mlx5_vhca_state_event *event = data; + bool update = false; + struct mlx5_sf *sf; + ++ if (!table) ++ return 0; ++ + mutex_lock(&table->sf_state_lock); + sf = mlx5_sf_lookup_by_function_id(table, event->function_id); + if (!sf) +@@ -407,7 +409,7 @@ static int mlx5_sf_vhca_event(struct notifier_block *nb, unsigned long opcode, v + update = mlx5_sf_state_update_check(sf, event->new_vhca_state); + if (update) + sf->hw_state = event->new_vhca_state; +- trace_mlx5_sf_update_state(table->dev, sf->port_index, sf->controller, ++ trace_mlx5_sf_update_state(dev, sf->port_index, sf->controller, + sf->hw_fn_id, sf->hw_state); + unlock: + mutex_unlock(&table->sf_state_lock); +@@ -425,12 +427,16 @@ static void mlx5_sf_del_all(struct mlx5_sf_table *table) + + static int mlx5_sf_esw_event(struct notifier_block *nb, unsigned long event, void *data) + { +- struct mlx5_sf_table *table = container_of(nb, struct mlx5_sf_table, esw_nb); ++ 
struct mlx5_core_dev *dev = container_of(nb, struct mlx5_core_dev, ++ priv.sf_table_esw_nb); + const struct mlx5_esw_event_info *mode = data; + ++ if (!dev->priv.sf_table) ++ return 0; ++ + switch (mode->new_mode) { + case MLX5_ESWITCH_LEGACY: +- mlx5_sf_del_all(table); ++ mlx5_sf_del_all(dev->priv.sf_table); + break; + default: + break; +@@ -441,15 +447,16 @@ static int mlx5_sf_esw_event(struct notifier_block *nb, unsigned long event, voi + + static int mlx5_sf_mdev_event(struct notifier_block *nb, unsigned long event, void *data) + { +- struct mlx5_sf_table *table = container_of(nb, struct mlx5_sf_table, mdev_nb); ++ struct mlx5_core_dev *dev = container_of(nb, struct mlx5_core_dev, ++ priv.sf_table_mdev_nb); + struct mlx5_sf_peer_devlink_event_ctx *event_ctx = data; ++ struct mlx5_sf_table *table = dev->priv.sf_table; + int ret = NOTIFY_DONE; + struct mlx5_sf *sf; + +- if (event != MLX5_DRIVER_EVENT_SF_PEER_DEVLINK) ++ if (!table || event != MLX5_DRIVER_EVENT_SF_PEER_DEVLINK) + return NOTIFY_DONE; + +- + mutex_lock(&table->sf_state_lock); + sf = mlx5_sf_lookup_by_function_id(table, event_ctx->fn_id); + if (!sf) +@@ -464,10 +471,40 @@ static int mlx5_sf_mdev_event(struct notifier_block *nb, unsigned long event, vo + return ret; + } + ++int mlx5_sf_notifiers_init(struct mlx5_core_dev *dev) ++{ ++ int err; ++ ++ if (mlx5_core_is_sf(dev)) ++ return 0; ++ ++ dev->priv.sf_table_esw_nb.notifier_call = mlx5_sf_esw_event; ++ err = mlx5_esw_event_notifier_register(dev, &dev->priv.sf_table_esw_nb); ++ if (err) ++ return err; ++ ++ dev->priv.sf_table_vhca_nb.notifier_call = mlx5_sf_vhca_event; ++ err = mlx5_vhca_event_notifier_register(dev, ++ &dev->priv.sf_table_vhca_nb); ++ if (err) ++ goto vhca_err; ++ ++ dev->priv.sf_table_mdev_nb.notifier_call = mlx5_sf_mdev_event; ++ err = mlx5_blocking_notifier_register(dev, &dev->priv.sf_table_mdev_nb); ++ if (err) ++ goto mdev_err; ++ ++ return 0; ++mdev_err: ++ mlx5_vhca_event_notifier_unregister(dev, &dev->priv.sf_table_vhca_nb); 
++vhca_err: ++ mlx5_esw_event_notifier_unregister(dev, &dev->priv.sf_table_esw_nb); ++ return err; ++} ++ + int mlx5_sf_table_init(struct mlx5_core_dev *dev) + { + struct mlx5_sf_table *table; +- int err; + + if (!mlx5_sf_table_supported(dev) || !mlx5_vhca_event_supported(dev)) + return 0; +@@ -480,28 +517,18 @@ int mlx5_sf_table_init(struct mlx5_core_dev *dev) + table->dev = dev; + xa_init(&table->function_ids); + dev->priv.sf_table = table; +- table->esw_nb.notifier_call = mlx5_sf_esw_event; +- err = mlx5_esw_event_notifier_register(dev, &table->esw_nb); +- if (err) +- goto reg_err; +- +- table->vhca_nb.notifier_call = mlx5_sf_vhca_event; +- err = mlx5_vhca_event_notifier_register(table->dev, &table->vhca_nb); +- if (err) +- goto vhca_err; +- +- table->mdev_nb.notifier_call = mlx5_sf_mdev_event; +- mlx5_blocking_notifier_register(dev, &table->mdev_nb); + + return 0; ++} + +-vhca_err: +- mlx5_esw_event_notifier_unregister(dev, &table->esw_nb); +-reg_err: +- mutex_destroy(&table->sf_state_lock); +- kfree(table); +- dev->priv.sf_table = NULL; +- return err; ++void mlx5_sf_notifiers_cleanup(struct mlx5_core_dev *dev) ++{ ++ if (mlx5_core_is_sf(dev)) ++ return; ++ ++ mlx5_blocking_notifier_unregister(dev, &dev->priv.sf_table_mdev_nb); ++ mlx5_vhca_event_notifier_unregister(dev, &dev->priv.sf_table_vhca_nb); ++ mlx5_esw_event_notifier_unregister(dev, &dev->priv.sf_table_esw_nb); + } + + void mlx5_sf_table_cleanup(struct mlx5_core_dev *dev) +@@ -511,9 +538,6 @@ void mlx5_sf_table_cleanup(struct mlx5_core_dev *dev) + if (!table) + return; + +- mlx5_blocking_notifier_unregister(dev, &table->mdev_nb); +- mlx5_vhca_event_notifier_unregister(table->dev, &table->vhca_nb); +- mlx5_esw_event_notifier_unregister(dev, &table->esw_nb); + mutex_destroy(&table->sf_state_lock); + WARN_ON(!xa_empty(&table->function_ids)); + kfree(table); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h b/drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h +index 3922dacffae8..d8a934a0e968 
100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h +@@ -16,7 +16,9 @@ int mlx5_sf_hw_notifier_init(struct mlx5_core_dev *dev); + void mlx5_sf_hw_notifier_cleanup(struct mlx5_core_dev *dev); + void mlx5_sf_hw_table_destroy(struct mlx5_core_dev *dev); + ++int mlx5_sf_notifiers_init(struct mlx5_core_dev *dev); + int mlx5_sf_table_init(struct mlx5_core_dev *dev); ++void mlx5_sf_notifiers_cleanup(struct mlx5_core_dev *dev); + void mlx5_sf_table_cleanup(struct mlx5_core_dev *dev); + bool mlx5_sf_table_empty(const struct mlx5_core_dev *dev); + +@@ -58,11 +60,20 @@ static inline void mlx5_sf_hw_table_destroy(struct mlx5_core_dev *dev) + { + } + ++static inline int mlx5_sf_notifiers_init(struct mlx5_core_dev *dev) ++{ ++ return 0; ++} ++ + static inline int mlx5_sf_table_init(struct mlx5_core_dev *dev) + { + return 0; + } + ++static inline void mlx5_sf_notifiers_cleanup(struct mlx5_core_dev *dev) ++{ ++} ++ + static inline void mlx5_sf_table_cleanup(struct mlx5_core_dev *dev) + { + } +diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h +index d6c5bcebdaca..6af62047a614 100644 +--- a/include/linux/mlx5/driver.h ++++ b/include/linux/mlx5/driver.h +@@ -622,6 +622,9 @@ struct mlx5_priv { + #ifdef CONFIG_MLX5_SF_MANAGER + struct notifier_block sf_hw_table_vhca_nb; + struct mlx5_sf_hw_table *sf_hw_table; ++ struct notifier_block sf_table_esw_nb; ++ struct notifier_block sf_table_vhca_nb; ++ struct notifier_block sf_table_mdev_nb; + struct mlx5_sf_table *sf_table; + #endif + struct blocking_notifier_head lag_nh; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1633-net-mlx5-move-sf-dev-table-notifier-registration-outside-the.patch b/SOURCES/1633-net-mlx5-move-sf-dev-table-notifier-registration-outside-the.patch new file mode 100644 index 000000000..694771c39 --- /dev/null +++ b/SOURCES/1633-net-mlx5-move-sf-dev-table-notifier-registration-outside-the.patch @@ -0,0 +1,241 @@ +From 
8eac199fc546e3d296c918e5395ce4e5b94c2254 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:59:30 -0400 +Subject: [PATCH] net/mlx5: Move SF dev table notifier registration outside the + PF devlink lock + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 64ad6470c882fcaecfa4a1da96ea94de7ca0dc80 +Author: Cosmin Ratiu +Date: Sun Nov 16 22:45:40 2025 +0200 + + net/mlx5: Move SF dev table notifier registration outside the PF devlink lock + + This completes the previous patches by moving notifier registration for + SF dev tables outside the devlink locked critical section in + mlx5_init_one() / mlx5_uninit_one() and into the mlx5_mdev_init() / + mlx5_mdev_uninit() functions. + + This is only done for non-SFs, since SFs do not have a SF HW table + themselves. + + After this patch, notifiers can grab the PF devlink lock (soon to be + necessary) without creating a locking cycle. + + Signed-off-by: Cosmin Ratiu + Reviewed-by: Carolina Jubran + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1763325940-1231508-7-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c +index 240c30d380f5..a0f937c29891 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c +@@ -1837,8 +1837,14 @@ static int mlx5_notifiers_init(struct mlx5_core_dev *dev) + if (err) + goto err_sf_notifiers; + ++ err = mlx5_sf_dev_notifier_init(dev); ++ if (err) ++ goto err_sf_dev_notifier; ++ + return 0; + ++err_sf_dev_notifier: ++ mlx5_sf_notifiers_cleanup(dev); + err_sf_notifiers: + mlx5_sf_hw_notifier_cleanup(dev); + err_sf_hw_notifier: +@@ -1848,6 +1854,7 @@ static int mlx5_notifiers_init(struct mlx5_core_dev *dev) + + static void mlx5_notifiers_cleanup(struct mlx5_core_dev *dev) + { ++ mlx5_sf_dev_notifier_cleanup(dev); + mlx5_sf_notifiers_cleanup(dev); + 
mlx5_sf_hw_notifier_cleanup(dev); + mlx5_events_cleanup(dev); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c +index a68a8ee24dce..f310bde3d11f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c +@@ -16,7 +16,6 @@ struct mlx5_sf_dev_table { + struct xarray devices; + phys_addr_t base_address; + u64 sf_bar_length; +- struct notifier_block nb; + struct workqueue_struct *active_wq; + struct work_struct work; + u8 stop_active_wq:1; +@@ -156,18 +155,23 @@ static void mlx5_sf_dev_del(struct mlx5_core_dev *dev, struct mlx5_sf_dev *sf_de + static int + mlx5_sf_dev_state_change_handler(struct notifier_block *nb, unsigned long event_code, void *data) + { +- struct mlx5_sf_dev_table *table = container_of(nb, struct mlx5_sf_dev_table, nb); ++ struct mlx5_core_dev *dev = container_of(nb, struct mlx5_core_dev, ++ priv.sf_dev_nb); ++ struct mlx5_sf_dev_table *table = dev->priv.sf_dev_table; + const struct mlx5_vhca_state_event *event = data; + struct mlx5_sf_dev *sf_dev; + u16 max_functions; + u16 sf_index; + u16 base_id; + +- max_functions = mlx5_sf_max_functions(table->dev); ++ if (!table) ++ return 0; ++ ++ max_functions = mlx5_sf_max_functions(dev); + if (!max_functions) + return 0; + +- base_id = mlx5_sf_start_function_id(table->dev); ++ base_id = mlx5_sf_start_function_id(dev); + if (event->function_id < base_id || event->function_id >= (base_id + max_functions)) + return 0; + +@@ -177,19 +181,19 @@ mlx5_sf_dev_state_change_handler(struct notifier_block *nb, unsigned long event_ + case MLX5_VHCA_STATE_INVALID: + case MLX5_VHCA_STATE_ALLOCATED: + if (sf_dev) +- mlx5_sf_dev_del(table->dev, sf_dev, sf_index); ++ mlx5_sf_dev_del(dev, sf_dev, sf_index); + break; + case MLX5_VHCA_STATE_TEARDOWN_REQUEST: + if (sf_dev) +- mlx5_sf_dev_del(table->dev, sf_dev, sf_index); ++ mlx5_sf_dev_del(dev, sf_dev, sf_index); + else +- 
mlx5_core_err(table->dev, ++ mlx5_core_err(dev, + "SF DEV: teardown state for invalid dev index=%d sfnum=0x%x\n", + sf_index, event->sw_function_id); + break; + case MLX5_VHCA_STATE_ACTIVE: + if (!sf_dev) +- mlx5_sf_dev_add(table->dev, sf_index, event->function_id, ++ mlx5_sf_dev_add(dev, sf_index, event->function_id, + event->sw_function_id); + break; + default: +@@ -315,6 +319,15 @@ static void mlx5_sf_dev_destroy_active_works(struct mlx5_sf_dev_table *table) + } + } + ++int mlx5_sf_dev_notifier_init(struct mlx5_core_dev *dev) ++{ ++ if (mlx5_core_is_sf(dev)) ++ return 0; ++ ++ dev->priv.sf_dev_nb.notifier_call = mlx5_sf_dev_state_change_handler; ++ return mlx5_vhca_event_notifier_register(dev, &dev->priv.sf_dev_nb); ++} ++ + void mlx5_sf_dev_table_create(struct mlx5_core_dev *dev) + { + struct mlx5_sf_dev_table *table; +@@ -329,17 +342,12 @@ void mlx5_sf_dev_table_create(struct mlx5_core_dev *dev) + goto table_err; + } + +- table->nb.notifier_call = mlx5_sf_dev_state_change_handler; + table->dev = dev; + table->sf_bar_length = 1 << (MLX5_CAP_GEN(dev, log_min_sf_size) + 12); + table->base_address = pci_resource_start(dev->pdev, 2); + xa_init(&table->devices); + dev->priv.sf_dev_table = table; + +- err = mlx5_vhca_event_notifier_register(dev, &table->nb); +- if (err) +- goto vhca_err; +- + err = mlx5_sf_dev_create_active_works(table); + if (err) + goto add_active_err; +@@ -351,10 +359,8 @@ void mlx5_sf_dev_table_create(struct mlx5_core_dev *dev) + + arm_err: + mlx5_sf_dev_destroy_active_works(table); +-add_active_err: +- mlx5_vhca_event_notifier_unregister(dev, &table->nb); + mlx5_vhca_event_work_queues_flush(dev); +-vhca_err: ++add_active_err: + kfree(table); + dev->priv.sf_dev_table = NULL; + table_err: +@@ -372,6 +378,14 @@ static void mlx5_sf_dev_destroy_all(struct mlx5_sf_dev_table *table) + } + } + ++void mlx5_sf_dev_notifier_cleanup(struct mlx5_core_dev *dev) ++{ ++ if (mlx5_core_is_sf(dev)) ++ return; ++ ++ mlx5_vhca_event_notifier_unregister(dev, 
&dev->priv.sf_dev_nb); ++} ++ + void mlx5_sf_dev_table_destroy(struct mlx5_core_dev *dev) + { + struct mlx5_sf_dev_table *table = dev->priv.sf_dev_table; +@@ -380,7 +394,6 @@ void mlx5_sf_dev_table_destroy(struct mlx5_core_dev *dev) + return; + + mlx5_sf_dev_destroy_active_works(table); +- mlx5_vhca_event_notifier_unregister(dev, &table->nb); + + /* Now that event handler is not running, it is safe to destroy + * the sf device without race. +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.h b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.h +index b99131e95e37..3ab0449c770c 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.h +@@ -25,7 +25,9 @@ struct mlx5_sf_peer_devlink_event_ctx { + int err; + }; + ++int mlx5_sf_dev_notifier_init(struct mlx5_core_dev *dev); + void mlx5_sf_dev_table_create(struct mlx5_core_dev *dev); ++void mlx5_sf_dev_notifier_cleanup(struct mlx5_core_dev *dev); + void mlx5_sf_dev_table_destroy(struct mlx5_core_dev *dev); + + int mlx5_sf_driver_register(void); +@@ -35,10 +37,19 @@ bool mlx5_sf_dev_allocated(const struct mlx5_core_dev *dev); + + #else + ++static inline int mlx5_sf_dev_notifier_init(struct mlx5_core_dev *dev) ++{ ++ return 0; ++} ++ + static inline void mlx5_sf_dev_table_create(struct mlx5_core_dev *dev) + { + } + ++static inline void mlx5_sf_dev_notifier_cleanup(struct mlx5_core_dev *dev) ++{ ++} ++ + static inline void mlx5_sf_dev_table_destroy(struct mlx5_core_dev *dev) + { + } +diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h +index 6af62047a614..1c54aa6f74fb 100644 +--- a/include/linux/mlx5/driver.h ++++ b/include/linux/mlx5/driver.h +@@ -616,6 +616,7 @@ struct mlx5_priv { + #ifdef CONFIG_MLX5_SF + struct mlx5_nb vhca_state_nb; + struct blocking_notifier_head vhca_state_n_head; ++ struct notifier_block sf_dev_nb; + struct mlx5_sf_dev_table *sf_dev_table; + struct mlx5_core_dev *parent_mdev; + #endif +-- +2.50.1 
(Apple Git-155) + diff --git a/SOURCES/1634-net-mlx5e-use-u64-instead-of-u64-in-ieee-setmaxrate.patch b/SOURCES/1634-net-mlx5e-use-u64-instead-of-u64-in-ieee-setmaxrate.patch new file mode 100644 index 000000000..d962229ff --- /dev/null +++ b/SOURCES/1634-net-mlx5e-use-u64-instead-of-u64-in-ieee-setmaxrate.patch @@ -0,0 +1,42 @@ +From b3d0adbb647eb3a4fbbf0cb2bb76437601a97598 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:59:30 -0400 +Subject: [PATCH] net/mlx5e: Use u64 instead of __u64 in ieee_setmaxrate + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit e1de33c377b685298da406bb4838bd9814194f96 +Author: Gal Pressman +Date: Sun Nov 30 12:25:31 2025 +0200 + + net/mlx5e: Use u64 instead of __u64 in ieee_setmaxrate + + Change upper_limit_mbps/gbps from __u64 to u64 to follow kernel coding + conventions. + + Signed-off-by: Gal Pressman + Reviewed-by: Nimrod Oren + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1764498334-1327918-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c +index 84e700777941..b06b65167787 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c +@@ -595,8 +595,8 @@ static int mlx5e_dcbnl_ieee_setmaxrate(struct net_device *netdev, + struct mlx5_core_dev *mdev = priv->mdev; + u8 max_bw_value[IEEE_8021QAZ_MAX_TCS]; + u8 max_bw_unit[IEEE_8021QAZ_MAX_TCS]; +- __u64 upper_limit_mbps; +- __u64 upper_limit_gbps; ++ u64 upper_limit_mbps; ++ u64 upper_limit_gbps; + int i; + struct { + int scale; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1635-net-mlx5e-rename-upper-limit-mbps-to-upper-limit-100mbps.patch b/SOURCES/1635-net-mlx5e-rename-upper-limit-mbps-to-upper-limit-100mbps.patch new file mode 100644 index 000000000..d6c174764 --- /dev/null +++ 
b/SOURCES/1635-net-mlx5e-rename-upper-limit-mbps-to-upper-limit-100mbps.patch @@ -0,0 +1,58 @@ +From c0b5bcdd57bf923bb87df30c72552385313befd4 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:59:31 -0400 +Subject: [PATCH] net/mlx5e: Rename upper_limit_mbps to upper_limit_100mbps + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit e1098bb02f2d9a85a127aecad6378e4f159acce5 +Author: Gal Pressman +Date: Sun Nov 30 12:25:32 2025 +0200 + + net/mlx5e: Rename upper_limit_mbps to upper_limit_100mbps + + Clarify that the limit represents the threshold for using 100 Mbps + units rather than a general Mbps limit. + + Signed-off-by: Gal Pressman + Reviewed-by: Nimrod Oren + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1764498334-1327918-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c +index b06b65167787..892aae41fe2f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c +@@ -595,7 +595,7 @@ static int mlx5e_dcbnl_ieee_setmaxrate(struct net_device *netdev, + struct mlx5_core_dev *mdev = priv->mdev; + u8 max_bw_value[IEEE_8021QAZ_MAX_TCS]; + u8 max_bw_unit[IEEE_8021QAZ_MAX_TCS]; +- u64 upper_limit_mbps; ++ u64 upper_limit_100mbps; + u64 upper_limit_gbps; + int i; + struct { +@@ -614,7 +614,7 @@ static int mlx5e_dcbnl_ieee_setmaxrate(struct net_device *netdev, + + memset(max_bw_value, 0, sizeof(max_bw_value)); + memset(max_bw_unit, 0, sizeof(max_bw_unit)); +- upper_limit_mbps = 255 * MLX5E_100MB; ++ upper_limit_100mbps = 255 * MLX5E_100MB; + upper_limit_gbps = 255 * MLX5E_1GB; + + for (i = 0; i <= mlx5_max_tc(mdev); i++) { +@@ -622,7 +622,7 @@ static int mlx5e_dcbnl_ieee_setmaxrate(struct net_device *netdev, + max_bw_unit[i] = MLX5_BW_NO_LIMIT; + continue; + } +- if (maxrate->tc_maxrate[i] <= 
upper_limit_mbps) { ++ if (maxrate->tc_maxrate[i] <= upper_limit_100mbps) { + max_bw_value[i] = div_u64(maxrate->tc_maxrate[i], + MLX5E_100MB); + max_bw_value[i] = max_bw_value[i] ? max_bw_value[i] : 1; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1636-net-mlx5e-use-u8-max-instead-of-hard-coded-magic-number.patch b/SOURCES/1636-net-mlx5e-use-u8-max-instead-of-hard-coded-magic-number.patch new file mode 100644 index 000000000..79a66fc12 --- /dev/null +++ b/SOURCES/1636-net-mlx5e-use-u8-max-instead-of-hard-coded-magic-number.patch @@ -0,0 +1,42 @@ +From 3547ea962b6e8d4d768dc25d5e63ddadb6cb30b2 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:59:31 -0400 +Subject: [PATCH] net/mlx5e: Use U8_MAX instead of hard coded magic number + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 53f7a771285182be7bfba6d59ccdd0d47fc1a097 +Author: Gal Pressman +Date: Sun Nov 30 12:25:33 2025 +0200 + + net/mlx5e: Use U8_MAX instead of hard coded magic number + + Replace hard coded 255 magic number with U8_MAX (the register field is 8 + bits). 
+ + Signed-off-by: Gal Pressman + Reviewed-by: Nimrod Oren + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1764498334-1327918-4-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c +index 892aae41fe2f..9229e94b9909 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c +@@ -614,8 +614,8 @@ static int mlx5e_dcbnl_ieee_setmaxrate(struct net_device *netdev, + + memset(max_bw_value, 0, sizeof(max_bw_value)); + memset(max_bw_unit, 0, sizeof(max_bw_unit)); +- upper_limit_100mbps = 255 * MLX5E_100MB; +- upper_limit_gbps = 255 * MLX5E_1GB; ++ upper_limit_100mbps = U8_MAX * MLX5E_100MB; ++ upper_limit_gbps = U8_MAX * MLX5E_1GB; + + for (i = 0; i <= mlx5_max_tc(mdev); i++) { + if (!maxrate->tc_maxrate[i]) { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1637-net-mlx5e-use-standard-unit-definitions-for-bandwidth-conver.patch b/SOURCES/1637-net-mlx5e-use-standard-unit-definitions-for-bandwidth-conver.patch new file mode 100644 index 000000000..debb43026 --- /dev/null +++ b/SOURCES/1637-net-mlx5e-use-standard-unit-definitions-for-bandwidth-conver.patch @@ -0,0 +1,95 @@ +From 796a108303c5f16c626de7c80ee67ca9442bd13e Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:59:31 -0400 +Subject: [PATCH] net/mlx5e: Use standard unit definitions for bandwidth + conversion + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 87a5112bfc406a72c6bf1e8cdef9b0169dc1df6a +Author: Gal Pressman +Date: Sun Nov 30 12:25:34 2025 +0200 + + net/mlx5e: Use standard unit definitions for bandwidth conversion + + MLX5E_100MB and MLX5E_1GB defines are confusing, MLX5E_100MB is not + equal to 100 * MEGA, and MLX5E_1GB is not equal to one GIGA, as they + hide the Kbps rate conversion required for ieee_maxrate. 
+ + Replace hardcoded bandwidth conversion values with standard unit + definitions from linux/units.h. Rename MLX5E_100MB/MLX5E_1GB to + MLX5E_100MB_TO_KB/MLX5E_1GB_TO_KB to clarify these are conversion + factors to Kbps, not absolute bandwidth values. + + Signed-off-by: Gal Pressman + Reviewed-by: Nimrod Oren + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1764498334-1327918-5-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c +index 9229e94b9909..585ac619e5e8 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c +@@ -31,14 +31,15 @@ + */ + #include + #include ++#include + #include "en.h" + #include "en/port.h" + #include "en/port_buffer.h" + + #define MLX5E_MAX_BW_ALLOC 100 /* Max percentage of BW allocation */ + +-#define MLX5E_100MB (100000) +-#define MLX5E_1GB (1000000) ++#define MLX5E_100MB_TO_KB (100 * MEGA / KILO) ++#define MLX5E_1GB_TO_KB (GIGA / KILO) + + #define MLX5E_CEE_STATE_UP 1 + #define MLX5E_CEE_STATE_DOWN 0 +@@ -572,10 +573,10 @@ static int mlx5e_dcbnl_ieee_getmaxrate(struct net_device *netdev, + for (i = 0; i <= mlx5_max_tc(mdev); i++) { + switch (max_bw_unit[i]) { + case MLX5_100_MBPS_UNIT: +- maxrate->tc_maxrate[i] = max_bw_value[i] * MLX5E_100MB; ++ maxrate->tc_maxrate[i] = max_bw_value[i] * MLX5E_100MB_TO_KB; + break; + case MLX5_GBPS_UNIT: +- maxrate->tc_maxrate[i] = max_bw_value[i] * MLX5E_1GB; ++ maxrate->tc_maxrate[i] = max_bw_value[i] * MLX5E_1GB_TO_KB; + break; + case MLX5_BW_NO_LIMIT: + break; +@@ -614,8 +615,8 @@ static int mlx5e_dcbnl_ieee_setmaxrate(struct net_device *netdev, + + memset(max_bw_value, 0, sizeof(max_bw_value)); + memset(max_bw_unit, 0, sizeof(max_bw_unit)); +- upper_limit_100mbps = U8_MAX * MLX5E_100MB; +- upper_limit_gbps = U8_MAX * MLX5E_1GB; ++ upper_limit_100mbps = 
U8_MAX * MLX5E_100MB_TO_KB; ++ upper_limit_gbps = U8_MAX * MLX5E_1GB_TO_KB; + + for (i = 0; i <= mlx5_max_tc(mdev); i++) { + if (!maxrate->tc_maxrate[i]) { +@@ -624,12 +625,12 @@ static int mlx5e_dcbnl_ieee_setmaxrate(struct net_device *netdev, + } + if (maxrate->tc_maxrate[i] <= upper_limit_100mbps) { + max_bw_value[i] = div_u64(maxrate->tc_maxrate[i], +- MLX5E_100MB); ++ MLX5E_100MB_TO_KB); + max_bw_value[i] = max_bw_value[i] ? max_bw_value[i] : 1; + max_bw_unit[i] = MLX5_100_MBPS_UNIT; + } else if (maxrate->tc_maxrate[i] <= upper_limit_gbps) { + max_bw_value[i] = div_u64(maxrate->tc_maxrate[i], +- MLX5E_1GB); ++ MLX5E_1GB_TO_KB); + max_bw_unit[i] = MLX5_GBPS_UNIT; + } else { + netdev_err(netdev, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1638-net-mlx5e-update-xdp-features-in-switch-channels.patch b/SOURCES/1638-net-mlx5e-update-xdp-features-in-switch-channels.patch new file mode 100644 index 000000000..f89db9aec --- /dev/null +++ b/SOURCES/1638-net-mlx5e-update-xdp-features-in-switch-channels.patch @@ -0,0 +1,150 @@ +From 9e854a91756c19006bbdf08d8b42d112a2f5f1b1 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 14:59:31 -0400 +Subject: [PATCH] net/mlx5e: Update XDP features in switch channels + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 96a8395061358adcd4b6a4f0f4c8989ec69e8659 +Author: Tariq Toukan +Date: Sun Nov 30 12:13:36 2025 +0200 + + net/mlx5e: Update XDP features in switch channels + + The XDP features state might depend of the state of other features, like + HW-LRO / HW-GRO. + + In general, move the re-evaluation announcement of the XDP features + (xdp_set_features_flag_locked) into the flow where configuration gets + changed. There's no point in updating them elsewhere. + + This is a more appropriate place, as this modifies the announced + features while channels are inactive, which avoids the small interval + between channel activation and the proper setting of the XDP features. 
+ + Signed-off-by: Tariq Toukan + Reviewed-by: Dragos Tatulea + Reviewed-by: William Tu + Link: https://patch.msgid.link/1764497617-1326331-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h +index 85f940869968..a1d33c78aedd 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h +@@ -1242,7 +1242,7 @@ void mlx5e_netdev_attach_nic_profile(struct mlx5e_priv *priv); + void mlx5e_set_netdev_mtu_boundaries(struct mlx5e_priv *priv); + void mlx5e_build_nic_params(struct mlx5e_priv *priv, struct mlx5e_xsk *xsk, u16 mtu); + +-void mlx5e_set_xdp_feature(struct net_device *netdev); ++void mlx5e_set_xdp_feature(struct mlx5e_priv *priv); + netdev_features_t mlx5e_features_check(struct sk_buff *skb, + struct net_device *netdev, + netdev_features_t features); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +index 6e12bd196ec9..12ecb949bcc5 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +@@ -2293,7 +2293,6 @@ static int set_pflag_rx_striding_rq(struct net_device *netdev, bool enable) + struct mlx5e_priv *priv = netdev_priv(netdev); + struct mlx5_core_dev *mdev = priv->mdev; + struct mlx5e_params new_params; +- int err; + + if (enable) { + /* Checking the regular RQ here; mlx5e_validate_xsk_param called +@@ -2314,14 +2313,7 @@ static int set_pflag_rx_striding_rq(struct net_device *netdev, bool enable) + MLX5E_SET_PFLAG(&new_params, MLX5E_PFLAG_RX_STRIDING_RQ, enable); + mlx5e_set_rq_type(mdev, &new_params); + +- err = mlx5e_safe_switch_params(priv, &new_params, NULL, NULL, true); +- if (err) +- return err; +- +- /* update XDP supported features */ +- mlx5e_set_xdp_feature(netdev); +- +- return 0; ++ return 
mlx5e_safe_switch_params(priv, &new_params, NULL, NULL, true); + } + + static int set_pflag_rx_no_csum_complete(struct net_device *netdev, bool enable) +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index 261b96e41d7e..7c1f458a61f5 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -3337,6 +3337,7 @@ static int mlx5e_switch_priv_params(struct mlx5e_priv *priv, + } + } + ++ mlx5e_set_xdp_feature(priv); + return 0; + } + +@@ -3368,6 +3369,7 @@ static int mlx5e_switch_priv_channels(struct mlx5e_priv *priv, + } + } + ++ mlx5e_set_xdp_feature(priv); + if (!MLX5_CAP_GEN(priv->mdev, tis_tir_td_order)) + mlx5e_close_channels(old_chs); + priv->profile->update_rx(priv); +@@ -4376,10 +4378,10 @@ static int mlx5e_handle_feature(struct net_device *netdev, + return 0; + } + +-void mlx5e_set_xdp_feature(struct net_device *netdev) ++void mlx5e_set_xdp_feature(struct mlx5e_priv *priv) + { +- struct mlx5e_priv *priv = netdev_priv(netdev); + struct mlx5e_params *params = &priv->channels.params; ++ struct net_device *netdev = priv->netdev; + xdp_features_t val; + + if (!netdev->netdev_ops->ndo_bpf || +@@ -4428,9 +4430,6 @@ int mlx5e_set_features(struct net_device *netdev, netdev_features_t features) + return -EINVAL; + } + +- /* update XDP supported features */ +- mlx5e_set_xdp_feature(netdev); +- + return 0; + } + +@@ -5805,7 +5804,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev) + netdev->priv_flags |= IFF_UNICAST_FLT; + + netif_set_tso_max_size(netdev, GSO_MAX_SIZE); +- mlx5e_set_xdp_feature(netdev); ++ mlx5e_set_xdp_feature(priv); + mlx5e_set_netdev_dev_addr(netdev); + mlx5e_macsec_build_netdev(priv); + mlx5e_ipsec_build_netdev(priv); +@@ -5898,7 +5897,7 @@ static int mlx5e_nic_init(struct mlx5_core_dev *mdev, + rtnl_lock(); + + /* update XDP supported features */ +- mlx5e_set_xdp_feature(netdev); ++ 
mlx5e_set_xdp_feature(priv); + + if (take_rtnl) + rtnl_unlock(); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +index d19d743a88ae..8490e2039f7f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +@@ -866,7 +866,7 @@ static void mlx5e_build_rep_params(struct net_device *netdev) + if (take_rtnl) + rtnl_lock(); + /* update XDP supported features */ +- mlx5e_set_xdp_feature(netdev); ++ mlx5e_set_xdp_feature(priv); + if (take_rtnl) + rtnl_unlock(); + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1639-net-mlx5e-support-xdp-target-xmit-with-dummy-program.patch b/SOURCES/1639-net-mlx5e-support-xdp-target-xmit-with-dummy-program.patch new file mode 100644 index 000000000..7ac8f0c9c --- /dev/null +++ b/SOURCES/1639-net-mlx5e-support-xdp-target-xmit-with-dummy-program.patch @@ -0,0 +1,88 @@ +From debb682ed668b08f9b59751749f9d83c3a496622 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 15:05:59 -0400 +Subject: [PATCH] net/mlx5e: Support XDP target xmit with dummy program + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 +Conflicts: +Context diff due to the missing of the following commit: +8f7b00307bf1 ("net/mlx5e: Convert mlx5 netdevs to instance locking") + +commit d4aa0cc9bd31f3e0cd5f067d649bf39135e4b46b +Author: Tariq Toukan +Date: Sun Nov 30 12:13:37 2025 +0200 + + net/mlx5e: Support XDP target xmit with dummy program + + Save per-channel resources in default, in device and host memory. + + As no better API exist, make the XDP-redirect-target SQ available by + loading a dummy XDP program. + + This improves the latency of interface up/down operations when feature + is disabled. + + Perf numbers: + NIC: Connect-X7. + Setup: 248 channels, default mtu and rx/tx ring sizes. + + Interface up + down: + Before: 2.246 secs + After: 1.798 secs (-0.448 sec) + + Saves ~1.8 msec per channel. 
+ + Signed-off-by: Tariq Toukan + Reviewed-by: Dragos Tatulea + Reviewed-by: William Tu + Link: https://patch.msgid.link/1764497617-1326331-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index 7c1f458a61f5..e69a67aa54f4 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -2584,7 +2584,7 @@ static int mlx5e_open_queues(struct mlx5e_channel *c, + if (err) + goto err_close_icosq_cq; + +- if (netdev_ops->ndo_xdp_xmit) { ++ if (netdev_ops->ndo_xdp_xmit && c->xdp) { + c->xdpsq = mlx5e_open_xdpredirect_sq(c, params, cparam, &ccp); + if (IS_ERR(c->xdpsq)) { + err = PTR_ERR(c->xdpsq); +@@ -4382,19 +4382,18 @@ void mlx5e_set_xdp_feature(struct mlx5e_priv *priv) + { + struct mlx5e_params *params = &priv->channels.params; + struct net_device *netdev = priv->netdev; +- xdp_features_t val; ++ xdp_features_t val = 0; + +- if (!netdev->netdev_ops->ndo_bpf || +- params->packet_merge.type != MLX5E_PACKET_MERGE_NONE) { +- xdp_clear_features_flag(netdev); +- return; +- } ++ if (netdev->netdev_ops->ndo_bpf && ++ params->packet_merge.type == MLX5E_PACKET_MERGE_NONE) ++ val = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | ++ NETDEV_XDP_ACT_XSK_ZEROCOPY | ++ NETDEV_XDP_ACT_RX_SG; ++ ++ if (netdev->netdev_ops->ndo_xdp_xmit && params->xdp_prog) ++ val |= NETDEV_XDP_ACT_NDO_XMIT | ++ NETDEV_XDP_ACT_NDO_XMIT_SG; + +- val = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | +- NETDEV_XDP_ACT_XSK_ZEROCOPY | +- NETDEV_XDP_ACT_RX_SG | +- NETDEV_XDP_ACT_NDO_XMIT | +- NETDEV_XDP_ACT_NDO_XMIT_SG; + xdp_set_features_flag(netdev, val); + } + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1640-net-mlx5-make-enable-mpesw-idempotent.patch b/SOURCES/1640-net-mlx5-make-enable-mpesw-idempotent.patch new file mode 100644 index 000000000..3222f10e8 --- /dev/null +++ 
b/SOURCES/1640-net-mlx5-make-enable-mpesw-idempotent.patch @@ -0,0 +1,60 @@ +From bf157e13863b52747166fe7dd75eb43def8ee477 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 15:08:57 -0400 +Subject: [PATCH] net/mlx5: make enable_mpesw idempotent + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit cd7671ef4cf2edf73cd2a3dca3a2f522a4525bf5 +Author: Moshe Shemesh +Date: Mon Dec 1 17:13:27 2025 +0200 + + net/mlx5: make enable_mpesw idempotent + + The enable_mpesw() function returns -EINVAL if ldev->mode is not + MLX5_LAG_MODE_NONE. This means attempting to enable MPESW mode when it's + already enabled will fail. In contrast, disable_mpesw() properly checks + if the mode is MLX5_LAG_MODE_MPESW before proceeding, making it + naturally idempotent and safe to call multiple times. + + Fix enable_mpesw() to return success if mpesw is already enabled. + + Fixes: a32327a3a02c ("net/mlx5: Lag, Control MultiPort E-Switch single FDB mode") + Signed-off-by: Moshe Shemesh + Reviewed-by: Shay Drori + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/1764602008-1334866-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.c +index aad52d3a90e6..2d86af8f0d9b 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.c +@@ -67,12 +67,19 @@ static int mlx5_mpesw_metadata_set(struct mlx5_lag *ldev) + + static int enable_mpesw(struct mlx5_lag *ldev) + { +- int idx = mlx5_lag_get_dev_index_by_seq(ldev, MLX5_LAG_P1); + struct mlx5_core_dev *dev0; + int err; ++ int idx; + int i; + +- if (idx < 0 || ldev->mode != MLX5_LAG_MODE_NONE) ++ if (ldev->mode == MLX5_LAG_MODE_MPESW) ++ return 0; ++ ++ if (ldev->mode != MLX5_LAG_MODE_NONE) ++ return -EINVAL; ++ ++ idx = mlx5_lag_get_dev_index_by_seq(ldev, 
MLX5_LAG_P1); ++ if (idx < 0) + return -EINVAL; + + dev0 = ldev->pf[idx].dev; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1641-net-mlx5-fix-double-unregister-of-hca-ports-component.patch b/SOURCES/1641-net-mlx5-fix-double-unregister-of-hca-ports-component.patch new file mode 100644 index 000000000..a508db2c5 --- /dev/null +++ b/SOURCES/1641-net-mlx5-fix-double-unregister-of-hca-ports-component.patch @@ -0,0 +1,84 @@ +From 494a5454615c502d757998af84c0ddbbea83c72e Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 15:08:57 -0400 +Subject: [PATCH] net/mlx5: Fix double unregister of HCA_PORTS component + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 6a107cfe9c99a079e578a4c5eb70038101a3599f +Author: Gerd Bayer +Date: Tue Dec 2 12:12:57 2025 +0100 + + net/mlx5: Fix double unregister of HCA_PORTS component + + Clear hca_devcom_comp in device's private data after unregistering it in + LAG teardown. Otherwise a slightly lagging second pass through + mlx5_unload_one() might try to unregister it again and trip over + use-after-free. + + On s390 almost all PCI level recovery events trigger two passes through + mxl5_unload_one() - one through the poll_health() method and one through + mlx5_pci_err_detected() as callback from generic PCI error recovery. + While testing PCI error recovery paths with more kernel debug features + enabled, this issue reproducibly led to kernel panics with the following + call chain: + + Unable to handle kernel pointer dereference in virtual kernel address space + Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803 ESOP-2 FSI + Fault in home space mode while using kernel ASCE. 
+ AS:00000000705c4007 R3:0000000000000024 + Oops: 0038 ilc:3 [#1]SMP + + CPU: 14 UID: 0 PID: 156 Comm: kmcheck Kdump: loaded Not tainted + 6.18.0-20251130.rc7.git0.16131a59cab1.300.fc43.s390x+debug #1 PREEMPT + + Krnl PSW : 0404e00180000000 0000020fc86aa1dc (__lock_acquire+0x5c/0x15f0) + R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 + Krnl GPRS: 0000000000000000 0000020f00000001 6b6b6b6b6b6b6c33 0000000000000000 + 0000000000000000 0000000000000000 0000000000000001 0000000000000000 + 0000000000000000 0000020fca28b820 0000000000000000 0000010a1ced8100 + 0000010a1ced8100 0000020fc9775068 0000018fce14f8b8 0000018fce14f7f8 + Krnl Code: 0000020fc86aa1cc: e3b003400004 lg %r11,832 + 0000020fc86aa1d2: a7840211 brc 8,0000020fc86aa5f4 + *0000020fc86aa1d6: c09000df0b25 larl %r9,0000020fca28b820 + >0000020fc86aa1dc: d50790002000 clc 0(8,%r9),0(%r2) + 0000020fc86aa1e2: a7840209 brc 8,0000020fc86aa5f4 + 0000020fc86aa1e6: c0e001100401 larl %r14,0000020fca8aa9e8 + 0000020fc86aa1ec: c01000e25a00 larl %r1,0000020fca2f55ec + 0000020fc86aa1f2: a7eb00e8 aghi %r14,232 + + Call Trace: + __lock_acquire+0x5c/0x15f0 + lock_acquire.part.0+0xf8/0x270 + lock_acquire+0xb0/0x1b0 + down_write+0x5a/0x250 + mlx5_detach_device+0x42/0x110 [mlx5_core] + mlx5_unload_one_devl_locked+0x50/0xc0 [mlx5_core] + mlx5_unload_one+0x42/0x60 [mlx5_core] + mlx5_pci_err_detected+0x94/0x150 [mlx5_core] + zpci_event_attempt_error_recovery+0xcc/0x388 + + Fixes: 5a977b5833b7 ("net/mlx5: Lag, move devcom registration to LAG layer") + Signed-off-by: Gerd Bayer + Reviewed-by: Moshe Shemesh + Acked-by: Tariq Toukan + Link: https://patch.msgid.link/20251202-fix_lag-v1-1-59e8177ffce0@linux.ibm.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c +index 1ac933cd8f02..a459a30f36ca 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c ++++ 
b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c +@@ -1413,6 +1413,7 @@ static int __mlx5_lag_dev_add_mdev(struct mlx5_core_dev *dev) + static void mlx5_lag_unregister_hca_devcom_comp(struct mlx5_core_dev *dev) + { + mlx5_devcom_unregister_component(dev->priv.hca_devcom_comp); ++ dev->priv.hca_devcom_comp = NULL; + } + + static int mlx5_lag_register_hca_devcom_comp(struct mlx5_core_dev *dev) +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1642-net-mlx5-fw-reset-clear-reset-requested-on-drain-fw-reset.patch b/SOURCES/1642-net-mlx5-fw-reset-clear-reset-requested-on-drain-fw-reset.patch new file mode 100644 index 000000000..35e21fa4b --- /dev/null +++ b/SOURCES/1642-net-mlx5-fw-reset-clear-reset-requested-on-drain-fw-reset.patch @@ -0,0 +1,47 @@ +From 1eb2fb402f2e35219048887c0cecd272e3d3e0bb Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 15:08:57 -0400 +Subject: [PATCH] net/mlx5: fw reset, clear reset requested on drain_fw_reset + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 89a898d63f6f588acf5c104c65c94a38b68c69a6 +Author: Moshe Shemesh +Date: Tue Dec 9 14:56:09 2025 +0200 + + net/mlx5: fw reset, clear reset requested on drain_fw_reset + + drain_fw_reset() waits for ongoing firmware reset events and blocks new + event handling, but does not clear the reset requested flag, and may + keep sync reset polling. + + To fix it, call mlx5_sync_reset_clear_reset_requested() to clear the + flag, stop sync reset polling, and resume health polling, ensuring + health issues are still detected after the firmware reset drain. 
+ + Fixes: 16d42d313350 ("net/mlx5: Drain fw_reset when removing device") + Signed-off-by: Moshe Shemesh + Reviewed-by: Shay Drori + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1765284977-1363052-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c +index 89e399606877..33df0418e575 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c +@@ -843,7 +843,8 @@ void mlx5_drain_fw_reset(struct mlx5_core_dev *dev) + cancel_work_sync(&fw_reset->reset_reload_work); + cancel_work_sync(&fw_reset->reset_now_work); + cancel_work_sync(&fw_reset->reset_abort_work); +- cancel_delayed_work(&fw_reset->reset_timeout_work); ++ if (test_bit(MLX5_FW_RESET_FLAGS_RESET_REQUESTED, &fw_reset->reset_flags)) ++ mlx5_sync_reset_clear_reset_requested(dev, true); + } + + static const struct devlink_param mlx5_fw_reset_devlink_params[] = { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1643-net-mlx5-drain-firmware-reset-in-shutdown-callback.patch b/SOURCES/1643-net-mlx5-drain-firmware-reset-in-shutdown-callback.patch new file mode 100644 index 000000000..bc27c0b07 --- /dev/null +++ b/SOURCES/1643-net-mlx5-drain-firmware-reset-in-shutdown-callback.patch @@ -0,0 +1,40 @@ +From 264b47175d28baa833a99a484e91d6ba85034fcb Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 15:08:57 -0400 +Subject: [PATCH] net/mlx5: Drain firmware reset in shutdown callback + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 5846a365fc6476b02d6766963cf0985520f0385f +Author: Moshe Shemesh +Date: Tue Dec 9 14:56:10 2025 +0200 + + net/mlx5: Drain firmware reset in shutdown callback + + Invoke drain_fw_reset() in the shutdown callback to ensure all + firmware reset handling is completed before shutdown proceeds. 
+ + Fixes: 16d42d313350 ("net/mlx5: Drain fw_reset when removing device") + Signed-off-by: Moshe Shemesh + Reviewed-by: Shay Drori + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1765284977-1363052-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c +index a0f937c29891..bb794c276b7f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c +@@ -2232,6 +2232,7 @@ static void shutdown(struct pci_dev *pdev) + + mlx5_core_info(dev, "Shutdown was called\n"); + set_bit(MLX5_BREAK_FW_WAIT, &dev->intf_state); ++ mlx5_drain_fw_reset(dev); + mlx5_drain_health_wq(dev); + err = mlx5_try_fast_unload(dev); + if (err) +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1644-net-mlx5-fw-tracer-validate-format-string-parameters.patch b/SOURCES/1644-net-mlx5-fw-tracer-validate-format-string-parameters.patch new file mode 100644 index 000000000..dcb0f8e41 --- /dev/null +++ b/SOURCES/1644-net-mlx5-fw-tracer-validate-format-string-parameters.patch @@ -0,0 +1,196 @@ +From f790fae387e151437057cc8a223b1fba282aecfa Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 15:08:58 -0400 +Subject: [PATCH] net/mlx5: fw_tracer, Validate format string parameters + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit b35966042d20b14e2d83330049f77deec5229749 +Author: Shay Drory +Date: Tue Dec 9 14:56:11 2025 +0200 + + net/mlx5: fw_tracer, Validate format string parameters + + Add validation for format string parameters in the firmware tracer to + prevent potential security vulnerabilities and crashes from malformed + format strings received from firmware. + + The firmware tracer receives format strings from the device firmware and + uses them to format trace messages. 
Without proper validation, bad + firmware could provide format strings with invalid format specifiers + (e.g., %s, %p, %n) that could lead to crashes, or other undefined + behavior. + + Add mlx5_tracer_validate_params() to validate that all format specifiers + in trace strings are limited to safe integer/hex formats (%x, %d, %i, + %u, %llx, %lx, etc.). Reject strings containing other format types that + could be used to access arbitrary memory or cause crashes. + Invalid format strings are added to the trace output for visibility with + "BAD_FORMAT: " prefix. + + Fixes: 70dd6fdb8987 ("net/mlx5: FW tracer, parse traces and kernel tracing support") + Signed-off-by: Shay Drory + Reviewed-by: Moshe Shemesh + Reported-by: Breno Leitao + Closes: https://lore.kernel.org/netdev/hanz6rzrb2bqbplryjrakvkbmv4y5jlmtthnvi3thg5slqvelp@t3s3erottr6s/ + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1765284977-1363052-4-git-send-email-tariqt@nvidia.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c +index 7bcf822a89f9..b415dfe5de45 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c +@@ -33,6 +33,7 @@ + #include "lib/eq.h" + #include "fw_tracer.h" + #include "fw_tracer_tracepoint.h" ++#include + + static int mlx5_query_mtrc_caps(struct mlx5_fw_tracer *tracer) + { +@@ -358,6 +359,43 @@ static const char *VAL_PARM = "%llx"; + static const char *REPLACE_64_VAL_PARM = "%x%x"; + static const char *PARAM_CHAR = "%"; + ++static bool mlx5_is_valid_spec(const char *str) ++{ ++ /* Parse format specifiers to find the actual type. ++ * Structure: %[flags][width][.precision][length]type ++ * Skip flags, width, precision & length. ++ */ ++ while (isdigit(*str) || *str == '#' || *str == '.' 
|| *str == 'l') ++ str++; ++ ++ /* Check if it's a valid integer/hex specifier: ++ * Valid formats: %x, %d, %i, %u, etc. ++ */ ++ if (*str != 'x' && *str != 'X' && *str != 'd' && *str != 'i' && ++ *str != 'u' && *str != 'c') ++ return false; ++ ++ return true; ++} ++ ++static bool mlx5_tracer_validate_params(const char *str) ++{ ++ const char *substr = str; ++ ++ if (!str) ++ return false; ++ ++ substr = strstr(substr, PARAM_CHAR); ++ while (substr) { ++ if (!mlx5_is_valid_spec(substr + 1)) ++ return false; ++ ++ substr = strstr(substr + 1, PARAM_CHAR); ++ } ++ ++ return true; ++} ++ + static int mlx5_tracer_message_hash(u32 message_id) + { + return jhash_1word(message_id, 0) & (MESSAGE_HASH_SIZE - 1); +@@ -419,6 +457,10 @@ static int mlx5_tracer_get_num_of_params(char *str) + char *substr, *pstr = str; + int num_of_params = 0; + ++ /* Validate that all parameters are valid before processing */ ++ if (!mlx5_tracer_validate_params(str)) ++ return -EINVAL; ++ + /* replace %llx with %x%x */ + substr = strstr(pstr, VAL_PARM); + while (substr) { +@@ -570,14 +612,17 @@ void mlx5_tracer_print_trace(struct tracer_string_format *str_frmt, + { + char tmp[512]; + +- snprintf(tmp, sizeof(tmp), str_frmt->string, +- str_frmt->params[0], +- str_frmt->params[1], +- str_frmt->params[2], +- str_frmt->params[3], +- str_frmt->params[4], +- str_frmt->params[5], +- str_frmt->params[6]); ++ if (str_frmt->invalid_string) ++ snprintf(tmp, sizeof(tmp), "BAD_FORMAT: %s", str_frmt->string); ++ else ++ snprintf(tmp, sizeof(tmp), str_frmt->string, ++ str_frmt->params[0], ++ str_frmt->params[1], ++ str_frmt->params[2], ++ str_frmt->params[3], ++ str_frmt->params[4], ++ str_frmt->params[5], ++ str_frmt->params[6]); + + trace_mlx5_fw(dev->tracer, trace_timestamp, str_frmt->lost, + str_frmt->event_id, tmp); +@@ -609,6 +654,13 @@ static int mlx5_tracer_handle_raw_string(struct mlx5_fw_tracer *tracer, + return 0; + } + ++static void mlx5_tracer_handle_bad_format_string(struct mlx5_fw_tracer *tracer, 
++ struct tracer_string_format *cur_string) ++{ ++ cur_string->invalid_string = true; ++ list_add_tail(&cur_string->list, &tracer->ready_strings_list); ++} ++ + static int mlx5_tracer_handle_string_trace(struct mlx5_fw_tracer *tracer, + struct tracer_event *tracer_event) + { +@@ -619,12 +671,18 @@ static int mlx5_tracer_handle_string_trace(struct mlx5_fw_tracer *tracer, + if (!cur_string) + return mlx5_tracer_handle_raw_string(tracer, tracer_event); + +- cur_string->num_of_params = mlx5_tracer_get_num_of_params(cur_string->string); +- cur_string->last_param_num = 0; + cur_string->event_id = tracer_event->event_id; + cur_string->tmsn = tracer_event->string_event.tmsn; + cur_string->timestamp = tracer_event->string_event.timestamp; + cur_string->lost = tracer_event->lost_event; ++ cur_string->last_param_num = 0; ++ cur_string->num_of_params = mlx5_tracer_get_num_of_params(cur_string->string); ++ if (cur_string->num_of_params < 0) { ++ pr_debug("%s Invalid format string parameters\n", ++ __func__); ++ mlx5_tracer_handle_bad_format_string(tracer, cur_string); ++ return 0; ++ } + if (cur_string->num_of_params == 0) /* trace with no params */ + list_add_tail(&cur_string->list, &tracer->ready_strings_list); + } else { +@@ -634,6 +692,11 @@ static int mlx5_tracer_handle_string_trace(struct mlx5_fw_tracer *tracer, + __func__, tracer_event->string_event.tmsn); + return mlx5_tracer_handle_raw_string(tracer, tracer_event); + } ++ if (cur_string->num_of_params < 0) { ++ pr_debug("%s string parameter of invalid string, dumping\n", ++ __func__); ++ return 0; ++ } + cur_string->last_param_num += 1; + if (cur_string->last_param_num > TRACER_MAX_PARAMS) { + pr_debug("%s Number of params exceeds the max (%d)\n", +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.h b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.h +index 5c548bb74f07..30d0bcba8847 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.h ++++ 
b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.h +@@ -125,6 +125,7 @@ struct tracer_string_format { + struct list_head list; + u32 timestamp; + bool lost; ++ bool invalid_string; + }; + + enum mlx5_fw_tracer_ownership_state { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1645-net-mlx5-fw-tracer-handle-escaped-percent-properly.patch b/SOURCES/1645-net-mlx5-fw-tracer-handle-escaped-percent-properly.patch new file mode 100644 index 000000000..b44f97176 --- /dev/null +++ b/SOURCES/1645-net-mlx5-fw-tracer-handle-escaped-percent-properly.patch @@ -0,0 +1,86 @@ +From 5439ccdfeefa1a7ccd88e3d1ed4a075004cdac03 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 15:08:58 -0400 +Subject: [PATCH] net/mlx5: fw_tracer, Handle escaped percent properly + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit c0289f67f7d6a0dfba0e92cfe661a5c70c8c6e92 +Author: Shay Drory +Date: Tue Dec 9 14:56:12 2025 +0200 + + net/mlx5: fw_tracer, Handle escaped percent properly + + The firmware tracer's format string validation and parameter counting + did not properly handle escaped percent signs (%%). This caused + fw_tracer to count more parameters when trace format strings contained + literal percent characters. + + To fix it, allow %% to pass string validation and skip %% sequences when + counting parameters since they represent literal percent signs rather + than format specifiers. 
+ + Fixes: 70dd6fdb8987 ("net/mlx5: FW tracer, parse traces and kernel tracing support") + Signed-off-by: Shay Drory + Reported-by: Breno Leitao + Reviewed-by: Moshe Shemesh + Closes: https://lore.kernel.org/netdev/hanz6rzrb2bqbplryjrakvkbmv4y5jlmtthnvi3thg5slqvelp@t3s3erottr6s/ + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1765284977-1363052-5-git-send-email-tariqt@nvidia.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c +index b415dfe5de45..6b4ec457ce22 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c +@@ -368,11 +368,11 @@ static bool mlx5_is_valid_spec(const char *str) + while (isdigit(*str) || *str == '#' || *str == '.' || *str == 'l') + str++; + +- /* Check if it's a valid integer/hex specifier: ++ /* Check if it's a valid integer/hex specifier or %%: + * Valid formats: %x, %d, %i, %u, etc. 
+ */ + if (*str != 'x' && *str != 'X' && *str != 'd' && *str != 'i' && +- *str != 'u' && *str != 'c') ++ *str != 'u' && *str != 'c' && *str != '%') + return false; + + return true; +@@ -390,7 +390,11 @@ static bool mlx5_tracer_validate_params(const char *str) + if (!mlx5_is_valid_spec(substr + 1)) + return false; + +- substr = strstr(substr + 1, PARAM_CHAR); ++ if (*(substr + 1) == '%') ++ substr = strstr(substr + 2, PARAM_CHAR); ++ else ++ substr = strstr(substr + 1, PARAM_CHAR); ++ + } + + return true; +@@ -469,11 +473,15 @@ static int mlx5_tracer_get_num_of_params(char *str) + substr = strstr(pstr, VAL_PARM); + } + +- /* count all the % characters */ ++ /* count all the % characters, but skip %% (escaped percent) */ + substr = strstr(str, PARAM_CHAR); + while (substr) { +- num_of_params += 1; +- str = substr + 1; ++ if (*(substr + 1) != '%') { ++ num_of_params += 1; ++ str = substr + 1; ++ } else { ++ str = substr + 2; ++ } + substr = strstr(str, PARAM_CHAR); + } + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1646-net-mlx5-serialize-firmware-reset-with-devlink.patch b/SOURCES/1646-net-mlx5-serialize-firmware-reset-with-devlink.patch new file mode 100644 index 000000000..98c8966d3 --- /dev/null +++ b/SOURCES/1646-net-mlx5-serialize-firmware-reset-with-devlink.patch @@ -0,0 +1,208 @@ +From 377d452a9758e2011a101d4919bb498a14e4075b Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 15:08:58 -0400 +Subject: [PATCH] net/mlx5: Serialize firmware reset with devlink + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 367e501f8b095eca08d2eb0ba4ccea5b5e82c169 +Author: Shay Drory +Date: Tue Dec 9 14:56:13 2025 +0200 + + net/mlx5: Serialize firmware reset with devlink + + The firmware reset mechanism can be triggered by asynchronous events, + which may race with other devlink operations like devlink reload or + devlink dev eswitch set, potentially leading to inconsistent states. 
+ + This patch addresses the race by using the devl_lock to serialize the + firmware reset against other devlink operations. When a reset is + requested, the driver attempts to acquire the lock. If successful, it + sets a flag to block devlink reload or eswitch changes, ACKs the reset + to firmware and then releases the lock. If the lock is already held by + another operation, the driver NACKs the firmware reset request, + indicating that the reset cannot proceed. + + Firmware reset does not keep the devl_lock and instead uses an internal + firmware reset bit. This is because firmware resets can be triggered by + asynchronous events, and processed in different threads. It is illegal + and unsafe to acquire a lock in one thread and attempt to release it in + another, as lock ownership is intrinsically thread-specific. + + This change ensures that firmware resets and other devlink operations + are mutually exclusive during the critical reset request phase, + preventing race conditions. + + Fixes: 38b9f903f22b ("net/mlx5: Handle sync reset request event") + Signed-off-by: Shay Drory + Reviewed-by: Mateusz Berezecki + Reviewed-by: Moshe Shemesh + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1765284977-1363052-6-git-send-email-tariqt@nvidia.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +index c204c707b850..9fb39f42a670 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +@@ -197,6 +197,11 @@ static int mlx5_devlink_reload_down(struct devlink *devlink, bool netns_change, + struct pci_dev *pdev = dev->pdev; + int ret = 0; + ++ if (mlx5_fw_reset_in_progress(dev)) { ++ NL_SET_ERR_MSG_MOD(extack, "Can't reload during firmware reset"); ++ return -EBUSY; ++ } ++ + if (mlx5_dev_is_lightweight(dev)) { + if (action != DEVLINK_RELOAD_ACTION_DRIVER_REINIT) + return 
-EOPNOTSUPP; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +index 8ebca0d17f65..575b12079933 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +@@ -52,6 +52,7 @@ + #include "devlink.h" + #include "lag/lag.h" + #include "en/tc/post_meter.h" ++#include "fw_reset.h" + + /* There are two match-all miss flows, one for unicast dst mac and + * one for multicast. +@@ -3990,6 +3991,11 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode, + if (IS_ERR(esw)) + return PTR_ERR(esw); + ++ if (mlx5_fw_reset_in_progress(esw->dev)) { ++ NL_SET_ERR_MSG_MOD(extack, "Can't change eswitch mode during firmware reset"); ++ return -EBUSY; ++ } ++ + if (esw_mode_from_devlink(mode, &mlx5_mode)) + return -EINVAL; + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c +index 33df0418e575..4544f1968f73 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c +@@ -15,6 +15,7 @@ enum { + MLX5_FW_RESET_FLAGS_DROP_NEW_REQUESTS, + MLX5_FW_RESET_FLAGS_RELOAD_REQUIRED, + MLX5_FW_RESET_FLAGS_UNLOAD_EVENT, ++ MLX5_FW_RESET_FLAGS_RESET_IN_PROGRESS, + }; + + struct mlx5_fw_reset { +@@ -127,6 +128,16 @@ int mlx5_fw_reset_query(struct mlx5_core_dev *dev, u8 *reset_level, u8 *reset_ty + return mlx5_reg_mfrl_query(dev, reset_level, reset_type, NULL, NULL); + } + ++bool mlx5_fw_reset_in_progress(struct mlx5_core_dev *dev) ++{ ++ struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset; ++ ++ if (!fw_reset) ++ return false; ++ ++ return test_bit(MLX5_FW_RESET_FLAGS_RESET_IN_PROGRESS, &fw_reset->reset_flags); ++} ++ + static int mlx5_fw_reset_get_reset_method(struct mlx5_core_dev *dev, + u8 *reset_method) + { +@@ -242,6 +253,8 @@ static void mlx5_fw_reset_complete_reload(struct mlx5_core_dev 
*dev) + BIT(DEVLINK_RELOAD_ACTION_FW_ACTIVATE)); + devl_unlock(devlink); + } ++ ++ clear_bit(MLX5_FW_RESET_FLAGS_RESET_IN_PROGRESS, &fw_reset->reset_flags); + } + + static void mlx5_stop_sync_reset_poll(struct mlx5_core_dev *dev) +@@ -461,27 +474,48 @@ static void mlx5_sync_reset_request_event(struct work_struct *work) + struct mlx5_fw_reset *fw_reset = container_of(work, struct mlx5_fw_reset, + reset_request_work); + struct mlx5_core_dev *dev = fw_reset->dev; ++ bool nack_request = false; ++ struct devlink *devlink; + int err; + + err = mlx5_fw_reset_get_reset_method(dev, &fw_reset->reset_method); +- if (err) ++ if (err) { ++ nack_request = true; + mlx5_core_warn(dev, "Failed reading MFRL, err %d\n", err); ++ } else if (!mlx5_is_reset_now_capable(dev, fw_reset->reset_method) || ++ test_bit(MLX5_FW_RESET_FLAGS_NACK_RESET_REQUEST, ++ &fw_reset->reset_flags)) { ++ nack_request = true; ++ } + +- if (err || test_bit(MLX5_FW_RESET_FLAGS_NACK_RESET_REQUEST, &fw_reset->reset_flags) || +- !mlx5_is_reset_now_capable(dev, fw_reset->reset_method)) { ++ devlink = priv_to_devlink(dev); ++ /* For external resets, try to acquire devl_lock. Skip if devlink reset is ++ * pending (lock already held) ++ */ ++ if (nack_request || ++ (!test_bit(MLX5_FW_RESET_FLAGS_PENDING_COMP, ++ &fw_reset->reset_flags) && ++ !devl_trylock(devlink))) { + err = mlx5_fw_reset_set_reset_sync_nack(dev); + mlx5_core_warn(dev, "PCI Sync FW Update Reset Nack %s", + err ? "Failed" : "Sent"); + return; + } ++ + if (mlx5_sync_reset_set_reset_requested(dev)) +- return; ++ goto unlock; ++ ++ set_bit(MLX5_FW_RESET_FLAGS_RESET_IN_PROGRESS, &fw_reset->reset_flags); + + err = mlx5_fw_reset_set_reset_sync_ack(dev); + if (err) + mlx5_core_warn(dev, "PCI Sync FW Update Reset Ack Failed. Error code: %d\n", err); + else + mlx5_core_warn(dev, "PCI Sync FW Update Reset Ack. 
Device reset is expected.\n"); ++ ++unlock: ++ if (!test_bit(MLX5_FW_RESET_FLAGS_PENDING_COMP, &fw_reset->reset_flags)) ++ devl_unlock(devlink); + } + + static int mlx5_pci_link_toggle(struct mlx5_core_dev *dev, u16 dev_id) +@@ -721,6 +755,8 @@ static void mlx5_sync_reset_abort_event(struct work_struct *work) + + if (mlx5_sync_reset_clear_reset_requested(dev, true)) + return; ++ ++ clear_bit(MLX5_FW_RESET_FLAGS_RESET_IN_PROGRESS, &fw_reset->reset_flags); + mlx5_core_warn(dev, "PCI Sync FW Update Reset Aborted.\n"); + } + +@@ -757,6 +793,7 @@ static void mlx5_sync_reset_timeout_work(struct work_struct *work) + + if (mlx5_sync_reset_clear_reset_requested(dev, true)) + return; ++ clear_bit(MLX5_FW_RESET_FLAGS_RESET_IN_PROGRESS, &fw_reset->reset_flags); + mlx5_core_warn(dev, "PCI Sync FW Update Reset Timeout.\n"); + } + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h +index d5b28525c960..2d96b2adc1cd 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h +@@ -10,6 +10,7 @@ int mlx5_fw_reset_query(struct mlx5_core_dev *dev, u8 *reset_level, u8 *reset_ty + int mlx5_fw_reset_set_reset_sync(struct mlx5_core_dev *dev, u8 reset_type_sel, + struct netlink_ext_ack *extack); + int mlx5_fw_reset_set_live_patch(struct mlx5_core_dev *dev); ++bool mlx5_fw_reset_in_progress(struct mlx5_core_dev *dev); + + int mlx5_fw_reset_wait_reset_done(struct mlx5_core_dev *dev); + void mlx5_sync_reset_unload_flow(struct mlx5_core_dev *dev, bool locked); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1647-net-mlx5e-use-ip6-dst-lookup-instead-of-ipv6-dst-lookup-flow.patch b/SOURCES/1647-net-mlx5e-use-ip6-dst-lookup-instead-of-ipv6-dst-lookup-flow.patch new file mode 100644 index 000000000..fbd0aae55 --- /dev/null +++ b/SOURCES/1647-net-mlx5e-use-ip6-dst-lookup-instead-of-ipv6-dst-lookup-flow.patch @@ -0,0 +1,52 @@ +From 
1ae4c52b8db64bdfb7c3b877a224c57927935879 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 15:08:58 -0400 +Subject: [PATCH] net/mlx5e: Use ip6_dst_lookup instead of ipv6_dst_lookup_flow + for MAC init + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit e35d7da8dd9e55b37c3e8ab548f6793af0c2ab49 +Author: Jianbo Liu +Date: Tue Dec 9 14:56:14 2025 +0200 + + net/mlx5e: Use ip6_dst_lookup instead of ipv6_dst_lookup_flow for MAC init + + Replace ipv6_stub->ipv6_dst_lookup_flow() with ip6_dst_lookup() in + mlx5e_ipsec_init_macs() since IPsec transformations are not needed + during Security Association setup - only basic routing information is + required for nexthop MAC address resolution. + + This resolves an issue where XfrmOutNoStates error counter would be + incremented when xfrm policy is configured before xfrm state, as the + IPsec-aware routing function would attempt policy checks during SA + initialization. + + Fixes: 71670f766b8f ("net/mlx5e: Support routed networks during IPsec MACs initialization") + Signed-off-by: Jianbo Liu + Reviewed-by: Leon Romanovsky + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1765284977-1363052-7-git-send-email-tariqt@nvidia.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +index 35d9530037a6..6c79b9cea2ef 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +@@ -342,9 +342,8 @@ static void mlx5e_ipsec_init_macs(struct mlx5e_ipsec_sa_entry *sa_entry, + rt_dst_entry = &rt->dst; + break; + case AF_INET6: +- rt_dst_entry = ipv6_stub->ipv6_dst_lookup_flow( +- dev_net(netdev), NULL, &fl6, NULL); +- if (IS_ERR(rt_dst_entry)) ++ if (!IS_ENABLED(CONFIG_IPV6) || ++ ip6_dst_lookup(dev_net(netdev), NULL, &rt_dst_entry, &fl6)) + goto neigh; + break; + default: +-- +2.50.1 (Apple 
Git-155) + diff --git a/SOURCES/1648-net-mlx5e-trigger-neighbor-resolution-for-unresolved-destina.patch b/SOURCES/1648-net-mlx5e-trigger-neighbor-resolution-for-unresolved-destina.patch new file mode 100644 index 000000000..48b79a037 --- /dev/null +++ b/SOURCES/1648-net-mlx5e-trigger-neighbor-resolution-for-unresolved-destina.patch @@ -0,0 +1,63 @@ +From 67b865f6f8239b5ab05970b5be18c101fe92f285 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 15:08:58 -0400 +Subject: [PATCH] net/mlx5e: Trigger neighbor resolution for unresolved + destinations + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 9ab89bde13e5251e1d0507e1cc426edcdfe19142 +Author: Jianbo Liu +Date: Tue Dec 9 14:56:15 2025 +0200 + + net/mlx5e: Trigger neighbor resolution for unresolved destinations + + When initializing the MAC addresses for an outbound IPsec packet offload + rule in mlx5e_ipsec_init_macs, the call to dst_neigh_lookup is used to + find the next-hop neighbor (typically the gateway in tunnel mode). + This call might create a new neighbor entry if one doesn't already + exist. This newly created entry starts in the INCOMPLETE state, as the + kernel hasn't yet sent an ARP or NDISC probe to resolve the MAC + address. In this case, neigh_ha_snapshot will correctly return an + all-zero MAC address. + + IPsec packet offload requires the actual next-hop MAC address to + program the rule correctly. If the neighbor state is INCOMPLETE when + the rule is created, the hardware rule is programmed with an all-zero + destination MAC address. Packets sent using this rule will be + subsequently dropped by the receiving network infrastructure or host. + + This patch adds a check specifically for the outbound offload path. If + neigh_ha_snapshot returns an all-zero MAC address, it proactively + calls neigh_event_send(n, NULL). This ensures the kernel immediately + sends the initial ARP or NDISC probe if one isn't already pending, + accelerating the resolution process. 
This helps prevent the hardware + rule from being programmed with an invalid MAC address and avoids + packet drops due to unresolved neighbors. + + Fixes: 71670f766b8f ("net/mlx5e: Support routed networks during IPsec MACs initialization") + Signed-off-by: Jianbo Liu + Reviewed-by: Leon Romanovsky + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1765284977-1363052-8-git-send-email-tariqt@nvidia.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +index 6c79b9cea2ef..a8fb4bec369c 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +@@ -358,6 +358,9 @@ static void mlx5e_ipsec_init_macs(struct mlx5e_ipsec_sa_entry *sa_entry, + + neigh_ha_snapshot(addr, n, netdev); + ether_addr_copy(dst, addr); ++ if (attrs->dir == XFRM_DEV_OFFLOAD_OUT && ++ is_zero_ether_addr(addr)) ++ neigh_event_send(n, NULL); + dst_release(rt_dst_entry); + neigh_release(n); + return; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1649-net-mlx5e-do-not-update-bql-of-old-txqs-during-channel-recon.patch b/SOURCES/1649-net-mlx5e-do-not-update-bql-of-old-txqs-during-channel-recon.patch new file mode 100644 index 000000000..8338a6f05 --- /dev/null +++ b/SOURCES/1649-net-mlx5e-do-not-update-bql-of-old-txqs-during-channel-recon.patch @@ -0,0 +1,67 @@ +From 6c09bd8f4ec1299227af49385e6855bda80b85c1 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 15:08:58 -0400 +Subject: [PATCH] net/mlx5e: Do not update BQL of old txqs during channel + reconfiguration + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit c8591decd9dbf395cb8ae398e70b0438fdd24aee +Author: Tariq Toukan +Date: Tue Dec 9 14:56:16 2025 +0200 + + net/mlx5e: Do not update BQL of old txqs during channel reconfiguration + + During channel reconfiguration (e.g., ethtool private flags 
changes), + the driver can trigger a kernel BUG_ON in dql_completed() with the error + "kernel BUG at lib/dynamic_queue_limits.c:99". + + The issue occurs in the following sequence: + + During mlx5e_safe_switch_params(), old channels are deactivated via + mlx5e_deactivate_txqsq(). New channels are created and activated, taking + ownership of the netdev_queues and their BQL state. + + When old channels are closed via mlx5e_close_txqsq(), there may be + pending TX descriptors (sq->cc != sq->pc) that were in-flight during the + deactivation. + + mlx5e_free_txqsq_descs() frees these pending descriptors and attempts to + complete them via netdev_tx_completed_queue(). + + However, the BQL state (dql->num_queued and dql->num_completed) have + been reset in mlx5e_activate_txqsq and belong to the new queue owner, + leading to dql->num_queued - dql->num_completed < nbytes. + + This triggers BUG_ON(count > num_queued - num_completed) in + dql_completed(). + + Fixes: 3b88a535a8e1 ("net/mlx5e: Defer channels closure to reduce interface down time") + Signed-off-by: Tariq Toukan + Signed-off-by: William Tu + Reviewed-by: Dragos Tatulea + Link: https://patch.msgid.link/1765284977-1363052-9-git-send-email-tariqt@nvidia.com + Signed-off-by: Paolo Abeni + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c +index 6245d2f82afe..5d12f19dfe8a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c +@@ -933,7 +933,11 @@ void mlx5e_free_txqsq_descs(struct mlx5e_txqsq *sq) + sq->dma_fifo_cc = dma_fifo_cc; + sq->cc = sqcc; + +- netdev_tx_completed_queue(sq->txq, npkts, nbytes); ++ /* Do not update BQL for TXQs that got replaced by new active ones, as ++ * netdev_tx_reset_queue() is called for them in mlx5e_activate_txqsq(). 
++ */ ++ if (sq == sq->priv->txq2sq[sq->txq_ix]) ++ netdev_tx_completed_queue(sq->txq, npkts, nbytes); + } + + #ifdef CONFIG_MLX5_CORE_IPOIB +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1650-net-mlx5-lag-multipath-give-priority-for-routes-with-smaller.patch b/SOURCES/1650-net-mlx5-lag-multipath-give-priority-for-routes-with-smaller.patch new file mode 100644 index 000000000..79b28cd58 --- /dev/null +++ b/SOURCES/1650-net-mlx5-lag-multipath-give-priority-for-routes-with-smaller.patch @@ -0,0 +1,59 @@ +From ee464f702ba7e81cee0eecabd8e982448af4cdba Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 15:08:58 -0400 +Subject: [PATCH] net/mlx5: Lag, multipath, give priority for routes with + smaller network prefix + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 31057979cdadfee9f934746fd84046b43506ba61 +Author: Patrisious Haddad +Date: Thu Dec 25 15:27:13 2025 +0200 + + net/mlx5: Lag, multipath, give priority for routes with smaller network prefix + + Today multipath offload is controlled by a single route and the route + controlling is selected if it meets one of the following criteria: + 1. No controlling route is set. + 2. New route destination is the same as old one. + 3. New route metric is lower than old route metric. + + This can cause unwanted behaviour in case a new route is added + with a smaller network prefix which should get the priority. + + Fix this by adding a new criteria to give priority to new route with + a smaller network prefix. 
+ + Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry") + Signed-off-by: Patrisious Haddad + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20251225132717.358820-2-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/mp.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/mp.c +index aee17fcf3b36..cdc99fe5c956 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/mp.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/mp.c +@@ -173,10 +173,15 @@ static void mlx5_lag_fib_route_event(struct mlx5_lag *ldev, unsigned long event, + } + + /* Handle multipath entry with lower priority value */ +- if (mp->fib.mfi && mp->fib.mfi != fi && ++ if (mp->fib.mfi && + (mp->fib.dst != fen_info->dst || mp->fib.dst_len != fen_info->dst_len) && +- fi->fib_priority >= mp->fib.priority) ++ mp->fib.dst_len <= fen_info->dst_len && ++ !(mp->fib.dst_len == fen_info->dst_len && ++ fi->fib_priority < mp->fib.priority)) { ++ mlx5_core_dbg(ldev->pf[idx].dev, ++ "Multipath entry with lower priority was rejected\n"); + return; ++ } + + nh_dev0 = mlx5_lag_get_next_fib_dev(ldev, fi, NULL); + nh_dev1 = mlx5_lag_get_next_fib_dev(ldev, fi, nh_dev0); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1651-net-mlx5e-fix-null-pointer-dereference-in-ioctl-module-eepro.patch b/SOURCES/1651-net-mlx5e-fix-null-pointer-dereference-in-ioctl-module-eepro.patch new file mode 100644 index 000000000..a28be6b3f --- /dev/null +++ b/SOURCES/1651-net-mlx5e-fix-null-pointer-dereference-in-ioctl-module-eepro.patch @@ -0,0 +1,53 @@ +From 8667c2e8fe140d8da277af968148cadaf9eafc87 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 17:10:11 -0400 +Subject: [PATCH] net/mlx5e: Fix NULL pointer dereference in ioctl module + EEPROM query + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 7d36a4a8bf62dc508bc6bb4b59727aec25064ca5 +Author: Gal Pressman +Date: 
Thu Dec 25 15:27:15 2025 +0200 + + net/mlx5e: Fix NULL pointer dereference in ioctl module EEPROM query + + The mlx5_query_mcia() function unconditionally dereferences the status + pointer to store the MCIA register status value. + However, mlx5e_get_module_id() passes NULL since it doesn't need the + status value. + + Add a NULL check before dereferencing the status pointer to prevent a + NULL pointer dereference. + + Fixes: 2e4c44b12f4d ("net/mlx5: Refactor EEPROM query error handling to return status separately") + Signed-off-by: Gal Pressman + Reviewed-by: Tariq Toukan + Reviewed-by: Dragos Tatulea + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20251225132717.358820-4-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c b/drivers/net/ethernet/mellanox/mlx5/core/port.c +index e4b1dfafb41f..3e52579bc2af 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c +@@ -393,9 +393,11 @@ static int mlx5_query_mcia(struct mlx5_core_dev *dev, + if (err) + return err; + +- *status = MLX5_GET(mcia_reg, out, status); +- if (*status) ++ if (MLX5_GET(mcia_reg, out, status)) { ++ if (status) ++ *status = MLX5_GET(mcia_reg, out, status); + return -EIO; ++ } + + ptr = MLX5_ADDR_OF(mcia_reg, out, dword_0); + memcpy(data, ptr, size); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1652-net-mlx5e-don-t-print-error-message-due-to-invalid-module.patch b/SOURCES/1652-net-mlx5e-don-t-print-error-message-due-to-invalid-module.patch new file mode 100644 index 000000000..e58a23ddd --- /dev/null +++ b/SOURCES/1652-net-mlx5e-don-t-print-error-message-due-to-invalid-module.patch @@ -0,0 +1,52 @@ +From 765f3c5f2352a80f786fe737bff899ec2c4c6389 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 17:10:11 -0400 +Subject: [PATCH] net/mlx5e: Don't print error message due to invalid module + +JIRA: 
https://redhat.atlassian.net/browse/RHEL-169055 + +commit 144297e2a24e3e54aee1180ec21120ea38822b97 +Author: Gal Pressman +Date: Thu Dec 25 15:27:16 2025 +0200 + + net/mlx5e: Don't print error message due to invalid module + + Dumping module EEPROM on newer modules is supported through the netlink + interface only. + + Querying with old userspace ethtool (or other tools, such as 'lshw') + which still uses the ioctl interface results in an error message that + could flood dmesg (in addition to the expected error return value). + The original message was added under the assumption that the driver + should be able to handle all module types, but now that such flows are + easily triggered from userspace, it doesn't serve its purpose. + + Change the log level of the print in mlx5_query_module_eeprom() to + debug. + + Fixes: bb64143eee8c ("net/mlx5e: Add ethtool support for dump module EEPROM") + Signed-off-by: Gal Pressman + Reviewed-by: Tariq Toukan + Signed-off-by: Mark Bloch + Link: https://patch.msgid.link/20251225132717.358820-5-mbloch@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c b/drivers/net/ethernet/mellanox/mlx5/core/port.c +index 3e52579bc2af..959b568c4da9 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c +@@ -431,7 +431,8 @@ int mlx5_query_module_eeprom(struct mlx5_core_dev *dev, + mlx5_qsfp_eeprom_params_set(&query.i2c_address, &query.page, &offset); + break; + default: +- mlx5_core_err(dev, "Module ID not recognized: 0x%x\n", module_id); ++ mlx5_core_dbg(dev, "Module ID not recognized: 0x%x\n", ++ module_id); + return -EINVAL; + } + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1653-net-mlx5e-fix-crash-on-profile-change-rollback-failure.patch b/SOURCES/1653-net-mlx5e-fix-crash-on-profile-change-rollback-failure.patch new file mode 100644 index 000000000..0f1494e96 --- /dev/null +++ 
b/SOURCES/1653-net-mlx5e-fix-crash-on-profile-change-rollback-failure.patch @@ -0,0 +1,232 @@ +From 0e08840d988b62eac0ef37571ee3e7f2c41c94ea Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 17:10:38 -0400 +Subject: [PATCH] net/mlx5e: Fix crash on profile change rollback failure + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 4dadc4077e3f77d6d31e199a925fc7a705e7adeb +Author: Saeed Mahameed +Date: Thu Jan 8 13:26:54 2026 -0800 + + net/mlx5e: Fix crash on profile change rollback failure + + mlx5e_netdev_change_profile can fail to attach a new profile and can + fail to rollback to old profile, in such case, we could end up with a + dangling netdev with a fully reset netdev_priv. A retry to change + profile, e.g. another attempt to call mlx5e_netdev_change_profile via + switchdev mode change, will crash trying to access the now NULL + priv->mdev. + + This fix allows mlx5e_netdev_change_profile() to handle previous + failures and an empty priv, by not assuming priv is valid. + + Pass netdev and mdev to all flows requiring + mlx5e_netdev_change_profile() and avoid passing priv. + In mlx5e_netdev_change_profile() check if current priv is valid, and if + not, just attach the new profile without trying to access the old one. 
+ + This fixes the following oops, when enabling switchdev mode for the 2nd + time after first time failure: + + ## Enabling switchdev mode first time: + + mlx5_core 0012:03:00.1: E-Switch: Supported tc chains and prios offload + workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR + mlx5_core 0012:03:00.1: mlx5e_netdev_init_profile:6214:(pid 37199): mlx5e_priv_init failed, err=-12 + mlx5_core 0012:03:00.1 gpu3rdma1: mlx5e_netdev_change_profile: new profile init failed, -12 + workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR + mlx5_core 0012:03:00.1: mlx5e_netdev_init_profile:6214:(pid 37199): mlx5e_priv_init failed, err=-12 + mlx5_core 0012:03:00.1 gpu3rdma1: mlx5e_netdev_change_profile: failed to rollback to orig profile, -12 + ^^^^^^^^ + mlx5_core 0000:00:03.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) + + ## retry: Enabling switchdev mode 2nd time: + + mlx5_core 0000:00:03.0: E-Switch: Supported tc chains and prios offload + BUG: kernel NULL pointer dereference, address: 0000000000000038 + #PF: supervisor read access in kernel mode + #PF: error_code(0x0000) - not-present page + PGD 0 P4D 0 + Oops: Oops: 0000 [#1] SMP NOPTI + CPU: 13 UID: 0 PID: 520 Comm: devlink Not tainted 6.18.0-rc4+ #91 PREEMPT(voluntary) + Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014 + RIP: 0010:mlx5e_detach_netdev+0x3c/0x90 + Code: 50 00 00 f0 80 4f 78 02 48 8b bf e8 07 00 00 48 85 ff 74 16 48 8b 73 78 48 d1 ee 83 e6 01 83 f6 01 40 0f b6 f6 e8 c4 42 00 00 <48> 8b 45 38 48 85 c0 74 08 48 89 df e8 cc 47 40 1e 48 8b bb f0 07 + RSP: 0018:ffffc90000673890 EFLAGS: 00010246 + RAX: 0000000000000000 RBX: ffff8881036a89c0 RCX: 0000000000000000 + RDX: ffff888113f63800 RSI: ffffffff822fe720 RDI: 0000000000000000 + RBP: 0000000000000000 R08: 0000000000002dcd R09: 0000000000000000 + R10: ffffc900006738e8 R11: 00000000ffffffff R12: 0000000000000000 + R13: 0000000000000000 R14: ffff8881036a89c0 R15: 
0000000000000000 + FS: 00007fdfb8384740(0000) GS:ffff88856a9d6000(0000) knlGS:0000000000000000 + CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 + CR2: 0000000000000038 CR3: 0000000112ae0005 CR4: 0000000000370ef0 + Call Trace: + + mlx5e_netdev_change_profile+0x45/0xb0 + mlx5e_vport_rep_load+0x27b/0x2d0 + mlx5_esw_offloads_rep_load+0x72/0xf0 + esw_offloads_enable+0x5d0/0x970 + mlx5_eswitch_enable_locked+0x349/0x430 + ? is_mp_supported+0x57/0xb0 + mlx5_devlink_eswitch_mode_set+0x26b/0x430 + devlink_nl_eswitch_set_doit+0x6f/0xf0 + genl_family_rcv_msg_doit+0xe8/0x140 + genl_rcv_msg+0x18b/0x290 + ? __pfx_devlink_nl_pre_doit+0x10/0x10 + ? __pfx_devlink_nl_eswitch_set_doit+0x10/0x10 + ? __pfx_devlink_nl_post_doit+0x10/0x10 + ? __pfx_genl_rcv_msg+0x10/0x10 + netlink_rcv_skb+0x52/0x100 + genl_rcv+0x28/0x40 + netlink_unicast+0x282/0x3e0 + ? __alloc_skb+0xd6/0x190 + netlink_sendmsg+0x1f7/0x430 + __sys_sendto+0x213/0x220 + ? __sys_recvmsg+0x6a/0xd0 + __x64_sys_sendto+0x24/0x30 + do_syscall_64+0x50/0x1f0 + entry_SYSCALL_64_after_hwframe+0x76/0x7e + RIP: 0033:0x7fdfb8495047 + + Fixes: c4d7eb57687f ("net/mxl5e: Add change profile method") + Signed-off-by: Saeed Mahameed + Reviewed-by: Tariq Toukan + Link: https://patch.msgid.link/20260108212657.25090-2-saeed@kernel.org + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h +index a1d33c78aedd..80c8a37c8e8e 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h +@@ -1236,9 +1236,12 @@ mlx5e_create_netdev(struct mlx5_core_dev *mdev, const struct mlx5e_profile *prof + int mlx5e_attach_netdev(struct mlx5e_priv *priv); + void mlx5e_detach_netdev(struct mlx5e_priv *priv); + void mlx5e_destroy_netdev(struct mlx5e_priv *priv); +-int mlx5e_netdev_change_profile(struct mlx5e_priv *priv, +- const struct mlx5e_profile *new_profile, void *new_ppriv); +-void 
mlx5e_netdev_attach_nic_profile(struct mlx5e_priv *priv); ++int mlx5e_netdev_change_profile(struct net_device *netdev, ++ struct mlx5_core_dev *mdev, ++ const struct mlx5e_profile *new_profile, ++ void *new_ppriv); ++void mlx5e_netdev_attach_nic_profile(struct net_device *netdev, ++ struct mlx5_core_dev *mdev); + void mlx5e_set_netdev_mtu_boundaries(struct mlx5e_priv *priv); + void mlx5e_build_nic_params(struct mlx5e_priv *priv, struct mlx5e_xsk *xsk, u16 mtu); + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index e69a67aa54f4..b2d90097186e 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -6505,19 +6505,28 @@ mlx5e_netdev_attach_profile(struct net_device *netdev, struct mlx5_core_dev *mde + return err; + } + +-int mlx5e_netdev_change_profile(struct mlx5e_priv *priv, +- const struct mlx5e_profile *new_profile, void *new_ppriv) ++int mlx5e_netdev_change_profile(struct net_device *netdev, ++ struct mlx5_core_dev *mdev, ++ const struct mlx5e_profile *new_profile, ++ void *new_ppriv) + { +- const struct mlx5e_profile *orig_profile = priv->profile; +- struct net_device *netdev = priv->netdev; +- struct mlx5_core_dev *mdev = priv->mdev; +- void *orig_ppriv = priv->ppriv; ++ struct mlx5e_priv *priv = netdev_priv(netdev); ++ const struct mlx5e_profile *orig_profile; + int err, rollback_err; ++ void *orig_ppriv; + +- /* cleanup old profile */ +- mlx5e_detach_netdev(priv); +- priv->profile->cleanup(priv); +- mlx5e_priv_cleanup(priv); ++ orig_profile = priv->profile; ++ orig_ppriv = priv->ppriv; ++ ++ /* NULL could happen if previous change_profile failed to rollback */ ++ if (priv->profile) { ++ WARN_ON_ONCE(priv->mdev != mdev); ++ /* cleanup old profile */ ++ mlx5e_detach_netdev(priv); ++ priv->profile->cleanup(priv); ++ mlx5e_priv_cleanup(priv); ++ } ++ /* priv members are not valid from this point ... 
*/ + + if (mdev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR) { + mlx5e_netdev_init_profile(netdev, mdev, new_profile, new_ppriv); +@@ -6534,16 +6543,25 @@ int mlx5e_netdev_change_profile(struct mlx5e_priv *priv, + return 0; + + rollback: ++ if (!orig_profile) { ++ netdev_warn(netdev, "no original profile to rollback to\n"); ++ priv->profile = NULL; ++ return err; ++ } ++ + rollback_err = mlx5e_netdev_attach_profile(netdev, mdev, orig_profile, orig_ppriv); +- if (rollback_err) +- netdev_err(netdev, "%s: failed to rollback to orig profile, %d\n", +- __func__, rollback_err); ++ if (rollback_err) { ++ netdev_err(netdev, "failed to rollback to orig profile, %d\n", ++ rollback_err); ++ priv->profile = NULL; ++ } + return err; + } + +-void mlx5e_netdev_attach_nic_profile(struct mlx5e_priv *priv) ++void mlx5e_netdev_attach_nic_profile(struct net_device *netdev, ++ struct mlx5_core_dev *mdev) + { +- mlx5e_netdev_change_profile(priv, &mlx5e_nic_profile, NULL); ++ mlx5e_netdev_change_profile(netdev, mdev, &mlx5e_nic_profile, NULL); + } + + void mlx5e_destroy_netdev(struct mlx5e_priv *priv) +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +index 8490e2039f7f..2b1b6e094ba8 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +@@ -1501,17 +1501,16 @@ mlx5e_vport_uplink_rep_load(struct mlx5_core_dev *dev, struct mlx5_eswitch_rep * + { + struct mlx5e_rep_priv *rpriv = mlx5e_rep_to_rep_priv(rep); + struct net_device *netdev; +- struct mlx5e_priv *priv; + int err; + + netdev = mlx5_uplink_netdev_get(dev); + if (!netdev) + return 0; + +- priv = netdev_priv(netdev); +- rpriv->netdev = priv->netdev; +- err = mlx5e_netdev_change_profile(priv, &mlx5e_uplink_rep_profile, +- rpriv); ++ /* must not use netdev_priv(netdev), it might not be initialized yet */ ++ rpriv->netdev = netdev; ++ err = mlx5e_netdev_change_profile(netdev, dev, ++ &mlx5e_uplink_rep_profile, 
rpriv); + mlx5_uplink_netdev_put(dev, netdev); + return err; + } +@@ -1539,7 +1538,7 @@ mlx5e_vport_uplink_rep_unload(struct mlx5e_rep_priv *rpriv) + if (!(priv->mdev->priv.flags & MLX5_PRIV_FLAGS_SWITCH_LEGACY)) + unregister_netdev(netdev); + +- mlx5e_netdev_attach_nic_profile(priv); ++ mlx5e_netdev_attach_nic_profile(netdev, priv->mdev); + } + + static int +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1654-net-mlx5e-don-t-store-mlx5e-priv-in-mlx5e-dev-devlink-priv.patch b/SOURCES/1654-net-mlx5e-don-t-store-mlx5e-priv-in-mlx5e-dev-devlink-priv.patch new file mode 100644 index 000000000..5f8b18f7c --- /dev/null +++ b/SOURCES/1654-net-mlx5e-don-t-store-mlx5e-priv-in-mlx5e-dev-devlink-priv.patch @@ -0,0 +1,152 @@ +From 166d3efdac38dc1e7edca6f2762c68714219ccb7 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 17:10:38 -0400 +Subject: [PATCH] net/mlx5e: Don't store mlx5e_priv in mlx5e_dev devlink priv + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 123eda2e5b1638e298e3a66bb1e64a8da92de5e1 +Author: Saeed Mahameed +Date: Thu Jan 8 13:26:55 2026 -0800 + + net/mlx5e: Don't store mlx5e_priv in mlx5e_dev devlink priv + + mlx5e_priv is an unstable structure that can be memset(0) if profile + attaching fails, mlx5e_priv in mlx5e_dev devlink private is used to + reference the netdev and mdev associated with that struct. Instead, + store netdev directly into mlx5e_dev and get mdev from the containing + mlx5_adev aux device structure. + + This fixes a kernel oops in mlx5e_remove when switchdev mode fails due + to change profile failure. + + $ devlink dev eswitch set pci/0000:00:03.0 mode switchdev + Error: mlx5_core: Failed setting eswitch to offloads. 
+ dmesg: + workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR + mlx5_core 0012:03:00.1: mlx5e_netdev_init_profile:6214:(pid 37199): mlx5e_priv_init failed, err=-12 + mlx5_core 0012:03:00.1 gpu3rdma1: mlx5e_netdev_change_profile: new profile init failed, -12 + workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR + mlx5_core 0012:03:00.1: mlx5e_netdev_init_profile:6214:(pid 37199): mlx5e_priv_init failed, err=-12 + mlx5_core 0012:03:00.1 gpu3rdma1: mlx5e_netdev_change_profile: failed to rollback to orig profile, -12 + + $ devlink dev reload pci/0000:00:03.0 ==> oops + + BUG: kernel NULL pointer dereference, address: 0000000000000520 + #PF: supervisor read access in kernel mode + #PF: error_code(0x0000) - not-present page + PGD 0 P4D 0 + Oops: Oops: 0000 [#1] SMP NOPTI + CPU: 3 UID: 0 PID: 521 Comm: devlink Not tainted 6.18.0-rc5+ #117 PREEMPT(voluntary) + Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014 + RIP: 0010:mlx5e_remove+0x68/0x130 + RSP: 0018:ffffc900034838f0 EFLAGS: 00010246 + RAX: ffff88810283c380 RBX: ffff888101874400 RCX: ffffffff826ffc45 + RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000 + RBP: ffff888102d789c0 R08: ffff8881007137f0 R09: ffff888100264e10 + R10: ffffc90003483898 R11: ffffc900034838a0 R12: ffff888100d261a0 + R13: ffff888100d261a0 R14: ffff8881018749a0 R15: ffff888101874400 + FS: 00007f8565fea740(0000) GS:ffff88856a759000(0000) knlGS:0000000000000000 + CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 + CR2: 0000000000000520 CR3: 000000010b11a004 CR4: 0000000000370ef0 + Call Trace: + + device_release_driver_internal+0x19c/0x200 + bus_remove_device+0xc6/0x130 + device_del+0x160/0x3d0 + ? 
devl_param_driverinit_value_get+0x2d/0x90 + mlx5_detach_device+0x89/0xe0 + mlx5_unload_one_devl_locked+0x3a/0x70 + mlx5_devlink_reload_down+0xc8/0x220 + devlink_reload+0x7d/0x260 + devlink_nl_reload_doit+0x45b/0x5a0 + genl_family_rcv_msg_doit+0xe8/0x140 + + Fixes: ee75f1fc44dd ("net/mlx5e: Create separate devlink instance for ethernet auxiliary device") + Fixes: c4d7eb57687f ("net/mxl5e: Add change profile method") + Signed-off-by: Saeed Mahameed + Link: https://patch.msgid.link/20260108212657.25090-3-saeed@kernel.org + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h +index 80c8a37c8e8e..bae15968cb37 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h +@@ -955,7 +955,7 @@ struct mlx5e_priv { + }; + + struct mlx5e_dev { +- struct mlx5e_priv *priv; ++ struct net_device *netdev; + struct devlink_port dl_port; + }; + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index b2d90097186e..d91bf655d291 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -6576,8 +6576,8 @@ static int _mlx5e_resume(struct auxiliary_device *adev) + { + struct mlx5_adev *edev = container_of(adev, struct mlx5_adev, adev); + struct mlx5e_dev *mlx5e_dev = auxiliary_get_drvdata(adev); +- struct mlx5e_priv *priv = mlx5e_dev->priv; +- struct net_device *netdev = priv->netdev; ++ struct mlx5e_priv *priv = netdev_priv(mlx5e_dev->netdev); ++ struct net_device *netdev = mlx5e_dev->netdev; + struct mlx5_core_dev *mdev = edev->mdev; + struct mlx5_core_dev *pos, *to; + int err, i; +@@ -6623,10 +6623,11 @@ static int mlx5e_resume(struct auxiliary_device *adev) + + static int _mlx5e_suspend(struct auxiliary_device *adev, bool pre_netdev_reg) + { ++ struct mlx5_adev *edev = container_of(adev, 
struct mlx5_adev, adev); + struct mlx5e_dev *mlx5e_dev = auxiliary_get_drvdata(adev); +- struct mlx5e_priv *priv = mlx5e_dev->priv; +- struct net_device *netdev = priv->netdev; +- struct mlx5_core_dev *mdev = priv->mdev; ++ struct mlx5e_priv *priv = netdev_priv(mlx5e_dev->netdev); ++ struct net_device *netdev = mlx5e_dev->netdev; ++ struct mlx5_core_dev *mdev = edev->mdev; + struct mlx5_core_dev *pos; + int i; + +@@ -6687,11 +6688,11 @@ static int _mlx5e_probe(struct auxiliary_device *adev) + goto err_devlink_port_unregister; + } + SET_NETDEV_DEVLINK_PORT(netdev, &mlx5e_dev->dl_port); ++ mlx5e_dev->netdev = netdev; + + mlx5e_build_nic_netdev(netdev); + + priv = netdev_priv(netdev); +- mlx5e_dev->priv = priv; + + priv->profile = profile; + priv->ppriv = NULL; +@@ -6754,7 +6755,8 @@ static void _mlx5e_remove(struct auxiliary_device *adev) + { + struct mlx5_adev *edev = container_of(adev, struct mlx5_adev, adev); + struct mlx5e_dev *mlx5e_dev = auxiliary_get_drvdata(adev); +- struct mlx5e_priv *priv = mlx5e_dev->priv; ++ struct net_device *netdev = mlx5e_dev->netdev; ++ struct mlx5e_priv *priv = netdev_priv(netdev); + struct mlx5_core_dev *mdev = edev->mdev; + + mlx5_core_uplink_netdev_set(mdev, NULL); +@@ -6763,8 +6765,8 @@ static void _mlx5e_remove(struct auxiliary_device *adev) + * if it's from legacy mode. If from switchdev mode, it + * is already unregistered before changing to NIC profile. 
+ */ +- if (priv->netdev->reg_state == NETREG_REGISTERED) { +- unregister_netdev(priv->netdev); ++ if (netdev->reg_state == NETREG_REGISTERED) { ++ unregister_netdev(netdev); + _mlx5e_suspend(adev, false); + } else { + struct mlx5_core_dev *pos; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1655-net-mlx5e-pass-netdev-to-mlx5e-destroy-netdev-instead-of-pri.patch b/SOURCES/1655-net-mlx5e-pass-netdev-to-mlx5e-destroy-netdev-instead-of-pri.patch new file mode 100644 index 000000000..5f7b21fc2 --- /dev/null +++ b/SOURCES/1655-net-mlx5e-pass-netdev-to-mlx5e-destroy-netdev-instead-of-pri.patch @@ -0,0 +1,163 @@ +From 6bf769a0587bdc4c10766c7846eb147c44ebfef8 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 17:10:38 -0400 +Subject: [PATCH] net/mlx5e: Pass netdev to mlx5e_destroy_netdev instead of + priv + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 4ef8512e1427111f7ba92b4a847d181ff0aeec42 +Author: Saeed Mahameed +Date: Thu Jan 8 13:26:56 2026 -0800 + + net/mlx5e: Pass netdev to mlx5e_destroy_netdev instead of priv + + mlx5e_priv is an unstable structure that can be memset(0) if profile + attaching fails. + + Pass netdev to mlx5e_destroy_netdev() to guarantee it will work on a + valid netdev. + + On mlx5e_remove: Check validity of priv->profile, before attempting + to cleanup any resources that might be not there. + + This fixes a kernel oops in mlx5e_remove when switchdev mode fails due + to change profile failure. + + $ devlink dev eswitch set pci/0000:00:03.0 mode switchdev + Error: mlx5_core: Failed setting eswitch to offloads. 
+ dmesg: + workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR + mlx5_core 0012:03:00.1: mlx5e_netdev_init_profile:6214:(pid 37199): mlx5e_priv_init failed, err=-12 + mlx5_core 0012:03:00.1 gpu3rdma1: mlx5e_netdev_change_profile: new profile init failed, -12 + workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR + mlx5_core 0012:03:00.1: mlx5e_netdev_init_profile:6214:(pid 37199): mlx5e_priv_init failed, err=-12 + mlx5_core 0012:03:00.1 gpu3rdma1: mlx5e_netdev_change_profile: failed to rollback to orig profile, -12 + + $ devlink dev reload pci/0000:00:03.0 ==> oops + + BUG: kernel NULL pointer dereference, address: 0000000000000370 + PGD 0 P4D 0 + Oops: Oops: 0000 [#1] SMP NOPTI + CPU: 15 UID: 0 PID: 520 Comm: devlink Not tainted 6.18.0-rc5+ #115 PREEMPT(voluntary) + Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014 + RIP: 0010:mlx5e_dcbnl_dscp_app+0x23/0x100 + RSP: 0018:ffffc9000083f8b8 EFLAGS: 00010286 + RAX: ffff8881126fc380 RBX: ffff8881015ac400 RCX: ffffffff826ffc45 + RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8881035109c0 + RBP: ffff8881035109c0 R08: ffff888101e3e838 R09: ffff888100264e10 + R10: ffffc9000083f898 R11: ffffc9000083f8a0 R12: ffff888101b921a0 + R13: ffff888101b921a0 R14: ffff8881015ac9a0 R15: ffff8881015ac400 + FS: 00007f789a3c8740(0000) GS:ffff88856aa59000(0000) knlGS:0000000000000000 + CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 + CR2: 0000000000000370 CR3: 000000010b6c0001 CR4: 0000000000370ef0 + Call Trace: + + mlx5e_remove+0x57/0x110 + device_release_driver_internal+0x19c/0x200 + bus_remove_device+0xc6/0x130 + device_del+0x160/0x3d0 + ? 
devl_param_driverinit_value_get+0x2d/0x90 + mlx5_detach_device+0x89/0xe0 + mlx5_unload_one_devl_locked+0x3a/0x70 + mlx5_devlink_reload_down+0xc8/0x220 + devlink_reload+0x7d/0x260 + devlink_nl_reload_doit+0x45b/0x5a0 + genl_family_rcv_msg_doit+0xe8/0x140 + + Fixes: c4d7eb57687f ("net/mxl5e: Add change profile method") + Signed-off-by: Saeed Mahameed + Reviewed-by: Shay Drori + Reviewed-by: Tariq Toukan + Link: https://patch.msgid.link/20260108212657.25090-4-saeed@kernel.org + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h +index bae15968cb37..32224bd1a0e7 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h +@@ -1235,7 +1235,7 @@ struct net_device * + mlx5e_create_netdev(struct mlx5_core_dev *mdev, const struct mlx5e_profile *profile); + int mlx5e_attach_netdev(struct mlx5e_priv *priv); + void mlx5e_detach_netdev(struct mlx5e_priv *priv); +-void mlx5e_destroy_netdev(struct mlx5e_priv *priv); ++void mlx5e_destroy_netdev(struct net_device *netdev); + int mlx5e_netdev_change_profile(struct net_device *netdev, + struct mlx5_core_dev *mdev, + const struct mlx5e_profile *new_profile, +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index d91bf655d291..7e3618dfa888 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -6564,11 +6564,12 @@ void mlx5e_netdev_attach_nic_profile(struct net_device *netdev, + mlx5e_netdev_change_profile(netdev, mdev, &mlx5e_nic_profile, NULL); + } + +-void mlx5e_destroy_netdev(struct mlx5e_priv *priv) ++void mlx5e_destroy_netdev(struct net_device *netdev) + { +- struct net_device *netdev = priv->netdev; ++ struct mlx5e_priv *priv = netdev_priv(netdev); + +- mlx5e_priv_cleanup(priv); ++ if (priv->profile) ++ mlx5e_priv_cleanup(priv); + 
free_netdev(netdev); + } + +@@ -6725,7 +6726,7 @@ static int _mlx5e_probe(struct auxiliary_device *adev) + err_profile_cleanup: + profile->cleanup(priv); + err_destroy_netdev: +- mlx5e_destroy_netdev(priv); ++ mlx5e_destroy_netdev(netdev); + err_devlink_port_unregister: + mlx5e_devlink_port_unregister(mlx5e_dev); + err_devlink_unregister: +@@ -6760,7 +6761,9 @@ static void _mlx5e_remove(struct auxiliary_device *adev) + struct mlx5_core_dev *mdev = edev->mdev; + + mlx5_core_uplink_netdev_set(mdev, NULL); +- mlx5e_dcbnl_delete_app(priv); ++ ++ if (priv->profile) ++ mlx5e_dcbnl_delete_app(priv); + /* When unload driver, the netdev is in registered state + * if it's from legacy mode. If from switchdev mode, it + * is already unregistered before changing to NIC profile. +@@ -6781,7 +6784,7 @@ static void _mlx5e_remove(struct auxiliary_device *adev) + /* Avoid cleanup if profile rollback failed. */ + if (priv->profile) + priv->profile->cleanup(priv); +- mlx5e_destroy_netdev(priv); ++ mlx5e_destroy_netdev(netdev); + mlx5e_devlink_port_unregister(mlx5e_dev); + mlx5e_destroy_devlink(mlx5e_dev); + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +index 2b1b6e094ba8..493e0f01b5a8 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +@@ -1604,7 +1604,7 @@ mlx5e_vport_vf_rep_load(struct mlx5_core_dev *dev, struct mlx5_eswitch_rep *rep) + priv->profile->cleanup(priv); + + err_destroy_netdev: +- mlx5e_destroy_netdev(netdev_priv(netdev)); ++ mlx5e_destroy_netdev(netdev); + return err; + } + +@@ -1659,7 +1659,7 @@ mlx5e_vport_rep_unload(struct mlx5_eswitch_rep *rep) + mlx5e_rep_vnic_reporter_destroy(priv); + mlx5e_detach_netdev(priv); + priv->profile->cleanup(priv); +- mlx5e_destroy_netdev(priv); ++ mlx5e_destroy_netdev(netdev); + free_ppriv: + kvfree(ppriv); /* mlx5e_rep_priv */ + } +-- +2.50.1 (Apple Git-155) + diff --git 
a/SOURCES/1656-net-mlx5e-restore-destroying-state-bit-after-profile-cleanup.patch b/SOURCES/1656-net-mlx5e-restore-destroying-state-bit-after-profile-cleanup.patch new file mode 100644 index 000000000..0a4316644 --- /dev/null +++ b/SOURCES/1656-net-mlx5e-restore-destroying-state-bit-after-profile-cleanup.patch @@ -0,0 +1,71 @@ +From 93b9795b5f597068b48a83b917d80b3ed6386e81 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 17:10:38 -0400 +Subject: [PATCH] net/mlx5e: Restore destroying state bit after profile cleanup + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 5629f8859dca7ef74d7314b60de6a957f23166c0 +Author: Saeed Mahameed +Date: Thu Jan 8 13:26:57 2026 -0800 + + net/mlx5e: Restore destroying state bit after profile cleanup + + Profile rollback can fail in mlx5e_netdev_change_profile() and we will + end up with invalid mlx5e_priv memset to 0, we must maintain the + 'destroying' bit in order to gracefully shutdown even if the + profile/priv are not valid. + + This patch maintains the previous state of the 'destroying' state of + mlx5e_priv after priv cleanup, to allow the remove flow to cleanup + common resources from mlx5_core to avoid FW fatal errors as seen below: + + $ devlink dev eswitch set pci/0000:00:03.0 mode switchdev + Error: mlx5_core: Failed setting eswitch to offloads. + dmesg: mlx5_core 0000:00:03.0 enp0s3np0: failed to rollback to orig profile, ... 
+ + $ devlink dev reload pci/0000:00:03.0 + + mlx5_core 0000:00:03.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) + mlx5_core 0000:00:03.0: poll_health:803:(pid 519): Fatal error 3 detected + mlx5_core 0000:00:03.0: firmware version: 28.41.1000 + mlx5_core 0000:00:03.0: 0.000 Gb/s available PCIe bandwidth (Unknown x255 link) + mlx5_core 0000:00:03.0: mlx5_function_enable:1200:(pid 519): enable hca failed + mlx5_core 0000:00:03.0: mlx5_function_enable:1200:(pid 519): enable hca failed + mlx5_core 0000:00:03.0: mlx5_health_try_recover:340:(pid 141): handling bad device here + mlx5_core 0000:00:03.0: mlx5_handle_bad_state:285:(pid 141): Expected to see disabled NIC but it is full driver + mlx5_core 0000:00:03.0: mlx5_error_sw_reset:236:(pid 141): start + mlx5_core 0000:00:03.0: NIC IFC still 0 after 4000ms. + + Fixes: c4d7eb57687f ("net/mxl5e: Add change profile method") + Signed-off-by: Saeed Mahameed + Reviewed-by: Tariq Toukan + Link: https://patch.msgid.link/20260108212657.25090-5-saeed@kernel.org + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index 7e3618dfa888..be18d6cfd35b 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -6253,6 +6253,7 @@ int mlx5e_priv_init(struct mlx5e_priv *priv, + + void mlx5e_priv_cleanup(struct mlx5e_priv *priv) + { ++ bool destroying = test_bit(MLX5E_STATE_DESTROYING, &priv->state); + int i; + + /* bail if change profile failed and also rollback failed */ +@@ -6279,6 +6280,8 @@ void mlx5e_priv_cleanup(struct mlx5e_priv *priv) + } + + memset(priv, 0, sizeof(*priv)); ++ if (destroying) /* restore destroying bit, to allow unload */ ++ set_bit(MLX5E_STATE_DESTROYING, &priv->state); + } + + static unsigned int mlx5e_get_max_num_txqs(struct mlx5_core_dev *mdev, +-- +2.50.1 (Apple Git-155) + diff --git 
a/SOURCES/1657-net-mlx5-fix-memory-leak-in-esw-acl-ingress-lgcy-setup.patch b/SOURCES/1657-net-mlx5-fix-memory-leak-in-esw-acl-ingress-lgcy-setup.patch new file mode 100644 index 000000000..bb577af17 --- /dev/null +++ b/SOURCES/1657-net-mlx5-fix-memory-leak-in-esw-acl-ingress-lgcy-setup.patch @@ -0,0 +1,48 @@ +From d65e10ccab7a91e2068ad9c6f4736dfe6da13d2f Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 17:10:39 -0400 +Subject: [PATCH] net/mlx5: Fix memory leak in esw_acl_ingress_lgcy_setup() + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 108948f723b13874b7ebf6b3f1cc598a7de38622 +Author: Zilin Guan +Date: Tue Jan 20 13:46:40 2026 +0000 + + net/mlx5: Fix memory leak in esw_acl_ingress_lgcy_setup() + + In esw_acl_ingress_lgcy_setup(), if esw_acl_table_create() fails, + the function returns directly without releasing the previously + created counter, leading to a memory leak. + + Fix this by jumping to the out label instead of returning directly, + which aligns with the error handling logic of other paths in this + function. + + Compile tested only. Issue found using a prototype static analysis tool + and code review. 
+ + Fixes: 07bab9502641 ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes") + Signed-off-by: Zilin Guan + Reviewed-by: Tariq Toukan + Link: https://patch.msgid.link/20260120134640.2717808-1-zilin@seu.edu.cn + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c +index 1c37098e09ea..49a637829c59 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c +@@ -188,7 +188,7 @@ int esw_acl_ingress_lgcy_setup(struct mlx5_eswitch *esw, + if (IS_ERR(vport->ingress.acl)) { + err = PTR_ERR(vport->ingress.acl); + vport->ingress.acl = NULL; +- return err; ++ goto out; + } + + err = esw_acl_ingress_lgcy_groups_create(esw, vport); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1658-net-mlx5-fix-unbinding-uplink-netdev-in-switchdev-mode.patch b/SOURCES/1658-net-mlx5-fix-unbinding-uplink-netdev-in-switchdev-mode.patch new file mode 100644 index 000000000..bdd2b2bd7 --- /dev/null +++ b/SOURCES/1658-net-mlx5-fix-unbinding-uplink-netdev-in-switchdev-mode.patch @@ -0,0 +1,159 @@ +From 05b3733cda98e6ed503d943ce964f775e8199fc4 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 17:29:26 -0400 +Subject: [PATCH] net/mlx5: Fix Unbinding uplink-netdev in switchdev mode + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 2ae8c7edea87f54609bda30963a099cd3c64b0bb +Author: Shay Drory +Date: Mon Jan 26 09:14:53 2026 +0200 + + net/mlx5: Fix Unbinding uplink-netdev in switchdev mode + + It is possible to unbind the uplink ETH driver while the E-Switch is + in switchdev mode. This leads to netdevice reference counting issues[1], + as the driver removal path was not designed to clean up from this state. 
+ + During uplink ETH driver removal (_mlx5e_remove), the code now waits for + any concurrent E-Switch mode transition to finish. It then removes the + REPs auxiliary device, if exists. This ensures a graceful cleanup. + + [1] + unregister_netdevice: waiting for eth2 to become free. Usage count = 2 + ref_tracker: netdev@00000000c912e04b has 1/1 users at + ib_device_set_netdev+0x130/0x270 [ib_core] + mlx5_ib_vport_rep_load+0xf4/0x3e0 [mlx5_ib] + mlx5_esw_offloads_rep_load+0xc7/0xe0 [mlx5_core] + esw_offloads_enable+0x583/0x900 [mlx5_core] + mlx5_eswitch_enable_locked+0x1b2/0x290 [mlx5_core] + mlx5_devlink_eswitch_mode_set+0x107/0x3e0 [mlx5_core] + devlink_nl_eswitch_set_doit+0x60/0xd0 + genl_family_rcv_msg_doit+0xe0/0x130 + genl_rcv_msg+0x183/0x290 + netlink_rcv_skb+0x4b/0xf0 + genl_rcv+0x24/0x40 + netlink_unicast+0x255/0x380 + netlink_sendmsg+0x1f3/0x420 + __sock_sendmsg+0x38/0x60 + __sys_sendto+0x119/0x180 + __x64_sys_sendto+0x20/0x30 + + Fixes: 7a9fb35e8c3a ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode") + Signed-off-by: Shay Drory + Reviewed-by: Mark Bloch + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/1769411695-18820-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/dev.c +index 64c04f52990f..781e39b5aa1d 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/dev.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/dev.c +@@ -575,3 +575,17 @@ bool mlx5_same_hw_devs(struct mlx5_core_dev *dev, struct mlx5_core_dev *peer_dev + return plen && flen && flen == plen && + !memcmp(fsystem_guid, psystem_guid, flen); + } ++ ++void mlx5_core_reps_aux_devs_remove(struct mlx5_core_dev *dev) ++{ ++ struct mlx5_priv *priv = &dev->priv; ++ ++ if (priv->adev[MLX5_INTERFACE_PROTOCOL_ETH]) ++ device_lock_assert(&priv->adev[MLX5_INTERFACE_PROTOCOL_ETH]->adev.dev); ++ else 
++ mlx5_core_err(dev, "ETH driver already removed\n"); ++ if (priv->adev[MLX5_INTERFACE_PROTOCOL_IB_REP]) ++ del_adev(&priv->adev[MLX5_INTERFACE_PROTOCOL_IB_REP]->adev); ++ if (priv->adev[MLX5_INTERFACE_PROTOCOL_ETH_REP]) ++ del_adev(&priv->adev[MLX5_INTERFACE_PROTOCOL_ETH_REP]->adev); ++} +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index be18d6cfd35b..4ccd20317759 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -6763,6 +6763,7 @@ static void _mlx5e_remove(struct auxiliary_device *adev) + struct mlx5e_priv *priv = netdev_priv(netdev); + struct mlx5_core_dev *mdev = edev->mdev; + ++ mlx5_eswitch_safe_aux_devs_remove(mdev); + mlx5_core_uplink_netdev_set(mdev, NULL); + + if (priv->profile) +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +index ad1073f7b79f..829b9ecca7bc 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +@@ -929,6 +929,7 @@ int mlx5_esw_ipsec_vf_packet_offload_set(struct mlx5_eswitch *esw, struct mlx5_v + int mlx5_esw_ipsec_vf_packet_offload_supported(struct mlx5_core_dev *dev, + u16 vport_num); + bool mlx5_esw_host_functions_enabled(const struct mlx5_core_dev *dev); ++void mlx5_eswitch_safe_aux_devs_remove(struct mlx5_core_dev *dev); + #else /* CONFIG_MLX5_ESWITCH */ + /* eswitch API stubs */ + static inline int mlx5_eswitch_init(struct mlx5_core_dev *dev) { return 0; } +@@ -1012,6 +1013,9 @@ mlx5_esw_vport_vhca_id(struct mlx5_eswitch *esw, u16 vportn, u16 *vhca_id) + return -EOPNOTSUPP; + } + ++static inline void ++mlx5_eswitch_safe_aux_devs_remove(struct mlx5_core_dev *dev) {} ++ + #endif /* CONFIG_MLX5_ESWITCH */ + + #endif /* __MLX5_ESWITCH_H__ */ +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +index 575b12079933..30bf164a067f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +@@ -3980,6 +3980,32 @@ static bool mlx5_devlink_switchdev_active_mode_change(struct mlx5_eswitch *esw, + return true; + } + ++#define MLX5_ESW_HOLD_TIMEOUT_MS 7000 ++#define MLX5_ESW_HOLD_RETRY_DELAY_MS 500 ++ ++void mlx5_eswitch_safe_aux_devs_remove(struct mlx5_core_dev *dev) ++{ ++ unsigned long timeout; ++ bool hold_esw = true; ++ ++ /* Wait for any concurrent eswitch mode transition to complete. */ ++ if (!mlx5_esw_hold(dev)) { ++ timeout = jiffies + msecs_to_jiffies(MLX5_ESW_HOLD_TIMEOUT_MS); ++ while (!mlx5_esw_hold(dev)) { ++ if (!time_before(jiffies, timeout)) { ++ hold_esw = false; ++ break; ++ } ++ msleep(MLX5_ESW_HOLD_RETRY_DELAY_MS); ++ } ++ } ++ if (hold_esw) { ++ if (mlx5_eswitch_mode(dev) == MLX5_ESWITCH_OFFLOADS) ++ mlx5_core_reps_aux_devs_remove(dev); ++ mlx5_esw_release(dev); ++ } ++} ++ + int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode, + struct netlink_ext_ack *extack) + { +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h +index cfebc110c02f..99b0a25054ef 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h +@@ -290,6 +290,7 @@ int mlx5_register_device(struct mlx5_core_dev *dev); + void mlx5_unregister_device(struct mlx5_core_dev *dev); + void mlx5_dev_set_lightweight(struct mlx5_core_dev *dev); + bool mlx5_dev_is_lightweight(struct mlx5_core_dev *dev); ++void mlx5_core_reps_aux_devs_remove(struct mlx5_core_dev *dev); + + void mlx5_fw_reporters_create(struct mlx5_core_dev *dev); + int mlx5_query_mtpps(struct mlx5_core_dev *dev, u32 *mtpps, u32 mtpps_size); +-- +2.50.1 (Apple Git-155) + diff --git 
a/SOURCES/1659-net-mlx5e-tc-delete-flows-only-for-existing-peers.patch b/SOURCES/1659-net-mlx5e-tc-delete-flows-only-for-existing-peers.patch new file mode 100644 index 000000000..4757065a8 --- /dev/null +++ b/SOURCES/1659-net-mlx5e-tc-delete-flows-only-for-existing-peers.patch @@ -0,0 +1,134 @@ +From 970afdeeddc84cae0ac878cb44d284e5d0fdb77a Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 17:29:26 -0400 +Subject: [PATCH] net/mlx5e: TC, delete flows only for existing peers + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit f67666938ae626cbda63fbf5176b3583c07e7124 +Author: Mark Bloch +Date: Mon Jan 26 09:14:54 2026 +0200 + + net/mlx5e: TC, delete flows only for existing peers + + When deleting TC steering flows, iterate only over actual devcom + peers instead of assuming all possible ports exist. This avoids + touching non-existent peers and ensures cleanup is limited to + devices the driver is currently connected to. + + BUG: kernel NULL pointer dereference, address: 0000000000000008 + #PF: supervisor write access in kernel mode + #PF: error_code(0x0002) - not-present page + PGD 133c8a067 P4D 0 + Oops: Oops: 0002 [#1] SMP + CPU: 19 UID: 0 PID: 2169 Comm: tc Not tainted 6.18.0+ #156 NONE + Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 + RIP: 0010:mlx5e_tc_del_fdb_peers_flow+0xbe/0x200 [mlx5_core] + Code: 00 00 a8 08 74 a8 49 8b 46 18 f6 c4 02 74 9f 4c 8d bf a0 12 00 00 4c 89 ff e8 0e e7 96 e1 49 8b 44 24 08 49 8b 0c 24 4c 89 ff <48> 89 41 08 48 89 08 49 89 2c 24 49 89 5c 24 08 e8 7d ce 96 e1 49 + RSP: 0018:ff11000143867528 EFLAGS: 00010246 + RAX: 0000000000000000 RBX: dead000000000122 RCX: 0000000000000000 + RDX: ff11000143691580 RSI: ff110001026e5000 RDI: ff11000106f3d2a0 + RBP: dead000000000100 R08: 00000000000003fd R09: 0000000000000002 + R10: ff11000101c75690 R11: ff1100085faea178 R12: ff11000115f0ae78 + R13: 0000000000000000 R14: ff11000115f0a800 R15: 
ff11000106f3d2a0 + FS: 00007f35236bf740(0000) GS:ff110008dc809000(0000) knlGS:0000000000000000 + CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 + CR2: 0000000000000008 CR3: 0000000157a01001 CR4: 0000000000373eb0 + Call Trace: + + mlx5e_tc_del_flow+0x46/0x270 [mlx5_core] + mlx5e_flow_put+0x25/0x50 [mlx5_core] + mlx5e_delete_flower+0x2a6/0x3e0 [mlx5_core] + tc_setup_cb_reoffload+0x20/0x80 + fl_reoffload+0x26f/0x2f0 [cls_flower] + ? mlx5e_tc_reoffload_flows_work+0xc0/0xc0 [mlx5_core] + ? mlx5e_tc_reoffload_flows_work+0xc0/0xc0 [mlx5_core] + tcf_block_playback_offloads+0x9e/0x1c0 + tcf_block_unbind+0x7b/0xd0 + tcf_block_setup+0x186/0x1d0 + tcf_block_offload_cmd.isra.0+0xef/0x130 + tcf_block_offload_unbind+0x43/0x70 + __tcf_block_put+0x85/0x160 + ingress_destroy+0x32/0x110 [sch_ingress] + __qdisc_destroy+0x44/0x100 + qdisc_graft+0x22b/0x610 + tc_get_qdisc+0x183/0x4d0 + rtnetlink_rcv_msg+0x2d7/0x3d0 + ? rtnl_calcit.isra.0+0x100/0x100 + netlink_rcv_skb+0x53/0x100 + netlink_unicast+0x249/0x320 + ? __alloc_skb+0x102/0x1f0 + netlink_sendmsg+0x1e3/0x420 + __sock_sendmsg+0x38/0x60 + ____sys_sendmsg+0x1ef/0x230 + ? copy_msghdr_from_user+0x6c/0xa0 + ___sys_sendmsg+0x7f/0xc0 + ? ___sys_recvmsg+0x8a/0xc0 + ? 
__sys_sendto+0x119/0x180 + __sys_sendmsg+0x61/0xb0 + do_syscall_64+0x55/0x640 + entry_SYSCALL_64_after_hwframe+0x4b/0x53 + RIP: 0033:0x7f35238bb764 + Code: 15 b9 86 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb bf 0f 1f 44 00 00 f3 0f 1e fa 80 3d e5 08 0d 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55 + RSP: 002b:00007ffed4c35638 EFLAGS: 00000202 ORIG_RAX: 000000000000002e + RAX: ffffffffffffffda RBX: 000055a2efcc75e0 RCX: 00007f35238bb764 + RDX: 0000000000000000 RSI: 00007ffed4c356a0 RDI: 0000000000000003 + RBP: 00007ffed4c35710 R08: 0000000000000010 R09: 00007f3523984b20 + R10: 0000000000000004 R11: 0000000000000202 R12: 00007ffed4c35790 + R13: 000000006947df8f R14: 000055a2efcc75e0 R15: 00007ffed4c35780 + + Fixes: 9be6c21fdcf8 ("net/mlx5e: Handle offloads flows per peer") + Signed-off-by: Mark Bloch + Reviewed-by: Shay Drori + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/1769411695-18820-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +index 17ae07b47f7f..40ae3c61773f 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +@@ -2147,11 +2147,14 @@ static void mlx5e_tc_del_fdb_peer_flow(struct mlx5e_tc_flow *flow, + + static void mlx5e_tc_del_fdb_peers_flow(struct mlx5e_tc_flow *flow) + { ++ struct mlx5_devcom_comp_dev *devcom; ++ struct mlx5_devcom_comp_dev *pos; ++ struct mlx5_eswitch *peer_esw; + int i; + +- for (i = 0; i < MLX5_MAX_PORTS; i++) { +- if (i == mlx5_get_dev_index(flow->priv->mdev)) +- continue; ++ devcom = flow->priv->mdev->priv.eswitch->devcom; ++ mlx5_devcom_for_each_peer_entry(devcom, peer_esw, pos) { ++ i = mlx5_get_dev_index(peer_esw->dev); + mlx5e_tc_del_fdb_peer_flow(flow, i); + } + } +@@ -5513,12 +5516,16 @@ int 
mlx5e_tc_num_filters(struct mlx5e_priv *priv, unsigned long flags) + + void mlx5e_tc_clean_fdb_peer_flows(struct mlx5_eswitch *esw) + { ++ struct mlx5_devcom_comp_dev *devcom; ++ struct mlx5_devcom_comp_dev *pos; + struct mlx5e_tc_flow *flow, *tmp; ++ struct mlx5_eswitch *peer_esw; + int i; + +- for (i = 0; i < MLX5_MAX_PORTS; i++) { +- if (i == mlx5_get_dev_index(esw->dev)) +- continue; ++ devcom = esw->devcom; ++ ++ mlx5_devcom_for_each_peer_entry(devcom, peer_esw, pos) { ++ i = mlx5_get_dev_index(peer_esw->dev); + list_for_each_entry_safe(flow, tmp, &esw->offloads.peer_flows[i], peer[i]) + mlx5e_tc_del_fdb_peers_flow(flow); + } +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1660-net-mlx5e-account-for-netdev-stats-in-ndo-get-stats64.patch b/SOURCES/1660-net-mlx5e-account-for-netdev-stats-in-ndo-get-stats64.patch new file mode 100644 index 000000000..e069e1f6e --- /dev/null +++ b/SOURCES/1660-net-mlx5e-account-for-netdev-stats-in-ndo-get-stats64.patch @@ -0,0 +1,77 @@ +From 790788e38aff2b1e3115669c125f7266d88be7a7 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 17:29:26 -0400 +Subject: [PATCH] net/mlx5e: Account for netdev stats in ndo_get_stats64 + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 476681f10cc1e0e56e26856684e75d4678b072b2 +Author: Gal Pressman +Date: Mon Jan 26 09:14:55 2026 +0200 + + net/mlx5e: Account for netdev stats in ndo_get_stats64 + + The driver's ndo_get_stats64 callback is only reporting mlx5 counters, + without accounting for the netdev stats, causing errors from the network + stack to be invisible in statistics. + + Add netdev_stats_to_stats64() call to first populate the counters, then + add mlx5 counters on top, ensuring both are accounted for (where + appropriate). 
+ + Fixes: f62b8bb8f2d3 ("net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality") + Signed-off-by: Gal Pressman + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/1769411695-18820-4-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index 4ccd20317759..af067ff1ebcf 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -4019,6 +4019,8 @@ mlx5e_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats) + mlx5e_queue_update_stats(priv); + } + ++ netdev_stats_to_stats64(stats, &dev->stats); ++ + if (mlx5e_is_uplink_rep(priv)) { + struct mlx5e_vport_stats *vstats = &priv->stats.vport; + +@@ -4035,21 +4037,21 @@ mlx5e_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats) + mlx5e_fold_sw_stats64(priv, stats); + } + +- stats->rx_missed_errors = priv->stats.qcnt.rx_out_of_buffer; +- stats->rx_dropped = PPORT_2863_GET(pstats, if_in_discards); ++ stats->rx_missed_errors += priv->stats.qcnt.rx_out_of_buffer; ++ stats->rx_dropped += PPORT_2863_GET(pstats, if_in_discards); + +- stats->rx_length_errors = ++ stats->rx_length_errors += + PPORT_802_3_GET(pstats, a_in_range_length_errors) + + PPORT_802_3_GET(pstats, a_out_of_range_length_field) + + PPORT_802_3_GET(pstats, a_frame_too_long_errors) + + VNIC_ENV_GET(&priv->stats.vnic, eth_wqe_too_small); +- stats->rx_crc_errors = ++ stats->rx_crc_errors += + PPORT_802_3_GET(pstats, a_frame_check_sequence_errors); +- stats->rx_frame_errors = PPORT_802_3_GET(pstats, a_alignment_errors); +- stats->tx_aborted_errors = PPORT_2863_GET(pstats, if_out_discards); +- stats->rx_errors = stats->rx_length_errors + stats->rx_crc_errors + +- stats->rx_frame_errors; +- stats->tx_errors = stats->tx_aborted_errors + stats->tx_carrier_errors; ++ 
stats->rx_frame_errors += PPORT_802_3_GET(pstats, a_alignment_errors); ++ stats->tx_aborted_errors += PPORT_2863_GET(pstats, if_out_discards); ++ stats->rx_errors += stats->rx_length_errors + stats->rx_crc_errors + ++ stats->rx_frame_errors; ++ stats->tx_errors += stats->tx_aborted_errors + stats->tx_carrier_errors; + } + + static void mlx5e_nic_set_rx_mode(struct mlx5e_priv *priv) +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1661-net-mlx5-fix-return-type-mismatch-in-mlx5-esw-vport-vhca-id.patch b/SOURCES/1661-net-mlx5-fix-return-type-mismatch-in-mlx5-esw-vport-vhca-id.patch new file mode 100644 index 000000000..cedfa59de --- /dev/null +++ b/SOURCES/1661-net-mlx5-fix-return-type-mismatch-in-mlx5-esw-vport-vhca-id.patch @@ -0,0 +1,47 @@ +From 3dd049ba859efec10f425696118e102fde8d4882 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 17:36:26 -0400 +Subject: [PATCH] net/mlx5: Fix return type mismatch in + mlx5_esw_vport_vhca_id() + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit ca12c4a155ebf84e9ef29b05ce979bc89364290f +Author: Zeng Chi +Date: Fri Jan 23 16:57:49 2026 +0800 + + net/mlx5: Fix return type mismatch in mlx5_esw_vport_vhca_id() + + The function mlx5_esw_vport_vhca_id() is declared to return bool, + but returns -EOPNOTSUPP (-45), which is an int error code. This + causes a signedness bug as reported by smatch. 
+ + This patch fixes this smatch report: + drivers/net/ethernet/mellanox/mlx5/core/eswitch.h:981 mlx5_esw_vport_vhca_id() + warn: signedness bug returning '(-45)' + + Fixes: 1baf30426553 ("net/mlx5: E-Switch, Set/Query hca cap via vhca id") + Reviewed-by: Parav Pandit + Signed-off-by: Zeng Chi + Reviewed-by: Tariq Toukan + Link: https://patch.msgid.link/20260123085749.1401969-1-zeng_chi911@163.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +index 829b9ecca7bc..714ad28e8445 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +@@ -1010,7 +1010,7 @@ mlx5_esw_host_functions_enabled(const struct mlx5_core_dev *dev) + static inline bool + mlx5_esw_vport_vhca_id(struct mlx5_eswitch *esw, u16 vportn, u16 *vhca_id) + { +- return -EOPNOTSUPP; ++ return false; + } + + static inline void +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1662-net-mlx5-fs-fix-inverted-cap-check-in-tx-flow-table-root-dis.patch b/SOURCES/1662-net-mlx5-fs-fix-inverted-cap-check-in-tx-flow-table-root-dis.patch new file mode 100644 index 000000000..e531421cd --- /dev/null +++ b/SOURCES/1662-net-mlx5-fs-fix-inverted-cap-check-in-tx-flow-table-root-dis.patch @@ -0,0 +1,47 @@ +From bcedb8b2e9ce1a7b4d28614fa149cf305905c980 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 17:37:37 -0400 +Subject: [PATCH] net/mlx5: fs, Fix inverted cap check in tx flow table root + disconnect + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 2610a3d65691a1301ab10c92ff6ebab0bedf9199 +Author: Shay Drory +Date: Tue Jan 27 10:52:38 2026 +0200 + + net/mlx5: fs, Fix inverted cap check in tx flow table root disconnect + + The capability check for reset_root_to_default was inverted, causing + the function to return -EOPNOTSUPP when the capability IS supported, + rather than when it is NOT 
supported. + + Fix the capability check condition. + + Fixes: 3c9c34c32bc6 ("net/mlx5: fs, Command to control TX flow table root") + Signed-off-by: Shay Drory + Reviewed-by: Mark Bloch + Reviewed-by: Simon Horman + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1769503961-124173-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c +index ced747bef641..c348ee62cd3a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c +@@ -1198,7 +1198,8 @@ int mlx5_fs_cmd_set_tx_flow_table_root(struct mlx5_core_dev *dev, u32 ft_id, boo + u32 out[MLX5_ST_SZ_DW(set_flow_table_root_out)] = {}; + u32 in[MLX5_ST_SZ_DW(set_flow_table_root_in)] = {}; + +- if (disconnect && MLX5_CAP_FLOWTABLE_NIC_TX(dev, reset_root_to_default)) ++ if (disconnect && ++ !MLX5_CAP_FLOWTABLE_NIC_TX(dev, reset_root_to_default)) + return -EOPNOTSUPP; + + MLX5_SET(set_flow_table_root_in, in, opcode, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1663-net-mlx5-fix-vhca-id-access-call-trace-use-before-alloc.patch b/SOURCES/1663-net-mlx5-fix-vhca-id-access-call-trace-use-before-alloc.patch new file mode 100644 index 000000000..ff50060ec --- /dev/null +++ b/SOURCES/1663-net-mlx5-fix-vhca-id-access-call-trace-use-before-alloc.patch @@ -0,0 +1,157 @@ +From 92e9db10ad78811b7ae2dc215e3dd24de2322b8b Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 17:37:37 -0400 +Subject: [PATCH] net/mlx5: Fix vhca_id access call trace use before alloc + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit a8f930b7be7be3f18f14446df461e17137400407 +Author: Parav Pandit +Date: Tue Jan 27 10:52:40 2026 +0200 + + net/mlx5: Fix vhca_id access call trace use before alloc + + HCA CAP structure is allocated in mlx5_hca_caps_alloc(). 
+ mlx5_mdev_init() + mlx5_hca_caps_alloc() + + And HCA CAP is read from the device in mlx5_init_one(). + + The vhca_id's debugfs file is published even before above two + operations are done. + Due to this when user reads the vhca id before the initialization, + following call trace is observed. + + Fix this by deferring debugfs publication until the HCA CAP is + allocated and read from the device. + + BUG: kernel NULL pointer dereference, address: 0000000000000004 + PGD 0 P4D 0 + Oops: Oops: 0000 [#1] SMP PTI + CPU: 23 UID: 0 PID: 6605 Comm: cat Kdump: loaded Not tainted 6.18.0-rc7-sf+ #110 PREEMPT(full) + Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016 + RIP: 0010:vhca_id_show+0x17/0x30 [mlx5_core] + Code: cb 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 8b 47 70 48 c7 c6 45 f0 12 c1 48 8b 80 70 03 00 00 <8b> 50 04 0f ca 0f b7 d2 e8 8c 82 47 cb 31 c0 c3 cc cc cc cc 0f 1f + RSP: 0018:ffffd37f4f337d40 EFLAGS: 00010203 + RAX: 0000000000000000 RBX: ffff8f18445c9b40 RCX: 0000000000000001 + RDX: ffff8f1109825180 RSI: ffffffffc112f045 RDI: ffff8f18445c9b40 + RBP: 0000000000000000 R08: 0000645eac0d2928 R09: 0000000000000006 + R10: ffffd37f4f337d48 R11: 0000000000000000 R12: ffffd37f4f337dd8 + R13: ffffd37f4f337db0 R14: ffff8f18445c9b68 R15: 0000000000000001 + FS: 00007f3eea099580(0000) GS:ffff8f2090f1f000(0000) knlGS:0000000000000000 + CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 + CR2: 0000000000000004 CR3: 00000008b64e4006 CR4: 00000000003726f0 + Call Trace: + + seq_read_iter+0x11f/0x4f0 + ? _raw_spin_unlock+0x15/0x30 + ? do_anonymous_page+0x104/0x810 + seq_read+0xf6/0x120 + ? srso_alias_untrain_ret+0x1/0x10 + full_proxy_read+0x5c/0x90 + vfs_read+0xad/0x320 + ? 
handle_mm_fault+0x1ab/0x290 + ksys_read+0x52/0xd0 + do_syscall_64+0x61/0x11e0 + entry_SYSCALL_64_after_hwframe+0x76/0x7e + + Fixes: dd3dd7263cde ("net/mlx5: Expose vhca_id to debugfs") + Signed-off-by: Parav Pandit + Reviewed-by: Shay Drori + Reviewed-by: Simon Horman + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1769503961-124173-4-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c b/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c +index 36806e813c33..1301c56e20d6 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c +@@ -613,3 +613,19 @@ void mlx5_debug_cq_remove(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq) + cq->dbg = NULL; + } + } ++ ++static int vhca_id_show(struct seq_file *file, void *priv) ++{ ++ struct mlx5_core_dev *dev = file->private; ++ ++ seq_printf(file, "0x%x\n", MLX5_CAP_GEN(dev, vhca_id)); ++ return 0; ++} ++ ++DEFINE_SHOW_ATTRIBUTE(vhca_id); ++ ++void mlx5_vhca_debugfs_init(struct mlx5_core_dev *dev) ++{ ++ debugfs_create_file("vhca_id", 0400, dev->priv.dbg.dbg_root, dev, ++ &vhca_id_fops); ++} +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c +index bb794c276b7f..5f6a8eef1982 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c +@@ -1806,16 +1806,6 @@ static int mlx5_hca_caps_alloc(struct mlx5_core_dev *dev) + return -ENOMEM; + } + +-static int vhca_id_show(struct seq_file *file, void *priv) +-{ +- struct mlx5_core_dev *dev = file->private; +- +- seq_printf(file, "0x%x\n", MLX5_CAP_GEN(dev, vhca_id)); +- return 0; +-} +- +-DEFINE_SHOW_ATTRIBUTE(vhca_id); +- + static int mlx5_notifiers_init(struct mlx5_core_dev *dev) + { + int err; +@@ -1884,7 +1874,7 @@ int mlx5_mdev_init(struct mlx5_core_dev *dev, int profile_idx) + 
priv->numa_node = dev_to_node(mlx5_core_dma_dev(dev)); + priv->dbg.dbg_root = debugfs_create_dir(dev_name(dev->device), + mlx5_debugfs_root); +- debugfs_create_file("vhca_id", 0400, priv->dbg.dbg_root, dev, &vhca_id_fops); ++ + INIT_LIST_HEAD(&priv->traps); + + err = mlx5_cmd_init(dev); +@@ -2022,6 +2012,8 @@ static int probe_one(struct pci_dev *pdev, const struct pci_device_id *id) + goto err_init_one; + } + ++ mlx5_vhca_debugfs_init(dev); ++ + pci_save_state(pdev); + return 0; + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h +index 99b0a25054ef..f2d74382fb85 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h +@@ -258,6 +258,7 @@ int mlx5_wait_for_pages(struct mlx5_core_dev *dev, int *pages); + void mlx5_cmd_flush(struct mlx5_core_dev *dev); + void mlx5_cq_debugfs_init(struct mlx5_core_dev *dev); + void mlx5_cq_debugfs_cleanup(struct mlx5_core_dev *dev); ++void mlx5_vhca_debugfs_init(struct mlx5_core_dev *dev); + + int mlx5_query_pcam_reg(struct mlx5_core_dev *dev, u32 *pcam, u8 feature_group, + u8 access_reg_group); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c +index b706f1486504..c45540fe7d9d 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c +@@ -76,6 +76,7 @@ static int mlx5_sf_dev_probe(struct auxiliary_device *adev, const struct auxilia + goto init_one_err; + } + ++ mlx5_vhca_debugfs_init(mdev); + return 0; + + init_one_err: +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1664-net-mlx5e-skip-esn-replay-window-setup-for-ipsec-crypto-offl.patch b/SOURCES/1664-net-mlx5e-skip-esn-replay-window-setup-for-ipsec-crypto-offl.patch new file mode 100644 index 000000000..ce7f88556 --- /dev/null +++ 
b/SOURCES/1664-net-mlx5e-skip-esn-replay-window-setup-for-ipsec-crypto-offl.patch @@ -0,0 +1,53 @@ +From 62d638d5076e68952faa57a2858b1ef4ed296809 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 17:37:37 -0400 +Subject: [PATCH] net/mlx5e: Skip ESN replay window setup for IPsec crypto + offload + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 011be342dd24b5168a5dcf408b14c3babe503341 +Author: Jianbo Liu +Date: Tue Jan 27 10:52:41 2026 +0200 + + net/mlx5e: Skip ESN replay window setup for IPsec crypto offload + + Commit a5e400a985df ("net/mlx5e: Honor user choice of IPsec replay + window size") introduced logic to setup the ESN replay window size. + This logic is only valid for packet offload. + + However, the check to skip this block only covered outbound offloads. + It was not skipped for crypto offload, causing it to fall through to + the new switch statement and trigger its WARN_ON default case (for + instance, if a window larger than 256 bits was configured). + + Fix this by amending the condition to also skip the replay window + setup if the offload type is not XFRM_DEV_OFFLOAD_PACKET. 
+ + Fixes: a5e400a985df ("net/mlx5e: Honor user choice of IPsec replay window size") + Signed-off-by: Jianbo Liu + Reviewed-by: Leon Romanovsky + Reviewed-by: Simon Horman + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/1769503961-124173-5-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +index a8fb4bec369c..9c7064187ed0 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +@@ -430,7 +430,8 @@ void mlx5e_ipsec_build_accel_xfrm_attrs(struct mlx5e_ipsec_sa_entry *sa_entry, + attrs->replay_esn.esn = sa_entry->esn_state.esn; + attrs->replay_esn.esn_msb = sa_entry->esn_state.esn_msb; + attrs->replay_esn.overlap = sa_entry->esn_state.overlap; +- if (attrs->dir == XFRM_DEV_OFFLOAD_OUT) ++ if (attrs->dir == XFRM_DEV_OFFLOAD_OUT || ++ x->xso.type != XFRM_DEV_OFFLOAD_PACKET) + goto skip_replay_window; + + switch (x->replay_esn->replay_window) { +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1665-rdma-mlx5-change-default-device-for-lag-slaves-in-rdma-trans.patch b/SOURCES/1665-rdma-mlx5-change-default-device-for-lag-slaves-in-rdma-trans.patch new file mode 100644 index 000000000..46e48de8f --- /dev/null +++ b/SOURCES/1665-rdma-mlx5-change-default-device-for-lag-slaves-in-rdma-trans.patch @@ -0,0 +1,138 @@ +From 12d9e4ddee7403d143feac428ed32f172d516daa Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 22:57:23 -0400 +Subject: [PATCH] RDMA/mlx5: Change default device for LAG slaves in RDMA + TRANSPORT namespaces + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 3506242da07156e6804c061554bd01d77c1b463b +Author: Patrisious Haddad +Date: Wed Oct 29 17:42:56 2025 +0200 + + RDMA/mlx5: Change default device for LAG slaves in RDMA TRANSPORT namespaces + + In case of a LAG 
configuration change the root namespace core device for + all of the LAG slaves to be the core device of the master device for + RDMA_TRANSPORT namespaces, in order to ensure all tables are created + through the master device. + Once the LAG is disabled revert back to the native core device. + + Signed-off-by: Patrisious Haddad + Signed-off-by: Edward Srouji + Link: https://patch.msgid.link/20251029-support-other-eswitch-v1-4-98bb707b5d57@nvidia.com + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/ib_rep.c b/drivers/infiniband/hw/mlx5/ib_rep.c +index cc8859d3c2f5..bbecca405171 100644 +--- a/drivers/infiniband/hw/mlx5/ib_rep.c ++++ b/drivers/infiniband/hw/mlx5/ib_rep.c +@@ -44,6 +44,63 @@ static void mlx5_ib_num_ports_update(struct mlx5_core_dev *dev, u32 *num_ports) + } + } + ++static int mlx5_ib_set_owner_transport(struct mlx5_core_dev *cur_owner, ++ struct mlx5_core_dev *new_owner) ++{ ++ int ret; ++ ++ if (!MLX5_CAP_FLOWTABLE_RDMA_TRANSPORT_TX(cur_owner, ft_support) || ++ !MLX5_CAP_FLOWTABLE_RDMA_TRANSPORT_RX(cur_owner, ft_support)) ++ return 0; ++ ++ if (!MLX5_CAP_ADV_RDMA(new_owner, rdma_transport_manager) || ++ !MLX5_CAP_ADV_RDMA(new_owner, rdma_transport_manager_other_eswitch)) ++ return 0; ++ ++ ret = mlx5_fs_set_root_dev(cur_owner, new_owner, ++ FS_FT_RDMA_TRANSPORT_TX); ++ if (ret) ++ return ret; ++ ++ ret = mlx5_fs_set_root_dev(cur_owner, new_owner, ++ FS_FT_RDMA_TRANSPORT_RX); ++ if (ret) { ++ mlx5_fs_set_root_dev(cur_owner, cur_owner, ++ FS_FT_RDMA_TRANSPORT_TX); ++ return ret; ++ } ++ ++ return 0; ++} ++ ++static void mlx5_ib_release_transport(struct mlx5_core_dev *dev) ++{ ++ struct mlx5_core_dev *peer_dev; ++ int i, ret; ++ ++ mlx5_lag_for_each_peer_mdev(dev, peer_dev, i) { ++ ret = mlx5_ib_set_owner_transport(peer_dev, peer_dev); ++ WARN_ON_ONCE(ret); ++ } ++} ++ ++static int mlx5_ib_take_transport(struct mlx5_core_dev *dev) ++{ ++ struct mlx5_core_dev *peer_dev; ++ int ret; ++ int i; ++ ++ 
mlx5_lag_for_each_peer_mdev(dev, peer_dev, i) { ++ ret = mlx5_ib_set_owner_transport(peer_dev, dev); ++ if (ret) { ++ mlx5_ib_release_transport(dev); ++ return ret; ++ } ++ } ++ ++ return 0; ++} ++ + static int + mlx5_ib_vport_rep_load(struct mlx5_core_dev *dev, struct mlx5_eswitch_rep *rep) + { +@@ -88,10 +145,18 @@ mlx5_ib_vport_rep_load(struct mlx5_core_dev *dev, struct mlx5_eswitch_rep *rep) + else + return mlx5_ib_set_vport_rep(lag_master, rep, vport_index); + ++ if (mlx5_lag_is_shared_fdb(dev)) { ++ ret = mlx5_ib_take_transport(lag_master); ++ if (ret) ++ return ret; ++ } ++ + ibdev = ib_alloc_device_with_net(mlx5_ib_dev, ib_dev, + mlx5_core_net(lag_master)); +- if (!ibdev) +- return -ENOMEM; ++ if (!ibdev) { ++ ret = -ENOMEM; ++ goto release_transport; ++ } + + ibdev->port = kcalloc(num_ports, sizeof(*ibdev->port), + GFP_KERNEL); +@@ -127,6 +192,10 @@ mlx5_ib_vport_rep_load(struct mlx5_core_dev *dev, struct mlx5_eswitch_rep *rep) + kfree(ibdev->port); + fail_port: + ib_dealloc_device(&ibdev->ib_dev); ++release_transport: ++ if (mlx5_lag_is_shared_fdb(lag_master)) ++ mlx5_ib_release_transport(lag_master); ++ + return ret; + } + +@@ -182,6 +251,7 @@ mlx5_ib_vport_rep_unload(struct mlx5_eswitch_rep *rep) + esw = peer_mdev->priv.eswitch; + mlx5_eswitch_unregister_vport_reps(esw, REP_IB); + } ++ mlx5_ib_release_transport(mdev); + } + __mlx5_ib_remove(dev, dev->profile, MLX5_IB_STAGE_MAX); + } +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1666-rdma-mlx5-add-other-eswitch-support-for-devx-destruction.patch b/SOURCES/1666-rdma-mlx5-add-other-eswitch-support-for-devx-destruction.patch new file mode 100644 index 000000000..531e54457 --- /dev/null +++ b/SOURCES/1666-rdma-mlx5-add-other-eswitch-support-for-devx-destruction.patch @@ -0,0 +1,67 @@ +From 5e8af3841a8d260e3b9643e58cbea841e35f6cc7 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 22:57:23 -0400 +Subject: [PATCH] RDMA/mlx5: Add other_eswitch support for devx destruction + +JIRA: 
https://redhat.atlassian.net/browse/RHEL-169055 + +commit 5939decc64f6b9099c2c356d75047c66a6639e00 +Author: Patrisious Haddad +Date: Wed Oct 29 17:42:57 2025 +0200 + + RDMA/mlx5: Add other_eswitch support for devx destruction + + When building a devx object destruction command for steering objects add + consideration for other_eswitch argument to allow proper destruction for + objects that were created with it. + + Signed-off-by: Patrisious Haddad + Reviewed-by: Mark Bloch + Signed-off-by: Edward Srouji + Link: https://patch.msgid.link/20251029-support-other-eswitch-v1-5-98bb707b5d57@nvidia.com + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c +index 4e2edf3378d7..e52a73bb94ee 100644 +--- a/drivers/infiniband/hw/mlx5/devx.c ++++ b/drivers/infiniband/hw/mlx5/devx.c +@@ -1225,6 +1225,11 @@ static void devx_obj_build_destroy_cmd(void *in, void *out, void *din, + MLX5_GET(create_flow_table_in, in, other_vport)); + MLX5_SET(destroy_flow_table_in, din, vport_number, + MLX5_GET(create_flow_table_in, in, vport_number)); ++ MLX5_SET(destroy_flow_table_in, din, other_eswitch, ++ MLX5_GET(create_flow_table_in, in, other_eswitch)); ++ MLX5_SET(destroy_flow_table_in, din, eswitch_owner_vhca_id, ++ MLX5_GET(create_flow_table_in, in, ++ eswitch_owner_vhca_id)); + MLX5_SET(destroy_flow_table_in, din, table_type, + MLX5_GET(create_flow_table_in, in, table_type)); + MLX5_SET(destroy_flow_table_in, din, table_id, *obj_id); +@@ -1237,6 +1242,11 @@ static void devx_obj_build_destroy_cmd(void *in, void *out, void *din, + MLX5_GET(create_flow_group_in, in, other_vport)); + MLX5_SET(destroy_flow_group_in, din, vport_number, + MLX5_GET(create_flow_group_in, in, vport_number)); ++ MLX5_SET(destroy_flow_group_in, din, other_eswitch, ++ MLX5_GET(create_flow_group_in, in, other_eswitch)); ++ MLX5_SET(destroy_flow_group_in, din, eswitch_owner_vhca_id, ++ MLX5_GET(create_flow_group_in, in, ++ 
eswitch_owner_vhca_id)); + MLX5_SET(destroy_flow_group_in, din, table_type, + MLX5_GET(create_flow_group_in, in, table_type)); + MLX5_SET(destroy_flow_group_in, din, table_id, +@@ -1251,6 +1261,10 @@ static void devx_obj_build_destroy_cmd(void *in, void *out, void *din, + MLX5_GET(set_fte_in, in, other_vport)); + MLX5_SET(delete_fte_in, din, vport_number, + MLX5_GET(set_fte_in, in, vport_number)); ++ MLX5_SET(delete_fte_in, din, other_eswitch, ++ MLX5_GET(set_fte_in, in, other_eswitch)); ++ MLX5_SET(delete_fte_in, din, eswitch_owner_vhca_id, ++ MLX5_GET(set_fte_in, in, eswitch_owner_vhca_id)); + MLX5_SET(delete_fte_in, din, table_type, + MLX5_GET(set_fte_in, in, table_type)); + MLX5_SET(delete_fte_in, din, table_id, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1667-rdma-mlx5-refactor-get-prio-function.patch b/SOURCES/1667-rdma-mlx5-refactor-get-prio-function.patch new file mode 100644 index 000000000..f10395f0f --- /dev/null +++ b/SOURCES/1667-rdma-mlx5-refactor-get-prio-function.patch @@ -0,0 +1,151 @@ +From 358f6d5032cf113b0dbe0ea3c4fd9f693e2c2870 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 22:57:23 -0400 +Subject: [PATCH] RDMA/mlx5: Refactor _get_prio() function + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit f277662b734e96bf38ce7d422b091f53df3ff8cb +Author: Patrisious Haddad +Date: Wed Oct 29 17:42:58 2025 +0200 + + RDMA/mlx5: Refactor _get_prio() function + + Refactor the _get_prio() function to remove redundant arguments by + reusing the existing flow table attributes struct instead of passing + attributes separately. This improves code clarity and maintainability. + + In addition allows downstream patch to add new parameter without + needing to change __get_prio() arguments. 
+ + Signed-off-by: Patrisious Haddad + Signed-off-by: Edward Srouji + Link: https://patch.msgid.link/20251029-support-other-eswitch-v1-6-98bb707b5d57@nvidia.com + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/fs.c b/drivers/infiniband/hw/mlx5/fs.c +index b0f7663c24c1..c8a25370aa79 100644 +--- a/drivers/infiniband/hw/mlx5/fs.c ++++ b/drivers/infiniband/hw/mlx5/fs.c +@@ -691,22 +691,13 @@ static bool __maybe_unused mlx5_ib_shared_ft_allowed(struct ib_device *device) + return MLX5_CAP_GEN(dev->mdev, shared_object_to_user_object_allowed); + } + +-static struct mlx5_ib_flow_prio *_get_prio(struct mlx5_ib_dev *dev, +- struct mlx5_flow_namespace *ns, ++static struct mlx5_ib_flow_prio *_get_prio(struct mlx5_flow_namespace *ns, + struct mlx5_ib_flow_prio *prio, +- int priority, +- int num_entries, int num_groups, +- u32 flags, u16 vport) ++ struct mlx5_flow_table_attr *ft_attr) + { +- struct mlx5_flow_table_attr ft_attr = {}; + struct mlx5_flow_table *ft; + +- ft_attr.prio = priority; +- ft_attr.max_fte = num_entries; +- ft_attr.flags = flags; +- ft_attr.vport = vport; +- ft_attr.autogroup.max_num_groups = num_groups; +- ft = mlx5_create_auto_grouped_flow_table(ns, &ft_attr); ++ ft = mlx5_create_auto_grouped_flow_table(ns, ft_attr); + if (IS_ERR(ft)) + return ERR_CAST(ft); + +@@ -720,6 +711,7 @@ static struct mlx5_ib_flow_prio *get_flow_table(struct mlx5_ib_dev *dev, + enum flow_table_type ft_type) + { + bool dont_trap = flow_attr->flags & IB_FLOW_ATTR_FLAGS_DONT_TRAP; ++ struct mlx5_flow_table_attr ft_attr = {}; + struct mlx5_flow_namespace *ns = NULL; + enum mlx5_flow_namespace_type fn_type; + struct mlx5_ib_flow_prio *prio; +@@ -797,11 +789,14 @@ static struct mlx5_ib_flow_prio *get_flow_table(struct mlx5_ib_dev *dev, + max_table_size = min_t(int, num_entries, max_table_size); + + ft = prio->flow_table; +- if (!ft) +- return _get_prio(dev, ns, prio, priority, max_table_size, +- num_groups, flags, 0); ++ if (ft) ++ 
return prio; + +- return prio; ++ ft_attr.prio = priority; ++ ft_attr.max_fte = max_table_size; ++ ft_attr.flags = flags; ++ ft_attr.autogroup.max_num_groups = num_groups; ++ return _get_prio(ns, prio, &ft_attr); + } + + enum { +@@ -950,6 +945,7 @@ static int get_per_qp_prio(struct mlx5_ib_dev *dev, + enum mlx5_ib_optional_counter_type type) + { + enum mlx5_ib_optional_counter_type per_qp_type; ++ struct mlx5_flow_table_attr ft_attr = {}; + enum mlx5_flow_namespace_type fn_type; + struct mlx5_flow_namespace *ns; + struct mlx5_ib_flow_prio *prio; +@@ -1003,7 +999,10 @@ static int get_per_qp_prio(struct mlx5_ib_dev *dev, + if (prio->flow_table) + return 0; + +- prio = _get_prio(dev, ns, prio, priority, MLX5_FS_MAX_POOL_SIZE, 1, 0, 0); ++ ft_attr.prio = priority; ++ ft_attr.max_fte = MLX5_FS_MAX_POOL_SIZE; ++ ft_attr.autogroup.max_num_groups = 1; ++ prio = _get_prio(ns, prio, &ft_attr); + if (IS_ERR(prio)) + return PTR_ERR(prio); + +@@ -1223,6 +1222,7 @@ int mlx5_ib_fs_add_op_fc(struct mlx5_ib_dev *dev, u32 port_num, + struct mlx5_ib_op_fc *opfc, + enum mlx5_ib_optional_counter_type type) + { ++ struct mlx5_flow_table_attr ft_attr = {}; + enum mlx5_flow_namespace_type fn_type; + int priority, i, err, spec_num; + struct mlx5_flow_act flow_act = {}; +@@ -1304,8 +1304,10 @@ int mlx5_ib_fs_add_op_fc(struct mlx5_ib_dev *dev, u32 port_num, + if (err) + goto free; + +- prio = _get_prio(dev, ns, prio, priority, +- dev->num_ports * MAX_OPFC_RULES, 1, 0, 0); ++ ft_attr.prio = priority; ++ ft_attr.max_fte = dev->num_ports * MAX_OPFC_RULES; ++ ft_attr.autogroup.max_num_groups = 1; ++ prio = _get_prio(ns, prio, &ft_attr); + if (IS_ERR(prio)) { + err = PTR_ERR(prio); + goto put_prio; +@@ -1903,6 +1905,7 @@ _get_flow_table(struct mlx5_ib_dev *dev, u16 user_priority, + bool mcast, u32 ib_port) + { + struct mlx5_core_dev *ft_mdev = dev->mdev; ++ struct mlx5_flow_table_attr ft_attr = {}; + struct mlx5_flow_namespace *ns = NULL; + struct mlx5_ib_flow_prio *prio = NULL; + int 
max_table_size = 0; +@@ -2026,8 +2029,12 @@ _get_flow_table(struct mlx5_ib_dev *dev, u16 user_priority, + if (prio->flow_table) + return prio; + +- return _get_prio(dev, ns, prio, priority, max_table_size, +- MLX5_FS_MAX_TYPES, flags, vport); ++ ft_attr.prio = priority; ++ ft_attr.max_fte = max_table_size; ++ ft_attr.flags = flags; ++ ft_attr.vport = vport; ++ ft_attr.autogroup.max_num_groups = MLX5_FS_MAX_TYPES; ++ return _get_prio(ns, prio, &ft_attr); + } + + static struct mlx5_ib_flow_handler * +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1668-rdma-mlx5-add-other-eswitch-support-to-userspace-tables.patch b/SOURCES/1668-rdma-mlx5-add-other-eswitch-support-to-userspace-tables.patch new file mode 100644 index 000000000..459d3fc52 --- /dev/null +++ b/SOURCES/1668-rdma-mlx5-add-other-eswitch-support-to-userspace-tables.patch @@ -0,0 +1,82 @@ +From 1e56cbcbe140a743961ae9209de4e33c933d1de7 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 22:57:24 -0400 +Subject: [PATCH] RDMA/mlx5: Add other eswitch support to userspace tables + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 6e79e210058e8f95fb3824e33b781960851ae7d1 +Author: Patrisious Haddad +Date: Wed Oct 29 17:42:59 2025 +0200 + + RDMA/mlx5: Add other eswitch support to userspace tables + + Allows the creation of RDMA TRANSPORT tables over VFs/SFs that + belong to another eswitch manager. Which is only possible for PFs that + were connected via a create_lag PRM command. 
+ + Signed-off-by: Patrisious Haddad + Signed-off-by: Edward Srouji + Link: https://patch.msgid.link/20251029-support-other-eswitch-v1-7-98bb707b5d57@nvidia.com + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/fs.c b/drivers/infiniband/hw/mlx5/fs.c +index c8a25370aa79..d17823ce7f38 100644 +--- a/drivers/infiniband/hw/mlx5/fs.c ++++ b/drivers/infiniband/hw/mlx5/fs.c +@@ -1874,7 +1874,7 @@ static int mlx5_ib_fill_transport_ns_info(struct mlx5_ib_dev *dev, + u32 *flags, u16 *vport_idx, + u16 *vport, + struct mlx5_core_dev **ft_mdev, +- u32 ib_port) ++ u32 ib_port, u16 *esw_owner_vhca_id) + { + struct mlx5_core_dev *esw_mdev; + +@@ -1888,8 +1888,13 @@ static int mlx5_ib_fill_transport_ns_info(struct mlx5_ib_dev *dev, + return -EINVAL; + + esw_mdev = mlx5_eswitch_get_core_dev(dev->port[ib_port - 1].rep->esw); +- if (esw_mdev != dev->mdev) +- return -EOPNOTSUPP; ++ if (esw_mdev != dev->mdev) { ++ if (!MLX5_CAP_ADV_RDMA(dev->mdev, ++ rdma_transport_manager_other_eswitch)) ++ return -EOPNOTSUPP; ++ *flags |= MLX5_FLOW_TABLE_OTHER_ESWITCH; ++ *esw_owner_vhca_id = MLX5_CAP_GEN(esw_mdev, vhca_id); ++ } + + *flags |= MLX5_FLOW_TABLE_OTHER_VPORT; + *ft_mdev = esw_mdev; +@@ -1908,6 +1913,7 @@ _get_flow_table(struct mlx5_ib_dev *dev, u16 user_priority, + struct mlx5_flow_table_attr ft_attr = {}; + struct mlx5_flow_namespace *ns = NULL; + struct mlx5_ib_flow_prio *prio = NULL; ++ u16 esw_owner_vhca_id = 0; + int max_table_size = 0; + u16 vport_idx = 0; + bool esw_encap; +@@ -1969,7 +1975,8 @@ _get_flow_table(struct mlx5_ib_dev *dev, u16 user_priority, + return ERR_PTR(-EINVAL); + ret = mlx5_ib_fill_transport_ns_info(dev, ns_type, &flags, + &vport_idx, &vport, +- &ft_mdev, ib_port); ++ &ft_mdev, ib_port, ++ &esw_owner_vhca_id); + if (ret) + return ERR_PTR(ret); + +@@ -2033,6 +2040,7 @@ _get_flow_table(struct mlx5_ib_dev *dev, u16 user_priority, + ft_attr.max_fte = max_table_size; + ft_attr.flags = flags; + ft_attr.vport = 
vport; ++ ft_attr.esw_owner_vhca_id = esw_owner_vhca_id; + ft_attr.autogroup.max_num_groups = MLX5_FS_MAX_TYPES; + return _get_prio(ns, prio, &ft_attr); + } +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1669-ib-mlx5-reduce-imr-ksm-size-when-5-level-paging-is-enabled.patch b/SOURCES/1669-ib-mlx5-reduce-imr-ksm-size-when-5-level-paging-is-enabled.patch new file mode 100644 index 000000000..276393df5 --- /dev/null +++ b/SOURCES/1669-ib-mlx5-reduce-imr-ksm-size-when-5-level-paging-is-enabled.patch @@ -0,0 +1,277 @@ +From 7ab8fa2cec647de00118c81fbd7186f05dcb2302 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Mon, 20 Apr 2026 22:57:24 -0400 +Subject: [PATCH] IB/mlx5: Reduce IMR KSM size when 5-level paging is enabled + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 6dbd547adad534c0daad13ca9e1f862278ca955b +Author: Yishai Hadas +Date: Thu Nov 20 16:49:28 2025 +0200 + + IB/mlx5: Reduce IMR KSM size when 5-level paging is enabled + + Enabling 5-level paging (LA57) increases TASK_SIZE on x86_64 from 2^47 + to 2^56. This affects implicit ODP, which uses TASK_SIZE to calculate + the number of IMR KSM entries. + + As a result, the number of entries and the memory usage for KSM mkeys + increase drastically: + + - With 2^47 TASK_SIZE: 0x20000 entries (~2MB) + - With 2^56 TASK_SIZE: 0x4000000 entries (~1GB) + + This issue could happen previously on systems with LA57 manually + enabled, but now commit 7212b58d6d71 ("x86/mm/64: Make 5-level paging + support unconditional") enables LA57 by default on all supported + systems. This makes the issue impact widespread. + + To mitigate this, increase the size each MTT entry maps from 1GB to 16GB + when 5-level paging is enabled. This reduces the number of KSM entries + and lowers the memory usage on LA57 systems from 1GB to 64MB per IMR. + + As now 'mlx5_imr_mtt_size' is larger than 32 bits, we move to use u64 + instead of int as part of populate_klm() to prevent overflow of the + 'step' variable. 
+ + In addition, as populate_klm() actually handles KSM and not KLM, as it's + used only by implicit ODP, we renamed its signature and the internal + structures accordingly while dropping the byte_count handling which is + not relevant in KSM. The page size in KSM is fixed for all the entries + and come from the log_page_size of the mkey. + + Note: + On platforms where the calculated value for 'mlx5_imr_ksm_page_shift' is + higher than the max firmware cap to be changed over UMR, or that the + calculated value for 'log_va_pages' is higher than what we may expect, + the implicit ODP cap will be simply turned off. + + Co-developed-by: Or Har-Toov + Signed-off-by: Or Har-Toov + Signed-off-by: Yishai Hadas + Reviewed-by: Michael Guralnik + Signed-off-by: Edward Srouji + Link: https://patch.msgid.link/20251120-reduce-ksm-v1-1-6864bfc814dc@kernel.org + Signed-off-by: Leon Romanovsky + +Signed-off-by: Kamal Heib + +diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c +index 6441abdf1f3b..e71ee3d52eb0 100644 +--- a/drivers/infiniband/hw/mlx5/odp.c ++++ b/drivers/infiniband/hw/mlx5/odp.c +@@ -97,33 +97,28 @@ struct mlx5_pagefault { + * a pagefault. 
*/ + #define MMU_NOTIFIER_TIMEOUT 1000 + +-#define MLX5_IMR_MTT_BITS (30 - PAGE_SHIFT) +-#define MLX5_IMR_MTT_SHIFT (MLX5_IMR_MTT_BITS + PAGE_SHIFT) +-#define MLX5_IMR_MTT_ENTRIES BIT_ULL(MLX5_IMR_MTT_BITS) +-#define MLX5_IMR_MTT_SIZE BIT_ULL(MLX5_IMR_MTT_SHIFT) +-#define MLX5_IMR_MTT_MASK (~(MLX5_IMR_MTT_SIZE - 1)) +- +-#define MLX5_KSM_PAGE_SHIFT MLX5_IMR_MTT_SHIFT +- + static u64 mlx5_imr_ksm_entries; ++static u64 mlx5_imr_mtt_entries; ++static u64 mlx5_imr_mtt_size; ++static u8 mlx5_imr_mtt_shift; ++static u8 mlx5_imr_ksm_page_shift; + +-static void populate_klm(struct mlx5_klm *pklm, size_t idx, size_t nentries, ++static void populate_ksm(struct mlx5_ksm *pksm, size_t idx, size_t nentries, + struct mlx5_ib_mr *imr, int flags) + { + struct mlx5_core_dev *dev = mr_to_mdev(imr)->mdev; +- struct mlx5_klm *end = pklm + nentries; +- int step = MLX5_CAP_ODP(dev, mem_page_fault) ? MLX5_IMR_MTT_SIZE : 0; ++ struct mlx5_ksm *end = pksm + nentries; ++ u64 step = MLX5_CAP_ODP(dev, mem_page_fault) ? mlx5_imr_mtt_size : 0; + __be32 key = MLX5_CAP_ODP(dev, mem_page_fault) ? + cpu_to_be32(imr->null_mmkey.key) : + mr_to_mdev(imr)->mkeys.null_mkey; + u64 va = +- MLX5_CAP_ODP(dev, mem_page_fault) ? idx * MLX5_IMR_MTT_SIZE : 0; ++ MLX5_CAP_ODP(dev, mem_page_fault) ? 
idx * mlx5_imr_mtt_size : 0; + + if (flags & MLX5_IB_UPD_XLT_ZAP) { +- for (; pklm != end; pklm++, idx++, va += step) { +- pklm->bcount = cpu_to_be32(MLX5_IMR_MTT_SIZE); +- pklm->key = key; +- pklm->va = cpu_to_be64(va); ++ for (; pksm != end; pksm++, idx++, va += step) { ++ pksm->key = key; ++ pksm->va = cpu_to_be64(va); + } + return; + } +@@ -147,16 +142,15 @@ static void populate_klm(struct mlx5_klm *pklm, size_t idx, size_t nentries, + */ + lockdep_assert_held(&to_ib_umem_odp(imr->umem)->umem_mutex); + +- for (; pklm != end; pklm++, idx++, va += step) { ++ for (; pksm != end; pksm++, idx++, va += step) { + struct mlx5_ib_mr *mtt = xa_load(&imr->implicit_children, idx); + +- pklm->bcount = cpu_to_be32(MLX5_IMR_MTT_SIZE); + if (mtt) { +- pklm->key = cpu_to_be32(mtt->ibmr.lkey); +- pklm->va = cpu_to_be64(idx * MLX5_IMR_MTT_SIZE); ++ pksm->key = cpu_to_be32(mtt->ibmr.lkey); ++ pksm->va = cpu_to_be64(idx * mlx5_imr_mtt_size); + } else { +- pklm->key = key; +- pklm->va = cpu_to_be64(va); ++ pksm->key = key; ++ pksm->va = cpu_to_be64(va); + } + } + } +@@ -201,7 +195,7 @@ int mlx5_odp_populate_xlt(void *xlt, size_t idx, size_t nentries, + struct mlx5_ib_mr *mr, int flags) + { + if (flags & MLX5_IB_UPD_XLT_INDIRECT) { +- populate_klm(xlt, idx, nentries, mr, flags); ++ populate_ksm(xlt, idx, nentries, mr, flags); + return 0; + } else { + return populate_mtt(xlt, idx, nentries, mr, flags); +@@ -226,7 +220,7 @@ static void free_implicit_child_mr_work(struct work_struct *work) + + mutex_lock(&odp_imr->umem_mutex); + mlx5r_umr_update_xlt(mr->parent, +- ib_umem_start(odp) >> MLX5_IMR_MTT_SHIFT, 1, 0, ++ ib_umem_start(odp) >> mlx5_imr_mtt_shift, 1, 0, + MLX5_IB_UPD_XLT_INDIRECT | MLX5_IB_UPD_XLT_ATOMIC); + mutex_unlock(&odp_imr->umem_mutex); + mlx5_ib_dereg_mr(&mr->ibmr, NULL); +@@ -237,7 +231,7 @@ static void free_implicit_child_mr_work(struct work_struct *work) + static void destroy_unused_implicit_child_mr(struct mlx5_ib_mr *mr) + { + struct ib_umem_odp *odp = 
to_ib_umem_odp(mr->umem); +- unsigned long idx = ib_umem_start(odp) >> MLX5_IMR_MTT_SHIFT; ++ unsigned long idx = ib_umem_start(odp) >> mlx5_imr_mtt_shift; + struct mlx5_ib_mr *imr = mr->parent; + + /* +@@ -425,7 +419,10 @@ static void internal_fill_odp_caps(struct mlx5_ib_dev *dev) + if (MLX5_CAP_GEN(dev->mdev, fixed_buffer_size) && + MLX5_CAP_GEN(dev->mdev, null_mkey) && + MLX5_CAP_GEN(dev->mdev, umr_extended_translation_offset) && +- !MLX5_CAP_GEN(dev->mdev, umr_indirect_mkey_disabled)) ++ !MLX5_CAP_GEN(dev->mdev, umr_indirect_mkey_disabled) && ++ mlx5_imr_ksm_entries != 0 && ++ !(mlx5_imr_ksm_page_shift > ++ get_max_log_entity_size_cap(dev, MLX5_MKC_ACCESS_MODE_KSM))) + caps->general_caps |= IB_ODP_SUPPORT_IMPLICIT; + } + +@@ -476,14 +473,14 @@ static struct mlx5_ib_mr *implicit_get_child_mr(struct mlx5_ib_mr *imr, + int err; + + odp = ib_umem_odp_alloc_child(to_ib_umem_odp(imr->umem), +- idx * MLX5_IMR_MTT_SIZE, +- MLX5_IMR_MTT_SIZE, &mlx5_mn_ops); ++ idx * mlx5_imr_mtt_size, ++ mlx5_imr_mtt_size, &mlx5_mn_ops); + if (IS_ERR(odp)) + return ERR_CAST(odp); + + mr = mlx5_mr_cache_alloc(dev, imr->access_flags, + MLX5_MKC_ACCESS_MODE_MTT, +- MLX5_IMR_MTT_ENTRIES); ++ mlx5_imr_mtt_entries); + if (IS_ERR(mr)) { + ib_umem_odp_release(odp); + return mr; +@@ -495,7 +492,7 @@ static struct mlx5_ib_mr *implicit_get_child_mr(struct mlx5_ib_mr *imr, + mr->umem = &odp->umem; + mr->ibmr.lkey = mr->mmkey.key; + mr->ibmr.rkey = mr->mmkey.key; +- mr->ibmr.iova = idx * MLX5_IMR_MTT_SIZE; ++ mr->ibmr.iova = idx * mlx5_imr_mtt_size; + mr->parent = imr; + odp->private = mr; + +@@ -506,7 +503,7 @@ static struct mlx5_ib_mr *implicit_get_child_mr(struct mlx5_ib_mr *imr, + refcount_set(&mr->mmkey.usecount, 2); + + err = mlx5r_umr_update_xlt(mr, 0, +- MLX5_IMR_MTT_ENTRIES, ++ mlx5_imr_mtt_entries, + PAGE_SHIFT, + MLX5_IB_UPD_XLT_ZAP | + MLX5_IB_UPD_XLT_ENABLE); +@@ -611,7 +608,7 @@ struct mlx5_ib_mr *mlx5_ib_alloc_implicit_mr(struct mlx5_ib_pd *pd, + struct mlx5_ib_mr *imr; + int err; + 
+- if (!mlx5r_umr_can_load_pas(dev, MLX5_IMR_MTT_ENTRIES * PAGE_SIZE)) ++ if (!mlx5r_umr_can_load_pas(dev, mlx5_imr_mtt_entries * PAGE_SIZE)) + return ERR_PTR(-EOPNOTSUPP); + + umem_odp = ib_umem_odp_alloc_implicit(&dev->ib_dev, access_flags); +@@ -647,7 +644,7 @@ struct mlx5_ib_mr *mlx5_ib_alloc_implicit_mr(struct mlx5_ib_pd *pd, + + err = mlx5r_umr_update_xlt(imr, 0, + mlx5_imr_ksm_entries, +- MLX5_KSM_PAGE_SHIFT, ++ mlx5_imr_ksm_page_shift, + MLX5_IB_UPD_XLT_INDIRECT | + MLX5_IB_UPD_XLT_ZAP | + MLX5_IB_UPD_XLT_ENABLE); +@@ -750,20 +747,20 @@ static int pagefault_implicit_mr(struct mlx5_ib_mr *imr, + struct ib_umem_odp *odp_imr, u64 user_va, + size_t bcnt, u32 *bytes_mapped, u32 flags) + { +- unsigned long end_idx = (user_va + bcnt - 1) >> MLX5_IMR_MTT_SHIFT; ++ unsigned long end_idx = (user_va + bcnt - 1) >> mlx5_imr_mtt_shift; + unsigned long upd_start_idx = end_idx + 1; + unsigned long upd_len = 0; + unsigned long npages = 0; + int err; + int ret; + +- if (unlikely(user_va >= mlx5_imr_ksm_entries * MLX5_IMR_MTT_SIZE || +- mlx5_imr_ksm_entries * MLX5_IMR_MTT_SIZE - user_va < bcnt)) ++ if (unlikely(user_va >= mlx5_imr_ksm_entries * mlx5_imr_mtt_size || ++ mlx5_imr_ksm_entries * mlx5_imr_mtt_size - user_va < bcnt)) + return -EFAULT; + + /* Fault each child mr that intersects with our interval. 
*/ + while (bcnt) { +- unsigned long idx = user_va >> MLX5_IMR_MTT_SHIFT; ++ unsigned long idx = user_va >> mlx5_imr_mtt_shift; + struct ib_umem_odp *umem_odp; + struct mlx5_ib_mr *mtt; + u64 len; +@@ -1924,9 +1921,25 @@ void mlx5_ib_odp_cleanup_one(struct mlx5_ib_dev *dev) + + int mlx5_ib_odp_init(void) + { ++ u32 log_va_pages = ilog2(TASK_SIZE) - PAGE_SHIFT; ++ u8 mlx5_imr_mtt_bits; ++ ++ /* 48 is default ARM64 VA space and covers X86 4-level paging which is 47 */ ++ if (log_va_pages <= 48 - PAGE_SHIFT) ++ mlx5_imr_mtt_shift = 30; ++ /* 56 is x86-64, 5-level paging */ ++ else if (log_va_pages <= 56 - PAGE_SHIFT) ++ mlx5_imr_mtt_shift = 34; ++ else ++ return 0; ++ ++ mlx5_imr_mtt_size = BIT_ULL(mlx5_imr_mtt_shift); ++ mlx5_imr_mtt_bits = mlx5_imr_mtt_shift - PAGE_SHIFT; ++ mlx5_imr_mtt_entries = BIT_ULL(mlx5_imr_mtt_bits); + mlx5_imr_ksm_entries = BIT_ULL(get_order(TASK_SIZE) - +- MLX5_IMR_MTT_BITS); ++ mlx5_imr_mtt_bits); + ++ mlx5_imr_ksm_page_shift = mlx5_imr_mtt_shift; + return 0; + } + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1670-net-mlx5e-shampo-fix-header-mapping-for-64k-pages.patch b/SOURCES/1670-net-mlx5e-shampo-fix-header-mapping-for-64k-pages.patch new file mode 100644 index 000000000..df750c228 --- /dev/null +++ b/SOURCES/1670-net-mlx5e-shampo-fix-header-mapping-for-64k-pages.patch @@ -0,0 +1,126 @@ +From 27d7412e723700805252d3c166530d596311fefe Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Tue, 21 Apr 2026 14:55:14 -0400 +Subject: [PATCH] net/mlx5e: SHAMPO, Fix header mapping for 64K pages + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 +Conflicts: +Context diff due to the missing of the following commit: +d1668f119943 ("net/mlx5e: Convert over to netmem") + +commit 665a7e13c220bbde55531a24bd5524320648df10 +Author: Dragos Tatulea +Date: Tue Nov 4 08:48:33 2025 +0200 + + net/mlx5e: SHAMPO, Fix header mapping for 64K pages + + HW-GRO is broken on mlx5 for 64K page sizes. 
The patch in the fixes tag + didn't take into account larger page sizes when doing an align down + of max_ksm_entries. For 64K page size, max_ksm_entries is 0 which will skip + mapping header pages via WQE UMR. This breaks header-data split + and will result in the following syndrome: + + mlx5_core 0000:00:08.0 eth2: Error cqe on cqn 0x4c9, ci 0x0, qn 0x1133, opcode 0xe, syndrome 0x4, vendor syndrome 0x32 + 00000000: 00 00 00 00 04 4a 00 00 00 00 00 00 20 00 93 32 + 00000010: 55 00 00 00 fb cc 00 00 00 00 00 00 07 18 00 00 + 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 4a + 00000030: 00 00 3b c7 93 01 32 04 00 00 00 00 00 00 bf e0 + mlx5_core 0000:00:08.0 eth2: ERR CQE on RQ: 0x1133 + + Furthermore, the function that fills in WQE UMRs for the headers + (mlx5e_build_shampo_hd_umr()) only supports mapping page sizes that + fit in a single UMR WQE. + + This patch goes back to the old non-aligned max_ksm_entries value and it + changes mlx5e_build_shampo_hd_umr() to support mapping a large page over + multiple UMR WQEs. + + This means that mlx5e_build_shampo_hd_umr() can now leave a page only + partially mapped. The caller, mlx5e_alloc_rx_hd_mpwqe(), ensures that + there are enough UMR WQEs to cover complete pages by working on + ksm_entries that are multiples of MLX5E_SHAMPO_WQ_HEADER_PER_PAGE. 
+ + Fixes: 8a0ee54027b1 ("net/mlx5e: SHAMPO, Simplify UMR allocation for headers") + Signed-off-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/1762238915-1027590-2-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +index 0afdc68896c3..6b3393895792 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +@@ -670,7 +670,7 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq, + u16 pi, header_offset, err, wqe_bbs; + u32 lkey = rq->mdev->mlx5e_res.hw_objs.mkey; + struct mlx5e_umr_wqe *umr_wqe; +- int headroom, i = 0; ++ int headroom, i; + + headroom = rq->buff.headroom; + wqe_bbs = MLX5E_KSM_UMR_WQEBBS(ksm_entries); +@@ -678,25 +678,24 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq, + umr_wqe = mlx5_wq_cyc_get_wqe(&sq->wq, pi); + build_ksm_umr(sq, umr_wqe, shampo->mkey_be, index, ksm_entries); + +- WARN_ON_ONCE(ksm_entries & (MLX5E_SHAMPO_WQ_HEADER_PER_PAGE - 1)); +- while (i < ksm_entries) { +- struct mlx5e_frag_page *frag_page = mlx5e_shampo_hd_to_frag_page(rq, index); ++ for (i = 0; i < ksm_entries; i++, index++) { ++ struct mlx5e_frag_page *frag_page; + u64 addr; + +- err = mlx5e_page_alloc_fragmented(rq->hd_page_pool, frag_page); +- if (unlikely(err)) +- goto err_unmap; ++ frag_page = mlx5e_shampo_hd_to_frag_page(rq, index); ++ header_offset = mlx5e_shampo_hd_offset(index); ++ if (!header_offset) { ++ err = mlx5e_page_alloc_fragmented(rq->hd_page_pool, ++ frag_page); ++ if (err) ++ goto err_unmap; ++ } + + addr = page_pool_get_dma_addr(frag_page->page); +- +- for (int j = 0; j < MLX5E_SHAMPO_WQ_HEADER_PER_PAGE; j++) { +- header_offset = mlx5e_shampo_hd_offset(index++); +- +- umr_wqe->inline_ksms[i++] = (struct mlx5_ksm) { +- .key = cpu_to_be32(lkey), +- .va = 
cpu_to_be64(addr + header_offset + headroom), +- }; +- } ++ umr_wqe->inline_ksms[i] = (struct mlx5_ksm) { ++ .key = cpu_to_be32(lkey), ++ .va = cpu_to_be64(addr + header_offset + headroom), ++ }; + } + + sq->db.wqe_info[pi] = (struct mlx5e_icosq_wqe_info) { +@@ -712,7 +711,7 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq, + return 0; + + err_unmap: +- while (--i) { ++ while (--i >= 0) { + --index; + header_offset = mlx5e_shampo_hd_offset(index); + if (!header_offset) { +@@ -734,8 +733,7 @@ static int mlx5e_alloc_rx_hd_mpwqe(struct mlx5e_rq *rq) + struct mlx5e_icosq *sq = rq->icosq; + int i, err, max_ksm_entries, len; + +- max_ksm_entries = ALIGN_DOWN(MLX5E_MAX_KSM_PER_WQE(rq->mdev), +- MLX5E_SHAMPO_WQ_HEADER_PER_PAGE); ++ max_ksm_entries = MLX5E_MAX_KSM_PER_WQE(rq->mdev); + ksm_entries = bitmap_find_window(shampo->bitmap, + shampo->hd_per_wqe, + shampo->hd_per_wq, shampo->pi); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1671-net-mlx5e-shampo-fix-skb-size-check-for-64k-pages.patch b/SOURCES/1671-net-mlx5e-shampo-fix-skb-size-check-for-64k-pages.patch new file mode 100644 index 000000000..62874a1a4 --- /dev/null +++ b/SOURCES/1671-net-mlx5e-shampo-fix-skb-size-check-for-64k-pages.patch @@ -0,0 +1,58 @@ +From c7a4dd865f4a385b4b6d96b210bcb651876f7681 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Tue, 21 Apr 2026 14:56:09 -0400 +Subject: [PATCH] net/mlx5e: SHAMPO, Fix skb size check for 64K pages + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit bacd8d80181ebe34b599a39aa26bf73a44c91e55 +Author: Dragos Tatulea +Date: Tue Nov 4 08:48:34 2025 +0200 + + net/mlx5e: SHAMPO, Fix skb size check for 64K pages + + mlx5e_hw_gro_skb_has_enough_space() uses a formula to check if there is + enough space in the skb frags to store more data. This formula is + incorrect for 64K page sizes and it triggers early GRO session + termination because the first fragment will blow up beyond + GRO_LEGACY_MAX_SIZE. 
+ + This patch adds a special case for page sizes >= GRO_LEGACY_MAX_SIZE + (64K) which uses the skb->len instead. Within this context, + the check is safe from fragment overflow because the hardware + will continuously fill the data up to the reservation size of 64K + and the driver will coalesce all data from the same page to the same + fragment. This means that the data will span one fragment or at most + two for such a large page size. + + It is expected that the if statement will be optimized out as the + check is done with constants. + + Fixes: 92552d3abd32 ("net/mlx5e: HW_GRO cqe handler implementation") + Signed-off-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/1762238915-1027590-3-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +index 6b3393895792..cbb6596a65c0 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +@@ -2322,7 +2322,10 @@ mlx5e_hw_gro_skb_has_enough_space(struct sk_buff *skb, u16 data_bcnt) + { + int nr_frags = skb_shinfo(skb)->nr_frags; + +- return PAGE_SIZE * nr_frags + data_bcnt <= GRO_LEGACY_MAX_SIZE; ++ if (PAGE_SIZE >= GRO_LEGACY_MAX_SIZE) ++ return skb->len + data_bcnt <= GRO_LEGACY_MAX_SIZE; ++ else ++ return PAGE_SIZE * nr_frags + data_bcnt <= GRO_LEGACY_MAX_SIZE; + } + + static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe) +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1672-net-mlx5e-shampo-fix-header-formulas-for-higher-mtus-and-64k.patch b/SOURCES/1672-net-mlx5e-shampo-fix-header-formulas-for-higher-mtus-and-64k.patch new file mode 100644 index 000000000..fde4db191 --- /dev/null +++ b/SOURCES/1672-net-mlx5e-shampo-fix-header-formulas-for-higher-mtus-and-64k.patch @@ -0,0 +1,212 @@ +From 
29457aaef82f67cb25ec966a91a6ccf70934a4b7 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Tue, 21 Apr 2026 15:08:34 -0400 +Subject: [PATCH] net/mlx5e: SHAMPO, Fix header formulas for higher MTUs and + 64K pages + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 +Conflicts: +Adjust this change due to the missing of the following commit: +d1668f119943 ("net/mlx5e: Convert over to netmem") + +commit d8a7ed9586c7579a99e9e2d90988c9eceeee61ff +Author: Dragos Tatulea +Date: Tue Nov 4 08:48:35 2025 +0200 + + net/mlx5e: SHAMPO, Fix header formulas for higher MTUs and 64K pages + + The MLX5E_SHAMPO_WQ_HEADER_PER_PAGE and + MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE macros are used directly in + several places under the assumption that there will always be more + headers per WQE than headers per page. However, this assumption doesn't + hold for 64K page sizes and higher MTUs (> 4K). This can be first + observed during header page allocation: ksm_entries will become 0 during + alignment to MLX5E_SHAMPO_WQ_HEADER_PER_PAGE. + + This patch introduces 2 additional members to the mlx5e_shampo_hd struct + which are meant to be used instead of the macros mentioned above. + When the number of headers per WQE goes below + MLX5E_SHAMPO_WQ_HEADER_PER_PAGE, clamp the number of headers per + page and expand the header size accordingly so that the headers + for one WQE cover a full page. + + All the formulas are adapted to use these two new members.
+ + Fixes: 945ca432bfd0 ("net/mlx5e: SHAMPO, Drop info array") + Signed-off-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Reviewed-by: Simon Horman + Link: https://patch.msgid.link/1762238915-1027590-4-git-send-email-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h +index 32224bd1a0e7..041c986f7065 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h +@@ -634,7 +634,10 @@ struct mlx5e_dma_info { + struct mlx5e_shampo_hd { + struct mlx5e_frag_page *pages; + u32 hd_per_wq; ++ u32 hd_per_page; + u16 hd_per_wqe; ++ u8 log_hd_per_page; ++ u8 log_hd_entry_size; + unsigned long *bitmap; + u16 pi; + u16 ci; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index af067ff1ebcf..47416e54379a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -795,8 +795,9 @@ static int mlx5_rq_shampo_alloc(struct mlx5_core_dev *mdev, + int node) + { + void *wqc = MLX5_ADDR_OF(rqc, rqp->rqc, wq); ++ u8 log_hd_per_page, log_hd_entry_size; ++ u16 hd_per_wq, hd_per_wqe; + u32 hd_pool_size; +- u16 hd_per_wq; + int wq_size; + int err; + +@@ -819,11 +820,24 @@ static int mlx5_rq_shampo_alloc(struct mlx5_core_dev *mdev, + if (err) + goto err_umr_mkey; + +- rq->mpwqe.shampo->hd_per_wqe = +- mlx5e_shampo_hd_per_wqe(mdev, params, rqp); ++ hd_per_wqe = mlx5e_shampo_hd_per_wqe(mdev, params, rqp); + wq_size = BIT(MLX5_GET(wq, wqc, log_wq_sz)); +- hd_pool_size = (rq->mpwqe.shampo->hd_per_wqe * wq_size) / +- MLX5E_SHAMPO_WQ_HEADER_PER_PAGE; ++ ++ BUILD_BUG_ON(MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE > PAGE_SHIFT); ++ if (hd_per_wqe >= MLX5E_SHAMPO_WQ_HEADER_PER_PAGE) { ++ log_hd_per_page = MLX5E_SHAMPO_LOG_WQ_HEADER_PER_PAGE; ++ log_hd_entry_size = 
MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE; ++ } else { ++ log_hd_per_page = order_base_2(hd_per_wqe); ++ log_hd_entry_size = order_base_2(PAGE_SIZE / hd_per_wqe); ++ } ++ ++ rq->mpwqe.shampo->hd_per_wqe = hd_per_wqe; ++ rq->mpwqe.shampo->hd_per_page = BIT(log_hd_per_page); ++ rq->mpwqe.shampo->log_hd_per_page = log_hd_per_page; ++ rq->mpwqe.shampo->log_hd_entry_size = log_hd_entry_size; ++ ++ hd_pool_size = (hd_per_wqe * wq_size) >> log_hd_per_page; + + if (mlx5_rq_needs_separate_hd_pool(rq)) { + /* Separate page pool for shampo headers */ +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +index cbb6596a65c0..8b51369500f9 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +@@ -647,17 +647,20 @@ static void build_ksm_umr(struct mlx5e_icosq *sq, struct mlx5e_umr_wqe *umr_wqe, + umr_wqe->hdr.uctrl.mkey_mask = cpu_to_be64(MLX5_MKEY_MASK_FREE); + } + +-static struct mlx5e_frag_page *mlx5e_shampo_hd_to_frag_page(struct mlx5e_rq *rq, int header_index) ++static struct mlx5e_frag_page *mlx5e_shampo_hd_to_frag_page(struct mlx5e_rq *rq, ++ int header_index) + { +- BUILD_BUG_ON(MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE > PAGE_SHIFT); ++ struct mlx5e_shampo_hd *shampo = rq->mpwqe.shampo; + +- return &rq->mpwqe.shampo->pages[header_index >> MLX5E_SHAMPO_LOG_WQ_HEADER_PER_PAGE]; ++ return &shampo->pages[header_index >> shampo->log_hd_per_page]; + } + +-static u64 mlx5e_shampo_hd_offset(int header_index) ++static u64 mlx5e_shampo_hd_offset(struct mlx5e_rq *rq, int header_index) + { +- return (header_index & (MLX5E_SHAMPO_WQ_HEADER_PER_PAGE - 1)) << +- MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE; ++ struct mlx5e_shampo_hd *shampo = rq->mpwqe.shampo; ++ u32 hd_per_page = shampo->hd_per_page; ++ ++ return (header_index & (hd_per_page - 1)) << shampo->log_hd_entry_size; + } + + static void mlx5e_free_rx_shampo_hd_entry(struct mlx5e_rq *rq, u16 header_index); +@@ -683,7 
+686,7 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq, + u64 addr; + + frag_page = mlx5e_shampo_hd_to_frag_page(rq, index); +- header_offset = mlx5e_shampo_hd_offset(index); ++ header_offset = mlx5e_shampo_hd_offset(rq, index); + if (!header_offset) { + err = mlx5e_page_alloc_fragmented(rq->hd_page_pool, + frag_page); +@@ -713,7 +716,7 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq, + err_unmap: + while (--i >= 0) { + --index; +- header_offset = mlx5e_shampo_hd_offset(index); ++ header_offset = mlx5e_shampo_hd_offset(rq, index); + if (!header_offset) { + struct mlx5e_frag_page *frag_page = mlx5e_shampo_hd_to_frag_page(rq, index); + +@@ -737,7 +740,7 @@ static int mlx5e_alloc_rx_hd_mpwqe(struct mlx5e_rq *rq) + ksm_entries = bitmap_find_window(shampo->bitmap, + shampo->hd_per_wqe, + shampo->hd_per_wq, shampo->pi); +- ksm_entries = ALIGN_DOWN(ksm_entries, MLX5E_SHAMPO_WQ_HEADER_PER_PAGE); ++ ksm_entries = ALIGN_DOWN(ksm_entries, shampo->hd_per_page); + if (!ksm_entries) + return 0; + +@@ -854,7 +857,7 @@ mlx5e_free_rx_shampo_hd_entry(struct mlx5e_rq *rq, u16 header_index) + { + struct mlx5e_shampo_hd *shampo = rq->mpwqe.shampo; + +- if (((header_index + 1) & (MLX5E_SHAMPO_WQ_HEADER_PER_PAGE - 1)) == 0) { ++ if (((header_index + 1) & (shampo->hd_per_page - 1)) == 0) { + struct mlx5e_frag_page *frag_page = mlx5e_shampo_hd_to_frag_page(rq, header_index); + + mlx5e_page_release_fragmented(rq->hd_page_pool, frag_page); +@@ -1217,9 +1220,10 @@ static unsigned int mlx5e_lro_update_hdr(struct sk_buff *skb, + static void *mlx5e_shampo_get_packet_hd(struct mlx5e_rq *rq, u16 header_index) + { + struct mlx5e_frag_page *frag_page = mlx5e_shampo_hd_to_frag_page(rq, header_index); +- u16 head_offset = mlx5e_shampo_hd_offset(header_index) + rq->buff.headroom; ++ u16 head_offset = mlx5e_shampo_hd_offset(rq, header_index); ++ void *addr = page_address(frag_page->page); + +- return page_address(frag_page->page) + head_offset; ++ return addr + head_offset + 
rq->buff.headroom; + } + + static void mlx5e_shampo_update_ipv4_udp_hdr(struct mlx5e_rq *rq, struct iphdr *ipv4) +@@ -2236,20 +2240,22 @@ mlx5e_skb_from_cqe_shampo(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi, + struct mlx5_cqe64 *cqe, u16 header_index) + { + struct mlx5e_frag_page *frag_page = mlx5e_shampo_hd_to_frag_page(rq, header_index); +- dma_addr_t page_dma_addr = page_pool_get_dma_addr(frag_page->page); +- u16 head_offset = mlx5e_shampo_hd_offset(header_index); +- dma_addr_t dma_addr = page_dma_addr + head_offset; ++ u16 head_offset = mlx5e_shampo_hd_offset(rq, header_index); ++ struct mlx5e_shampo_hd *shampo = rq->mpwqe.shampo; + u16 head_size = cqe->shampo.header_size; + u16 rx_headroom = rq->buff.headroom; + struct sk_buff *skb = NULL; + void *hdr, *data; + u32 frag_size; + ++ dma_addr_t page_dma_addr = page_pool_get_dma_addr(frag_page->page); ++ dma_addr_t dma_addr = page_dma_addr + head_offset; + hdr = page_address(frag_page->page) + head_offset; ++ + data = hdr + rx_headroom; + frag_size = MLX5_SKB_FRAG_SZ(rx_headroom + head_size); + +- if (likely(frag_size <= BIT(MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE))) { ++ if (likely(frag_size <= BIT(shampo->log_hd_entry_size))) { + /* build SKB around header */ + dma_sync_single_range_for_cpu(rq->pdev, dma_addr, 0, frag_size, rq->buff.map_dir); + net_prefetchw(hdr); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1673-net-mlx5-qos-restrict-rtnl-area-to-avoid-a-lock-cycle.patch b/SOURCES/1673-net-mlx5-qos-restrict-rtnl-area-to-avoid-a-lock-cycle.patch new file mode 100644 index 000000000..bbb820ffb --- /dev/null +++ b/SOURCES/1673-net-mlx5-qos-restrict-rtnl-area-to-avoid-a-lock-cycle.patch @@ -0,0 +1,114 @@ +From 6713dff9bbe1695af06964814ed721f140ed1d89 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Wed, 22 Apr 2026 09:44:51 -0400 +Subject: [PATCH] net/mlx5: qos: Restrict RTNL area to avoid a lock cycle + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 
b7e3a5d9c0d66b7fb44f63aef3bd734821afa0c8 +Author: Cosmin Ratiu +Date: Mon Mar 16 11:46:01 2026 +0200 + + net/mlx5: qos: Restrict RTNL area to avoid a lock cycle + + A lock dependency cycle exists where: + 1. mlx5_ib_roce_init -> mlx5_core_uplink_netdev_event_replay -> + mlx5_blocking_notifier_call_chain (takes notifier_rwsem) -> + mlx5e_mdev_notifier_event -> mlx5_netdev_notifier_register -> + register_netdevice_notifier_dev_net (takes rtnl) + => notifier_rwsem -> rtnl + + 2. mlx5e_probe -> _mlx5e_probe -> + mlx5_core_uplink_netdev_set (takes uplink_netdev_lock) -> + mlx5_blocking_notifier_call_chain (takes notifier_rwsem) + => uplink_netdev_lock -> notifier_rwsem + + 3: devlink_nl_rate_set_doit -> devlink_nl_rate_set -> + mlx5_esw_devlink_rate_leaf_tx_max_set -> esw_qos_devlink_rate_to_mbps -> + mlx5_esw_qos_max_link_speed_get (takes rtnl) -> + mlx5_esw_qos_lag_link_speed_get_locked -> + mlx5_uplink_netdev_get (takes uplink_netdev_lock) + => rtnl -> uplink_netdev_lock + => BOOM! (lock cycle) + + Fix that by restricting the rtnl-protected section to just the necessary + part, the call to netdev_master_upper_dev_get and speed querying, so + that the last lock dependency is avoided and the cycle doesn't close. + This is safe because mlx5_uplink_netdev_get uses netdev_hold to keep the + uplink netdev alive while its master device is queried. + + Use this opportunity to rename the ambiguously-named "hold_rtnl_lock" + argument to "take_rtnl" and remove the "_locked" suffix from + mlx5_esw_qos_lag_link_speed_get_locked. 
+ + Fixes: 6b4be64fd9fe ("net/mlx5e: Harden uplink netdev access against device unbind") + Signed-off-by: Cosmin Ratiu + Reviewed-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/20260316094603.6999-2-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +index 4278bcb04c72..2e11574b3a81 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +@@ -1490,24 +1490,24 @@ static int esw_qos_node_enable_tc_arbitration(struct mlx5_esw_sched_node *node, + return err; + } + +-static u32 mlx5_esw_qos_lag_link_speed_get_locked(struct mlx5_core_dev *mdev) ++static u32 mlx5_esw_qos_lag_link_speed_get(struct mlx5_core_dev *mdev, ++ bool take_rtnl) + { + struct ethtool_link_ksettings lksettings; + struct net_device *slave, *master; + u32 speed = SPEED_UNKNOWN; + +- /* Lock ensures a stable reference to master and slave netdevice +- * while port speed of master is queried. 
+- */ +- ASSERT_RTNL(); +- + slave = mlx5_uplink_netdev_get(mdev); + if (!slave) + goto out; + ++ if (take_rtnl) ++ rtnl_lock(); + master = netdev_master_upper_dev_get(slave); + if (master && !__ethtool_get_link_ksettings(master, &lksettings)) + speed = lksettings.base.speed; ++ if (take_rtnl) ++ rtnl_unlock(); + + out: + mlx5_uplink_netdev_put(mdev, slave); +@@ -1515,20 +1515,15 @@ static u32 mlx5_esw_qos_lag_link_speed_get_locked(struct mlx5_core_dev *mdev) + } + + static int mlx5_esw_qos_max_link_speed_get(struct mlx5_core_dev *mdev, u32 *link_speed_max, +- bool hold_rtnl_lock, struct netlink_ext_ack *extack) ++ bool take_rtnl, ++ struct netlink_ext_ack *extack) + { + int err; + + if (!mlx5_lag_is_active(mdev)) + goto skip_lag; + +- if (hold_rtnl_lock) +- rtnl_lock(); +- +- *link_speed_max = mlx5_esw_qos_lag_link_speed_get_locked(mdev); +- +- if (hold_rtnl_lock) +- rtnl_unlock(); ++ *link_speed_max = mlx5_esw_qos_lag_link_speed_get(mdev, take_rtnl); + + if (*link_speed_max != (u32)SPEED_UNKNOWN) + return 0; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1674-net-mlx5-fix-peer-miss-rules-host-disabled-checks.patch b/SOURCES/1674-net-mlx5-fix-peer-miss-rules-host-disabled-checks.patch new file mode 100644 index 000000000..3e65011e2 --- /dev/null +++ b/SOURCES/1674-net-mlx5-fix-peer-miss-rules-host-disabled-checks.patch @@ -0,0 +1,79 @@ +From 5877220f3a935e925a70599a157196b5264e8c36 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Wed, 22 Apr 2026 09:44:58 -0400 +Subject: [PATCH] net/mlx5: Fix peer miss rules host disabled checks + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 76324e4041c0efb4808702b05426d7a0a7d8df5b +Author: Carolina Jubran +Date: Thu Mar 5 16:26:31 2026 +0200 + + net/mlx5: Fix peer miss rules host disabled checks + + The check on mlx5_esw_host_functions_enabled(esw->dev) for adding VF + peer miss rules is incorrect. These rules match traffic from peer's VFs, + so the local device's host function status is irrelevant. 
Remove this + check to ensure peer VF traffic is properly handled regardless of local + host configuration. + + Also fix the PF peer miss rule deletion to be symmetric with the add + path, so only attempt to delete the rule if it was actually created. + + Fixes: 520369ef43a8 ("net/mlx5: Support disabling host PFs") + Signed-off-by: Carolina Jubran + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/20260305142634.1813208-3-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +index 30bf164a067f..3af2a51ace6a 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +@@ -1241,21 +1241,17 @@ static int esw_add_fdb_peer_miss_rules(struct mlx5_eswitch *esw, + flows[peer_vport->index] = flow; + } + +- if (mlx5_esw_host_functions_enabled(esw->dev)) { +- mlx5_esw_for_each_vf_vport(peer_esw, i, peer_vport, +- mlx5_core_max_vfs(peer_dev)) { +- esw_set_peer_miss_rule_source_port(esw, peer_esw, +- spec, +- peer_vport->vport); +- +- flow = mlx5_add_flow_rules(mlx5_eswitch_get_slow_fdb(esw), +- spec, &flow_act, &dest, 1); +- if (IS_ERR(flow)) { +- err = PTR_ERR(flow); +- goto add_vf_flow_err; +- } +- flows[peer_vport->index] = flow; ++ mlx5_esw_for_each_vf_vport(peer_esw, i, peer_vport, ++ mlx5_core_max_vfs(peer_dev)) { ++ esw_set_peer_miss_rule_source_port(esw, peer_esw, spec, ++ peer_vport->vport); ++ flow = mlx5_add_flow_rules(mlx5_eswitch_get_slow_fdb(esw), ++ spec, &flow_act, &dest, 1); ++ if (IS_ERR(flow)) { ++ err = PTR_ERR(flow); ++ goto add_vf_flow_err; + } ++ flows[peer_vport->index] = flow; + } + + if (mlx5_core_ec_sriov_enabled(peer_dev)) { +@@ -1347,7 +1343,8 @@ static void esw_del_fdb_peer_miss_rules(struct mlx5_eswitch *esw, + mlx5_del_flow_rules(flows[peer_vport->index]); + } + +- if 
(mlx5_core_is_ecpf_esw_manager(peer_dev)) { ++ if (mlx5_core_is_ecpf_esw_manager(peer_dev) && ++ mlx5_esw_host_functions_enabled(peer_dev)) { + peer_vport = mlx5_eswitch_get_vport(peer_esw, MLX5_VPORT_PF); + mlx5_del_flow_rules(flows[peer_vport->index]); + } +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1675-net-mlx5e-rx-fix-xdp-multi-buf-frag-counting-for-legacy-rq.patch b/SOURCES/1675-net-mlx5e-rx-fix-xdp-multi-buf-frag-counting-for-legacy-rq.patch new file mode 100644 index 000000000..4e9d045fb --- /dev/null +++ b/SOURCES/1675-net-mlx5e-rx-fix-xdp-multi-buf-frag-counting-for-legacy-rq.patch @@ -0,0 +1,132 @@ +From 2dd1bc5b18f3cf2e65f322d22fa558e93418bed0 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Wed, 22 Apr 2026 09:52:13 -0400 +Subject: [PATCH] net/mlx5e: RX, Fix XDP multi-buf frag counting for legacy RQ + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 +Conflicts: +Context diff due to the missing of the following commit: +1827f773e416 ("net: xdp: pass full flags to xdp_update_skb_shared_info()") + +commit a6413e6f6c9d9bb9833324cb3753582f7bc0f2fa +Author: Dragos Tatulea +Date: Thu Mar 5 16:26:34 2026 +0200 + + net/mlx5e: RX, Fix XDP multi-buf frag counting for legacy RQ + + XDP multi-buf programs can modify the layout of the XDP buffer when the + program calls bpf_xdp_pull_data() or bpf_xdp_adjust_tail(). The + referenced commit in the fixes tag corrected the assumption in the mlx5 + driver that the XDP buffer layout doesn't change during a program + execution. However, this fix introduced another issue: the dropped + fragments still need to be counted on the driver side to avoid page + fragment reference counting issues. + + Such issue can be observed with the + test_xdp_native_adjst_tail_shrnk_data selftest when using a payload of + 3600 and shrinking by 256 bytes (an upcoming selftest patch): the last + fragment gets released by the XDP code but doesn't get tracked by the + driver. 
This results in a negative pp_ref_count during page release and + the following splat: + + WARNING: include/net/page_pool/helpers.h:297 at mlx5e_page_release_fragmented.isra.0+0x4a/0x50 [mlx5_core], CPU#12: ip/3137 + Modules linked in: [...] + CPU: 12 UID: 0 PID: 3137 Comm: ip Not tainted 6.19.0-rc3+ #12 NONE + Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 + RIP: 0010:mlx5e_page_release_fragmented.isra.0+0x4a/0x50 [mlx5_core] + [...] + Call Trace: + + mlx5e_dealloc_rx_wqe+0xcb/0x1a0 [mlx5_core] + mlx5e_free_rx_descs+0x7f/0x110 [mlx5_core] + mlx5e_close_rq+0x50/0x60 [mlx5_core] + mlx5e_close_queues+0x36/0x2c0 [mlx5_core] + mlx5e_close_channel+0x1c/0x50 [mlx5_core] + mlx5e_close_channels+0x45/0x80 [mlx5_core] + mlx5e_safe_switch_params+0x1a5/0x230 [mlx5_core] + mlx5e_change_mtu+0xf3/0x2f0 [mlx5_core] + netif_set_mtu_ext+0xf1/0x230 + do_setlink.isra.0+0x219/0x1180 + rtnl_newlink+0x79f/0xb60 + rtnetlink_rcv_msg+0x213/0x3a0 + netlink_rcv_skb+0x48/0xf0 + netlink_unicast+0x24a/0x350 + netlink_sendmsg+0x1ee/0x410 + __sock_sendmsg+0x38/0x60 + ____sys_sendmsg+0x232/0x280 + ___sys_sendmsg+0x78/0xb0 + __sys_sendmsg+0x5f/0xb0 + [...] + do_syscall_64+0x57/0xc50 + + This patch fixes the issue by doing page frag counting on all the + original XDP buffer fragments for all relevant XDP actions (XDP_TX , + XDP_REDIRECT and XDP_PASS). This is basically reverting to the original + counting before the commit in the fixes tag. + + As frag_page is still pointing to the original tail, the nr_frags + parameter to xdp_update_skb_frags_info() needs to be calculated + in a different way to reflect the new nr_frags. 
+ + Fixes: afd5ba577c10 ("net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for legacy RQ") + Signed-off-by: Dragos Tatulea + Signed-off-by: Tariq Toukan + Reviewed-by: Amery Hung + Link: https://patch.msgid.link/20260305142634.1813208-6-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +index 8b51369500f9..ae2bc2275c14 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +@@ -1733,6 +1733,7 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi + struct skb_shared_info *sinfo; + u32 frag_consumed_bytes; + struct bpf_prog *prog; ++ u8 nr_frags_free = 0; + struct sk_buff *skb; + dma_addr_t addr; + u32 truesize; +@@ -1775,15 +1776,13 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi + + prog = rcu_dereference(rq->xdp_prog); + if (prog) { +- u8 nr_frags_free, old_nr_frags = sinfo->nr_frags; ++ u8 old_nr_frags = sinfo->nr_frags; + + if (mlx5e_xdp_handle(rq, prog, mxbuf)) { + if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, + rq->flags)) { + struct mlx5e_wqe_frag_info *pwi; + +- wi -= old_nr_frags - sinfo->nr_frags; +- + for (pwi = head_wi; pwi < wi; pwi++) + pwi->frag_page->frags++; + } +@@ -1791,10 +1790,8 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi + } + + nr_frags_free = old_nr_frags - sinfo->nr_frags; +- if (unlikely(nr_frags_free)) { +- wi -= nr_frags_free; ++ if (unlikely(nr_frags_free)) + truesize -= nr_frags_free * frag_info->frag_stride; +- } + } + + skb = mlx5e_build_linear_skb( +@@ -1810,7 +1807,7 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi + + if (xdp_buff_has_frags(&mxbuf->xdp)) { + /* sinfo->nr_frags is reset by build_skb, calculate again. 
*/ +- xdp_update_skb_shared_info(skb, wi - head_wi - 1, ++ xdp_update_skb_shared_info(skb, wi - head_wi - nr_frags_free - 1, + sinfo->xdp_frags_size, truesize, + xdp_buff_is_frag_pfmemalloc( + &mxbuf->xdp)); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1676-net-mlx5-fix-crash-when-moving-to-switchdev-mode.patch b/SOURCES/1676-net-mlx5-fix-crash-when-moving-to-switchdev-mode.patch new file mode 100644 index 000000000..7c547cbb8 --- /dev/null +++ b/SOURCES/1676-net-mlx5-fix-crash-when-moving-to-switchdev-mode.patch @@ -0,0 +1,150 @@ +From 0dc330615eafebf537f5880633e3d2d40bd67b54 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Wed, 22 Apr 2026 09:53:11 -0400 +Subject: [PATCH] net/mlx5: Fix crash when moving to switchdev mode + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit 24b2795f9683e092dc22a68f487e7aaaf2ddafea +Author: Patrisious Haddad +Date: Thu Mar 5 16:26:30 2026 +0200 + + net/mlx5: Fix crash when moving to switchdev mode + + When moving to switchdev mode when the device doesn't support IPsec, + we try to clean up the IPsec resources anyway which causes the crash + below, fix that by correctly checking for IPsec support before trying + to clean up its resources. 
+ + [27642.515799] WARNING: arch/x86/mm/fault.c:1276 at + do_user_addr_fault+0x18a/0x680, CPU#4: devlink/6490 + [27642.517159] Modules linked in: xt_conntrack xt_MASQUERADE + ip6table_nat ip6table_filter ip6_tables iptable_nat nf_nat xt_addrtype + rpcsec_gss_krb5 auth_rpcgss oid_registry overlay mlx5_fwctl nfnetlink + zram zsmalloc mlx5_ib fuse rpcrdma rdma_ucm ib_uverbs ib_iser libiscsi + scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_core + ib_core + [27642.521358] CPU: 4 UID: 0 PID: 6490 Comm: devlink Not tainted + 6.19.0-rc5_for_upstream_min_debug_2026_01_14_16_47 #1 NONE + [27642.522923] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS + rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 + [27642.524528] RIP: 0010:do_user_addr_fault+0x18a/0x680 + [27642.525362] Code: ff 0f 84 75 03 00 00 48 89 ee 4c 89 e7 e8 5e b9 22 + 00 49 89 c0 48 85 c0 0f 84 a8 02 00 00 f7 c3 60 80 00 00 74 22 31 c9 eb + ae <0f> 0b 48 83 c4 10 48 89 ea 48 89 de 4c 89 f7 5b 5d 41 5c 41 5d + 41 + [27642.528166] RSP: 0018:ffff88810770f6b8 EFLAGS: 00010046 + [27642.529038] RAX: 0000000000000000 RBX: 0000000000000002 RCX: + ffff88810b980f00 + [27642.530158] RDX: 00000000000000a0 RSI: 0000000000000002 RDI: + ffff88810770f728 + [27642.531270] RBP: 00000000000000a0 R08: 0000000000000000 R09: + 0000000000000000 + [27642.532383] R10: 0000000000000000 R11: 0000000000000000 R12: + ffff888103f3c4c0 + [27642.533499] R13: 0000000000000000 R14: ffff88810770f728 R15: + 0000000000000000 + [27642.534614] FS: 00007f197c741740(0000) GS:ffff88856a94c000(0000) + knlGS:0000000000000000 + [27642.535915] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 + [27642.536858] CR2: 00000000000000a0 CR3: 000000011334c003 CR4: + 0000000000172eb0 + [27642.537982] Call Trace: + [27642.538466] + [27642.538907] exc_page_fault+0x76/0x140 + [27642.539583] asm_exc_page_fault+0x22/0x30 + [27642.540282] RIP: 0010:_raw_spin_lock_irqsave+0x10/0x30 + [27642.541134] Code: 07 85 c0 75 11 ba ff 00 00 00 f0 0f 
b1 17 75 06 b8 + 01 00 00 00 c3 31 c0 c3 90 0f 1f 44 00 00 53 9c 5b fa 31 c0 ba 01 00 00 + 00 0f b1 17 75 05 48 89 d8 5b c3 89 c6 e8 7e 02 00 00 48 89 d8 + 5b + [27642.543936] RSP: 0018:ffff88810770f7d8 EFLAGS: 00010046 + [27642.544803] RAX: 0000000000000000 RBX: 0000000000000202 RCX: + ffff888113ad96d8 + [27642.545916] RDX: 0000000000000001 RSI: ffff88810770f818 RDI: + 00000000000000a0 + [27642.547027] RBP: 0000000000000098 R08: 0000000000000400 R09: + ffff88810b980f00 + [27642.548140] R10: 0000000000000001 R11: ffff888101845a80 R12: + 00000000000000a8 + [27642.549263] R13: ffffffffa02a9060 R14: 00000000000000a0 R15: + ffff8881130d8a40 + [27642.550379] complete_all+0x20/0x90 + [27642.551010] mlx5e_ipsec_disable_events+0xb6/0xf0 [mlx5_core] + [27642.552022] mlx5e_nic_disable+0x12d/0x220 [mlx5_core] + [27642.552929] mlx5e_detach_netdev+0x66/0xf0 [mlx5_core] + [27642.553822] mlx5e_netdev_change_profile+0x5b/0x120 [mlx5_core] + [27642.554821] mlx5e_vport_rep_load+0x419/0x590 [mlx5_core] + [27642.555757] ? xa_load+0x53/0x90 + [27642.556361] __esw_offloads_load_rep+0x54/0x70 [mlx5_core] + [27642.557328] mlx5_esw_offloads_rep_load+0x45/0xd0 [mlx5_core] + [27642.558320] esw_offloads_enable+0xb4b/0xc90 [mlx5_core] + [27642.559247] mlx5_eswitch_enable_locked+0x34e/0x4f0 [mlx5_core] + [27642.560257] ? mlx5_rescan_drivers_locked+0x222/0x2d0 [mlx5_core] + [27642.561284] mlx5_devlink_eswitch_mode_set+0x5ac/0x9c0 [mlx5_core] + [27642.562334] ? devlink_rate_set_ops_supported+0x21/0x3a0 + [27642.563220] devlink_nl_eswitch_set_doit+0x67/0xe0 + [27642.564026] genl_family_rcv_msg_doit+0xe0/0x130 + [27642.564816] genl_rcv_msg+0x183/0x290 + [27642.565466] ? __devlink_nl_pre_doit.isra.0+0x160/0x160 + [27642.566329] ? devlink_nl_eswitch_get_doit+0x290/0x290 + [27642.567181] ? devlink_nl_pre_doit_parent_dev_optional+0x20/0x20 + [27642.568147] ? 
genl_family_rcv_msg_dumpit+0xf0/0xf0 + [27642.568966] netlink_rcv_skb+0x4b/0xf0 + [27642.569629] genl_rcv+0x24/0x40 + [27642.570215] netlink_unicast+0x255/0x380 + [27642.570901] ? __alloc_skb+0xfa/0x1e0 + [27642.571560] netlink_sendmsg+0x1f3/0x420 + [27642.572249] __sock_sendmsg+0x38/0x60 + [27642.572911] __sys_sendto+0x119/0x180 + [27642.573561] ? __sys_recvmsg+0x5c/0xb0 + [27642.574227] __x64_sys_sendto+0x20/0x30 + [27642.574904] do_syscall_64+0x55/0xc10 + [27642.575554] entry_SYSCALL_64_after_hwframe+0x4b/0x53 + [27642.576391] RIP: 0033:0x7f197c85e807 + [27642.577050] Code: c7 c0 ff ff ff ff eb be 66 2e 0f 1f 84 00 00 00 00 + 00 90 f3 0f 1e fa 80 3d 45 08 0d 00 00 41 89 ca 74 10 b8 2c 00 00 00 0f + 05 <48> 3d 00 f0 ff ff 77 69 c3 55 48 89 e5 53 48 83 ec 38 44 89 4d + d0 + [27642.579846] RSP: 002b:00007ffebd4e2248 EFLAGS: 00000202 ORIG_RAX: + 000000000000002c + [27642.581082] RAX: ffffffffffffffda RBX: 000055cfcd9cd2a0 RCX: + 00007f197c85e807 + [27642.582200] RDX: 0000000000000038 RSI: 000055cfcd9cd490 RDI: + 0000000000000003 + [27642.583320] RBP: 00007ffebd4e2290 R08: 00007f197c942200 R09: + 000000000000000c + [27642.584437] R10: 0000000000000000 R11: 0000000000000202 R12: + 0000000000000000 + [27642.585555] R13: 000055cfcd9cd490 R14: 00007ffebd4e45d1 R15: + 000055cfcd9cd2a0 + [27642.586671] + [27642.587121] ---[ end trace 0000000000000000 ]--- + [27642.587910] BUG: kernel NULL pointer dereference, address: + 00000000000000a0 + + Fixes: 664f76be38a1 ("net/mlx5: Fix IPsec cleanup over MPV device") + Signed-off-by: Patrisious Haddad + Reviewed-by: Leon Romanovsky + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/20260305142634.1813208-2-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c +index feef86fff4bf..91cfabc45032 100644 +--- 
a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c +@@ -2912,7 +2912,7 @@ void mlx5e_ipsec_disable_events(struct mlx5e_priv *priv) + goto out; + + peer_priv = mlx5_devcom_get_next_peer_data(priv->devcom, &tmp); +- if (peer_priv) ++ if (peer_priv && peer_priv->ipsec) + complete_all(&peer_priv->ipsec->comp); + + mlx5_devcom_for_each_peer_end(priv->devcom); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1677-net-mlx5-fix-hca-caps-leak-on-notifier-init-failure.patch b/SOURCES/1677-net-mlx5-fix-hca-caps-leak-on-notifier-init-failure.patch new file mode 100644 index 000000000..2aa74e3cd --- /dev/null +++ b/SOURCES/1677-net-mlx5-fix-hca-caps-leak-on-notifier-init-failure.patch @@ -0,0 +1,55 @@ +From cfefb5ef35ad6dd869244d59c0cfaf7caa4f7a93 Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Wed, 22 Apr 2026 09:53:23 -0400 +Subject: [PATCH] net/mlx5: Fix HCA caps leak on notifier init failure + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 + +commit d03fc81a57956248383efec99967d0ae627390a8 +Author: Prathamesh Deshpande +Date: Wed Apr 15 01:49:37 2026 +0100 + + net/mlx5: Fix HCA caps leak on notifier init failure + + mlx5_mdev_init() allocates HCA caps via mlx5_hca_caps_alloc() before + calling mlx5_notifiers_init(). If notifier initialization fails, the + error path jumps to err_hca_caps and skips mlx5_hca_caps_free(), leaking + allocated caps. + + Add a dedicated unwind label for notifier-init failure that frees HCA + caps before continuing the existing cleanup sequence. 
+ + Fixes: b6b03097f982 ("net/mlx5: Initialize events outside devlink lock") + Signed-off-by: Prathamesh Deshpande + Reviewed-by: Cosmin Ratiu + Reviewed-by: Tariq Toukan + Link: https://patch.msgid.link/20260415005022.34764-1-prathameshdeshpande7@gmail.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c +index 5f6a8eef1982..622bc2c5c6f9 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c +@@ -1907,7 +1907,7 @@ int mlx5_mdev_init(struct mlx5_core_dev *dev, int profile_idx) + + err = mlx5_notifiers_init(dev); + if (err) +- goto err_hca_caps; ++ goto err_notifiers_init; + + /* The conjunction of sw_vhca_id with sw_owner_id will be a global + * unique id per function which uses mlx5_core. +@@ -1923,6 +1923,8 @@ int mlx5_mdev_init(struct mlx5_core_dev *dev, int profile_idx) + + return 0; + ++err_notifiers_init: ++ mlx5_hca_caps_free(dev); + err_hca_caps: + mlx5_adev_cleanup(dev); + err_adev_init: +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1678-net-mlx5e-rx-fix-xdp-multi-buf-frag-counting-for-striding-rq.patch b/SOURCES/1678-net-mlx5e-rx-fix-xdp-multi-buf-frag-counting-for-striding-rq.patch new file mode 100644 index 000000000..cc3f4987f --- /dev/null +++ b/SOURCES/1678-net-mlx5e-rx-fix-xdp-multi-buf-frag-counting-for-striding-rq.patch @@ -0,0 +1,148 @@ +From c493e44776b6a85009b43c2928b2b16658967f9d Mon Sep 17 00:00:00 2001 +From: Kamal Heib +Date: Wed, 22 Apr 2026 10:14:24 -0400 +Subject: [PATCH] net/mlx5e: RX, Fix XDP multi-buf frag counting for striding + RQ + +JIRA: https://redhat.atlassian.net/browse/RHEL-169055 +Conflicts: +Context diff due to the missing of the following commit: +1827f773e416 ("net: xdp: pass full flags to xdp_update_skb_shared_info()") + +commit db25c42c2e1f9c0d136420fff5e5700f7e771a6f +Author: Dragos Tatulea +Date: Thu Mar 5 16:26:33 2026 +0200 + + 
net/mlx5e: RX, Fix XDP multi-buf frag counting for striding RQ + + XDP multi-buf programs can modify the layout of the XDP buffer when the + program calls bpf_xdp_pull_data() or bpf_xdp_adjust_tail(). The + referenced commit in the fixes tag corrected the assumption in the mlx5 + driver that the XDP buffer layout doesn't change during a program + execution. However, this fix introduced another issue: the dropped + fragments still need to be counted on the driver side to avoid page + fragment reference counting issues. + + The issue was discovered by the drivers/net/xdp.py selftest, + more specifically the test_xdp_native_tx_mb: + - The mlx5 driver allocates a page_pool page and initializes it with + a frag counter of 64 (pp_ref_count=64) and the internal frag counter + to 0. + - The test sends one packet with no payload. + - On RX (mlx5e_skb_from_cqe_mpwrq_nonlinear()), mlx5 configures the XDP + buffer with the packet data starting in the first fragment which is the + page mentioned above. + - The XDP program runs and calls bpf_xdp_pull_data() which moves the + header into the linear part of the XDP buffer. As the packet doesn't + contain more data, the program drops the tail fragment since it no + longer contains any payload (pp_ref_count=63). + - mlx5 device skips counting this fragment. Internal frag counter + remains 0. + - mlx5 releases all 64 fragments of the page but page pp_ref_count is + 63 => negative reference counting error. + + Resulting splat during the test: + + WARNING: CPU: 0 PID: 188225 at ./include/net/page_pool/helpers.h:297 mlx5e_page_release_fragmented.isra.0+0xbd/0xe0 [mlx5_core] + Modules linked in: [...] + CPU: 0 UID: 0 PID: 188225 Comm: ip Not tainted 6.18.0-rc7_for_upstream_min_debug_2025_12_08_11_44 #1 NONE + Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 + RIP: 0010:mlx5e_page_release_fragmented.isra.0+0xbd/0xe0 [mlx5_core] + [...] 
+ Call Trace: + + mlx5e_free_rx_mpwqe+0x20a/0x250 [mlx5_core] + mlx5e_dealloc_rx_mpwqe+0x37/0xb0 [mlx5_core] + mlx5e_free_rx_descs+0x11a/0x170 [mlx5_core] + mlx5e_close_rq+0x78/0xa0 [mlx5_core] + mlx5e_close_queues+0x46/0x2a0 [mlx5_core] + mlx5e_close_channel+0x24/0x90 [mlx5_core] + mlx5e_close_channels+0x5d/0xf0 [mlx5_core] + mlx5e_safe_switch_params+0x2ec/0x380 [mlx5_core] + mlx5e_change_mtu+0x11d/0x490 [mlx5_core] + mlx5e_change_nic_mtu+0x19/0x30 [mlx5_core] + netif_set_mtu_ext+0xfc/0x240 + do_setlink.isra.0+0x226/0x1100 + rtnl_newlink+0x7a9/0xba0 + rtnetlink_rcv_msg+0x220/0x3c0 + netlink_rcv_skb+0x4b/0xf0 + netlink_unicast+0x255/0x380 + netlink_sendmsg+0x1f3/0x420 + __sock_sendmsg+0x38/0x60 + ____sys_sendmsg+0x1e8/0x240 + ___sys_sendmsg+0x7c/0xb0 + [...] + __sys_sendmsg+0x5f/0xb0 + do_syscall_64+0x55/0xc70 + + The problem applies for XDP_PASS as well which is handled in a different + code path in the driver. + + This patch fixes the issue by doing page frag counting on all the + original XDP buffer fragments for all relevant XDP actions (XDP_TX , + XDP_REDIRECT and XDP_PASS). This is basically reverting to the original + counting before the commit in the fixes tag. + + As frag_page is still pointing to the original tail, the nr_frags + parameter to xdp_update_skb_frags_info() needs to be calculated + in a different way to reflect the new nr_frags. 
+ + Fixes: 87bcef158ac1 ("net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for striding RQ") + Signed-off-by: Dragos Tatulea + Cc: Amery Hung + Reviewed-by: Nimrod Oren + Signed-off-by: Tariq Toukan + Link: https://patch.msgid.link/20260305142634.1813208-5-tariqt@nvidia.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Kamal Heib + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +index ae2bc2275c14..3c34d8f7367d 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +@@ -2086,14 +2086,13 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w + + if (prog) { + u8 nr_frags_free, old_nr_frags = sinfo->nr_frags; ++ u8 new_nr_frags; + u32 len; + + if (mlx5e_xdp_handle(rq, prog, mxbuf)) { + if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) { + struct mlx5e_frag_page *pfp; + +- frag_page -= old_nr_frags - sinfo->nr_frags; +- + for (pfp = head_page; pfp < frag_page; pfp++) + pfp->frags++; + +@@ -2104,13 +2103,12 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w + return NULL; /* page/packet was consumed by XDP */ + } + +- nr_frags_free = old_nr_frags - sinfo->nr_frags; +- if (unlikely(nr_frags_free)) { +- frag_page -= nr_frags_free; ++ new_nr_frags = sinfo->nr_frags; ++ nr_frags_free = old_nr_frags - new_nr_frags; ++ if (unlikely(nr_frags_free)) + truesize -= (nr_frags_free - 1) * PAGE_SIZE + + ALIGN(pg_consumed_bytes, + BIT(rq->mpwqe.log_stride_sz)); +- } + + len = mxbuf->xdp.data_end - mxbuf->xdp.data; + +@@ -2132,7 +2130,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w + struct mlx5e_frag_page *pagep; + + /* sinfo->nr_frags is reset by build_skb, calculate again. 
*/ +- xdp_update_skb_shared_info(skb, frag_page - head_page, ++ xdp_update_skb_shared_info(skb, new_nr_frags, + sinfo->xdp_frags_size, truesize, + xdp_buff_is_frag_pfmemalloc( + &mxbuf->xdp)); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1679-iavf-fix-vlan-filter-lost-on-add-delete-race.patch b/SOURCES/1679-iavf-fix-vlan-filter-lost-on-add-delete-race.patch new file mode 100644 index 000000000..f3f0de334 --- /dev/null +++ b/SOURCES/1679-iavf-fix-vlan-filter-lost-on-add-delete-race.patch @@ -0,0 +1,72 @@ +From 78e72af8acdf612a08878849b70dfd42da25e53a Mon Sep 17 00:00:00 2001 +From: CKI Backport Bot +Date: Mon, 4 May 2026 08:45:34 +0000 +Subject: [PATCH] iavf: fix VLAN filter lost on add/delete race + +JIRA: https://redhat.atlassian.net/browse/RHEL-144630 + +commit fc9c69be594756b81b54c6bc40803fa6052f35ae +Author: Petr Oros +Date: Wed Feb 25 11:01:37 2026 +0100 + + iavf: fix VLAN filter lost on add/delete race + + When iavf_add_vlan() finds an existing filter in IAVF_VLAN_REMOVE + state, it transitions the filter to IAVF_VLAN_ACTIVE assuming the + pending delete can simply be cancelled. However, there is no guarantee + that iavf_del_vlans() has not already processed the delete AQ request + and removed the filter from the PF. In that case the filter remains in + the driver's list as IAVF_VLAN_ACTIVE but is no longer programmed on + the NIC. Since iavf_add_vlans() only picks up filters in + IAVF_VLAN_ADD state, the filter is never re-added, and spoof checking + drops all traffic for that VLAN. + + CPU0 CPU1 Workqueue + ---- ---- --------- + iavf_del_vlan(vlan 100) + f->state = REMOVE + schedule AQ_DEL_VLAN + iavf_add_vlan(vlan 100) + f->state = ACTIVE + iavf_del_vlans() + f is ACTIVE, skip + iavf_add_vlans() + f is ACTIVE, skip + + Filter is ACTIVE in driver but absent from NIC. + + Transition to IAVF_VLAN_ADD instead and schedule + IAVF_FLAG_AQ_ADD_VLAN_FILTER so iavf_add_vlans() re-programs the + filter. A duplicate add is idempotent on the PF. 
+ + Fixes: 0c0da0e95105 ("iavf: refactor VLAN filter states") + Signed-off-by: Petr Oros + Tested-by: Rafal Romanowski + Signed-off-by: Tony Nguyen + +Signed-off-by: CKI Backport Bot + +diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c +index 7a5efc9ea63f..b5f3774a80a6 100644 +--- a/drivers/net/ethernet/intel/iavf/iavf_main.c ++++ b/drivers/net/ethernet/intel/iavf/iavf_main.c +@@ -781,10 +781,13 @@ iavf_vlan_filter *iavf_add_vlan(struct iavf_adapter *adapter, + adapter->num_vlan_filters++; + iavf_schedule_aq_request(adapter, IAVF_FLAG_AQ_ADD_VLAN_FILTER); + } else if (f->state == IAVF_VLAN_REMOVE) { +- /* IAVF_VLAN_REMOVE means that VLAN wasn't yet removed. +- * We can safely only change the state here. ++ /* Re-add the filter since we cannot tell whether the ++ * pending delete has already been processed by the PF. ++ * A duplicate add is harmless. + */ +- f->state = IAVF_VLAN_ACTIVE; ++ f->state = IAVF_VLAN_ADD; ++ iavf_schedule_aq_request(adapter, ++ IAVF_FLAG_AQ_ADD_VLAN_FILTER); + } + + clearout: +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1680-iavf-rename-iavf-vlan-is-new-to-iavf-vlan-adding.patch b/SOURCES/1680-iavf-rename-iavf-vlan-is-new-to-iavf-vlan-adding.patch new file mode 100644 index 000000000..4d3297ae3 --- /dev/null +++ b/SOURCES/1680-iavf-rename-iavf-vlan-is-new-to-iavf-vlan-adding.patch @@ -0,0 +1,87 @@ +From 0f54b3aee80737b4a44a4fb5c15ec3caafcc62e6 Mon Sep 17 00:00:00 2001 +From: CKI Backport Bot +Date: Mon, 4 May 2026 08:45:35 +0000 +Subject: [PATCH] iavf: rename IAVF_VLAN_IS_NEW to IAVF_VLAN_ADDING + +JIRA: https://redhat.atlassian.net/browse/RHEL-144630 + +commit 70d62b669f1f9080a25278fc90b64309f4ae8959 +Author: Petr Oros +Date: Mon Apr 27 22:22:13 2026 -0700 + + iavf: rename IAVF_VLAN_IS_NEW to IAVF_VLAN_ADDING + + Rename the IAVF_VLAN_IS_NEW state to IAVF_VLAN_ADDING to better + describe what the state represents: an ADD request has been sent to + the PF and is waiting for a 
response. + + This is a pure rename with no behavioral change, preparing for a + cleanup of the VLAN filter state machine. + + Signed-off-by: Petr Oros + Reviewed-by: Aleksandr Loktionov + Tested-by: Rafal Romanowski + Reviewed-by: Simon Horman + Reviewed-by: Przemek Kitszel + Signed-off-by: Jacob Keller + Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-1-cdcb48303fd8@intel.com + Signed-off-by: Paolo Abeni + +Signed-off-by: CKI Backport Bot + +diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h +index 0d7cd7d4335d..94e606e3fcd6 100644 +--- a/drivers/net/ethernet/intel/iavf/iavf.h ++++ b/drivers/net/ethernet/intel/iavf/iavf.h +@@ -158,7 +158,7 @@ struct iavf_vlan { + enum iavf_vlan_state_t { + IAVF_VLAN_INVALID, + IAVF_VLAN_ADD, /* filter needs to be added */ +- IAVF_VLAN_IS_NEW, /* filter is new, wait for PF answer */ ++ IAVF_VLAN_ADDING, /* ADD sent to PF, waiting for response */ + IAVF_VLAN_ACTIVE, /* filter is accepted by PF */ + IAVF_VLAN_DISABLE, /* filter needs to be deleted by PF, then marked INACTIVE */ + IAVF_VLAN_INACTIVE, /* filter is inactive, we are in IFF_DOWN */ +diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c +index 88156082a41d..5114934fe81f 100644 +--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c ++++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c +@@ -746,7 +746,7 @@ static void iavf_vlan_add_reject(struct iavf_adapter *adapter) + + spin_lock_bh(&adapter->mac_vlan_list_lock); + list_for_each_entry_safe(f, ftmp, &adapter->vlan_filter_list, list) { +- if (f->state == IAVF_VLAN_IS_NEW) { ++ if (f->state == IAVF_VLAN_ADDING) { + list_del(&f->list); + kfree(f); + adapter->num_vlan_filters--; +@@ -812,7 +812,7 @@ void iavf_add_vlans(struct iavf_adapter *adapter) + if (f->state == IAVF_VLAN_ADD) { + vvfl->vlan_id[i] = f->vlan.vid; + i++; +- f->state = IAVF_VLAN_IS_NEW; ++ f->state = IAVF_VLAN_ADDING; + if (i == count) + 
break; + } +@@ -874,7 +874,7 @@ void iavf_add_vlans(struct iavf_adapter *adapter) + vlan->tpid = f->vlan.tpid; + + i++; +- f->state = IAVF_VLAN_IS_NEW; ++ f->state = IAVF_VLAN_ADDING; + } + } + +@@ -2911,7 +2911,7 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter, + + spin_lock_bh(&adapter->mac_vlan_list_lock); + list_for_each_entry(f, &adapter->vlan_filter_list, list) { +- if (f->state == IAVF_VLAN_IS_NEW) ++ if (f->state == IAVF_VLAN_ADDING) + f->state = IAVF_VLAN_ACTIVE; + } + spin_unlock_bh(&adapter->mac_vlan_list_lock); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1681-iavf-stop-removing-vlan-filters-from-pf-on-interface-down.patch b/SOURCES/1681-iavf-stop-removing-vlan-filters-from-pf-on-interface-down.patch new file mode 100644 index 000000000..05ee26fb6 --- /dev/null +++ b/SOURCES/1681-iavf-stop-removing-vlan-filters-from-pf-on-interface-down.patch @@ -0,0 +1,233 @@ +From be4a5925a33238e2197330d94f77af7b9a4094a8 Mon Sep 17 00:00:00 2001 +From: CKI Backport Bot +Date: Mon, 4 May 2026 08:45:36 +0000 +Subject: [PATCH] iavf: stop removing VLAN filters from PF on interface down + +JIRA: https://redhat.atlassian.net/browse/RHEL-144630 + +commit f2ce65b9b917474a1a6ce68d357e15fac2aca0f2 +Author: Petr Oros +Date: Mon Apr 27 22:22:14 2026 -0700 + + iavf: stop removing VLAN filters from PF on interface down + + When a VF goes down, the driver currently sends DEL_VLAN to the PF for + every VLAN filter (ACTIVE -> DISABLE -> send DEL -> INACTIVE), then + re-adds them all on UP (INACTIVE -> ADD -> send ADD -> ADDING -> + ACTIVE). This round-trip is unnecessary because: + + 1. The PF disables the VF's queues via VIRTCHNL_OP_DISABLE_QUEUES, + which already prevents all RX/TX traffic regardless of VLAN filter + state. + + 2. The VLAN filters remaining in PF HW while the VF is down is + harmless - packets matching those filters have nowhere to go with + queues disabled. + + 3. 
The DEL+ADD cycle during down/up creates race windows where the + VLAN filter list is incomplete. With spoofcheck enabled, the PF + enables TX VLAN filtering on the first non-zero VLAN add, blocking + traffic for any VLANs not yet re-added. + + Remove the entire DISABLE/INACTIVE state machinery: + - Remove IAVF_VLAN_DISABLE and IAVF_VLAN_INACTIVE enum values + - Remove iavf_restore_filters() and its call from iavf_open() + - Remove VLAN filter handling from iavf_clear_mac_vlan_filters(), + rename it to iavf_clear_mac_filters() + - Remove DEL_VLAN_FILTER scheduling from iavf_down() + - Remove all DISABLE/INACTIVE handling from iavf_del_vlans() + + VLAN filters now stay ACTIVE across down/up cycles. Only explicit + user removal (ndo_vlan_rx_kill_vid) or PF/VF reset triggers VLAN + filter deletion/re-addition. + + Fixes: ed1f5b58ea01 ("i40evf: remove VLAN filters on close") + Signed-off-by: Petr Oros + Reviewed-by: Aleksandr Loktionov + Tested-by: Rafal Romanowski + Reviewed-by: Simon Horman + Reviewed-by: Przemek Kitszel + Signed-off-by: Jacob Keller + Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-2-cdcb48303fd8@intel.com + Signed-off-by: Paolo Abeni + +Signed-off-by: CKI Backport Bot + +diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h +index 94e606e3fcd6..14c4084f3739 100644 +--- a/drivers/net/ethernet/intel/iavf/iavf.h ++++ b/drivers/net/ethernet/intel/iavf/iavf.h +@@ -159,10 +159,8 @@ enum iavf_vlan_state_t { + IAVF_VLAN_INVALID, + IAVF_VLAN_ADD, /* filter needs to be added */ + IAVF_VLAN_ADDING, /* ADD sent to PF, waiting for response */ +- IAVF_VLAN_ACTIVE, /* filter is accepted by PF */ +- IAVF_VLAN_DISABLE, /* filter needs to be deleted by PF, then marked INACTIVE */ +- IAVF_VLAN_INACTIVE, /* filter is inactive, we are in IFF_DOWN */ +- IAVF_VLAN_REMOVE, /* filter needs to be removed from list */ ++ IAVF_VLAN_ACTIVE, /* PF confirmed, filter is in HW */ ++ IAVF_VLAN_REMOVE, /* filter queued for 
DEL from PF */ + }; + + struct iavf_vlan_filter { +diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c +index b5f3774a80a6..796473fe57f4 100644 +--- a/drivers/net/ethernet/intel/iavf/iavf_main.c ++++ b/drivers/net/ethernet/intel/iavf/iavf_main.c +@@ -825,27 +825,6 @@ static void iavf_del_vlan(struct iavf_adapter *adapter, struct iavf_vlan vlan) + spin_unlock_bh(&adapter->mac_vlan_list_lock); + } + +-/** +- * iavf_restore_filters +- * @adapter: board private structure +- * +- * Restore existing non MAC filters when VF netdev comes back up +- **/ +-static void iavf_restore_filters(struct iavf_adapter *adapter) +-{ +- struct iavf_vlan_filter *f; +- +- /* re-add all VLAN filters */ +- spin_lock_bh(&adapter->mac_vlan_list_lock); +- +- list_for_each_entry(f, &adapter->vlan_filter_list, list) { +- if (f->state == IAVF_VLAN_INACTIVE) +- f->state = IAVF_VLAN_ADD; +- } +- +- spin_unlock_bh(&adapter->mac_vlan_list_lock); +- adapter->aq_required |= IAVF_FLAG_AQ_ADD_VLAN_FILTER; +-} + + /** + * iavf_get_num_vlans_added - get number of VLANs added +@@ -1264,13 +1243,12 @@ static void iavf_up_complete(struct iavf_adapter *adapter) + } + + /** +- * iavf_clear_mac_vlan_filters - Remove mac and vlan filters not sent to PF +- * yet and mark other to be removed. ++ * iavf_clear_mac_filters - Remove MAC filters not sent to PF yet and mark ++ * others to be removed. 
+ * @adapter: board private structure + **/ +-static void iavf_clear_mac_vlan_filters(struct iavf_adapter *adapter) ++static void iavf_clear_mac_filters(struct iavf_adapter *adapter) + { +- struct iavf_vlan_filter *vlf, *vlftmp; + struct iavf_mac_filter *f, *ftmp; + + spin_lock_bh(&adapter->mac_vlan_list_lock); +@@ -1289,11 +1267,6 @@ static void iavf_clear_mac_vlan_filters(struct iavf_adapter *adapter) + } + } + +- /* disable all VLAN filters */ +- list_for_each_entry_safe(vlf, vlftmp, &adapter->vlan_filter_list, +- list) +- vlf->state = IAVF_VLAN_DISABLE; +- + spin_unlock_bh(&adapter->mac_vlan_list_lock); + } + +@@ -1389,7 +1362,7 @@ void iavf_down(struct iavf_adapter *adapter) + iavf_napi_disable_all(adapter); + iavf_irq_disable(adapter); + +- iavf_clear_mac_vlan_filters(adapter); ++ iavf_clear_mac_filters(adapter); + iavf_clear_cloud_filters(adapter); + iavf_clear_fdir_filters(adapter); + iavf_clear_adv_rss_conf(adapter); +@@ -1406,8 +1379,6 @@ void iavf_down(struct iavf_adapter *adapter) + */ + if (!list_empty(&adapter->mac_filter_list)) + adapter->aq_required |= IAVF_FLAG_AQ_DEL_MAC_FILTER; +- if (!list_empty(&adapter->vlan_filter_list)) +- adapter->aq_required |= IAVF_FLAG_AQ_DEL_VLAN_FILTER; + if (!list_empty(&adapter->cloud_filter_list)) + adapter->aq_required |= IAVF_FLAG_AQ_DEL_CLOUD_FILTER; + if (!list_empty(&adapter->fdir_list_head)) +@@ -4562,8 +4533,6 @@ static int iavf_open(struct net_device *netdev) + + spin_unlock_bh(&adapter->mac_vlan_list_lock); + +- /* Restore filters that were removed with IFF_DOWN */ +- iavf_restore_filters(adapter); + iavf_restore_fdir_filters(adapter); + + iavf_configure(adapter); +diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c +index 5114934fe81f..d62c0d639414 100644 +--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c ++++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c +@@ -911,22 +911,12 @@ void iavf_del_vlans(struct iavf_adapter *adapter) + 
spin_lock_bh(&adapter->mac_vlan_list_lock); + + list_for_each_entry_safe(f, ftmp, &adapter->vlan_filter_list, list) { +- /* since VLAN capabilities are not allowed, we dont want to send +- * a VLAN delete request because it will most likely fail and +- * create unnecessary errors/noise, so just free the VLAN +- * filters marked for removal to enable bailing out before +- * sending a virtchnl message +- */ + if (f->state == IAVF_VLAN_REMOVE && + !VLAN_FILTERING_ALLOWED(adapter)) { + list_del(&f->list); + kfree(f); + adapter->num_vlan_filters--; +- } else if (f->state == IAVF_VLAN_DISABLE && +- !VLAN_FILTERING_ALLOWED(adapter)) { +- f->state = IAVF_VLAN_INACTIVE; +- } else if (f->state == IAVF_VLAN_REMOVE || +- f->state == IAVF_VLAN_DISABLE) { ++ } else if (f->state == IAVF_VLAN_REMOVE) { + count++; + } + } +@@ -959,13 +949,7 @@ void iavf_del_vlans(struct iavf_adapter *adapter) + vvfl->vsi_id = adapter->vsi_res->vsi_id; + vvfl->num_elements = count; + list_for_each_entry_safe(f, ftmp, &adapter->vlan_filter_list, list) { +- if (f->state == IAVF_VLAN_DISABLE) { +- vvfl->vlan_id[i] = f->vlan.vid; +- f->state = IAVF_VLAN_INACTIVE; +- i++; +- if (i == count) +- break; +- } else if (f->state == IAVF_VLAN_REMOVE) { ++ if (f->state == IAVF_VLAN_REMOVE) { + vvfl->vlan_id[i] = f->vlan.vid; + list_del(&f->list); + kfree(f); +@@ -1007,8 +991,7 @@ void iavf_del_vlans(struct iavf_adapter *adapter) + vvfl_v2->vport_id = adapter->vsi_res->vsi_id; + vvfl_v2->num_elements = count; + list_for_each_entry_safe(f, ftmp, &adapter->vlan_filter_list, list) { +- if (f->state == IAVF_VLAN_DISABLE || +- f->state == IAVF_VLAN_REMOVE) { ++ if (f->state == IAVF_VLAN_REMOVE) { + struct virtchnl_vlan_supported_caps *filtering_support = + &adapter->vlan_v2_caps.filtering.filtering_support; + struct virtchnl_vlan *vlan; +@@ -1022,13 +1005,9 @@ void iavf_del_vlans(struct iavf_adapter *adapter) + vlan->tci = f->vlan.vid; + vlan->tpid = f->vlan.tpid; + +- if (f->state == IAVF_VLAN_DISABLE) { +- f->state 
= IAVF_VLAN_INACTIVE; +- } else { +- list_del(&f->list); +- kfree(f); +- adapter->num_vlan_filters--; +- } ++ list_del(&f->list); ++ kfree(f); ++ adapter->num_vlan_filters--; + i++; + if (i == count) + break; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1682-iavf-wait-for-pf-confirmation-before-removing-vlan-filters.patch b/SOURCES/1682-iavf-wait-for-pf-confirmation-before-removing-vlan-filters.patch new file mode 100644 index 000000000..d2cc90392 --- /dev/null +++ b/SOURCES/1682-iavf-wait-for-pf-confirmation-before-removing-vlan-filters.patch @@ -0,0 +1,189 @@ +From ef8fb4129e9999e7462c8b53a531ec72164b5ff4 Mon Sep 17 00:00:00 2001 +From: CKI Backport Bot +Date: Mon, 4 May 2026 08:45:37 +0000 +Subject: [PATCH] iavf: wait for PF confirmation before removing VLAN filters + +JIRA: https://redhat.atlassian.net/browse/RHEL-144630 + +commit bbcbe4ed70dea948849549af7edf44bd42bbd695 +Author: Petr Oros +Date: Mon Apr 27 22:22:15 2026 -0700 + + iavf: wait for PF confirmation before removing VLAN filters + + The VLAN filter DELETE path was asymmetric with the ADD path: ADD + waits for PF confirmation (ADD -> ADDING -> ACTIVE), but DELETE + immediately frees the filter struct after sending the DEL message + without waiting for the PF response. + + This is problematic because: + - If the PF rejects the DEL, the filter remains in HW but the driver + has already freed the tracking structure, losing sync. + - Race conditions between DEL pending and other operations + (add, reset) cannot be properly resolved if the filter struct + is already gone. + + Add IAVF_VLAN_REMOVING state to make the DELETE path symmetric: + + REMOVE -> REMOVING (send DEL) -> PF confirms -> kfree + -> PF rejects -> ACTIVE + + In iavf_del_vlans(), transition filters from REMOVE to REMOVING + instead of immediately freeing them. The new DEL completion handler + in iavf_virtchnl_completion() frees filters on success or reverts + them to ACTIVE on error. 
+ + Update iavf_add_vlan() to handle the REMOVING state: if a DEL is + pending and the user re-adds the same VLAN, queue it for ADD so + it gets re-programmed after the PF processes the DEL. + + The !VLAN_FILTERING_ALLOWED early-exit path still frees filters + directly since no PF message is sent in that case. + + Also update iavf_del_vlan() to skip filters already in REMOVING + state: DEL has been sent to PF and the completion handler will + free the filter when PF confirms. Without this guard, the sequence + DEL(pending) -> user-del -> second DEL could cause the PF to return + an error for the second DEL (filter already gone), causing the + completion handler to incorrectly revert a deleted filter back to + ACTIVE. + + Fixes: 968996c070ef ("iavf: Fix VLAN_V2 addition/rejection") + Signed-off-by: Petr Oros + Reviewed-by: Aleksandr Loktionov + Tested-by: Rafal Romanowski + Reviewed-by: Przemek Kitszel + Signed-off-by: Jacob Keller + Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-3-cdcb48303fd8@intel.com + Signed-off-by: Paolo Abeni + +Signed-off-by: CKI Backport Bot + +diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h +index 14c4084f3739..a0bf39bdcbeb 100644 +--- a/drivers/net/ethernet/intel/iavf/iavf.h ++++ b/drivers/net/ethernet/intel/iavf/iavf.h +@@ -161,6 +161,7 @@ enum iavf_vlan_state_t { + IAVF_VLAN_ADDING, /* ADD sent to PF, waiting for response */ + IAVF_VLAN_ACTIVE, /* PF confirmed, filter is in HW */ + IAVF_VLAN_REMOVE, /* filter queued for DEL from PF */ ++ IAVF_VLAN_REMOVING, /* DEL sent to PF, waiting for response */ + }; + + struct iavf_vlan_filter { +diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c +index 796473fe57f4..f50dcf75bd6c 100644 +--- a/drivers/net/ethernet/intel/iavf/iavf_main.c ++++ b/drivers/net/ethernet/intel/iavf/iavf_main.c +@@ -781,10 +781,10 @@ iavf_vlan_filter *iavf_add_vlan(struct iavf_adapter *adapter, + 
adapter->num_vlan_filters++; + iavf_schedule_aq_request(adapter, IAVF_FLAG_AQ_ADD_VLAN_FILTER); + } else if (f->state == IAVF_VLAN_REMOVE) { +- /* Re-add the filter since we cannot tell whether the +- * pending delete has already been processed by the PF. +- * A duplicate add is harmless. +- */ ++ /* DEL not yet sent to PF, cancel it */ ++ f->state = IAVF_VLAN_ACTIVE; ++ } else if (f->state == IAVF_VLAN_REMOVING) { ++ /* DEL already sent to PF, re-add after completion */ + f->state = IAVF_VLAN_ADD; + iavf_schedule_aq_request(adapter, + IAVF_FLAG_AQ_ADD_VLAN_FILTER); +@@ -815,11 +815,14 @@ static void iavf_del_vlan(struct iavf_adapter *adapter, struct iavf_vlan vlan) + list_del(&f->list); + kfree(f); + adapter->num_vlan_filters--; +- } else { ++ } else if (f->state != IAVF_VLAN_REMOVING) { + f->state = IAVF_VLAN_REMOVE; + iavf_schedule_aq_request(adapter, + IAVF_FLAG_AQ_DEL_VLAN_FILTER); + } ++ /* If REMOVING, DEL is already sent to PF; completion ++ * handler will free the filter when PF confirms. 
++ */ + } + + spin_unlock_bh(&adapter->mac_vlan_list_lock); +diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c +index d62c0d639414..d0b7b8106793 100644 +--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c ++++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c +@@ -948,12 +948,10 @@ void iavf_del_vlans(struct iavf_adapter *adapter) + + vvfl->vsi_id = adapter->vsi_res->vsi_id; + vvfl->num_elements = count; +- list_for_each_entry_safe(f, ftmp, &adapter->vlan_filter_list, list) { ++ list_for_each_entry(f, &adapter->vlan_filter_list, list) { + if (f->state == IAVF_VLAN_REMOVE) { + vvfl->vlan_id[i] = f->vlan.vid; +- list_del(&f->list); +- kfree(f); +- adapter->num_vlan_filters--; ++ f->state = IAVF_VLAN_REMOVING; + i++; + if (i == count) + break; +@@ -990,7 +988,7 @@ void iavf_del_vlans(struct iavf_adapter *adapter) + + vvfl_v2->vport_id = adapter->vsi_res->vsi_id; + vvfl_v2->num_elements = count; +- list_for_each_entry_safe(f, ftmp, &adapter->vlan_filter_list, list) { ++ list_for_each_entry(f, &adapter->vlan_filter_list, list) { + if (f->state == IAVF_VLAN_REMOVE) { + struct virtchnl_vlan_supported_caps *filtering_support = + &adapter->vlan_v2_caps.filtering.filtering_support; +@@ -1005,9 +1003,7 @@ void iavf_del_vlans(struct iavf_adapter *adapter) + vlan->tci = f->vlan.vid; + vlan->tpid = f->vlan.tpid; + +- list_del(&f->list); +- kfree(f); +- adapter->num_vlan_filters--; ++ f->state = IAVF_VLAN_REMOVING; + i++; + if (i == count) + break; +@@ -2370,10 +2366,6 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter, + ether_addr_copy(adapter->hw.mac.addr, netdev->dev_addr); + wake_up(&adapter->vc_waitqueue); + break; +- case VIRTCHNL_OP_DEL_VLAN: +- dev_err(&adapter->pdev->dev, "Failed to delete VLAN filter, error %s\n", +- iavf_stat_str(&adapter->hw, v_retval)); +- break; + case VIRTCHNL_OP_DEL_ETH_ADDR: + dev_err(&adapter->pdev->dev, "Failed to delete MAC filter, error %s\n", + 
iavf_stat_str(&adapter->hw, v_retval)); +@@ -2896,6 +2888,27 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter, + spin_unlock_bh(&adapter->mac_vlan_list_lock); + } + break; ++ case VIRTCHNL_OP_DEL_VLAN: ++ case VIRTCHNL_OP_DEL_VLAN_V2: { ++ struct iavf_vlan_filter *f, *ftmp; ++ ++ spin_lock_bh(&adapter->mac_vlan_list_lock); ++ list_for_each_entry_safe(f, ftmp, &adapter->vlan_filter_list, ++ list) { ++ if (f->state == IAVF_VLAN_REMOVING) { ++ if (v_retval) { ++ /* PF rejected DEL, keep filter */ ++ f->state = IAVF_VLAN_ACTIVE; ++ } else { ++ list_del(&f->list); ++ kfree(f); ++ adapter->num_vlan_filters--; ++ } ++ } ++ } ++ spin_unlock_bh(&adapter->mac_vlan_list_lock); ++ } ++ break; + case VIRTCHNL_OP_ENABLE_VLAN_STRIPPING: + /* PF enabled vlan strip on this VF. + * Update netdev->features if needed to be in sync with ethtool. +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1683-iavf-add-virtchnl-op-add-vlan-to-success-completion-handler.patch b/SOURCES/1683-iavf-add-virtchnl-op-add-vlan-to-success-completion-handler.patch new file mode 100644 index 000000000..b9b253b80 --- /dev/null +++ b/SOURCES/1683-iavf-add-virtchnl-op-add-vlan-to-success-completion-handler.patch @@ -0,0 +1,60 @@ +From 41f0c9ca3d4bd319e2dcea3b46422291bc178b6a Mon Sep 17 00:00:00 2001 +From: CKI Backport Bot +Date: Mon, 4 May 2026 08:45:38 +0000 +Subject: [PATCH] iavf: add VIRTCHNL_OP_ADD_VLAN to success completion handler + +JIRA: https://redhat.atlassian.net/browse/RHEL-144630 + +commit 34d33313b52eeac3a97ad2e3176d523ec70d9283 +Author: Petr Oros +Date: Mon Apr 27 22:22:16 2026 -0700 + + iavf: add VIRTCHNL_OP_ADD_VLAN to success completion handler + + The V1 ADD_VLAN opcode had no success handler; filters sent via V1 + stayed in ADDING state permanently. Add a fallthrough case so V1 + filters also transition ADDING -> ACTIVE on PF confirmation. 
+ + Critically, add an `if (v_retval) break` guard: the error switch in + iavf_virtchnl_completion() does NOT return after handling errors, + it falls through to the success switch. Without this guard, a + PF-rejected ADD would incorrectly mark ADDING filters as ACTIVE, + creating a driver/HW mismatch where the driver believes the filter + is installed but the PF never accepted it. + + For V2, this is harmless: iavf_vlan_add_reject() in the error + block already kfree'd all ADDING filters, so the success handler + finds nothing to transition. + + Fixes: 968996c070ef ("iavf: Fix VLAN_V2 addition/rejection") + Signed-off-by: Petr Oros + Reviewed-by: Aleksandr Loktionov + Tested-by: Rafal Romanowski + Reviewed-by: Przemek Kitszel + Signed-off-by: Jacob Keller + Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-4-cdcb48303fd8@intel.com + Signed-off-by: Paolo Abeni + +Signed-off-by: CKI Backport Bot + +diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c +index d0b7b8106793..147adb76f641 100644 +--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c ++++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c +@@ -2877,9 +2877,13 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter, + spin_unlock_bh(&adapter->adv_rss_lock); + } + break; ++ case VIRTCHNL_OP_ADD_VLAN: + case VIRTCHNL_OP_ADD_VLAN_V2: { + struct iavf_vlan_filter *f; + ++ if (v_retval) ++ break; ++ + spin_lock_bh(&adapter->mac_vlan_list_lock); + list_for_each_entry(f, &adapter->vlan_filter_list, list) { + if (f->state == IAVF_VLAN_ADDING) +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1684-netfilter-skip-recording-stale-or-retransmitted-init.patch b/SOURCES/1684-netfilter-skip-recording-stale-or-retransmitted-init.patch new file mode 100644 index 000000000..1f082debd --- /dev/null +++ b/SOURCES/1684-netfilter-skip-recording-stale-or-retransmitted-init.patch @@ -0,0 +1,66 @@ +From d2438e908b9e5d25bcc9b798703ee520770d2172 Mon 
Sep 17 00:00:00 2001 +From: Xin Long +Date: Wed, 29 Apr 2026 09:33:15 -0400 +Subject: [PATCH] netfilter: skip recording stale or retransmitted INIT + +JIRA: https://issues.redhat.com/browse/RHEL-158357 +Tested: compile only + +commit 576a5d2bad4814c881a829576b1261b9b8159d2b +Author: Xin Long +Date: Sun Apr 26 10:46:40 2026 -0400 + + netfilter: skip recording stale or retransmitted INIT + + An INIT whose init_tag matches the peer's vtag does not provide new state + information. It indicates either: + + - a stale INIT (after INIT-ACK has already been seen on the same side), or + - a retransmitted INIT (after INIT has already been recorded on the same + side). + + In both cases, the INIT must not update ct->proto.sctp.init[] state, since + it does not advance the handshake tracking and may otherwise corrupt + INIT/INIT-ACK validation logic. + + Allow INIT processing only when the conntrack entry is newly created + (SCTP_CONNTRACK_NONE), or when the init_tag differs from the stored peer + vtag. + + Note it skips the check for the ct with old_state SCTP_CONNTRACK_NONE in + nf_conntrack_sctp_packet(), as it is just created in sctp_new() where it + set ct->proto.sctp.vtag[IP_CT_DIR_REPLY] = ih->init_tag. 
+ + Fixes: 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.") + Signed-off-by: Xin Long + Reviewed-by: Marcelo Ricardo Leitner + Acked-by: Florian Westphal + Link: https://patch.msgid.link/ee56c3e416452b2a40589a2a85245ac2ad5e9f4b.1777214801.git.lucien.xin@gmail.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Xin Long + +diff --git a/net/netfilter/nf_conntrack_proto_sctp.c b/net/netfilter/nf_conntrack_proto_sctp.c +index a9f11910a131..014d3468515e 100644 +--- a/net/netfilter/nf_conntrack_proto_sctp.c ++++ b/net/netfilter/nf_conntrack_proto_sctp.c +@@ -470,9 +470,13 @@ int nf_conntrack_sctp_packet(struct nf_conn *ct, + if (!ih) + goto out_unlock; + +- if (ct->proto.sctp.init[dir] && ct->proto.sctp.init[!dir]) +- ct->proto.sctp.init[!dir] = 0; +- ct->proto.sctp.init[dir] = 1; ++ /* Do not record INIT matching peer vtag (stale or retransmitted INIT). */ ++ if (old_state == SCTP_CONNTRACK_NONE || ++ ct->proto.sctp.vtag[!dir] != ih->init_tag) { ++ if (ct->proto.sctp.init[dir] && ct->proto.sctp.init[!dir]) ++ ct->proto.sctp.init[!dir] = 0; ++ ct->proto.sctp.init[dir] = 1; ++ } + + pr_debug("Setting vtag %x for dir %d\n", ih->init_tag, !dir); + ct->proto.sctp.vtag[!dir] = ih->init_tag; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1685-sctp-discard-stale-init-after-handshake-completion.patch b/SOURCES/1685-sctp-discard-stale-init-after-handshake-completion.patch new file mode 100644 index 000000000..2e36b3483 --- /dev/null +++ b/SOURCES/1685-sctp-discard-stale-init-after-handshake-completion.patch @@ -0,0 +1,52 @@ +From 9df78bef9f55f4b9d1cde20340fae050a6d6b12b Mon Sep 17 00:00:00 2001 +From: Xin Long +Date: Wed, 29 Apr 2026 09:33:15 -0400 +Subject: [PATCH] sctp: discard stale INIT after handshake completion +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +JIRA: https://issues.redhat.com/browse/RHEL-158357 +Tested: compile only + +commit 8a92cb475ca90d84db769e4d4383e631ace0d6e5 +Author: Xin Long +Date: Sun Apr 
26 10:46:41 2026 -0400 + + sctp: discard stale INIT after handshake completion + + After an association reaches ESTABLISHED, the peer’s init_tag is already + known from the handshake. Any subsequent INIT with the same init_tag is + not a valid restart, but a delayed or duplicate INIT. + + Drop such INIT chunks in sctp_sf_do_unexpected_init() instead of + processing them as new association attempts. + + Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") + Signed-off-by: Xin Long + Acked-by: Marcelo Ricardo Leitner + Link: https://patch.msgid.link/5788c76c1ee122a3ed00189e88dcf9df1fba226c.1777214801.git.lucien.xin@gmail.com + Signed-off-by: Jakub Kicinski + +Signed-off-by: Xin Long + +diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c +index 4848d5d50a5f..488a74d14644 100644 +--- a/net/sctp/sm_statefuns.c ++++ b/net/sctp/sm_statefuns.c +@@ -1554,6 +1554,12 @@ static enum sctp_disposition sctp_sf_do_unexpected_init( + /* Tag the variable length parameters. */ + chunk->param_hdr.v = skb_pull(chunk->skb, sizeof(struct sctp_inithdr)); + ++ if (asoc->state >= SCTP_STATE_ESTABLISHED) { ++ /* Discard INIT matching peer vtag after handshake completion (stale INIT). */ ++ if (ntohl(chunk->subh.init_hdr->init_tag) == asoc->peer.i.init_tag) ++ return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands); ++ } ++ + /* Verify the INIT chunk before processing it. 
*/ + err_chunk = NULL; + if (!sctp_verify_init(net, ep, asoc, chunk->chunk_hdr->type, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1686-rdma-vmw-pvrdma-fix-double-free-on-pvrdma-alloc-ucontext-err.patch b/SOURCES/1686-rdma-vmw-pvrdma-fix-double-free-on-pvrdma-alloc-ucontext-err.patch new file mode 100644 index 000000000..3e1b647df --- /dev/null +++ b/SOURCES/1686-rdma-vmw-pvrdma-fix-double-free-on-pvrdma-alloc-ucontext-err.patch @@ -0,0 +1,34 @@ +From 1df5711121cdc11e76b889408fdbe459feba1d39 Mon Sep 17 00:00:00 2001 +From: Jason Gunthorpe +Date: Tue, 28 Apr 2026 13:17:43 -0300 +Subject: [PATCH] RDMA/vmw_pvrdma: Fix double free on pvrdma_alloc_ucontext() + error path + +commit e38e86995df27f1f854063dab1f0c6a513db3faf upstream. + +Sashiko points out that pvrdma_uar_free() is already called within +pvrdma_dealloc_ucontext(), so calling it before triggers a double free. + +Cc: stable@vger.kernel.org +Fixes: 29c8d9eba550 ("IB: Add vmw_pvrdma driver") +Link: https://sashiko.dev/#/patchset/0-v1-e911b76a94d1%2B65d95-rdma_udata_rep_jgg%40nvidia.com?part=4 +Link: https://patch.msgid.link/r/10-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com +Signed-off-by: Jason Gunthorpe +Signed-off-by: Greg Kroah-Hartman + +diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.c b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.c +index 19176583dbde..0d6d8902a6d9 100644 +--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.c ++++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.c +@@ -350,7 +350,7 @@ int pvrdma_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata) + uresp.qp_tab_size = vdev->dsr->caps.max_qp; + ret = ib_copy_to_udata(udata, &uresp, sizeof(uresp)); + if (ret) { +- pvrdma_uar_free(vdev, &context->uar); ++ /* pvrdma_dealloc_ucontext() also frees the UAR */ + pvrdma_dealloc_ucontext(&context->ibucontext); + return -EFAULT; + } +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1687-sched-fair-skip-sched-balance-running-cmpxchg-when-balance-i.patch 
b/SOURCES/1687-sched-fair-skip-sched-balance-running-cmpxchg-when-balance-i.patch new file mode 100644 index 000000000..92fd04cef --- /dev/null +++ b/SOURCES/1687-sched-fair-skip-sched-balance-running-cmpxchg-when-balance-i.patch @@ -0,0 +1,187 @@ +From 1ed0f38cc5a629d210c5bec0e275bd4df318d2f9 Mon Sep 17 00:00:00 2001 +From: "Herton R. Krzesinski" +Date: Mon, 9 Mar 2026 19:55:43 -0300 +Subject: [PATCH] sched/fair: Skip sched_balance_running cmpxchg when balance + is not due + +JIRA: https://issues.redhat.com/browse/RHEL-147187 + +commit 3324b2180c17b21c31c16966cc85ca41a7c93703 +Author: Tim Chen +Date: Mon Nov 10 10:47:35 2025 -0800 + + sched/fair: Skip sched_balance_running cmpxchg when balance is not due + + The NUMA sched domain sets the SD_SERIALIZE flag by default, allowing + only one NUMA load balancing operation to run system-wide at a time. + + Currently, each sched group leader directly under NUMA domain attempts + to acquire the global sched_balance_running flag via cmpxchg() before + checking whether load balancing is due or whether it is the designated + load balancer for that NUMA domain. On systems with a large number + of cores, this causes significant cache contention on the shared + sched_balance_running flag. + + This patch reduces unnecessary cmpxchg() operations by first checking + that the balancer is the designated leader for a NUMA domain from + should_we_balance(), and the balance interval has expired before + trying to acquire sched_balance_running to load balance a NUMA + domain. + + On a 2-socket Granite Rapids system with sub-NUMA clustering enabled, + running an OLTP workload, 7.8% of total CPU cycles were previously spent + in sched_balance_domain() contending on sched_balance_running before + this change. 
+ + : 104 static __always_inline int arch_atomic_cmpxchg(atomic_t *v, int old, int new) + : 105 { + : 106 return arch_cmpxchg(&v->counter, old, new); + 0.00 : ffffffff81326e6c: xor %eax,%eax + 0.00 : ffffffff81326e6e: mov $0x1,%ecx + 0.00 : ffffffff81326e73: lock cmpxchg %ecx,0x2394195(%rip) # ffffffff836bb010 + : 110 sched_balance_domains(): + : 12234 if (atomic_cmpxchg_acquire(&sched_balance_running, 0, 1)) + 99.39 : ffffffff81326e7b: test %eax,%eax + 0.00 : ffffffff81326e7d: jne ffffffff81326e99 + : 12238 if (time_after_eq(jiffies, sd->last_balance + interval)) { + 0.00 : ffffffff81326e7f: mov 0x14e2b3a(%rip),%rax # ffffffff828099c0 + 0.00 : ffffffff81326e86: sub 0x48(%r14),%rax + 0.00 : ffffffff81326e8a: cmp %rdx,%rax + + After applying this fix, sched_balance_domain() is gone from the profile + and there is a 5% throughput improvement. + + [peterz: made it so that redo retains the 'lock' and split out the + CPU_NEWLY_IDLE change to a separate patch] + Signed-off-by: Tim Chen + Signed-off-by: Peter Zijlstra (Intel) + Reviewed-by: Chen Yu + Reviewed-by: Vincent Guittot + Reviewed-by: Shrikanth Hegde + Reviewed-by: K Prateek Nayak + Reviewed-by: Srikar Dronamraju + Tested-by: Mohini Narkhede + Tested-by: Shrikanth Hegde + Link: https://patch.msgid.link/6fed119b723c71552943bfe5798c93851b30a361.1762800251.git.tim.c.chen@linux.intel.com + +(cherry picked from commit 3324b2180c17b21c31c16966cc85ca41a7c93703) +Assisted-by: Patchpal +Signed-off-by: Herton R. 
Krzesinski + +diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c +index 9c70c9e9356f..091cfc865815 100644 +--- a/kernel/sched/fair.c ++++ b/kernel/sched/fair.c +@@ -11248,6 +11248,21 @@ static void update_lb_imbalance_stat(struct lb_env *env, struct sched_domain *sd + } + } + ++/* ++ * This flag serializes load-balancing passes over large domains ++ * (above the NODE topology level) - only one load-balancing instance ++ * may run at a time, to reduce overhead on very large systems with ++ * lots of CPUs and large NUMA distances. ++ * ++ * - Note that load-balancing passes triggered while another one ++ * is executing are skipped and not re-tried. ++ * ++ * - Also note that this does not serialize rebalance_domains() ++ * execution, as non-SD_SERIALIZE domains will still be ++ * load-balanced in parallel. ++ */ ++static atomic_t sched_balance_running = ATOMIC_INIT(0); ++ + /* + * Check this_cpu to ensure it is balanced within domain. Attempt to move + * tasks if there is an imbalance. 
+@@ -11273,6 +11288,7 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq, + .fbq_type = all, + .tasks = LIST_HEAD_INIT(env.tasks), + }; ++ bool need_unlock = false; + + cpumask_and(cpus, sched_domain_span(sd), cpu_active_mask); + +@@ -11284,6 +11300,14 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq, + goto out_balanced; + } + ++ if (!need_unlock && (sd->flags & SD_SERIALIZE) && idle != CPU_NEWLY_IDLE) { ++ int zero = 0; ++ if (!atomic_try_cmpxchg_acquire(&sched_balance_running, &zero, 1)) ++ goto out_balanced; ++ ++ need_unlock = true; ++ } ++ + group = sched_balance_find_src_group(&env); + if (!group) { + schedstat_inc(sd->lb_nobusyg[idle]); +@@ -11524,6 +11548,9 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq, + sd->balance_interval < sd->max_interval) + sd->balance_interval *= 2; + out: ++ if (need_unlock) ++ atomic_set_release(&sched_balance_running, 0); ++ + return ld_moved; + } + +@@ -11648,21 +11675,6 @@ static int active_load_balance_cpu_stop(void *data) + return 0; + } + +-/* +- * This flag serializes load-balancing passes over large domains +- * (above the NODE topology level) - only one load-balancing instance +- * may run at a time, to reduce overhead on very large systems with +- * lots of CPUs and large NUMA distances. +- * +- * - Note that load-balancing passes triggered while another one +- * is executing are skipped and not re-tried. +- * +- * - Also note that this does not serialize rebalance_domains() +- * execution, as non-SD_SERIALIZE domains will still be +- * load-balanced in parallel. +- */ +-static atomic_t sched_balance_running = ATOMIC_INIT(0); +- + /* + * Scale the max sched_balance_rq interval with the number of CPUs in the system. + * This trades load-balance latency on larger machines for less cross talk. 
+@@ -11718,7 +11730,7 @@ static void sched_balance_domains(struct rq *rq, enum cpu_idle_type idle) + /* Earliest time when we have to do rebalance again */ + unsigned long next_balance = jiffies + 60*HZ; + int update_next_balance = 0; +- int need_serialize, need_decay = 0; ++ int need_decay = 0; + u64 max_cost = 0; + + rcu_read_lock(); +@@ -11742,13 +11754,6 @@ static void sched_balance_domains(struct rq *rq, enum cpu_idle_type idle) + } + + interval = get_sd_balance_interval(sd, busy); +- +- need_serialize = sd->flags & SD_SERIALIZE; +- if (need_serialize) { +- if (atomic_cmpxchg_acquire(&sched_balance_running, 0, 1)) +- goto out; +- } +- + if (time_after_eq(jiffies, sd->last_balance + interval)) { + if (sched_balance_rq(cpu, rq, sd, idle, &continue_balancing)) { + /* +@@ -11762,9 +11767,6 @@ static void sched_balance_domains(struct rq *rq, enum cpu_idle_type idle) + sd->last_balance = jiffies; + interval = get_sd_balance_interval(sd, busy); + } +- if (need_serialize) +- atomic_set_release(&sched_balance_running, 0); +-out: + if (time_after(next_balance, sd->last_balance + interval)) { + next_balance = sd->last_balance + interval; + update_next_balance = 1; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1688-sched-fair-have-sd-serialize-affect-newidle-balancing.patch b/SOURCES/1688-sched-fair-have-sd-serialize-affect-newidle-balancing.patch new file mode 100644 index 000000000..58d432efa --- /dev/null +++ b/SOURCES/1688-sched-fair-have-sd-serialize-affect-newidle-balancing.patch @@ -0,0 +1,50 @@ +From 2c7cbbeb752578e4f484727baed6ddb1434b6eaa Mon Sep 17 00:00:00 2001 +From: "Herton R. 
Krzesinski" +Date: Tue, 10 Mar 2026 11:21:37 -0300 +Subject: [PATCH] sched/fair: Have SD_SERIALIZE affect newidle balancing + +JIRA: https://issues.redhat.com/browse/RHEL-147187 + +commit 522fb20fbdbe48ed98f587d628637ff38ececd2d +Author: Peter Zijlstra +Date: Mon Nov 17 17:13:09 2025 +0100 + + sched/fair: Have SD_SERIALIZE affect newidle balancing + + Also serialize the possiblty much more frequent newidle balancing for + the 'expensive' domains that have SD_BALANCE set. + + Initial benchmarking by K Prateek and Tim showed no negative effect. + + Split out from the larger patch moving sched_balance_running around + for ease of bisect and such. + + Suggested-by: Shrikanth Hegde + Seconded-by: K Prateek Nayak + Signed-off-by: Peter Zijlstra (Intel) + Link: https://lkml.kernel.org/r/df068896-82f9-458d-8fff-5a2f654e8ffd@amd.com + Link: https://patch.msgid.link/6fed119b723c71552943bfe5798c93851b30a361.1762800251.git.tim.c.chen@linux.intel.com + + # Conflicts: + # kernel/sched/fair.c + +(cherry picked from commit 522fb20fbdbe48ed98f587d628637ff38ececd2d) +Assisted-by: Patchpal +Signed-off-by: Herton R. 
Krzesinski + +diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c +index 091cfc865815..68e3e4283da9 100644 +--- a/kernel/sched/fair.c ++++ b/kernel/sched/fair.c +@@ -11300,7 +11300,7 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq, + goto out_balanced; + } + +- if (!need_unlock && (sd->flags & SD_SERIALIZE) && idle != CPU_NEWLY_IDLE) { ++ if (!need_unlock && (sd->flags & SD_SERIALIZE)) { + int zero = 0; + if (!atomic_try_cmpxchg_acquire(&sched_balance_running, &zero, 1)) + goto out_balanced; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1689-powerpc-64-force-inlining-of-prevent-user-access-and-set-kua.patch b/SOURCES/1689-powerpc-64-force-inlining-of-prevent-user-access-and-set-kua.patch new file mode 100644 index 000000000..b9ecfeec2 --- /dev/null +++ b/SOURCES/1689-powerpc-64-force-inlining-of-prevent-user-access-and-set-kua.patch @@ -0,0 +1,97 @@ +From 0f12186498e70a3446eb2850847142827bfc7f13 Mon Sep 17 00:00:00 2001 +From: Waiman Long +Date: Fri, 8 May 2026 17:10:08 -0400 +Subject: [PATCH] powerpc/64: Force inlining of prevent_user_access() and + set_kuap() + +JIRA: https://redhat.atlassian.net/browse/RHEL-166727 + +commit 792993919349fefba20f58ae4843c80e8b01f518 +Author: Christophe Leroy +Date: Fri, 11 Feb 2022 15:16:51 +0100 + + powerpc/64: Force inlining of prevent_user_access() and set_kuap() + + A ppc64_defconfig build exhibits about 10 copied of + prevent_user_access(). It also have one copy of set_kuap(). 
+ + c000000000017340 <.prevent_user_access.constprop.0>: + c00000000001a038: 4b ff d3 09 bl c000000000017340 <.prevent_user_access.constprop.0> + c00000000001aabc: 4b ff c8 85 bl c000000000017340 <.prevent_user_access.constprop.0> + c00000000001ab38: 4b ff c8 09 bl c000000000017340 <.prevent_user_access.constprop.0> + c00000000001ade0: 4b ff c5 61 bl c000000000017340 <.prevent_user_access.constprop.0> + c000000000039b90 <.prevent_user_access.constprop.0>: + c00000000003ac08: 4b ff ef 89 bl c000000000039b90 <.prevent_user_access.constprop.0> + c00000000003b9d0: 4b ff e1 c1 bl c000000000039b90 <.prevent_user_access.constprop.0> + c00000000003ba54: 4b ff e1 3d bl c000000000039b90 <.prevent_user_access.constprop.0> + c00000000003bbfc: 4b ff df 95 bl c000000000039b90 <.prevent_user_access.constprop.0> + c00000000015dde0 <.prevent_user_access.constprop.0>: + c0000000001612c0: 4b ff cb 21 bl c00000000015dde0 <.prevent_user_access.constprop.0> + c000000000161b54: 4b ff c2 8d bl c00000000015dde0 <.prevent_user_access.constprop.0> + c000000000188cf0 <.prevent_user_access.constprop.0>: + c00000000018d658: 4b ff b6 99 bl c000000000188cf0 <.prevent_user_access.constprop.0> + c00000000030fe20 <.prevent_user_access.constprop.0>: + c0000000003123d4: 4b ff da 4d bl c00000000030fe20 <.prevent_user_access.constprop.0> + c000000000313970: 4b ff c4 b1 bl c00000000030fe20 <.prevent_user_access.constprop.0> + c0000000005e6bd0 <.prevent_user_access.constprop.0>: + c0000000005e7d8c: 4b ff ee 45 bl c0000000005e6bd0 <.prevent_user_access.constprop.0> + c0000000007bcae0 <.prevent_user_access.constprop.0>: + c0000000007bda10: 4b ff f0 d1 bl c0000000007bcae0 <.prevent_user_access.constprop.0> + c0000000007bda54: 4b ff f0 8d bl c0000000007bcae0 <.prevent_user_access.constprop.0> + c0000000007bdd28: 4b ff ed b9 bl c0000000007bcae0 <.prevent_user_access.constprop.0> + c0000000007c0390: 4b ff c7 51 bl c0000000007bcae0 <.prevent_user_access.constprop.0> + c00000000094e4f0 
<.prevent_user_access.constprop.0>: + c000000000950e40: 4b ff d6 b1 bl c00000000094e4f0 <.prevent_user_access.constprop.0> + c00000000097d2d0 <.prevent_user_access.constprop.0>: + c0000000009813fc: 4b ff be d5 bl c00000000097d2d0 <.prevent_user_access.constprop.0> + c000000000acd540 <.prevent_user_access.constprop.0>: + c000000000ad1d60: 4b ff b7 e1 bl c000000000acd540 <.prevent_user_access.constprop.0> + c000000000e5d680 <.prevent_user_access.constprop.0>: + c000000000e64b60: 4b ff 8b 21 bl c000000000e5d680 <.prevent_user_access.constprop.0> + c000000000e64b6c: 4b ff 8b 15 bl c000000000e5d680 <.prevent_user_access.constprop.0> + c000000000e64c38: 4b ff 8a 49 bl c000000000e5d680 <.prevent_user_access.constprop.0> + + When building signal_64.c with -Winline the following messages appear: + + ./arch/powerpc/include/asm/book3s/64/kup.h:331:20: error: inlining failed in call to 'set_kuap': call is unlikely and code size would grow [-Werror=inline] + ./arch/powerpc/include/asm/book3s/64/kup.h:401:20: error: inlining failed in call to 'prevent_user_access.constprop': call is unlikely and code size would grow [-Werror=inline] + + Those functions are used on hot pathes and have been + expected to be inlined at all time. + + Force them inline. + + This patch reduces the kernel text size by 700 bytes, confirming + that not inlining those functions is not worth it. 
+ + Signed-off-by: Christophe Leroy + Signed-off-by: Michael Ellerman + Link: https://lore.kernel.org/r/eff9b2b211957fa2e8707e46f31674097fd563a3.1644588972.git.christophe.leroy@csgroup.eu + +Signed-off-by: Waiman Long + +diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h +index 23545732ddeb..c82323b864e1 100644 +--- a/arch/powerpc/include/asm/book3s/64/kup.h ++++ b/arch/powerpc/include/asm/book3s/64/kup.h +@@ -328,7 +328,7 @@ static inline unsigned long get_kuap(void) + return mfspr(SPRN_AMR); + } + +-static inline void set_kuap(unsigned long value) ++static __always_inline void set_kuap(unsigned long value) + { + if (!mmu_has_feature(MMU_FTR_BOOK3S_KUAP)) + return; +@@ -398,7 +398,7 @@ static __always_inline void allow_user_access(void __user *to, const void __user + + #endif /* !CONFIG_PPC_KUAP */ + +-static inline void prevent_user_access(unsigned long dir) ++static __always_inline void prevent_user_access(unsigned long dir) + { + set_kuap(AMR_KUAP_BLOCKED); + if (static_branch_unlikely(&uaccess_flush_key)) +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1690-compiler-gcc-h-remove-ancient-workaround-for-gcc-pr-58670.patch b/SOURCES/1690-compiler-gcc-h-remove-ancient-workaround-for-gcc-pr-58670.patch new file mode 100644 index 000000000..74e48169f --- /dev/null +++ b/SOURCES/1690-compiler-gcc-h-remove-ancient-workaround-for-gcc-pr-58670.patch @@ -0,0 +1,81 @@ +From add20169be1f92eec68ed7588e96da06720c3a60 Mon Sep 17 00:00:00 2001 +From: Waiman Long +Date: Fri, 8 May 2026 17:57:10 -0400 +Subject: [PATCH] compiler-gcc.h: remove ancient workaround for gcc PR 58670 + +JIRA: https://redhat.atlassian.net/browse/RHEL-166727 +Conflicts: A context diff due to missing upstream commit 173a3efd3edb + ("bug.h: work around GCC PR82365 in BUG()"). 
+ +commit 43c249ea0b1e10baac4a1264a25d69723ce5d2c2 +Author: Uros Bizjak +Date: Fri, 24 Jun 2022 16:14:12 +0200 + + compiler-gcc.h: remove ancient workaround for gcc PR 58670 + + The workaround for 'asm goto' miscompilation introduces a compiler barrier + quirk that inhibits many useful compiler optimizations. For example, + __try_cmpxchg_user compiles to: + + 11375: 41 8b 4d 00 mov 0x0(%r13),%ecx + 11379: 41 8b 02 mov (%r10),%eax + 1137c: f0 0f b1 0a lock cmpxchg %ecx,(%rdx) + 11380: 0f 94 c2 sete %dl + 11383: 84 d2 test %dl,%dl + 11385: 75 c4 jne 1134b <...> + 11387: 41 89 02 mov %eax,(%r10) + + where the barrier inhibits flags propagation from asm when compiled with + gcc-12. + + When the mentioned quirk is removed, the following code is generated: + + 11553: 41 8b 4d 00 mov 0x0(%r13),%ecx + 11557: 41 8b 02 mov (%r10),%eax + 1155a: f0 0f b1 0a lock cmpxchg %ecx,(%rdx) + 1155e: 74 c9 je 11529 <...> + 11560: 41 89 02 mov %eax,(%r10) + + The refered compiler bug: + + http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670 + + was fixed for gcc-4.8.2. + + Current minimum required version of GCC is version 5.1 which has the above + 'asm goto' miscompilation fixed, so remove the workaround. + + Link: https://lkml.kernel.org/r/20220624141412.72274-1-ubizjak@gmail.com + Signed-off-by: Uros Bizjak + Cc: Ingo Molnar + Cc: "H. Peter Anvin" + Cc: Thomas Gleixner + Signed-off-by: Andrew Morton + +Signed-off-by: Waiman Long + +diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h +index 589c5f4a3bfd..6fc88f0a056c 100644 +--- a/include/linux/compiler-gcc.h ++++ b/include/linux/compiler-gcc.h +@@ -59,17 +59,6 @@ + */ + #define barrier_before_unreachable() asm volatile("") + +-/* +- * GCC 'asm goto' miscompiles certain code sequences: +- * +- * http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670 +- * +- * Work it around via a compiler barrier quirk suggested by Jakub Jelinek. +- * +- * (asm goto is automatically volatile - the naming reflects this.) 
+- */ +-#define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0) +- + #if defined(CONFIG_ARCH_USE_BUILTIN_BSWAP) + #define __HAVE_BUILTIN_BSWAP32__ + #define __HAVE_BUILTIN_BSWAP64__ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1691-work-around-gcc-bugs-with-asm-goto-with-outputs.patch b/SOURCES/1691-work-around-gcc-bugs-with-asm-goto-with-outputs.patch new file mode 100644 index 000000000..3e56b0ae3 --- /dev/null +++ b/SOURCES/1691-work-around-gcc-bugs-with-asm-goto-with-outputs.patch @@ -0,0 +1,666 @@ +From 157e5d51df4e87292d65d1efbcb1e48196a917a2 Mon Sep 17 00:00:00 2001 +From: Waiman Long +Date: Fri, 8 May 2026 17:57:13 -0400 +Subject: [PATCH] work around gcc bugs with 'asm goto' with outputs + +JIRA: https://redhat.atlassian.net/browse/RHEL-166727 +Conflicts: + 1) Hunks with merge conflicts for files missing or in unsupported + arches are dropped. + 2) The arch/x86/include/asm/cpufeature.h hunk is dropped as the change + had already been applied as part of RHEL commit 824a2da84fc6 ("x86/asm: + Use %c/%n instead of %P operand modifier in asm templates"). + 3) Two hunks from arch/powerpc/include/asm/uaccess.h are dropped due + to missing upstream commit dc5dac748af9 ("powerpc/64: Add support + to build with prefixed instructions"). + 4) The arch/powerpc/kernel/irq_64.c hunk is applied to + arch/powerpc/kernel/irq.c. + 5) A context diff in the include/linux/compiler-gcc.h hunk due to + missing upstream commit 173a3efd3edb ("bug.h: work around GCC + PR82365 in BUG()"). + +commit 4356e9f841f7fbb945521cef3577ba394c65f3fc +Author: Linus Torvalds +Date: Fri, 9 Feb 2024 12:39:31 -0800 + + work around gcc bugs with 'asm goto' with outputs + + We've had issues with gcc and 'asm goto' before, and we created a + 'asm_volatile_goto()' macro for that in the past: see commits + 3f0116c3238a ("compiler/gcc4: Add quirk for 'asm goto' miscompilation + bug") and a9f180345f53 ("compiler/gcc4: Make quirk for + asm_volatile_goto() unconditional"). 
+ + Then, much later, we ended up removing the workaround in commit + 43c249ea0b1e ("compiler-gcc.h: remove ancient workaround for gcc PR + 58670") because we no longer supported building the kernel with the + affected gcc versions, but we left the macro uses around. + + Now, Sean Christopherson reports a new version of a very similar + problem, which is fixed by re-applying that ancient workaround. But the + problem in question is limited to only the 'asm goto with outputs' + cases, so instead of re-introducing the old workaround as-is, let's + rename and limit the workaround to just that much less common case. + + It looks like there are at least two separate issues that all hit in + this area: + + (a) some versions of gcc don't mark the asm goto as 'volatile' when it + has outputs: + + https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619 + https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420 + + which is easy to work around by just adding the 'volatile' by hand. + + (b) Internal compiler errors: + + https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422 + + which are worked around by adding the extra empty 'asm' as a + barrier, as in the original workaround. + + but the problem Sean sees may be a third thing since it involves bad + code generation (not an ICE) even with the manually added 'volatile'. + + but the same old workaround works for this case, even if this feels a + bit like voodoo programming and may only be hiding the issue. 
+ + Reported-and-tested-by: Sean Christopherson + Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/ + Cc: Nick Desaulniers + Cc: Uros Bizjak + Cc: Jakub Jelinek + Cc: Andrew Pinski + Signed-off-by: Linus Torvalds + +Signed-off-by: Waiman Long + +diff --git a/arch/arc/include/asm/jump_label.h b/arch/arc/include/asm/jump_label.h +index 9d9618079739..a339223d9e05 100644 +--- a/arch/arc/include/asm/jump_label.h ++++ b/arch/arc/include/asm/jump_label.h +@@ -31,7 +31,7 @@ + static __always_inline bool arch_static_branch(struct static_key *key, + bool branch) + { +- asm_volatile_goto(".balign "__stringify(JUMP_LABEL_NOP_SIZE)" \n" ++ asm goto(".balign "__stringify(JUMP_LABEL_NOP_SIZE)" \n" + "1: \n" + "nop \n" + ".pushsection __jump_table, \"aw\" \n" +@@ -47,7 +47,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, + static __always_inline bool arch_static_branch_jump(struct static_key *key, + bool branch) + { +- asm_volatile_goto(".balign "__stringify(JUMP_LABEL_NOP_SIZE)" \n" ++ asm goto(".balign "__stringify(JUMP_LABEL_NOP_SIZE)" \n" + "1: \n" + "b %l[l_yes] \n" + ".pushsection __jump_table, \"aw\" \n" +diff --git a/arch/arm/include/asm/jump_label.h b/arch/arm/include/asm/jump_label.h +index e12d7d096fc0..e4eb54f6cd9f 100644 +--- a/arch/arm/include/asm/jump_label.h ++++ b/arch/arm/include/asm/jump_label.h +@@ -11,7 +11,7 @@ + + static __always_inline bool arch_static_branch(struct static_key *key, bool branch) + { +- asm_volatile_goto("1:\n\t" ++ asm goto("1:\n\t" + WASM(nop) "\n\t" + ".pushsection __jump_table, \"aw\"\n\t" + ".word 1b, %l[l_yes], %c0\n\t" +@@ -25,7 +25,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran + + static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch) + { +- asm_volatile_goto("1:\n\t" ++ asm goto("1:\n\t" + WASM(b) " %l[l_yes]\n\t" + ".pushsection __jump_table, \"aw\"\n\t" + ".word 1b, %l[l_yes], %c0\n\t" +diff --git 
a/arch/arm64/include/asm/alternative-macros.h b/arch/arm64/include/asm/alternative-macros.h +index 210bb43cff2c..d328f549b1a6 100644 +--- a/arch/arm64/include/asm/alternative-macros.h ++++ b/arch/arm64/include/asm/alternative-macros.h +@@ -229,7 +229,7 @@ alternative_has_cap_likely(const unsigned long cpucap) + if (!cpucap_is_possible(cpucap)) + return false; + +- asm_volatile_goto( ++ asm goto( + ALTERNATIVE_CB("b %l[l_no]", %[cpucap], alt_cb_patch_nops) + : + : [cpucap] "i" (cpucap) +@@ -247,7 +247,7 @@ alternative_has_cap_unlikely(const unsigned long cpucap) + if (!cpucap_is_possible(cpucap)) + return false; + +- asm_volatile_goto( ++ asm goto( + ALTERNATIVE("nop", "b %l[l_yes]", %[cpucap]) + : + : [cpucap] "i" (cpucap) +diff --git a/arch/arm64/include/asm/jump_label.h b/arch/arm64/include/asm/jump_label.h +index 48ddc0f45d22..6aafbb789991 100644 +--- a/arch/arm64/include/asm/jump_label.h ++++ b/arch/arm64/include/asm/jump_label.h +@@ -18,7 +18,7 @@ + static __always_inline bool arch_static_branch(struct static_key * const key, + const bool branch) + { +- asm_volatile_goto( ++ asm goto( + "1: nop \n\t" + " .pushsection __jump_table, \"aw\" \n\t" + " .align 3 \n\t" +@@ -35,7 +35,7 @@ static __always_inline bool arch_static_branch(struct static_key * const key, + static __always_inline bool arch_static_branch_jump(struct static_key * const key, + const bool branch) + { +- asm_volatile_goto( ++ asm goto( + "1: b %l[l_yes] \n\t" + " .pushsection __jump_table, \"aw\" \n\t" + " .align 3 \n\t" +diff --git a/arch/mips/include/asm/jump_label.h b/arch/mips/include/asm/jump_label.h +index c5c6864e64bc..405c85173f2c 100644 +--- a/arch/mips/include/asm/jump_label.h ++++ b/arch/mips/include/asm/jump_label.h +@@ -36,7 +36,7 @@ + + static __always_inline bool arch_static_branch(struct static_key *key, bool branch) + { +- asm_volatile_goto("1:\t" B_INSN " 2f\n\t" ++ asm goto("1:\t" B_INSN " 2f\n\t" + "2:\t.insn\n\t" + ".pushsection __jump_table, \"aw\"\n\t" + WORD_INSN " 1b, 
%l[l_yes], %0\n\t" +@@ -50,7 +50,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran + + static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch) + { +- asm_volatile_goto("1:\t" J_INSN " %l[l_yes]\n\t" ++ asm goto("1:\t" J_INSN " %l[l_yes]\n\t" + ".pushsection __jump_table, \"aw\"\n\t" + WORD_INSN " 1b, %l[l_yes], %0\n\t" + ".popsection\n\t" +diff --git a/arch/parisc/include/asm/jump_label.h b/arch/parisc/include/asm/jump_label.h +index 7efb1aa2f7f8..9e06acd0e58b 100644 +--- a/arch/parisc/include/asm/jump_label.h ++++ b/arch/parisc/include/asm/jump_label.h +@@ -11,7 +11,7 @@ + + static __always_inline bool arch_static_branch(struct static_key *key, bool branch) + { +- asm_volatile_goto("1:\n\t" ++ asm goto("1:\n\t" + "nop\n\t" + ".pushsection __jump_table, \"aw\"\n\t" + ".word 1b - ., %l[l_yes] - .\n\t" +@@ -26,7 +26,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran + + static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch) + { +- asm_volatile_goto("1:\n\t" ++ asm goto("1:\n\t" + "b,n %l[l_yes]\n\t" + ".pushsection __jump_table, \"aw\"\n\t" + ".word 1b - ., %l[l_yes] - .\n\t" +diff --git a/arch/powerpc/include/asm/jump_label.h b/arch/powerpc/include/asm/jump_label.h +index 93ce3ec25387..2f2a86ed2280 100644 +--- a/arch/powerpc/include/asm/jump_label.h ++++ b/arch/powerpc/include/asm/jump_label.h +@@ -17,7 +17,7 @@ + + static __always_inline bool arch_static_branch(struct static_key *key, bool branch) + { +- asm_volatile_goto("1:\n\t" ++ asm goto("1:\n\t" + "nop # arch_static_branch\n\t" + ".pushsection __jump_table, \"aw\"\n\t" + ".long 1b - ., %l[l_yes] - .\n\t" +@@ -32,7 +32,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran + + static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch) + { +- asm_volatile_goto("1:\n\t" ++ asm goto("1:\n\t" + "b %l[l_yes] # 
arch_static_branch_jump\n\t" + ".pushsection __jump_table, \"aw\"\n\t" + ".long 1b - ., %l[l_yes] - .\n\t" +diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h +index fd515d2e7dbd..1f645841ff8e 100644 +--- a/arch/powerpc/include/asm/uaccess.h ++++ b/arch/powerpc/include/asm/uaccess.h +@@ -73,7 +73,7 @@ __pu_failed: \ + * are no aliasing issues. + */ + #define __put_user_asm_goto(x, addr, label, op) \ +- asm_volatile_goto( \ ++ asm goto( \ + "1: " op "%U1%X1 %0,%1 # put_user\n" \ + EX_TABLE(1b, %l2) \ + : \ +@@ -86,7 +86,7 @@ __pu_failed: \ + __put_user_asm_goto(x, ptr, label, "std") + #else /* __powerpc64__ */ + #define __put_user_asm2_goto(x, addr, label) \ +- asm_volatile_goto( \ ++ asm goto( \ + "1: stw%X1 %0, %1\n" \ + "2: stw%X1 %L0, %L1\n" \ + EX_TABLE(1b, %l2) \ +@@ -130,7 +130,7 @@ do { \ + #ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT + + #define __get_user_asm_goto(x, addr, label, op) \ +- asm_volatile_goto( \ ++ asm_goto_output( \ + "1: "op"%U1%X1 %0, %1 # get_user\n" \ + EX_TABLE(1b, %l2) \ + : "=r" (x) \ +@@ -143,7 +143,7 @@ do { \ + __get_user_asm_goto(x, addr, label, "ld") + #else /* __powerpc64__ */ + #define __get_user_asm2_goto(x, addr, label) \ +- asm_volatile_goto( \ ++ asm_goto_output( \ + "1: lwz%X1 %0, %1\n" \ + "2: lwz%X1 %L0, %L1\n" \ + EX_TABLE(1b, %l2) \ +diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c +index 2a1d2f4adab1..218f95e56091 100644 +--- a/arch/powerpc/kernel/irq.c ++++ b/arch/powerpc/kernel/irq.c +@@ -242,7 +242,7 @@ notrace void arch_local_irq_restore(unsigned long mask) + * This allows interrupts to be unmasked without hard disabling, and + * also without new hard interrupts coming in ahead of pending ones. 
+ */ +- asm_volatile_goto( ++ asm goto( + "1: \n" + " lbz 9,%0(13) \n" + " cmpwi 9,0 \n" +diff --git a/arch/s390/include/asm/jump_label.h b/arch/s390/include/asm/jump_label.h +index 895f774bbcc5..bf78cf381dfc 100644 +--- a/arch/s390/include/asm/jump_label.h ++++ b/arch/s390/include/asm/jump_label.h +@@ -25,7 +25,7 @@ + */ + static __always_inline bool arch_static_branch(struct static_key *key, bool branch) + { +- asm_volatile_goto("0: brcl 0,%l[label]\n" ++ asm goto("0: brcl 0,%l[label]\n" + ".pushsection __jump_table,\"aw\"\n" + ".balign 8\n" + ".long 0b-.,%l[label]-.\n" +@@ -39,7 +39,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran + + static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch) + { +- asm_volatile_goto("0: brcl 15,%l[label]\n" ++ asm goto("0: brcl 15,%l[label]\n" + ".pushsection __jump_table,\"aw\"\n" + ".balign 8\n" + ".long 0b-.,%l[label]-.\n" +diff --git a/arch/sparc/include/asm/jump_label.h b/arch/sparc/include/asm/jump_label.h +index 94eb529dcb77..2718cbea826a 100644 +--- a/arch/sparc/include/asm/jump_label.h ++++ b/arch/sparc/include/asm/jump_label.h +@@ -10,7 +10,7 @@ + + static __always_inline bool arch_static_branch(struct static_key *key, bool branch) + { +- asm_volatile_goto("1:\n\t" ++ asm goto("1:\n\t" + "nop\n\t" + "nop\n\t" + ".pushsection __jump_table, \"aw\"\n\t" +@@ -26,7 +26,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran + + static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch) + { +- asm_volatile_goto("1:\n\t" ++ asm goto("1:\n\t" + "b %l[l_yes]\n\t" + "nop\n\t" + ".pushsection __jump_table, \"aw\"\n\t" +diff --git a/arch/um/include/asm/cpufeature.h b/arch/um/include/asm/cpufeature.h +index 4b6d1b526bc1..66fe06db872f 100644 +--- a/arch/um/include/asm/cpufeature.h ++++ b/arch/um/include/asm/cpufeature.h +@@ -75,7 +75,7 @@ extern void setup_clear_cpu_cap(unsigned int bit); + */ + static 
__always_inline bool _static_cpu_has(u16 bit) + { +- asm_volatile_goto("1: jmp 6f\n" ++ asm goto("1: jmp 6f\n" + "2:\n" + ".skip -(((5f-4f) - (2b-1b)) > 0) * " + "((5f-4f) - (2b-1b)),0x90\n" +diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h +index 071572e23d3a..cbbef32517f0 100644 +--- a/arch/x86/include/asm/jump_label.h ++++ b/arch/x86/include/asm/jump_label.h +@@ -24,7 +24,7 @@ + + static __always_inline bool arch_static_branch(struct static_key *key, bool branch) + { +- asm_volatile_goto("1:" ++ asm goto("1:" + "jmp %l[l_yes] # objtool NOPs this \n\t" + JUMP_TABLE_ENTRY + : : "i" (key), "i" (2 | branch) : : l_yes); +@@ -38,7 +38,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran + + static __always_inline bool arch_static_branch(struct static_key * const key, const bool branch) + { +- asm_volatile_goto("1:" ++ asm goto("1:" + ".byte " __stringify(BYTES_NOP5) "\n\t" + JUMP_TABLE_ENTRY + : : "i" (key), "i" (branch) : : l_yes); +@@ -52,7 +52,7 @@ static __always_inline bool arch_static_branch(struct static_key * const key, co + + static __always_inline bool arch_static_branch_jump(struct static_key * const key, const bool branch) + { +- asm_volatile_goto("1:" ++ asm goto("1:" + "jmp %l[l_yes]\n\t" + JUMP_TABLE_ENTRY + : : "i" (key), "i" (branch) : : l_yes); +diff --git a/arch/x86/include/asm/rmwcc.h b/arch/x86/include/asm/rmwcc.h +index 7fa611216417..1919ccf493cd 100644 +--- a/arch/x86/include/asm/rmwcc.h ++++ b/arch/x86/include/asm/rmwcc.h +@@ -18,7 +18,7 @@ + #define __GEN_RMWcc(fullop, _var, cc, clobbers, ...) 
\ + ({ \ + bool c = false; \ +- asm_volatile_goto (fullop "; j" #cc " %l[cc_label]" \ ++ asm goto (fullop "; j" #cc " %l[cc_label]" \ + : : [var] "m" (_var), ## __VA_ARGS__ \ + : clobbers : cc_label); \ + if (0) { \ +diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h +index 9d45aff76187..1c5513b04f03 100644 +--- a/arch/x86/include/asm/special_insns.h ++++ b/arch/x86/include/asm/special_insns.h +@@ -205,7 +205,7 @@ static inline void clwb(volatile void *__p) + #ifdef CONFIG_X86_USER_SHADOW_STACK + static inline int write_user_shstk_64(u64 __user *addr, u64 val) + { +- asm_volatile_goto("1: wrussq %[val], (%[addr])\n" ++ asm goto("1: wrussq %[val], (%[addr])\n" + _ASM_EXTABLE(1b, %l[fail]) + :: [addr] "r" (addr), [val] "r" (val) + :: fail); +diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h +index 34f500f9b014..3a7755c1a441 100644 +--- a/arch/x86/include/asm/uaccess.h ++++ b/arch/x86/include/asm/uaccess.h +@@ -133,7 +133,7 @@ extern int __get_user_bad(void); + + #ifdef CONFIG_X86_32 + #define __put_user_goto_u64(x, addr, label) \ +- asm_volatile_goto("\n" \ ++ asm goto("\n" \ + "1: movl %%eax,0(%1)\n" \ + "2: movl %%edx,4(%1)\n" \ + _ASM_EXTABLE_UA(1b, %l2) \ +@@ -295,7 +295,7 @@ do { \ + } while (0) + + #define __get_user_asm(x, addr, itype, ltype, label) \ +- asm_volatile_goto("\n" \ ++ asm_goto_output("\n" \ + "1: mov"itype" %[umem],%[output]\n" \ + _ASM_EXTABLE_UA(1b, %l2) \ + : [output] ltype(x) \ +@@ -375,7 +375,7 @@ do { \ + __typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \ + __typeof__(*(_ptr)) __old = *_old; \ + __typeof__(*(_ptr)) __new = (_new); \ +- asm_volatile_goto("\n" \ ++ asm_goto_output("\n" \ + "1: " LOCK_PREFIX "cmpxchg"itype" %[new], %[ptr]\n"\ + _ASM_EXTABLE_UA(1b, %l[label]) \ + : CC_OUT(z) (success), \ +@@ -394,7 +394,7 @@ do { \ + __typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \ + __typeof__(*(_ptr)) __old = *_old; \ + __typeof__(*(_ptr)) __new = (_new); \ +- 
asm_volatile_goto("\n" \ ++ asm_goto_output("\n" \ + "1: " LOCK_PREFIX "cmpxchg8b %[ptr]\n" \ + _ASM_EXTABLE_UA(1b, %l[label]) \ + : CC_OUT(z) (success), \ +@@ -477,7 +477,7 @@ struct __large_struct { unsigned long buf[100]; }; + * aliasing issues. + */ + #define __put_user_goto(x, addr, itype, ltype, label) \ +- asm_volatile_goto("\n" \ ++ asm goto("\n" \ + "1: mov"itype" %0,%1\n" \ + _ASM_EXTABLE_UA(1b, %l2) \ + : : ltype(x), "m" (__m(addr)) \ +diff --git a/arch/x86/kvm/svm/svm_ops.h b/arch/x86/kvm/svm/svm_ops.h +index 36c8af87a707..4e725854c63a 100644 +--- a/arch/x86/kvm/svm/svm_ops.h ++++ b/arch/x86/kvm/svm/svm_ops.h +@@ -8,7 +8,7 @@ + + #define svm_asm(insn, clobber...) \ + do { \ +- asm_volatile_goto("1: " __stringify(insn) "\n\t" \ ++ asm goto("1: " __stringify(insn) "\n\t" \ + _ASM_EXTABLE(1b, %l[fault]) \ + ::: clobber : fault); \ + return; \ +@@ -18,7 +18,7 @@ fault: \ + + #define svm_asm1(insn, op1, clobber...) \ + do { \ +- asm_volatile_goto("1: " __stringify(insn) " %0\n\t" \ ++ asm goto("1: " __stringify(insn) " %0\n\t" \ + _ASM_EXTABLE(1b, %l[fault]) \ + :: op1 : clobber : fault); \ + return; \ +@@ -28,7 +28,7 @@ fault: \ + + #define svm_asm2(insn, op1, op2, clobber...) 
\ + do { \ +- asm_volatile_goto("1: " __stringify(insn) " %1, %0\n\t" \ ++ asm goto("1: " __stringify(insn) " %1, %0\n\t" \ + _ASM_EXTABLE(1b, %l[fault]) \ + :: op1, op2 : clobber : fault); \ + return; \ +diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c +index 09fc5c6d13d7..bc94324bf778 100644 +--- a/arch/x86/kvm/vmx/vmx.c ++++ b/arch/x86/kvm/vmx/vmx.c +@@ -745,7 +745,7 @@ static int vmx_set_guest_uret_msr(struct vcpu_vmx *vmx, + */ + static int kvm_cpu_vmxoff(void) + { +- asm_volatile_goto("1: vmxoff\n\t" ++ asm goto("1: vmxoff\n\t" + _ASM_EXTABLE(1b, %l[fault]) + ::: "cc", "memory" : fault); + +@@ -2807,7 +2807,7 @@ static int kvm_cpu_vmxon(u64 vmxon_pointer) + + cr4_set_bits(X86_CR4_VMXE); + +- asm_volatile_goto("1: vmxon %[vmxon_pointer]\n\t" ++ asm goto("1: vmxon %[vmxon_pointer]\n\t" + _ASM_EXTABLE(1b, %l[fault]) + : : [vmxon_pointer] "m"(vmxon_pointer) + : : fault); +diff --git a/arch/x86/kvm/vmx/vmx_ops.h b/arch/x86/kvm/vmx/vmx_ops.h +index f41ce3c24123..8060e5fc6dbd 100644 +--- a/arch/x86/kvm/vmx/vmx_ops.h ++++ b/arch/x86/kvm/vmx/vmx_ops.h +@@ -94,7 +94,7 @@ static __always_inline unsigned long __vmcs_readl(unsigned long field) + + #ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT + +- asm_volatile_goto("1: vmread %[field], %[output]\n\t" ++ asm_goto_output("1: vmread %[field], %[output]\n\t" + "jna %l[do_fail]\n\t" + + _ASM_EXTABLE(1b, %l[do_exception]) +@@ -188,7 +188,7 @@ static __always_inline unsigned long vmcs_readl(unsigned long field) + + #define vmx_asm1(insn, op1, error_args...) \ + do { \ +- asm_volatile_goto("1: " __stringify(insn) " %0\n\t" \ ++ asm goto("1: " __stringify(insn) " %0\n\t" \ + ".byte 0x2e\n\t" /* branch not taken hint */ \ + "jna %l[error]\n\t" \ + _ASM_EXTABLE(1b, %l[fault]) \ +@@ -205,7 +205,7 @@ fault: \ + + #define vmx_asm2(insn, op1, op2, error_args...) 
\ + do { \ +- asm_volatile_goto("1: " __stringify(insn) " %1, %0\n\t" \ ++ asm goto("1: " __stringify(insn) " %1, %0\n\t" \ + ".byte 0x2e\n\t" /* branch not taken hint */ \ + "jna %l[error]\n\t" \ + _ASM_EXTABLE(1b, %l[fault]) \ +diff --git a/arch/xtensa/include/asm/jump_label.h b/arch/xtensa/include/asm/jump_label.h +index c812bf85021c..46c8596259d2 100644 +--- a/arch/xtensa/include/asm/jump_label.h ++++ b/arch/xtensa/include/asm/jump_label.h +@@ -13,7 +13,7 @@ + static __always_inline bool arch_static_branch(struct static_key *key, + bool branch) + { +- asm_volatile_goto("1:\n\t" ++ asm goto("1:\n\t" + "_nop\n\t" + ".pushsection __jump_table, \"aw\"\n\t" + ".word 1b, %l[l_yes], %c0\n\t" +@@ -38,7 +38,7 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, + * make it reachable and wrap both into a no-transform block + * to avoid any assembler interference with this. + */ +- asm_volatile_goto("1:\n\t" ++ asm goto("1:\n\t" + ".begin no-transform\n\t" + "_j %l[l_yes]\n\t" + "2:\n\t" +diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h +index 6fc88f0a056c..25d2b060cf50 100644 +--- a/include/linux/compiler-gcc.h ++++ b/include/linux/compiler-gcc.h +@@ -59,6 +59,25 @@ + */ + #define barrier_before_unreachable() asm volatile("") + ++/* ++ * GCC 'asm goto' with outputs miscompiles certain code sequences: ++ * ++ * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420 ++ * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422 ++ * ++ * Work it around via the same compiler barrier quirk that we used ++ * to use for the old 'asm goto' workaround. ++ * ++ * Also, always mark such 'asm goto' statements as volatile: all ++ * asm goto statements are supposed to be volatile as per the ++ * documentation, but some versions of gcc didn't actually do ++ * that for asms with outputs: ++ * ++ * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619 ++ */ ++#define asm_goto_output(x...) 
\ ++ do { asm volatile goto(x); asm (""); } while (0) ++ + #if defined(CONFIG_ARCH_USE_BUILTIN_BSWAP) + #define __HAVE_BUILTIN_BSWAP32__ + #define __HAVE_BUILTIN_BSWAP64__ +diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h +index 575898ac4daa..f84ef9014c98 100644 +--- a/include/linux/compiler_types.h ++++ b/include/linux/compiler_types.h +@@ -373,8 +373,8 @@ __no_sanitize_memory + #define __member_size(p) __builtin_object_size(p, 1) + #endif + +-#ifndef asm_volatile_goto +-#define asm_volatile_goto(x...) asm goto(x) ++#ifndef asm_goto_output ++#define asm_goto_output(x...) asm goto(x) + #endif + + #ifdef CONFIG_CC_HAS_ASM_INLINE +diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c +index cf5683afaf83..56a4deb276fb 100644 +--- a/net/netfilter/nft_set_pipapo_avx2.c ++++ b/net/netfilter/nft_set_pipapo_avx2.c +@@ -57,7 +57,7 @@ + + /* Jump to label if @reg is zero */ + #define NFT_PIPAPO_AVX2_NOMATCH_GOTO(reg, label) \ +- asm_volatile_goto("vptest %%ymm" #reg ", %%ymm" #reg ";" \ ++ asm goto("vptest %%ymm" #reg ", %%ymm" #reg ";" \ + "je %l[" #label "]" : : : : label) + + /* Store 256 bits from YMM register into memory. Contrary to bucket load +diff --git a/samples/bpf/asm_goto_workaround.h b/samples/bpf/asm_goto_workaround.h +index 7048bb3594d6..634e81d83efd 100644 +--- a/samples/bpf/asm_goto_workaround.h ++++ b/samples/bpf/asm_goto_workaround.h +@@ -4,14 +4,14 @@ + #define __ASM_GOTO_WORKAROUND_H + + /* +- * This will bring in asm_volatile_goto and asm_inline macro definitions ++ * This will bring in asm_goto_output and asm_inline macro definitions + * if enabled by compiler and config options. + */ + #include + +-#ifdef asm_volatile_goto +-#undef asm_volatile_goto +-#define asm_volatile_goto(x...) asm volatile("invalid use of asm_volatile_goto") ++#ifdef asm_goto_output ++#undef asm_goto_output ++#define asm_goto_output(x...) 
asm volatile("invalid use of asm_goto_output") + #endif + + /* +diff --git a/tools/arch/x86/include/asm/rmwcc.h b/tools/arch/x86/include/asm/rmwcc.h +index 11ff975242ca..e2ff22b379a4 100644 +--- a/tools/arch/x86/include/asm/rmwcc.h ++++ b/tools/arch/x86/include/asm/rmwcc.h +@@ -4,7 +4,7 @@ + + #define __GEN_RMWcc(fullop, var, cc, ...) \ + do { \ +- asm_volatile_goto (fullop "; j" cc " %l[cc_label]" \ ++ asm goto (fullop "; j" cc " %l[cc_label]" \ + : : "m" (var), ## __VA_ARGS__ \ + : "memory" : cc_label); \ + return 0; \ +diff --git a/tools/include/linux/compiler_types.h b/tools/include/linux/compiler_types.h +index 1bdd834bdd57..d09f9dc172a4 100644 +--- a/tools/include/linux/compiler_types.h ++++ b/tools/include/linux/compiler_types.h +@@ -36,8 +36,8 @@ + #include + #endif + +-#ifndef asm_volatile_goto +-#define asm_volatile_goto(x...) asm goto(x) ++#ifndef asm_goto_output ++#define asm_goto_output(x...) asm goto(x) + #endif + + #endif /* __LINUX_COMPILER_TYPES_H */ +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1692-init-kconfig-fix-cc-has-asm-goto-tied-output-test-with-dash.patch b/SOURCES/1692-init-kconfig-fix-cc-has-asm-goto-tied-output-test-with-dash.patch new file mode 100644 index 000000000..f593068f2 --- /dev/null +++ b/SOURCES/1692-init-kconfig-fix-cc-has-asm-goto-tied-output-test-with-dash.patch @@ -0,0 +1,52 @@ +From 2c488a61ac36c5fdeac896b4dacfc1169d3a02f6 Mon Sep 17 00:00:00 2001 +From: Waiman Long +Date: Fri, 8 May 2026 20:06:07 -0400 +Subject: [PATCH] init/Kconfig: fix CC_HAS_ASM_GOTO_TIED_OUTPUT test with dash +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +JIRA: https://redhat.atlassian.net/browse/RHEL-166727 + +commit 534bd70374d646f17e2cebe0e6e4cdd478ce4f0c +Author: Alexandre Belloni +Date: Tue, 15 Nov 2022 12:01:58 +0100 + + init/Kconfig: fix CC_HAS_ASM_GOTO_TIED_OUTPUT test with dash + + When using dash as /bin/sh, the CC_HAS_ASM_GOTO_TIED_OUTPUT test fails + with a syntax error which is not 
the one we are looking for: + + : In function ‘foo’: + :1:29: warning: missing terminating " character + :1:29: error: missing terminating " character + :2:5: error: expected ‘:’ before ‘+’ token + :2:7: warning: missing terminating " character + :2:7: error: missing terminating " character + :2:5: error: expected declaration or statement at end of input + + Removing '\n' solves this. + + Fixes: 1aa0e8b144b6 ("Kconfig: Add option for asm goto w/ tied outputs to workaround clang-13 bug") + Signed-off-by: Alexandre Belloni + Reviewed-by: Sean Christopherson + Signed-off-by: Masahiro Yamada + +Signed-off-by: Waiman Long + +diff --git a/init/Kconfig b/init/Kconfig +index c2b4633fb6ad..f7f1675bb4db 100644 +--- a/init/Kconfig ++++ b/init/Kconfig +@@ -76,7 +76,7 @@ config CC_HAS_ASM_GOTO_OUTPUT + config CC_HAS_ASM_GOTO_TIED_OUTPUT + depends on CC_HAS_ASM_GOTO_OUTPUT + # Detect buggy gcc and clang, fixed in gcc-11 clang-14. +- def_bool $(success,echo 'int foo(int *x) { asm goto (".long (%l[bar]) - .\n": "+m"(*x) ::: bar); return *x; bar: return 0; }' | $CC -x c - -c -o /dev/null) ++ def_bool $(success,echo 'int foo(int *x) { asm goto (".long (%l[bar]) - .": "+m"(*x) ::: bar); return *x; bar: return 0; }' | $CC -x c - -c -o /dev/null) + + config TOOLS_SUPPORT_RELR + def_bool $(success,env "CC=$(CC)" "LD=$(LD)" "NM=$(NM)" "OBJCOPY=$(OBJCOPY)" $(srctree)/scripts/tools-support-relr.sh) +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1693-update-workarounds-for-gcc-asm-goto-issue.patch b/SOURCES/1693-update-workarounds-for-gcc-asm-goto-issue.patch new file mode 100644 index 000000000..9f9ddc025 --- /dev/null +++ b/SOURCES/1693-update-workarounds-for-gcc-asm-goto-issue.patch @@ -0,0 +1,127 @@ +From 0e2c7308f190cc8605b796f87e4cbdd3a2a57004 Mon Sep 17 00:00:00 2001 +From: Waiman Long +Date: Fri, 8 May 2026 20:06:35 -0400 +Subject: [PATCH] update workarounds for gcc "asm goto" issue + +JIRA: https://redhat.atlassian.net/browse/RHEL-166727 + +commit 
68fb3ca0e408e00db1c3f8fccdfa19e274c033be +Author: Linus Torvalds +Date: Thu, 15 Feb 2024 11:14:33 -0800 + + update workarounds for gcc "asm goto" issue + + In commit 4356e9f841f7 ("work around gcc bugs with 'asm goto' with + outputs") I did the gcc workaround unconditionally, because the cause of + the bad code generation wasn't entirely clear. + + In the meantime, Jakub Jelinek debugged the issue, and has come up with + a fix in gcc [2], which also got backported to the still maintained + branches of gcc-11, gcc-12 and gcc-13. + + Note that while the fix technically wasn't in the original gcc-14 + branch, Jakub says: + + "while it is true that no GCC 14 snapshots until today (or whenever the + fix will be committed) have the fix, for GCC trunk it is up to the + distros to use the latest snapshot if they use it at all and would + allow better testing of the kernel code without the workaround, so + that if there are other issues they won't be discovered years later. + Most userland code doesn't actually use asm goto with outputs..." + + so we will consider gcc-14 to be fixed - if somebody is using gcc + snapshots of the gcc-14 before the fix, they should upgrade. + + Note that while the bug goes back to gcc-11, in practice other gcc + changes seem to have effectively hidden it since gcc-12.1 as per a + bisect by Jakub. So even a gcc-14 snapshot without the fix likely + doesn't show actual problems. + + Also, make the default 'asm_goto_output()' macro mark the asm as + volatile by hand, because of an unrelated gcc issue [1] where it doesn't + match the documented behavior ("asm goto is always volatile"). 
+ + Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103979 [1] + Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 [2] + Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/ + Requested-by: Jakub Jelinek + Cc: Uros Bizjak + Cc: Nick Desaulniers + Cc: Sean Christopherson + Cc: Andrew Pinski + Signed-off-by: Linus Torvalds + +Signed-off-by: Waiman Long + +diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h +index 25d2b060cf50..77f1bedf0290 100644 +--- a/include/linux/compiler-gcc.h ++++ b/include/linux/compiler-gcc.h +@@ -62,10 +62,9 @@ + /* + * GCC 'asm goto' with outputs miscompiles certain code sequences: + * +- * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420 +- * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422 ++ * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 + * +- * Work it around via the same compiler barrier quirk that we used ++ * Work around it via the same compiler barrier quirk that we used + * to use for the old 'asm goto' workaround. + * + * Also, always mark such 'asm goto' statements as volatile: all +@@ -75,8 +74,10 @@ + * + * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619 + */ ++#ifdef CONFIG_GCC_ASM_GOTO_OUTPUT_WORKAROUND + #define asm_goto_output(x...) \ + do { asm volatile goto(x); asm (""); } while (0) ++#endif + + #if defined(CONFIG_ARCH_USE_BUILTIN_BSWAP) + #define __HAVE_BUILTIN_BSWAP32__ +diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h +index f84ef9014c98..1f405d334c61 100644 +--- a/include/linux/compiler_types.h ++++ b/include/linux/compiler_types.h +@@ -373,8 +373,15 @@ __no_sanitize_memory + #define __member_size(p) __builtin_object_size(p, 1) + #endif + ++/* ++ * Some versions of gcc do not mark 'asm goto' volatile: ++ * ++ * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103979 ++ * ++ * We do it here by hand, because it doesn't hurt. ++ */ + #ifndef asm_goto_output +-#define asm_goto_output(x...) 
asm goto(x) ++#define asm_goto_output(x...) asm volatile goto(x) + #endif + + #ifdef CONFIG_CC_HAS_ASM_INLINE +diff --git a/init/Kconfig b/init/Kconfig +index f7f1675bb4db..bbfb726a6522 100644 +--- a/init/Kconfig ++++ b/init/Kconfig +@@ -78,6 +78,15 @@ config CC_HAS_ASM_GOTO_TIED_OUTPUT + # Detect buggy gcc and clang, fixed in gcc-11 clang-14. + def_bool $(success,echo 'int foo(int *x) { asm goto (".long (%l[bar]) - .": "+m"(*x) ::: bar); return *x; bar: return 0; }' | $CC -x c - -c -o /dev/null) + ++config GCC_ASM_GOTO_OUTPUT_WORKAROUND ++ bool ++ depends on CC_IS_GCC && CC_HAS_ASM_GOTO_OUTPUT ++ # Fixed in GCC 14, 13.3, 12.4 and 11.5 ++ # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 ++ default y if GCC_VERSION < 110500 ++ default y if GCC_VERSION >= 120000 && GCC_VERSION < 120400 ++ default y if GCC_VERSION >= 130000 && GCC_VERSION < 130300 ++ + config TOOLS_SUPPORT_RELR + def_bool $(success,env "CC=$(CC)" "LD=$(LD)" "NM=$(NM)" "OBJCOPY=$(OBJCOPY)" $(srctree)/scripts/tools-support-relr.sh) + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1694-init-kconfig-remove-config-gcc-asm-goto-output-workaround.patch b/SOURCES/1694-init-kconfig-remove-config-gcc-asm-goto-output-workaround.patch new file mode 100644 index 000000000..6df1a30e8 --- /dev/null +++ b/SOURCES/1694-init-kconfig-remove-config-gcc-asm-goto-output-workaround.patch @@ -0,0 +1,257 @@ +From 37170aeb1d2b90bd97b9a1a213e37a7da303ad18 Mon Sep 17 00:00:00 2001 +From: Waiman Long +Date: Fri, 8 May 2026 20:07:10 -0400 +Subject: [PATCH] init/Kconfig: remove CONFIG_GCC_ASM_GOTO_OUTPUT_WORKAROUND + +JIRA: https://redhat.atlassian.net/browse/RHEL-166727 +Conflicts: + A context diff in the include/linux/compiler-gcc.h hunk due to + missing upstream commit 173a3efd3edb ("bug.h: work around GCC PR82365 + in BUG()"). 
+ +commit f2f6a8e8871725035959b90bac048cde555aa0e9 +Author: Mark Rutland +Date: Thu, 18 Jul 2024 13:06:47 +0100 + + init/Kconfig: remove CONFIG_GCC_ASM_GOTO_OUTPUT_WORKAROUND + + Several versions of GCC mis-compile asm goto with outputs. We try to + workaround this, but our workaround is demonstrably incomplete and + liable to result in subtle bugs, especially on arm64 where get_user() + has recently been moved over to using asm goto with outputs. + + From discussion(s) with Linus at: + + https://lore.kernel.org/linux-arm-kernel/Zpfv2tnlQ-gOLGac@J2N7QTR9R3.cambridge.arm.com/ + https://lore.kernel.org/linux-arm-kernel/ZpfxLrJAOF2YNqCk@J2N7QTR9R3.cambridge.arm.com/ + + ... it sounds like the best thing to do for now is to remove the + workaround and make CC_HAS_ASM_GOTO_OUTPUT depend on working compiler + versions. + + The issue was originally reported to GCC by Sean Christopherson: + + https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 + + ... and Jakub Jelinek fixed this for GCC 14, with the fix backported to + 13.3.0, 12.4.0, and 11.5.0. + + In the kernel, we tried to workaround broken compilers in commits: + + 4356e9f841f7 ("work around gcc bugs with 'asm goto' with outputs") + 68fb3ca0e408 ("update workarounds for gcc "asm goto" issue") + + ... but the workaround of adding an empty asm("") after the asm volatile + goto(...) demonstrably does not always avoid the problem, as can be seen + in the following test case: + + | #define asm_goto_output(x...) 
\ + | do { asm volatile goto(x); asm (""); } while (0) + | + | #define __good_or_bad(__val, __key) \ + | do { \ + | __label__ __failed; \ + | unsigned long __tmp; \ + | asm_goto_output( \ + | " cbnz %[key], %l[__failed]\n" \ + | " mov %[val], #0x900d\n" \ + | : [val] "=r" (__tmp) \ + | : [key] "r" (__key) \ + | : \ + | : __failed); \ + | (__val) = __tmp; \ + | break; \ + | __failed: \ + | (__val) = 0xbad; \ + | } while (0) + | + | unsigned long get_val(unsigned long key); + | unsigned long get_val(unsigned long key) + | { + | unsigned long val = 0xbad; + | + | __good_or_bad(val, key); + | + | return val; + | } + + GCC 13.2.0 (at -O2) compiles this to: + + | cbnz x0, .Lfailed + | mov x0, #0x900d + | .Lfailed: + | ret + + GCC 14.1.0 (at -O2) compiles this to: + + | cbnz x0, .Lfailed + | mov x0, #0x900d + | ret + | .Lfailed: + | mov x0, #0xbad + | ret + + Note that GCC 13.2.0 erroneously omits the assignment to 'val' in the + error path (even though this does not depend on an output of the asm + goto). GCC 14.1.0 correctly retains the assignment. + + This problem can be seen within the kernel with the following test case: + + | #include + | #include + | + | noinline unsigned long test_unsafe_get_user(unsigned long __user *ptr); + | noinline unsigned long test_unsafe_get_user(unsigned long __user *ptr) + | { + | unsigned long val; + | + | unsafe_get_user(val, ptr, Efault); + | return val; + | + | Efault: + | val = 0x900d; + | return val; + | } + + GCC 13.2.0 (arm64 defconfig) compiles this to: + + | and x0, x0, #0xff7fffffffffffff + | ldtr x0, [x0] + | .Lextable_fixup: + | ret + + GCC 13.2.0 (x86_64 defconfig + MITIGATION_RETPOLINE=n) compiles this to: + + | endbr64 + | mov (%rdi),%rax + | .Lextable_fixup: + | ret + + ... omitting the assignment to 'val' in the error path, and leaving + garbage in the result register returned by the function (which happens + to contain the faulting address in the generated code). 
+ + GCC 14.1.0 (arm64 defconfig) compiles this to: + + | and x0, x0, #0xff7fffffffffffff + | ldtr x0, [x0] + | ret + | .Lextable_fixup: + | mov x0, #0x900d // #36877 + | ret + + GCC 14.1.0 (x86_64 defconfig + MITIGATION_RETPOLINE=n) compiles this to: + + | endbr64 + | mov (%rdi),%rax + | ret + | .Lextable_fixup: + | mov $0x900d,%eax + | ret + + ... retaining the expected assignment to 'val' in the error path. + + We don't have a complete and reasonable workaround. While placing empty + asm("") blocks after each goto label *might* be sufficient, we don't + know for certain, this is tedious and error-prone, and there doesn't + seem to be a neat way to wrap this up (which is especially painful for + cases with multiple goto labels). + + Avoid this issue by disabling CONFIG_CC_HAS_ASM_GOTO_OUTPUT for + known-broken compiler versions and removing the workaround (along with + the CONFIG_GCC_ASM_GOTO_OUTPUT_WORKAROUND config option). + + For the moment I've left the default implementation of asm_goto_output() + unchanged. This should now be redundant since any compiler with the fix + for the clobbering issue whould also have a fix for the (earlier) + volatile issue, but it's far less churny to leave it around, which makes + it easier to backport this patch if necessary. 
+ + Signed-off-by: Mark Rutland + Cc: Alex Coplan + Cc: Catalin Marinas + Cc: Jakub Jelinek + Cc: Peter Zijlstra + Cc: Sean Christopherson + Cc: Szabolcs Nagy + Cc: Will Deacon + Cc: linux-arm-kernel@lists.infradead.org + Cc: linux-kernel@vger.kernel.org + Signed-off-by: Linus Torvalds + +Signed-off-by: Waiman Long + +diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h +index 77f1bedf0290..6fc88f0a056c 100644 +--- a/include/linux/compiler-gcc.h ++++ b/include/linux/compiler-gcc.h +@@ -59,26 +59,6 @@ + */ + #define barrier_before_unreachable() asm volatile("") + +-/* +- * GCC 'asm goto' with outputs miscompiles certain code sequences: +- * +- * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 +- * +- * Work around it via the same compiler barrier quirk that we used +- * to use for the old 'asm goto' workaround. +- * +- * Also, always mark such 'asm goto' statements as volatile: all +- * asm goto statements are supposed to be volatile as per the +- * documentation, but some versions of gcc didn't actually do +- * that for asms with outputs: +- * +- * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619 +- */ +-#ifdef CONFIG_GCC_ASM_GOTO_OUTPUT_WORKAROUND +-#define asm_goto_output(x...) 
\ +- do { asm volatile goto(x); asm (""); } while (0) +-#endif +- + #if defined(CONFIG_ARCH_USE_BUILTIN_BSWAP) + #define __HAVE_BUILTIN_BSWAP32__ + #define __HAVE_BUILTIN_BSWAP64__ +diff --git a/init/Kconfig b/init/Kconfig +index bbfb726a6522..df5a34169f22 100644 +--- a/init/Kconfig ++++ b/init/Kconfig +@@ -70,23 +70,25 @@ config CC_CAN_LINK_STATIC + default $(success,$(srctree)/scripts/cc-can-link.sh $(CC) $(CLANG_FLAGS) $(USERCFLAGS) $(USERLDFLAGS) $(m64-flag) -static) if 64BIT + default $(success,$(srctree)/scripts/cc-can-link.sh $(CC) $(CLANG_FLAGS) $(USERCFLAGS) $(USERLDFLAGS) $(m32-flag) -static) + ++# Fixed in GCC 14, 13.3, 12.4 and 11.5 ++# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 ++config GCC_ASM_GOTO_OUTPUT_BROKEN ++ bool ++ depends on CC_IS_GCC ++ default y if GCC_VERSION < 110500 ++ default y if GCC_VERSION >= 120000 && GCC_VERSION < 120400 ++ default y if GCC_VERSION >= 130000 && GCC_VERSION < 130300 ++ + config CC_HAS_ASM_GOTO_OUTPUT +- def_bool $(success,echo 'int foo(int x) { asm goto ("": "=r"(x) ::: bar); return x; bar: return 0; }' | $(CC) -x c - -c -o /dev/null) ++ def_bool y ++ depends on !GCC_ASM_GOTO_OUTPUT_BROKEN ++ depends on $(success,echo 'int foo(int x) { asm goto ("": "=r"(x) ::: bar); return x; bar: return 0; }' | $(CC) -x c - -c -o /dev/null) + + config CC_HAS_ASM_GOTO_TIED_OUTPUT + depends on CC_HAS_ASM_GOTO_OUTPUT + # Detect buggy gcc and clang, fixed in gcc-11 clang-14. 
+ def_bool $(success,echo 'int foo(int *x) { asm goto (".long (%l[bar]) - .": "+m"(*x) ::: bar); return *x; bar: return 0; }' | $CC -x c - -c -o /dev/null) + +-config GCC_ASM_GOTO_OUTPUT_WORKAROUND +- bool +- depends on CC_IS_GCC && CC_HAS_ASM_GOTO_OUTPUT +- # Fixed in GCC 14, 13.3, 12.4 and 11.5 +- # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 +- default y if GCC_VERSION < 110500 +- default y if GCC_VERSION >= 120000 && GCC_VERSION < 120400 +- default y if GCC_VERSION >= 130000 && GCC_VERSION < 130300 +- + config TOOLS_SUPPORT_RELR + def_bool $(success,env "CC=$(CC)" "LD=$(LD)" "NM=$(NM)" "OBJCOPY=$(OBJCOPY)" $(srctree)/scripts/tools-support-relr.sh) + +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1695-rdma-mlx5-fix-error-path-fall-through-in-mlx5-ib-dev-res-srq.patch b/SOURCES/1695-rdma-mlx5-fix-error-path-fall-through-in-mlx5-ib-dev-res-srq.patch new file mode 100644 index 000000000..8241b237b --- /dev/null +++ b/SOURCES/1695-rdma-mlx5-fix-error-path-fall-through-in-mlx5-ib-dev-res-srq.patch @@ -0,0 +1,53 @@ +From 7fe755dde8aa5a969850638df9f39e30c87631ac Mon Sep 17 00:00:00 2001 +From: CKI Backport Bot +Date: Thu, 28 May 2026 15:40:52 +0000 +Subject: [PATCH] RDMA/mlx5: Fix error path fall-through in + mlx5_ib_dev_res_srq_init() + +JIRA: https://redhat.atlassian.net/browse/RHEL-179997 +CVE: CVE-2026-46176 +Backported from tree(s): linux + +commit c488df06bd552bb8b6e14fa0cfd5ad986c6e9525 +Author: Junrui Luo +Date: Fri Apr 24 13:51:02 2026 +0800 + + RDMA/mlx5: Fix error path fall-through in mlx5_ib_dev_res_srq_init() + + mlx5_ib_dev_res_srq_init() allocates two SRQs, s0 and s1. When + ib_create_srq() fails for s1, the error branch destroys s0 but falls + through and unconditionally assigns the freed s0 and the ERR_PTR s1 to + devr->s0 and devr->s1. 
+ + This leads to several problems: the lock-free fast path checks + "if (devr->s1) return 0;" and treats the ERR_PTR as already initialised; + users in mlx5_ib_create_qp() dereference the freed SRQ or ERR_PTR via + to_msrq(devr->s0)->msrq.srqn; and mlx5_ib_dev_res_cleanup() dereferences + the ERR_PTR and double-frees s0 on teardown. + + Fix by adding the same `goto unlock` in the s1 failure path. + + Cc: stable@vger.kernel.org + Fixes: 5895e70f2e6e ("IB/mlx5: Allocate resources just before first QP/SRQ is created") + Link: https://patch.msgid.link/r/SYBPR01MB7881E1E0970268BD69C0BA75AF2B2@SYBPR01MB7881.ausprd01.prod.outlook.com + Reported-by: Yuhao Jiang + Signed-off-by: Junrui Luo + Signed-off-by: Jason Gunthorpe + +Signed-off-by: CKI Backport Bot + +diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c +index dc2c5cc47860..36f06ea8a680 100644 +--- a/drivers/infiniband/hw/mlx5/main.c ++++ b/drivers/infiniband/hw/mlx5/main.c +@@ -3122,6 +3122,7 @@ int mlx5_ib_dev_res_srq_init(struct mlx5_ib_dev *dev) + "Couldn't create SRQ 1 for res init, err=%pe\n", + s1); + ib_destroy_srq(s0); ++ goto unlock; + } + + devr->s0 = s0; +-- +2.50.1 (Apple Git-155) + diff --git a/SPECS/kernel.spec b/SPECS/kernel.spec index 572657858..977df49d9 100644 --- a/SPECS/kernel.spec +++ b/SPECS/kernel.spec @@ -176,13 +176,13 @@ Summary: The Linux kernel # define buildid .local %define specversion 5.14.0 %define patchversion 5.14 -%define pkgrelease 687.17.1 +%define pkgrelease 687.19.1 %define kversion 5 %define tarfile_release 5.14.0-687.5.1.el9_8 # This is needed to do merge window version magic %define patchlevel 14 # This allows pkg_release to have configurable %%{?dist} tag -%define specrelease 687.17.1%{?buildid}%{?dist} +%define specrelease 687.19.1%{?buildid}%{?dist} # This defines the kabi tarball version %define kabiversion 5.14.0-687.5.1.el9_8 @@ -1190,6 +1190,389 @@ Patch1309: 1309-scsi-qla2xxx-completely-fix-fcport-double-free.patch Patch1310: 
1310-rbd-eliminate-a-race-in-lock-dwork-draining-on-unmap.patch Patch1311: 1311-net-sched-fix-pedit-partial-cow-leading-to-page-cache-corrup.patch Patch1312: 1312-kvm-arm64-vgic-its-drop-translation-cache-ref-only-for-eras.patch +Patch1313: 1313-netfilter-flowtable-strictly-check-for-maximum-number-of-act.patch +Patch1314: 1314-drm-amd-display-do-not-skip-unrelated-mode-changes-in-dsc-va.patch +Patch1315: 1315-ipv6-icmp-clear-skb2-cb-in-ip6-err-gen-icmpv6-unreach.patch +Patch1316: 1316-alsa-aloop-fix-peer-runtime-uaf-during-format-change-stop.patch +Patch1317: 1317-rdma-iwcm-fix-workqueue-list-corruption-by-removing-work-lis.patch +Patch1318: 1318-binder-use-cred-instead-of-task-for-selinux-checks.patch +Patch1319: 1319-locks-fix-toctou-race-when-granting-write-lease.patch +Patch1320: 1320-fs-use-a-helper-for-opening-kernel-internal-files.patch +Patch1321: 1321-fs-move-kmem-cache-zalloc-into-alloc-empty-file-helpers.patch +Patch1322: 1322-fs-use-backing-file-container-for-internal-files-with-fake-f.patch +Patch1323: 1323-ovl-enable-fsnotify-events-on-underlying-real-files.patch +Patch1324: 1324-fs-move-cleanup-from-init-file-into-its-callers.patch +Patch1325: 1325-lsm-constify-the-file-parameter-in-security-binder-transfer-.patch +Patch1326: 1326-cachefiles-use-kiocb-start-end-write-helpers.patch +Patch1327: 1327-fs-fix-kernel-doc-warnings.patch +Patch1328: 1328-fs-rename-mnt-want-drop-write-helpers.patch +Patch1329: 1329-fs-get-mnt-writers-count-for-an-open-backing-file-s-real-pat.patch +Patch1330: 1330-fs-create-helper-file-user-path-for-user-displayed-mapped-fi.patch +Patch1331: 1331-fs-store-real-path-instead-of-fake-path-in-backing-file-f-pa.patch +Patch1332: 1332-fs-prepare-for-stackable-filesystems-backing-file-helpers.patch +Patch1333: 1333-fs-factor-out-backing-file-read-write-iter-helpers.patch +Patch1334: 1334-fs-factor-out-backing-file-splice-read-write-helpers.patch +Patch1335: 1335-fs-factor-out-backing-file-mmap-helper.patch +Patch1336: 
1336-lsm-add-helper-for-blob-allocations.patch +Patch1337: 1337-ovl-fix-nested-backing-file-paths.patch +Patch1338: 1338-fs-constify-file-ptr-in-backing-file-accessor-helpers.patch +Patch1339: 1339-ovl-remove-unneeded-non-const-conversion.patch +Patch1340: 1340-ovl-remove-redundant-iocb-dio-caller-comp-clearing.patch +Patch1341: 1341-perf-core-fix-mmap-event-path-names-with-backing-files.patch +Patch1342: 1342-fs-prepare-for-adding-lsm-blob-to-backing-file.patch +Patch1343: 1343-lsm-add-backing-file-lsm-hooks.patch +Patch1344: 1344-selinux-fix-overlayfs-mmap-and-mprotect-access-checks.patch +Patch1345: 1345-selinux-rhel-only-hotfix-for-execmem-regression.patch +Patch1346: 1346-net-mlx5-hws-fix-matcher-action-template-attach.patch +Patch1347: 1347-net-mlx5-hws-remove-unused-element-array.patch +Patch1348: 1348-net-mlx5-hws-make-pool-single-resource.patch +Patch1349: 1349-net-mlx5-hws-refactor-pool-implementation.patch +Patch1350: 1350-net-mlx5-hws-cleanup-after-pool-refactoring.patch +Patch1351: 1351-net-mlx5-hws-add-fullness-tracking-to-pool.patch +Patch1352: 1352-net-mlx5-hws-fix-pool-size-optimization.patch +Patch1353: 1353-net-mlx5-hws-implement-action-ste-pool.patch +Patch1354: 1354-net-mlx5-hws-use-the-new-action-ste-pool.patch +Patch1355: 1355-net-mlx5-hws-cleanup-matcher-action-ste-table.patch +Patch1356: 1356-net-mlx5-hws-free-unused-action-ste-tables.patch +Patch1357: 1357-net-mlx5-hws-export-action-ste-tables-to-debugfs.patch +Patch1358: 1358-net-mlx5e-ethtool-fix-formatting-of-ptp-rq0-csum-complete-ta.patch +Patch1359: 1359-net-mlx5-fix-spelling-mistakes-in-mlx5-core-dbg-message-and-.patch +Patch1360: 1360-net-mlx5-hws-fix-ip-version-decision.patch +Patch1361: 1361-net-mlx5-hws-harden-ip-version-definer-checks.patch +Patch1362: 1362-net-mlx5-hws-disallow-matcher-ip-version-mixing.patch +Patch1363: 1363-rdma-mlx5-fix-error-flow-upon-firmware-failure-for-rq-destru.patch +Patch1364: 1364-net-mlx5-support-software-tx-timestamp.patch +Patch1365: 
1365-net-mlx5-hws-expose-function-mlx5hws-table-ft-set-next-ft-in.patch +Patch1366: 1366-net-mlx5-hws-add-definer-function-to-get-field-name-str.patch +Patch1367: 1367-net-mlx5-hws-expose-polling-function-in-header-file.patch +Patch1368: 1368-net-mlx5-hws-introduce-isolated-matchers.patch +Patch1369: 1369-net-mlx5-hws-support-complex-matchers.patch +Patch1370: 1370-net-mlx5-hws-force-rehash-when-rule-insertion-failed.patch +Patch1371: 1371-net-mlx5-hws-fix-counting-of-rules-in-the-matcher.patch +Patch1372: 1372-net-mlx5-hws-fix-redundant-extension-of-action-templates.patch +Patch1373: 1373-net-mlx5-hws-rework-rehash-loop.patch +Patch1374: 1374-net-mlx5-hws-dump-bad-completion-details.patch +Patch1375: 1375-net-mlx5-use-to-delayed-work.patch +Patch1376: 1376-net-mlx5-sws-fix-reformat-id-error-handling.patch +Patch1377: 1377-net-mlx5-hws-register-reformat-actions-with-fw.patch +Patch1378: 1378-net-mlx5-hws-fix-typo-nope-to-nop.patch +Patch1379: 1379-net-mlx5-hws-handle-modify-header-actions-dependency.patch +Patch1380: 1380-net-mlx5-core-add-error-handling-inmlx5-query-nic-vport-qkey.patch +Patch1381: 1381-net-mlx5e-allow-setting-mac-address-of-representors.patch +Patch1382: 1382-net-mlx5-add-error-handling-in-mlx5-query-nic-vport-node-gui.patch +Patch1383: 1383-net-mlx5-hws-fix-an-error-code-in-mlx5hws-bwc-rule-create-co.patch +Patch1384: 1384-net-mlx5-ensure-fw-pages-are-always-allocated-on-same-numa.patch +Patch1385: 1385-net-mlx5-fix-return-value-when-searching-for-existing-flow-g.patch +Patch1386: 1386-net-mlx5-hws-init-mutex-on-the-correct-path.patch +Patch1387: 1387-net-mlx5-hws-fix-missing-ip-version-handling-in-definer.patch +Patch1388: 1388-net-mlx5-hws-make-sure-the-uplink-is-the-last-destination.patch +Patch1389: 1389-net-mlx5e-fix-leak-of-geneve-tlv-option-object.patch +Patch1390: 1390-net-mlx5-hws-add-error-checking-to-hws-bwc-rule-complex-hash.patch +Patch1391: 1391-net-mlx5e-fix-race-between-dim-disable-and-net-dim.patch +Patch1392: 
1392-net-mlx5e-add-new-prio-for-promiscuous-mode.patch +Patch1393: 1393-net-mlx5-correctly-set-gso-size-when-lro-is-used.patch +Patch1394: 1394-net-mlx5-fix-memory-leak-in-cmd-exec.patch +Patch1395: 1395-net-mlx5-e-switch-fix-peer-miss-rules-to-use-peer-eswitch.patch +Patch1396: 1396-rdma-mlx5-convert-timeouts-to-secs-to-jiffies.patch +Patch1397: 1397-rdma-mlx5-remove-the-redundant-mlx5-ib-stage-uar-stage.patch +Patch1398: 1398-rdma-mlx5-add-support-for-200gbps-per-lane-speeds.patch +Patch1399: 1399-rdma-mlx5-avoid-flexible-array-warning.patch +Patch1400: 1400-rdma-mlx5-initialize-obj-event-obj-sub-list-before-xa-insert.patch +Patch1401: 1401-rdma-mlx5-fix-hw-counters-query-for-non-representor-devices.patch +Patch1402: 1402-rdma-mlx5-fix-cc-counters-query-for-mpv.patch +Patch1403: 1403-rdma-mlx5-fix-vport-loopback-for-mpv-device.patch +Patch1404: 1404-net-mlx5-expose-serial-numbers-in-devlink-info.patch +Patch1405: 1405-net-mlx5e-shampo-reorganize-mlx5-rq-shampo-alloc.patch +Patch1406: 1406-net-mlx5e-shampo-remove-redundant-params.patch +Patch1407: 1407-net-mlx5e-shampo-improve-hw-gro-capability-checking.patch +Patch1408: 1408-net-mlx5e-shampo-separate-pool-for-headers.patch +Patch1409: 1409-net-mlx5e-implement-queue-mgmt-ops-and-single-channel-swap.patch +Patch1410: 1410-net-mlx5e-support-ethtool-tcp-data-split-settings.patch +Patch1411: 1411-net-mlx5-fs-add-multiple-prios-to-rdma-transport-steering-do.patch +Patch1412: 1412-net-mlx5-small-refactor-for-general-object-capabilities.patch +Patch1413: 1413-net-mlx5-add-ifc-bits-for-pcie-congestion-event-object.patch +Patch1414: 1414-rdma-mlx5-allocate-ib-device-with-net-namespace-supplied-fro.patch +Patch1415: 1415-net-mlx5e-fix-error-handling-in-rq-memory-model-registration.patch +Patch1416: 1416-net-mlx5-fs-fix-rdma-transport-init-cleanup-flow.patch +Patch1417: 1417-net-mlx5-check-device-memory-pointer-before-usage.patch +Patch1418: 1418-net-mlx5-add-no-op-implementation-for-setting-tc-bw-on-rate-.patch +Patch1419: 
1419-net-mlx5-add-support-for-setting-tc-bw-on-nodes.patch +Patch1420: 1420-net-mlx5-add-traffic-class-scheduling-support-for-vport-qos.patch +Patch1421: 1421-net-mlx5-manage-tc-arbiter-nodes-and-implement-full-support-.patch +Patch1422: 1422-net-mlx5-hws-remove-unused-create-dest-array-parameter.patch +Patch1423: 1423-net-mlx5-hws-remove-incorrect-comment.patch +Patch1424: 1424-net-mlx5-hws-export-rule-skip-logic.patch +Patch1425: 1425-net-mlx5-hws-refactor-rule-skip-logic.patch +Patch1426: 1426-net-mlx5-hws-create-stes-directly-from-matcher.patch +Patch1427: 1427-net-mlx5-hws-decouple-matcher-rx-and-tx-sizes.patch +Patch1428: 1428-net-mlx5-hws-track-matcher-sizes-individually.patch +Patch1429: 1429-net-mlx5-hws-rearrange-to-prevent-forward-declaration.patch +Patch1430: 1430-net-mlx5-hws-shrink-empty-matchers.patch +Patch1431: 1431-net-mlx5-add-hws-as-secondary-steering-mode.patch +Patch1432: 1432-net-mlx5-fix-spelling-mistake-disabliing-disabling.patch +Patch1433: 1433-eth-mlx5-migrate-to-the-rxfh-context-ops.patch +Patch1434: 1434-net-mlx5e-remove-unused-vlan-insertion-logic-in-tx-path.patch +Patch1435: 1435-net-mlx5e-ct-extract-a-memcmp-from-a-spinlock-section.patch +Patch1436: 1436-net-mlx5e-replace-recursive-vlan-push-handling-with-an-itera.patch +Patch1437: 1437-net-mlx5-warn-when-write-combining-is-not-supported.patch +Patch1438: 1438-net-mlx5e-rx-remove-unnecessary-rqt-redirects.patch +Patch1439: 1439-net-mlx5-expose-hca-capability-bits-for-mkey-max-page-size.patch +Patch1440: 1440-rdma-mlx5-fix-umr-modifying-of-mkey-page-size.patch +Patch1441: 1441-net-mlx5-expose-disciplined-fr-counter-through-hca-capabilit.patch +Patch1442: 1442-net-mlx5-ifc-updates-for-disabled-host-pf.patch +Patch1443: 1443-net-mlx5e-create-destroy-pcie-congestion-event-object.patch +Patch1444: 1444-net-mlx5e-add-device-pcie-congestion-ethtool-stats.patch +Patch1445: 1445-net-mlx5-fix-an-is-err-vs-null-bug-in-esw-qos-move-node.patch +Patch1446: 
1446-net-mlx5-hws-enable-ipsec-hardware-offload-in-legacy-mode.patch +Patch1447: 1447-net-mlx5e-fix-kdoc-warning-on-eswitch-h.patch +Patch1448: 1448-net-mlx5e-properly-access-rcu-protected-qdisc-sleeping-varia.patch +Patch1449: 1449-net-mlx5-add-ifc-bits-to-support-rss-for-ipsec-offload.patch +Patch1450: 1450-net-mlx5-add-ifc-bits-and-enums-for-buf-ownership.patch +Patch1451: 1451-net-mlx5-expose-cable-length-field-in-pfcc-register.patch +Patch1452: 1452-net-mlx5e-shampo-cleanup-reservation-size-formula.patch +Patch1453: 1453-net-mlx5e-shampo-remove-mlx5e-shampo-get-log-hd-entry-size.patch +Patch1454: 1454-net-mlx5e-remove-duplicate-mkey-from-shampo-header.patch +Patch1455: 1455-pci-tph-expose-pcie-tph-get-st-table-size.patch +Patch1456: 1456-net-mlx5-expose-ifc-bits-for-tph.patch +Patch1457: 1457-net-mlx5-add-support-for-device-steering-tag.patch +Patch1458: 1458-net-mlx5-fix-build-wframe-larger-than-warnings.patch +Patch1459: 1459-net-fix-typos.patch +Patch1460: 1460-net-mlx5e-clear-read-only-port-buffer-size-in-pbmc-before-up.patch +Patch1461: 1461-net-mlx5e-remove-skb-secpath-if-xfrm-state-is-not-found.patch +Patch1462: 1462-net-mlx5e-fix-potential-deadlock-by-deferring-rx-timeout-rec.patch +Patch1463: 1463-net-mlx5e-support-routed-networks-during-ipsec-macs-initiali.patch +Patch1464: 1464-net-mlx5e-expose-tis-via-devlink-tx-reporter-diagnose.patch +Patch1465: 1465-net-mlx5-correctly-set-gso-segs-when-lro-is-used.patch +Patch1466: 1466-net-mlx5-hws-fix-bad-parameter-in-cq-creation.patch +Patch1467: 1467-net-mlx5-hws-fix-simple-rules-rehash-error-flow.patch +Patch1468: 1468-net-mlx5-hws-fix-complex-rules-rehash-error-flow.patch +Patch1469: 1469-net-mlx5-hws-prevent-rehash-from-filling-up-the-queues.patch +Patch1470: 1470-net-mlx5-hws-don-t-rehash-on-every-kind-of-insertion-failure.patch +Patch1471: 1471-net-mlx5-hws-fix-table-creation-uid.patch +Patch1472: 1472-net-mlx5-ct-use-the-correct-counter-offset.patch +Patch1473: 
1473-net-mlx5-base-ecvf-devlink-port-attrs-from-0.patch +Patch1474: 1474-net-mlx5-remove-default-qos-group-and-attach-vports-directly.patch +Patch1475: 1475-net-mlx5e-preserve-tc-bw-during-parent-changes.patch +Patch1476: 1476-net-mlx5-destroy-vport-qos-element-when-no-configuration-rem.patch +Patch1477: 1477-net-mlx5-fix-qos-reference-leak-in-vport-enable-error-path.patch +Patch1478: 1478-net-mlx5-restore-missing-scheduling-node-cleanup-on-vport-en.patch +Patch1479: 1479-net-mlx5e-query-fw-for-buffer-ownership.patch +Patch1480: 1480-net-mlx5e-preserve-shared-buffer-capacity-during-headroom-up.patch +Patch1481: 1481-net-mlx5-hws-fix-memory-leak-in-hws-pool-buddy-init-error-pa.patch +Patch1482: 1482-net-mlx5-hws-fix-memory-leak-in-hws-action-get-shared-stc-ni.patch +Patch1483: 1483-net-mlx5-hws-fix-uninitialized-variables-in-mlx5hws-pat-calc.patch +Patch1484: 1484-net-mlx5-hws-fix-pattern-destruction-in-mlx5hws-pat-get-patt.patch +Patch1485: 1485-net-mlx5-reload-auxiliary-drivers-on-fw-activate.patch +Patch1486: 1486-net-mlx5-fix-lockdep-assertion-on-sync-reset-unload-event.patch +Patch1487: 1487-net-mlx5-nack-sync-reset-when-sfs-are-present.patch +Patch1488: 1488-net-mlx5-prevent-flow-steering-mode-changes-in-switchdev-mod.patch +Patch1489: 1489-net-mlx5e-set-local-xoff-after-fw-update.patch +Patch1490: 1490-net-mlx5e-harden-uplink-netdev-access-against-device-unbind.patch +Patch1491: 1491-net-mlx5e-add-a-miss-level-for-ipsec-crypto-offload.patch +Patch1492: 1492-net-mlx5-hws-ignore-flow-level-for-multi-dest-table.patch +Patch1493: 1493-net-mlx5e-fix-missing-fec-rs-stats-for-rs-544-514-interleave.patch +Patch1494: 1494-rdma-mlx5-support-driver-apis-pre-destroy-cq-and-post-destro.patch +Patch1495: 1495-rdma-mlx5-add-multiple-priorities-support-to-rdma-transport-.patch +Patch1496: 1496-rdma-mlx5-check-cap-net-raw-in-user-namespace-for-flow-creat.patch +Patch1497: 1497-rdma-mlx5-check-cap-net-raw-in-user-namespace-for-anchor-cre.patch +Patch1498: 
1498-rdma-mlx5-check-cap-net-raw-in-user-namespace-for-devx-creat.patch +Patch1499: 1499-rdma-mlx5-align-mkc-page-size-capability-check-to-prm.patch +Patch1500: 1500-rdma-mlx5-optimize-dmabuf-mkey-page-size.patch +Patch1501: 1501-rdma-mlx5-remove-redundant-check-on-err-on-return-expression.patch +Patch1502: 1502-rdma-mlx5-fix-returned-type-from-mlx5r-umr-zap-mkey.patch +Patch1503: 1503-rdma-mlx5-fix-incorrect-mkey-masking.patch +Patch1504: 1504-rdma-mlx5-add-dmah-object-support.patch +Patch1505: 1505-rdma-mlx5-add-dmah-support-for-reg-user-mr-reg-user-dmabuf-m.patch +Patch1506: 1506-rdma-mlx5-refactor-optional-counters-steering-code.patch +Patch1507: 1507-ib-mlx5-fix-obj-type-mismatch-for-srq-event-subscriptions.patch +Patch1508: 1508-net-mlx5-don-t-use-pk-through-tracepoints.patch +Patch1509: 1509-net-mlx5-mlx5-ifc-add-hardware-definitions-needed-for-adjace.patch +Patch1510: 1510-net-mlx5-e-switch-cache-vport-vhca-id-on-first-cap-query.patch +Patch1511: 1511-net-mlx5-e-switch-set-query-hca-cap-via-vhca-id.patch +Patch1512: 1512-rdma-net-mlx5-export-mlx5-vport-get-vhca-id.patch +Patch1513: 1513-net-mlx5-query-to-see-if-host-pf-is-disabled.patch +Patch1514: 1514-net-mlx5-support-disabling-host-pfs.patch +Patch1515: 1515-net-mlx5e-set-default-burst-period-for-tx-and-rx-reporters.patch +Patch1516: 1516-eth-mlx5-remove-kconfig-co-dependency-with-vxlan.patch +Patch1517: 1517-net-mlx5-fs-convert-vport-acls-root-namespaces-to-xarray.patch +Patch1518: 1518-net-mlx5-e-switch-move-vport-acls-root-namespaces-creation-t.patch +Patch1519: 1519-net-mlx5-e-switch-add-support-for-adjacent-functions-vports-.patch +Patch1520: 1520-net-mlx5-e-switch-create-acls-root-namespace-for-adjacent-vp.patch +Patch1521: 1521-net-mlx5-e-switch-register-representors-for-adjacent-vports.patch +Patch1522: 1522-net-mlx5-e-switch-set-representor-attributes-for-adjacent-vf.patch +Patch1523: 1523-net-mlx5-dr-hws-use-the-cached-vhca-id-for-this-device.patch +Patch1524: 
1524-net-mlx5-add-psp-capabilities-structures-and-bits.patch +Patch1525: 1525-net-mlx5-extract-mtctr-register-read-logic-into-helper-funct.patch +Patch1526: 1526-net-mlx5-support-getcyclesx-and-getcrosscycles.patch +Patch1527: 1527-net-mlx5-add-rs-fec-histogram-infrastructure.patch +Patch1528: 1528-net-mlx5-implement-cqe-compress-type-via-devlink-params.patch +Patch1529: 1529-net-mlx5-implement-devlink-enable-sriov-parameter.patch +Patch1530: 1530-net-mlx5-implement-devlink-total-vfs-parameter.patch +Patch1531: 1531-net-mlx5e-make-pcie-congestion-event-thresholds-configurable.patch +Patch1532: 1532-net-mlx5e-add-stale-counter-for-pcie-congestion-events.patch +Patch1533: 1533-net-mlx5-fix-typo-in-pci-irq-c-comment.patch +Patch1534: 1534-net-mlx5-refactor-devcom-to-use-match-attributes.patch +Patch1535: 1535-net-mlx5-lag-move-devcom-registration-to-lag-layer.patch +Patch1536: 1536-net-mlx5-add-net-namespace-support-to-devcom.patch +Patch1537: 1537-net-mlx5-lag-add-net-namespace-support.patch +Patch1538: 1538-net-mlx5-remove-vlan-insertion-fields-from-wqe-ether-segment.patch +Patch1539: 1539-net-mlx5-refactor-macsec-wqe-metadata-shifts.patch +Patch1540: 1540-net-mlx5e-prevent-wqe-metadata-conflicts-between-timestampin.patch +Patch1541: 1541-net-mlx5-fix-typo-of-mlx5-eq-doorbel-offset.patch +Patch1542: 1542-net-mlx5-remove-unused-offset-field-from-mlx5-sq-bfreg.patch +Patch1543: 1543-net-mlx5e-remove-unused-xsk-param-of-mlx5e-build-xdpsq-param.patch +Patch1544: 1544-net-mlx5-store-the-global-doorbell-in-mlx5-priv.patch +Patch1545: 1545-net-mlx5e-prepare-for-using-multiple-tx-doorbells.patch +Patch1546: 1546-net-mlx5e-prepare-for-using-different-cq-doorbells.patch +Patch1547: 1547-net-mlx5e-use-multiple-tx-doorbells.patch +Patch1548: 1548-net-mlx5e-use-multiple-cq-doorbells.patch +Patch1549: 1549-net-mlx5e-use-the-num-doorbells-devlink-param.patch +Patch1550: 1550-net-mlx5e-use-unsigned-for-mlx5e-get-max-num-channels.patch +Patch1551: 
1551-net-mlx5-add-uar-access-and-odp-page-fault-counters.patch +Patch1552: 1552-net-mlx5-change-ttc-rules-to-match-on-undecrypted-esp-packet.patch +Patch1553: 1553-net-mlx5e-recirculate-decrypted-packets-into-ttc-table.patch +Patch1554: 1554-net-mlx5e-add-flow-groups-for-the-packets-decrypted-by-crypt.patch +Patch1555: 1555-net-mlx5e-add-flow-rules-for-the-decrypted-esp-packets.patch +Patch1556: 1556-net-mlx5-remove-dead-code-from-total-vfs-setter.patch +Patch1557: 1557-net-mlx5-use-pe-format-specifier-for-error-pointers.patch +Patch1558: 1558-net-mlx5-expose-uar-access-and-odp-page-fault-counters.patch +Patch1559: 1559-net-mlx5-add-ifc-bit-for-tir-sq-order-capability.patch +Patch1560: 1560-net-mlx5-ifc-add-balance-id-and-lag-per-mp-group-bits.patch +Patch1561: 1561-net-mlx5-stop-polling-for-command-response-if-interface-goes.patch +Patch1562: 1562-net-mlx5-pagealloc-fix-reclaim-race-during-command-interface.patch +Patch1563: 1563-net-mlx5-fw-reset-add-reset-timeout-work.patch +Patch1564: 1564-net-mlx5-improve-write-combining-test-reliability-for-arm64-.patch +Patch1565: 1565-net-mlx5-hws-generalize-complex-matchers.patch +Patch1566: 1566-net-mlx5e-prevent-entering-switchdev-mode-with-inconsistent-.patch +Patch1567: 1567-net-mlx5-improve-qos-error-messages-with-actual-depth-values.patch +Patch1568: 1568-net-mlx5e-remove-unused-mdev-param-from-rss-indir-init.patch +Patch1569: 1569-net-mlx5e-introduce-mlx5e-rss-init-params.patch +Patch1570: 1570-net-mlx5e-introduce-mlx5e-rss-params-for-rss-configuration.patch +Patch1571: 1571-net-mlx5e-use-extack-in-set-rxfh-callback.patch +Patch1572: 1572-net-mlx5-prevent-tunnel-mode-conflicts-between-fdb-and-nic-i.patch +Patch1573: 1573-net-mlx5e-prevent-tunnel-reformat-when-tunnel-mode-not-allow.patch +Patch1574: 1574-net-mlx5-fix-pre-2-40-binutils-assembler-error.patch +Patch1575: 1575-net-mlx5e-return-1-instead-of-0-in-invalid-case-in-mlx5e-mpw.patch +Patch1576: 
1576-net-mlx5e-rx-fix-generating-skb-from-non-linear-xdp-buff-for.patch +Patch1577: 1577-net-mlx5e-rx-fix-generating-skb-from-non-linear-xdp-buff-for.patch +Patch1578: 1578-net-mlx5-add-pphcr-to-pcam-supported-registers-mask.patch +Patch1579: 1579-net-mlx5-refactor-devcom-to-return-null-on-failure.patch +Patch1580: 1580-net-mlx5-fix-ipsec-cleanup-over-mpv-device.patch +Patch1581: 1581-net-mlx5-don-t-zero-user-count-when-destroying-fdb-tables.patch +Patch1582: 1582-net-mlx5e-fix-return-value-in-case-of-module-eeprom-read-err.patch +Patch1583: 1583-net-mlx5e-fix-missing-error-assignment-in-mlx5e-xfrm-add-sta.patch +Patch1584: 1584-net-mlx5e-trim-the-length-of-the-num-doorbell-error.patch +Patch1585: 1585-net-mlx5e-fix-maxrate-wraparound-in-threshold-between-units.patch +Patch1586: 1586-net-mlx5e-fix-wraparound-in-rate-limiting-for-values-above-2.patch +Patch1587: 1587-net-mlx5e-fix-potentially-misleading-debug-message.patch +Patch1588: 1588-mlx5-fix-default-values-in-create-cq.patch +Patch1589: 1589-net-mlx5-clean-up-only-new-irq-glue-on-request-irq-failure.patch +Patch1590: 1590-net-mlx5e-fix-validation-logic-in-rate-limiting.patch +Patch1591: 1591-rdma-mlx5-enable-data-direct-with-relaxed-ordering.patch +Patch1592: 1592-rdma-mlx5-better-estimate-max-qp-wr-to-reflect-wqe-count.patch +Patch1593: 1593-rdma-mlx5-fix-vport-loopback-forcing-for-mpv-device.patch +Patch1594: 1594-rdma-mlx5-fix-page-size-bitmap-calculation-for-ksm-mode.patch +Patch1595: 1595-rdma-use-pe-format-specifier-for-error-pointers.patch +Patch1596: 1596-rdma-net-mlx5-query-vports-mac-address-from-device.patch +Patch1597: 1597-net-mlx5-use-common-mlx5-same-hw-devs-function.patch +Patch1598: 1598-net-mlx5-add-software-system-image-guid-infrastructure.patch +Patch1599: 1599-net-mlx5-refactor-ptp-clock-devcom-pairing.patch +Patch1600: 1600-net-mlx5-refactor-hca-cap-2-setting.patch +Patch1601: 1601-net-mlx5-add-balance-id-support-for-lag-multiplane-groups.patch +Patch1602: 
1602-net-mlx5e-remove-redundant-tstamp-pointer-from-channel-struc.patch +Patch1603: 1603-net-mlx5e-remove-unnecessary-tstamp-local-variable-in-mlx5i-.patch +Patch1604: 1604-net-mlx5e-rename-hwstamp-functions-to-hwtstamp.patch +Patch1605: 1605-net-mlx5e-rename-timestamp-fields-to-hwtstamp-config.patch +Patch1606: 1606-net-mlx5e-convert-to-new-hwtstamp-get-set-interface.patch +Patch1607: 1607-net-mlx5e-enhance-function-structures-for-self-loopback-prev.patch +Patch1608: 1608-net-mlx5e-use-tir-api-in-mlx5e-modify-tirs-lb.patch +Patch1609: 1609-net-mlx5e-allow-setting-self-loopback-prevention-bits-on-tir.patch +Patch1610: 1610-net-mlx5-ipoib-set-self-loopback-prevention-in-tir-init.patch +Patch1611: 1611-net-mlx5e-do-not-re-apply-tir-loopback-configuration-if-not-.patch +Patch1612: 1612-net-mlx5e-pass-old-channels-as-argument-to-mlx5e-switch-priv.patch +Patch1613: 1613-net-mlx5e-defer-channels-closure-to-reduce-interface-down-ti.patch +Patch1614: 1614-pci-tph-expose-pcie-tph-get-st-table-loc.patch +Patch1615: 1615-net-mlx5-add-direct-st-mode-support-for-rdma.patch +Patch1616: 1616-net-mlx5-add-other-eswitch-hw-capabilities.patch +Patch1617: 1617-net-mlx5-fs-add-other-eswitch-support-for-steering-tables.patch +Patch1618: 1618-net-mlx5-fs-set-non-default-device-per-namespace.patch +Patch1619: 1619-net-mlx5-mpfs-add-support-for-dynamic-enable-disable.patch +Patch1620: 1620-net-mlx5-e-switch-support-eswitch-inactive-mode.patch +Patch1621: 1621-net-mlx5-expose-definition-for-1600gbps-link-mode.patch +Patch1622: 1622-mlx5-extract-grxrings-from-get-rxnfc.patch +Patch1623: 1623-net-mlx5-refactor-eeprom-query-error-handling-to-return-stat.patch +Patch1624: 1624-net-mlx5e-recover-sq-on-excessive-ptp-tx-timestamp-delta.patch +Patch1625: 1625-net-mlx5-remove-redundant-bw-share-minimal-value-assignment.patch +Patch1626: 1626-net-mlx5-abort-new-commands-if-all-command-slots-are-stalled.patch +Patch1627: 1627-net-mlx5-use-eopnotsupp-instead-of-enotsupp.patch +Patch1628: 
1628-net-mlx5-initialize-events-outside-devlink-lock.patch +Patch1629: 1629-net-mlx5-move-the-esw-mode-notifier-chain-outside-the-devlin.patch +Patch1630: 1630-net-mlx5-move-the-vhca-event-notifier-outside-of-the-devlink.patch +Patch1631: 1631-net-mlx5-move-the-sf-hw-table-notifier-outside-the-devlink-l.patch +Patch1632: 1632-net-mlx5-move-the-sf-table-notifiers-outside-the-devlink-loc.patch +Patch1633: 1633-net-mlx5-move-sf-dev-table-notifier-registration-outside-the.patch +Patch1634: 1634-net-mlx5e-use-u64-instead-of-u64-in-ieee-setmaxrate.patch +Patch1635: 1635-net-mlx5e-rename-upper-limit-mbps-to-upper-limit-100mbps.patch +Patch1636: 1636-net-mlx5e-use-u8-max-instead-of-hard-coded-magic-number.patch +Patch1637: 1637-net-mlx5e-use-standard-unit-definitions-for-bandwidth-conver.patch +Patch1638: 1638-net-mlx5e-update-xdp-features-in-switch-channels.patch +Patch1639: 1639-net-mlx5e-support-xdp-target-xmit-with-dummy-program.patch +Patch1640: 1640-net-mlx5-make-enable-mpesw-idempotent.patch +Patch1641: 1641-net-mlx5-fix-double-unregister-of-hca-ports-component.patch +Patch1642: 1642-net-mlx5-fw-reset-clear-reset-requested-on-drain-fw-reset.patch +Patch1643: 1643-net-mlx5-drain-firmware-reset-in-shutdown-callback.patch +Patch1644: 1644-net-mlx5-fw-tracer-validate-format-string-parameters.patch +Patch1645: 1645-net-mlx5-fw-tracer-handle-escaped-percent-properly.patch +Patch1646: 1646-net-mlx5-serialize-firmware-reset-with-devlink.patch +Patch1647: 1647-net-mlx5e-use-ip6-dst-lookup-instead-of-ipv6-dst-lookup-flow.patch +Patch1648: 1648-net-mlx5e-trigger-neighbor-resolution-for-unresolved-destina.patch +Patch1649: 1649-net-mlx5e-do-not-update-bql-of-old-txqs-during-channel-recon.patch +Patch1650: 1650-net-mlx5-lag-multipath-give-priority-for-routes-with-smaller.patch +Patch1651: 1651-net-mlx5e-fix-null-pointer-dereference-in-ioctl-module-eepro.patch +Patch1652: 1652-net-mlx5e-don-t-print-error-message-due-to-invalid-module.patch +Patch1653: 
1653-net-mlx5e-fix-crash-on-profile-change-rollback-failure.patch +Patch1654: 1654-net-mlx5e-don-t-store-mlx5e-priv-in-mlx5e-dev-devlink-priv.patch +Patch1655: 1655-net-mlx5e-pass-netdev-to-mlx5e-destroy-netdev-instead-of-pri.patch +Patch1656: 1656-net-mlx5e-restore-destroying-state-bit-after-profile-cleanup.patch +Patch1657: 1657-net-mlx5-fix-memory-leak-in-esw-acl-ingress-lgcy-setup.patch +Patch1658: 1658-net-mlx5-fix-unbinding-uplink-netdev-in-switchdev-mode.patch +Patch1659: 1659-net-mlx5e-tc-delete-flows-only-for-existing-peers.patch +Patch1660: 1660-net-mlx5e-account-for-netdev-stats-in-ndo-get-stats64.patch +Patch1661: 1661-net-mlx5-fix-return-type-mismatch-in-mlx5-esw-vport-vhca-id.patch +Patch1662: 1662-net-mlx5-fs-fix-inverted-cap-check-in-tx-flow-table-root-dis.patch +Patch1663: 1663-net-mlx5-fix-vhca-id-access-call-trace-use-before-alloc.patch +Patch1664: 1664-net-mlx5e-skip-esn-replay-window-setup-for-ipsec-crypto-offl.patch +Patch1665: 1665-rdma-mlx5-change-default-device-for-lag-slaves-in-rdma-trans.patch +Patch1666: 1666-rdma-mlx5-add-other-eswitch-support-for-devx-destruction.patch +Patch1667: 1667-rdma-mlx5-refactor-get-prio-function.patch +Patch1668: 1668-rdma-mlx5-add-other-eswitch-support-to-userspace-tables.patch +Patch1669: 1669-ib-mlx5-reduce-imr-ksm-size-when-5-level-paging-is-enabled.patch +Patch1670: 1670-net-mlx5e-shampo-fix-header-mapping-for-64k-pages.patch +Patch1671: 1671-net-mlx5e-shampo-fix-skb-size-check-for-64k-pages.patch +Patch1672: 1672-net-mlx5e-shampo-fix-header-formulas-for-higher-mtus-and-64k.patch +Patch1673: 1673-net-mlx5-qos-restrict-rtnl-area-to-avoid-a-lock-cycle.patch +Patch1674: 1674-net-mlx5-fix-peer-miss-rules-host-disabled-checks.patch +Patch1675: 1675-net-mlx5e-rx-fix-xdp-multi-buf-frag-counting-for-legacy-rq.patch +Patch1676: 1676-net-mlx5-fix-crash-when-moving-to-switchdev-mode.patch +Patch1677: 1677-net-mlx5-fix-hca-caps-leak-on-notifier-init-failure.patch +Patch1678: 
1678-net-mlx5e-rx-fix-xdp-multi-buf-frag-counting-for-striding-rq.patch +Patch1679: 1679-iavf-fix-vlan-filter-lost-on-add-delete-race.patch +Patch1680: 1680-iavf-rename-iavf-vlan-is-new-to-iavf-vlan-adding.patch +Patch1681: 1681-iavf-stop-removing-vlan-filters-from-pf-on-interface-down.patch +Patch1682: 1682-iavf-wait-for-pf-confirmation-before-removing-vlan-filters.patch +Patch1683: 1683-iavf-add-virtchnl-op-add-vlan-to-success-completion-handler.patch +Patch1684: 1684-netfilter-skip-recording-stale-or-retransmitted-init.patch +Patch1685: 1685-sctp-discard-stale-init-after-handshake-completion.patch +Patch1686: 1686-rdma-vmw-pvrdma-fix-double-free-on-pvrdma-alloc-ucontext-err.patch +Patch1687: 1687-sched-fair-skip-sched-balance-running-cmpxchg-when-balance-i.patch +Patch1688: 1688-sched-fair-have-sd-serialize-affect-newidle-balancing.patch +Patch1689: 1689-powerpc-64-force-inlining-of-prevent-user-access-and-set-kua.patch +Patch1690: 1690-compiler-gcc-h-remove-ancient-workaround-for-gcc-pr-58670.patch +Patch1691: 1691-work-around-gcc-bugs-with-asm-goto-with-outputs.patch +Patch1692: 1692-init-kconfig-fix-cc-has-asm-goto-tied-output-test-with-dash.patch +Patch1693: 1693-update-workarounds-for-gcc-asm-goto-issue.patch +Patch1694: 1694-init-kconfig-remove-config-gcc-asm-goto-output-workaround.patch +Patch1695: 1695-rdma-mlx5-fix-error-path-fall-through-in-mlx5-ib-dev-res-srq.patch # END OF PATCH DEFINITIONS %description @@ -2147,6 +2530,389 @@ ApplyPatch 1309-scsi-qla2xxx-completely-fix-fcport-double-free.patch ApplyPatch 1310-rbd-eliminate-a-race-in-lock-dwork-draining-on-unmap.patch ApplyPatch 1311-net-sched-fix-pedit-partial-cow-leading-to-page-cache-corrup.patch ApplyPatch 1312-kvm-arm64-vgic-its-drop-translation-cache-ref-only-for-eras.patch +ApplyPatch 1313-netfilter-flowtable-strictly-check-for-maximum-number-of-act.patch +ApplyPatch 1314-drm-amd-display-do-not-skip-unrelated-mode-changes-in-dsc-va.patch +ApplyPatch 
1315-ipv6-icmp-clear-skb2-cb-in-ip6-err-gen-icmpv6-unreach.patch +ApplyPatch 1316-alsa-aloop-fix-peer-runtime-uaf-during-format-change-stop.patch +ApplyPatch 1317-rdma-iwcm-fix-workqueue-list-corruption-by-removing-work-lis.patch +ApplyPatch 1318-binder-use-cred-instead-of-task-for-selinux-checks.patch +ApplyPatch 1319-locks-fix-toctou-race-when-granting-write-lease.patch +ApplyPatch 1320-fs-use-a-helper-for-opening-kernel-internal-files.patch +ApplyPatch 1321-fs-move-kmem-cache-zalloc-into-alloc-empty-file-helpers.patch +ApplyPatch 1322-fs-use-backing-file-container-for-internal-files-with-fake-f.patch +ApplyPatch 1323-ovl-enable-fsnotify-events-on-underlying-real-files.patch +ApplyPatch 1324-fs-move-cleanup-from-init-file-into-its-callers.patch +ApplyPatch 1325-lsm-constify-the-file-parameter-in-security-binder-transfer-.patch +ApplyPatch 1326-cachefiles-use-kiocb-start-end-write-helpers.patch +ApplyPatch 1327-fs-fix-kernel-doc-warnings.patch +ApplyPatch 1328-fs-rename-mnt-want-drop-write-helpers.patch +ApplyPatch 1329-fs-get-mnt-writers-count-for-an-open-backing-file-s-real-pat.patch +ApplyPatch 1330-fs-create-helper-file-user-path-for-user-displayed-mapped-fi.patch +ApplyPatch 1331-fs-store-real-path-instead-of-fake-path-in-backing-file-f-pa.patch +ApplyPatch 1332-fs-prepare-for-stackable-filesystems-backing-file-helpers.patch +ApplyPatch 1333-fs-factor-out-backing-file-read-write-iter-helpers.patch +ApplyPatch 1334-fs-factor-out-backing-file-splice-read-write-helpers.patch +ApplyPatch 1335-fs-factor-out-backing-file-mmap-helper.patch +ApplyPatch 1336-lsm-add-helper-for-blob-allocations.patch +ApplyPatch 1337-ovl-fix-nested-backing-file-paths.patch +ApplyPatch 1338-fs-constify-file-ptr-in-backing-file-accessor-helpers.patch +ApplyPatch 1339-ovl-remove-unneeded-non-const-conversion.patch +ApplyPatch 1340-ovl-remove-redundant-iocb-dio-caller-comp-clearing.patch +ApplyPatch 1341-perf-core-fix-mmap-event-path-names-with-backing-files.patch +ApplyPatch 
1342-fs-prepare-for-adding-lsm-blob-to-backing-file.patch +ApplyPatch 1343-lsm-add-backing-file-lsm-hooks.patch +ApplyPatch 1344-selinux-fix-overlayfs-mmap-and-mprotect-access-checks.patch +ApplyPatch 1345-selinux-rhel-only-hotfix-for-execmem-regression.patch +ApplyPatch 1346-net-mlx5-hws-fix-matcher-action-template-attach.patch +ApplyPatch 1347-net-mlx5-hws-remove-unused-element-array.patch +ApplyPatch 1348-net-mlx5-hws-make-pool-single-resource.patch +ApplyPatch 1349-net-mlx5-hws-refactor-pool-implementation.patch +ApplyPatch 1350-net-mlx5-hws-cleanup-after-pool-refactoring.patch +ApplyPatch 1351-net-mlx5-hws-add-fullness-tracking-to-pool.patch +ApplyPatch 1352-net-mlx5-hws-fix-pool-size-optimization.patch +ApplyPatch 1353-net-mlx5-hws-implement-action-ste-pool.patch +ApplyPatch 1354-net-mlx5-hws-use-the-new-action-ste-pool.patch +ApplyPatch 1355-net-mlx5-hws-cleanup-matcher-action-ste-table.patch +ApplyPatch 1356-net-mlx5-hws-free-unused-action-ste-tables.patch +ApplyPatch 1357-net-mlx5-hws-export-action-ste-tables-to-debugfs.patch +ApplyPatch 1358-net-mlx5e-ethtool-fix-formatting-of-ptp-rq0-csum-complete-ta.patch +ApplyPatch 1359-net-mlx5-fix-spelling-mistakes-in-mlx5-core-dbg-message-and-.patch +ApplyPatch 1360-net-mlx5-hws-fix-ip-version-decision.patch +ApplyPatch 1361-net-mlx5-hws-harden-ip-version-definer-checks.patch +ApplyPatch 1362-net-mlx5-hws-disallow-matcher-ip-version-mixing.patch +ApplyPatch 1363-rdma-mlx5-fix-error-flow-upon-firmware-failure-for-rq-destru.patch +ApplyPatch 1364-net-mlx5-support-software-tx-timestamp.patch +ApplyPatch 1365-net-mlx5-hws-expose-function-mlx5hws-table-ft-set-next-ft-in.patch +ApplyPatch 1366-net-mlx5-hws-add-definer-function-to-get-field-name-str.patch +ApplyPatch 1367-net-mlx5-hws-expose-polling-function-in-header-file.patch +ApplyPatch 1368-net-mlx5-hws-introduce-isolated-matchers.patch +ApplyPatch 1369-net-mlx5-hws-support-complex-matchers.patch +ApplyPatch 
1370-net-mlx5-hws-force-rehash-when-rule-insertion-failed.patch +ApplyPatch 1371-net-mlx5-hws-fix-counting-of-rules-in-the-matcher.patch +ApplyPatch 1372-net-mlx5-hws-fix-redundant-extension-of-action-templates.patch +ApplyPatch 1373-net-mlx5-hws-rework-rehash-loop.patch +ApplyPatch 1374-net-mlx5-hws-dump-bad-completion-details.patch +ApplyPatch 1375-net-mlx5-use-to-delayed-work.patch +ApplyPatch 1376-net-mlx5-sws-fix-reformat-id-error-handling.patch +ApplyPatch 1377-net-mlx5-hws-register-reformat-actions-with-fw.patch +ApplyPatch 1378-net-mlx5-hws-fix-typo-nope-to-nop.patch +ApplyPatch 1379-net-mlx5-hws-handle-modify-header-actions-dependency.patch +ApplyPatch 1380-net-mlx5-core-add-error-handling-inmlx5-query-nic-vport-qkey.patch +ApplyPatch 1381-net-mlx5e-allow-setting-mac-address-of-representors.patch +ApplyPatch 1382-net-mlx5-add-error-handling-in-mlx5-query-nic-vport-node-gui.patch +ApplyPatch 1383-net-mlx5-hws-fix-an-error-code-in-mlx5hws-bwc-rule-create-co.patch +ApplyPatch 1384-net-mlx5-ensure-fw-pages-are-always-allocated-on-same-numa.patch +ApplyPatch 1385-net-mlx5-fix-return-value-when-searching-for-existing-flow-g.patch +ApplyPatch 1386-net-mlx5-hws-init-mutex-on-the-correct-path.patch +ApplyPatch 1387-net-mlx5-hws-fix-missing-ip-version-handling-in-definer.patch +ApplyPatch 1388-net-mlx5-hws-make-sure-the-uplink-is-the-last-destination.patch +ApplyPatch 1389-net-mlx5e-fix-leak-of-geneve-tlv-option-object.patch +ApplyPatch 1390-net-mlx5-hws-add-error-checking-to-hws-bwc-rule-complex-hash.patch +ApplyPatch 1391-net-mlx5e-fix-race-between-dim-disable-and-net-dim.patch +ApplyPatch 1392-net-mlx5e-add-new-prio-for-promiscuous-mode.patch +ApplyPatch 1393-net-mlx5-correctly-set-gso-size-when-lro-is-used.patch +ApplyPatch 1394-net-mlx5-fix-memory-leak-in-cmd-exec.patch +ApplyPatch 1395-net-mlx5-e-switch-fix-peer-miss-rules-to-use-peer-eswitch.patch +ApplyPatch 1396-rdma-mlx5-convert-timeouts-to-secs-to-jiffies.patch +ApplyPatch 
1397-rdma-mlx5-remove-the-redundant-mlx5-ib-stage-uar-stage.patch +ApplyPatch 1398-rdma-mlx5-add-support-for-200gbps-per-lane-speeds.patch +ApplyPatch 1399-rdma-mlx5-avoid-flexible-array-warning.patch +ApplyPatch 1400-rdma-mlx5-initialize-obj-event-obj-sub-list-before-xa-insert.patch +ApplyPatch 1401-rdma-mlx5-fix-hw-counters-query-for-non-representor-devices.patch +ApplyPatch 1402-rdma-mlx5-fix-cc-counters-query-for-mpv.patch +ApplyPatch 1403-rdma-mlx5-fix-vport-loopback-for-mpv-device.patch +ApplyPatch 1404-net-mlx5-expose-serial-numbers-in-devlink-info.patch +ApplyPatch 1405-net-mlx5e-shampo-reorganize-mlx5-rq-shampo-alloc.patch +ApplyPatch 1406-net-mlx5e-shampo-remove-redundant-params.patch +ApplyPatch 1407-net-mlx5e-shampo-improve-hw-gro-capability-checking.patch +ApplyPatch 1408-net-mlx5e-shampo-separate-pool-for-headers.patch +ApplyPatch 1409-net-mlx5e-implement-queue-mgmt-ops-and-single-channel-swap.patch +ApplyPatch 1410-net-mlx5e-support-ethtool-tcp-data-split-settings.patch +ApplyPatch 1411-net-mlx5-fs-add-multiple-prios-to-rdma-transport-steering-do.patch +ApplyPatch 1412-net-mlx5-small-refactor-for-general-object-capabilities.patch +ApplyPatch 1413-net-mlx5-add-ifc-bits-for-pcie-congestion-event-object.patch +ApplyPatch 1414-rdma-mlx5-allocate-ib-device-with-net-namespace-supplied-fro.patch +ApplyPatch 1415-net-mlx5e-fix-error-handling-in-rq-memory-model-registration.patch +ApplyPatch 1416-net-mlx5-fs-fix-rdma-transport-init-cleanup-flow.patch +ApplyPatch 1417-net-mlx5-check-device-memory-pointer-before-usage.patch +ApplyPatch 1418-net-mlx5-add-no-op-implementation-for-setting-tc-bw-on-rate-.patch +ApplyPatch 1419-net-mlx5-add-support-for-setting-tc-bw-on-nodes.patch +ApplyPatch 1420-net-mlx5-add-traffic-class-scheduling-support-for-vport-qos.patch +ApplyPatch 1421-net-mlx5-manage-tc-arbiter-nodes-and-implement-full-support-.patch +ApplyPatch 1422-net-mlx5-hws-remove-unused-create-dest-array-parameter.patch +ApplyPatch 
1423-net-mlx5-hws-remove-incorrect-comment.patch +ApplyPatch 1424-net-mlx5-hws-export-rule-skip-logic.patch +ApplyPatch 1425-net-mlx5-hws-refactor-rule-skip-logic.patch +ApplyPatch 1426-net-mlx5-hws-create-stes-directly-from-matcher.patch +ApplyPatch 1427-net-mlx5-hws-decouple-matcher-rx-and-tx-sizes.patch +ApplyPatch 1428-net-mlx5-hws-track-matcher-sizes-individually.patch +ApplyPatch 1429-net-mlx5-hws-rearrange-to-prevent-forward-declaration.patch +ApplyPatch 1430-net-mlx5-hws-shrink-empty-matchers.patch +ApplyPatch 1431-net-mlx5-add-hws-as-secondary-steering-mode.patch +ApplyPatch 1432-net-mlx5-fix-spelling-mistake-disabliing-disabling.patch +ApplyPatch 1433-eth-mlx5-migrate-to-the-rxfh-context-ops.patch +ApplyPatch 1434-net-mlx5e-remove-unused-vlan-insertion-logic-in-tx-path.patch +ApplyPatch 1435-net-mlx5e-ct-extract-a-memcmp-from-a-spinlock-section.patch +ApplyPatch 1436-net-mlx5e-replace-recursive-vlan-push-handling-with-an-itera.patch +ApplyPatch 1437-net-mlx5-warn-when-write-combining-is-not-supported.patch +ApplyPatch 1438-net-mlx5e-rx-remove-unnecessary-rqt-redirects.patch +ApplyPatch 1439-net-mlx5-expose-hca-capability-bits-for-mkey-max-page-size.patch +ApplyPatch 1440-rdma-mlx5-fix-umr-modifying-of-mkey-page-size.patch +ApplyPatch 1441-net-mlx5-expose-disciplined-fr-counter-through-hca-capabilit.patch +ApplyPatch 1442-net-mlx5-ifc-updates-for-disabled-host-pf.patch +ApplyPatch 1443-net-mlx5e-create-destroy-pcie-congestion-event-object.patch +ApplyPatch 1444-net-mlx5e-add-device-pcie-congestion-ethtool-stats.patch +ApplyPatch 1445-net-mlx5-fix-an-is-err-vs-null-bug-in-esw-qos-move-node.patch +ApplyPatch 1446-net-mlx5-hws-enable-ipsec-hardware-offload-in-legacy-mode.patch +ApplyPatch 1447-net-mlx5e-fix-kdoc-warning-on-eswitch-h.patch +ApplyPatch 1448-net-mlx5e-properly-access-rcu-protected-qdisc-sleeping-varia.patch +ApplyPatch 1449-net-mlx5-add-ifc-bits-to-support-rss-for-ipsec-offload.patch +ApplyPatch 
1450-net-mlx5-add-ifc-bits-and-enums-for-buf-ownership.patch +ApplyPatch 1451-net-mlx5-expose-cable-length-field-in-pfcc-register.patch +ApplyPatch 1452-net-mlx5e-shampo-cleanup-reservation-size-formula.patch +ApplyPatch 1453-net-mlx5e-shampo-remove-mlx5e-shampo-get-log-hd-entry-size.patch +ApplyPatch 1454-net-mlx5e-remove-duplicate-mkey-from-shampo-header.patch +ApplyPatch 1455-pci-tph-expose-pcie-tph-get-st-table-size.patch +ApplyPatch 1456-net-mlx5-expose-ifc-bits-for-tph.patch +ApplyPatch 1457-net-mlx5-add-support-for-device-steering-tag.patch +ApplyPatch 1458-net-mlx5-fix-build-wframe-larger-than-warnings.patch +ApplyPatch 1459-net-fix-typos.patch +ApplyPatch 1460-net-mlx5e-clear-read-only-port-buffer-size-in-pbmc-before-up.patch +ApplyPatch 1461-net-mlx5e-remove-skb-secpath-if-xfrm-state-is-not-found.patch +ApplyPatch 1462-net-mlx5e-fix-potential-deadlock-by-deferring-rx-timeout-rec.patch +ApplyPatch 1463-net-mlx5e-support-routed-networks-during-ipsec-macs-initiali.patch +ApplyPatch 1464-net-mlx5e-expose-tis-via-devlink-tx-reporter-diagnose.patch +ApplyPatch 1465-net-mlx5-correctly-set-gso-segs-when-lro-is-used.patch +ApplyPatch 1466-net-mlx5-hws-fix-bad-parameter-in-cq-creation.patch +ApplyPatch 1467-net-mlx5-hws-fix-simple-rules-rehash-error-flow.patch +ApplyPatch 1468-net-mlx5-hws-fix-complex-rules-rehash-error-flow.patch +ApplyPatch 1469-net-mlx5-hws-prevent-rehash-from-filling-up-the-queues.patch +ApplyPatch 1470-net-mlx5-hws-don-t-rehash-on-every-kind-of-insertion-failure.patch +ApplyPatch 1471-net-mlx5-hws-fix-table-creation-uid.patch +ApplyPatch 1472-net-mlx5-ct-use-the-correct-counter-offset.patch +ApplyPatch 1473-net-mlx5-base-ecvf-devlink-port-attrs-from-0.patch +ApplyPatch 1474-net-mlx5-remove-default-qos-group-and-attach-vports-directly.patch +ApplyPatch 1475-net-mlx5e-preserve-tc-bw-during-parent-changes.patch +ApplyPatch 1476-net-mlx5-destroy-vport-qos-element-when-no-configuration-rem.patch +ApplyPatch 
1477-net-mlx5-fix-qos-reference-leak-in-vport-enable-error-path.patch +ApplyPatch 1478-net-mlx5-restore-missing-scheduling-node-cleanup-on-vport-en.patch +ApplyPatch 1479-net-mlx5e-query-fw-for-buffer-ownership.patch +ApplyPatch 1480-net-mlx5e-preserve-shared-buffer-capacity-during-headroom-up.patch +ApplyPatch 1481-net-mlx5-hws-fix-memory-leak-in-hws-pool-buddy-init-error-pa.patch +ApplyPatch 1482-net-mlx5-hws-fix-memory-leak-in-hws-action-get-shared-stc-ni.patch +ApplyPatch 1483-net-mlx5-hws-fix-uninitialized-variables-in-mlx5hws-pat-calc.patch +ApplyPatch 1484-net-mlx5-hws-fix-pattern-destruction-in-mlx5hws-pat-get-patt.patch +ApplyPatch 1485-net-mlx5-reload-auxiliary-drivers-on-fw-activate.patch +ApplyPatch 1486-net-mlx5-fix-lockdep-assertion-on-sync-reset-unload-event.patch +ApplyPatch 1487-net-mlx5-nack-sync-reset-when-sfs-are-present.patch +ApplyPatch 1488-net-mlx5-prevent-flow-steering-mode-changes-in-switchdev-mod.patch +ApplyPatch 1489-net-mlx5e-set-local-xoff-after-fw-update.patch +ApplyPatch 1490-net-mlx5e-harden-uplink-netdev-access-against-device-unbind.patch +ApplyPatch 1491-net-mlx5e-add-a-miss-level-for-ipsec-crypto-offload.patch +ApplyPatch 1492-net-mlx5-hws-ignore-flow-level-for-multi-dest-table.patch +ApplyPatch 1493-net-mlx5e-fix-missing-fec-rs-stats-for-rs-544-514-interleave.patch +ApplyPatch 1494-rdma-mlx5-support-driver-apis-pre-destroy-cq-and-post-destro.patch +ApplyPatch 1495-rdma-mlx5-add-multiple-priorities-support-to-rdma-transport-.patch +ApplyPatch 1496-rdma-mlx5-check-cap-net-raw-in-user-namespace-for-flow-creat.patch +ApplyPatch 1497-rdma-mlx5-check-cap-net-raw-in-user-namespace-for-anchor-cre.patch +ApplyPatch 1498-rdma-mlx5-check-cap-net-raw-in-user-namespace-for-devx-creat.patch +ApplyPatch 1499-rdma-mlx5-align-mkc-page-size-capability-check-to-prm.patch +ApplyPatch 1500-rdma-mlx5-optimize-dmabuf-mkey-page-size.patch +ApplyPatch 1501-rdma-mlx5-remove-redundant-check-on-err-on-return-expression.patch +ApplyPatch 
1502-rdma-mlx5-fix-returned-type-from-mlx5r-umr-zap-mkey.patch +ApplyPatch 1503-rdma-mlx5-fix-incorrect-mkey-masking.patch +ApplyPatch 1504-rdma-mlx5-add-dmah-object-support.patch +ApplyPatch 1505-rdma-mlx5-add-dmah-support-for-reg-user-mr-reg-user-dmabuf-m.patch +ApplyPatch 1506-rdma-mlx5-refactor-optional-counters-steering-code.patch +ApplyPatch 1507-ib-mlx5-fix-obj-type-mismatch-for-srq-event-subscriptions.patch +ApplyPatch 1508-net-mlx5-don-t-use-pk-through-tracepoints.patch +ApplyPatch 1509-net-mlx5-mlx5-ifc-add-hardware-definitions-needed-for-adjace.patch +ApplyPatch 1510-net-mlx5-e-switch-cache-vport-vhca-id-on-first-cap-query.patch +ApplyPatch 1511-net-mlx5-e-switch-set-query-hca-cap-via-vhca-id.patch +ApplyPatch 1512-rdma-net-mlx5-export-mlx5-vport-get-vhca-id.patch +ApplyPatch 1513-net-mlx5-query-to-see-if-host-pf-is-disabled.patch +ApplyPatch 1514-net-mlx5-support-disabling-host-pfs.patch +ApplyPatch 1515-net-mlx5e-set-default-burst-period-for-tx-and-rx-reporters.patch +ApplyPatch 1516-eth-mlx5-remove-kconfig-co-dependency-with-vxlan.patch +ApplyPatch 1517-net-mlx5-fs-convert-vport-acls-root-namespaces-to-xarray.patch +ApplyPatch 1518-net-mlx5-e-switch-move-vport-acls-root-namespaces-creation-t.patch +ApplyPatch 1519-net-mlx5-e-switch-add-support-for-adjacent-functions-vports-.patch +ApplyPatch 1520-net-mlx5-e-switch-create-acls-root-namespace-for-adjacent-vp.patch +ApplyPatch 1521-net-mlx5-e-switch-register-representors-for-adjacent-vports.patch +ApplyPatch 1522-net-mlx5-e-switch-set-representor-attributes-for-adjacent-vf.patch +ApplyPatch 1523-net-mlx5-dr-hws-use-the-cached-vhca-id-for-this-device.patch +ApplyPatch 1524-net-mlx5-add-psp-capabilities-structures-and-bits.patch +ApplyPatch 1525-net-mlx5-extract-mtctr-register-read-logic-into-helper-funct.patch +ApplyPatch 1526-net-mlx5-support-getcyclesx-and-getcrosscycles.patch +ApplyPatch 1527-net-mlx5-add-rs-fec-histogram-infrastructure.patch +ApplyPatch 
1528-net-mlx5-implement-cqe-compress-type-via-devlink-params.patch +ApplyPatch 1529-net-mlx5-implement-devlink-enable-sriov-parameter.patch +ApplyPatch 1530-net-mlx5-implement-devlink-total-vfs-parameter.patch +ApplyPatch 1531-net-mlx5e-make-pcie-congestion-event-thresholds-configurable.patch +ApplyPatch 1532-net-mlx5e-add-stale-counter-for-pcie-congestion-events.patch +ApplyPatch 1533-net-mlx5-fix-typo-in-pci-irq-c-comment.patch +ApplyPatch 1534-net-mlx5-refactor-devcom-to-use-match-attributes.patch +ApplyPatch 1535-net-mlx5-lag-move-devcom-registration-to-lag-layer.patch +ApplyPatch 1536-net-mlx5-add-net-namespace-support-to-devcom.patch +ApplyPatch 1537-net-mlx5-lag-add-net-namespace-support.patch +ApplyPatch 1538-net-mlx5-remove-vlan-insertion-fields-from-wqe-ether-segment.patch +ApplyPatch 1539-net-mlx5-refactor-macsec-wqe-metadata-shifts.patch +ApplyPatch 1540-net-mlx5e-prevent-wqe-metadata-conflicts-between-timestampin.patch +ApplyPatch 1541-net-mlx5-fix-typo-of-mlx5-eq-doorbel-offset.patch +ApplyPatch 1542-net-mlx5-remove-unused-offset-field-from-mlx5-sq-bfreg.patch +ApplyPatch 1543-net-mlx5e-remove-unused-xsk-param-of-mlx5e-build-xdpsq-param.patch +ApplyPatch 1544-net-mlx5-store-the-global-doorbell-in-mlx5-priv.patch +ApplyPatch 1545-net-mlx5e-prepare-for-using-multiple-tx-doorbells.patch +ApplyPatch 1546-net-mlx5e-prepare-for-using-different-cq-doorbells.patch +ApplyPatch 1547-net-mlx5e-use-multiple-tx-doorbells.patch +ApplyPatch 1548-net-mlx5e-use-multiple-cq-doorbells.patch +ApplyPatch 1549-net-mlx5e-use-the-num-doorbells-devlink-param.patch +ApplyPatch 1550-net-mlx5e-use-unsigned-for-mlx5e-get-max-num-channels.patch +ApplyPatch 1551-net-mlx5-add-uar-access-and-odp-page-fault-counters.patch +ApplyPatch 1552-net-mlx5-change-ttc-rules-to-match-on-undecrypted-esp-packet.patch +ApplyPatch 1553-net-mlx5e-recirculate-decrypted-packets-into-ttc-table.patch +ApplyPatch 1554-net-mlx5e-add-flow-groups-for-the-packets-decrypted-by-crypt.patch +ApplyPatch 
1555-net-mlx5e-add-flow-rules-for-the-decrypted-esp-packets.patch +ApplyPatch 1556-net-mlx5-remove-dead-code-from-total-vfs-setter.patch +ApplyPatch 1557-net-mlx5-use-pe-format-specifier-for-error-pointers.patch +ApplyPatch 1558-net-mlx5-expose-uar-access-and-odp-page-fault-counters.patch +ApplyPatch 1559-net-mlx5-add-ifc-bit-for-tir-sq-order-capability.patch +ApplyPatch 1560-net-mlx5-ifc-add-balance-id-and-lag-per-mp-group-bits.patch +ApplyPatch 1561-net-mlx5-stop-polling-for-command-response-if-interface-goes.patch +ApplyPatch 1562-net-mlx5-pagealloc-fix-reclaim-race-during-command-interface.patch +ApplyPatch 1563-net-mlx5-fw-reset-add-reset-timeout-work.patch +ApplyPatch 1564-net-mlx5-improve-write-combining-test-reliability-for-arm64-.patch +ApplyPatch 1565-net-mlx5-hws-generalize-complex-matchers.patch +ApplyPatch 1566-net-mlx5e-prevent-entering-switchdev-mode-with-inconsistent-.patch +ApplyPatch 1567-net-mlx5-improve-qos-error-messages-with-actual-depth-values.patch +ApplyPatch 1568-net-mlx5e-remove-unused-mdev-param-from-rss-indir-init.patch +ApplyPatch 1569-net-mlx5e-introduce-mlx5e-rss-init-params.patch +ApplyPatch 1570-net-mlx5e-introduce-mlx5e-rss-params-for-rss-configuration.patch +ApplyPatch 1571-net-mlx5e-use-extack-in-set-rxfh-callback.patch +ApplyPatch 1572-net-mlx5-prevent-tunnel-mode-conflicts-between-fdb-and-nic-i.patch +ApplyPatch 1573-net-mlx5e-prevent-tunnel-reformat-when-tunnel-mode-not-allow.patch +ApplyPatch 1574-net-mlx5-fix-pre-2-40-binutils-assembler-error.patch +ApplyPatch 1575-net-mlx5e-return-1-instead-of-0-in-invalid-case-in-mlx5e-mpw.patch +ApplyPatch 1576-net-mlx5e-rx-fix-generating-skb-from-non-linear-xdp-buff-for.patch +ApplyPatch 1577-net-mlx5e-rx-fix-generating-skb-from-non-linear-xdp-buff-for.patch +ApplyPatch 1578-net-mlx5-add-pphcr-to-pcam-supported-registers-mask.patch +ApplyPatch 1579-net-mlx5-refactor-devcom-to-return-null-on-failure.patch +ApplyPatch 1580-net-mlx5-fix-ipsec-cleanup-over-mpv-device.patch +ApplyPatch 
1581-net-mlx5-don-t-zero-user-count-when-destroying-fdb-tables.patch +ApplyPatch 1582-net-mlx5e-fix-return-value-in-case-of-module-eeprom-read-err.patch +ApplyPatch 1583-net-mlx5e-fix-missing-error-assignment-in-mlx5e-xfrm-add-sta.patch +ApplyPatch 1584-net-mlx5e-trim-the-length-of-the-num-doorbell-error.patch +ApplyPatch 1585-net-mlx5e-fix-maxrate-wraparound-in-threshold-between-units.patch +ApplyPatch 1586-net-mlx5e-fix-wraparound-in-rate-limiting-for-values-above-2.patch +ApplyPatch 1587-net-mlx5e-fix-potentially-misleading-debug-message.patch +ApplyPatch 1588-mlx5-fix-default-values-in-create-cq.patch +ApplyPatch 1589-net-mlx5-clean-up-only-new-irq-glue-on-request-irq-failure.patch +ApplyPatch 1590-net-mlx5e-fix-validation-logic-in-rate-limiting.patch +ApplyPatch 1591-rdma-mlx5-enable-data-direct-with-relaxed-ordering.patch +ApplyPatch 1592-rdma-mlx5-better-estimate-max-qp-wr-to-reflect-wqe-count.patch +ApplyPatch 1593-rdma-mlx5-fix-vport-loopback-forcing-for-mpv-device.patch +ApplyPatch 1594-rdma-mlx5-fix-page-size-bitmap-calculation-for-ksm-mode.patch +ApplyPatch 1595-rdma-use-pe-format-specifier-for-error-pointers.patch +ApplyPatch 1596-rdma-net-mlx5-query-vports-mac-address-from-device.patch +ApplyPatch 1597-net-mlx5-use-common-mlx5-same-hw-devs-function.patch +ApplyPatch 1598-net-mlx5-add-software-system-image-guid-infrastructure.patch +ApplyPatch 1599-net-mlx5-refactor-ptp-clock-devcom-pairing.patch +ApplyPatch 1600-net-mlx5-refactor-hca-cap-2-setting.patch +ApplyPatch 1601-net-mlx5-add-balance-id-support-for-lag-multiplane-groups.patch +ApplyPatch 1602-net-mlx5e-remove-redundant-tstamp-pointer-from-channel-struc.patch +ApplyPatch 1603-net-mlx5e-remove-unnecessary-tstamp-local-variable-in-mlx5i-.patch +ApplyPatch 1604-net-mlx5e-rename-hwstamp-functions-to-hwtstamp.patch +ApplyPatch 1605-net-mlx5e-rename-timestamp-fields-to-hwtstamp-config.patch +ApplyPatch 1606-net-mlx5e-convert-to-new-hwtstamp-get-set-interface.patch +ApplyPatch 
1607-net-mlx5e-enhance-function-structures-for-self-loopback-prev.patch +ApplyPatch 1608-net-mlx5e-use-tir-api-in-mlx5e-modify-tirs-lb.patch +ApplyPatch 1609-net-mlx5e-allow-setting-self-loopback-prevention-bits-on-tir.patch +ApplyPatch 1610-net-mlx5-ipoib-set-self-loopback-prevention-in-tir-init.patch +ApplyPatch 1611-net-mlx5e-do-not-re-apply-tir-loopback-configuration-if-not-.patch +ApplyPatch 1612-net-mlx5e-pass-old-channels-as-argument-to-mlx5e-switch-priv.patch +ApplyPatch 1613-net-mlx5e-defer-channels-closure-to-reduce-interface-down-ti.patch +ApplyPatch 1614-pci-tph-expose-pcie-tph-get-st-table-loc.patch +ApplyPatch 1615-net-mlx5-add-direct-st-mode-support-for-rdma.patch +ApplyPatch 1616-net-mlx5-add-other-eswitch-hw-capabilities.patch +ApplyPatch 1617-net-mlx5-fs-add-other-eswitch-support-for-steering-tables.patch +ApplyPatch 1618-net-mlx5-fs-set-non-default-device-per-namespace.patch +ApplyPatch 1619-net-mlx5-mpfs-add-support-for-dynamic-enable-disable.patch +ApplyPatch 1620-net-mlx5-e-switch-support-eswitch-inactive-mode.patch +ApplyPatch 1621-net-mlx5-expose-definition-for-1600gbps-link-mode.patch +ApplyPatch 1622-mlx5-extract-grxrings-from-get-rxnfc.patch +ApplyPatch 1623-net-mlx5-refactor-eeprom-query-error-handling-to-return-stat.patch +ApplyPatch 1624-net-mlx5e-recover-sq-on-excessive-ptp-tx-timestamp-delta.patch +ApplyPatch 1625-net-mlx5-remove-redundant-bw-share-minimal-value-assignment.patch +ApplyPatch 1626-net-mlx5-abort-new-commands-if-all-command-slots-are-stalled.patch +ApplyPatch 1627-net-mlx5-use-eopnotsupp-instead-of-enotsupp.patch +ApplyPatch 1628-net-mlx5-initialize-events-outside-devlink-lock.patch +ApplyPatch 1629-net-mlx5-move-the-esw-mode-notifier-chain-outside-the-devlin.patch +ApplyPatch 1630-net-mlx5-move-the-vhca-event-notifier-outside-of-the-devlink.patch +ApplyPatch 1631-net-mlx5-move-the-sf-hw-table-notifier-outside-the-devlink-l.patch +ApplyPatch 1632-net-mlx5-move-the-sf-table-notifiers-outside-the-devlink-loc.patch 
+ApplyPatch 1633-net-mlx5-move-sf-dev-table-notifier-registration-outside-the.patch +ApplyPatch 1634-net-mlx5e-use-u64-instead-of-u64-in-ieee-setmaxrate.patch +ApplyPatch 1635-net-mlx5e-rename-upper-limit-mbps-to-upper-limit-100mbps.patch +ApplyPatch 1636-net-mlx5e-use-u8-max-instead-of-hard-coded-magic-number.patch +ApplyPatch 1637-net-mlx5e-use-standard-unit-definitions-for-bandwidth-conver.patch +ApplyPatch 1638-net-mlx5e-update-xdp-features-in-switch-channels.patch +ApplyPatch 1639-net-mlx5e-support-xdp-target-xmit-with-dummy-program.patch +ApplyPatch 1640-net-mlx5-make-enable-mpesw-idempotent.patch +ApplyPatch 1641-net-mlx5-fix-double-unregister-of-hca-ports-component.patch +ApplyPatch 1642-net-mlx5-fw-reset-clear-reset-requested-on-drain-fw-reset.patch +ApplyPatch 1643-net-mlx5-drain-firmware-reset-in-shutdown-callback.patch +ApplyPatch 1644-net-mlx5-fw-tracer-validate-format-string-parameters.patch +ApplyPatch 1645-net-mlx5-fw-tracer-handle-escaped-percent-properly.patch +ApplyPatch 1646-net-mlx5-serialize-firmware-reset-with-devlink.patch +ApplyPatch 1647-net-mlx5e-use-ip6-dst-lookup-instead-of-ipv6-dst-lookup-flow.patch +ApplyPatch 1648-net-mlx5e-trigger-neighbor-resolution-for-unresolved-destina.patch +ApplyPatch 1649-net-mlx5e-do-not-update-bql-of-old-txqs-during-channel-recon.patch +ApplyPatch 1650-net-mlx5-lag-multipath-give-priority-for-routes-with-smaller.patch +ApplyPatch 1651-net-mlx5e-fix-null-pointer-dereference-in-ioctl-module-eepro.patch +ApplyPatch 1652-net-mlx5e-don-t-print-error-message-due-to-invalid-module.patch +ApplyPatch 1653-net-mlx5e-fix-crash-on-profile-change-rollback-failure.patch +ApplyPatch 1654-net-mlx5e-don-t-store-mlx5e-priv-in-mlx5e-dev-devlink-priv.patch +ApplyPatch 1655-net-mlx5e-pass-netdev-to-mlx5e-destroy-netdev-instead-of-pri.patch +ApplyPatch 1656-net-mlx5e-restore-destroying-state-bit-after-profile-cleanup.patch +ApplyPatch 1657-net-mlx5-fix-memory-leak-in-esw-acl-ingress-lgcy-setup.patch +ApplyPatch 
1658-net-mlx5-fix-unbinding-uplink-netdev-in-switchdev-mode.patch +ApplyPatch 1659-net-mlx5e-tc-delete-flows-only-for-existing-peers.patch +ApplyPatch 1660-net-mlx5e-account-for-netdev-stats-in-ndo-get-stats64.patch +ApplyPatch 1661-net-mlx5-fix-return-type-mismatch-in-mlx5-esw-vport-vhca-id.patch +ApplyPatch 1662-net-mlx5-fs-fix-inverted-cap-check-in-tx-flow-table-root-dis.patch +ApplyPatch 1663-net-mlx5-fix-vhca-id-access-call-trace-use-before-alloc.patch +ApplyPatch 1664-net-mlx5e-skip-esn-replay-window-setup-for-ipsec-crypto-offl.patch +ApplyPatch 1665-rdma-mlx5-change-default-device-for-lag-slaves-in-rdma-trans.patch +ApplyPatch 1666-rdma-mlx5-add-other-eswitch-support-for-devx-destruction.patch +ApplyPatch 1667-rdma-mlx5-refactor-get-prio-function.patch +ApplyPatch 1668-rdma-mlx5-add-other-eswitch-support-to-userspace-tables.patch +ApplyPatch 1669-ib-mlx5-reduce-imr-ksm-size-when-5-level-paging-is-enabled.patch +ApplyPatch 1670-net-mlx5e-shampo-fix-header-mapping-for-64k-pages.patch +ApplyPatch 1671-net-mlx5e-shampo-fix-skb-size-check-for-64k-pages.patch +ApplyPatch 1672-net-mlx5e-shampo-fix-header-formulas-for-higher-mtus-and-64k.patch +ApplyPatch 1673-net-mlx5-qos-restrict-rtnl-area-to-avoid-a-lock-cycle.patch +ApplyPatch 1674-net-mlx5-fix-peer-miss-rules-host-disabled-checks.patch +ApplyPatch 1675-net-mlx5e-rx-fix-xdp-multi-buf-frag-counting-for-legacy-rq.patch +ApplyPatch 1676-net-mlx5-fix-crash-when-moving-to-switchdev-mode.patch +ApplyPatch 1677-net-mlx5-fix-hca-caps-leak-on-notifier-init-failure.patch +ApplyPatch 1678-net-mlx5e-rx-fix-xdp-multi-buf-frag-counting-for-striding-rq.patch +ApplyPatch 1679-iavf-fix-vlan-filter-lost-on-add-delete-race.patch +ApplyPatch 1680-iavf-rename-iavf-vlan-is-new-to-iavf-vlan-adding.patch +ApplyPatch 1681-iavf-stop-removing-vlan-filters-from-pf-on-interface-down.patch +ApplyPatch 1682-iavf-wait-for-pf-confirmation-before-removing-vlan-filters.patch +ApplyPatch 
1683-iavf-add-virtchnl-op-add-vlan-to-success-completion-handler.patch
+ApplyPatch 1684-netfilter-skip-recording-stale-or-retransmitted-init.patch
+ApplyPatch 1685-sctp-discard-stale-init-after-handshake-completion.patch
+ApplyPatch 1686-rdma-vmw-pvrdma-fix-double-free-on-pvrdma-alloc-ucontext-err.patch
+ApplyPatch 1687-sched-fair-skip-sched-balance-running-cmpxchg-when-balance-i.patch
+ApplyPatch 1688-sched-fair-have-sd-serialize-affect-newidle-balancing.patch
+ApplyPatch 1689-powerpc-64-force-inlining-of-prevent-user-access-and-set-kua.patch
+ApplyPatch 1690-compiler-gcc-h-remove-ancient-workaround-for-gcc-pr-58670.patch
+ApplyPatch 1691-work-around-gcc-bugs-with-asm-goto-with-outputs.patch
+ApplyPatch 1692-init-kconfig-fix-cc-has-asm-goto-tied-output-test-with-dash.patch
+ApplyPatch 1693-update-workarounds-for-gcc-asm-goto-issue.patch
+ApplyPatch 1694-init-kconfig-remove-config-gcc-asm-goto-output-workaround.patch
+ApplyPatch 1695-rdma-mlx5-fix-error-path-fall-through-in-mlx5-ib-dev-res-srq.patch
 # END OF PATCH APPLICATIONS
 # Any further pre-build tree manipulations happen here.
@@ -4221,6 +4987,398 @@ fi
 #
 #
 %changelog
+* Mon Jun 29 2026 Andrew Lukoshko - 5.14.0-687.19.1
+- Recreate RHEL 5.14.0-687.19.1 from CentOS Stream 9 and upstream stable backports (1313-1695)
+- Retain AlmaLinux ahead-of-RHEL fix for CVE-2026-46316 (1312)
+- RHEL changelog for 687.18.1..687.19.1 follows:
+
+* Thu Jun 25 2026 CKI KWF Bot [5.14.0-687.19.1.el9_8]
+- RDMA/mlx5: Fix error path fall-through in mlx5_ib_dev_res_srq_init() (CKI Backport Bot) [RHEL-179994] {CVE-2026-46176}
+- init/Kconfig: remove CONFIG_GCC_ASM_GOTO_OUTPUT_WORKAROUND (Waiman Long) [RHEL-183183]
+- update workarounds for gcc "asm goto" issue (Waiman Long) [RHEL-183183]
+- init/Kconfig: fix CC_HAS_ASM_GOTO_TIED_OUTPUT test with dash (Waiman Long) [RHEL-183183]
+- work around gcc bugs with 'asm goto' with outputs (Waiman Long) [RHEL-183183]
+- compiler-gcc.h: remove ancient workaround for gcc PR 58670 (Waiman Long) [RHEL-183183]
+- powerpc/64: Force inlining of prevent_user_access() and set_kuap() (Waiman Long) [RHEL-183183]
+- sched/fair: Have SD_SERIALIZE affect newidle balancing (CKI Backport Bot) [RHEL-182776]
+- sched/fair: Skip sched_balance_running cmpxchg when balance is not due (CKI Backport Bot) [RHEL-182776]
+- RDMA/vmw_pvrdma: Fix double free on pvrdma_alloc_ucontext() error path (CKI Backport Bot) [RHEL-179955] {CVE-2026-46189}
+- sctp: discard stale INIT after handshake completion (CKI Backport Bot) [RHEL-178273]
+- netfilter: skip recording stale or retransmitted INIT (CKI Backport Bot) [RHEL-178273]
+- iavf: add VIRTCHNL_OP_ADD_VLAN to success completion handler (CKI Backport Bot) [RHEL-172993]
+- iavf: wait for PF confirmation before removing VLAN filters (CKI Backport Bot) [RHEL-172993]
+- iavf: stop removing VLAN filters from PF on interface down (CKI Backport Bot) [RHEL-172993]
+- iavf: rename IAVF_VLAN_IS_NEW to IAVF_VLAN_ADDING (CKI Backport Bot) [RHEL-172993]
+- iavf: fix VLAN filter lost on add/delete race (CKI Backport Bot) [RHEL-172993]
+
+* Tue Jun 23 2026 CKI KWF Bot
[5.14.0-687.18.1.el9_8] +- net/mlx5e: RX, Fix XDP multi-buf frag counting for striding RQ (Kamal Heib) [RHEL-169057] +- net/mlx5: Fix HCA caps leak on notifier init failure (Kamal Heib) [RHEL-169057] +- net/mlx5: Fix crash when moving to switchdev mode (Kamal Heib) [RHEL-169057] +- net/mlx5e: RX, Fix XDP multi-buf frag counting for legacy RQ (Kamal Heib) [RHEL-169057] +- net/mlx5: Fix peer miss rules host disabled checks (Kamal Heib) [RHEL-169057] +- net/mlx5: qos: Restrict RTNL area to avoid a lock cycle (Kamal Heib) [RHEL-169057] +- net/mlx5e: SHAMPO, Fix header formulas for higher MTUs and 64K pages (Kamal Heib) [RHEL-169057] +- net/mlx5e: SHAMPO, Fix skb size check for 64K pages (Kamal Heib) [RHEL-169057] +- net/mlx5e: SHAMPO, Fix header mapping for 64K pages (Kamal Heib) [RHEL-169057] +- IB/mlx5: Reduce IMR KSM size when 5-level paging is enabled (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Add other eswitch support to userspace tables (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Refactor _get_prio() function (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Add other_eswitch support for devx destruction (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Change default device for LAG slaves in RDMA TRANSPORT namespaces (Kamal Heib) [RHEL-169057] +- net/mlx5e: Skip ESN replay window setup for IPsec crypto offload (Kamal Heib) [RHEL-169057] +- net/mlx5: Fix vhca_id access call trace use before alloc (Kamal Heib) [RHEL-169057] +- net/mlx5: fs, Fix inverted cap check in tx flow table root disconnect (Kamal Heib) [RHEL-169057] +- net/mlx5: Fix return type mismatch in mlx5_esw_vport_vhca_id() (Kamal Heib) [RHEL-169057] +- net/mlx5e: Account for netdev stats in ndo_get_stats64 (Kamal Heib) [RHEL-169057] +- net/mlx5e: TC, delete flows only for existing peers (Kamal Heib) [RHEL-169057] +- net/mlx5: Fix Unbinding uplink-netdev in switchdev mode (Kamal Heib) [RHEL-169057] +- net/mlx5: Fix memory leak in esw_acl_ingress_lgcy_setup() (Kamal Heib) [RHEL-169057] +- net/mlx5e: Restore destroying state 
bit after profile cleanup (Kamal Heib) [RHEL-169057] +- net/mlx5e: Pass netdev to mlx5e_destroy_netdev instead of priv (Kamal Heib) [RHEL-169057] +- net/mlx5e: Don't store mlx5e_priv in mlx5e_dev devlink priv (Kamal Heib) [RHEL-169057] +- net/mlx5e: Fix crash on profile change rollback failure (Kamal Heib) [RHEL-169057] +- net/mlx5e: Don't print error message due to invalid module (Kamal Heib) [RHEL-169057] +- net/mlx5e: Fix NULL pointer dereference in ioctl module EEPROM query (Kamal Heib) [RHEL-169057] +- net/mlx5: Lag, multipath, give priority for routes with smaller network prefix (Kamal Heib) [RHEL-169057] +- net/mlx5e: Do not update BQL of old txqs during channel reconfiguration (Kamal Heib) [RHEL-169057] +- net/mlx5e: Trigger neighbor resolution for unresolved destinations (Kamal Heib) [RHEL-169057] +- net/mlx5e: Use ip6_dst_lookup instead of ipv6_dst_lookup_flow for MAC init (Kamal Heib) [RHEL-169057] +- net/mlx5: Serialize firmware reset with devlink (Kamal Heib) [RHEL-169057] +- net/mlx5: fw_tracer, Handle escaped percent properly (Kamal Heib) [RHEL-169057] +- net/mlx5: fw_tracer, Validate format string parameters (Kamal Heib) [RHEL-169057] +- net/mlx5: Drain firmware reset in shutdown callback (Kamal Heib) [RHEL-169057] +- net/mlx5: fw reset, clear reset requested on drain_fw_reset (Kamal Heib) [RHEL-169057] +- net/mlx5: Fix double unregister of HCA_PORTS component (Kamal Heib) [RHEL-169057] +- net/mlx5: make enable_mpesw idempotent (Kamal Heib) [RHEL-169057] +- net/mlx5e: Support XDP target xmit with dummy program (Kamal Heib) [RHEL-169057] +- net/mlx5e: Update XDP features in switch channels (Kamal Heib) [RHEL-169057] +- net/mlx5e: Use standard unit definitions for bandwidth conversion (Kamal Heib) [RHEL-169057] +- net/mlx5e: Use U8_MAX instead of hard coded magic number (Kamal Heib) [RHEL-169057] +- net/mlx5e: Rename upper_limit_mbps to upper_limit_100mbps (Kamal Heib) [RHEL-169057] +- net/mlx5e: Use u64 instead of __u64 in ieee_setmaxrate (Kamal 
Heib) [RHEL-169057] +- net/mlx5: Move SF dev table notifier registration outside the PF devlink lock (Kamal Heib) [RHEL-169057] +- net/mlx5: Move the SF table notifiers outside the devlink lock (Kamal Heib) [RHEL-169057] +- net/mlx5: Move the SF HW table notifier outside the devlink lock (Kamal Heib) [RHEL-169057] +- net/mlx5: Move the vhca event notifier outside of the devlink lock (Kamal Heib) [RHEL-169057] +- net/mlx5: Move the esw mode notifier chain outside the devlink lock (Kamal Heib) [RHEL-169057] +- net/mlx5: Initialize events outside devlink lock (Kamal Heib) [RHEL-169057] +- net/mlx5: Use EOPNOTSUPP instead of ENOTSUPP (Kamal Heib) [RHEL-169057] +- net/mlx5: Abort new commands if all command slots are stalled (Kamal Heib) [RHEL-169057] +- net/mlx5: Remove redundant bw_share minimal value assignment (Kamal Heib) [RHEL-169057] +- net/mlx5e: Recover SQ on excessive PTP TX timestamp delta (Kamal Heib) [RHEL-169057] +- net/mlx5: Refactor EEPROM query error handling to return status separately (Kamal Heib) [RHEL-169057] +- mlx5: extract GRXRINGS from .get_rxnfc (Kamal Heib) [RHEL-169057] +- net/mlx5: Expose definition for 1600Gbps link mode (Kamal Heib) [RHEL-169057] +- net/mlx5: E-Switch, support eswitch inactive mode (Kamal Heib) [RHEL-169057] +- net/mlx5: MPFS, add support for dynamic enable/disable (Kamal Heib) [RHEL-169057] +- net/mlx5: fs, set non default device per namespace (Kamal Heib) [RHEL-169057] +- net/mlx5: fs, Add other_eswitch support for steering tables (Kamal Heib) [RHEL-169057] +- net/mlx5: Add OTHER_ESWITCH HW capabilities (Kamal Heib) [RHEL-169057] +- net/mlx5: Add direct ST mode support for RDMA (Kamal Heib) [RHEL-169057] +- PCI/TPH: Expose pcie_tph_get_st_table_loc() (Kamal Heib) [RHEL-169057] +- net/mlx5e: Defer channels closure to reduce interface down time (Kamal Heib) [RHEL-169057] +- net/mlx5e: Pass old channels as argument to mlx5e_switch_priv_channels (Kamal Heib) [RHEL-169057] +- net/mlx5e: Do not re-apply TIR loopback 
configuration if not necessary (Kamal Heib) [RHEL-169057] +- net/mlx5: IPoIB, set self loopback prevention in TIR init (Kamal Heib) [RHEL-169057] +- net/mlx5e: Allow setting self loopback prevention bits on TIR init (Kamal Heib) [RHEL-169057] +- net/mlx5e: Use TIR API in mlx5e_modify_tirs_lb() (Kamal Heib) [RHEL-169057] +- net/mlx5e: Enhance function structures for self loopback prevention application (Kamal Heib) [RHEL-169057] +- net/mlx5e: Convert to new hwtstamp_get/set interface (Kamal Heib) [RHEL-169057] +- net/mlx5e: Rename timestamp fields to hwtstamp_config (Kamal Heib) [RHEL-169057] +- net/mlx5e: Rename hwstamp functions to hwtstamp (Kamal Heib) [RHEL-169057] +- net/mlx5e: Remove unnecessary tstamp local variable in mlx5i_complete_rx_cqe (Kamal Heib) [RHEL-169057] +- net/mlx5e: Remove redundant tstamp pointer from channel structures (Kamal Heib) [RHEL-169057] +- net/mlx5: Add balance ID support for LAG multiplane groups (Kamal Heib) [RHEL-169057] +- net/mlx5: Refactor HCA cap 2 setting (Kamal Heib) [RHEL-169057] +- net/mlx5: Refactor PTP clock devcom pairing (Kamal Heib) [RHEL-169057] +- net/mlx5: Add software system image GUID infrastructure (Kamal Heib) [RHEL-169057] +- net/mlx5: Use common mlx5_same_hw_devs function (Kamal Heib) [RHEL-169057] +- {rdma,net}/mlx5: Query vports mac address from device (Kamal Heib) [RHEL-169057] +- RDMA: Use %pe format specifier for error pointers (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Fix page size bitmap calculation for KSM mode (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Fix vport loopback forcing for MPV device (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Better estimate max_qp_wr to reflect WQE count (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Enable Data-Direct with Relaxed Ordering (Kamal Heib) [RHEL-169057] +- net/mlx5e: Fix validation logic in rate limiting (Kamal Heib) [RHEL-169057] +- net/mlx5: Clean up only new IRQ glue on request_irq() failure (Kamal Heib) [RHEL-169057] +- mlx5: Fix default values in create CQ (Kamal 
Heib) [RHEL-169057] +- net/mlx5e: Fix potentially misleading debug message (Kamal Heib) [RHEL-169057] +- net/mlx5e: Fix wraparound in rate limiting for values above 255 Gbps (Kamal Heib) [RHEL-169057] +- net/mlx5e: Fix maxrate wraparound in threshold between units (Kamal Heib) [RHEL-169057] +- net/mlx5e: Trim the length of the num_doorbell error (Kamal Heib) [RHEL-169057] +- net/mlx5e: Fix missing error assignment in mlx5e_xfrm_add_state() (Kamal Heib) [RHEL-169057] +- net/mlx5e: Fix return value in case of module EEPROM read error (Kamal Heib) [RHEL-169057] +- net/mlx5: Don't zero user_count when destroying FDB tables (Kamal Heib) [RHEL-169057] +- net/mlx5: Fix IPsec cleanup over MPV device (Kamal Heib) [RHEL-169057] +- net/mlx5: Refactor devcom to return NULL on failure (Kamal Heib) [RHEL-169057] +- net/mlx5: Add PPHCR to PCAM supported registers mask (Kamal Heib) [RHEL-169057] +- net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for striding RQ (Kamal Heib) [RHEL-169057] +- net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for legacy RQ (Kamal Heib) [RHEL-169057] +- net/mlx5e: Return 1 instead of 0 in invalid case in mlx5e_mpwrq_umr_entry_size() (Kamal Heib) [RHEL-169057] +- net/mlx5: fix pre-2.40 binutils assembler error (Kamal Heib) [RHEL-169057] +- net/mlx5e: Prevent tunnel reformat when tunnel mode not allowed (Kamal Heib) [RHEL-169057] +- net/mlx5: Prevent tunnel mode conflicts between FDB and NIC IPsec tables (Kamal Heib) [RHEL-169057] +- net/mlx5e: Use extack in set rxfh callback (Kamal Heib) [RHEL-169057] +- net/mlx5e: Introduce mlx5e_rss_params for RSS configuration (Kamal Heib) [RHEL-169057] +- net/mlx5e: Introduce mlx5e_rss_init_params (Kamal Heib) [RHEL-169057] +- net/mlx5e: Remove unused mdev param from RSS indir init (Kamal Heib) [RHEL-169057] +- net/mlx5: Improve QoS error messages with actual depth values (Kamal Heib) [RHEL-169057] +- net/mlx5e: Prevent entering switchdev mode with inconsistent netns (Kamal Heib) [RHEL-169057] 
+- net/mlx5: HWS, Generalize complex matchers (Kamal Heib) [RHEL-169057] +- net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs (Kamal Heib) [RHEL-169057] +- net/mlx5: fw reset, add reset timeout work (Kamal Heib) [RHEL-169057] +- net/mlx5: pagealloc: Fix reclaim race during command interface teardown (Kamal Heib) [RHEL-169057] +- net/mlx5: Stop polling for command response if interface goes down (Kamal Heib) [RHEL-169057] +- net/mlx5: IFC add balance ID and LAG per MP group bits (Kamal Heib) [RHEL-169057] +- net/mlx5: Add IFC bit for TIR/SQ order capability (Kamal Heib) [RHEL-169057] +- net/mlx5: Expose uar access and odp page fault counters (Kamal Heib) [RHEL-169057] +- net/mlx5: Use %pe format specifier for error pointers (Kamal Heib) [RHEL-169057] +- net/mlx5: Remove dead code from total_vfs setter (Kamal Heib) [RHEL-169057] +- net/mlx5e: Add flow rules for the decrypted ESP packets (Kamal Heib) [RHEL-169057] +- net/mlx5e: Add flow groups for the packets decrypted by crypto offload (Kamal Heib) [RHEL-169057] +- net/mlx5e: Recirculate decrypted packets into TTC table (Kamal Heib) [RHEL-169057] +- net/mlx5: Change TTC rules to match on undecrypted ESP packets (Kamal Heib) [RHEL-169057] +- net/mlx5: Add uar access and odp page fault counters (Kamal Heib) [RHEL-169057] +- net/mlx5e: Use unsigned for mlx5e_get_max_num_channels (Kamal Heib) [RHEL-169057] +- net/mlx5e: Use the 'num_doorbells' devlink param (Kamal Heib) [RHEL-169057] +- net/mlx5e: Use multiple CQ doorbells (Kamal Heib) [RHEL-169057] +- net/mlx5e: Use multiple TX doorbells (Kamal Heib) [RHEL-169057] +- net/mlx5e: Prepare for using different CQ doorbells (Kamal Heib) [RHEL-169057] +- net/mlx5e: Prepare for using multiple TX doorbells (Kamal Heib) [RHEL-169057] +- net/mlx5: Store the global doorbell in mlx5_priv (Kamal Heib) [RHEL-169057] +- net/mlx5e: Remove unused 'xsk' param of mlx5e_build_xdpsq_param (Kamal Heib) [RHEL-169057] +- net/mlx5: Remove unused 'offset' field from 
mlx5_sq_bfreg (Kamal Heib) [RHEL-169057] +- net/mlx5: Fix typo of MLX5_EQ_DOORBEL_OFFSET (Kamal Heib) [RHEL-169057] +- net/mlx5e: Prevent WQE metadata conflicts between timestamping and offloads (Kamal Heib) [RHEL-169057] +- net/mlx5: Refactor MACsec WQE metadata shifts (Kamal Heib) [RHEL-169057] +- net/mlx5: Remove VLAN insertion fields from WQE Ether segment (Kamal Heib) [RHEL-169057] +- net/mlx5: Lag, add net namespace support (Kamal Heib) [RHEL-169057] +- net/mlx5: Add net namespace support to devcom (Kamal Heib) [RHEL-169057] +- net/mlx5: Lag, move devcom registration to LAG layer (Kamal Heib) [RHEL-169057] +- net/mlx5: Refactor devcom to use match attributes (Kamal Heib) [RHEL-169057] +- net/mlx5: fix typo in pci_irq.c comment (Kamal Heib) [RHEL-169057] +- net/mlx5e: Add stale counter for PCIe congestion events (Kamal Heib) [RHEL-169057] +- net/mlx5e: Make PCIe congestion event thresholds configurable (Kamal Heib) [RHEL-169057] +- net/mlx5: Implement devlink total_vfs parameter (Kamal Heib) [RHEL-169057] +- net/mlx5: Implement devlink enable_sriov parameter (Kamal Heib) [RHEL-169057] +- net/mlx5: Implement cqe_compress_type via devlink params (Kamal Heib) [RHEL-169057] +- net/mlx5: Add RS FEC histogram infrastructure (Kamal Heib) [RHEL-169057] +- net/mlx5: Support getcyclesx and getcrosscycles (Kamal Heib) [RHEL-169057] +- net/mlx5: Extract MTCTR register read logic into helper function (Kamal Heib) [RHEL-169057] +- net/mlx5: Add PSP capabilities structures and bits (Kamal Heib) [RHEL-169057] +- net/mlx5: {DR,HWS}, Use the cached vhca_id for this device (Kamal Heib) [RHEL-169057] +- net/mlx5: E-switch, Set representor attributes for adjacent VFs (Kamal Heib) [RHEL-169057] +- net/mlx5: E-Switch, Register representors for adjacent vports (Kamal Heib) [RHEL-169057] +- net/mlx5: E-Switch, Create acls root namespace for adjacent vports (Kamal Heib) [RHEL-169057] +- net/mlx5: E-Switch, Add support for adjacent functions vports discovery (Kamal Heib) [RHEL-169057] 
+- net/mlx5: E-Switch, Move vport acls root namespaces creation to eswitch (Kamal Heib) [RHEL-169057] +- net/mlx5: FS, Convert vport acls root namespaces to xarray (Kamal Heib) [RHEL-169057] +- eth: mlx5: remove Kconfig co-dependency with VXLAN (Kamal Heib) [RHEL-169057] +- net/mlx5e: Set default burst period for TX and RX reporters (Kamal Heib) [RHEL-169057] +- net/mlx5: Support disabling host PFs (Kamal Heib) [RHEL-169057] +- net/mlx5: Query to see if host PF is disabled (Kamal Heib) [RHEL-169057] +- {rdma,net}/mlx5: export mlx5_vport_get_vhca_id (Kamal Heib) [RHEL-169057] +- net/mlx5: E-Switch, Set/Query hca cap via vhca id (Kamal Heib) [RHEL-169057] +- net/mlx5: E-Switch, Cache vport vhca id on first cap query (Kamal Heib) [RHEL-169057] +- net/mlx5: mlx5_ifc, Add hardware definitions needed for adjacent vports (Kamal Heib) [RHEL-169057] +- net/mlx5: Don't use %pK through tracepoints (Kamal Heib) [RHEL-169057] +- IB/mlx5: Fix obj_type mismatch for SRQ event subscriptions (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Refactor optional counters steering code (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Add DMAH support for reg_user_mr/reg_user_dmabuf_mr (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Add DMAH object support (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Fix incorrect MKEY masking (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Fix returned type from _mlx5r_umr_zap_mkey() (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: remove redundant check on err on return expression (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Optimize DMABUF mkey page size (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Align mkc page size capability check to PRM (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Check CAP_NET_RAW in user namespace for devx create (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Check CAP_NET_RAW in user namespace for anchor create (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Check CAP_NET_RAW in user namespace for flow create (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Add multiple priorities support to RDMA 
TRANSPORT userspace tables (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Support driver APIs pre_destroy_cq and post_destroy_cq (Kamal Heib) [RHEL-169057] +- net/mlx5e: Fix missing FEC RS stats for RS_544_514_INTERLEAVED_QUAD (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, ignore flow level for multi-dest table (Kamal Heib) [RHEL-169057] +- net/mlx5e: Add a miss level for ipsec crypto offload (Kamal Heib) [RHEL-169057] +- net/mlx5e: Harden uplink netdev access against device unbind (Kamal Heib) [RHEL-169057] +- net/mlx5e: Set local Xoff after FW update (Kamal Heib) [RHEL-169057] +- net/mlx5: Prevent flow steering mode changes in switchdev mode (Kamal Heib) [RHEL-169057] +- net/mlx5: Nack sync reset when SFs are present (Kamal Heib) [RHEL-169057] +- net/mlx5: Fix lockdep assertion on sync reset unload event (Kamal Heib) [RHEL-169057] +- net/mlx5: Reload auxiliary drivers on fw_activate (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Fix pattern destruction in mlx5hws_pat_get_pattern error path (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Fix uninitialized variables in mlx5hws_pat_calc_nop error flow (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Fix memory leak in hws_action_get_shared_stc_nic error flow (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Fix memory leak in hws_pool_buddy_init error path (Kamal Heib) [RHEL-169057] +- net/mlx5e: Preserve shared buffer capacity during headroom updates (Kamal Heib) [RHEL-169057] +- net/mlx5e: Query FW for buffer ownership (Kamal Heib) [RHEL-169057] +- net/mlx5: Restore missing scheduling node cleanup on vport enable failure (Kamal Heib) [RHEL-169057] +- net/mlx5: Fix QoS reference leak in vport enable error path (Kamal Heib) [RHEL-169057] +- net/mlx5: Destroy vport QoS element when no configuration remains (Kamal Heib) [RHEL-169057] +- net/mlx5e: Preserve tc-bw during parent changes (Kamal Heib) [RHEL-169057] +- net/mlx5: Remove default QoS group and attach vports directly to root TSAR (Kamal Heib) [RHEL-169057] +- net/mlx5: Base ECVF 
devlink port attrs from 0 (Kamal Heib) [RHEL-169057] +- net/mlx5: CT: Use the correct counter offset (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Fix table creation UID (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, don't rehash on every kind of insertion failure (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, prevent rehash from filling up the queues (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, fix complex rules rehash error flow (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, fix simple rules rehash error flow (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, fix bad parameter in CQ creation (Kamal Heib) [RHEL-169057] +- net/mlx5: Correctly set gso_segs when LRO is used (Kamal Heib) [RHEL-169057] +- net/mlx5e: Expose TIS via devlink tx reporter diagnose (Kamal Heib) [RHEL-169057] +- net/mlx5e: Support routed networks during IPsec MACs initialization (Kamal Heib) [RHEL-169057] +- net/mlx5e: Fix potential deadlock by deferring RX timeout recovery (Kamal Heib) [RHEL-169057] +- net/mlx5e: Remove skb secpath if xfrm state is not found (Kamal Heib) [RHEL-169057] +- net/mlx5e: Clear Read-Only port buffer size in PBMC before update (Kamal Heib) [RHEL-169057] +- net: Fix typos (Kamal Heib) [RHEL-169057] +- net/mlx5: Fix build -Wframe-larger-than warnings (Kamal Heib) [RHEL-169057] +- net/mlx5: Add support for device steering tag (Kamal Heib) [RHEL-169057] +- net/mlx5: Expose IFC bits for TPH (Kamal Heib) [RHEL-169057] +- PCI/TPH: Expose pcie_tph_get_st_table_size() (Kamal Heib) [RHEL-169057] +- net/mlx5e: Remove duplicate mkey from SHAMPO header (Kamal Heib) [RHEL-169057] +- net/mlx5e: SHAMPO, Remove mlx5e_shampo_get_log_hd_entry_size() (Kamal Heib) [RHEL-169057] +- net/mlx5e: SHAMPO, Cleanup reservation size formula (Kamal Heib) [RHEL-169057] +- net/mlx5: Expose cable_length field in PFCC register (Kamal Heib) [RHEL-169057] +- net/mlx5: Add IFC bits and enums for buf_ownership (Kamal Heib) [RHEL-169057] +- net/mlx5: Add IFC bits to support RSS for IPSec offload (Kamal Heib) 
[RHEL-169057] +- net/mlx5e: Properly access RCU protected qdisc_sleeping variable (Kamal Heib) [RHEL-169057] +- net/mlx5e: fix kdoc warning on eswitch.h (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Enable IPSec hardware offload in legacy mode (Kamal Heib) [RHEL-169057] +- net/mlx5: Fix an IS_ERR() vs NULL bug in esw_qos_move_node() (Kamal Heib) [RHEL-169057] +- net/mlx5e: Add device PCIe congestion ethtool stats (Kamal Heib) [RHEL-169057] +- net/mlx5e: Create/destroy PCIe Congestion Event object (Kamal Heib) [RHEL-169057] +- net/mlx5: IFC updates for disabled host PF (Kamal Heib) [RHEL-169057] +- net/mlx5: Expose disciplined_fr_counter through HCA capabilities in mlx5_ifc (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Fix UMR modifying of mkey page size (Kamal Heib) [RHEL-169057] +- net/mlx5: Expose HCA capability bits for mkey max page size (Kamal Heib) [RHEL-169057] +- net/mlx5e: RX, Remove unnecessary RQT redirects (Kamal Heib) [RHEL-169057] +- net/mlx5: Warn when write combining is not supported (Kamal Heib) [RHEL-169057] +- net/mlx5e: Replace recursive VLAN push handling with an iterative loop (Kamal Heib) [RHEL-169057] +- net/mlx5e: CT: extract a memcmp from a spinlock section (Kamal Heib) [RHEL-169057] +- net/mlx5e: Remove unused VLAN insertion logic in TX path (Kamal Heib) [RHEL-169057] +- eth: mlx5: migrate to the *_rxfh_context ops (Kamal Heib) [RHEL-169057] +- net/mlx5: Fix spelling mistake "disabliing" -> "disabling" (Kamal Heib) [RHEL-169057] +- net/mlx5: Add HWS as secondary steering mode (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Shrink empty matchers (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Rearrange to prevent forward declaration (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Track matcher sizes individually (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Decouple matcher RX and TX sizes (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Create STEs directly from matcher (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Refactor rule skip logic (Kamal Heib) 
[RHEL-169057] +- net/mlx5: HWS, Export rule skip logic (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, remove incorrect comment (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, remove unused create_dest_array parameter (Kamal Heib) [RHEL-169057] +- net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw (Kamal Heib) [RHEL-169057] +- net/mlx5: Add traffic class scheduling support for vport QoS (Kamal Heib) [RHEL-169057] +- net/mlx5: Add support for setting tc-bw on nodes (Kamal Heib) [RHEL-169057] +- net/mlx5: Add no-op implementation for setting tc-bw on rate objects (Kamal Heib) [RHEL-169057] +- net/mlx5: Check device memory pointer before usage (Kamal Heib) [RHEL-169057] +- net/mlx5: fs, fix RDMA TRANSPORT init cleanup flow (Kamal Heib) [RHEL-169057] +- net/mlx5e: Fix error handling in RQ memory model registration (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Allocate IB device with net namespace supplied from core dev (Kamal Heib) [RHEL-169057] +- net/mlx5: Add IFC bits for PCIe Congestion Event object (Kamal Heib) [RHEL-169057] +- net/mlx5: Small refactor for general object capabilities (Kamal Heib) [RHEL-169057] +- net/mlx5: fs, add multiple prios to RDMA TRANSPORT steering domain (Kamal Heib) [RHEL-169057] +- net/mlx5e: Support ethtool tcp-data-split settings (Kamal Heib) [RHEL-169057] +- net/mlx5e: Implement queue mgmt ops and single channel swap (Kamal Heib) [RHEL-169057] +- net/mlx5e: SHAMPO: Separate pool for headers (Kamal Heib) [RHEL-169057] +- net/mlx5e: SHAMPO: Improve hw gro capability checking (Kamal Heib) [RHEL-169057] +- net/mlx5e: SHAMPO: Remove redundant params (Kamal Heib) [RHEL-169057] +- net/mlx5e: SHAMPO: Reorganize mlx5_rq_shampo_alloc (Kamal Heib) [RHEL-169057] +- net/mlx5: Expose serial numbers in devlink info (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Fix vport loopback for MPV device (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Fix CC counters query for MPV (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Fix HW counters query for non-representor 
devices (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Initialize obj_event->obj_sub_list before xa_insert (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Avoid flexible array warning (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Add support for 200Gbps per lane speeds (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Remove the redundant MLX5_IB_STAGE_UAR stage (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: convert timeouts to secs_to_jiffies() (Kamal Heib) [RHEL-169057] +- net/mlx5: E-Switch, Fix peer miss rules to use peer eswitch (Kamal Heib) [RHEL-169057] +- net/mlx5: Fix memory leak in cmd_exec() (Kamal Heib) [RHEL-169057] +- net/mlx5: Correctly set gso_size when LRO is used (Kamal Heib) [RHEL-169057] +- net/mlx5e: Add new prio for promiscuous mode (Kamal Heib) [RHEL-169057] +- net/mlx5e: Fix race between DIM disable and net_dim() (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Add error checking to hws_bwc_rule_complex_hash_node_get() (Kamal Heib) [RHEL-169057] +- net/mlx5e: Fix leak of Geneve TLV option object (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, make sure the uplink is the last destination (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, fix missing ip_version handling in definer (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Init mutex on the correct path (Kamal Heib) [RHEL-169057] +- net/mlx5: Fix return value when searching for existing flow group (Kamal Heib) [RHEL-169057] +- net/mlx5: Ensure fw pages are always allocated on same NUMA (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Fix an error code in mlx5hws_bwc_rule_create_complex() (Kamal Heib) [RHEL-169057] +- net/mlx5: Add error handling in mlx5_query_nic_vport_node_guid() (Kamal Heib) [RHEL-169057] +- net/mlx5e: Allow setting MAC address of representors (Kamal Heib) [RHEL-169057] +- net/mlx5_core: Add error handling inmlx5_query_nic_vport_qkey_viol_cntr() (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, handle modify header actions dependency (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, fix typo - 'nope' to 'nop' (Kamal Heib) 
[RHEL-169057] +- net/mlx5: HWS, register reformat actions with fw (Kamal Heib) [RHEL-169057] +- net/mlx5: SWS, fix reformat id error handling (Kamal Heib) [RHEL-169057] +- net/mlx5: Use to_delayed_work() (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, dump bad completion details (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, rework rehash loop (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, fix redundant extension of action templates (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, fix counting of rules in the matcher (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, force rehash when rule insertion failed (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, support complex matchers (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, introduce isolated matchers (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, expose polling function in header file (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, add definer function to get field name str (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, expose function mlx5hws_table_ft_set_next_ft in header (Kamal Heib) [RHEL-169057] +- net/mlx5: support software TX timestamp (Kamal Heib) [RHEL-169057] +- RDMA/mlx5: Fix error flow upon firmware failure for RQ destruction (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Disallow matcher IP version mixing (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Harden IP version definer checks (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Fix IP version decision (Kamal Heib) [RHEL-169057] +- net/mlx5: Fix spelling mistakes in mlx5_core_dbg message and comments (Kamal Heib) [RHEL-169057] +- net/mlx5e: ethtool: Fix formatting of ptp_rq0_csum_complete_tail_slow (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Export action STE tables to debugfs (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Free unused action STE tables (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Cleanup matcher action STE table (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Use the new action STE pool (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Implement action STE pool (Kamal Heib) 
[RHEL-169057] +- net/mlx5: HWS, Fix pool size optimization (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Add fullness tracking to pool (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Cleanup after pool refactoring (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Refactor pool implementation (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Make pool single resource (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Remove unused element array (Kamal Heib) [RHEL-169057] +- net/mlx5: HWS, Fix matcher action template attach (Kamal Heib) [RHEL-169057] +- selinux: RHEL-only hotfix for execmem regression (Ondrej Mosnacek) [RHEL-179444] {CVE-2026-46054} +- selinux: fix overlayfs mmap() and mprotect() access checks (Ondrej Mosnacek) [RHEL-179444] {CVE-2026-46054} +- lsm: add backing_file LSM hooks (Ondrej Mosnacek) [RHEL-179444] {CVE-2026-46054} +- fs: prepare for adding LSM blob to backing_file (Ondrej Mosnacek) [RHEL-179444] {CVE-2026-46054} +- perf/core: Fix MMAP event path names with backing files (Ondrej Mosnacek) [RHEL-179444] +- ovl: remove redundant IOCB_DIO_CALLER_COMP clearing (Ondrej Mosnacek) [RHEL-179444] +- ovl: remove unneeded non-const conversion (Ondrej Mosnacek) [RHEL-179444] {CVE-2026-46054} +- fs: constify file ptr in backing_file accessor helpers (Ondrej Mosnacek) [RHEL-179444] {CVE-2026-46054} +- ovl: Fix nested backing file paths (Ondrej Mosnacek) [RHEL-179444] {CVE-2026-46054} +- lsm: add helper for blob allocations (Ondrej Mosnacek) [RHEL-179444] {CVE-2026-46054} +- fs: factor out backing_file_mmap() helper (Ondrej Mosnacek) [RHEL-179444] {CVE-2026-46054} +- fs: factor out backing_file_splice_{read,write}() helpers (Ondrej Mosnacek) [RHEL-179444] {CVE-2026-46054} +- fs: factor out backing_file_{read,write}_iter() helpers (Ondrej Mosnacek) [RHEL-179444] {CVE-2026-46054} +- fs: prepare for stackable filesystems backing file helpers (Ondrej Mosnacek) [RHEL-179444] {CVE-2026-46054} +- fs: store real path instead of fake path in backing file f_path (Ondrej Mosnacek) 
[RHEL-179444] {CVE-2026-46054} +- fs: create helper file_user_path() for user displayed mapped file path (Ondrej Mosnacek) [RHEL-179444] {CVE-2026-46054} +- fs: get mnt_writers count for an open backing file's real path (Ondrej Mosnacek) [RHEL-179444] {CVE-2026-46054} +- fs: rename __mnt_{want,drop}_write*() helpers (Ondrej Mosnacek) [RHEL-179444] {CVE-2026-46054} +- fs: Fix kernel-doc warnings (Ondrej Mosnacek) [RHEL-179444] {CVE-2026-46054} +- cachefiles: use kiocb_{start,end}_write() helpers (Ondrej Mosnacek) [RHEL-179444] {CVE-2026-46054} +- lsm: constify the 'file' parameter in security_binder_transfer_file() (Ondrej Mosnacek) [RHEL-179444] {CVE-2026-46054} +- fs: move cleanup from init_file() into its callers (Ondrej Mosnacek) [RHEL-179444] +- ovl: enable fsnotify events on underlying real files (Ondrej Mosnacek) [RHEL-179444] {CVE-2026-46054} +- fs: use backing_file container for internal files with "fake" f_path (Ondrej Mosnacek) [RHEL-179444] {CVE-2026-46054} +- fs: move kmem_cache_zalloc() into alloc_empty_file*() helpers (Ondrej Mosnacek) [RHEL-179444] {CVE-2026-46054} +- fs: use a helper for opening kernel internal files (Ondrej Mosnacek) [RHEL-179444] {CVE-2026-46054} +- locks: fix TOCTOU race when granting write lease (Ondrej Mosnacek) [RHEL-179444] {CVE-2026-46054} +- binder: use cred instead of task for selinux checks (Ondrej Mosnacek) [RHEL-179444] {CVE-2026-46054} +- RDMA/iwcm: Fix workqueue list corruption by removing work_list (CKI Backport Bot) [RHEL-179664] {CVE-2026-45898} +- ALSA: aloop: Fix peer runtime UAF during format-change stop (CKI Backport Bot) [RHEL-179310] {CVE-2026-46090} +- ipv6: icmp: clear skb2->cb[] in ip6_err_gen_icmpv6_unreach() (Guillaume Nault) [RHEL-172670] {CVE-2026-43038} +- drm/amd/display: Do not skip unrelated mode changes in DSC validation (CKI Backport Bot) [RHEL-178836] {CVE-2026-31488} +- netfilter: flowtable: strictly check for maximum number of actions (CKI Backport Bot) [RHEL-176927] {CVE-2026-43329} + * Tue 
Jun 23 2026 Andrew Lukoshko - 5.14.0-687.17.1 - Add fix for CVE-2026-46316 (KVM arm64 vgic-its translation-cache use-after-free) ahead of RHEL (1312)