.. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
.. include:: <isonum.txt>

=========
Switchdev
=========

:Copyright: |copy| 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

.. _mlx5_bridge_offload:

Bridge offload
==============

The mlx5 driver implements support for offloading bridge rules when in switchdev
mode. Linux bridge FDBs are automatically offloaded when an mlx5 switchdev
representor is attached to a bridge.

- Change device to switchdev mode::

    $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

- Attach mlx5 switchdev representor 'enp8s0f0' to bridge netdev 'bridge1'::

    $ ip link set enp8s0f0 master bridge1

VLANs
-----

The following bridge VLAN functions are supported by mlx5:

- VLAN filtering (including multiple VLANs per port)::

    $ ip link set bridge1 type bridge vlan_filtering 1
    $ bridge vlan add dev enp8s0f0 vid 2-3

- VLAN push on bridge ingress::

    $ bridge vlan add dev enp8s0f0 vid 3 pvid

- VLAN pop on bridge egress::

    $ bridge vlan add dev enp8s0f0 vid 3 untagged

Subfunction
===========

Subfunctions (SFs) that are spawned over the E-switch are created with only a
devlink device, and by default all the SF auxiliary devices are disabled.
This allows the user to configure the SF before it has been fully probed,
which saves time.

Usage example:

- Create SF::

    $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
    $ devlink port function set pci/0000:08:00.0/32768 hw_addr 00:00:00:00:00:11 state active

- Enable ETH auxiliary device::

    $ devlink dev param set auxiliary/mlx5_core.sf.1 name enable_eth value true cmode driverinit

- Now, in order to fully probe the SF, use devlink reload::

    $ devlink dev reload auxiliary/mlx5_core.sf.1

mlx5 supports ETH, rdma and vdpa (vnet) auxiliary device devlink params (see :ref:`Documentation/networking/devlink/devlink-params.rst <devlink_params_generic>`).

mlx5 supports subfunction management using the devlink port interface (see :ref:`Documentation/networking/devlink/devlink-port.rst <devlink_port>`).

A subfunction has its own function capabilities and its own resources. This
means a subfunction has its own dedicated queues (txq, rxq, cq, eq). These
queues are neither shared with nor stolen from the parent PCI function.

When a subfunction is RDMA capable, it has its own QP1, GID table, and RDMA
resources, which are neither shared with nor stolen from the parent PCI
function.

A subfunction has a dedicated window in PCI BAR space that is not shared
with the other subfunctions or the parent PCI function. This ensures that all
devices (netdev, rdma, vdpa, etc.) of the subfunction access only the assigned
PCI BAR space.

A subfunction supports eswitch representation through which it supports tc
offloads. The user configures eswitch to send/receive packets to/from
the subfunction port.

Subfunctions share PCI level resources such as PCI MSI-X IRQs with
other subfunctions and/or with their parent PCI function.

Example mlx5 software, system, and device view::

       _______
      | admin |
      | user  |----------
      |_______|         |
          |             |
      ____|____       __|______            _________________
     |         |     |         |          |                 |
     | devlink |     | tc tool |          |    user         |
     | tool    |     |_________|          | applications    |
     |_________|         |                |_________________|
           |             |                   |          |
           |             |                   |          |         Userspace
 +---------|-------------|-------------------|----------|--------------------+
           |             |           +----------+   +----------+   Kernel
           |             |           |  netdev  |   | rdma dev |
           |             |           +----------+   +----------+
   (devlink port add/del |              ^               ^
    port function set)   |              |               |
           |             |              +---------------|
      _____|___          |              |        _______|_______
     |         |         |              |       | mlx5 class    |
     | devlink |   +------------+       |       |   drivers     |
     | kernel  |   | rep netdev |       |       |(mlx5_core,ib) |
     |_________|   +------------+       |       |_______________|
           |             |              |               ^
   (devlink ops)         |              |          (probe/remove)
  _________|________     |              |           ____|________
 | subfunction      |    |     +---------------+   | subfunction |
 | management driver|-----     | subfunction   |---|  driver     |
 | (mlx5_core)      |          | auxiliary dev |   | (mlx5_core) |
 |__________________|          +---------------+   |_____________|
           |                                            ^
  (sf add/del, vhca events)                             |
           |                                      (device add/del)
      _____|____                                    ____|________
     |          |                                  | subfunction |
     |  PCI NIC |--- activate/deactivate events--->| host driver |
     |__________|                                  | (mlx5_core) |
                                                   |_____________|

A subfunction is created using the devlink port interface.

- Change device to switchdev mode::

    $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

- Add a devlink port of subfunction flavour::

    $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
    pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
      function:
        hw_addr 00:00:00:00:00:00 state inactive opstate detached

- Show a devlink port of the subfunction::

    $ devlink port show pci/0000:06:00.0/32768
    pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88
      function:
        hw_addr 00:00:00:00:00:00 state inactive opstate detached

- Delete a devlink port of subfunction after use::

    $ devlink port del pci/0000:06:00.0/32768

Function attributes
===================

The mlx5 driver provides a mechanism to set up PCI VF/SF function attributes in
a unified way for SmartNIC and non-SmartNIC.

This is supported only when the eswitch mode is set to switchdev. Port function
configuration of the PCI VF/SF is supported through devlink eswitch port.

Port function attributes should be set before the PCI VF/SF is enumerated by
the driver.

MAC address setup
-----------------

The mlx5 driver supports the devlink port function attr mechanism to set up
the MAC address (refer to Documentation/networking/devlink/devlink-port.rst).

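A minimal sketch of setting the MAC address before the function is enumerated,
assuming an SF port handle of ``pci/0000:06:00.0/32768`` and an arbitrary MAC
address. The ``configure_then_activate`` helper and the ``DEVLINK`` dry-run
variable are hypothetical names used only for illustration; by default the
commands are printed rather than executed::

    #!/bin/sh
    # Dry run unless the caller exports DEVLINK=devlink.
    # $DEVLINK is left unquoted on purpose so "echo devlink" splits in two words.
    DEVLINK="${DEVLINK:-echo devlink}"

    configure_then_activate() {
        port="$1"
        mac="$2"
        # Set the attribute first, while the function is still inactive.
        $DEVLINK port function set "$port" hw_addr "$mac" || return 1
        # Activating the SF is what leads the driver to probe it.
        $DEVLINK port function set "$port" state active
    }

    configure_then_activate pci/0000:06:00.0/32768 00:00:00:00:88:88

Running the script with ``DEVLINK=devlink`` in the environment applies the
two commands instead of printing them.
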
RoCE capability setup
~~~~~~~~~~~~~~~~~~~~~

Not all mlx5 PCI devices/SFs require RoCE capability.

Disabling RoCE capability saves 1 Mbyte of system memory per PCI device/SF.

The mlx5 driver supports the devlink port function attr mechanism to set up
RoCE capability (refer to Documentation/networking/devlink/devlink-port.rst).

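As an illustration, RoCE might be disabled on a port function as follows. The
port handle ``pci/0000:06:00.0/2`` is a placeholder and must be replaced with
the handle of the actual VF/SF port::

    $ devlink port function set pci/0000:06:00.0/2 roce disable
    $ devlink port show pci/0000:06:00.0/2
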
Migratable capability setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Users who want mlx5 PCI VFs to be able to perform live migration need to
explicitly enable the VF migratable capability.

The mlx5 driver supports the devlink port function attr mechanism to set up
migratable capability (refer to
Documentation/networking/devlink/devlink-port.rst).

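For example, assuming the VF is represented by the placeholder port handle
``pci/0000:06:00.0/2``, the capability might be enabled like this::

    $ devlink port function set pci/0000:06:00.0/2 migratable enable
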
IPsec crypto capability setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Users who want mlx5 PCI VFs to be able to perform IPsec crypto offloading need
to explicitly enable the VF ipsec_crypto capability. Enabling IPsec capability
for VFs is supported on ConnectX6dx devices and above. When a VF has
IPsec capability enabled, any IPsec offloading is blocked on the PF.

The mlx5 driver supports the devlink port function attr mechanism to set up
ipsec_crypto capability (refer to
Documentation/networking/devlink/devlink-port.rst).

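A sketch of enabling the capability, where ``pci/0000:06:00.0/2`` is a
placeholder for the VF port handle::

    $ devlink port function set pci/0000:06:00.0/2 ipsec_crypto enable
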
IPsec packet capability setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Users who want mlx5 PCI VFs to be able to perform IPsec packet offloading need
to explicitly enable the VF ipsec_packet capability. Enabling IPsec capability
for VFs is supported on ConnectX6dx devices and above. When a VF has
IPsec capability enabled, any IPsec offloading is blocked on the PF.

The mlx5 driver supports the devlink port function attr mechanism to set up
ipsec_packet capability (refer to
Documentation/networking/devlink/devlink-port.rst).

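The packet offload variant uses the same command form; again,
``pci/0000:06:00.0/2`` is only a placeholder for the VF port handle::

    $ devlink port function set pci/0000:06:00.0/2 ipsec_packet enable
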
SF state setup
--------------

To use the SF, the user must activate the SF using the SF function state
attribute.

- Get the state of the SF identified by its unique devlink port index::

    $ devlink port show ens2f0npf0sf88
    pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
      function:
        hw_addr 00:00:00:00:88:88 state inactive opstate detached

- Activate the function and verify its state is active::

    $ devlink port function set ens2f0npf0sf88 state active

    $ devlink port show ens2f0npf0sf88
    pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
      function:
        hw_addr 00:00:00:00:88:88 state active opstate detached

Upon function activation, the PF driver instance gets the event from the device
that a particular SF was activated. This is the cue to put the device on the
bus, probe it, and instantiate the devlink instance and class-specific
auxiliary devices for it.

- Show the auxiliary device and port of the subfunction::

    $ devlink dev show
    devlink dev show auxiliary/mlx5_core.sf.4

    $ devlink port show auxiliary/mlx5_core.sf.4/1
    auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false

    $ rdma link show mlx5_0/1
    link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev p0sf88

    $ rdma dev show
    8: rocep6s0f1: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d113 sys_image_guid 248a:0703:00b3:d112
    13: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112

- Subfunction auxiliary device and class device hierarchy::

                 mlx5_core.sf.4
          (subfunction auxiliary device)
                       /\
                      /  \
                     /    \
                    /      \
                   /        \
      mlx5_core.eth.4     mlx5_core.rdma.4
     (sf eth aux dev)     (sf rdma aux dev)
         |                      |
         |                      |
      p0sf88                  mlx5_0
     (sf netdev)          (sf rdma device)

Additionally, the SF port also gets the event when the driver attaches to the
auxiliary device of the subfunction. This results in changing the operational
state of the function. This gives the user the visibility needed to decide when
it is safe to delete the SF port for graceful termination of the subfunction.

- Show the SF port operational state::

    $ devlink port show ens2f0npf0sf88
    pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
      function:
        hw_addr 00:00:00:00:88:88 state active opstate attached
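
One way to act on the operational state from a script is to extract the
``opstate`` field from the ``devlink port show`` text. The helper names
``sf_opstate`` and ``sf_is_detached`` are hypothetical, and treating
``detached`` as the point at which deletion is safe is an assumption of this
sketch rather than something the driver enforces::

    #!/bin/sh
    # Print the opstate value found in "devlink port show" text on stdin.
    sf_opstate() {
        sed -n 's/.*opstate \([a-z]*\).*/\1/p'
    }

    # Succeed only if the port text on stdin reports opstate detached.
    # Assumption: detached means no driver is attached to the SF
    # auxiliary device any more.
    sf_is_detached() {
        [ "$(sf_opstate)" = "detached" ]
    }

    # Example use on a live system (not run here):
    #   devlink port show pci/0000:06:00.0/32768 | sf_is_detached && \
    #       devlink port del pci/0000:06:00.0/32768

Parsing the plain-text output keeps the sketch free of extra dependencies.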