.. SPDX-License-Identifier: GPL-2.0+

======================================================
IBM Virtual Management Channel Kernel Driver (IBMVMC)
======================================================

:Authors:
	Dave Engebretsen <engebret@us.ibm.com>,
	Adam Reznechek <adreznec@linux.vnet.ibm.com>,
	Steven Royer <seroyer@linux.vnet.ibm.com>,
	Bryant G. Ly <bryantly@linux.vnet.ibm.com>

Introduction
============

Note: Knowledge of virtualization technology is required to understand
this document.

A good reference document is:

https://openpowerfoundation.org/wp-content/uploads/2016/05/LoPAPR_DRAFT_v11_24March2016_cmt1.pdf

The Virtual Management Channel (VMC) is a logical device which provides an
interface between the hypervisor and a management partition. The interface
works like a message-passing interface. The management partition is intended
to provide an alternative to Hardware Management Console (HMC)-based
system management.

The primary hardware management solution developed by IBM relies
on an appliance server named the Hardware Management Console (HMC),
packaged as an external tower or rack-mounted personal computer. In a
Power Systems environment, a single HMC can manage multiple POWER
processor-based systems.

Management Application
----------------------

In the management partition, a management application exists which enables
a system administrator to configure the system’s partitioning
characteristics via a command line interface (CLI) or Representational
State Transfer (REST) APIs.

The management application runs on a Linux logical partition on a
POWER8 or newer processor-based server that is virtualized by PowerVM.
System configuration, maintenance, and control functions which
traditionally require an HMC can be implemented in the management
application using a combination of HMC to hypervisor interfaces and
existing operating system methods. This tool provides a subset of the
functions implemented by the HMC and enables basic partition configuration.
The set of HMC to hypervisor messages supported by the management
application component is passed to the hypervisor over a VMC interface,
which is defined below.

The VMC enables the management partition to provide basic partitioning
functions:

- Logical Partitioning Configuration
- Start and stop actions for individual partitions
- Display of partition status
- Management of virtual Ethernet
- Management of virtual Storage
- Basic system management

Virtual Management Channel (VMC)
--------------------------------

A logical device, called the Virtual Management Channel (VMC), is defined
for communicating between the management application and the hypervisor. It
creates the pipes that enable virtualization management software. This
device is presented to a designated management partition as a virtual
device.

This communication device uses the Command/Response Queue (CRQ) and
Remote Direct Memory Access (RDMA) interfaces. A three-way handshake is
defined that must take place to establish that both the hypervisor and
management partition sides of the channel are running prior to
sending/receiving any of the protocol messages.

This driver also utilizes Transport Event CRQs. These CRQ messages are sent
when the hypervisor detects that one of the peer partitions has abnormally
terminated, or that one side has called H_FREE_CRQ to close its CRQ.

Two new classes of CRQ messages are introduced for the VMC device. VMC
Administrative messages are used by each partition using the VMC to
communicate capabilities to its partner. HMC Interface messages are used
for the actual flow of HMC messages between the management partition and
the hypervisor. As most HMC messages far exceed the size of a CRQ buffer,
a virtual DMA (RDMA) of the HMC message data is done prior to each HMC
Interface CRQ message. Only the management partition drives RDMA
operations; the hypervisor never directly causes the movement of message
data.

Terminology
-----------

RDMA
        Remote Direct Memory Access is a DMA transfer from the server to its
        client or from the server to its partner partition. DMA refers
        to both physical I/O to and from memory operations and to memory
        to memory move operations.
CRQ
        Command/Response Queue, a facility which is used to communicate
        between partner partitions. Transport events which are signaled
        from the hypervisor to a partition are also reported in this queue.

Example Management Partition VMC Driver Interface
=================================================

This section provides an example for the management application
implementation where a device driver is used to interface to the VMC
device. This driver consists of a new device, for example /dev/ibmvmc,
which provides interfaces to open, close, read, write, and perform
ioctls against the VMC device.

VMC Interface Initialization
----------------------------

The device driver is responsible for initializing the VMC when the driver
is loaded. It first creates and initializes the CRQ. Next, an exchange of
VMC capabilities is performed to indicate the code version and number of
resources available in both the management partition and the hypervisor.
Finally, the hypervisor requests that the management partition create an
initial pool of VMC buffers, one buffer for each possible HMC connection,
which will be used for management application session initialization.
Prior to completion of this initialization sequence, the device returns
EBUSY to open() calls. EIO is returned for all other open() failures.

::

        Management Partition           Hypervisor
                        CRQ INIT
        ---------------------------------------->
                    CRQ INIT COMPLETE
        <----------------------------------------
                      CAPABILITIES
        ---------------------------------------->
                  CAPABILITIES RESPONSE
        <----------------------------------------
               ADD BUFFER (HMC IDX=0,1,..)        _
        <----------------------------------------  |
                   ADD BUFFER RESPONSE             | - Perform # HMCs Iterations
        ----------------------------------------> -

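The open() behaviour described above (EBUSY until initialization completes,
EIO for other failures) suggests a retry loop on the application side. The
following Python sketch is an illustration only and is not part of the
driver: the name ``open_vmc``, the retry count, and the delay are
assumptions, and the ``opener`` parameter exists only so that the logic can
be exercised without the device being present.

```python
import errno
import os
import time

VMC_DEVICE = "/dev/ibmvmc"  # example device node named in this document


def open_vmc(path=VMC_DEVICE, retries=10, delay=0.5, opener=os.open):
    """Open the VMC device, retrying while the driver reports EBUSY.

    retries, delay, and opener are illustrative assumptions. Any error
    other than EBUSY (for example EIO) is raised immediately.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return opener(path, os.O_RDWR)
        except OSError as exc:
            # EBUSY: the initialization sequence has not completed yet.
            # Give up on the last attempt or on any other error.
            if exc.errno != errno.EBUSY or attempt == retries - 1:
                raise
            time.sleep(delay)
```

A caller would typically invoke ``open_vmc()`` once at start-up and treat a
raised ``OSError`` as fatal for that management session.
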
VMC Interface Open
------------------

After the basic VMC channel has been initialized, an HMC session level
connection can be established. The application layer performs an open() on
the VMC device and executes an ioctl() against it, indicating the HMC ID
(32 bytes of data) for this session. If the VMC device is in an invalid
state, EIO will be returned for the ioctl(). The device driver creates a
new HMC session value (ranging from 1 to 255) and HMC index value (starting
at index 0 and ranging to 254) for this HMC ID. The driver then does an
RDMA of the HMC ID to the hypervisor, and then sends an Interface Open
message to the hypervisor to establish the session over the VMC. After the
hypervisor receives this information, it sends Add Buffer messages to the
management partition to seed an initial pool of buffers for the new HMC
connection. Finally, the hypervisor sends an Interface Open Response
message to indicate that it is ready for normal runtime messaging. The
following illustrates this VMC flow:

::

        Management Partition           Hypervisor
                       RDMA HMC ID
        ---------------------------------------->
                     Interface Open
        ---------------------------------------->
                       Add Buffer                 _
        <----------------------------------------  |
                   Add Buffer Response             | - Perform N Iterations
        ----------------------------------------> -
                 Interface Open Response
        <----------------------------------------

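The application side of this flow might look like the following Python
sketch, which assumes that the ioctl() takes the 32-byte HMC ID as its
argument buffer. The ioctl request number is deliberately left as a
parameter because this document does not define it; it has to be taken from
the driver's header. NUL-padding of short IDs and ASCII encoding are also
assumptions, and the names ``encode_hmc_id`` and ``set_hmc_id`` are
hypothetical.

```python
import fcntl

HMC_ID_LEN = 32  # size of the HMC ID as stated in this document


def encode_hmc_id(hmc_id):
    """Return the HMC ID as exactly 32 bytes.

    Assumption: short IDs are right-padded with NUL bytes and text IDs
    are ASCII. Neither detail is specified in this document.
    """
    raw = hmc_id.encode("ascii") if isinstance(hmc_id, str) else bytes(hmc_id)
    if not 0 < len(raw) <= HMC_ID_LEN:
        raise ValueError("HMC ID must be 1 to 32 bytes long")
    return raw.ljust(HMC_ID_LEN, b"\0")


def set_hmc_id(fd, hmc_id, request, ioctl=fcntl.ioctl):
    """Pass the HMC ID to the driver via ioctl() on an open VMC descriptor.

    request is the driver's ioctl number for setting the HMC ID; it is a
    placeholder here and is not defined by this document. An OSError with
    errno EIO indicates an invalid device state.
    """
    ioctl(fd, request, encode_hmc_id(hmc_id))
```

Keeping the encoding separate from the ioctl() call lets the ID be
validated before the device is touched.
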
VMC Interface Runtime
---------------------

During normal runtime, the management application and the hypervisor
exchange HMC messages via the Signal VMC message and RDMA operations. When
sending data to the hypervisor, the management application performs a
write() to the VMC device, and the driver RDMAs the data to the hypervisor
and then sends a Signal Message. If a write() is attempted before VMC
device buffers have been made available by the hypervisor, or no buffers
are currently available, EBUSY is returned in response to the write(). A
write() will return EIO for all other errors, such as an invalid device
state. When the hypervisor sends a message to the management partition, the
data is put into a VMC buffer and a Signal Message is sent to the VMC
driver in the management partition. The driver RDMAs the buffer into the
partition and passes the data up to the appropriate management application
via a read() to the VMC device. The read() request blocks if there is no
buffer available to read. The management application may use select() to
wait for the VMC device to become ready with data to read.

::

        Management Partition           Hypervisor
                        MSG RDMA
        ---------------------------------------->
                       SIGNAL MSG
        ---------------------------------------->
                       SIGNAL MSG
        <----------------------------------------
                        MSG RDMA
        <----------------------------------------

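The write() and read() semantics above can be sketched from the
application's point of view as follows. This is a minimal Python
illustration under stated assumptions: the function names, the retry
count, the delay, and the ``max_len`` read size are all hypothetical, since
this document does not state the buffer size.

```python
import errno
import os
import select
import time


def send_message(fd, payload, retries=5, delay=0.1, writer=os.write):
    """write() one HMC message, retrying while the driver returns EBUSY.

    EBUSY means no VMC buffer is available yet. Any other error (EIO in
    this document) is raised immediately.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return writer(fd, payload)
        except OSError as exc:
            if exc.errno != errno.EBUSY or attempt == retries - 1:
                raise
            time.sleep(delay)


def receive_message(fd, max_len, timeout=None):
    """Wait with select() until fd is readable, then read() one message.

    Returns None if timeout seconds pass with nothing to read. With
    timeout=None the call waits indefinitely, like a blocking read().
    """
    readable, _, _ = select.select([fd], [], [], timeout)
    if not readable:
        return None
    return os.read(fd, max_len)
```

Using select() with a timeout lets the application poll several
descriptors or perform periodic housekeeping instead of blocking in read().
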
VMC Interface Close
-------------------

HMC session level connections are closed by the management partition when
the application layer performs a close() against the device. This action
results in an Interface Close message flowing to the hypervisor, which
causes the session to be terminated. The device driver must free any
storage allocated for buffers for this HMC connection.

::

        Management Partition           Hypervisor
                     INTERFACE CLOSE
        ---------------------------------------->
                INTERFACE CLOSE RESPONSE
        <----------------------------------------

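Because the session is only terminated when close() is called, an
application may want to guarantee the close() even on error paths. One
possible way to do this in Python is a context manager, sketched below. The
name ``vmc_session`` is hypothetical, and the ``opener`` and ``closer``
parameters are assumptions added so the sketch can be run without the
device.

```python
import contextlib
import os


@contextlib.contextmanager
def vmc_session(path="/dev/ibmvmc", opener=os.open, closer=os.close):
    """Open the VMC device and always close() it when the block exits.

    The close() is what triggers the Interface Close flow described in
    this document, so it runs even if the body raises an exception.
    """
    fd = opener(path, os.O_RDWR)
    try:
        yield fd
    finally:
        closer(fd)
```

The descriptor is then used inside a ``with vmc_session() as fd:`` block,
and the session is closed when the block is left.
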
Additional Information
======================

For more information on CRQ Messages, VMC Messages, HMC interface Buffers,
and signal messages, please refer to the Linux on Power Architecture
Platform Reference, Section F.