153 lines
		
	
	
		
			7.1 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
			
		
		
	
	
			153 lines
		
	
	
		
			7.1 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
| =========================================
 | |
| I915 GuC Submission/DRM Scheduler Section
 | |
| =========================================
 | |
| 
 | |
| Upstream plan
 | |
| =============
 | |
| For upstream the overall plan for landing GuC submission and integrating the
 | |
| i915 with the DRM scheduler is:
 | |
| 
 | |
| * Merge basic GuC submission
 | |
| 	* Basic submission support for all gen11+ platforms
 | |
| 	* Not enabled by default on any current platforms but can be enabled via
 | |
| 	  modparam enable_guc
 | |
| 	* Lots of rework will need to be done to integrate with DRM scheduler so
 | |
| 	  no need to nit pick everything in the code, it just should be
 | |
| 	  functional, no major coding style / layering errors, and not regress
 | |
| 	  execlists
 | |
| 	* Update IGTs / selftests as needed to work with GuC submission
 | |
| 	* Enable CI on supported platforms for a baseline
 | |
| 	* Rework / get CI heathly for GuC submission in place as needed
 | |
| * Merge new parallel submission uAPI
 | |
| 	* Bonding uAPI completely incompatible with GuC submission, plus it has
 | |
| 	  severe design issues in general, which is why we want to retire it no
 | |
| 	  matter what
 | |
| 	* New uAPI adds I915_CONTEXT_ENGINES_EXT_PARALLEL context setup step
 | |
| 	  which configures a slot with N contexts
 | |
| 	* After I915_CONTEXT_ENGINES_EXT_PARALLEL a user can submit N batches to
 | |
| 	  a slot in a single execbuf IOCTL and the batches run on the GPU in
 | |
| 	  paralllel
 | |
| 	* Initially only for GuC submission but execlists can be supported if
 | |
| 	  needed
 | |
| * Convert the i915 to use the DRM scheduler
 | |
| 	* GuC submission backend fully integrated with DRM scheduler
 | |
| 		* All request queues removed from backend (e.g. all backpressure
 | |
| 		  handled in DRM scheduler)
 | |
| 		* Resets / cancels hook in DRM scheduler
 | |
| 		* Watchdog hooks into DRM scheduler
 | |
| 		* Lots of complexity of the GuC backend can be pulled out once
 | |
| 		  integrated with DRM scheduler (e.g. state machine gets
 | |
| 		  simpler, locking gets simpler, etc...)
 | |
| 	* Execlists backend will minimum required to hook in the DRM scheduler
 | |
| 		* Legacy interface
 | |
| 		* Features like timeslicing / preemption / virtual engines would
 | |
| 		  be difficult to integrate with the DRM scheduler and these
 | |
| 		  features are not required for GuC submission as the GuC does
 | |
| 		  these things for us
 | |
| 		* ROI low on fully integrating into DRM scheduler
 | |
| 		* Fully integrating would add lots of complexity to DRM
 | |
| 		  scheduler
 | |
| 	* Port i915 priority inheritance / boosting feature in DRM scheduler
 | |
| 		* Used for i915 page flip, may be useful to other DRM drivers as
 | |
| 		  well
 | |
| 		* Will be an optional feature in the DRM scheduler
 | |
| 	* Remove in-order completion assumptions from DRM scheduler
 | |
| 		* Even when using the DRM scheduler the backends will handle
 | |
| 		  preemption, timeslicing, etc... so it is possible for jobs to
 | |
| 		  finish out of order
 | |
| 	* Pull out i915 priority levels and use DRM priority levels
 | |
| 	* Optimize DRM scheduler as needed
 | |
| 
 | |
| TODOs for GuC submission upstream
 | |
| =================================
 | |
| 
 | |
| * Need an update to GuC firmware / i915 to enable error state capture
 | |
| * Open source tool to decode GuC logs
 | |
| * Public GuC spec
 | |
| 
 | |
| New uAPI for basic GuC submission
 | |
| =================================
 | |
| No major changes are required to the uAPI for basic GuC submission. The only
 | |
| change is a new scheduler attribute: I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP.
 | |
| This attribute indicates the 2k i915 user priority levels are statically mapped
 | |
| into 3 levels as follows:
 | |
| 
 | |
| * -1k to -1 Low priority
 | |
| * 0 Medium priority
 | |
| * 1 to 1k High priority
 | |
| 
 | |
| This is needed because the GuC only has 4 priority bands. The highest priority
 | |
| band is reserved with the kernel. This aligns with the DRM scheduler priority
 | |
| levels too.
 | |
| 
 | |
| Spec references:
 | |
| ----------------
 | |
| * https://www.khronos.org/registry/EGL/extensions/IMG/EGL_IMG_context_priority.txt
 | |
| * https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/chap5.html#devsandqueues-priority
 | |
| * https://spec.oneapi.com/level-zero/latest/core/api.html#ze-command-queue-priority-t
 | |
| 
 | |
| New parallel submission uAPI
 | |
| ============================
 | |
| The existing bonding uAPI is completely broken with GuC submission because
 | |
| whether a submission is a single context submit or parallel submit isn't known
 | |
| until execbuf time activated via the I915_SUBMIT_FENCE. To submit multiple
 | |
| contexts in parallel with the GuC the context must be explicitly registered with
 | |
| N contexts and all N contexts must be submitted in a single command to the GuC.
 | |
| The GuC interfaces do not support dynamically changing between N contexts as the
 | |
| bonding uAPI does. Hence the need for a new parallel submission interface. Also
 | |
| the legacy bonding uAPI is quite confusing and not intuitive at all. Furthermore
 | |
| I915_SUBMIT_FENCE is by design a future fence, so not really something we should
 | |
| continue to support.
 | |
| 
 | |
| The new parallel submission uAPI consists of 3 parts:
 | |
| 
 | |
| * Export engines logical mapping
 | |
| * A 'set_parallel' extension to configure contexts for parallel
 | |
|   submission
 | |
| * Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL
 | |
| 
 | |
| Export engines logical mapping
 | |
| ------------------------------
 | |
| Certain use cases require BBs to be placed on engine instances in logical order
 | |
| (e.g. split-frame on gen11+). The logical mapping of engine instances can change
 | |
| based on fusing. Rather than making UMDs be aware of fusing, simply expose the
 | |
| logical mapping with the existing query engine info IOCTL. Also the GuC
 | |
| submission interface currently only supports submitting multiple contexts to
 | |
| engines in logical order which is a new requirement compared to execlists.
 | |
| Lastly, all current platforms have at most 2 engine instances and the logical
 | |
| order is the same as uAPI order. This will change on platforms with more than 2
 | |
| engine instances.
 | |
| 
 | |
| A single bit will be added to drm_i915_engine_info.flags indicating that the
 | |
| logical instance has been returned and a new field,
 | |
| drm_i915_engine_info.logical_instance, returns the logical instance.
 | |
| 
 | |
| A 'set_parallel' extension to configure contexts for parallel submission
 | |
| ------------------------------------------------------------------------
 | |
| The 'set_parallel' extension configures a slot for parallel submission of N BBs.
 | |
| It is a setup step that must be called before using any of the contexts. See
 | |
| I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE or I915_CONTEXT_ENGINES_EXT_BOND for
 | |
| similar existing examples. Once a slot is configured for parallel submission the
 | |
| execbuf2 IOCTL can be called submitting N BBs in a single IOCTL. Initially only
 | |
| supports GuC submission. Execlists supports can be added later if needed.
 | |
| 
 | |
| Add I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT and
 | |
| drm_i915_context_engines_parallel_submit to the uAPI to implement this
 | |
| extension.
 | |
| 
 | |
| .. c:namespace-push:: rfc
 | |
| 
 | |
| .. kernel-doc:: include/uapi/drm/i915_drm.h
 | |
|         :functions: i915_context_engines_parallel_submit
 | |
| 
 | |
| .. c:namespace-pop::
 | |
| 
 | |
| Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL
 | |
| -------------------------------------------------------------------
 | |
| Contexts that have been configured with the 'set_parallel' extension can only
 | |
| submit N BBs in a single execbuf2 IOCTL. The BBs are either the last N objects
 | |
| in the drm_i915_gem_exec_object2 list or the first N if I915_EXEC_BATCH_FIRST is
 | |
| set. The number of BBs is implicit based on the slot submitted and how it has
 | |
| been configured by 'set_parallel' or other extensions. No uAPI changes are
 | |
| required to the execbuf2 IOCTL.
 |