===============================================
The irq_domain interrupt number mapping library
===============================================

The current design of the Linux kernel uses a single large number
space where each separate IRQ source is assigned a different number.
This is simple when there is only one interrupt controller, but in
systems with multiple interrupt controllers the kernel must ensure
that each one gets assigned non-overlapping allocations of Linux
IRQ numbers.

The number of interrupt controllers registered as unique irqchips
is steadily rising: for example, subdrivers of different kinds, such
as GPIO controllers, avoid reimplementing callback mechanisms identical
to those of the IRQ core system by modelling their interrupt handlers
as irqchips, i.e. in effect as cascading interrupt controllers.

In such setups the interrupt number loses all correspondence to
hardware interrupt numbers: whereas in the past IRQ numbers could be
chosen to match the hardware IRQ line into the root interrupt
controller (i.e. the component actually firing the interrupt line to
the CPU), nowadays this number is just a number.

For this reason we need a mechanism to separate controller-local
interrupt numbers, called hardware IRQs (hwirqs), from Linux IRQ numbers.

The irq_alloc_desc*() and irq_free_desc*() APIs provide allocation of
IRQ numbers, but they don't provide any support for reverse mapping of
the controller-local IRQ (hwirq) number into the Linux IRQ number
space.

The irq_domain library adds mapping between hwirq and IRQ numbers on
top of the irq_alloc_desc*() API.  Using an irq_domain to manage the
mapping is preferred over interrupt controller drivers open coding
their own reverse mapping scheme.

irq_domain also implements translation from an abstract irq_fwspec
structure to hwirq numbers (Device Tree and ACPI GSI so far), and can
be easily extended to support other IRQ topology data sources.

irq_domain usage
================

An interrupt controller driver creates and registers an irq_domain by
calling one of the irq_domain_add_*() or irq_domain_create_*() functions
(each mapping method has a different allocator function, described
below). On success, the function returns a pointer to the irq_domain.
The caller must provide the allocator function with an irq_domain_ops
structure.

In most cases, the irq_domain will begin empty without any mappings
between hwirq and IRQ numbers.  Mappings are added to the irq_domain
by calling irq_create_mapping() which accepts the irq_domain and a
hwirq number as arguments.  If a mapping for the hwirq doesn't already
exist then it will allocate a new Linux irq_desc, associate it with
the hwirq, and call the .map() callback so the driver can perform any
required hardware setup.

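The create-or-reuse behaviour of irq_create_mapping() can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: the toy_* types and functions are hypothetical, the reverse map is a plain array, and a counter stands in for the kernel's IRQ descriptor allocator. It shows that a second call for an already-mapped hwirq returns the existing number without calling .map() again.

```c
/* Userspace model only: the toy_* names are hypothetical, not kernel API. */

#define TOY_NR_HWIRQ 16

struct toy_domain;

/* Stand-in for the .map() callback in struct irq_domain_ops. */
typedef int (*toy_map_fn)(struct toy_domain *d, unsigned int virq,
			  unsigned long hwirq);

struct toy_domain {
	unsigned int revmap[TOY_NR_HWIRQ];	/* 0 means "not mapped" */
	toy_map_fn map;
	unsigned int map_calls;			/* counts .map() invocations */
};

/* Stand-in for the IRQ descriptor allocator; 0 is never a valid IRQ. */
static unsigned int toy_next_virq = 1;

/* Model of irq_create_mapping(): reuse an existing mapping, otherwise
 * allocate a new IRQ number, let the driver set up hardware via .map(),
 * and record the mapping. Returns 0 on failure. */
static unsigned int toy_create_mapping(struct toy_domain *d, unsigned long hwirq)
{
	unsigned int virq;

	if (hwirq >= TOY_NR_HWIRQ)
		return 0;
	if (d->revmap[hwirq])
		return d->revmap[hwirq];	/* already mapped: no new allocation */

	virq = toy_next_virq++;
	if (d->map && d->map(d, virq, hwirq) != 0)
		return 0;	/* a real implementation would also free virq here */

	d->revmap[hwirq] = virq;
	return virq;
}

/* Example driver callback: a real one would program the controller. */
static int toy_map(struct toy_domain *d, unsigned int virq, unsigned long hwirq)
{
	(void)virq;
	(void)hwirq;
	d->map_calls++;
	return 0;
}
```

A real driver would instead pass its irq_domain and hwirq to irq_create_mapping() and perform its hardware setup in the .map() callback of its irq_domain_ops.
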
Once a mapping has been established, it can be retrieved or used via a
variety of methods:

- irq_resolve_mapping() returns a pointer to the irq_desc structure
  for a given domain and hwirq number, or NULL if there is no
  mapping.
- irq_find_mapping() returns a Linux IRQ number for a given domain and
  hwirq number, or 0 if there is no mapping.
- irq_linear_revmap() is now identical to irq_find_mapping(), and is
  deprecated.
- generic_handle_domain_irq() handles an interrupt described by a
  domain and a hwirq number.

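The lookup helpers above differ mainly in what they return. The following userspace model (hypothetical toy_* names, not the kernel implementation) mirrors that contract: a lookup by number yields 0 for an unmapped hwirq, a lookup by descriptor yields NULL, and the handler helper fails when there is nothing to dispatch to. toy_associate() stands in for the bookkeeping that irq_create_mapping() performs.

```c
#include <stddef.h>

/* Userspace model only: toy_* names are hypothetical, not kernel API. */

#define TOY_NR_IRQS  8
#define TOY_NR_HWIRQ 8

struct toy_desc {
	unsigned int irq;	/* Linux IRQ number */
	unsigned long hwirq;	/* controller-local number */
	unsigned int handled;	/* how often the "flow handler" ran */
};

/* Descriptor table indexed by Linux IRQ number; slot 0 stays unused. */
static struct toy_desc toy_descs[TOY_NR_IRQS];

struct toy_domain {
	unsigned int revmap[TOY_NR_HWIRQ];	/* 0 = not mapped */
};

/* Stand-in for the bookkeeping irq_create_mapping() performs. */
static int toy_associate(struct toy_domain *d, unsigned int irq, unsigned long hwirq)
{
	if (irq == 0 || irq >= TOY_NR_IRQS || hwirq >= TOY_NR_HWIRQ)
		return -1;
	d->revmap[hwirq] = irq;
	toy_descs[irq].irq = irq;
	toy_descs[irq].hwirq = hwirq;
	return 0;
}

/* Like irq_find_mapping(): a Linux IRQ number, or 0 if unmapped. */
static unsigned int toy_find_mapping(const struct toy_domain *d, unsigned long hwirq)
{
	return hwirq < TOY_NR_HWIRQ ? d->revmap[hwirq] : 0;
}

/* Like irq_resolve_mapping(): the descriptor, or NULL if unmapped. */
static struct toy_desc *toy_resolve_mapping(const struct toy_domain *d,
					    unsigned long hwirq)
{
	unsigned int irq = toy_find_mapping(d, hwirq);

	return irq ? &toy_descs[irq] : NULL;
}

/* Like generic_handle_domain_irq(): resolve, then "handle".
 * Returns 0 on success and -1 when there is no mapping. */
static int toy_handle_domain_irq(const struct toy_domain *d, unsigned long hwirq)
{
	struct toy_desc *desc = toy_resolve_mapping(d, hwirq);

	if (!desc)
		return -1;
	desc->handled++;	/* stand-in for running the flow handler */
	return 0;
}
```

In this model toy_find_mapping() can only succeed after toy_associate() has run, which is the userspace analogue of calling irq_create_mapping() before irq_find_mapping().
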
Note that irq_domain lookups must happen in contexts that are
compatible with an RCU read-side critical section.

The irq_create_mapping() function must be called *at least once*
before any call to irq_find_mapping(); otherwise the descriptor will
not have been allocated.

If the driver has the Linux IRQ number or the irq_data pointer, and
needs to know the associated hwirq number (such as in the irq_chip
callbacks) then it can be directly obtained from irq_data->hwirq.

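As a sketch of that pattern, the userspace model below passes an irq_data-like structure to irq_chip-style callbacks and lets them compute a register bit straight from d->hwirq. The toy_* names, the single 32-bit mask register and the regs pointer (standing in for chip private data) are all assumptions made for illustration.

```c
#include <stdint.h>

/* Userspace model only: toy_* names are hypothetical. toy_irq_data keeps
 * just the members this example needs from the kernel's struct irq_data. */

struct toy_regs {
	uint32_t mask;			/* bit n set means hwirq n is masked */
};

struct toy_irq_data {
	unsigned int irq;		/* Linux IRQ number (unused here) */
	unsigned long hwirq;		/* controller-local number */
	struct toy_regs *regs;		/* stand-in for chip private data */
};

/* irq_chip-style callbacks: the hwirq comes straight from d->hwirq, so no
 * reverse lookup through the domain is needed. Assumes hwirq < 32. */
static void toy_mask(struct toy_irq_data *d)
{
	d->regs->mask |= UINT32_C(1) << d->hwirq;
}

static void toy_unmask(struct toy_irq_data *d)
{
	d->regs->mask &= ~(UINT32_C(1) << d->hwirq);
}
```
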
Types of irq_domain mappings
============================

There are several mechanisms available for reverse mapping from hwirq
to Linux IRQ, and each mechanism uses a different allocation function.
Which reverse map type should be used depends on the use case.  Each
of the reverse map types is described below:

Linear
------

::

	irq_domain_add_linear()
	irq_domain_create_linear()

The linear reverse map maintains a fixed-size table indexed by the
hwirq number.  When a hwirq is mapped, an irq_desc is allocated for
the hwirq, and the IRQ number is stored in the table.

The linear map is a good choice when the maximum number of hwirqs is
fixed and relatively small (roughly fewer than 256).  The advantages
of this map are constant-time lookup of IRQ numbers, and irq_descs
are only allocated for in-use IRQs.  The disadvantage is that the
table must be as large as the largest possible hwirq number.

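A minimal userspace model of the linear map, assuming hypothetical toy_linear_* helpers rather than the kernel's internals, makes the trade-off concrete: the table is allocated once for every possible hwirq, and a lookup is a bounds check plus one array access.

```c
#include <stdlib.h>

/* Userspace model only: toy_linear_* names are hypothetical. */

struct toy_linear_domain {
	unsigned int size;	/* number of possible hwirqs */
	unsigned int *revmap;	/* revmap[hwirq] = Linux IRQ, 0 = unmapped */
};

/* One slot per possible hwirq, allocated up front and initially empty. */
static int toy_linear_init(struct toy_linear_domain *d, unsigned int size)
{
	d->size = size;
	d->revmap = calloc(size, sizeof(*d->revmap));
	return d->revmap ? 0 : -1;
}

static void toy_linear_destroy(struct toy_linear_domain *d)
{
	free(d->revmap);
	d->revmap = NULL;
	d->size = 0;
}

/* Record a mapping; fails for hwirqs outside the fixed table. */
static int toy_linear_associate(struct toy_linear_domain *d, unsigned long hwirq,
				unsigned int irq)
{
	if (hwirq >= d->size)
		return -1;
	d->revmap[hwirq] = irq;
	return 0;
}

/* Constant-time lookup: a bounds check and one array access. */
static unsigned int toy_linear_find(const struct toy_linear_domain *d,
				    unsigned long hwirq)
{
	return hwirq < d->size ? d->revmap[hwirq] : 0;
}
```

The table costs `size` slots whether one hwirq or all of them are mapped, which is why this scheme suits small, fixed hwirq ranges.
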
irq_domain_add_linear() and irq_domain_create_linear() are functionally
equivalent, except that their first argument differs: the former
accepts an Open Firmware specific 'struct device_node', while the latter
accepts a more general abstraction, 'struct fwnode_handle'.

The majority of drivers should use the linear map.

Tree
----

::

	irq_domain_add_tree()
	irq_domain_create_tree()

The irq_domain maintains a radix tree map from hwirq numbers to Linux
IRQs.  When a hwirq is mapped, an irq_desc is allocated and the
hwirq is used as the lookup key for the radix tree.

The tree map is a good choice if the hwirq number can be very large,
since it doesn't need to allocate a table as large as the largest
hwirq number.  The disadvantage is that the cost of a hwirq to IRQ
number lookup depends on how many entries are in the tree.

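To show why a tree avoids the large table, here is a userspace sketch of a small fixed-height radix tree keyed by hwirq. It is an illustrative assumption, not the kernel's radix tree: the toy_* names are hypothetical, the height is fixed at three levels of 8 bits (a 24-bit hwirq space), and nodes are allocated only along the paths to mapped hwirqs.

```c
#include <stdlib.h>

/* Userspace model only: a fixed-height radix tree, not the kernel's. */

#define TOY_BITS   8
#define TOY_FANOUT (1u << TOY_BITS)
#define TOY_MASK   (TOY_FANOUT - 1)
#define TOY_LEVELS 3				/* covers hwirqs below 2^24 */

struct toy_node {
	struct toy_node *child[TOY_FANOUT];	/* used on inner levels */
	unsigned int irq[TOY_FANOUT];		/* used on the last level; 0 = unmapped */
};

struct toy_tree_domain {
	struct toy_node *root;
	unsigned int nodes;			/* allocated nodes, to show sparseness */
};

/* Slot index of hwirq at a given level, most significant bits first. */
static unsigned int toy_index(unsigned long hwirq, int level)
{
	int shift = (TOY_LEVELS - 1 - level) * TOY_BITS;

	return (unsigned int)(hwirq >> shift) & TOY_MASK;
}

static int toy_tree_insert(struct toy_tree_domain *d, unsigned long hwirq,
			   unsigned int irq)
{
	struct toy_node **pp = &d->root;
	int level;

	if (hwirq >> (TOY_LEVELS * TOY_BITS))
		return -1;			/* outside the modelled hwirq space */

	for (level = 0; ; level++) {
		if (!*pp) {			/* create nodes only on demand */
			*pp = calloc(1, sizeof(**pp));
			if (!*pp)
				return -1;
			d->nodes++;
		}
		if (level == TOY_LEVELS - 1)
			break;
		pp = &(*pp)->child[toy_index(hwirq, level)];
	}
	(*pp)->irq[toy_index(hwirq, TOY_LEVELS - 1)] = irq;
	return 0;
}

/* Walk one node per level; a missing node means "not mapped". */
static unsigned int toy_tree_find(const struct toy_tree_domain *d,
				  unsigned long hwirq)
{
	const struct toy_node *n = d->root;
	int level;

	if (hwirq >> (TOY_LEVELS * TOY_BITS))
		return 0;
	for (level = 0; n && level < TOY_LEVELS - 1; level++)
		n = n->child[toy_index(hwirq, level)];
	return n ? n->irq[toy_index(hwirq, TOY_LEVELS - 1)] : 0;
}

static void toy_tree_free(struct toy_node *n, int level)
{
	unsigned int i;

	if (!n)
		return;
	if (level < TOY_LEVELS - 1)
		for (i = 0; i < TOY_FANOUT; i++)
			toy_tree_free(n->child[i], level + 1);
	free(n);
}
```

In this fixed-height sketch every lookup visits the same number of levels; what it illustrates is memory use: mapping hwirqs 0x123456, 0x123457 and 7 allocates five nodes in total, rather than a table covering the whole 24-bit range.
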
irq_domain_add_tree() and irq_domain_create_tree() are functionally
equivalent, except that their first argument differs: the former
accepts an Open Firmware specific 'struct device_node', while the latter
accepts a more general abstraction, 'struct fwnode_handle'.

Very few drivers should need this mapping.

No Map
------

::

	irq_domain_add_nomap()

The No Map mapping is to be used when the hwirq number is
programmable in the hardware.  In this case it is best to program the
Linux IRQ number into the hardware itself so that no mapping is
required.  Calling irq_create_direct_mapping() will allocate a Linux
IRQ number and call the .map() callback so that the driver can program
the Linux IRQ number into the hardware.

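The No Map idea can be modelled in userspace as a controller whose per-source reporting number is writable. In this hypothetical sketch (the toy_* names and the vector array are assumptions, not a real controller or kernel API), toy_direct_map() plays the role of irq_create_direct_mapping() plus the driver's .map() callback, and the number the hardware reports is already the Linux IRQ number, so no reverse map exists.

```c
/* Userspace model only: toy_* names and the vector array are hypothetical. */

#define TOY_NR_SOURCES 4

struct toy_hw {
	unsigned int vector[TOY_NR_SOURCES];	/* software-programmable number */
};

static unsigned int toy_next_irq = 1;		/* stand-in IRQ allocator */

/* Plays the role of irq_create_direct_mapping() plus the driver's .map():
 * allocate a Linux IRQ number and program it into the hardware. */
static unsigned int toy_direct_map(struct toy_hw *hw, unsigned int source)
{
	unsigned int irq;

	if (source >= TOY_NR_SOURCES)
		return 0;
	irq = toy_next_irq++;
	hw->vector[source] = irq;
	return irq;
}

/* When a source fires, the hardware reports the programmed number, which
 * already is the Linux IRQ number: there is no table to consult. */
static unsigned int toy_hw_fire(const struct toy_hw *hw, unsigned int source)
{
	return source < TOY_NR_SOURCES ? hw->vector[source] : 0;
}
```
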
Most drivers cannot use this mapping, and it is now gated on the
CONFIG_IRQ_DOMAIN_NOMAP option. Please refrain from introducing new
users of this API.

Legacy
------

::

	irq_domain_add_simple()
	irq_domain_add_legacy()
	irq_domain_create_simple()
	irq_domain_create_legacy()

The Legacy mapping is a special case for drivers that already have a
range of irq_descs allocated for the hwirqs.  It is used when the
driver cannot be immediately converted to use the linear mapping.  For
example, many embedded system board support files use a set of #defines
for IRQ numbers that are passed to struct device registrations.  In that
case the Linux IRQ numbers cannot be dynamically assigned and the legacy
mapping should be used.

As the name implies, the \*_legacy() functions are deprecated and only
exist to ease the support of ancient platforms. No new users should be
added. The same goes for the \*_simple() functions when their use
results in the legacy behaviour.

The legacy map assumes a contiguous range of IRQ numbers has already
been allocated for the controller and that the IRQ number can be
calculated by adding a fixed offset to the hwirq number, and
vice versa.  The disadvantage is that it requires the interrupt
controller to manage IRQ allocations and it requires an irq_desc to be
allocated for every hwirq, even if it is unused.

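Because the legacy map is a fixed offset, translation in both directions is plain arithmetic. The sketch below is a userspace model with hypothetical toy_legacy_* names; first_irq, first_hwirq and size describe the pre-allocated contiguous range referred to above.

```c
/* Userspace model only: toy_legacy_* names are hypothetical. */

struct toy_legacy_domain {
	unsigned int first_irq;		/* Linux IRQ number of first_hwirq */
	unsigned long first_hwirq;
	unsigned int size;		/* number of hwirqs in the range */
};

/* hwirq -> Linux IRQ: add the fixed offset; 0 if outside the range. */
static unsigned int toy_legacy_to_irq(const struct toy_legacy_domain *d,
				      unsigned long hwirq)
{
	if (hwirq < d->first_hwirq || hwirq - d->first_hwirq >= d->size)
		return 0;
	return d->first_irq + (unsigned int)(hwirq - d->first_hwirq);
}

/* Linux IRQ -> hwirq: subtract the same offset; -1 if outside the range. */
static long toy_legacy_to_hwirq(const struct toy_legacy_domain *d,
				unsigned int irq)
{
	if (irq < d->first_irq || irq - d->first_irq >= d->size)
		return -1;
	return (long)(d->first_hwirq + (irq - d->first_irq));
}
```

For example, with first_irq = 64, first_hwirq = 0 and size = 16, hwirq 15 maps to Linux IRQ 79 and Linux IRQ 70 maps back to hwirq 6.
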
The legacy map should only be used if fixed IRQ mappings must be
supported.  For example, ISA controllers would use the legacy map for
mapping Linux IRQs 0-15 so that existing ISA drivers get the correct IRQ
numbers.

Most users of legacy mappings should use irq_domain_add_simple() or
irq_domain_create_simple(), which will use a legacy domain only if an
IRQ range is supplied by the system, and will otherwise use a linear
domain mapping. The semantics of this call are such that if an IRQ
range is specified, descriptors will be allocated on-the-fly for it,
and if no range is specified, it will fall through to
irq_domain_add_linear() or irq_domain_create_linear(), which means *no*
IRQ descriptors will be allocated.

A typical use case for simple domains is where an irqchip provider
is supporting both dynamic and static IRQ assignments.

In order to avoid ending up in a situation where a linear domain is
used and no descriptor gets allocated, it is very important to make
sure that the driver using the simple domain calls irq_create_mapping()
before any irq_find_mapping(), since the latter will work without it
in the static IRQ assignment case and can therefore hide the missing
call.

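The simple-domain behaviour and the pitfall just described can be modelled in userspace. In this hypothetical sketch (the toy_simple_* names, a plain array reverse map and an arbitrary dynamic IRQ base of 1000 are all assumptions), a non-zero first_irq pre-associates the whole range, while first_irq == 0 yields a linear-style domain in which lookups fail until a mapping is created.

```c
#include <stdlib.h>

/* Userspace model only: toy_simple_* names are hypothetical. */

enum toy_kind { TOY_LINEAR, TOY_LEGACY };

struct toy_simple_domain {
	enum toy_kind kind;
	unsigned int size;		/* number of hwirqs */
	unsigned int *revmap;		/* revmap[hwirq]; 0 = nothing allocated yet */
	unsigned int next_irq;		/* dynamic allocator for the linear case */
};

/* first_irq != 0: legacy-style, the whole range is associated up front.
 * first_irq == 0: linear-style, no descriptors until a mapping is created. */
static int toy_simple_create(struct toy_simple_domain *d, unsigned int size,
			     unsigned int first_irq)
{
	unsigned int i;

	d->size = size;
	d->next_irq = 1000;		/* arbitrary base for dynamic IRQs */
	d->revmap = calloc(size, sizeof(*d->revmap));
	if (!d->revmap)
		return -1;
	d->kind = first_irq ? TOY_LEGACY : TOY_LINEAR;
	if (first_irq)
		for (i = 0; i < size; i++)
			d->revmap[i] = first_irq + i;
	return 0;
}

/* Analogue of irq_find_mapping(). */
static unsigned int toy_simple_find(const struct toy_simple_domain *d,
				    unsigned int hwirq)
{
	return hwirq < d->size ? d->revmap[hwirq] : 0;
}

/* Analogue of irq_create_mapping(): a no-op for pre-associated hwirqs. */
static unsigned int toy_simple_create_mapping(struct toy_simple_domain *d,
					      unsigned int hwirq)
{
	if (hwirq >= d->size)
		return 0;
	if (!d->revmap[hwirq])
		d->revmap[hwirq] = d->next_irq++;
	return d->revmap[hwirq];
}
```

With a legacy-style domain, toy_simple_find() succeeds immediately; with a linear-style domain it returns 0 until toy_simple_create_mapping() has been called for that hwirq, which is exactly the case that goes unnoticed if a driver is only tested with static IRQ assignments.
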
irq_domain_add_simple() and irq_domain_create_simple(), as well as
irq_domain_add_legacy() and irq_domain_create_legacy(), are functionally
equivalent, except that their first argument differs: the
irq_domain_add_*() variants accept an Open Firmware specific
'struct device_node', while the irq_domain_create_*() variants accept
a more general abstraction, 'struct fwnode_handle'.

Hierarchy IRQ domain
--------------------

On some architectures, there may be multiple interrupt controllers
involved in delivering an interrupt from the device to the target CPU.
Let's look at a typical interrupt delivery path on x86 platforms::

  Device --> IOAPIC -> Interrupt remapping Controller -> Local APIC -> CPU

There are three interrupt controllers involved:

1) IOAPIC controller
2) Interrupt remapping controller
3) Local APIC controller

To support such a hardware topology and make the software architecture
match the hardware architecture, an irq_domain data structure is built
for each interrupt controller and those irq_domains are organized into
a hierarchy. When building an irq_domain hierarchy, the irq_domain
closest to the device is the child and the irq_domain closest to the
CPU is the parent. So for the example above, the following hierarchy
will be built::

	CPU Vector irq_domain (root irq_domain to manage CPU vectors)
		^
		|
	Interrupt Remapping irq_domain (manage irq_remapping entries)
		^
		|
	IOAPIC irq_domain (manage IOAPIC delivery entries/pins)

There are four major interfaces for using a hierarchy irq_domain:

1) irq_domain_alloc_irqs(): allocate IRQ descriptors and interrupt
   controller related resources to deliver these interrupts.
2) irq_domain_free_irqs(): free IRQ descriptors and interrupt controller
   related resources associated with these interrupts.
3) irq_domain_activate_irq(): activate interrupt controller hardware to
   deliver the interrupt.
4) irq_domain_deactivate_irq(): deactivate interrupt controller hardware
   to stop delivering the interrupt.

The following changes are needed to support hierarchy irq_domain:

1) A new field 'parent' is added to struct irq_domain; it's used to
   maintain irq_domain hierarchy information.
2) A new field 'parent_data' is added to struct irq_data; it's used to
   build hierarchy irq_data to match hierarchy irq_domains. The irq_data
   is used to store the irq_domain pointer and the hardware IRQ number.
3) New callbacks are added to struct irq_domain_ops to support hierarchy
   irq_domain operations.

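The parent pointer and the parent_data chain can be illustrated with a userspace model of the x86 example above. This is a simplified sketch, not the kernel's allocation path: the toy_* types, the to_parent_hwirq() translation hook and the numeric values are hypothetical. toy_alloc() builds one irq_data-like record per domain, from the IOAPIC level up to the root, and every level shares the same Linux IRQ number while keeping its own hwirq.

```c
#include <stdlib.h>

/* Userspace model only: toy_* names and translation hooks are hypothetical. */

struct toy_domain;

struct toy_irq_data {
	unsigned int irq;			/* Linux IRQ, same at every level */
	unsigned long hwirq;			/* number local to this level */
	struct toy_domain *domain;
	struct toy_irq_data *parent_data;	/* next level towards the CPU */
};

struct toy_domain {
	const char *name;
	struct toy_domain *parent;		/* NULL for the root domain */
	/* Translate this level's hwirq into the parent's; unused at the root. */
	unsigned long (*to_parent_hwirq)(unsigned long hwirq);
};

/* Build one irq_data-like record per level, from the child up to the root. */
static struct toy_irq_data *toy_alloc(struct toy_domain *d, unsigned int irq,
				      unsigned long hwirq)
{
	struct toy_irq_data *data = calloc(1, sizeof(*data));

	if (!data)
		return NULL;
	data->irq = irq;
	data->hwirq = hwirq;
	data->domain = d;
	if (d->parent) {
		data->parent_data = toy_alloc(d->parent, irq,
					      d->to_parent_hwirq(hwirq));
		if (!data->parent_data) {
			free(data);
			return NULL;
		}
	}
	return data;
}

/* Release the whole chain, child first. */
static void toy_free(struct toy_irq_data *data)
{
	while (data) {
		struct toy_irq_data *parent = data->parent_data;

		free(data);
		data = parent;
	}
}

/* Invented translations for the IOAPIC -> remapping -> vector example. */
static unsigned long toy_ioapic_to_ir(unsigned long pin)
{
	return 100 + pin;
}

static unsigned long toy_ir_to_vector(unsigned long index)
{
	return 32 + index;
}
```

With the invented translations in the sketch, IOAPIC pin 3 becomes remapping index 103 and vector 135 at the root, all three records carrying the same Linux IRQ number.
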
With support for hierarchy irq_domain and hierarchy irq_data ready, an
irq_domain structure is built for each interrupt controller, and an
irq_data structure is allocated for each irq_domain associated with an
IRQ. We can now go one step further and support stacked (hierarchy)
irq_chips. That is, an irq_chip is associated with each irq_data along
the hierarchy. A child irq_chip may implement a required action by
itself or by cooperating with its parent irq_chip.

With stacked irq_chips, an interrupt controller driver only needs to
deal with the hardware it manages itself and may ask its parent
irq_chip for services when needed. This results in a much cleaner
software architecture.

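A stacked irq_chip can be modelled in userspace as one chip per irq_data level, where a child callback handles its own hardware, forwards to the parent, or both. The toy_* names are hypothetical; toy_mask_parent() only mimics the role of the kernel's irq_chip_mask_parent() helper and is not its implementation.

```c
#include <stddef.h>

/* Userspace model only: toy_* names are hypothetical. */

struct toy_irq_data;

struct toy_irq_chip {
	const char *name;
	void (*mask)(struct toy_irq_data *d);
};

struct toy_irq_data {
	unsigned long hwirq;
	const struct toy_irq_chip *chip;	/* one chip per level */
	struct toy_irq_data *parent_data;
	unsigned int masked;			/* stand-in for this level's hardware */
};

/* Forward an operation one level up, as a child chip might. */
static void toy_mask_parent(struct toy_irq_data *d)
{
	struct toy_irq_data *p = d->parent_data;

	if (p && p->chip && p->chip->mask)
		p->chip->mask(p);
}

/* Root level: only touches its own hardware. */
static void toy_root_mask(struct toy_irq_data *d)
{
	d->masked = 1;
}

/* Child level: masks its own hardware, then asks the parent as well. */
static void toy_child_mask(struct toy_irq_data *d)
{
	d->masked = 1;
	toy_mask_parent(d);
}

/* Child level without mask hardware of its own: delegates entirely. */
static void toy_passthrough_mask(struct toy_irq_data *d)
{
	toy_mask_parent(d);
}

static const struct toy_irq_chip toy_root_chip = { "root", toy_root_mask };
static const struct toy_irq_chip toy_child_chip = { "child", toy_child_mask };
static const struct toy_irq_chip toy_passthrough_chip = {
	"passthrough", toy_passthrough_mask
};
```

Masking through the child chip sets the mask state at both levels, while the passthrough chip leaves its own level untouched and relies entirely on its parent.
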
For an interrupt controller driver to support hierarchy irq_domain, it
needs to:

1) Implement irq_domain_ops.alloc and irq_domain_ops.free.
2) Optionally implement irq_domain_ops.activate and
   irq_domain_ops.deactivate.
3) Optionally implement an irq_chip to manage the interrupt controller
   hardware.
4) There is no need to implement irq_domain_ops.map and
   irq_domain_ops.unmap; they are unused with hierarchy irq_domain.

Hierarchy irq_domain is in no way x86-specific, and is heavily used to
support other architectures, such as ARM and ARM64.

Debugging
=========

Most of the internals of the IRQ subsystem are exposed in debugfs when
CONFIG_GENERIC_IRQ_DEBUGFS is enabled.