89 lines
		
	
	
		
			2.9 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
			
		
		
	
	
			89 lines
		
	
	
		
			2.9 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
| ===============
 | |
| Persistent data
 | |
| ===============
 | |
| 
 | |
| Introduction
 | |
| ============
 | |
| 
 | |
| The more-sophisticated device-mapper targets require complex metadata
 | |
| that is managed in kernel.  In late 2010 we were seeing that various
 | |
| different targets were rolling their own data structures, for example:
 | |
| 
 | |
| - Mikulas Patocka's multisnap implementation
 | |
| - Heinz Mauelshagen's thin provisioning target
 | |
| - Another btree-based caching target posted to dm-devel
 | |
| - Another multi-snapshot target based on a design of Daniel Phillips
 | |
| 
 | |
| Maintaining these data structures takes a lot of work, so if possible
 | |
| we'd like to reduce the number.
 | |
| 
 | |
| The persistent-data library is an attempt to provide a re-usable
 | |
| framework for people who want to store metadata in device-mapper
 | |
| targets.  It's currently used by the thin-provisioning target and an
 | |
| upcoming hierarchical storage target.
 | |
| 
 | |
| Overview
 | |
| ========
 | |
| 
 | |
| The main documentation is in the header files which can all be found
 | |
| under drivers/md/persistent-data.
 | |
| 
 | |
| The block manager
 | |
| -----------------
 | |
| 
 | |
| dm-block-manager.[hc]
 | |
| 
 | |
| This provides access to the data on disk in fixed sized-blocks.  There
 | |
| is a read/write locking interface to prevent concurrent accesses, and
 | |
| keep data that is being used in the cache.
 | |
| 
 | |
| Clients of persistent-data are unlikely to use this directly.
 | |
| 
 | |
| The transaction manager
 | |
| -----------------------
 | |
| 
 | |
| dm-transaction-manager.[hc]
 | |
| 
 | |
| This restricts access to blocks and enforces copy-on-write semantics.
 | |
| The only way you can get hold of a writable block through the
 | |
| transaction manager is by shadowing an existing block (ie. doing
 | |
| copy-on-write) or allocating a fresh one.  Shadowing is elided within
 | |
| the same transaction so performance is reasonable.  The commit method
 | |
| ensures that all data is flushed before it writes the superblock.
 | |
| On power failure your metadata will be as it was when last committed.
 | |
| 
 | |
| The Space Maps
 | |
| --------------
 | |
| 
 | |
| dm-space-map.h
 | |
| dm-space-map-metadata.[hc]
 | |
| dm-space-map-disk.[hc]
 | |
| 
 | |
| On-disk data structures that keep track of reference counts of blocks.
 | |
| Also acts as the allocator of new blocks.  Currently two
 | |
| implementations: a simpler one for managing blocks on a different
 | |
| device (eg. thinly-provisioned data blocks); and one for managing
 | |
| the metadata space.  The latter is complicated by the need to store
 | |
| its own data within the space it's managing.
 | |
| 
 | |
| The data structures
 | |
| -------------------
 | |
| 
 | |
| dm-btree.[hc]
 | |
| dm-btree-remove.c
 | |
| dm-btree-spine.c
 | |
| dm-btree-internal.h
 | |
| 
 | |
| Currently there is only one data structure, a hierarchical btree.
 | |
| There are plans to add more.  For example, something with an
 | |
| array-like interface would see a lot of use.
 | |
| 
 | |
| The btree is 'hierarchical' in that you can define it to be composed
 | |
| of nested btrees, and take multiple keys.  For example, the
 | |
| thin-provisioning target uses a btree with two levels of nesting.
 | |
| The first maps a device id to a mapping tree, and that in turn maps a
 | |
| virtual block to a physical block.
 | |
| 
 | |
| Values stored in the btrees can have arbitrary size.  Keys are always
 | |
| 64bits, although nesting allows you to use multiple keys.
 |