Documentation/prio_tree.txt
  The prio_tree.c code indexes vmas using 3 different indexes:
  	* heap_index  = vm_pgoff + vm_size_in_pages : end_vm_pgoff
  	* radix_index = vm_pgoff : start_vm_pgoff
  	* size_index = vm_size_in_pages
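
  The three indexes can be sketched in C as follows. This is only an
  illustration: struct vma_span and the helper names are hypothetical and
  are not taken from prio_tree.c. The fields simply mirror the terms used
  in the list above.

```c
#include <assert.h>

/* Hypothetical stand-in for the parts of a vma that matter for indexing. */
struct vma_span {
	unsigned long vm_pgoff;         /* start_vm_pgoff */
	unsigned long vm_size_in_pages; /* size term as used in the list above */
};

/* radix_index = vm_pgoff */
static unsigned long radix_index(const struct vma_span *v)
{
	return v->vm_pgoff;
}

/* size_index = vm_size_in_pages */
static unsigned long size_index(const struct vma_span *v)
{
	return v->vm_size_in_pages;
}

/* heap_index = vm_pgoff + vm_size_in_pages */
static unsigned long heap_index(const struct vma_span *v)
{
	/* The sum must not wrap around an unsigned long. */
	assert(v->vm_size_in_pages <= ~0UL - v->vm_pgoff);
	return v->vm_pgoff + v->vm_size_in_pages;
}
```

  With vm_pgoff = 1 and vm_size_in_pages = 6 the helpers yield radix 1,
  size 6 and heap 7, i.e. a vma written as [1,6,7] in the
  [radix_index, size_index, heap_index] notation.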
  
  A regular radix-priority-search-tree indexes vmas using only heap_index and
  radix_index. The conditions for indexing are:
  	* ->heap_index >= ->left->heap_index &&
  		->heap_index >= ->right->heap_index
  	* if (->heap_index == ->left->heap_index)
  		then ->radix_index < ->left->radix_index;
  	* if (->heap_index == ->right->heap_index)
  		then ->radix_index < ->right->radix_index;
  	* nodes are hashed to left or right subtree using radix_index
  	  similar to a pure binary radix tree.
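
  The heap conditions above can be checked with a small recursive walk.
  The sketch below uses a hypothetical struct pst_node, not the node type
  from prio_tree.c, and only verifies the ordering rules. It does not
  check the radix placement of nodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node type, for illustration only. */
struct pst_node {
	unsigned long radix_index;
	unsigned long heap_index;
	struct pst_node *left, *right;
};

/* May 'parent' legally sit directly above 'child'? */
static bool parent_dominates(const struct pst_node *parent,
			     const struct pst_node *child)
{
	assert(parent != NULL);
	if (child == NULL)
		return true;
	if (parent->heap_index != child->heap_index)
		return parent->heap_index > child->heap_index;
	/* Equal heap indices: the smaller radix_index must be on top. */
	return parent->radix_index < child->radix_index;
}

/* Check the heap conditions for every node in the subtree. */
static bool heap_order_ok(const struct pst_node *node)
{
	if (node == NULL)
		return true;
	return parent_dominates(node, node->left) &&
	       parent_dominates(node, node->right) &&
	       heap_order_ok(node->left) &&
	       heap_order_ok(node->right);
}
```

  For instance, a root with radix 0 and heap 7 above children with
  (radix 1, heap 7) and (radix 4, heap 7) passes the check, whereas
  putting (radix 4, heap 7) above (radix 0, heap 7) fails it.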
  
  A regular radix-priority-search-tree helps to store and query
  intervals (vmas). However, a regular radix-priority-search-tree is only
  suitable for storing vmas with different radix indices (vm_pgoff).
  
  Therefore, prio_tree.c extends the regular radix-priority-search-tree
  to handle many vmas with the same vm_pgoff. Such vmas are handled in
  2 different ways: 1) All vmas with the same radix _and_ heap indices are
  linked using vm_set.list, 2) if there are many vmas with the same radix
  index, but different heap indices and if the regular radix-priority-search
  tree cannot index them all, we build an overflow-sub-tree that indexes such
  vmas using heap and size indices instead of heap and radix indices. For
  example, in the figure below some vmas with vm_pgoff = 0 (zero) are
  indexed by regular radix-priority-search-tree whereas others are pushed
  into an overflow-subtree. Note that all vmas in an overflow-sub-tree have
  the same vm_pgoff (radix_index) and if necessary we build different
  overflow-sub-trees to handle each possible radix_index. For example,
  in the figure we have 3 overflow-sub-trees corresponding to radix indices
  0, 2, and 4.
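
  The two cases can be told apart by comparing indices pairwise. The
  following is a minimal sketch with hypothetical names (struct vma_idx,
  enum placement, classify_pair). Note that two vmas with the same radix
  index but different heap indices only end up in an overflow-sub-tree if
  the regular tree cannot index them, so the sketch merely labels them as
  candidates.

```c
#include <assert.h>

/* Hypothetical pair of indices for one vma. */
struct vma_idx {
	unsigned long radix_index;
	unsigned long heap_index;
};

enum placement {
	SHARE_VM_SET_LIST,   /* same radix and heap index: linked in one list */
	OVERFLOW_CANDIDATES, /* same radix, different heap index */
	DISTINCT_RADIX,      /* the regular tree can tell them apart */
};

static enum placement classify_pair(const struct vma_idx *a,
				    const struct vma_idx *b)
{
	/* An interval never ends before it starts. */
	assert(a->radix_index <= a->heap_index);
	assert(b->radix_index <= b->heap_index);

	if (a->radix_index != b->radix_index)
		return DISTINCT_RADIX;
	if (a->heap_index == b->heap_index)
		return SHARE_VM_SET_LIST;
	return OVERFLOW_CANDIDATES;
}
```

  For example, [0,7,7] and [0,4,4] share radix index 0 but differ in heap
  index, so they are overflow candidates, while [0,7,7] and [2,5,7] have
  distinct radix indices.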
  
  In the final tree the first few (prio_tree_root->index_bits) levels
  are indexed using heap and radix indices whereas the overflow-sub-trees below
  those levels (i.e. levels prio_tree_root->index_bits + 1 and higher) are
  indexed using heap and size indices. In overflow-sub-trees the size_index
  is used for hashing the nodes to appropriate places.
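
  The choice between the left and the right child at a given level can be
  sketched as below. This is an illustration under assumptions, not the
  code from prio_tree.c: pick_branch, enum branch and SKETCH_BITS_PER_LONG
  are hypothetical names, 'depth' is the level of the node at which the
  decision is made (the root is level 0), and a set bit is assumed to
  select the right child.

```c
#include <assert.h>
#include <limits.h>

/* Stand-in for the kernel's BITS_PER_LONG in this user-space sketch. */
#define SKETCH_BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

enum branch { GO_LEFT = 0, GO_RIGHT = 1 };

static enum branch pick_branch(unsigned long radix_index,
			       unsigned long size_index,
			       unsigned int index_bits,
			       unsigned int depth)
{
	unsigned long key;
	unsigned int bit;

	assert(index_bits >= 1 && index_bits <= SKETCH_BITS_PER_LONG);
	assert(depth < index_bits + SKETCH_BITS_PER_LONG);

	if (depth < index_bits) {
		/* Heap-and-radix part: consume radix bits, highest first. */
		key = radix_index;
		bit = index_bits - 1 - depth;
	} else {
		/* Overflow part: consume all bits of size_index, highest first. */
		key = size_index;
		bit = SKETCH_BITS_PER_LONG - 1 - (depth - index_bits);
	}
	return ((key >> bit) & 1UL) ? GO_RIGHT : GO_LEFT;
}
```

  With index_bits = 3, a vma with radix_index 4 (binary 100) goes right at
  level 0, and a vma with radix_index 1 (binary 001) goes left. Below
  level 3 a small size_index such as 4 has its most significant bit
  clear, so the node goes left.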
  
  Now, an example prio_tree:
  
    vmas are represented [radix_index, size_index, heap_index]
                   i.e., [start_vm_pgoff, vm_size_in_pages, end_vm_pgoff]
  
  level  prio_tree_root->index_bits = 3
  -----
  												_
    0			 				[0,7,7]					 |
    							/     \					 |
  				      ------------------       ------------			 |     Regular
    				     /					   \			 |  radix priority
    1		 		[1,6,7]					  [4,3,7]		 |   search tree
    				/     \					  /     \		 |
  			 -------       -----			    ------       -----		 |  heap-and-radix
  			/		    \			   /		      \		 |      indexed
    2		    [0,6,6]	 	   [2,5,7]		[5,2,7]		    [6,1,7]	 |
  		    /     \		   /     \		/     \		    /     \	 |
    3		[0,5,5]	[1,5,6]		[2,4,6]	[3,4,7]	    [4,2,6] [5,1,6]	[6,0,6]	[7,0,7]	 |
  		   /			   /		       /		   		_
                    /		          /		      /					_
    4	      [0,4,4]		      [2,3,5]		   [4,1,5]				 |
    		 /			 /		      /					 |
    5	     [0,3,3]		     [2,2,4]		  [4,0,4]				 |  Overflow-sub-trees
    		/			/							 |
    6	    [0,2,2]		    [2,1,3]							 |    heap-and-size
    	       /		       /							 |       indexed
    7	   [0,1,1]		   [2,0,2]							 |
    	      /											 |
    8	  [0,0,0]										 |
    												_
  
  Note that we use prio_tree_root->index_bits to optimize the height
  of the heap-and-radix indexed tree. Since prio_tree_root->index_bits is
  set according to the maximum end_vm_pgoff mapped, we are sure that all
  bits (in vm_pgoff) above prio_tree_root->index_bits are 0 (zero). Therefore,
  we only use the first prio_tree_root->index_bits as radix_index.
  Whenever index_bits is increased in prio_tree_expand, we shuffle the tree
  to make sure that the first prio_tree_root->index_bits levels of the tree
  are indexed properly using heap and radix indices.
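
  One way to picture the relation between the maximum end_vm_pgoff and
  index_bits is the helper below. It is a sketch only: index_bits_for is a
  hypothetical name, and the assumption that index_bits never drops below
  1 is ours. It returns the smallest number of bits such that all higher
  bits of the given value are zero.

```c
#include <assert.h>
#include <limits.h>

/* Stand-in for the kernel's BITS_PER_LONG in this user-space sketch. */
#define SKETCH_BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

static unsigned int index_bits_for(unsigned long max_heap_index)
{
	unsigned int bits = 1;

	/* Grow until no bit at position 'bits' or above is set. */
	while (bits < SKETCH_BITS_PER_LONG && (max_heap_index >> bits) != 0)
		bits++;

	assert(bits == SKETCH_BITS_PER_LONG || (max_heap_index >> bits) == 0);
	return bits;
}
```

  A maximum heap index of 7 needs 3 bits, which matches the example tree.
  As soon as a vma with heap index 8 is inserted, 4 bits are needed and
  the heap-and-radix indexed levels have to be reshuffled.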
  
  We do not optimize the height of overflow-sub-trees using index_bits.
  The reason is: there can be many such overflow-sub-trees and all of
  them have to be shuffled whenever the index_bits increases. This may involve
  walking the whole prio_tree in prio_tree_insert->prio_tree_expand code
  path which is not desirable. Hence, we do not optimize the height of the
  heap-and-size indexed overflow-sub-trees using prio_tree->index_bits.
  Instead the overflow sub-trees are indexed using full BITS_PER_LONG bits
  of size_index. This may lead to skewed sub-trees because most of the
  higher significant bits of the size_index are likely to be 0 (zero). In
  the example above, all 3 overflow-sub-trees are skewed. This may marginally
  affect the performance. However, processes rarely map many vmas with the
  same start_vm_pgoff but different end_vm_pgoffs. Therefore, we normally
  do not require overflow-sub-trees to index all vmas.
  
  From the above discussion it is clear that the maximum height of
  a prio_tree can be prio_tree_root->index_bits + BITS_PER_LONG.
  However, in most of the common cases we do not need overflow-sub-trees,
  so the tree height in the common cases will be prio_tree_root->index_bits.
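
  The two height figures can be written down directly. The helper names
  and SKETCH_BITS_PER_LONG below are hypothetical and only restate the
  arithmetic of the paragraph above.

```c
#include <assert.h>
#include <limits.h>

/* Stand-in for the kernel's BITS_PER_LONG in this user-space sketch. */
#define SKETCH_BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Height when no overflow-sub-tree is needed. */
static unsigned int common_case_height(unsigned int index_bits)
{
	assert(index_bits >= 1 && index_bits <= SKETCH_BITS_PER_LONG);
	return index_bits;
}

/* Upper bound: radix levels plus a fully used overflow-sub-tree. */
static unsigned int worst_case_height(unsigned int index_bits)
{
	assert(index_bits >= 1 && index_bits <= SKETCH_BITS_PER_LONG);
	return index_bits + SKETCH_BITS_PER_LONG;
}
```

  With index_bits = 3 and a 64-bit unsigned long, the bound is 3 + 64 = 67
  levels, while the common case stays at 3.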
  
  It is fair to mention here that prio_tree_root->index_bits is increased
  on demand; however, index_bits is not decreased when vmas are removed
  from the prio_tree. That's tricky to do. Hence, it's left as a homework
  problem.