Commit 3b98b087fc2daab67518d2baa8aef19a6ad82723
Committed by Linus Torvalds
1 parent: 1678df37be
[PATCH] fix NUMA interleaving for huge pages
Since vma->vm_pgoff is in units of small pages, VMAs for huge pages have the lower HPAGE_SHIFT - PAGE_SHIFT bits always cleared, which results in bad offsets to the interleave functions. Take this difference from small pages into account when calculating the offset. This does add a 0-bit shift into the small-page path (via alloc_page_vma()), but I think that is negligible. Also add a BUG_ON to prevent the offset from growing due to a negative right-shift, which probably shouldn't be allowed anyway.

Tested on an 8-memory node ppc64 NUMA box and got the interleaving I expected.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Showing 1 changed file with 9 additions and 1 deletion
mm/mempolicy.c
@@ -1176,7 +1176,15 @@
 	if (vma) {
 		unsigned long off;
 
-		off = vma->vm_pgoff;
+		/*
+		 * for small pages, there is no difference between
+		 * shift and PAGE_SHIFT, so the bit-shift is safe.
+		 * for huge pages, since vm_pgoff is in units of small
+		 * pages, we need to shift off the always 0 bits to get
+		 * a useful offset.
+		 */
+		BUG_ON(shift < PAGE_SHIFT);
+		off = vma->vm_pgoff >> (shift - PAGE_SHIFT);
 		off += (addr - vma->vm_start) >> shift;
 		return offset_il_node(pol, vma, off);
 	} else
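
To illustrate the effect of the change outside the kernel, here is a minimal user-space sketch of the offset arithmetic before and after the patch. The shift values (4 KiB small pages, 16 MiB huge pages) are assumptions chosen for illustration. The helpers old_offset(), new_offset() and pick_node() are hypothetical names and not kernel functions. pick_node() is only a simplified stand-in for offset_il_node() that assumes every node from 0 to nnodes-1 is allowed by the policy.

```c
#include <assert.h>

/* Assumed values for illustration only; real ones are architecture-specific. */
#define PAGE_SHIFT   12   /* 4 KiB small pages (assumption) */
#define HPAGE_SHIFT  24   /* 16 MiB huge pages (assumption) */

/* Offset as computed before the patch: vm_pgoff is used unshifted. */
static unsigned long old_offset(unsigned long vm_pgoff, unsigned long vm_start,
                                unsigned long addr, int shift)
{
    unsigned long off = vm_pgoff;

    off += (addr - vm_start) >> shift;
    return off;
}

/* Offset as computed after the patch: vm_pgoff is first converted
 * from small-page units to units of (1 << shift) bytes. */
static unsigned long new_offset(unsigned long vm_pgoff, unsigned long vm_start,
                                unsigned long addr, int shift)
{
    unsigned long off;

    assert(shift >= PAGE_SHIFT);  /* stands in for BUG_ON(shift < PAGE_SHIFT) */
    off = vm_pgoff >> (shift - PAGE_SHIFT);
    off += (addr - vm_start) >> shift;
    return off;
}

/* Hypothetical, simplified stand-in for offset_il_node():
 * assumes nodes 0..nnodes-1 are all part of the interleave set. */
static unsigned pick_node(unsigned long off, unsigned nnodes)
{
    return (unsigned)(off % nnodes);
}
```

Under these assumptions, consider a VMA that maps a file starting at its fourth huge page, so vm_pgoff is 3 << (HPAGE_SHIFT - PAGE_SHIFT). With 8 nodes, the old calculation gives node 0 for the first page of that VMA, and would do so for any huge-page-aligned vm_pgoff, while the new calculation gives node 3. For small pages (shift equal to PAGE_SHIFT) the two functions return the same value, which matches the "0-bit shift" remark in the commit message.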