11 Aug, 2010

3 commits

  • If an error hugepage is not in use, we can fully recover from the error
    by dequeuing it from the freelist, so return RECOVERY.
    Otherwise, whether or not we can recover depends on user processes,
    so return DELAYED.
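
    A hedged sketch of the dispatch described above (illustrative only, not
    the exact kernel code; the dequeue helper stands in for whatever the
    patch uses to pull the page off the freelist):

    /* Sketch: outcome names follow the commit message. */
    static int me_huge_page(struct page *p)
    {
        struct page *hpage = compound_head(p);

        if (page_count(hpage) == 0) {
            /* not in use: isolate the poisoned hugepage from the freelist */
            dequeue_hwpoisoned_huge_page(hpage);
            return RECOVERY;
        }
        /* in use: recovery depends on what the mapping processes do */
        return DELAYED;
    }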

    Dependency:
    "HWPOISON, hugetlb: enable error handling path for hugepage"

    Signed-off-by: Naoya Horiguchi
    Cc: Andrew Morton
    Acked-by: Fengguang Wu
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
     
  • This patch adds reverse mapping feature for hugepage by introducing
    mapcount for shared/private-mapped hugepage and anon_vma for
    private-mapped hugepage.

    While hugepages are not currently swappable, reverse mapping can be
    useful for the memory error handler.

    Without this patch, the memory error handler can neither identify the
    processes using a bad hugepage nor unmap it from them. That is:
    - for a shared hugepage:
    we can collect the processes using the hugepage through the pagecache,
    but cannot unmap the hugepage because of the lack of mapcount.
    - for a privately mapped hugepage:
    we can neither collect the processes nor unmap the hugepage.
    This patch solves these problems.
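
    A condensed sketch of the anon rmap hook this introduces (modeled on the
    patch; __hugepage_set_anon_rmap() is assumed to mirror the small-page
    __page_set_anon_rmap() helper):

    void hugepage_add_anon_rmap(struct page *page,
                                struct vm_area_struct *vma,
                                unsigned long address)
    {
        struct anon_vma *anon_vma = vma->anon_vma;

        BUG_ON(!anon_vma);
        BUG_ON(address < vma->vm_start || address >= vma->vm_end);
        /* the first mapper sets up the anon_vma linkage */
        if (atomic_inc_and_test(&page->_mapcount))
            __hugepage_set_anon_rmap(page, vma, address, 0);
    }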

    This patch includes the bug fix given by commit 23be7468e8, so reverts it.

    Dependency:
    "hugetlb: move definition of is_vm_hugetlb_page() to hugepage_inline.h"

    ChangeLog since May 24.
    - create hugetlb_inline.h and move is_vm_hugetlb_page() into it.
    - move functions setting up anon_vma for hugepage into mm/rmap.c.

    ChangeLog since May 13.
    - rebased to 2.6.34
    - fix logic error (in case that private mapping and shared mapping coexist)
    - move is_vm_hugetlb_page() into include/linux/mm.h to use this function
    from linear_page_index()
    - define and use linear_hugepage_index() instead of compound_order()
    - use page_move_anon_rmap() in hugetlb_cow()
    - copy exclusive switch of __set_page_anon_rmap() into hugepage counterpart.
    - revert commit 23be7468 completely

    Signed-off-by: Naoya Horiguchi
    Cc: Andi Kleen
    Cc: Andrew Morton
    Cc: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Larry Woodman
    Cc: Lee Schermerhorn
    Acked-by: Fengguang Wu
    Acked-by: Mel Gorman
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
     
  • is_vm_hugetlb_page() is a widely used inline function for inserting hooks
    into hugetlb code.
    But we can't use it in pagemap.h because of a circular dependency between
    the header files. This patch removes this limitation.

    Acked-by: Mel Gorman
    Acked-by: Fengguang Wu
    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
     

16 Dec, 2009

1 commit

  • This patch derives a "nodes_allowed" node mask from the numa mempolicy of
    the task modifying the number of persistent huge pages to control the
    allocation, freeing and adjusting of surplus huge pages when the pool page
    count is modified via the new sysctl or sysfs attribute
    "nr_hugepages_mempolicy". The nodes_allowed mask is derived as follows:

    * For "default" [NULL] task mempolicy, a NULL nodemask_t pointer
    is produced. This will cause the hugetlb subsystem to use
    node_online_map as the "nodes_allowed". This preserves the
    behavior before this patch.
    * For "preferred" mempolicy, including explicit local allocation,
    a nodemask with the single preferred node will be produced.
    "local" policy will NOT track any internode migrations of the
    task adjusting nr_hugepages.
    * For "bind" and "interleave" policy, the mempolicy's nodemask
    will be used.
    * Other than to inform the construction of the nodes_allowed node
    mask, the actual mempolicy mode is ignored. That is, all modes
    behave like interleave over the resulting nodes_allowed mask
    with no "fallback".

    See the updated documentation [next patch] for more information
    about the implications of this patch.
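
    A condensed sketch of the derivation (modeled on the
    init_nodemask_of_mempolicy() helper this work adds to mm/mempolicy.c;
    locking and reference counting are omitted):

    /* Returns false for default policy; caller then uses node_online_map. */
    bool init_nodemask_of_mempolicy(nodemask_t *mask)
    {
        struct mempolicy *pol = current->mempolicy;
        int nid;

        if (!pol)
            return false;

        switch (pol->mode) {
        case MPOL_PREFERRED:            /* includes explicit local allocation */
            nid = pol->v.preferred_node;
            if (nid < 0)
                nid = numa_node_id();   /* "local": node at this instant */
            init_nodemask_of_node(mask, nid);
            break;
        case MPOL_BIND:
        case MPOL_INTERLEAVE:           /* both supply a nodemask directly */
            *mask = pol->v.nodes;
            break;
        default:
            BUG();
        }
        return true;
    }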

    Examples:

    Starting with:

    Node 0 HugePages_Total: 0
    Node 1 HugePages_Total: 0
    Node 2 HugePages_Total: 0
    Node 3 HugePages_Total: 0

    Default behavior [with or without this patch] balances persistent
    hugepage allocation across nodes [with sufficient contiguous memory]:

    sysctl vm.nr_hugepages[_mempolicy]=32

    yields:

    Node 0 HugePages_Total: 8
    Node 1 HugePages_Total: 8
    Node 2 HugePages_Total: 8
    Node 3 HugePages_Total: 8

    Of course, we only have nr_hugepages_mempolicy with the patch,
    but with default mempolicy, nr_hugepages_mempolicy behaves the
    same as nr_hugepages.

    Applying mempolicy--e.g., with numactl [using '-m' a.k.a.
    '--membind' because it allows multiple nodes to be specified
    and it's easy to type]--we can allocate huge pages on
    individual nodes or sets of nodes. So, starting from the
    condition above, with 8 huge pages per node, add 8 more to
    node 2 using:

    numactl -m 2 sysctl vm.nr_hugepages_mempolicy=40

    This yields:

    Node 0 HugePages_Total: 8
    Node 1 HugePages_Total: 8
    Node 2 HugePages_Total: 16
    Node 3 HugePages_Total: 8

    The incremental 8 huge pages were restricted to node 2 by the
    specified mempolicy.

    Similarly, we can use mempolicy to free persistent huge pages
    from specified nodes:

    numactl -m 0,1 sysctl vm.nr_hugepages_mempolicy=32

    yields:

    Node 0 HugePages_Total: 4
    Node 1 HugePages_Total: 4
    Node 2 HugePages_Total: 16
    Node 3 HugePages_Total: 8

    The 8 huge pages freed were balanced over nodes 0 and 1.

    [rientjes@google.com: accommodate reworked NODEMASK_ALLOC]
    Signed-off-by: David Rientjes
    Signed-off-by: Lee Schermerhorn
    Acked-by: Mel Gorman
    Reviewed-by: Andi Kleen
    Cc: KAMEZAWA Hiroyuki
    Cc: Randy Dunlap
    Cc: Nishanth Aravamudan
    Cc: Adam Litke
    Cc: Andy Whitcroft
    Cc: Eric Whitney
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     

25 Sep, 2009

1 commit

  • Why macros are always wrong:

    mm/mmap.c: In function 'do_mmap_pgoff':
    mm/mmap.c:953: warning: unused variable 'user'

    also, move a couple of struct forward-decls outside `#ifdef
    CONFIG_HUGETLB_PAGE' - it's pointless and frequently harmful to make these
    conditional (eg, this patch needed `struct user_struct').

    Cc: Lee Schermerhorn
    Cc: Mel Gorman
    Cc: Nishanth Aravamudan
    Cc: David Rientjes
    Cc: Adam Litke
    Cc: Andy Whitcroft
    Cc: Eric Whitney
    Cc: Eric B Munson
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

24 Sep, 2009

1 commit

  • It's unused.

    It isn't needed -- read or write flag is already passed and sysctl
    shouldn't care about the rest.

    It _was_ used in two places at arch/frv for some reason.

    Signed-off-by: Alexey Dobriyan
    Cc: David Howells
    Cc: "Eric W. Biederman"
    Cc: Al Viro
    Cc: Ralf Baechle
    Cc: Martin Schwidefsky
    Cc: Ingo Molnar
    Cc: "David S. Miller"
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

22 Sep, 2009

4 commits

  • Add a flag for mmap that will be used to request a huge page region that
    will look like anonymous memory to userspace. This is accomplished by
    using a file on the internal vfsmount. MAP_HUGETLB is a modifier of
    MAP_ANONYMOUS and so must be specified with it. The region will behave
    the same as a MAP_ANONYMOUS region using small pages.
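
    A minimal userspace sketch of the new flag (the MAP_HUGETLB value and the
    2MB huge page size are x86 assumptions):

    #include <stdio.h>
    #include <sys/mman.h>

    #ifndef MAP_HUGETLB
    #define MAP_HUGETLB 0x40000         /* x86 value; arch-specific */
    #endif

    int main(void)
    {
        size_t len = 2 * 1024 * 1024;   /* one 2MB huge page */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

        if (p == MAP_FAILED) {
            perror("mmap");             /* e.g. no huge pages in the pool */
            return 1;
        }
        p[0] = 1;                       /* behaves like ordinary anon memory */
        munmap(p, len);
        return 0;
    }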

    [akpm@linux-foundation.org: fix arch definitions of MAP_HUGETLB]
    Signed-off-by: Eric B Munson
    Acked-by: David Rientjes
    Cc: Mel Gorman
    Cc: Adam Litke
    Cc: David Gibson
    Cc: Lee Schermerhorn
    Cc: Nick Piggin
    Cc: Hugh Dickins
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric B Munson
     
  • This patchset adds a flag to mmap that allows the user to request that an
    anonymous mapping be backed with huge pages. This mapping will borrow
    functionality from the huge page shm code to create a file on the kernel
    internal mount and use it to approximate an anonymous mapping. The
    MAP_HUGETLB flag is a modifier to MAP_ANONYMOUS and will not work without
    both flags being present.

    A new flag is necessary because there is no other way to hook into huge
    pages without creating a file on a hugetlbfs mount which wouldn't be
    MAP_ANONYMOUS.

    To userspace, this mapping will behave just like an anonymous mapping
    because the file is not accessible outside of the kernel.

    This patchset is meant to simplify the programming model. Presently there
    is a large chunk of boilerplate code, contained in libhugetlbfs, required
    to create private, hugepage backed mappings. This patch set would allow
    use of hugepages without linking to libhugetlbfs or having hugetlbfs
    mounted.

    Unification of the VM code would provide these same benefits, but it has
    been resisted each time that it has been suggested for several reasons: it
    would break PAGE_SIZE assumptions across the kernel, it makes page-table
    abstractions really expensive, and it does not provide any benefit on
    architectures that do not support huge pages, incurring fast path
    penalties without providing any benefit on these architectures.

    This patch:

    There are two means of creating mappings backed by huge pages:

    1. mmap() a file created on hugetlbfs
    2. Use shm which creates a file on an internal mount which essentially
    maps it MAP_SHARED

    The internal mount is only used for shared mappings but there is very
    little that stops it being used for private mappings. This patch extends
    hugetlbfs_file_setup() to deal with the creation of files that will be
    mapped MAP_PRIVATE on the internal hugetlbfs mount. This extended API is
    used in a subsequent patch to implement the MAP_HUGETLB mmap() flag.
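
    For contrast, a minimal sketch of method 1, which needs a visible file on
    a hugetlbfs mount (the /mnt/huge path and 2MB size are assumptions):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        size_t len = 2 * 1024 * 1024;
        int fd = open("/mnt/huge/example", O_CREAT | O_RDWR, 0600);
        void *p;

        if (fd < 0) {
            perror("open");             /* hugetlbfs must be mounted here */
            return 1;
        }
        p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        munmap(p, len);
        close(fd);
        return 0;
    }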

    Signed-off-by: Eric Munson
    Acked-by: David Rientjes
    Cc: Mel Gorman
    Cc: Adam Litke
    Cc: David Gibson
    Cc: Lee Schermerhorn
    Cc: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric B Munson
     
  • follow_hugetlb_page() shouldn't be guessing about the coredump case
    either: pass the foll_flags down to it, instead of just the write bit.

    Remove that obscure huge_zeropage_ok() test. The decision is easy,
    though unlike the non-huge case - here vm_ops->fault is always set.
    But we know that a fault would serve up zeroes, unless there's
    already a hugetlbfs pagecache page to back the range.

    (Alternatively, since hugetlb pages aren't swapped out under pressure,
    you could save more dump space by arguing that a page not yet faulted
    into this process cannot be relevant to the dump; but that would be
    more surprising.)

    Signed-off-by: Hugh Dickins
    Acked-by: Rik van Riel
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Nick Piggin
    Cc: Mel Gorman
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Free huge pages from nodes in round-robin fashion in an attempt to keep
    [persistent a.k.a. static] hugepages balanced across nodes.

    The new function free_pool_huge_page() is modeled on, and performs roughly
    the inverse of, alloc_fresh_huge_page(). It replaces dequeue_huge_page(),
    which now has no callers, so this patch removes it.

    Helper function hstate_next_node_to_free() uses new hstate member
    next_to_free_nid to distribute "frees" across all nodes with huge pages.
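
    A condensed sketch of the round-robin bookkeeping (field and helper names
    follow the commit message; the real code must also skip nodes that have
    no huge pages to free):

    /* Advance and return the next node to free a huge page from. */
    static int hstate_next_node_to_free(struct hstate *h)
    {
        int nid = next_node(h->next_to_free_nid, node_online_map);

        if (nid == MAX_NUMNODES)
            nid = first_node(node_online_map);
        h->next_to_free_nid = nid;
        return nid;
    }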

    Acked-by: David Rientjes
    Signed-off-by: Lee Schermerhorn
    Acked-by: Mel Gorman
    Cc: Nishanth Aravamudan
    Cc: Adam Litke
    Cc: Andy Whitcroft
    Cc: Eric Whitney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     

25 Aug, 2009

1 commit

  • 2.6.30's commit 8a0bdec194c21c8fdef840989d0d7b742bb5d4bc removed
    user_shm_lock() calls in hugetlb_file_setup() but left the
    user_shm_unlock call in shm_destroy().

    In detail:
    Assume that can_do_hugetlb_shm() returns true and hence user_shm_lock()
    is not called in hugetlb_file_setup(). However, user_shm_unlock() is
    called in any case in shm_destroy(); subsequently,
    atomic_dec_and_lock(&up->__count) in free_uid() is executed, and if
    up->__count reaches zero, cleanup_user_struct() is also scheduled.

    Note that sched_destroy_user() is empty if CONFIG_USER_SCHED is not set.
    However, the ref counter up->__count gets unexpectedly non-positive and
    the corresponding structs are freed even though there are live
    references to them, resulting in a kernel oops after many
    shmget(SHM_HUGETLB)/shmctl(IPC_RMID) cycles with CONFIG_USER_SCHED set.

    Hugh changed Stefan's suggested patch: can_do_hugetlb_shm() at the
    time of shm_destroy() may give a different answer from the one at the
    time of hugetlb_file_setup(). He also fixed newseg()'s no_id error path,
    which had missed user_shm_unlock() ever since it came in 2.6.9.

    Reported-by: Stefan Huber
    Signed-off-by: Hugh Dickins
    Tested-by: Stefan Huber
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

17 Jun, 2009

1 commit

  • A series of patches to enhance the /proc/pagemap interface and to add a
    userspace executable which can be used to present the pagemap data.

    Export 10 more flags to end users (and more for kernel developers):

    11. KPF_MMAP (pseudo flag) memory mapped page
    12. KPF_ANON (pseudo flag) memory mapped page (anonymous)
    13. KPF_SWAPCACHE page is in swap cache
    14. KPF_SWAPBACKED page is swap/RAM backed
    15. KPF_COMPOUND_HEAD (*)
    16. KPF_COMPOUND_TAIL (*)
    17. KPF_HUGE hugeTLB pages
    18. KPF_UNEVICTABLE page is in the unevictable LRU list
    19. KPF_HWPOISON hardware detected corruption
    20. KPF_NOPAGE (pseudo flag) no page frame at the address

    (*) For compound pages, exporting _both_ head/tail info enables
    users to tell where a compound page starts/ends, and its order.
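
    These flags are read from /proc/kpageflags, one 64-bit word per page
    frame. A small sketch that counts hugeTLB page frames using the new
    KPF_HUGE bit:

    #include <stdint.h>
    #include <stdio.h>

    #define KPF_HUGE 17                 /* bit 17 = hugeTLB, as listed above */

    int main(void)
    {
        FILE *f = fopen("/proc/kpageflags", "rb");
        uint64_t flags, huge = 0;

        if (!f) {
            perror("fopen");            /* usually needs root */
            return 1;
        }
        while (fread(&flags, sizeof(flags), 1, f) == 1)
            if (flags & (1ULL << KPF_HUGE))
                huge++;
        printf("hugeTLB page frames: %llu\n", (unsigned long long)huge);
        fclose(f);
        return 0;
    }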

    A simple demo of the page-types tool:

    # ./page-types -h
    page-types [options]
    -r|--raw Raw mode, for kernel developers
    -a|--addr addr-spec Walk a range of pages
    -b|--bits bits-spec Walk pages with specified bits
    -l|--list Show page details in ranges
    -L|--list-each Show page details one by one
    -N|--no-summary Don't show summary info
    -h|--help Show this usage message
    addr-spec:
    N one page at offset N (unit: pages)
    N+M pages range from N to N+M-1
    N,M pages range from N to M-1
    N, pages range from N to end
    ,M pages range from 0 to M
    bits-spec:
    bit1,bit2 (flags & (bit1|bit2)) != 0
    bit1,bit2=bit1 (flags & (bit1|bit2)) == bit1
    bit1,~bit2 (flags & (bit1|bit2)) == bit1
    =bit1,bit2 flags == (bit1|bit2)
    bit-names:
    locked error referenced uptodate
    dirty lru active slab
    writeback reclaim buddy mmap
    anonymous swapcache swapbacked compound_head
    compound_tail huge unevictable hwpoison
    nopage reserved(r) mlocked(r) mappedtodisk(r)
    private(r) private_2(r) owner_private(r) arch(r)
    uncached(r) readahead(o) slob_free(o) slub_frozen(o)
    slub_debug(o)
    (r) raw mode bits (o) overloaded bits

    # ./page-types
    flags page-count MB symbolic-flags long-symbolic-flags
    0x0000000000000000 487369 1903 _________________________________
    0x0000000000000014 5 0 __R_D____________________________ referenced,dirty
    0x0000000000000020 1 0 _____l___________________________ lru
    0x0000000000000024 34 0 __R__l___________________________ referenced,lru
    0x0000000000000028 3838 14 ___U_l___________________________ uptodate,lru
    0x0001000000000028 48 0 ___U_l_______________________I___ uptodate,lru,readahead
    0x000000000000002c 6478 25 __RU_l___________________________ referenced,uptodate,lru
    0x000100000000002c 47 0 __RU_l_______________________I___ referenced,uptodate,lru,readahead
    0x0000000000000040 8344 32 ______A__________________________ active
    0x0000000000000060 1 0 _____lA__________________________ lru,active
    0x0000000000000068 348 1 ___U_lA__________________________ uptodate,lru,active
    0x0001000000000068 12 0 ___U_lA______________________I___ uptodate,lru,active,readahead
    0x000000000000006c 988 3 __RU_lA__________________________ referenced,uptodate,lru,active
    0x000100000000006c 48 0 __RU_lA______________________I___ referenced,uptodate,lru,active,readahead
    0x0000000000004078 1 0 ___UDlA_______b__________________ uptodate,dirty,lru,active,swapbacked
    0x000000000000407c 34 0 __RUDlA_______b__________________ referenced,uptodate,dirty,lru,active,swapbacked
    0x0000000000000400 503 1 __________B______________________ buddy
    0x0000000000000804 1 0 __R________M_____________________ referenced,mmap
    0x0000000000000828 1029 4 ___U_l_____M_____________________ uptodate,lru,mmap
    0x0001000000000828 43 0 ___U_l_____M_________________I___ uptodate,lru,mmap,readahead
    0x000000000000082c 382 1 __RU_l_____M_____________________ referenced,uptodate,lru,mmap
    0x000100000000082c 12 0 __RU_l_____M_________________I___ referenced,uptodate,lru,mmap,readahead
    0x0000000000000868 192 0 ___U_lA____M_____________________ uptodate,lru,active,mmap
    0x0001000000000868 12 0 ___U_lA____M_________________I___ uptodate,lru,active,mmap,readahead
    0x000000000000086c 800 3 __RU_lA____M_____________________ referenced,uptodate,lru,active,mmap
    0x000100000000086c 31 0 __RU_lA____M_________________I___ referenced,uptodate,lru,active,mmap,readahead
    0x0000000000004878 2 0 ___UDlA____M__b__________________ uptodate,dirty,lru,active,mmap,swapbacked
    0x0000000000001000 492 1 ____________a____________________ anonymous
    0x0000000000005808 4 0 ___U_______Ma_b__________________ uptodate,mmap,anonymous,swapbacked
    0x0000000000005868 2839 11 ___U_lA____Ma_b__________________ uptodate,lru,active,mmap,anonymous,swapbacked
    0x000000000000586c 30 0 __RU_lA____Ma_b__________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked
    total 513968 2007

    # ./page-types -r
    flags page-count MB symbolic-flags long-symbolic-flags
    0x0000000000000000 468002 1828 _________________________________
    0x0000000100000000 19102 74 _____________________r___________ reserved
    0x0000000000008000 41 0 _______________H_________________ compound_head
    0x0000000000010000 188 0 ________________T________________ compound_tail
    0x0000000000008014 1 0 __R_D__________H_________________ referenced,dirty,compound_head
    0x0000000000010014 4 0 __R_D___________T________________ referenced,dirty,compound_tail
    0x0000000000000020 1 0 _____l___________________________ lru
    0x0000000800000024 34 0 __R__l__________________P________ referenced,lru,private
    0x0000000000000028 3794 14 ___U_l___________________________ uptodate,lru
    0x0001000000000028 46 0 ___U_l_______________________I___ uptodate,lru,readahead
    0x0000000400000028 44 0 ___U_l_________________d_________ uptodate,lru,mappedtodisk
    0x0001000400000028 2 0 ___U_l_________________d_____I___ uptodate,lru,mappedtodisk,readahead
    0x000000000000002c 6434 25 __RU_l___________________________ referenced,uptodate,lru
    0x000100000000002c 47 0 __RU_l_______________________I___ referenced,uptodate,lru,readahead
    0x000000040000002c 14 0 __RU_l_________________d_________ referenced,uptodate,lru,mappedtodisk
    0x000000080000002c 30 0 __RU_l__________________P________ referenced,uptodate,lru,private
    0x0000000800000040 8124 31 ______A_________________P________ active,private
    0x0000000000000040 219 0 ______A__________________________ active
    0x0000000800000060 1 0 _____lA_________________P________ lru,active,private
    0x0000000000000068 322 1 ___U_lA__________________________ uptodate,lru,active
    0x0001000000000068 12 0 ___U_lA______________________I___ uptodate,lru,active,readahead
    0x0000000400000068 13 0 ___U_lA________________d_________ uptodate,lru,active,mappedtodisk
    0x0000000800000068 12 0 ___U_lA_________________P________ uptodate,lru,active,private
    0x000000000000006c 977 3 __RU_lA__________________________ referenced,uptodate,lru,active
    0x000100000000006c 48 0 __RU_lA______________________I___ referenced,uptodate,lru,active,readahead
    0x000000040000006c 5 0 __RU_lA________________d_________ referenced,uptodate,lru,active,mappedtodisk
    0x000000080000006c 3 0 __RU_lA_________________P________ referenced,uptodate,lru,active,private
    0x0000000c0000006c 3 0 __RU_lA________________dP________ referenced,uptodate,lru,active,mappedtodisk,private
    0x0000000c00000068 1 0 ___U_lA________________dP________ uptodate,lru,active,mappedtodisk,private
    0x0000000000004078 1 0 ___UDlA_______b__________________ uptodate,dirty,lru,active,swapbacked
    0x000000000000407c 34 0 __RUDlA_______b__________________ referenced,uptodate,dirty,lru,active,swapbacked
    0x0000000000000400 538 2 __________B______________________ buddy
    0x0000000000000804 1 0 __R________M_____________________ referenced,mmap
    0x0000000000000828 1029 4 ___U_l_____M_____________________ uptodate,lru,mmap
    0x0001000000000828 43 0 ___U_l_____M_________________I___ uptodate,lru,mmap,readahead
    0x000000000000082c 382 1 __RU_l_____M_____________________ referenced,uptodate,lru,mmap
    0x000100000000082c 12 0 __RU_l_____M_________________I___ referenced,uptodate,lru,mmap,readahead
    0x0000000000000868 192 0 ___U_lA____M_____________________ uptodate,lru,active,mmap
    0x0001000000000868 12 0 ___U_lA____M_________________I___ uptodate,lru,active,mmap,readahead
    0x000000000000086c 800 3 __RU_lA____M_____________________ referenced,uptodate,lru,active,mmap
    0x000100000000086c 31 0 __RU_lA____M_________________I___ referenced,uptodate,lru,active,mmap,readahead
    0x0000000000004878 2 0 ___UDlA____M__b__________________ uptodate,dirty,lru,active,mmap,swapbacked
    0x0000000000001000 492 1 ____________a____________________ anonymous
    0x0000000000005008 2 0 ___U________a_b__________________ uptodate,anonymous,swapbacked
    0x0000000000005808 4 0 ___U_______Ma_b__________________ uptodate,mmap,anonymous,swapbacked
    0x000000000000580c 1 0 __RU_______Ma_b__________________ referenced,uptodate,mmap,anonymous,swapbacked
    0x0000000000005868 2839 11 ___U_lA____Ma_b__________________ uptodate,lru,active,mmap,anonymous,swapbacked
    0x000000000000586c 29 0 __RU_lA____Ma_b__________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked
    total 513968 2007

    # ./page-types --raw --list --no-summary --bits reserved
    offset count flags
    0 15 _____________________r___________
    31 4 _____________________r___________
    159 97 _____________________r___________
    4096 2067 _____________________r___________
    6752 2390 _____________________r___________
    9355 3 _____________________r___________
    9728 14526 _____________________r___________

    This patch:

    Introduce PageHuge(), which identifies huge/gigantic pages by their
    dedicated compound destructor functions.

    Also move prep_compound_gigantic_page() to hugetlb.c and make
    __free_pages_ok() non-static.
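
    A sketch of the new test, following the description above: huge and
    gigantic pages are recognized by their dedicated compound destructor
    rather than by their size:

    int PageHuge(struct page *page)
    {
        compound_page_dtor *dtor;

        if (!PageCompound(page))
            return 0;
        page = compound_head(page);
        dtor = get_compound_page_dtor(page);
        /* hugetlb pages are the ones freed via free_huge_page() */
        return dtor == free_huge_page;
    }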

    Signed-off-by: Wu Fengguang
    Cc: KOSAKI Motohiro
    Cc: Andi Kleen
    Cc: Matt Mackall
    Cc: Alexey Dobriyan
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     

11 Feb, 2009

2 commits

  • Fix regression due to 5a6fe125950676015f5108fb71b2a67441755003,
    "Do not account for the address space used by hugetlbfs using VM_ACCOUNT"
    which added an argument to the function hugetlb_file_setup() but not to
    the macro hugetlb_file_setup().
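
    In other words, the !CONFIG_HUGETLB_PAGE stub has to track the real
    function's signature. A sketch of the shape of the fix (argument names
    are illustrative):

    #ifdef CONFIG_HUGETLB_PAGE
    struct file *hugetlb_file_setup(const char *name, size_t size,
                                    int acctflag);
    #else
    /* the stub macro must now accept the new accounting flag too */
    #define hugetlb_file_setup(name, size, acctflag) ERR_PTR(-ENOSYS)
    #endif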

    Reported-by: Chris Clayton
    Signed-off-by: Stefan Richter
    Acked-by: Mel Gorman
    Signed-off-by: Linus Torvalds

    Stefan Richter
     
  • When overcommit is disabled, the core VM accounts for pages used by anonymous
    shared, private mappings and special mappings. It keeps track of VMAs that
    should be accounted for with VM_ACCOUNT and VMAs that never had a reserve
    with VM_NORESERVE.

    Overcommit for hugetlbfs is much riskier than overcommit for base pages
    due to contiguity requirements. It avoids overcommitting on both shared
    and private mappings using reservation counters that are checked and
    updated during mmap(). This ensures (within limits) that hugepages exist
    in the future when faults occur; otherwise it is too easy for
    applications to be SIGKILLed.

    As hugetlbfs makes its own reservations of a different unit to the base page
    size, VM_ACCOUNT should never be set. Even if the units were correct, we would
    double account for the usage in the core VM and hugetlbfs. VM_NORESERVE may
    be set because an application can request no reserves be made for hugetlbfs
    at the risk of getting killed later.

    With commit fc8744adc870a8d4366908221508bb113d8b72ee, VM_NORESERVE and
    VM_ACCOUNT are getting unconditionally set for hugetlbfs-backed mappings. This
    breaks the accounting for both the core VM and hugetlbfs: it can trigger
    an OOM storm when hugepage pools are too small, and leads to lockups and
    corrupted counters otherwise. This patch brings hugetlbfs more in line
    with how the core VM treats VM_NORESERVE but prevents VM_ACCOUNT being
    set.

    Signed-off-by: Mel Gorman
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

07 Jan, 2009

2 commits

  • The KernelPageSize entry in /proc/pid/smaps is the pagesize used by the
    kernel to back a VMA. This matches the size used by the MMU in the
    majority of cases. However, one counter-example occurs on PPC64 kernels
    whereby a kernel using 64K as a base pagesize may still use 4K pages for
    the MMU on older processors. To distinguish the two, this patch reports
    MMUPageSize as the pagesize used by the MMU in /proc/pid/smaps.

    Signed-off-by: Mel Gorman
    Cc: "KOSAKI Motohiro"
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • It is useful to verify a hugepage-aware application is using the expected
    pagesizes for its memory regions. This patch creates an entry called
    KernelPageSize in /proc/pid/smaps that is the size of page used by the
    kernel to back a VMA. The entry is not called PageSize as it is possible
    the MMU uses a different size. This extension should not break any sensible
    parser that skips lines containing unrecognised information.

    Signed-off-by: Mel Gorman
    Acked-by: "KOSAKI Motohiro"
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

27 Jul, 2008

1 commit

  • Remove the following warning when CONFIG_HUGETLB_PAGE is not set:

    ipc/shm.c: In function `shm_get_stat':
    ipc/shm.c:565: warning: unused variable `h'

    [akpm@linux-foundation.org: use tabs, not spaces]
    Signed-off-by: Andrea Righi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Righi
     

25 Jul, 2008

8 commits

  • Allow alloc_bootmem_huge_page() to be overridden by architectures that
    can't always use bootmem. This requires huge_boot_pages to be available
    for use by this function.

    This is required for powerpc 16G pages, which have to be reserved prior
    to boot time. The location of these pages is indicated in the device tree.
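
    A sketch of the override mechanism: the generic bootmem-based version is
    declared weak, so an architecture can supply its own (body abridged):

    /* Generic version; powerpc overrides this for 16G pages. */
    int __attribute__((weak)) alloc_bootmem_huge_page(struct hstate *h)
    {
        /* ... allocate from bootmem and add to huge_boot_pages ... */
        return 1;
    }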

    Acked-by: Adam Litke
    Signed-off-by: Jon Tollefson
    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jon Tollefson
     
  • Straightforward extensions for huge pages located in the PUD instead of
    the PMD.

    Signed-off-by: Andi Kleen
    Signed-off-by: Nick Piggin
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • Provide new hugepages user APIs that are more suited to multiple hstates
    in sysfs. There is a new directory, /sys/kernel/hugepages. Underneath
    that directory there will be a directory per-supported hugepage size,
    e.g.:

    /sys/kernel/hugepages/hugepages-64kB
    /sys/kernel/hugepages/hugepages-16384kB
    /sys/kernel/hugepages/hugepages-16777216kB

    corresponding to 64k, 16m and 16g respectively. Within each
    hugepages-size directory there are a number of files, corresponding to the
    tracked counters in the hstate, e.g.:

    /sys/kernel/hugepages/hugepages-64kB/nr_hugepages
    /sys/kernel/hugepages/hugepages-64kB/nr_overcommit_hugepages
    /sys/kernel/hugepages/hugepages-64kB/free_hugepages
    /sys/kernel/hugepages/hugepages-64kB/resv_hugepages
    /sys/kernel/hugepages/hugepages-64kB/surplus_hugepages

    Of these files, the first two are read-write and the latter three are
    read-only. The size of the hugepage being manipulated is trivially
    deducible from the enclosing directory and is always expressed in kB (to
    match meminfo).
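
    A small userspace sketch of driving the interface (the path assumes the
    final /sys/kernel/mm location noted below and an x86_64 2048kB hstate):

    #include <stdio.h>

    int main(void)
    {
        const char *path =
            "/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages";
        FILE *f = fopen(path, "w");

        if (!f) {
            perror("fopen");            /* needs root and a 2048kB hstate */
            return 1;
        }
        fprintf(f, "32\n");             /* request 32 persistent huge pages */
        return fclose(f) ? 1 : 0;
    }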

    [dave@linux.vnet.ibm.com: fix build]
    [nacc@us.ibm.com: hugetlb: hang off of /sys/kernel/mm rather than /sys/kernel]
    [nacc@us.ibm.com: hugetlb: remove CONFIG_SYSFS dependency]
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: Nishanth Aravamudan
    Signed-off-by: Nick Piggin
    Cc: Dave Hansen
    Signed-off-by: Nishanth Aravamudan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nishanth Aravamudan
     
  • Add the ability to configure the hugetlb hstate used on a per mount basis.

    - Add a new pagesize= option to the hugetlbfs mount that allows setting
    the page size
    - This option causes the mount code to find the hstate corresponding to the
    specified size, and sets up a pointer to the hstate in the mount's
    superblock.
    - Change the hstate accessors to use this information rather than the
    global_hstate they were using (requires a slight change in mm/memory.c
    so we don't NULL deref in the error-unmap path -- see comments).
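
    A userspace sketch of selecting an hstate with the new option (the mount
    point and page size are assumptions; the size must match a registered
    hstate):

    #include <stdio.h>
    #include <sys/mount.h>

    int main(void)
    {
        /* back this mount with 16MB huge pages instead of the default */
        if (mount("none", "/mnt/huge16m", "hugetlbfs", 0, "pagesize=16M")) {
            perror("mount");
            return 1;
        }
        return 0;
    }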

    [np: take hstate out of hugetlbfs inode and vma->vm_private_data]

    Acked-by: Adam Litke
    Acked-by: Nishanth Aravamudan
    Signed-off-by: Andi Kleen
    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • Add basic support for more than one hstate in hugetlbfs. This is the key
    to supporting multiple hugetlbfs page sizes at once.

    - Rather than a single hstate, we now have an array, with an iterator
    - default_hstate continues to be the struct hstate which we use by default
    - Add functions for architectures to register new hstates

    [akpm@linux-foundation.org: coding-style fixes]
    Acked-by: Adam Litke
    Acked-by: Nishanth Aravamudan
    Signed-off-by: Andi Kleen
    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • The goal of this patchset is to support multiple hugetlb page sizes. This
    is achieved by introducing a new struct hstate structure, which
    encapsulates the important hugetlb state and constants (eg. huge page
    size, number of huge pages currently allocated, etc).

    The hstate structure is then passed around to the code which requires
    these fields; callers will do the right thing regardless of the exact
    hstate they are operating on.

    This patch adds the hstate structure, with a single global instance of it
    (default_hstate), and does the basic work of converting hugetlb to use the
    hstate.
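
    A trimmed sketch of the new structure (a few representative fields only,
    not the full definition):

    struct hstate {
        unsigned int order;                 /* size = PAGE_SIZE << order */
        unsigned long nr_huge_pages;        /* pages currently in the pool */
        unsigned long free_huge_pages;
        unsigned long resv_huge_pages;
        unsigned long surplus_huge_pages;
        struct list_head hugepage_freelists[MAX_NUMNODES];
    };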

    Future patches will add more hstate structures to allow for different
    hugetlbfs mounts to have different page sizes.

    [akpm@linux-foundation.org: coding-style fixes]
    Acked-by: Adam Litke
    Acked-by: Nishanth Aravamudan
    Signed-off-by: Andi Kleen
    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • …n hugetlbfs will succeed

    After patch 2 in this series, a process that successfully calls mmap() for
    a MAP_PRIVATE mapping will be guaranteed to successfully fault until a
    process calls fork(). At that point, the next write fault from the parent
    could fail due to COW if the child still has a reference.

    We only reserve pages for the parent but a copy must be made to avoid
    leaking data from the parent to the child after fork(). Reserves could be
    taken for both parent and child at fork time to guarantee faults but if
    the mapping is large it is highly likely we will not have sufficient pages
    for the reservation, and it is common to fork only to exec() immediately
    after. A failure here would be very undesirable.

    Note that the current behaviour of mainline with MAP_PRIVATE pages is
    pretty bad. The following situation is allowed to occur today.

    1. Process calls mmap(MAP_PRIVATE)
    2. Process calls mlock() to fault all pages and makes sure it succeeds
    3. Process forks()
    4. Process writes to MAP_PRIVATE mapping while child still exists
    5. If the COW fails at this point, the process gets SIGKILLed even though it
    had taken care to ensure the pages existed

    This patch improves the situation by guaranteeing the reliability of the
    process that successfully calls mmap(). When the parent performs COW, it
    will try to satisfy the allocation without using reserves. If that fails
    the parent will steal the page leaving any children without a page.
    Faults from the child after that point will result in failure. If the
    child COW happens first, an attempt will be made to allocate the page
    without reserves and the child will get SIGKILLed on failure.

    To summarise the new behaviour:

    1. If the original mapper performs COW on a private mapping with multiple
    references, it will attempt to allocate a hugepage from the pool or
    the buddy allocator without using the existing reserves. On fail, VMAs
    mapping the same area are traversed and the page being COW'd is unmapped
    where found. It will then steal the original page as the last mapper in
    the normal way.

    2. The VMAs the pages were unmapped from are flagged to note that pages
    with data no longer exist. Future no-page faults on those VMAs will
    terminate the process as otherwise it would appear that data was corrupted.
    A warning is printed to the console that this situation occurred.

    3. If the child performs COW first, it will attempt to satisfy the COW
    from the pool if there are enough pages or via the buddy allocator if
    overcommit is allowed and the buddy allocator can satisfy the request. If
    it fails, the child will be killed.

    If the pool is large enough, existing applications will not notice that
    the reserves were a factor. Existing applications depending on no
    reserves being set are unlikely to exist, as for much of the history of
    hugetlbfs, pages were prefaulted at mmap() time, allocating the pages at
    that point or failing the mmap().

    [npiggin@suse.de: fix CONFIG_HUGETLB=n build]
    Signed-off-by: Mel Gorman <mel@csn.ul.ie>
    Acked-by: Adam Litke <agl@us.ibm.com>
    Cc: Andy Whitcroft <apw@shadowen.org>
    Cc: William Lee Irwin III <wli@holomorphy.com>
    Cc: Hugh Dickins <hugh@veritas.com>
    Cc: Nick Piggin <npiggin@suse.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Mel Gorman
     
  • This patch reserves huge pages at mmap() time for MAP_PRIVATE mappings in
    a similar manner to the reservations taken for MAP_SHARED mappings. The
    reserve count is accounted both globally and on a per-VMA basis for
    private mappings. This guarantees that a process that successfully calls
    mmap() will successfully fault all pages in the future unless fork() is
    called.

    The characteristics of private mappings of hugetlbfs files after this
    patch are:

    1. The process calling mmap() is guaranteed that all future faults will
    succeed until it forks().
    2. On fork(), the parent may die due to SIGKILL on writes to the private
    mapping if enough pages are not available for the COW. For reasonably
    reliable behaviour in the face of a small huge page pool, children of
    hugepage-aware processes should not reference the mappings; such as
    might occur when fork()ing to exec().
    3. On fork(), the child VMAs inherit no reserves. Reads on pages already
    faulted by the parent will succeed. Successful writes will depend on enough
    huge pages being free in the pool.
    4. Quotas of the hugetlbfs mount are checked at reserve time for the mapper
    and at fault time otherwise.

    Before this patch, any read or write in the child potentially needed page
    allocations that could later lead to the death of the parent. This applies
    to reads and writes of uninstantiated pages as well as COW. After the
    patch it is only a write to an instantiated page that causes problems.

    Signed-off-by: Mel Gorman
    Acked-by: Adam Litke
    Cc: Andy Whitcroft
    Cc: William Lee Irwin III
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

28 Apr, 2008

1 commit

  • This patch moves all architecture functions for hugetlb to architecture header
    files (include/asm-foo/hugetlb.h) and converts all macros to inline functions.
    It also removes (!) ARCH_HAS_HUGEPAGE_ONLY_RANGE,
    ARCH_HAS_HUGETLB_FREE_PGD_RANGE, ARCH_HAS_PREPARE_HUGEPAGE_RANGE,
    ARCH_HAS_SETCLEAR_HUGE_PTE and ARCH_HAS_HUGETLB_PREFAULT_HOOK.

    Getting rid of the ARCH_HAS_xxx #ifdef and macro fugliness should increase
    readability and maintainability, at the price of some code duplication. An
    asm-generic common part would have reduced the loc, but we would end up with
    new ARCH_HAS_xxx defines eventually.

    Acked-by: Martin Schwidefsky
    Signed-off-by: Gerald Schaefer
    Cc: Paul Mundt
    Cc: "Luck, Tony"
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gerald Schaefer
     

14 Feb, 2008

1 commit

  • proc_doulongvec_minmax() calls copy_to_user()/copy_from_user(), so we can't
    hold hugetlb_lock over the call. Use a dummy variable to store the sysctl
    result, like in hugetlb_sysctl_handler(), then grab the lock to update
    nr_overcommit_huge_pages.
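
    A sketch of the resulting pattern (handler signature abridged; the point
    is copying through a local variable and only taking hugetlb_lock around
    the final update):

    int hugetlb_overcommit_handler(struct ctl_table *table, int write,
                                   void __user *buffer, size_t *length,
                                   loff_t *ppos)
    {
        unsigned long tmp = nr_overcommit_huge_pages;

        table->data = &tmp;             /* sysctl copies via tmp, unlocked */
        table->maxlen = sizeof(tmp);
        proc_doulongvec_minmax(table, write, buffer, length, ppos);

        if (write) {
            spin_lock(&hugetlb_lock);   /* counter updates need the lock */
            nr_overcommit_huge_pages = tmp;
            spin_unlock(&hugetlb_lock);
        }
        return 0;
    }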

    Signed-off-by: Nishanth Aravamudan
    Reported-by: Miles Lane
    Cc: Adam Litke
    Cc: David Gibson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nishanth Aravamudan
     

09 Feb, 2008

1 commit

  • When I replaced hugetlb_dynamic_pool with nr_overcommit_hugepages I used
    proc_doulongvec_minmax() directly. However, hugetlb.c's locking rules
    require that all counter modifications occur under the hugetlb_lock. Add a
    callback into the hugetlb code similar to the one for nr_hugepages. Grab
    the lock around the manipulation of nr_overcommit_hugepages in
    proc_doulongvec_minmax().

    Signed-off-by: Nishanth Aravamudan
    Acked-by: Adam Litke
    Cc: David Gibson
    Cc: William Lee Irwin III
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nishanth Aravamudan
     

18 Dec, 2007

2 commits

  • This reverts commit 54f9f80d6543fb7b157d3b11e2e7911dc1379790 ("hugetlb:
    Add hugetlb_dynamic_pool sysctl")

    Given the new sysctl nr_overcommit_hugepages, the boolean dynamic pool
    sysctl is not needed, as its semantics can be expressed by 0 in the
    overcommit sysctl (no dynamic pool) and non-0 in the overcommit sysctl
    (pool enabled).

    (Needed in 2.6.24 since it reverts a post-2.6.23 userspace-visible change)

    Signed-off-by: Nishanth Aravamudan
    Acked-by: Adam Litke
    Cc: William Lee Irwin III
    Cc: Dave Hansen
    Cc: David Gibson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nishanth Aravamudan
     
  • hugetlb: introduce nr_overcommit_hugepages sysctl

    While examining the code to support /proc/sys/vm/hugetlb_dynamic_pool, I
    became convinced that having a boolean sysctl was insufficient:

    1) To support per-node control of hugepages, I have previously submitted
    patches to add a sysfs attribute related to nr_hugepages. However, with
    a boolean global value and per-mount quota enforcement constraining the
    dynamic pool, adding corresponding control of the dynamic pool on a
    per-node basis seems inconsistent to me.

    2) Administration of the hugetlb dynamic pool with multiple hugetlbfs
    mount points is, arguably, more arduous than it needs to be. Each quota
    would need to be set separately, and the sum would need to be monitored.

    To ease the administration, and to help make the way for per-node
    control of the static & dynamic hugepage pool, I added a separate
    sysctl, nr_overcommit_hugepages. This value serves as a high watermark
    for the overall hugepage pool, while nr_hugepages serves as a low
    watermark. The boolean sysctl can then be removed, as the condition

    nr_overcommit_hugepages > 0

    indicates the same administrative setting as

    hugetlb_dynamic_pool == 1

    Quotas still serve as local enforcement of the size of the pool on a
    per-mount basis.

    A few caveats:

    1) There is a race whereby the global surplus huge page counter is
    incremented before a hugepage has been allocated. Another process could
    then try to grow the pool, fail to convert a surplus huge page to a
    normal huge page, and instead allocate a fresh huge page. I believe this
    is benign, as no memory is leaked (the actual pages are still tracked
    correctly) and the counters won't go out of sync.

    2) Shrinking the static pool while a surplus is in effect will allow the
    number of surplus huge pages to exceed the overcommit value. As long as
    this condition holds, however, no more surplus huge pages will be
    allowed on the system until one of the two sysctls is increased
    sufficiently, or the surplus huge pages go out of use and are freed.

    Successfully tested on x86_64 with the current libhugetlbfs snapshot,
    modified to use the new sysctl.

    Signed-off-by: Nishanth Aravamudan
    Acked-by: Adam Litke
    Cc: William Lee Irwin III
    Cc: Dave Hansen
    Cc: David Gibson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nishanth Aravamudan
     

15 Nov, 2007

3 commits

  • For administrative purposes, we want to query the actual block usage of a
    hugetlbfs file via fstat. Currently, hugetlbfs always returns 0. Fix that
    up, since the kernel already has all the information to track it properly.

    Signed-off-by: Ken Chen
    Acked-by: Adam Litke
    Cc: Badari Pulavarty
    Cc: David Gibson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ken Chen
     
  • Add a second parameter 'delta' to hugetlb_get_quota and hugetlb_put_quota to
    allow bulk updating of the sbinfo->free_blocks counter. This will be used by
    the next patch in the series.

    Signed-off-by: Adam Litke
    Cc: Ken Chen
    Cc: Andy Whitcroft
    Cc: Dave Hansen
    Cc: David Gibson
    Cc: William Lee Irwin III
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adam Litke
     
  • When calling get_user_pages(), a write flag is passed in by the caller to
    indicate if write access is required on the faulted-in pages. Currently,
    follow_hugetlb_page() ignores this flag and always faults pages for
    read-only access. This can cause data corruption because a device driver
    that calls get_user_pages() with write set will not expect COW faults to
    occur on the returned pages.

    This patch passes the write flag down to follow_hugetlb_page() and makes
    sure hugetlb_fault() is called with the right write_access parameter.

    [ezk@cs.sunysb.edu: build fix]
    Signed-off-by: Adam Litke
    Reviewed-by: Ken Chen
    Cc: David Gibson
    Cc: William Lee Irwin III
    Cc: Badari Pulavarty
    Signed-off-by: Erez Zadok
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adam Litke
     

17 Oct, 2007

1 commit

  • The maximum size of the huge page pool can be controlled using the overall
    size of the hugetlb filesystem (via its 'size' mount option). However, in
    the common case this will not be set, as the pool is traditionally fixed
    in size at boot time. In order to maintain the expected semantics, we need
    to prevent the pool from expanding by default.

    This patch introduces a new sysctl controlling dynamic pool resizing. When
    this is enabled the pool will expand beyond its base size up to the size of
    the hugetlb filesystem. It is disabled by default.

    Signed-off-by: Adam Litke
    Acked-by: Andy Whitcroft
    Acked-by: Dave McCracken
    Cc: William Irwin
    Cc: David Gibson
    Cc: Ken Chen
    Cc: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adam Litke
     

31 Aug, 2007

1 commit

  • For hugepage mappings, the file offset, like the address and size, needs to
    be aligned to the size of a hugepage.

    In commit 68589bc353037f233fe510ad9ff432338c95db66, the check for this was
    moved into prepare_hugepage_range() along with the address and size checks.
    But since BenH's rework of the get_unmapped_area() paths leading up to
    commit 4b1d89290b62bb2db476c94c82cf7442aab440c8, prepare_hugepage_range()
    is only called for MAP_FIXED mappings, not for other mappings. This means
    we're no longer ever checking for an aligned offset - I've confirmed that
    mmap() will (apparently) succeed with a misaligned offset on both powerpc
    and i386 at least.

    This patch restores the check, removing it from prepare_hugepage_range()
    and putting it back into hugetlbfs_file_mmap(). I'm putting it there,
    rather than in the get_unmapped_area() path, so it only needs to go in
    one place, rather than separately in the half-dozen or so arch-specific
    implementations of hugetlb_get_unmapped_area().
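
    The restored check itself is short. A sketch of the idea (vm_pgoff is in
    PAGE_SIZE units, so an aligned offset must be a multiple of the
    pages-per-hugepage ratio):

    /* in hugetlbfs_file_mmap(): reject offsets not on a hugepage boundary */
    if (vma->vm_pgoff & ((HPAGE_SIZE >> PAGE_SHIFT) - 1))
        return -EINVAL;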

    Signed-off-by: David Gibson
    Cc: Adam Litke
    Cc: Andi Kleen
    Cc: "David S. Miller"
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Gibson
     

30 Jul, 2007

1 commit

  • Remove fs.h from mm.h. For this,
    1) Uninline vma_wants_writenotify(). It's pretty huge anyway.
    2) Add back fs.h or less bloated headers (err.h) to files that need it.

    As a result, on x86_64 allyesconfig, the number of files rebuilt due to
    fs.h dependencies is cut from 3929 down to 3444 (-12.3%).

    Cross-compile tested without regressions on my two usual configs and (sigh):

    alpha arm-mx1ads mips-bigsur powerpc-ebony
    alpha-allnoconfig arm-neponset mips-capcella powerpc-g5
    alpha-defconfig arm-netwinder mips-cobalt powerpc-holly
    alpha-up arm-netx mips-db1000 powerpc-iseries
    arm arm-ns9xxx mips-db1100 powerpc-linkstation
    arm-assabet arm-omap_h2_1610 mips-db1200 powerpc-lite5200
    arm-at91rm9200dk arm-onearm mips-db1500 powerpc-maple
    arm-at91rm9200ek arm-picotux200 mips-db1550 powerpc-mpc7448_hpc2
    arm-at91sam9260ek arm-pleb mips-ddb5477 powerpc-mpc8272_ads
    arm-at91sam9261ek arm-pnx4008 mips-decstation powerpc-mpc8313_rdb
    arm-at91sam9263ek arm-pxa255-idp mips-e55 powerpc-mpc832x_mds
    arm-at91sam9rlek arm-realview mips-emma2rh powerpc-mpc832x_rdb
    arm-ateb9200 arm-realview-smp mips-excite powerpc-mpc834x_itx
    arm-badge4 arm-rpc mips-fulong powerpc-mpc834x_itxgp
    arm-carmeva arm-s3c2410 mips-ip22 powerpc-mpc834x_mds
    arm-cerfcube arm-shannon mips-ip27 powerpc-mpc836x_mds
    arm-clps7500 arm-shark mips-ip32 powerpc-mpc8540_ads
    arm-collie arm-simpad mips-jazz powerpc-mpc8544_ds
    arm-corgi arm-spitz mips-jmr3927 powerpc-mpc8560_ads
    arm-csb337 arm-trizeps4 mips-malta powerpc-mpc8568mds
    arm-csb637 arm-versatile mips-mipssim powerpc-mpc85xx_cds
    arm-ebsa110 i386 mips-mpc30x powerpc-mpc8641_hpcn
    arm-edb7211 i386-allnoconfig mips-msp71xx powerpc-mpc866_ads
    arm-em_x270 i386-defconfig mips-ocelot powerpc-mpc885_ads
    arm-ep93xx i386-up mips-pb1100 powerpc-pasemi
    arm-footbridge ia64 mips-pb1500 powerpc-pmac32
    arm-fortunet ia64-allnoconfig mips-pb1550 powerpc-ppc64
    arm-h3600 ia64-bigsur mips-pnx8550-jbs powerpc-prpmc2800
    arm-h7201 ia64-defconfig mips-pnx8550-stb810 powerpc-ps3
    arm-h7202 ia64-gensparse mips-qemu powerpc-pseries
    arm-hackkit ia64-sim mips-rbhma4200 powerpc-up
    arm-integrator ia64-sn2 mips-rbhma4500 s390
    arm-iop13xx ia64-tiger mips-rm200 s390-allnoconfig
    arm-iop32x ia64-up mips-sb1250-swarm s390-defconfig
    arm-iop33x ia64-zx1 mips-sead s390-up
    arm-ixp2000 m68k mips-tb0219 sparc
    arm-ixp23xx m68k-amiga mips-tb0226 sparc-allnoconfig
    arm-ixp4xx m68k-apollo mips-tb0287 sparc-defconfig
    arm-jornada720 m68k-atari mips-workpad sparc-up
    arm-kafa m68k-bvme6000 mips-wrppmc sparc64
    arm-kb9202 m68k-hp300 mips-yosemite sparc64-allnoconfig
    arm-ks8695 m68k-mac parisc sparc64-defconfig
    arm-lart m68k-mvme147 parisc-allnoconfig sparc64-up
    arm-lpd270 m68k-mvme16x parisc-defconfig um-x86_64
    arm-lpd7a400 m68k-q40 parisc-up x86_64
    arm-lpd7a404 m68k-sun3 powerpc x86_64-allnoconfig
    arm-lubbock m68k-sun3x powerpc-cell x86_64-defconfig
    arm-lusl7200 mips powerpc-celleb x86_64-up
    arm-mainstone mips-atlas powerpc-chrp32

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

18 Jul, 2007

1 commit

  • Huge pages are not movable so are not allocated from ZONE_MOVABLE. However,
    as ZONE_MOVABLE will always have pages that can be migrated or reclaimed, it
    can be used to satisfy hugepage allocations even when the system has been
    running a long time. This allows an administrator to resize the hugepage pool
    at runtime depending on the size of ZONE_MOVABLE.

    This patch adds a new sysctl called hugepages_treat_as_movable. When a
    non-zero value is written to it, future allocations for the huge page pool
    will use ZONE_MOVABLE. Despite huge pages being non-movable, we do not
    introduce additional external fragmentation of note as huge pages are always
    the largest contiguous block we care about.
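
    A sketch of the mechanism (handler signature abridged): the sysctl
    handler simply flips the allocation mask used for hugepage pool
    allocations:

    int hugepages_treat_as_movable;
    static gfp_t htlb_alloc_mask = GFP_HIGHUSER;

    int hugetlb_treat_movable_handler(struct ctl_table *table, int write,
                                      void __user *buffer, size_t *length,
                                      loff_t *ppos)
    {
        proc_dointvec(table, write, buffer, length, ppos);
        htlb_alloc_mask = hugepages_treat_as_movable ?
                          GFP_HIGHUSER_MOVABLE : GFP_HIGHUSER;
        return 0;
    }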

    [akpm@linux-foundation.org: various fixes]
    Signed-off-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman