21 Sep, 2010

1 commit


23 Apr, 2010

1 commit


30 Mar, 2009

1 commit


27 Jul, 2008

1 commit


28 Apr, 2008

7 commits

  • Now that we're using "preferred local" policy for system default, we need to
    make this as fast as possible. Because of the variable size of the mempolicy
    structure [based on size of nodemasks], the preferred_node may be in a
    different cacheline from the mode. This can result in accessing an extra
    cacheline in the normal case of system default policy. Suspect this is the
    cause of an observed 2-3% slowdown in page fault testing relative to a kernel
    without this patch series.

    To alleviate this, use an internal mode flag, MPOL_F_LOCAL in the mempolicy
    flags member which is guaranteed [?] to be in the same cacheline as the mode
    itself.

    Verified that reworked mempolicy now performs slightly better on 25-rc8-mm1
    for both anon and shmem segments with system default and vma [preferred local]
    policy.
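
    As a minimal standalone sketch of the idea (simplified types; the field
    layout and the MPOL_F_LOCAL value here are illustrative, not the kernel's
    exact definitions):

        #include <stdio.h>

        #define MPOL_PREFERRED  1
        #define MPOL_F_LOCAL    (1 << 1)        /* illustrative value */

        struct mempolicy {
                unsigned short mode;    /* same cacheline as ... */
                unsigned short flags;   /* ... these flags */
                /* variable-size nodemask storage follows; preferred_node
                 * may land in a different cacheline */
                int preferred_node;
        };

        /* Fast path: decide "local" from mode and flags alone, without
         * dereferencing the possibly-distant preferred_node field. */
        static int policy_node(const struct mempolicy *pol, int local_nid)
        {
                if (pol->mode == MPOL_PREFERRED &&
                    !(pol->flags & MPOL_F_LOCAL))
                        return pol->preferred_node;
                return local_nid;
        }

        int main(void)
        {
                struct mempolicy sysdefault = {
                        .mode = MPOL_PREFERRED,
                        .flags = MPOL_F_LOCAL,
                        .preferred_node = -1,
                };
                printf("allocate on node %d\n", policy_node(&sysdefault, 0));
                return 0;
        }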

    Signed-off-by: Lee Schermerhorn
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Mel Gorman
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     
  • Currently, when one specifies MPOL_DEFAULT via a NUMA memory policy API
    [set_mempolicy(), mbind() and internal versions], the kernel simply installs a
    NULL struct mempolicy pointer in the appropriate context: task policy, vma
    policy, or shared policy. This causes any use of that policy to "fall back"
    to the next most specific policy scope.

    The only use of MPOL_DEFAULT to mean "local allocation" is in the system
    default policy. This requires extra checks/cases for MPOL_DEFAULT in many
    mempolicy.c functions.

    There is another, "preferred" way to specify local allocation via the APIs.
    That is using the MPOL_PREFERRED policy mode with an empty nodemask.
    Internally, the empty nodemask gets converted to a preferred_node id of '-1'.
    All internal usage of MPOL_PREFERRED will convert the '-1' to the id of the
    node local to the cpu where the allocation occurs.

    System default policy, except during boot, is hard-coded to "local
    allocation". By using the MPOL_PREFERRED mode with a negative value of
    preferred node for system default policy, MPOL_DEFAULT will never occur in the
    'policy' member of a struct mempolicy. Thus, we can remove all checks for
    MPOL_DEFAULT when converting policy to a node id/zonelist in the allocation
    paths.

    In slab_node() return local node id when policy pointer is NULL. No need to
    set a pol value to take the switch default. Replace switch default with
    BUG()--i.e., shouldn't happen.

    With this patch MPOL_DEFAULT is only used in the APIs, including internal
    calls to do_set_mempolicy() and in the display of policy in
    /proc/<pid>/numa_maps. It always means "fall back" to the next most
    specific policy scope. This simplifies the description of memory policies
    quite a bit, with no visible change in behavior.

    get_mempolicy() continues to return MPOL_DEFAULT and an empty nodemask when
    the requested policy [task or vma/shared] is NULL. These are the values one
    would supply via set_mempolicy() or mbind() to achieve that condition--default
    behavior.

    This patch updates Documentation to reflect this change.
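
    From userspace, the "preferred" way to request local allocation looks
    like this (a sketch using the libnuma syscall wrappers from numaif.h;
    link with -lnuma):

        #include <stdio.h>
        #include <numaif.h>     /* set_mempolicy(), MPOL_PREFERRED */

        int main(void)
        {
                /* MPOL_PREFERRED with an empty (here, NULL) nodemask means
                 * "allocate on the node of the cpu that touches the memory";
                 * internally, preferred_node becomes -1. */
                if (set_mempolicy(MPOL_PREFERRED, NULL, 0) != 0) {
                        perror("set_mempolicy");
                        return 1;
                }
                return 0;
        }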

    Signed-off-by: Lee Schermerhorn
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Mel Gorman
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     
  • After further discussion with Christoph Lameter, it has become clear that my
    earlier attempts to clean up the mempolicy reference counting were a bit of
    overkill in some areas, resulting in superfluous ref/unref in what are usually
    fast paths. In other areas, further inspection reveals that I botched the
    unref for interleave policies.

    A separate patch, suitable for upstream/stable trees, fixes up the known
    errors in the previous attempt to fix reference counting.

    This patch reworks the memory policy reference counting and, one hopes,
    simplifies the code. Maybe I'll get it right this time.

    See the update to the numa_memory_policy.txt document for a discussion of
    memory policy reference counting that motivates this patch.

    Summary:

    Lookup of a mempolicy based on (vma, address) need only add a reference for a
    shared policy, and we need only unref the policy when finished with it for
    shared policies. So, this patch backs out all of the unneeded extra reference
    counting added by my previous attempt. It then unrefs only shared policies
    when we're finished with them, using the mpol_cond_put() [conditional put]
    helper function introduced by this patch.

    Note that shmem_swapin() calls read_swap_cache_async() with a dummy vma
    containing just the policy. read_swap_cache_async() can call alloc_page_vma()
    multiple times, so we can't let alloc_page_vma() unref the shared policy in
    this case. To avoid this, we make a copy of any non-null shared policy and
    remove the MPOL_F_SHARED flag from the copy. This copy occurs before reading
    a page [or multiple pages] from swap, so the overhead should not be an issue
    here.

    I introduced a new static inline function "mpol_cond_copy()" to copy the
    shared policy to an on-stack policy and remove the flags that would require a
    conditional free. The current implementation of mpol_cond_copy() assumes that
    the struct mempolicy contains no pointers to dynamically allocated structures
    that must be duplicated or reference counted during copy.
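
    A standalone sketch of the conditional-put idea (a userspace mock-up with
    simplified, non-atomic refcounting; the MPOL_F_SHARED value is
    illustrative):

        #include <stdlib.h>

        #define MPOL_F_SHARED   (1 << 0)        /* illustrative value */

        struct mempolicy {
                int refcnt;
                unsigned short mode;
                unsigned short flags;
        };

        static void __mpol_put(struct mempolicy *pol)
        {
                if (--pol->refcnt == 0)
                        free(pol);
        }

        /* Drop a reference only for policies looked up from a shared
         * policy tree; task and vma policies are not ref'd on the
         * (vma, address) lookup fast path, so there is nothing to put. */
        static void mpol_cond_put(struct mempolicy *pol)
        {
                if (pol && (pol->flags & MPOL_F_SHARED))
                        __mpol_put(pol);
        }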

    Signed-off-by: Lee Schermerhorn
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Mel Gorman
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     
  • The terms 'policy' and 'mode' are both used in various places to describe the
    semantics of the value stored in the 'policy' member of struct mempolicy.
    Furthermore, the term 'policy' is used to refer to that member, to the entire
    struct mempolicy and to the more abstract concept of the tuple consisting of a
    "mode" and an optional node or set of nodes. Recently, we have added "mode
    flags" that are passed in the upper bits of the 'mode' [or sometimes,
    'policy'] member of the numa APIs.

    I'd like to resolve this confusion, which perhaps only exists in my mind, by
    renaming the 'policy' member to 'mode' throughout, and fixing up the
    Documentation. Man pages will be updated separately.
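
    For reference, the optional mode flags are OR'd into the upper bits of
    the mode argument at the API boundary, e.g. (a userspace sketch, assuming
    a numaif.h new enough to define MPOL_F_STATIC_NODES; link with -lnuma):

        #include <stdio.h>
        #include <numaif.h>

        int main(void)
        {
                unsigned long nodes = 0x3;      /* nodes 0 and 1 */

                /* the "mode" argument is really mode | optional mode flags */
                if (set_mempolicy(MPOL_BIND | MPOL_F_STATIC_NODES,
                                  &nodes, 8 * sizeof(nodes)) != 0) {
                        perror("set_mempolicy");
                        return 1;
                }
                return 0;
        }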

    Signed-off-by: Lee Schermerhorn
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Mel Gorman
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     
  • MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES don't mean anything for
    MPOL_PREFERRED policies that were created with an empty nodemask (for purely
    local allocations). They'll never be invalidated when the allowed mems of a
    task change, nor do they need to be rebound relative to a cpuset's placement.

    Also fixes a bug identified by Lee Schermerhorn that disallowed empty
    nodemasks to be passed to MPOL_PREFERRED to specify local allocations. [A
    different, somewhat incomplete, patch already existed in 25-rc5-mm1.]
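
    A hedged sketch of what "never invalidated, never rebound" implies in the
    cpuset rebind path (the function name, and representing "local" as a
    negative preferred node, are assumptions for illustration, not the
    patch's actual code):

        typedef unsigned long nodemask_t;       /* simplified */

        struct mempolicy {
                unsigned short mode;
                unsigned short flags;
                int preferred_node;             /* < 0 => local allocation */
        };

        /* A policy created as MPOL_PREFERRED with an empty nodemask is
         * purely local: there is no node to remap, so the static and
         * relative rebind semantics have nothing to operate on. */
        static void mpol_rebind_preferred(struct mempolicy *pol,
                                          const nodemask_t *newmask)
        {
                if (pol->preferred_node < 0)
                        return;                 /* nothing to rebind */
                /* ... otherwise remap preferred_node into *newmask ... */
                (void)newmask;
        }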

    Cc: Paul Jackson
    Cc: Christoph Lameter
    Cc: Lee Schermerhorn
    Cc: Andi Kleen
    Cc: Randy Dunlap
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Updates Documentation/vm/numa_memory_policy.txt and
    Documentation/filesystems/tmpfs.txt to describe optional mempolicy mode flags.

    Cc: Christoph Lameter
    Cc: Lee Schermerhorn
    Cc: Andi Kleen
    Cc: Randy Dunlap
    Signed-off-by: David Rientjes
    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • The MPOL_BIND policy creates a zonelist that is used for allocations
    controlled by that mempolicy. As the per-node zonelist is already being
    filtered based on a zone id, this patch adds a version of __alloc_pages() that
    takes a nodemask for further filtering. This eliminates the need for
    MPOL_BIND to create a custom zonelist.

    A positive benefit of this is that allocations using MPOL_BIND now use the
    local node's distance-ordered zonelist instead of a custom node-id-ordered
    zonelist. I.e., pages will be allocated from the closest allowed node with
    available memory.
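
    A standalone sketch of the filtering idea (structures heavily simplified;
    the real series adds __alloc_pages_nodemask() and walks the zonelist with
    an iterator that applies the nodemask):

        #include <stdbool.h>
        #include <stddef.h>

        typedef unsigned long nodemask_t;       /* simplified: <= 64 nodes */

        struct zone { int node; /* ... */ };

        struct zonelist {                       /* distance-ordered */
                struct zone *zones[8];
                int nr;
        };

        static bool node_allowed(const nodemask_t *nm, int node)
        {
                return nm == NULL || (*nm & (1UL << node));
        }

        /* Walk the local node's distance-ordered zonelist, skipping zones
         * on nodes outside the policy's nodemask: MPOL_BIND no longer
         * needs a custom, node-id-ordered zonelist of its own. */
        static struct zone *first_allowed_zone(struct zonelist *zl,
                                               const nodemask_t *nm)
        {
                for (int i = 0; i < zl->nr; i++)
                        if (node_allowed(nm, zl->zones[i]->node))
                                return zl->zones[i];
                return NULL;
        }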

    [Lee.Schermerhorn@hp.com: Mempolicy: update stale documentation and comments]
    [Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask]
    [Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask rework]
    Signed-off-by: Mel Gorman
    Acked-by: Christoph Lameter
    Signed-off-by: Lee Schermerhorn
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

17 Oct, 2007

1 commit

  • Allow an application to query the memories allowed by its context.

    Updated numa_memory_policy.txt to mention that applications can use this to
    obtain allowed memories for constructing valid policies.

    TODO: update out-of-tree libnuma wrapper[s], or maybe add a new
    wrapper--e.g., numa_get_mems_allowed()?

    Also, update numa syscall man pages.

    Tested with memtoy V>=0.13.
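
    From userspace the query looks like this (a sketch; MPOL_F_MEMS_ALLOWED
    is the get_mempolicy() flag this work adds; link with -lnuma):

        #include <stdio.h>
        #include <numaif.h>

        int main(void)
        {
                unsigned long mask[4] = { 0 };  /* room for 256 nodes */

                /* With MPOL_F_MEMS_ALLOWED, the mode and addr arguments
                 * are unused; nodemask returns the allowed memories. */
                if (get_mempolicy(NULL, mask, 8 * sizeof(mask),
                                  NULL, MPOL_F_MEMS_ALLOWED) != 0) {
                        perror("get_mempolicy");
                        return 1;
                }
                printf("mems allowed (first word): %#lx\n", mask[0]);
                return 0;
        }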

    Signed-off-by: Lee Schermerhorn
    Acked-by: Christoph Lameter
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     

23 Aug, 2007

1 commit

  • I couldn't find any memory policy documentation in the Documentation
    directory, so here is my attempt to document it.

    There's lots more that could be written about the internal design--including
    data structures, functions, etc. However, if you agree that this is better
    than the nothing that exists now, perhaps it could be merged. This will
    provide a baseline for updates to document the many policy patches that are
    currently being worked.

    Signed-off-by: Lee Schermerhorn
    Cc: Christoph Lameter
    Cc: Andi Kleen
    Cc: Michael Kerrisk
    Acked-by: Rob Landley
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn