17 Oct, 2008

2 commits

  • Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • name and nlen parameters passed to ->strategy hook are unused, remove
    them. In general ->strategy hook should know what it's doing, and don't
    do something tricky for which, say, pointer to original userspace array
    may be needed (name).

    Signed-off-by: Alexey Dobriyan
    Acked-by: David S. Miller [ networking bits ]
    Cc: Ralf Baechle
    Cc: David Howells
    Cc: Matt Mackall
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

27 Jul, 2008

2 commits

  • Incidentally, the name that gives hundreds of false positives on grep
    is not a good idea...

    Signed-off-by: Al Viro

    Al Viro
     
  • Kmem cache passed to constructor is only needed for constructors that are
    themselves multiplexeres. Nobody uses this "feature", nor does anybody uses
    passed kmem cache in non-trivial way, so pass only pointer to object.

    Non-trivial places are:
    arch/powerpc/mm/init_64.c
    arch/powerpc/mm/hugetlbpage.c

    This is flag day, yes.

    Signed-off-by: Alexey Dobriyan
    Acked-by: Pekka Enberg
    Acked-by: Christoph Lameter
    Cc: Jon Tollefson
    Cc: Nick Piggin
    Cc: Matt Mackall
    [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
    [akpm@linux-foundation.org: fix mm/slab.c]
    [akpm@linux-foundation.org: fix ubifs]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

26 Jul, 2008

8 commits

  • This patch proposes an alternative to the "magical
    positive-versus-negative number trick" Andrew complained about last week
    in http://lkml.org/lkml/2008/6/24/418.

    This had been introduced with the patches that scale msgmni to the amount
    of lowmem. With these patches, msgmni has a registered notification
    routine that recomputes msgmni value upon memory add/remove or ipc
    namespace creation/ removal.

    When msgmni is changed from user space (i.e. value written to the proc
    file), that notification routine is unregistered, and the way to make it
    registered back is to write a negative value into the proc file. This is
    the "magical positive-versus-negative number trick".

    To fix this, a new proc file is introduced: /proc/sys/kernel/auto_msgmni.
    This file acts as ON/OFF for msgmni automatic recomputing.

    With this patch, the process is the following:
    1) kernel boots in "automatic recomputing mode"
    /proc/sys/kernel/msgmni contains the value that has been computed (depends
    on lowmem)
    /proc/sys/kernel/automatic_msgmni contains "1"

    2) echo > /proc/sys/kernel/msgmni
    . sets msg_ctlmni to
    . de-activates automatic recomputing (i.e. if, say, some memory is added
    msgmni won't be recomputed anymore)
    . /proc/sys/kernel/automatic_msgmni now contains "0"

    3) echo "0" > /proc/sys/kernel/automatic_msgmni
    . de-activates msgmni automatic recomputing
    this has the same effect as 2) except that msg_ctlmni's value stays
    blocked at its current value)

    3) echo "1" > /proc/sys/kernel/automatic_msgmni
    . recomputes msgmni's value based on the current available memory size
    and number of ipc namespaces
    . re-activates automatic recomputing for msgmni.

    Signed-off-by: Nadia Derbey
    Cc: Solofo Ramangalahy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Also this patch kills unneccesary trailing NULL character.

    Signed-off-by: Akinobu Mita
    Cc: Nadia Derbey
    Cc: Manfred Spraul
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • The attached patch:
    - reverses the locking order of ulp->lock and sem_lock:
    Previously, it was first ulp->lock, then inside sem_lock.
    Now it's the other way around.
    - converts the undo structure to rcu.

    Benefits:
    - With the old locking order, IPC_RMID could not kfree the undo structures.
    The stale entries remained in the linked lists and were released later.
    - The patch fixes a a race in semtimedop(): if both IPC_RMID and a semget() that
    recreates exactly the same id happen between find_alloc_undo() and sem_lock,
    then semtimedop() would access already kfree'd memory.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Manfred Spraul
    Reviewed-by: Nadia Derbey
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • sem_array.sem_pending is a double linked list, the attached patch converts
    it to struct list_head.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Manfred Spraul
    Reviewed-by: Nadia Derbey
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • sem_queue.sma and sem_queue.id were never used, the attached patch removes
    them.

    Signed-off-by: Manfred Spraul
    Reviewed-by: Nadia Derbey
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • The undo structures contain two linked lists, the attached patch replaces
    them with generic struct list_head lists.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Manfred Spraul
    Cc: Nadia Derbey
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • Remove the ipc_lock_down() routines: they used to call idr_find() locklessly
    (given that the ipc ids lock was already held), so they are not needed
    anymore.

    Signed-off-by: Nadia Derbey
    Acked-by: "Paul E. McKenney"
    Cc: Manfred Spraul
    Cc: Jim Houston
    Cc: Pierre Peiffer
    Acked-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Call idr_find() locklessly from ipc_lock(), since the idr tree is now RCU
    protected.

    Signed-off-by: Nadia Derbey
    Acked-by: "Paul E. McKenney"
    Cc: Manfred Spraul
    Cc: Jim Houston
    Cc: Pierre Peiffer
    Acked-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     

25 Jul, 2008

1 commit

  • The goal of this patchset is to support multiple hugetlb page sizes. This
    is achieved by introducing a new struct hstate structure, which
    encapsulates the important hugetlb state and constants (eg. huge page
    size, number of huge pages currently allocated, etc).

    The hstate structure is then passed around the code which requires these
    fields, they will do the right thing regardless of the exact hstate they
    are operating on.

    This patch adds the hstate structure, with a single global instance of it
    (default_hstate), and does the basic work of converting hugetlb to use the
    hstate.

    Future patches will add more hstate structures to allow for different
    hugetlbfs mounts to have different page sizes.

    [akpm@linux-foundation.org: coding-style fixes]
    Acked-by: Adam Litke
    Acked-by: Nishanth Aravamudan
    Signed-off-by: Andi Kleen
    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     

14 Jun, 2008

1 commit


13 Jun, 2008

1 commit

  • sysvipc_shm_proc_show() picks between format strings (based on the
    expected maximum length of a SHM segment) in a way that prevents gcc from
    performing format checks on the seq_printf() parameters. This hid two
    format errors - shp->shm_segsz and shp->shm_nattach are both unsigned
    long, but were being printed as unsigned int and signed int respectively.
    This leads to 32-bit truncation of SHM segment sizes reported in
    /proc/sysvipc/shm. (And for nattach, but that's less of a problem for
    most users).

    This patch makes the format string directly visible to gcc's format
    specifier checker, and fixes the two broken format specifiers.

    Signed-off-by: Paul Menage
    Cc: Nadia Derbey
    Cc: Manfred Spraul
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     

10 Jun, 2008

1 commit


07 Jun, 2008

1 commit

  • When posting:
    [PATCH 1/8] Scaling msgmni to the amount of lowmem
    (see http://lkml.org/lkml/2008/2/11/171), I have added a KERN_INFO message
    that is output each time msgmni is recomputed.

    In http://lkml.org/lkml/2008/4/29/575 Tony Luck complained that this
    message references an ipc namespace address that is useless.

    I first thought of using an audit_log instead of a printk, as suggested by
    Serge Hallyn. But unfortunately, we do not have any other information
    than the namespace address to provide here too. So I chose to move the
    message and output it only at boot time, removing the reference to the
    namespace.

    Signed-off-by: Nadia Derbey
    Cc: Pierre Peiffer
    Cc: Manfred Spraul
    Acked-by: Tony Luck
    Cc: "Serge E. Hallyn"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     

06 Jun, 2008

1 commit


04 May, 2008

1 commit

  • A very small cleanup for mq_open.

    We do not have to call set_close_on_exit if we create the file
    descriptor right away with the flag set. We have a function for this
    now. The resulting code is smaller and a tiny bit faster.

    Signed-off-by: Ulrich Drepper
    Signed-off-by: Linus Torvalds

    Ulrich Drepper
     

29 Apr, 2008

19 commits

  • Use proc_create_data() to make sure that ->proc_fops and ->data be setup
    before gluing PDE to main tree.

    Signed-off-by: Denis V. Lunev
    Cc: Alexey Dobriyan
    Cc: "Eric W. Biederman"
    Cc: Nadia Derbey
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denis V. Lunev
     
  • sys_unshare(CLONE_NEWIPC) doesn't handle the undo lists properly, this can
    cause a kernel memory corruption. CLONE_NEWIPC must detach from the existing
    undo lists.

    Fix, part 1: add support for sys_unshare(CLONE_SYSVSEM)

    The original reason to not support it was the potential (inevitable?)
    confusion due to the fact that sys_unshare(CLONE_SYSVSEM) has the
    inverse meaning of clone(CLONE_SYSVSEM).

    Our two most reasonable options then appear to be (1) fully support
    CLONE_SYSVSEM, or (2) continue to refuse explicit CLONE_SYSVSEM,
    but always do it anyway on unshare(CLONE_SYSVSEM). This patch does
    (1).

    Changelog:
    Apr 16: SEH: switch to Manfred's alternative patch which
    removes the unshare_semundo() function which
    always refused CLONE_SYSVSEM.

    Signed-off-by: Manfred Spraul
    Signed-off-by: Serge E. Hallyn
    Acked-by: "Eric W. Biederman"
    Cc: Pavel Emelyanov
    Cc: Michael Kerrisk
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • Add definitions of USHORT_MAX and others into kernel. ipc uses it and slub
    implementation might also use it.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Zhang Yanmin
    Reviewed-by: Christoph Lameter
    Cc: Nadia Derbey
    Cc: "Pierre Peiffer"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang, Yanmin
     
  • semctl_down(), msgctl_down() and shmctl_down() are used to handle the same set
    of commands for each kind of IPC. They all start to do the same job (they
    retrieve the ipc and do some permission checks) before handling the commands
    on their own.

    This patch proposes to consolidate this by moving these same pieces of code
    into one common function called ipcctl_pre_down().

    It simplifies a little these xxxctl_down() functions and increases a little
    the maintainability.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • The IPC_SET command performs the same permission setting for all IPCs. This
    patch introduces a common ipc_update_perm() function to update these
    permissions and makes use of it for all IPCs.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • All IPCs make use of an intermetiate *_setbuf structure to handle the IPC_SET
    command. This is not really needed and, moreover, it complicates a little bit
    the code.

    This patch gets rid of the use of it and uses directly the semid64_ds/
    msgid64_ds/shmid64_ds structure.

    In addition of removing one struture declaration, it also simplifies and
    improves a little bit the common 64-bits path.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • semctl_down() takes one unused parameter: semnum. This patch proposes to get
    rid of it.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • semctl_down is called with the rwmutex (the one which protects the list of
    ipcs) taken in write mode.

    This patch moves this rwmutex taken in write-mode inside semctl_down.

    This has the advantages of reducing a little bit the window during which this
    rwmutex is taken, clarifying sys_semctl, and finally of having a coherent
    behaviour with [shm|msg]ctl_down

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • Currently, sys_msgctl is not easy to read.

    This patch tries to improve that by introducing the msgctl_down function to
    handle all commands requiring the rwmutex to be taken in write mode (ie
    IPC_SET and IPC_RMID for now). It is the equivalent function of semctl_down
    for message queues.

    This greatly changes the readability of sys_msgctl and also harmonizes the way
    these commands are handled among all IPCs.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • Currently, the way the different commands are handled in sys_shmctl introduces
    some duplicated code.

    This patch introduces the shmctl_down function to handle all the commands
    requiring the rwmutex to be taken in write mode (ie IPC_SET and IPC_RMID for
    now). It is the equivalent function of semctl_down for shared memory.

    This removes some duplicated code for handling these both commands and
    harmonizes the way they are handled among all IPCs.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • Trivial patch which adds some small locking functions and makes use of them to
    factorize some part of the code and to make it cleaner.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • The enhancement as asked for by Yasunori: if msgmni is set to a negative
    value, register it back into the ipcns notifier chain.

    A new interface has been added to the notification mechanism:
    notifier_chain_cond_register() registers a notifier block only if not already
    registered. With that new interface we avoid taking care of the states
    changes in procfs.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Make msgmni not recomputed anymore upon ipc namespace creation / removal or
    memory add/remove, as soon as it has been set from userland.

    As soon as msgmni is explicitly set via procfs or sysctl(), the associated
    callback routine is unregistered from the ipc namespace notifier chain.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Introduce a notification mechanism that aims at recomputing msgmni each time
    an ipc namespace is created or removed.

    The ipc namespace notifier chain already defined for memory hotplug management
    is used for that purpose too.

    Each time a new ipc namespace is allocated or an existing ipc namespace is
    removed, the ipcns notifier chain is notified. The callback routine for each
    registered ipc namespace is then activated in order to recompute msgmni for
    that namespace.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Make the memory hotplug chain's mutex held for a shorter time: when memory is
    offlined or onlined a work item is added to the global workqueue. When the
    work item is run, it notifies the ipcns notifier chain with the
    IPCNS_MEMCHANGED event.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Introduce the registration of a callback routine that recomputes msg_ctlmni
    upon memory add / remove.

    A single notifier block is registered in the hotplug memory chain for all the
    ipc namespaces.

    Since the ipc namespaces are not linked together, they have their own
    notification chain: one notifier_block is defined per ipc namespace.

    Each time an ipc namespace is created (removed) it registers (unregisters) its
    notifier block in (from) the ipcns chain. The callback routine registered in
    the memory chain invokes the ipcns notifier chain with the IPCNS_LOWMEM event.
    Each callback routine registered in the ipcns namespace, in turn, recomputes
    msgmni for the owning namespace.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Since all the namespaces see the same amount of memory (the total one) this
    patch introduces a new variable that counts the ipc namespaces and divides
    msg_ctlmni by this counter.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • On large systems we'd like to allow a larger number of message queues. In
    some cases up to 32K. However simply setting MSGMNI to a larger value may
    cause problems for smaller systems.

    The first patch of this series introduces a default maximum number of message
    queue ids that scales with the amount of lowmem.

    Since msgmni is per namespace and there is no amount of memory dedicated to
    each namespace so far, the second patch of this series scales msgmni to the
    number of ipc namespaces too.

    Since msgmni depends on the amount of memory, it becomes necessary to
    recompute it upon memory add/remove. In the 4th patch, memory hotplug
    management is added: a notifier block is registered into the memory hotplug
    notifier chain for the ipc subsystem. Since the ipc namespaces are not linked
    together, they have their own notification chain: one notifier_block is
    defined per ipc namespace. Each time an ipc namespace is created (removed) it
    registers (unregisters) its notifier block in (from) the ipcns chain. The
    callback routine registered in the memory chain invokes the ipcns notifier
    chain with the IPCNS_MEMCHANGE event. Each callback routine registered in the
    ipcns namespace, in turn, recomputes msgmni for the owning namespace.

    The 5th patch makes it possible to keep the memory hotplug notifier chain's
    lock for a lesser amount of time: instead of directly notifying the ipcns
    notifier chain upon memory add/remove, a work item is added to the global
    workqueue. When activated, this work item is the one who notifies the ipcns
    notifier chain.

    Since msgmni depends on the number of ipc namespaces, it becomes necessary to
    recompute it upon ipc namespace creation / removal. The 6th patch uses the
    ipc namespace notifier chain for that purpose: that chain is notified each
    time an ipc namespace is created or removed. This makes it possible to
    recompute msgmni for all the namespaces each time one of them is created or
    removed.

    When msgmni is explicitely set from userspace, we should avoid recomputing it
    upon memory add/remove or ipcns creation/removal. This is what the 7th patch
    does: it simply unregisters the ipcns callback routine as soon as msgmni has
    been changed from procfs or sysctl().

    Even if msgmni is set by hand, it should be possible to make it back
    automatically recomputed upon memory add/remove or ipcns creation/removal.
    This what is achieved in patch 8: if set to a negative value, msgmni is added
    back to the ipcns notifier chain, making it automatically recomputed again.

    This patch:

    Compute msg_ctlmni to make it scale with the amount of lowmem. msg_ctlmni is
    now set to make the message queues occupy 1/32 of the available lowmem.

    Some cleaning has also been done for the MSGPOOL constant: the msgctl man page
    says it's not used, but it also defines it as a size in bytes (the code
    expresses it in Kbytes).

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • By continuing to consolidate a little the IPC code, each id can be built
    directly in ipc_addid() instead of having it built from each callers of
    ipc_addid()

    And I also remove shm_addid() in order to have, as much as possible, the
    same code for shm/sem/msg.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Pierre Peiffer
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     

28 Apr, 2008

2 commits

  • After further discussion with Christoph Lameter, it has become clear that my
    earlier attempts to clean up the mempolicy reference counting were a bit of
    overkill in some areas, resulting in superflous ref/unref in what are usually
    fast paths. In other areas, further inspection reveals that I botched the
    unref for interleave policies.

    A separate patch, suitable for upstream/stable trees, fixes up the known
    errors in the previous attempt to fix reference counting.

    This patch reworks the memory policy referencing counting and, one hopes,
    simplifies the code. Maybe I'll get it right this time.

    See the update to the numa_memory_policy.txt document for a discussion of
    memory policy reference counting that motivates this patch.

    Summary:

    Lookup of mempolicy, based on (vma, address) need only add a reference for
    shared policy, and we need only unref the policy when finished for shared
    policies. So, this patch backs out all of the unneeded extra reference
    counting added by my previous attempt. It then unrefs only shared policies
    when we're finished with them, using the mpol_cond_put() [conditional put]
    helper function introduced by this patch.

    Note that shmem_swapin() calls read_swap_cache_async() with a dummy vma
    containing just the policy. read_swap_cache_async() can call alloc_page_vma()
    multiple times, so we can't let alloc_page_vma() unref the shared policy in
    this case. To avoid this, we make a copy of any non-null shared policy and
    remove the MPOL_F_SHARED flag from the copy. This copy occurs before reading
    a page [or multiple pages] from swap, so the overhead should not be an issue
    here.

    I introduced a new static inline function "mpol_cond_copy()" to copy the
    shared policy to an on-stack policy and remove the flags that would require a
    conditional free. The current implementation of mpol_cond_copy() assumes that
    the struct mempolicy contains no pointers to dynamically allocated structures
    that must be duplicated or reference counted during copy.

    Signed-off-by: Lee Schermerhorn
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Mel Gorman
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     
  • get_vma_policy() is not handling fallback to task policy correctly when the
    get_policy() vm_op returns NULL. The NULL overwrites the 'pol' variable that
    was holding the fallback task mempolicy. So, it was falling back directly to
    system default policy.

    Fix get_vma_policy() to use only non-NULL policy returned from the vma
    get_policy op.

    shm_get_policy() was falling back to current task's mempolicy if the "backing
    file system" [tmpfs vs hugetlbfs] does not support the get_policy vm_op and
    the vma policy is null. This is incorrect for show_numa_maps() which is
    likely querying the numa_maps of some task other than current. Remove this
    fallback.

    Signed-off-by: Lee Schermerhorn
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Mel Gorman
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn