25 May, 2011

40 commits

  • This function has been superseded by gather_hugetbl_stats() and is no
    longer needed.

    Signed-off-by: Stephen Wilson
    Reviewed-by: KOSAKI Motohiro
    Cc: Hugh Dickins
    Cc: David Rientjes
    Cc: Lee Schermerhorn
    Cc: Alexey Dobriyan
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Wilson
     
  • Improve the prototype of gather_stats() to take a struct numa_maps as
    argument instead of a generic void *. Update all callers to make the
    required type explicit.

    Since gather_stats() is not needed before its definition and is scheduled
    to be moved out of mempolicy.c, the declaration is removed as well.

    Signed-off-by: Stephen Wilson
    Reviewed-by: KOSAKI Motohiro
    Cc: Hugh Dickins
    Cc: David Rientjes
    Cc: Lee Schermerhorn
    Cc: Alexey Dobriyan
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Wilson
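
    A minimal freestanding C sketch of the refactoring pattern described in
    this entry; the struct numa_maps field, the pte_size parameter and the
    function bodies are illustrative assumptions, not the kernel's actual
    code.

    /* Illustrative only: stand-in types, not the kernel's definitions. */
    struct page;

    struct numa_maps {
            unsigned long pages;    /* assumed accumulator field */
    };

    /* Before: an opaque pointer forced the callee (and callers) to cast. */
    static int gather_stats_old(struct page *page, void *private, int pte_size)
    {
            struct numa_maps *md = private;

            (void)page;
            md->pages += pte_size;
            return 0;
    }

    /* After: the accumulator type is explicit, so the compiler checks it. */
    static int gather_stats(struct page *page, struct numa_maps *md, int pte_size)
    {
            (void)page;
            md->pages += pte_size;
            return 0;
    }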
     
  • Mapping statistics in a NUMA environment are now computed using the generic
    walk_page_range() logic. Remove the old, equivalent functionality.

    Signed-off-by: Stephen Wilson
    Reviewed-by: KOSAKI Motohiro
    Cc: Hugh Dickins
    Cc: David Rientjes
    Cc: Lee Schermerhorn
    Cc: Alexey Dobriyan
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Wilson
     
  • Converting show_numa_map() to use the generic routine decouples the
    function from mempolicy.c, allowing it to be moved out of the mm subsystem
    and into fs/proc.

    Also, include KSM pages in /proc/pid/numa_maps statistics. The pagewalk
    logic implemented by check_pte_range() failed to account for such pages as
    they were not applicable to the page migration case.

    Signed-off-by: Stephen Wilson
    Reviewed-by: KOSAKI Motohiro
    Cc: Hugh Dickins
    Cc: David Rientjes
    Cc: Lee Schermerhorn
    Cc: Alexey Dobriyan
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Wilson
     
  • In commit 48fce3429d ("mempolicies: unexport get_vma_policy()")
    get_vma_policy() was marked static as all clients were local to
    mempolicy.c.

    However, the decision to generate /proc/pid/numa_maps in the numa memory
    policy code and outside the procfs subsystem introduces an artificial
    interdependency between the two systems. Exporting get_vma_policy() once
    again is the first step to clean up this interdependency.

    Signed-off-by: Stephen Wilson
    Reviewed-by: KOSAKI Motohiro
    Cc: Hugh Dickins
    Cc: David Rientjes
    Cc: Lee Schermerhorn
    Cc: Alexey Dobriyan
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Wilson
     
  • Remove noMMU declaration of shmem_get_unmapped_area() from mm.h: it fell
    out of use in 2.6.21 and ceased to exist in 2.6.29.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Implement generic xattrs for tmpfs filesystems. The Fedora project, while
    trying to replace suid apps with file capabilities, realized that tmpfs,
    which is used on the build systems, does not support file capabilities and
    thus cannot be used to build packages which use file capabilities. Xattrs
    are also needed for overlayfs.

    The xattr interface is a bit odd. If a filesystem does not implement any
    {get,set,list}xattr functions the VFS will call into some random LSM hooks
    and the running LSM can then implement some method for handling xattrs.
    SELinux for example provides a method to support security.selinux but no
    other security.* xattrs.

    As it stands today, when one enables CONFIG_TMPFS_POSIX_ACL, tmpfs will
    have xattr handler routines specifically to handle ACLs. Because of this
    tmpfs would lose the VFS/LSM helpers that support the running LSM. To
    make up for that, tmpfs had stub functions that did nothing but call into
    the LSM hooks which implement the helpers.

    This new patch does not use the LSM fallback functions and instead just
    implements a native get/set/list xattr feature for the full security.* and
    trusted.* namespace like a normal filesystem. This means that tmpfs can
    now support both security.selinux and security.capability, which was not
    previously possible.

    The basic implementation attaches a

    struct shmem_xattr {
            struct list_head list;  /* anchored by shmem_inode_info->xattr_list */
            char *name;
            size_t size;
            char value[0];
    };

    to the struct shmem_inode_info for each xattr that is set. This
    implementation could easily support the user.* namespace as well, except
    some care needs to be taken to prevent large amounts of unswappable memory
    being allocated for unprivileged users.

    [mszeredi@suse.cz: new config option, support trusted.*, support symlinks]
    Signed-off-by: Eric Paris
    Signed-off-by: Miklos Szeredi
    Acked-by: Serge Hallyn
    Tested-by: Serge Hallyn
    Cc: Kyle McMartin
    Acked-by: Hugh Dickins
    Tested-by: Jordi Pujol
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Paris
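
    A small userspace illustration of what this change enables, using the
    standard xattr syscalls; the /dev/shm path (assumed to be tmpfs) and the
    trusted.example attribute name are arbitrary examples, and trusted.*
    requires CAP_SYS_ADMIN.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/xattr.h>
    #include <unistd.h>

    int main(void)
    {
            const char *path = "/dev/shm/xattr-test";  /* assumed tmpfs file */
            const char value[] = "hello";
            char buf[64];
            ssize_t len;
            int fd = open(path, O_CREAT | O_RDWR, 0600);

            if (fd < 0) {
                    perror("open");
                    return 1;
            }
            if (fsetxattr(fd, "trusted.example", value, sizeof(value), 0)) {
                    perror("fsetxattr");
                    return 1;
            }
            len = fgetxattr(fd, "trusted.example", buf, sizeof(buf));
            if (len < 0) {
                    perror("fgetxattr");
                    return 1;
            }
            printf("trusted.example = %.*s\n", (int)len, buf);
            close(fd);
            unlink(path);
            return 0;
    }

    Before this series, the trusted.* call above and storing
    security.capability both failed on tmpfs; the latter is what blocked the
    Fedora file-capabilities work described above.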
     
  • The bootmem wrapper with memblock supports top-down now, so we no longer
    need this trick.

    Signed-off-by: Yinghai LU
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Olaf Hering
    Cc: Tejun Heo
    Cc: Lucas De Marchi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     
  • The bootmem wrapper with memblock supports top-down now, so we do not need
    to set the low limit to __pa(MAX_DMA_ADDRESS).

    The logic should be: it is good to allocate above __pa(MAX_DMA_ADDRESS),
    but it is OK if we cannot find memory above 16M on systems that have a
    small amount of RAM.

    Signed-off-by: Yinghai LU
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Olaf Hering
    Cc: Tejun Heo
    Cc: Lucas De Marchi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     
  • The problem with having two different types of counters is that developers
    adding new code need to keep in mind whether it's safe to use both the
    atomic and non-atomic implementations. For example, when adding new
    callers of the *_mm_counter() functions a developer needs to ensure that
    those paths are always executed with page_table_lock held, in case we're
    using the non-atomic implementation of mm counters.

    Hugh Dickins introduced the atomic mm counters in commit f412ac08c986
    ("[PATCH] mm: fix rss and mmlist locking"). When asked why he left the
    non-atomic counters around he said,

    | The only reason was to avoid adding costly atomic operations into a
    | configuration that had no need for them there: the page_table_lock
    | sufficed.
    |
    | Certainly it would be simpler just to delete the non-atomic variant.
    |
    | And I think it's fair to say that any configuration on which we're
    | measuring performance to that degree (rather than "does it boot fast?"
    | type measurements), would already be going the split ptlocks route.

    Removing the non-atomic counters eases the maintenance burden because
    developers no longer have to be mindful of the two implementations when
    using *_mm_counter().

    Note that all architectures provide a means of atomically updating
    atomic_long_t variables, even if they have to revert to the generic
    spinlock implementation because they don't support 64-bit atomic
    instructions (see lib/atomic64.c).

    Signed-off-by: Matt Fleming
    Acked-by: Dave Hansen
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Hugh Dickins
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Fleming
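
    A freestanding C11 sketch of the trade-off discussed in this entry: one
    counter that is safe to update from any context and one that is only safe
    under a lock. The pthread mutex stands in for page_table_lock and
    atomic_long for atomic_long_t; this is an analogy, not kernel code.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_long rss_atomic;      /* analogue of the atomic counters    */
    static long rss_locked;             /* analogue of the non-atomic variant */
    static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

    static void inc_atomic(void)
    {
            atomic_fetch_add(&rss_atomic, 1);   /* no locking rule to remember */
    }

    static void inc_locked(void)
    {
            pthread_mutex_lock(&table_lock);    /* caller must know the rule   */
            rss_locked++;
            pthread_mutex_unlock(&table_lock);
    }

    int main(void)
    {
            inc_atomic();
            inc_locked();
            printf("%ld %ld\n", atomic_load(&rss_atomic), rss_locked);
            return 0;
    }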
     
  • The page allocator will improperly return a page from ZONE_NORMAL even
    when __GFP_DMA is passed if CONFIG_ZONE_DMA is disabled. The caller
    expects DMA memory, perhaps for ISA devices with 16-bit address registers,
    and may get higher memory resulting in undefined behavior.

    This patch causes the page allocator to return NULL in such circumstances
    with a warning emitted to the kernel log on the first occurrence.

    Signed-off-by: David Rientjes
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: KAMEZAWA Hiroyuki
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
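
    A hedged, freestanding sketch of the behaviour this entry describes (warn
    once, then fail the request); the macro names only mimic the kernel's
    flags and the function is a stand-in, not the real page allocator.

    #include <stdbool.h>
    #include <stdio.h>

    #define MY_GFP_DMA 0x01u                /* stand-in for __GFP_DMA        */
    #define ZONE_DMA_CONFIGURED false       /* pretend CONFIG_ZONE_DMA is n  */

    static void *alloc_pages_sketch(unsigned int gfp_mask)
    {
            static bool warned;

            if (!ZONE_DMA_CONFIGURED && (gfp_mask & MY_GFP_DMA)) {
                    if (!warned) {
                            fprintf(stderr,
                                    "__GFP_DMA set but no DMA zone configured\n");
                            warned = true;
                    }
                    return NULL;    /* fail instead of returning non-DMA memory */
            }
            /* ... a real allocator would continue here ... */
            return NULL;
    }

    int main(void)
    {
            return alloc_pages_sketch(MY_GFP_DMA) == NULL ? 0 : 1;
    }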
     
  • Do not define PFN_SECTION_SHIFT if !CONFIG_SPARSEMEM.

    Signed-off-by: Daniel Kiper
    Acked-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Kiper
     
  • pfn_to_section_nr()/section_nr_to_pfn() are valid only in a
    CONFIG_SPARSEMEM context. Move them to the proper place.

    Signed-off-by: Daniel Kiper
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Kiper
     
  • set_page_section() is meaningful only in a CONFIG_SPARSEMEM and
    !CONFIG_SPARSEMEM_VMEMMAP context. Move it to the proper place and amend
    the functions that use it accordingly.

    Signed-off-by: Daniel Kiper
    Acked-by: Dave Hansen
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Kiper
     
  • online_pages() is only compiled for CONFIG_MEMORY_HOTPLUG_SPARSE, so there
    is no need to support CONFIG_FLATMEM code within it.

    This patch removes code that is never used.

    Signed-off-by: Daniel Kiper
    Acked-by: Dave Hansen
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Kiper
     
  • It is pointless for deactivate_page() to operate on unevictable pages.
    This patch removes unnecessary overhead, which can matter when the system
    has many unevictable pages (e.g. an mprotect workload).

    [akpm@linux-foundation.org: tidy up comment]
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Minchan Kim
    Reviewed-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Previously, mmap sequential readahead was triggered by updating
    ra->prev_pos on each page fault and comparing it with the current page
    offset.

    That dirties the cache line on each _minor_ page fault. So remove the
    ra->prev_pos recording, and instead tag PG_readahead to trigger the
    possible sequential readahead. It is not only simpler, but will also work
    more reliably and reduce cache line bouncing on concurrent page faults on
    shared struct file.

    In the mosbench exim benchmark which does multi-threaded page faults on
    shared struct file, the ra->mmap_miss and ra->prev_pos updates are found
    to cause excessive cache line bouncing on tmpfs, which actually disabled
    readahead totally (shmem_backing_dev_info.ra_pages == 0).

    Signed-off-by: Wu Fengguang
    Tested-by: Tim Chen
    Reported-by: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • The original INT_MAX is too large; reduce it to:

    - avoid unnecessarily dirtying/bouncing the cache line

    - restore mmap read-around faster on changed access pattern

    Background: in the mosbench exim benchmark which does multi-threaded page
    faults on shared struct file, the ra->mmap_miss updates are found to cause
    excessive cache line bouncing on tmpfs. The ra state updates are needless
    for tmpfs because it actually disabled readahead totally
    (shmem_backing_dev_info.ra_pages == 0).

    Tested-by: Tim Chen
    Signed-off-by: Andi Kleen
    Signed-off-by: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • Reduce readahead overheads by returning early in do_sync_mmap_readahead().

    tmpfs has ra_pages=0 and it can page fault really fast (not constrained
    by IO when not swapping).

    Signed-off-by: Wu Fengguang
    Tested-by: Tim Chen
    Reported-by: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
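
    A freestanding sketch of the early return this entry describes;
    struct file_ra_state is reduced to the two fields needed for the
    illustration, so treat the layout as an assumption.

    /* Illustrative stand-in for struct file_ra_state. */
    struct file_ra_state_sketch {
            unsigned long ra_pages;    /* readahead window; 0 means disabled */
            unsigned long mmap_miss;
    };

    static void do_sync_mmap_readahead_sketch(struct file_ra_state_sketch *ra)
    {
            if (!ra->ra_pages)
                    return;            /* e.g. tmpfs: skip all ra bookkeeping */

            ra->mmap_miss++;           /* ...normal readahead logic follows... */
    }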
     
  • Change each shrinker's API by consolidating the existing parameters into
    a shrink_control struct. This simplifies adding further features without
    having to touch every shrinker.

    [akpm@linux-foundation.org: fix build]
    [akpm@linux-foundation.org: fix warning]
    [kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API]
    [akpm@linux-foundation.org: fix xfs warning]
    [akpm@linux-foundation.org: update gfs2]
    Signed-off-by: Ying Han
    Cc: KOSAKI Motohiro
    Cc: Minchan Kim
    Acked-by: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Johannes Weiner
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Cc: Steven Whitehouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ying Han
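
    A hedged sketch of the consolidation. The two shrink_control fields shown
    match my reading of this series, but treat the exact layout and both
    callback signatures as assumptions; gfp_t is typedef'd here only so the
    sketch compiles on its own.

    typedef unsigned int gfp_t;        /* stand-in for the kernel typedef */

    struct shrink_control {
            gfp_t gfp_mask;            /* reclaim context                 */
            unsigned long nr_to_scan;  /* objects to try to reclaim       */
    };

    struct shrinker;

    /* Old style: adding a parameter meant touching every shrinker in tree. */
    typedef int (*old_shrink_fn)(struct shrinker *s, int nr_to_scan,
                                 gfp_t gfp_mask);

    /* New style: future fields go into shrink_control; callbacks stay put. */
    typedef int (*new_shrink_fn)(struct shrinker *s, struct shrink_control *sc);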
     
  • Consolidate the existing parameters to shrink_slab() into a new
    shrink_control struct. This is needed later to pass the same struct to
    shrinkers.

    Signed-off-by: Ying Han
    Cc: KOSAKI Motohiro
    Cc: Minchan Kim
    Acked-by: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Johannes Weiner
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ying Han
     
  • Pass __GFP_NORETRY|__GFP_NOWARN for readahead page allocations.

    Readahead page allocations are completely optional. They are OK to fail
    and, in particular, shall not trigger OOM on their own.

    Reported-by: Dave Young
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Wu Fengguang
    Reviewed-by: Minchan Kim
    Reviewed-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
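
    A tiny freestanding sketch of the flag combination; the MY_GFP_* macros
    are stand-ins for the kernel's gfp flags, not the real definitions.

    #define MY_GFP_NORETRY 0x01u   /* give up early; don't invoke the OOM killer */
    #define MY_GFP_NOWARN  0x02u   /* suppress the allocation-failure warning    */

    /* Readahead pages are optional, so make their allocation quiet and cheap. */
    static inline unsigned int readahead_gfp(unsigned int base_gfp)
    {
            return base_gfp | MY_GFP_NORETRY | MY_GFP_NOWARN;
    }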
     
  • This fixes a problem where the first pageblock got marked MIGRATE_RESERVE
    even though it only had a few free pages. For example, on the current ARM
    port the kernel starts at offset 0x8000 to leave room for boot parameters,
    and that memory is freed later.

    This in turn caused no contiguous memory to be reserved and frequent
    kswapd wakeups that emptied the caches to get more contiguous memory.

    Unfortunately, ARM needs an order-2 allocation for the pgd (see
    arm/mm/pgd.c#pgd_alloc()), so the issue is neither minor nor easily
    avoidable.

    [kosaki.motohiro@jp.fujitsu.com: added some explanation]
    [kosaki.motohiro@jp.fujitsu.com: add !pfn_valid_within() to check]
    [minchan.kim@gmail.com: check end_pfn in pageblock_is_reserved]
    Signed-off-by: John Stultz
    Signed-off-by: Arve Hjønnevåg
    Signed-off-by: KOSAKI Motohiro
    Acked-by: Mel Gorman
    Acked-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arve Hjønnevåg
     
  • For m32r, N_NORMAL_MEMORY represents all nodes that have present memory
    since it does not support HIGHMEM. This patch sets the bit at the time
    the node is initialized.

    If N_NORMAL_MEMORY is not accurate, slub may encounter errors since it
    uses this nodemask to set up per-cache kmem_cache_node data structures.

    Signed-off-by: David Rientjes
    Cc: Hirokazu Takata
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
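
    A hedged sketch of the kind of one-line change described here and in the
    alpha entry below; node_set_state() mirrors the real kernel interface,
    but the stand-in bodies and the init-hook name are illustrative only.

    /* Freestanding stand-ins that mirror the kernel's nodemask interface. */
    enum node_states { N_NORMAL_MEMORY };

    static void node_set_state(int node, enum node_states state)
    {
            (void)node;        /* the real helper sets the bit in  */
            (void)state;       /* node_states[state] for this node */
    }

    /* Illustrative arch init hook: mark every node that has present memory. */
    static void arch_node_init_sketch(int nid, unsigned long present_pages)
    {
            if (present_pages)
                    node_set_state(nid, N_NORMAL_MEMORY);
    }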
     
  • For alpha, N_NORMAL_MEMORY represents all nodes that have present memory
    since it does not support HIGHMEM. This patch sets the bit at the time
    the node is initialized.

    If N_NORMAL_MEMORY is not accurate, slub may encounter errors since it
    uses this nodemask to set up per-cache kmem_cache_node data structures.

    Signed-off-by: David Rientjes
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • isolate_lru_page() must be called only with a stable reference to the
    page; this is what the comment above it says, and it is reasonable.

    Current isolate_lru_page() users and the sources of their extra page
    references:

    mm/huge_memory.c:
    __collapse_huge_page_isolate() - reference from pte

    mm/memcontrol.c:
    mem_cgroup_move_parent() - get_page_unless_zero()
    mem_cgroup_move_charge_pte_range() - reference from pte

    mm/memory-failure.c:
    soft_offline_page() - fixed, reference from get_any_page()
    delete_from_lru_cache() - reference from caller or get_page_unless_zero()
    [this looks like a bug, because __memory_failure() can call
    page_action() for hugepage tails, but it is OK for isolate_lru_page():
    the tail has a reference taken and is not on the LRU]

    mm/memory_hotplug.c:
    do_migrate_range() - fixed, get_page_unless_zero()

    mm/mempolicy.c:
    migrate_page_add() - reference from pte

    mm/migrate.c:
    do_move_page_to_node_array() - reference from follow_page()

    mlock.c: - various external references

    mm/vmscan.c:
    putback_lru_page() - reference from isolate_lru_page()

    It seems that all isolate_lru_page() users are now ready for this
    restriction. So, let's replace the redundant get_page_unless_zero() with
    get_page() and add a check of the page's initial reference count with
    VM_BUG_ON().

    Signed-off-by: Konstantin Khlebnikov
    Cc: Andi Kleen
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Cc: Lee Schermerhorn
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
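
    A freestanding sketch of the calling convention this entry argues for;
    get_page(), put_page() and isolate_lru_page() are trivial stand-ins for
    the kernel functions of the same names, and the migration caller is
    hypothetical.

    struct page { int count; int on_lru; };

    static void get_page(struct page *page) { page->count++; }
    static void put_page(struct page *page) { page->count--; }

    static int isolate_lru_page(struct page *page)
    {
            /* with the new rule the real function can assert page_count != 0 */
            if (!page->on_lru)
                    return -1;
            page->on_lru = 0;
            return 0;
    }

    static int migrate_one_page_sketch(struct page *page)
    {
            /* The caller already holds a stable reference (e.g. from a pte),
             * so a plain get_page() suffices; get_page_unless_zero() would
             * be redundant here. */
            get_page(page);
            if (isolate_lru_page(page)) {
                    put_page(page);    /* not on the LRU; back off */
                    return -1;
            }
            /* ... hand the isolated page to migration, then drop the ref ... */
            put_page(page);
            return 0;
    }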
     
  • Drop the first page reference only after calling isolate_lru_page(), so
    the page keeps a stable reference while it is being isolated.

    Signed-off-by: Konstantin Khlebnikov
    Cc: Andi Kleen
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Cc: Lee Schermerhorn
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • isolate_lru_page() must be called only with a stable reference to the
    page, so grab a normal page reference.

    Signed-off-by: Konstantin Khlebnikov
    Cc: Andi Kleen
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Cc: Lee Schermerhorn
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • I was tracking down a page allocation failure that ended up in vmalloc().
    Since vmalloc() uses 0-order pages, if somebody asks for an insane amount
    of memory, we'll still get a warning with "order:0" in it. That's not
    very useful.

    During recovery, vmalloc() also nicely frees all of the memory that it got
    up to the point of the failure. That is wonderful, but it also quickly
    hides any issues. We have a much different situation if vmalloc()
    repeatedly fails 10GB into:

    vmalloc(100 * 1<<30);

    The failure path now produces a trace such as:

    warn_alloc_failed+0x146/0x170
    ? printk+0x6c/0x70
    ? alloc_pages_current+0x94/0xe0
    __vmalloc_node_range+0x237/0x290
    ...

    The 'order' variable is added for clarity when calling warn_alloc_failed()
    to avoid having an unexplained '0' as an argument.

    The 'tmp_mask' is because adding an open-coded '| __GFP_NOWARN' would take
    us over 80 columns for the alloc_pages_node() call. If we are going to
    add a line, it might as well be one that makes the sucker easier to read.

    As a side issue, I also noticed that ctl_ioctl() does vmalloc() based
    solely on an unverified value passed in from userspace. Granted, it's
    under CAP_SYS_ADMIN, but it still frightens me a bit.

    Signed-off-by: Dave Hansen
    Cc: Johannes Weiner
    Cc: David Rientjes
    Cc: Michal Nazarewicz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • This originally started as a simple patch to give vmalloc() some more
    verbose output on failure on top of the plain page allocator messages.
    Johannes suggested that it might be nicer to lead with the vmalloc() info
    _before_ the page allocator messages.

    But, I do think there's a lot of value in what __alloc_pages_slowpath()
    does with its filtering and so forth.

    This patch creates a new function which other allocators can call instead
    of relying on the internal page allocator warnings. It also gives this
    function private rate-limiting which separates it from other
    printk_ratelimit() users.

    Signed-off-by: Dave Hansen
    Cc: Johannes Weiner
    Cc: David Rientjes
    Cc: Michal Nazarewicz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
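
    A hedged sketch of how a non-page-allocator caller might use the new
    helper. The warn_alloc_failed() signature is my best recollection and the
    body below is a userspace stand-in; only the overall shape (gfp mask,
    order, printf-style message, rate limiting) comes from the entries above.

    #include <stdarg.h>
    #include <stdio.h>

    typedef unsigned int gfp_t;    /* stand-in for the kernel typedef */

    static void warn_alloc_failed(gfp_t gfp_mask, int order, const char *fmt, ...)
    {
            va_list args;

            /* the real helper also applies its own private rate limiting */
            fprintf(stderr, "alloc failure (gfp 0x%x, order %d): ",
                    gfp_mask, order);
            va_start(args, fmt);
            vfprintf(stderr, fmt, args);
            va_end(args);
    }

    /* Illustrative vmalloc-style caller reporting how far it got. */
    static void report_vmalloc_failure(gfp_t gfp_mask, unsigned long done,
                                       unsigned long want)
    {
            warn_alloc_failed(gfp_mask, 0,
                              "vmalloc: allocated %lu of %lu bytes\n",
                              done, want);
    }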
     
  • cpumask_t is a very big struct and cpu_vm_mask is placed at the wrong
    position in mm_struct, which may reduce the cache hit ratio.

    This patch makes two changes:
    1) Move the cpumask to the end of mm_struct, because usually only the
    front bits of the cpumask are accessed when the system has cpu-hotplug
    capability.
    2) Convert cpu_vm_mask into cpumask_var_t, which may help reduce the
    memory footprint if cpumask_size() uses nr_cpumask_bits properly in the
    future.

    In addition, this patch renames cpu_vm_mask to cpu_vm_mask_var, which may
    help detect out-of-tree cpu_vm_mask users.

    This patch has no functional change.

    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: KOSAKI Motohiro
    Cc: David Howells
    Cc: Koichi Yasutake
    Cc: Hugh Dickins
    Cc: Chris Metcalf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
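
    A freestanding sketch of the layout idea: hot fields up front, the
    potentially large mask at the end and behind a cpumask_var_t-style
    pointer. Everything here is a stand-in, not the real mm_struct.

    typedef unsigned long *cpumask_var_sketch_t;  /* stand-in for cpumask_var_t */

    struct mm_struct_sketch {
            unsigned long hot_counter_a;          /* frequently accessed fields */
            unsigned long hot_counter_b;
            /* ... many other members ... */
            cpumask_var_sketch_t cpu_vm_mask_var; /* renamed, moved to the end  */
    };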
     
  • We don't need to hold the mmap_sem through mem_cgroup_newpage_charge();
    the mmap_sem is only held to keep the vma stable, and we don't need the
    vma to be stable anymore after we return from alloc_hugepage_vma().

    Signed-off-by: Andrea Arcangeli
    Cc: Johannes Weiner
    Cc: Hugh Dickins
    Cc: David Rientjes
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Some of these functions have grown beyond inline sanity, move them
    out-of-line.

    Signed-off-by: Peter Zijlstra
    Requested-by: Andrew Morton
    Requested-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Optimize the page_lock_anon_vma() fast path to be one atomic op, instead
    of two.

    Signed-off-by: Peter Zijlstra
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: Martin Schwidefsky
    Cc: Russell King
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Tony Luck
    Cc: Hugh Dickins
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: Nick Piggin
    Cc: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Straightforward conversion of anon_vma->lock to a mutex.

    Signed-off-by: Peter Zijlstra
    Acked-by: Hugh Dickins
    Reviewed-by: KOSAKI Motohiro
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: Martin Schwidefsky
    Cc: Russell King
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Tony Luck
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Nick Piggin
    Cc: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Convert page_lock_anon_vma() over to use refcounts. This is done to
    prepare for the conversion of anon_vma from spinlock to mutex.

    Sadly this increases the cost of page_lock_anon_vma() from one atomic op
    to two; a follow-up patch addresses this, so let's keep it simple for now.

    Signed-off-by: Peter Zijlstra
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: KOSAKI Motohiro
    Acked-by: Hugh Dickins
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: Martin Schwidefsky
    Cc: Russell King
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Tony Luck
    Cc: Mel Gorman
    Cc: Nick Piggin
    Cc: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • A slightly more verbose comment to go along with the trickery in
    page_lock_anon_vma().

    Signed-off-by: Peter Zijlstra
    Reviewed-by: KOSAKI Motohiro
    Reviewed-by: KAMEZAWA Hiroyuki
    Acked-by: Mel Gorman
    Acked-by: Hugh Dickins
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: Martin Schwidefsky
    Cc: Russell King
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Tony Luck
    Cc: Nick Piggin
    Cc: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • It's beyond ugly and gets in the way.

    Signed-off-by: Peter Zijlstra
    Acked-by: Hugh Dickins
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: Martin Schwidefsky
    Cc: Russell King
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Tony Luck
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Namhyung Kim
    Cc: KOSAKI Motohiro
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Straightforward conversion of i_mmap_lock to a mutex.

    Signed-off-by: Peter Zijlstra
    Acked-by: Hugh Dickins
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: Martin Schwidefsky
    Cc: Russell King
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Tony Luck
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: Nick Piggin
    Cc: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Hugh says:
    "The only significant loser, I think, would be page reclaim (when
    concurrent with truncation): could spin for a long time waiting for
    the i_mmap_mutex it expects would soon be dropped?"

    Counterpoints:
    - cpu contention makes the spin stop (need_resched())
    - zapping pages should free pages at a higher rate than reclaim
    ever can

    I think the simplification of the truncate code is definitely worth it.

    Effectively reverts: 2aa15890f3c ("mm: prevent concurrent
    unmap_mapping_range() on the same inode") and takes out the code that
    caused its problem.

    Signed-off-by: Peter Zijlstra
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Hugh Dickins
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: Martin Schwidefsky
    Cc: Russell King
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Tony Luck
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: Nick Piggin
    Cc: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra