07 Aug, 2016

1 commit

  • Pull documentation fixes from Jonathan Corbet:
    "Three fixes for the docs build, including removing an annoying warning
    on 'make help' if sphinx isn't present"

    * tag 'doc-4.8-fixes' of git://git.lwn.net/linux:
    DocBook: use DOCBOOKS="" to ignore DocBooks instead of IGNORE_DOCBOOKS=1
    Documentation: update cgroup's document path
    Documentation/sphinx: do not warn about missing tools in 'make help'

    Linus Torvalds
     

06 Nov, 2015

2 commits

  • KernelThreadSanitizer (ktsan) has shown that the down_read_trylock() of
    mmap_sem in try_to_unmap_one() (when going to set PageMlocked on a page
    found mapped in a VM_LOCKED vma) is ineffective against races with
    exit_mmap()'s munlock_vma_pages_all(), because mmap_sem is not held when
    tearing down an mm.

    But that's okay, those races are benign; and although we've believed for
    years in that ugly down_read_trylock(), it's unsuitable for the job, and
    frustrates the good intention of setting PageMlocked when it fails.

    It just doesn't matter if here we read vm_flags an instant before or after
    a racing mlock() or munlock() or exit_mmap() sets or clears VM_LOCKED: the
    syscalls (or exit) work their way up the address space (taking pt locks
    after updating vm_flags) to establish the final state.

    We do still need to be careful never to mark a page Mlocked (hence
    unevictable) by any race that will not be corrected shortly after. The
    page lock protects from many of the races, but not all (a page is not
    necessarily locked when it's unmapped). But the pte lock we just dropped
    is good to cover the rest (and serializes even with
    munlock_vma_pages_all(), so no special barriers required): now hold on to
    the pte lock while calling mlock_vma_page(). Is that lock ordering safe?
    Yes, that's how follow_page_pte() calls it, and how page_remove_rmap()
    calls the complementary clear_page_mlock().
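
    A simplified sketch of that ordering, loosely following the rmap code
    of this era (schematic, not the verbatim patch):

        pte = page_check_address(page, mm, address, &ptl, 0);
        if (!pte)
            goto out;
        /* ... the rest of try_to_unmap_one() ... */
        if (vma->vm_flags & VM_LOCKED) {
            /* still under ptl: serializes with munlock_vma_pages_all() */
            mlock_vma_page(page);
            ret = SWAP_MLOCK;
        }
        pte_unmap_unlock(pte, ptl);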

    This fixes the following case (though not a case which anyone has
    complained of), which mmap_sem did not: truncation's preliminary
    unmap_mapping_range() is supposed to remove even the anonymous COWs of
    filecache pages, and that might race with try_to_unmap_one() on a
    VM_LOCKED vma, so that mlock_vma_page() sets PageMlocked just after
    zap_pte_range() unmaps the page, causing "Bad page state (mlocked)" when
    freed. The pte lock protects against this.

    You could say that it also protects against the more ordinary case, racing
    with the preliminary unmapping of a filecache page itself: but in our
    current tree, that's independently protected by i_mmap_rwsem; and that
    race would be why "Bad page state (mlocked)" was seen before commit
    48ec833b7851 ("Revert mm/memory.c: share the i_mmap_rwsem").

    Vlastimil Babka points out another race which this patch protects against.
    try_to_unmap_one() might reach its mlock_vma_page() TestSetPageMlocked a
    moment after munlock_vma_pages_all() did its Phase 1 TestClearPageMlocked:
    leaving PageMlocked and unevictable when it should be evictable. mmap_sem
    is ineffective because exit_mmap() does not hold it; page lock ineffective
    because __munlock_pagevec() only takes it afterwards, in Phase 2; pte lock
    is effective because __munlock_pagevec_fill() takes it to get the page,
    after VM_LOCKED was cleared from vm_flags, so visible to try_to_unmap_one.

    Kirill Shutemov points out that if the compiler chooses to implement a
    "vma->vm_flags &= VM_WHATEVER" or "vma->vm_flags |= VM_WHATEVER" operation
    with an intermediate store of unrelated bits set, since I'm here foregoing
    its usual protection by mmap_sem, try_to_unmap_one() might catch sight of
    a spurious VM_LOCKED in vm_flags, and make the wrong decision. This does
    not appear to be an immediate problem, but we may want to define vm_flags
    accessors in future, to guard against such a possibility.
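
    Hypothetical accessors of that kind (illustrative only; no such
    helpers exist in the tree at this point) might use WRITE_ONCE() to
    force a single store, so that a lockless reader such as
    try_to_unmap_one() never observes an intermediate bit pattern:

        static inline void vm_flags_set(struct vm_area_struct *vma,
                                        vm_flags_t flags)
        {
            WRITE_ONCE(vma->vm_flags, vma->vm_flags | flags);
        }

        static inline void vm_flags_clear(struct vm_area_struct *vma,
                                          vm_flags_t flags)
        {
            WRITE_ONCE(vma->vm_flags, vma->vm_flags & ~flags);
        }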

    While we're here, make a related optimization in try_to_unmap_one(): if
    it's doing TTU_MUNLOCK, then there's no point at all in descending the
    page tables and getting the pt lock, unless the vma is VM_LOCKED. Yes,
    that can change racily, but it can change racily even without the
    optimization: it's not critical. Far better not to waste time here.
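
    Schematically, the early exit sits at the top of try_to_unmap_one(),
    before any page-table walk (simplified):

        /* munlock has nothing to gain from examining un-locked vmas */
        if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
            goto out;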

    Stopped short of separating try_to_munlock_one() from try_to_unmap_one()
    on this occasion, but that's probably the sensible next step - with a
    rename, given that try_to_munlock()'s business is to try to set Mlocked.

    Updated the unevictable-lru Documentation, to remove its reference to mmap
    semaphore, but found a few more updates needed in just that area.

    Signed-off-by: Hugh Dickins
    Cc: Christoph Lameter
    Cc: "Kirill A. Shutemov"
    Cc: Rik van Riel
    Acked-by: Vlastimil Babka
    Cc: Davidlohr Bueso
    Cc: Oleg Nesterov
    Cc: Sasha Levin
    Cc: Dmitry Vyukov
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • While updating some mm Documentation, I came across a few straggling
    references to the non-linear vmas which were happily removed in v4.0.
    Delete them.

    Signed-off-by: Hugh Dickins
    Cc: Christoph Lameter
    Cc: "Kirill A. Shutemov"
    Cc: Rik van Riel
    Acked-by: Vlastimil Babka
    Cc: Davidlohr Bueso
    Cc: Oleg Nesterov
    Cc: Sasha Levin
    Cc: Dmitry Vyukov
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

25 Jun, 2015

1 commit

  • There is a very subtle difference between the mmap()+mlock() and
    mmap(MAP_LOCKED) semantics. The former fails if population of the area
    fails, while the latter doesn't. This basically means that
    mmap(MAP_LOCKED) areas might see a major fault after the mmap syscall
    returns, which is not the case for mlock(). The mmap man page has
    already been altered, but Documentation/vm/unevictable-lru.txt deserves
    a clarification as well.
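
    A minimal userspace illustration of the difference (assumes
    RLIMIT_MEMLOCK permits locking 1MB; error handling trimmed):

        #include <stdio.h>
        #include <sys/mman.h>

        int main(void)
        {
            size_t len = 1 << 20;

            /* mmap() + mlock(): a population failure is reported here,
             * as an error return from mlock(). */
            void *a = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (a != MAP_FAILED && mlock(a, len) != 0)
                perror("mlock");

            /* mmap(MAP_LOCKED): the call may succeed even when
             * population failed, so a later access to b can still
             * take a major fault. */
            void *b = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED,
                           -1, 0);
            if (b == MAP_FAILED)
                perror("mmap(MAP_LOCKED)");
            return 0;
        }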

    Signed-off-by: Michal Hocko
    Reported-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

16 Apr, 2015

1 commit

  • …d the unevictable LRU

    The memory compaction code uses the migration code to do most of the
    work in compaction. However, the compaction code interacts with the
    unevictable LRU differently from the migration code, and this
    difference should be noted in the documentation.

    [akpm@linux-foundation.org: identify /proc/sys/vm/compact_unevictable directly]
    Signed-off-by: Eric B Munson <emunson@akamai.com>
    Cc: Michal Hocko <mhocko@suse.cz>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Mel Gorman <mgorman@suse.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Eric B Munson
     

15 Apr, 2015

1 commit

  • __mlock_vma_pages_range() doesn't necessarily mlock pages. It depends on
    vma flags. The same codepath is used for MAP_POPULATE.

    Let's rename __mlock_vma_pages_range() to populate_vma_page_range().

    This patch also drops mlock_vma_pages_range() references from the
    documentation; that function was removed in cea10a19b797 ("mm: directly
    use __mlock_vma_pages_range() in find_extend_vma()").
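
    For reference, a sketch of the renamed helper's declaration (the
    exact form lives in mm/internal.h of this era):

        extern long populate_vma_page_range(struct vm_area_struct *vma,
                unsigned long start, unsigned long end, int *nonblocking);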

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Linus Torvalds
    Acked-by: David Rientjes
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

09 Oct, 2012

2 commits

  • page_evictable(page, vma) is an irritant: almost all its callers pass
    NULL for vma. Remove the vma arg and use mlocked_vma_newpage(vma, page)
    explicitly in the couple of places it's needed. But in those places we
    don't even need page_evictable() itself! They're dealing with a freshly
    allocated anonymous page, which has no "mapping" and cannot be mlocked yet.
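
    The interface change, schematically:

        /* before */ int page_evictable(struct page *page,
                                        struct vm_area_struct *vma);
        /* after  */ int page_evictable(struct page *page);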

    Signed-off-by: Hugh Dickins
    Acked-by: Mel Gorman
    Cc: Rik van Riel
    Acked-by: Johannes Weiner
    Cc: Michel Lespinasse
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • A long time ago, in v2.4, VM_RESERVED kept the swapout process off a
    VMA. It has since lost that original meaning, but still has some
    effects:

     | effect                 | alternative flags
    -+------------------------+---------------------------------------------
    1| account as reserved_vm | VM_IO
    2| skip in core dump      | VM_IO, VM_DONTDUMP
    3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
    4| do not mlock           | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP

    This patch removes the reserved_vm counter from mm_struct. Nobody seems
    to care about it: it is not exported to userspace directly, and it only
    reduces the total_vm shown in /proc.

    Thus VM_RESERVED can be replaced with VM_IO, or with the pair
    VM_DONTEXPAND | VM_DONTDUMP.

    remap_pfn_range() and io_remap_pfn_range() set VM_IO | VM_DONTEXPAND |
    VM_DONTDUMP. remap_vmalloc_range() sets VM_DONTEXPAND | VM_DONTDUMP.
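
    A sketch of the conversion in a driver's mmap handler; the driver
    name, ops structure and fault-time insertion are made up here for
    illustration:

        static int mydrv_mmap(struct file *file, struct vm_area_struct *vma)
        {
            /* was: vma->vm_flags |= VM_RESERVED; */
            vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
            vma->vm_ops = &mydrv_vm_ops;  /* pages inserted at fault time */
            return 0;
        }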

    [akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
    Signed-off-by: Konstantin Khlebnikov
    Cc: Alexander Viro
    Cc: Carsten Otte
    Cc: Chris Metcalf
    Cc: Cyrill Gorcunov
    Cc: Eric Paris
    Cc: H. Peter Anvin
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: James Morris
    Cc: Jason Baron
    Cc: Kentaro Takeda
    Cc: Matt Helsley
    Cc: Nick Piggin
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Suresh Siddha
    Cc: Tetsuo Handa
    Cc: Venkatesh Pallipadi
    Acked-by: Linus Torvalds
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     

07 Jan, 2009

1 commit

  • An unfortunate feature of the Unevictable LRU work was that reclaiming an
    anonymous page involved an extra scan through the anon_vma: to check that
    the page is evictable before allocating swap, because the swap could not
    be freed reliably soon afterwards.

    Now that try_to_free_swap() has replaced remove_exclusive_swap_page(),
    that's not an issue any more: remove the try_to_munlock() call from
    shrink_page_list(), leaving it to try_to_unmap() to discover if the page
    is one to be culled to the unevictable list - in which case then
    try_to_free_swap().

    Update unevictable-lru.txt to remove comments on the try_to_munlock() in
    shrink_page_list(), and shorten some lines over 80 columns.
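
    The resulting flow in shrink_page_list(), schematically (simplified
    from the vmscan code of this era):

        switch (try_to_unmap(page, TTU_UNMAP)) {
        case SWAP_FAIL:
            goto activate_locked;
        case SWAP_AGAIN:
            goto keep_locked;
        case SWAP_MLOCK:
            goto cull_mlocked;  /* no separate try_to_munlock() scan */
        case SWAP_SUCCESS:
            ;                   /* try to free the page below */
        }
        /* ... */
    cull_mlocked:
        if (PageSwapCache(page))
            try_to_free_swap(page);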

    Signed-off-by: Hugh Dickins
    Cc: Lee Schermerhorn
    Acked-by: Rik van Riel
    Cc: Nick Piggin
    Cc: KAMEZAWA Hiroyuki
    Cc: Robin Holt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
