30 May, 2012

40 commits

  • Update Documentation/vm/transhuge.txt and
    Documentation/filesystems/proc.txt with some information on monitoring
    transparent huge page usage and the associated overhead.
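
    For illustration (the patch itself only touches documentation), a
    minimal user-space reader of one such counter, AnonHugePages in
    /proc/meminfo:

        #include <stdio.h>
        #include <string.h>

        int main(void)
        {
                char line[256];
                FILE *f = fopen("/proc/meminfo", "r");

                if (!f)
                        return 1;
                /* AnonHugePages counts memory backed by THP */
                while (fgets(line, sizeof(line), f))
                        if (strncmp(line, "AnonHugePages:", 14) == 0)
                                fputs(line, stdout);
                fclose(f);
                return 0;
        }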

    Signed-off-by: Mel Gorman
    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • - make pageflag_names[] const

    - remove null termination of pageflag_names[]

    Cc: Johannes Weiner
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • String tables with names of enum items are always prone to go out of
    sync with the enums themselves. Ensure during compile time that the
    name table of page flags has the same size as the page flags enum.
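
    A user-space analogue of the technique (a sketch with a made-up
    enum, not the kernel code): the build breaks as soon as the name
    table and the enum disagree in size.

        #include <stdio.h>

        #define BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))
        #define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

        enum color { RED, GREEN, BLUE, NR_COLORS };
        static const char * const color_names[] = { "red", "green", "blue" };

        int main(void)
        {
                /* fails to compile if color_names[] goes out of sync */
                BUILD_BUG_ON(ARRAY_SIZE(color_names) != NR_COLORS);
                printf("%s\n", color_names[GREEN]);
                return 0;
        }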

    Signed-off-by: Johannes Weiner
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • The array pageflag_names[] converts page flags into their
    corresponding names so that a meaningful representation of each page
    flag can be printed. This mechanism is used while dumping page
    frames. However, the array missed PG_compound_lock, so that flag
    would be printed as a raw number instead of a meaningful string.

    The patch fixes that and prints "compound_lock" for the PG_compound_lock
    page flag.

    Signed-off-by: Gavin Shan
    Acked-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gavin Shan
     
  • It is better to define readahead(2) in mm/readahead.c than in
    mm/filemap.c.

    Signed-off-by: Cong Wang
    Cc: Fengguang Wu
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cong Wang
     
  • It's quite easy for tmpfs to scan the radix_tree to support llseek's new
    SEEK_DATA and SEEK_HOLE options: so add them while the minutiae are still
    on my mind (in particular, the !PageUptodate-ness of pages fallocated but
    still unwritten).

    But I don't know who actually uses SEEK_DATA or SEEK_HOLE, and whether it
    would be of any use to them on tmpfs. This code adds 92 lines and 752
    bytes on x86_64 - is that bloat or worthwhile?
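
    For reference, a minimal sketch of the interface from user space
    (assuming /dev/shm is tmpfs and /dev/shm/f exists):

        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>

        int main(void)
        {
                int fd = open("/dev/shm/f", O_RDONLY);

                if (fd < 0)
                        return 1;
                /* first data and first hole at or after offset 0;
                 * lseek returns -1 (ENXIO) if there is none */
                printf("data: %lld hole: %lld\n",
                       (long long)lseek(fd, 0, SEEK_DATA),
                       (long long)lseek(fd, 0, SEEK_HOLE));
                close(fd);
                return 0;
        }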

    [akpm@linux-foundation.org: fix warning with CONFIG_TMPFS=n]
    Signed-off-by: Hugh Dickins
    Cc: Christoph Hellwig
    Cc: Josef Bacik
    Cc: Andi Kleen
    Cc: Andreas Dilger
    Cc: Dave Chinner
    Cc: Marco Stornelli
    Cc: Jeff liu
    Cc: Chris Mason
    Cc: Sunil Mushran
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • As it stands, a large fallocate() on tmpfs is liable to fill memory with
    pages, freed on failure except when they run into swap, at which point
    they become fixed into the file despite the failure. That feels quite
    wrong, to be consuming resources precisely when they're in short supply.

    Go the other way instead: have shmem_fallocate() indicate to
    shmem_writepage() the range it has fallocated, keeping count of the
    pages it is allocating; have shmem_writepage() reactivate, instead
    of swapping out, pages fallocated by this syscall (while happily
    swapping out those from earlier occasions), keeping count; and have
    shmem_fallocate() compare the counts and give up once the
    reactivated pages have started coming back to writepage
    (approximately: some zones would in fact recycle faster than
    others).

    This is a little unusual, but works well: although we could consider the
    failure to swap as a bug, and fix it later with SWAP_MAP_FALLOC handling
    added in swapfile.c and memcontrol.c, I doubt that we shall ever want to.

    (If there's no swap, an over-large fallocate() on tmpfs is limited in the
    same way as writing: stopped by rlimit, or by tmpfs mount size if that was
    set sensibly, or by __vm_enough_memory() heuristics if OVERCOMMIT_GUESS or
    OVERCOMMIT_NEVER. If OVERCOMMIT_ALWAYS, then it is liable to OOM-kill
    others as writing would, but stops and frees if interrupted.)

    Now that everything is freed on failure, we can then skip updating ctime.

    Signed-off-by: Hugh Dickins
    Cc: Christoph Hellwig
    Cc: Cong Wang
    Cc: Kay Sievers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • In the previous episode, we left the already-fallocated pages attached to
    the file when shmem_fallocate() fails part way through.

    Now try to do better, by extending the earlier optimization of !Uptodate
    pages (then always under page lock) to !Uptodate pages (outside of page
    lock), representing fallocated pages. And don't waste time clearing them
    at the time of fallocate(), leave that until later if necessary.

    Adapt shmem_truncate_range() to shmem_undo_range(), so that a failing
    fallocate can recognize and remove precisely those !Uptodate allocations
    which it added (and were not independently allocated by racing tasks).

    But unless we start playing with swapfile.c and memcontrol.c too, once one
    of our fallocated pages reaches shmem_writepage(), we do then have to
    instantiate it as an ordinarily allocated page, before swapping out. This
    is unsatisfactory, but improved in the next episode.

    Signed-off-by: Hugh Dickins
    Cc: Christoph Hellwig
    Cc: Cong Wang
    Cc: Kay Sievers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • The systemd plumbers expressed a wish that tmpfs support preallocation.
    Cong Wang wrote a patch, but several kernel guys expressed scepticism:
    https://lkml.org/lkml/2011/11/18/137

    Christoph Hellwig: What for exactly? Please explain why preallocating on
    tmpfs would make any sense.

    Kay Sievers: To be able to safely use mmap(), regarding SIGBUS, on files
    on the /dev/shm filesystem. The glibc fallback loop for -ENOSYS [or
    -EOPNOTSUPP] on fallocate is just ugly.

    Hugh Dickins: If tmpfs is going to support
    fallocate(FALLOC_FL_PUNCH_HOLE), it would seem perverse to permit
    the deallocation but fail the allocation.

    Christoph Hellwig: Agreed.

    Now that we do have shmem_fallocate() for hole-punching, plumb in basic
    support for preallocation mode too. It's fairly straightforward (though
    quite a few details needed attention), except for when it fails part way
    through. What a pity that fallocate(2) was not specified to return the
    length allocated, permitting short fallocations!

    As it is, when it fails part way through, we ought to free what has just
    been allocated by this system call; but must be very sure not to free any
    allocated earlier, or any allocated by racing accesses (not all excluded
    by i_mutex).

    But we cannot distinguish them: so in this patch simply leak allocations
    on partial failure (they will be freed later if the file is removed).

    An attractive alternative approach would have been for fallocate() not to
    allocate pages at all, but note reservations by entries in the radix-tree.
    But that would give less assurance, and, critically, would be hard to fit
    with mem cgroups (who owns the reservations?): allocating pages lets
    fallocate() behave in just the same way as write().
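
    A sketch of the newly supported call from user space (the file name
    is assumed): plain preallocation, mode 0, which tmpfs previously
    rejected.

        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <stdio.h>

        int main(void)
        {
                int fd = open("/dev/shm/prealloc", O_RDWR | O_CREAT, 0600);

                if (fd < 0)
                        return 1;
                /* reserve 1MiB of backing pages up front */
                if (fallocate(fd, 0, 0, 1 << 20) != 0)
                        perror("fallocate");
                return 0;
        }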

    Based-on-patch-by: Cong Wang
    Signed-off-by: Hugh Dickins
    Cc: Christoph Hellwig
    Cc: Cong Wang
    Cc: Kay Sievers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Remove vmtruncate_range(), and remove the truncate_range method from
    struct inode_operations: only tmpfs ever supported it, and tmpfs has now
    converted over to using the fallocate method of file_operations.

    Update Documentation accordingly, adding (setlease and) fallocate lines.
    And while we're in mm.h, remove duplicate declarations of shmem_lock() and
    shmem_file_setup(): everyone is now using the ones in shmem_fs.h.

    Based-on-patch-by: Cong Wang
    Signed-off-by: Hugh Dickins
    Cc: Christoph Hellwig
    Cc: Cong Wang
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Now that tmpfs supports hole-punching via fallocate(), switch
    madvise_remove() to use do_fallocate() instead of
    vmtruncate_range(), which extends madvise(,,MADV_REMOVE) support
    from tmpfs to ext4, ocfs2 and xfs.

    There is one more user of vmtruncate_range() in our tree,
    staging/android's ashmem_shrink(): convert it to use do_fallocate() too
    (but if its unpinned areas are already unmapped - I don't know - then it
    would do better to use shmem_truncate_range() directly).
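
    A sketch of the user-visible effect (the path is assumed; with this
    patch, "f" may live on ext4, ocfs2 or xfs, not just tmpfs):

        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <sys/mman.h>
        #include <unistd.h>

        int main(void)
        {
                int fd = open("/mnt/test/f", O_RDWR | O_CREAT, 0600);
                char *p;

                if (fd < 0 || ftruncate(fd, 4096) != 0)
                        return 1;
                p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
                if (p == MAP_FAILED)
                        return 1;
                /* drop the backing store for this page-aligned range */
                return madvise(p, 4096, MADV_REMOVE) ? 1 : 0;
        }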

    Based-on-patch-by: Cong Wang
    Signed-off-by: Hugh Dickins
    Cc: Christoph Hellwig
    Cc: Al Viro
    Cc: Colin Cross
    Cc: John Stultz
    Cc: Greg Kroah-Hartman
    Cc: "Theodore Ts'o"
    Cc: Andreas Dilger
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Dave Chinner
    Cc: Ben Myers
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • tmpfs has supported hole-punching since 2.6.16, via
    madvise(,,MADV_REMOVE).

    But nowadays fallocate(,FALLOC_FL_PUNCH_HOLE|FALLOC_FL_KEEP_SIZE,,) is
    the agreed way to punch holes.

    So add shmem_fallocate() to support that, and tweak shmem_truncate_range()
    to support partial pages at both the beginning and end of range (never
    needed for madvise, which demands rounded addr and rounds up length).
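
    The agreed interface, as a sketch (the file name is assumed):

        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <unistd.h>

        int main(void)
        {
                int fd = open("/dev/shm/f", O_RDWR | O_CREAT, 0600);

                if (fd < 0 || ftruncate(fd, 8192) != 0)
                        return 1;
                /* deallocate the second page; file size is unchanged */
                return fallocate(fd, FALLOC_FL_PUNCH_HOLE |
                                 FALLOC_FL_KEEP_SIZE, 4096, 4096) ? 1 : 0;
        }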

    Based-on-patch-by: Cong Wang
    Signed-off-by: Hugh Dickins
    Cc: Christoph Hellwig
    Cc: Cong Wang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Nick proposed years ago that tmpfs should avoid clearing its pages where
    write will overwrite them with new data, as ramfs has long done. But I
    messed it up and just got bad data. Tried again recently, it works
    fine.

    Here's time output for writing 4GiB 16 times on this Core i5 laptop:

    before: real 0m21.169s user 0m0.028s sys 0m21.057s
    real 0m21.382s user 0m0.016s sys 0m21.289s
    real 0m21.311s user 0m0.020s sys 0m21.217s

    after: real 0m18.273s user 0m0.032s sys 0m18.165s
    real 0m18.354s user 0m0.020s sys 0m18.265s
    real 0m18.440s user 0m0.032s sys 0m18.337s

    ramfs: real 0m16.860s user 0m0.028s sys 0m16.765s
    real 0m17.382s user 0m0.040s sys 0m17.273s
    real 0m17.133s user 0m0.044s sys 0m17.021s

    Yes, I have done perf reports, but they need more explanation than they
    deserve: in summary, clear_page vanishes, its cache loading shifts into
    copy_user_generic_unrolled; shmem_getpage_gfp goes down, and
    surprisingly mark_page_accessed goes way up - I think because they are
    respectively where the cache gets to be reloaded after being purged by
    clear or copy.

    Suggested-by: Nick Piggin
    Signed-off-by: Hugh Dickins
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Let tmpfs into the NOSEC optimization (avoiding file_remove_suid()
    overhead on most common writes): set MS_NOSEC on its superblocks.
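
    The change itself is essentially one line in tmpfs's superblock
    setup (a sketch; placement in shmem_fill_super() is an assumption):

        sb->s_flags |= MS_NOSEC;   /* skip file_remove_suid() checks */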

    Signed-off-by: Hugh Dickins
    Cc: Christoph Hellwig
    Cc: Andi Kleen
    Cc: Al Viro
    Cc: Cong Wang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • The GMA500 GPU driver uses GEM shmem objects, but with a new twist: the
    backing RAM has to be below 4GB. Not a problem while the boards
    supported only 4GB: but now Intel's D2700MUD boards support 8GB, and
    their GMA3600 is managed by the GMA500 driver.

    shmem/tmpfs has never pretended to support hardware restrictions on the
    backing memory, but it might have appeared to do so before v3.1, and
    even now it works fine until a page is swapped out then back in. When
    read_cache_page_gfp() supplied a freshly allocated page for copy, that
    compensated for whatever choice might have been made by earlier swapin
    readahead; but swapoff was likely to destroy the illusion.

    We'd like to continue to support GMA500, so now add a new
    shmem_should_replace_page() check on the zone when about to move a page
    from swapcache to filecache (in swapin and swapoff cases), with
    shmem_replace_page() to allocate and substitute a suitable page (given
    gma500/gem.c's mapping_set_gfp_mask GFP_KERNEL | __GFP_DMA32).
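
    A sketch of what the new check might look like (the exact form in
    the patch may differ): a swapcache page must be replaced when its
    zone is higher than the mapping's gfp mask allows.

        static bool shmem_should_replace_page(struct page *page, gfp_t gfp)
        {
                /* e.g. a page above 4GB for a __GFP_DMA32 mapping */
                return page_zonenum(page) > gfp_zone(gfp);
        }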

    This does involve a minor extension to mem_cgroup_replace_page_cache()
    (the page may or may not have already been charged); and I've removed a
    comment and call to mem_cgroup_uncharge_cache_page(), which in fact is
    always a no-op while PageSwapCache.

    Also removed optimization of an unlikely path in shmem_getpage_gfp(),
    now that we need to check PageSwapCache more carefully (a racing caller
    might already have made the copy). And at one point shmem_unuse_inode()
    needs to use the hitherto private page_swapcount(), to guard against
    racing with inode eviction.

    It would make sense to extend shmem_should_replace_page(), to cover
    cpuset and NUMA mempolicy restrictions too, but set that aside for now:
    needs a cleanup of shmem mempolicy handling, and more testing, and ought
    to handle swap faults in do_swap_page() as well as shmem.

    Signed-off-by: Hugh Dickins
    Cc: Christoph Hellwig
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Alan Cox
    Cc: Stephane Marchesin
    Cc: Andi Kleen
    Cc: Dave Airlie
    Cc: Daniel Vetter
    Cc: Rob Clark
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • When MIGRATE_UNMOVABLE pages are freed from a MIGRATE_UNMOVABLE
    pageblock (and some MIGRATE_MOVABLE pages are left in it), waiting
    until an allocation takes ownership of the block may take too long.
    The type of the pageblock remains unchanged, so the pageblock cannot
    be used as a migration target during compaction.

    Fix it by:

    * Adding enum compact_mode (COMPACT_ASYNC_[MOVABLE,UNMOVABLE] and
    COMPACT_SYNC; sketched after this list) and converting the sync
    field in struct compact_control to use it.

    * Adding a nr_pageblocks_skipped field to struct compact_control and
    tracking how many destination pageblocks were of MIGRATE_UNMOVABLE
    type. If COMPACT_ASYNC_MOVABLE mode compaction ran fully in
    try_to_compact_pages() (COMPACT_COMPLETE), it implies that there is
    no suitable page for the allocation. In that case, check whether
    there were enough MIGRATE_UNMOVABLE pageblocks to justify a second
    pass in COMPACT_ASYNC_UNMOVABLE mode.

    * Scanning the MIGRATE_UNMOVABLE pageblocks (during the COMPACT_SYNC
    and COMPACT_ASYNC_UNMOVABLE compaction modes) and counting pages
    that are PageBuddy, have page_count(page) == 0, or are PageLRU. If
    all pages within a MIGRATE_UNMOVABLE pageblock are in one of those
    three sets, change the whole pageblock's type to MIGRATE_MOVABLE.
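
    The new modes, as named above (a sketch; the exact definition in the
    patch may differ):

        enum compact_mode {
                COMPACT_ASYNC_MOVABLE,
                COMPACT_ASYNC_UNMOVABLE,
                COMPACT_SYNC,
        };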

    My particular test case (on an ARM EXYNOS4 device with 512 MiB, which means
    131072 standard 4KiB pages in 'Normal' zone) is to:

    - allocate 120000 pages for kernel's usage
    - free every second page (60000 pages) of memory just allocated
    - allocate and use 60000 pages from user space
    - free remaining 60000 pages of kernel memory
    (now we have fragmented memory occupied mostly by user space pages)
    - try to allocate 100 order-9 (2048 KiB) pages for kernel's usage

    The results:
    - with compaction disabled I get 11 successful allocations
    - with compaction enabled - 14 successful allocations
    - with this patch I'm able to get all 100 successful allocations

    NOTE: If we can make kswapd aware of order-0 requests during
    compaction, we can enhance it by switching to COMPACT_ASYNC_FULL
    mode (COMPACT_ASYNC_MOVABLE + COMPACT_ASYNC_UNMOVABLE). Please see
    the following thread:

    http://marc.info/?l=linux-mm&m=133552069417068&w=2

    [minchan@kernel.org: minor cleanups]
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Marek Szyprowski
    Signed-off-by: Bartlomiej Zolnierkiewicz
    Signed-off-by: Kyungmin Park
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bartlomiej Zolnierkiewicz
     
  • alloc_bootmem_section() derives allocation area constraints from the
    specified sparsemem section. This is a bit specific for a generic memory
    allocator like bootmem, though, so move it over to sparsemem.

    As __alloc_bootmem_node_nopanic() already retries failed allocations with
    relaxed area constraints, the fallback code in sparsemem.c can be removed
    and the code becomes a bit more compact overall.

    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Pass the node descriptor instead of the more specific bootmem node
    descriptor down the call stack, like nobootmem does; there is no
    good reason for the two to be different.

    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • While the panicking node-specific allocation function tries to satisfy
    node+goal, goal, node, anywhere, the non-panicking function still does
    node+goal, goal, anywhere.

    Make it simpler: define the panicking version in terms of the non-panicking
    one, like the node-agnostic interface, so they always behave the same way
    apart from how to deal with allocation failure.

    Signed-off-by: Johannes Weiner
    Acked-by: Yinghai Lu
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • __alloc_bootmem_node and __alloc_bootmem_low_node documentation claims
    the functions panic on allocation failure. Do it.

    Signed-off-by: Johannes Weiner
    Acked-by: Yinghai Lu
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • While the panicking node-specific allocation function tries to satisfy
    node+goal, goal, node, anywhere, the non-panicking function still does
    node+goal, goal, anywhere.

    Make it simpler: define the panicking version in terms of the
    non-panicking one, like the node-agnostic interface, so they always behave
    the same way apart from how to deal with allocation failure.

    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Match the nobootmem version of __alloc_bootmem_node. Try to satisfy both
    the node and the goal, then just the goal, then just the node, then
    allocate anywhere before panicking.

    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Matching the desired goal to the right node is one thing; dropping the
    goal when it cannot be satisfied is another. Split this into separate
    functions so that subsequent patches can use the node-finding but drop and
    handle the goal fallback on their own terms.

    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Callsites need to provide a bootmem_data_t *; make the naming more
    descriptive.

    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • When bootmem releases an unaligned chunk of BITS_PER_LONG pages to
    the page allocator, it checks the bitmap for pages in the chunk that
    are still unreserved (set bits), but it also checks whether the
    offset into the chunk already indicates BITS_PER_LONG loop
    iterations.

    But since the consulted bitmap is only a one-word excerpt of the
    full per-node bitmap, there cannot be more than BITS_PER_LONG bits
    set in it. The additional offset check is unnecessary.

    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • When bootmem releases an unaligned chunk of memory at the beginning of a
    node to the page allocator, it iterates from that unaligned PFN but
    checks an aligned word of the page bitmap. The checked bits do not
    correspond to the PFNs and, as a result, reserved pages can be freed.

    Properly shift the bitmap word so that the lowest bit corresponds to the
    starting PFN before entering the freeing loop.
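
    A sketch of the fix (variable names assumed):

        /* one word of the bootmem bitmap, covering BITS_PER_LONG pages */
        unsigned long vec = ~map[idx / BITS_PER_LONG];

        /* align the word with the unaligned start, so that bit 0
         * corresponds to the page at PFN 'start' */
        vec >>= start & (BITS_PER_LONG - 1);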

    This bug has been around since commit 41546c17418f ("bootmem: clean up
    free_all_bootmem_core") (2.6.27) without known reports.

    Signed-off-by: Gavin Shan
    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gavin Shan
     
  • set_pageblock_order() has always been broken: one version takes an
    unsigned int and the other version takes no arguments. This bug was
    hidden because one version of set_pageblock_order() was a macro
    which doesn't evaluate its argument.

    Simplify it all and remove pageblock_default_order() altogether.

    Reported-by: rajman mekaco
    Cc: Mel Gorman
    Cc: KAMEZAWA Hiroyuki
    Cc: Tejun Heo
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • When transparent_hugepage_enabled() is used outside mm/, such as in
    arch/x86/mm/tlb.c:

    + if (!cpu_has_invlpg || vma->vm_flags & VM_HUGETLB
    + || transparent_hugepage_enabled(vma)) {
    + flush_tlb_mm(vma->vm_mm);

    is_vma_temporary_stack() isn't declared in huge_mm.h, so this fails
    to compile:

    arch/x86/mm/tlb.c: In function `flush_tlb_range':
    arch/x86/mm/tlb.c:324:4: error: implicit declaration of function `is_vma_temporary_stack' [-Werror=implicit-function-declaration]

    Since is_vma_temporary_stack() is used only in rmap.c and
    huge_memory.c, it is better to move its declaration from rmap.h to
    huge_mm.h to avoid such errors.

    Signed-off-by: Alex Shi
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Shi
     
  • Compiling page-types.c with a recent compiler produces many
    warnings, mostly related to signed/unsigned comparisons. This patch
    cleans up most of them.

    One remaining warning is about an unused parameter. The file
    doesn't define a __unused macro (or the like) yet. This can be addressed
    later.

    Signed-off-by: Ulrich Drepper
    Acked-by: KOSAKI Motohiro
    Acked-by: Fengguang Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Drepper
     
  • Programs using /proc/kpageflags need to know about the various
    flags. The <linux/kernel-page-flags.h> header provides them, and the
    comments in the file indicate that it is supposed to be used by
    user-level code. But the file is not installed.

    Install the header and mark the unstable flags as out-of-bounds.
    The page-types tool is also adjusted to not duplicate the
    definitions.

    Signed-off-by: Ulrich Drepper
    Acked-by: KOSAKI Motohiro
    Acked-by: Fengguang Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Drepper
     
  • Print physical address info in a style consistent with the %pR style used
    elsewhere in the kernel. For example:

    -Zone PFN ranges:
    +Zone ranges:
    - DMA32 0x00000010 -> 0x00100000
    + DMA32 [mem 0x00010000-0xffffffff]
    - Normal 0x00100000 -> 0x01080000
    + Normal [mem 0x100000000-0x107fffffff]

    Signed-off-by: Bjorn Helgaas
    Cc: Yinghai Lu
    Cc: Konrad Rzeszutek Wilk
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bjorn Helgaas
     
  • Print swiotlb info in a style consistent with the %pR style used elsewhere
    in the kernel. For example:

    -Placing 64MB software IO TLB between ffff88007a662000 - ffff88007e662000
    -software IO TLB at phys 0x7a662000 - 0x7e662000
    +software IO TLB [mem 0x7a662000-0x7e661fff] (64MB) mapped at [ffff88007a662000-ffff88007e661fff]

    Signed-off-by: Bjorn Helgaas
    Cc: Yinghai Lu
    Cc: Konrad Rzeszutek Wilk
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bjorn Helgaas
     
  • Print physical address info in a style consistent with the %pR style used
    elsewhere in the kernel. For example:

    -found SMP MP-table at [ffff8800000fce90] fce90
    +found SMP MP-table at [mem 0x000fce90-0x000fce9f] mapped at [ffff8800000fce90]
    -initial memory mapped : 0 - 20000000
    +initial memory mapped: [mem 0x00000000-0x1fffffff]
    -Base memory trampoline at [ffff88000009c000] 9c000 size 8192
    +Base memory trampoline [mem 0x0009c000-0x0009dfff] mapped at [ffff88000009c000]
    -SRAT: Node 0 PXM 0 0-80000000
    +SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]

    Signed-off-by: Bjorn Helgaas
    Cc: Yinghai Lu
    Cc: Konrad Rzeszutek Wilk
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bjorn Helgaas
     
  • Print physical address info in a style consistent with the %pR style used
    elsewhere in the kernel. For example:

    -BIOS-provided physical RAM map:
    +e820: BIOS-provided physical RAM map:
    - BIOS-e820: 0000000000000100 - 000000000009e000 (usable)
    +BIOS-e820: [mem 0x0000000000000100-0x000000000009dfff] usable
    -Allocating PCI resources starting at 90000000 (gap: 90000000:6ed1c000)
    +e820: [mem 0x90000000-0xfed1bfff] available for PCI devices
    -reserve RAM buffer: 000000000009e000 - 000000000009ffff
    +e820: reserve RAM buffer [mem 0x0009e000-0x0009ffff]

    Signed-off-by: Bjorn Helgaas
    Cc: Yinghai Lu
    Cc: Konrad Rzeszutek Wilk
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bjorn Helgaas
     
  • Even with CONFIG_DEBUG_VM=n, gcc generates code for some VM_BUG_ON()
    invocations.

    For example, VM_BUG_ON(!PageCompound(page) || !PageHead(page)); in
    do_huge_pmd_wp_page() generates 114 bytes of code.

    But it mostly disappears when I split this VM_BUG_ON into two:

    -VM_BUG_ON(!PageCompound(page) || !PageHead(page));
    +VM_BUG_ON(!PageCompound(page));
    +VM_BUG_ON(!PageHead(page));

    Weird... but anyway, after this patch the code disappears completely.

    add/remove: 0/0 grow/shrink: 7/97 up/down: 135/-1784 (-1649)

    Signed-off-by: Konstantin Khlebnikov
    Cc: Linus Torvalds
    Cc: Geert Uytterhoeven
    Cc: "H. Peter Anvin"
    Cc: Cong Wang
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • Sometimes we want to check an expression's correctness at compile
    time. "(void)(e);" or "if (e);" can be dangerous if the expression
    has side-effects, and gcc sometimes generates a lot of code even
    when the expression has no effect.

    This patch introduces the macro BUILD_BUG_ON_INVALID() for such
    checks: it forces a compilation error if the expression is invalid,
    without generating any extra code.

    [Cast to "long" required because sizeof does not work for bit-fields.]
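
    A sketch of such a macro (modulo kernel annotations): the operand is
    examined only by sizeof(), so it is never evaluated and no code is
    generated, but an invalid expression still breaks the build.

        #define BUILD_BUG_ON_INVALID(e) ((void)(sizeof((long)(e))))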

    Signed-off-by: Konstantin Khlebnikov
    Cc: Linus Torvalds
    Cc: Geert Uytterhoeven
    Cc: "H. Peter Anvin"
    Cc: Cong Wang
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • Add a Kconfig option to allow people who don't want cross memory attach to
    not have it included in their build.

    Signed-off-by: Chris Yeoh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christopher Yeoh
     
  • The hierarchical versions of the per-memcg counters in memory.stat
    are all calculated the same way and all carry the total_ prefix.

    Documenting the pattern is easier for maintenance than listing each
    counter twice.

    Signed-off-by: Johannes Weiner
    Acked-by: Michal Hocko
    Acked-by: KOSAKI Motohiro
    Acked-by: Ying Han
    Randy Dunlap
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • mm->page_table_lock is hotly contended in page fault tests, and it
    need not be held across mem_cgroup_uncharge_page() in
    do_huge_pmd_wp_page().

    Signed-off-by: David Rientjes
    Cc: KAMEZAWA Hiroyuki
    Cc: Andrea Arcangeli
    Acked-by: Johannes Weiner
    Reviewed-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Andrew pointed out that is_mlocked_vma() is misnamed: a function
    with a name like that would be expected to return a bool and have no
    side-effects.

    Since it is called on the fault path for a new page, rename it in
    this patch.

    Signed-off-by: Ying Han
    Reviewed-by: Rik van Riel
    Acked-by: KOSAKI Motohiro
    Acked-by: KAMEZAWA Hiroyuki
    Reviewed-by: Minchan Kim
    [akpm@linux-foundation.org: s/mlock_vma_newpage/mlocked_vma_newpage/, per Minchan]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ying Han