24 Mar, 2011

18 commits

  • This set of patches eliminates the RapidIO subsystem's dependency on the
    PowerPC architecture and makes it available to other architectures (x86
    and MIPS). It also enables support for new platform-independent RapidIO
    controllers such as PCI-to-SRIO and PCI Express-to-SRIO.

    This patch:

    Extend the set of mport callback functions to eliminate direct linking
    of architecture-specific mport operations.
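
    A sketch of the direction (illustrative field names, not the exact
    rio_ops definition): the RapidIO core calls through a per-mport
    operations table supplied by the controller driver, instead of linking
    directly to architecture-specific functions.

    #include <linux/types.h>

    struct rio_mport;

    /* Illustrative per-mport operations table. */
    struct rio_ops {
            /* local config space access through this mport */
            int (*lcread)(struct rio_mport *mport, int index,
                          u32 offset, int len, u32 *data);
            int (*lcwrite)(struct rio_mport *mport, int index,
                           u32 offset, int len, u32 data);
            /* mailbox callbacks of the kind this series adds */
            int (*open_outb_mbox)(struct rio_mport *mport, void *dev_id,
                                  int mbox, int entries);
            void (*close_outb_mbox)(struct rio_mport *mport, int mbox);
    };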

    Signed-off-by: Alexandre Bounine
    Cc: Kumar Gala
    Cc: Matt Porter
    Cc: Li Yang
    Cc: Thomas Moll
    Cc: Micha Nelissen
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Bounine
     
  • 1. namelen is declared "unsigned short", which hints at possible space
    savings. Indeed, in 2.4 struct proc_dir_entry looked like:

    struct proc_dir_entry {
            unsigned short low_ino;
            unsigned short namelen;

    Now low_ino is "unsigned int", so those savings have been gone for a
    long time. struct proc_dir_entry is not allocated in such numbers that
    its size is worth worrying about, anyway.

    2. Converting from unsigned short to int/unsigned int can only create
    problems, so we had better play it safe.

    Space is not really conserved, because of the natural alignment of the
    next field: sizeof(struct proc_dir_entry) remains the same.
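
    A standalone illustration of the alignment point (a minimal userspace
    sketch, not the real proc_dir_entry): widening a leading unsigned short
    to int does not change the struct size, because padding before the
    naturally aligned next member absorbs the difference.

    #include <stdio.h>

    struct before { unsigned short namelen; void *next; }; /* 2 + 6 pad + 8 */
    struct after  { int namelen;            void *next; }; /* 4 + 4 pad + 8 */

    int main(void)
    {
            /* both print 16 on a typical LP64 system */
            printf("%zu %zu\n", sizeof(struct before), sizeof(struct after));
            return 0;
    }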

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • We never uncharge subpage quantities.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • In struct page_cgroup, we have a full word for flags but only a few are
    reserved. Use the remaining upper bits to encode, depending on
    configuration, the node or the section, to enable page_cgroup-to-page
    lookups without a direct pointer.

    This saves a full word for every page in a system with memory cgroups
    enabled.
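
    Roughly the trick, as a standalone sketch; the bit width and shift here
    are assumptions (the kernel derives them from NODES_SHIFT or the
    sparsemem section bits, depending on configuration).

    #include <stdio.h>

    #define NODE_BITS   6                               /* assumed width */
    #define NODE_SHIFT  (sizeof(long) * 8 - NODE_BITS)  /* upper bits */

    static unsigned long flags_set_node(unsigned long flags, int nid)
    {
            return (flags & ~(~0UL << NODE_SHIFT)) |
                   ((unsigned long)nid << NODE_SHIFT);
    }

    static int flags_to_node(unsigned long flags)
    {
            return flags >> NODE_SHIFT;
    }

    int main(void)
    {
            unsigned long flags = 0x3;      /* low bits: the real flags */

            flags = flags_set_node(flags, 5);
            printf("node=%d low flags=%#lx\n",
                   flags_to_node(flags), flags & 0x3);
            return 0;
    }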

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • It is one logical function, no need to have it split up.

    Also, get rid of some checks from the inner function that ensured the
    sanity of the outer function.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Instead of passing a whole struct page_cgroup to this function, let it
    take only what it really needs from it: the struct mem_cgroup and the
    page.

    This has the advantage that reading pc->mem_cgroup is now done at the same
    place where the ordering rules for this pointer are enforced and
    explained.

    It is also in preparation for removing the pc->page backpointer.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Add checks at page allocation and freeing time for whether the page is
    used (i.e. charged) from the memcg point of view.

    This check can be useful when debugging a problem, and we did similar
    checks before commit 52d4b9ac ("memcg: allocate all page_cgroup at
    boot").

    This adds some overhead to allocating and freeing memory, so it's
    enabled only when CONFIG_DEBUG_VM is enabled.
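
    A hypothetical shape of such a check (a sketch only; PageCgroupUsed()
    and lookup_page_cgroup() are the memcg helpers of this era, but the
    exact hook points are the patch's business):

    #ifdef CONFIG_DEBUG_VM
            /* freeing a page that is still charged indicates a memcg bug */
            VM_BUG_ON(PageCgroupUsed(lookup_page_cgroup(page)));
    #endif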

    Signed-off-by: Daisuke Nishimura
    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
     
  • Since the introduction of transparent huge pages, checking whether
    memory cgroups are below their limits is no longer enough: the actual
    amount of chargeable space matters.

    To not have more than one limit-checking interface, replace
    memory_cgroup_check_under_limit() and memory_cgroup_check_margin() with a
    single memory_cgroup_margin() that returns the chargeable space and leaves
    the comparison to the callsite.

    Soft limits are now checked the other way round, by using the already
    existing function that returns the amount by which soft limits are
    exceeded: res_counter_soft_limit_excess().

    Also remove all the corresponding functions on the res_counter side that
    are now no longer used.
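
    A sketch of the new shape of the interface (simplified; the real
    res_counter version takes the counter's lock):

    /* chargeable space left, in bytes, instead of a yes/no answer */
    static unsigned long long res_counter_margin(struct res_counter *cnt)
    {
            unsigned long long margin = 0;

            if (cnt->limit > cnt->usage)
                    margin = cnt->limit - cnt->usage;
            return margin;
    }

    /* the call site decides what "enough" means, e.g. for a huge page: */
    if (mem_cgroup_margin(mem) < HPAGE_SIZE)
            goto reclaim;   /* not enough chargeable space (sketch label) */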

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Acked-by: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Soft limit reclaim continues until the usage is below the current soft
    limit, but the documented semantics are actually that soft limit reclaim
    will push usage back until the soft limits are met again.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Acked-by: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • minix bit operations are only used by the minix filesystem and are
    useless to other modules, because the byte order of the inode and block
    bitmaps differs between architectures, as shown below:

    m68k:
    big-endian 16bit indexed bitmaps

    h8300, microblaze, s390, sparc, m68knommu:
    big-endian 32 or 64bit indexed bitmaps

    m32r, mips, sh, xtensa:
    big-endian 32 or 64bit indexed bitmaps for big-endian mode
    little-endian bitmaps for little-endian mode

    Others:
    little-endian bitmaps

    In order to move minix bit operations from asm/bitops.h to architecture
    independent code in minix filesystem, this provides two config options.

    CONFIG_MINIX_FS_BIG_ENDIAN_16BIT_INDEXED is only selected by m68k.
    CONFIG_MINIX_FS_NATIVE_ENDIAN is selected by the architectures which use
    native byte order bitmaps (h8300, microblaze, s390, sparc, m68knommu,
    m32r, mips, sh, xtensa). The architectures which always use little-endian
    bitmaps do not select these options.

    Finally, we can remove minix bit operations from asm/bitops.h for all
    architectures.
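
    The m68k case illustrates why one generic implementation cannot work:
    its minix bitmaps are arrays of big-endian 16-bit words with bits
    counted from each word's LSB (our assumption of the counting order
    here), so bit n lives in a different byte than in a little-endian
    bitmap. A sketch, not the kernel's macros:

    #include <endian.h>     /* glibc's be16toh() */
    #include <stdint.h>

    /* big-endian 16-bit indexed, as on m68k */
    static int minix_test_bit_be16(int nr, const void *addr)
    {
            const uint16_t *p = addr;

            return (be16toh(p[nr >> 4]) >> (nr & 15)) & 1;
    }

    /* little-endian bitmap, as on most other architectures */
    static int test_bit_le(int nr, const void *addr)
    {
            const uint8_t *p = addr;

            return (p[nr >> 3] >> (nr & 7)) & 1;
    }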

    Signed-off-by: Akinobu Mita
    Acked-by: Arnd Bergmann
    Acked-by: Greg Ungerer
    Cc: Geert Uytterhoeven
    Cc: Roman Zippel
    Cc: Andreas Schwab
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Yoshinori Sato
    Cc: Michal Simek
    Cc: "David S. Miller"
    Cc: Hirokazu Takata
    Acked-by: Ralf Baechle
    Acked-by: Paul Mundt
    Cc: Chris Zankel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • As the result of conversions, there are no users of ext2 non-atomic bit
    operations except for ext2 filesystem itself. Now we can put them into
    architecture independent code in ext2 filesystem, and remove from
    asm/bitops.h for all architectures.

    Signed-off-by: Akinobu Mita
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • As a preparation for removing ext2 non-atomic bit operations from
    asm/bitops.h, this converts ext2 non-atomic bit operations to
    little-endian bit operations.

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • As a preparation for removing ext2 non-atomic bit operations from
    asm/bitops.h, this converts ext2 non-atomic bit operations to
    little-endian bit operations.

    Signed-off-by: Akinobu Mita
    Acked-by: Jan Kara
    Cc: Andreas Dilger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • As a preparation for removing ext2 non-atomic bit operations from
    asm/bitops.h, this converts ext2 non-atomic bit operations to
    little-endian bit operations.

    Signed-off-by: Akinobu Mita
    Acked-by: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Introduce little-endian bit operations for the little-endian
    architectures and for the big-endian architectures that do not have
    native little-endian bit operations (alpha, avr32, blackfin, cris, frv,
    h8300, ia64, m32r, mips, mn10300, parisc, sh, sparc, tile, x86, xtensa).

    These architectures can just include generic implementation
    (asm-generic/bitops/le.h).
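
    On little-endian machines these map straight to the native bit
    operations; on big-endian machines the generic header flips the bit
    number within each word, roughly (simplified from
    asm-generic/bitops/le.h):

    #if defined(__LITTLE_ENDIAN)
    #define BITOP_LE_SWIZZLE        0
    #elif defined(__BIG_ENDIAN)
    #define BITOP_LE_SWIZZLE        ((BITS_PER_LONG - 1) & ~0x7)
    #endif

    #define test_bit_le(nr, addr) \
            test_bit((nr) ^ BITOP_LE_SWIZZLE, (addr))
    #define __set_bit_le(nr, addr) \
            __set_bit((nr) ^ BITOP_LE_SWIZZLE, (addr))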

    Signed-off-by: Akinobu Mita
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Mikael Starvik
    Cc: David Howells
    Cc: Yoshinori Sato
    Cc: "Luck, Tony"
    Cc: Ralf Baechle
    Cc: Kyle McMartin
    Cc: Matthew Wilcox
    Cc: Grant Grundler
    Cc: Paul Mundt
    Cc: Kazumoto Kojima
    Cc: Hirokazu Takata
    Cc: "David S. Miller"
    Cc: Chris Zankel
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Acked-by: Hans-Christian Egtvedt
    Acked-by: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • This makes the little-endian bitops take any pointer types by changing the
    prototypes and adding casts in the preprocessor macros.

    That would seem to at least make all the filesystem code happier, and they
    can continue to do just something like

    #define ext2_set_bit __test_and_set_bit_le

    (or whatever the exact sequence ends up being).
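
    Schematically, the any-pointer-type behavior comes from casting inside
    the macro rather than requiring callers to pass unsigned long * (a
    sketch of the pattern, not the final definitions):

    #define test_bit_le(nr, addr) \
            test_bit((nr) ^ BITOP_LE_SWIZZLE, (const unsigned long *)(addr))
    #define __test_and_set_bit_le(nr, addr) \
            __test_and_set_bit((nr) ^ BITOP_LE_SWIZZLE, (unsigned long *)(addr))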

    Signed-off-by: Akinobu Mita
    Cc: Arnd Bergmann
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Mikael Starvik
    Cc: David Howells
    Cc: Yoshinori Sato
    Cc: "Luck, Tony"
    Cc: Ralf Baechle
    Cc: Kyle McMartin
    Cc: Matthew Wilcox
    Cc: Grant Grundler
    Cc: Paul Mundt
    Cc: Kazumoto Kojima
    Cc: Hirokazu Takata
    Cc: "David S. Miller"
    Cc: Chris Zankel
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Hans-Christian Egtvedt
    Cc: "H. Peter Anvin"
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • As a preparation for providing little-endian bitops for all
    architectures, this renames the generic implementations of the
    little-endian bitops (dropping the "generic_" prefix and moving "_le"
    to the end):

    s/generic_find_next_le_bit/find_next_bit_le/
    s/generic_find_next_zero_le_bit/find_next_zero_bit_le/
    s/generic_find_first_zero_le_bit/find_first_zero_bit_le/
    s/generic___test_and_set_le_bit/__test_and_set_bit_le/
    s/generic___test_and_clear_le_bit/__test_and_clear_bit_le/
    s/generic_test_le_bit/test_bit_le/
    s/generic___set_le_bit/__set_bit_le/
    s/generic___clear_le_bit/__clear_bit_le/
    s/generic_test_and_set_le_bit/test_and_set_bit_le/
    s/generic_test_and_clear_le_bit/test_and_clear_bit_le/

    Signed-off-by: Akinobu Mita
    Acked-by: Arnd Bergmann
    Acked-by: Hans-Christian Egtvedt
    Cc: Geert Uytterhoeven
    Cc: Roman Zippel
    Cc: Andreas Schwab
    Cc: Greg Ungerer
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • This patch series introduces little-endian bit operations in asm/bitops.h
    for all architectures and converts all ext2 non-atomic and minix bit
    operations to use little-endian bit operations. It enables us to remove
    ext2 non-atomic and minix bit operations from asm/bitops.h. The reason
    they should be removed from asm/bitops.h is as follows:

    For the ext2 non-atomic bit operations: they are used for little-endian
    byte order bitmap access by some filesystems and modules, but using
    ext2_*() functions in a module other than the ext2 filesystem strikes
    some as strange.

    For the minix bit operations: they are only used by the minix
    filesystem and are useless to other modules, because the byte order of
    the inode and block bitmaps differs between architectures.

    This patch:

    In order to make the forthcoming changes smaller, this merges the macro
    definitions in asm-generic/bitops/le.h for big-endian and little-endian
    as much as possible.

    This also removes the unused BITOP_WORD macro.

    Signed-off-by: Akinobu Mita
    Acked-by: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

23 Mar, 2011

22 commits

  • Commit 34db18a054c6 ("smp: move smp setup functions to kernel/smp.c")
    causes this build error on s390 because of a missing init.h include:

    CC arch/s390/kernel/asm-offsets.s
    In file included from /home2/heicarst/linux-2.6/arch/s390/include/asm/spinlock.h:14:0,
    from include/linux/spinlock.h:87,
    from include/linux/seqlock.h:29,
    from include/linux/time.h:8,
    from include/linux/timex.h:56,
    from include/linux/sched.h:57,
    from arch/s390/kernel/asm-offsets.c:10:
    include/linux/smp.h:117:20: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'setup_nr_cpu_ids'
    include/linux/smp.h:118:20: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'smp_init'

    Fix it by adding the include statement.
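
    The declarations at the error lines use the __init marker, which is
    defined in linux/init.h; a sketch of the affected lines in
    include/linux/smp.h:

    #include <linux/init.h>         /* previously missing */

    void __init setup_nr_cpu_ids(void);
    void __init smp_init(void);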

    Signed-off-by: Heiko Carstens
    Acked-by: WANG Cong
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     
  • * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (66 commits)
    avr32: at32ap700x: fix typo in DMA master configuration
    dmaengine/dmatest: Pass timeout via module params
    dma: let IMX_DMA depend on IMX_HAVE_DMA_V1 instead of an explicit list of SoCs
    fsldma: make halt behave nicely on all supported controllers
    fsldma: reduce locking during descriptor cleanup
    fsldma: support async_tx dependencies and automatic unmapping
    fsldma: fix controller lockups
    fsldma: minor codingstyle and consistency fixes
    fsldma: improve link descriptor debugging
    fsldma: use channel name in printk output
    fsldma: move related helper functions near each other
    dmatest: fix automatic buffer unmap type
    drivers, pch_dma: Fix warning when CONFIG_PM=n.
    dmaengine/dw_dmac fix: use readl & writel instead of __raw_readl & __raw_writel
    avr32: at32ap700x: Specify DMA Flow Controller, Src and Dst msize
    dw_dmac: Setting Default Burst length for transfers as 16.
    dw_dmac: Allow src/dst msize & flow controller to be configured at runtime
    dw_dmac: Changing type of src_master and dest_master to u8.
    dw_dmac: Pass Channel Priority from platform_data
    dw_dmac: Pass Channel Allocation Order from platform_data
    ...

    Linus Torvalds
     
  • Instead of always creating a huge (268K) deflate_workspace with the
    maximum compression parameters (windowBits=15, memLevel=8), allow the
    caller to obtain a smaller workspace by specifying smaller parameter
    values.

    For example, when capturing oops and panic reports to a medium with
    limited capacity, such as NVRAM, compression may be the only way to
    capture the whole report. In this case, a small workspace (24K works
    fine) is a win, whether you allocate the workspace when you need it (i.e.,
    during an oops or panic) or at boot time.

    I've verified that this patch works with all accepted values of windowBits
    (positive and negative), memLevel, and compression level.
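
    zlib documents deflate's internal memory use as roughly
    (1 << (windowBits+2)) + (1 << (memLevel+9)) bytes, so, for example,
    windowBits=11 and memLevel=5 lands near the 24K figure above. A
    runnable userspace sketch of the idea (the in-kernel interface differs
    in detail, passing the same parameters when sizing the workspace):

    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>               /* build with -lz */

    int main(void)
    {
            unsigned char in[] = "panic: example report to compress";
            unsigned char out[256];
            z_stream strm;

            memset(&strm, 0, sizeof(strm));
            /* windowBits=11, memLevel=5: far smaller state than the 15/8
             * defaults, at some cost in compression ratio */
            if (deflateInit2(&strm, Z_BEST_COMPRESSION, Z_DEFLATED,
                             11, 5, Z_DEFAULT_STRATEGY) != Z_OK)
                    return 1;
            strm.next_in = in;      strm.avail_in = sizeof(in);
            strm.next_out = out;    strm.avail_out = sizeof(out);
            deflate(&strm, Z_FINISH);
            printf("compressed %lu -> %lu bytes\n",
                   strm.total_in, strm.total_out);
            return deflateEnd(&strm) == Z_OK ? 0 : 1;
    }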

    Signed-off-by: Jim Keniston
    Cc: Herbert Xu
    Cc: David Miller
    Cc: Chris Mason
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jim Keniston
     
  • Add brackets around typecasted argument in crc32() macro.
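
    The failure mode being guarded against, as an illustration (not the
    kernel's crc32() macro): without parentheses, a cast in the expansion
    binds only to the first token of the argument expression.

    #include <stdio.h>

    #define FIRST_BYTE_BAD(p)   (*(const unsigned char *) p)    /* no () */
    #define FIRST_BYTE_OK(p)    (*(const unsigned char *)(p))

    int main(void)
    {
            int v[2] = { 0x10, 0x20 };

            /* BAD(v + 1) expands to *(const unsigned char *)v + 1:
             * the first byte of v[0], plus one.  OK(v + 1) reads the
             * first byte of v[1].  (Values assume little endian.) */
            printf("%d vs %d\n", FIRST_BYTE_BAD(v + 1), FIRST_BYTE_OK(v + 1));
            return 0;
    }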

    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • Analog Devices' SigmaStudio can produce firmware blobs for devices with
    these DSPs embedded (like some audio codecs). Allow these device drivers
    to easily parse and load them.

    Signed-off-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Frysinger
     
  • 1. simple_strto*() do not contain overflow checks, and have a crufty,
    libc-style way of indicating failure.
    2. strict_strto*() do not have overflow checks either, but the name and
    comments pretend they do.
    3. Both families have only "long long" and "long" variants, but users
    want strtou8().
    4. Both the "simple" and "strict" prefixes are wrong: "simple" doesn't
    say what exactly is so simple, and "strict" should not exist because
    conversion should be strict by default.

    The solution is to use a "k" prefix and add converters for more types.
    Enter
    kstrtoull()
    kstrtoll()
    kstrtoul()
    kstrtol()
    kstrtouint()
    kstrtoint()

    kstrtou64()
    kstrtos64()
    kstrtou32()
    kstrtos32()
    kstrtou16()
    kstrtos16()
    kstrtou8()
    kstrtos8()

    A runtime test suite (somewhat incomplete) is included as well.

    strict_strto*() are now deprecated, stubbed to kstrto*(), and will
    eventually be removed altogether.

    Use kstrto*() in code today!

    Note: on some archs _kstrtoul() and _kstrtol() are left in the tree,
    even if they'll be unused at runtime. This is a temporary solution,
    because I don't want to hardcode the list of archs where these
    functions aren't needed. The current solution with sizeof() and
    __alignof__ at least always works.
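
    Typical calling style, sketched as a hypothetical sysfs store handler:

    static ssize_t threshold_store(struct device *dev,
                                   struct device_attribute *attr,
                                   const char *buf, size_t count)
    {
            unsigned int val;
            int ret;

            ret = kstrtouint(buf, 10, &val);    /* base 10, overflow-checked */
            if (ret)
                    return ret;                 /* -EINVAL or -ERANGE */
            /* ... use val ... */
            return count;
    }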

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Move setup_nr_cpu_ids(), smp_init() and some other SMP boot parameter
    setup functions from init/main.c to kernel/smp.c, saving some #ifdef
    CONFIG_SMP.

    Signed-off-by: WANG Cong
    Cc: Rakib Mullick
    Cc: David Howells
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Tejun Heo
    Cc: Arnd Bergmann
    Cc: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Amerigo Wang
     
  • PTR_RET() can be used if you have an error-pointer and are only interested
    in the eventual error value, but not the pointer. Yields the usual 0 for
    no error, -ESOMETHING otherwise.
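
    A sketch of the intended use, replacing the open-coded pattern
    (some_create_call() is hypothetical):

    /* before */
    p = some_create_call();
    if (IS_ERR(p))
            return PTR_ERR(p);
    return 0;

    /* after */
    return PTR_RET(some_create_call());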

    Signed-off-by: Uwe Kleine-König
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
     
  • Reorder struct block_device to remove 8 bytes of padding on 64-bit
    builds; this also shrinks bdev_inode by 8 bytes (776 -> 768), allowing
    it to fit into one fewer cache line.

    Signed-off-by: Richard Kennedy
    Cc: Al Viro
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Richard Kennedy
     
  • Unify identical gcc3.x and gcc4.x macros.

    Signed-off-by: Borislav Petkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Borislav Petkov
     
  • All architectures can use the common dma_addr_t typedef now. We can
    remove the arch specific dma_addr_t.

    Signed-off-by: FUJITA Tomonori
    Acked-by: Arnd Bergmann
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Ivan Kokshaysky
    Cc: Richard Henderson
    Cc: Matt Turner
    Cc: "Luck, Tony"
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Chris Metcalf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    FUJITA Tomonori
     
  • Add a new __GFP_OTHER_NODE flag to tell the low-level NUMA statistics
    in zone_statistics() that an allocation is on behalf of another thread.
    This way the local and remote counters can still be correct, even when
    background daemons like khugepaged are changing memory mappings.

    This only affects the accounting, but I think it's worth doing that
    right to avoid confusing users.

    I first tried to just pass down the right node, but this required a lot
    of changes to pass the parameter down, including at least one addition
    of a 10th argument to a 9-argument function. Using the flag is a lot
    less intrusive.

    Open question: should this also be used for migration?
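
    Roughly the shape of the accounting change (a simplified sketch of the
    zone_statistics() logic, not verbatim kernel code):

    if (z->node == ((flags & __GFP_OTHER_NODE) ?
                    preferred_zone->node : numa_node_id()))
            __inc_zone_state(z, NUMA_LOCAL);  /* local to the real consumer */
    else
            __inc_zone_state(z, NUMA_OTHER);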

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Andi Kleen
    Cc: Andrea Arcangeli
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • When reclaiming for order-0 pages, kswapd requires that all zones be
    balanced. Each cycle through balance_pgdat() does background ageing on
    all zones if necessary and applies equal pressure on the inactive zone
    unless a lot of pages are free already.

    A "lot of free pages" is defined as a "balance gap" above the high
    watermark which is currently 7*high_watermark. Historically this was
    reasonable as min_free_kbytes was small. However, on systems using huge
    pages, it is recommended that min_free_kbytes is higher and it is tuned
    with hugeadm --set-recommended-min_free_kbytes. With the introduction of
    transparent huge page support, this recommended value is also applied. On
    X86-64 with 4G of memory, min_free_kbytes becomes 67584 so one would
    expect around 68M of memory to be free. The Normal zone is approximately
    35000 pages so under even normal memory pressure such as copying a large
    file, it gets exhausted quickly. As it is getting exhausted, kswapd
    applies pressure equally to all zones, including the DMA32 zone. DMA32 is
    approximately 700,000 pages with a high watermark of around 23,000 pages.
    In this situation, kswapd will reclaim around 23000*8 pages (where the
    factor 8 is the high watermark plus a balance gap of 7 * high
    watermark), or 718M of pages, before the zone is ignored. What the user
    sees is free memory far higher than it should be.

    To avoid an excessive number of pages being reclaimed from the larger
    zones, this patch explicitly defines the "balance gap" to be either 1%
    of the zone or the low watermark for the zone, whichever is smaller.
    While kswapd will check all zones to apply pressure, it will ignore
    zones that meet the (high_wmark + balance_gap) watermark.

    To test this, 80G were copied from a partition and the amount of memory
    being used was recorded. A comparison of patched and unpatched kernels
    can be seen at
    be seen at
    http://www.csn.ul.ie/~mel/postings/minfree-20110222/memory-usage-hydra.ps
    and shows that kswapd is not reclaiming as much memory with the patch
    applied.
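
    The gap is computed per zone, roughly as follows (a sketch; the ratio
    constant's name is illustrative):

    /* balance gap: the smaller of 1% of the zone and its low watermark */
    #define KSWAPD_ZONE_BALANCE_GAP_RATIO 100

    balance_gap = min(low_wmark_pages(zone),
            (zone->present_pages + KSWAPD_ZONE_BALANCE_GAP_RATIO - 1) /
            KSWAPD_ZONE_BALANCE_GAP_RATIO);

    /* kswapd leaves the zone alone once it clears high_wmark + gap */
    if (zone_watermark_ok_safe(zone, order,
                               high_wmark_pages(zone) + balance_gap,
                               end_zone, 0))
            continue;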

    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Shaohua Li
    Cc: "Chen, Tim C"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Right now, if a mm_walk has either ->pte_entry or ->pmd_entry set, it will
    unconditionally split any transparent huge pages it runs into. In
    practice, that means that anyone doing a

    cat /proc/$pid/smaps

    will unconditionally break down every huge page in the process and depend
    on khugepaged to re-collapse it later. This is fairly suboptimal.

    This patch changes that behavior. It teaches each ->pmd_entry handler
    (there are five) that they must break down the THPs themselves. Also, the
    _generic_ code will never break down a THP unless a ->pte_entry handler is
    actually set.

    This means that the ->pmd_entry handlers can now choose to deal with THPs
    without breaking them down.
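
    A handler that does not want to deal with huge pages can keep the old
    behavior by splitting explicitly, e.g. (a sketch):

    static int my_pmd_entry(pmd_t *pmd, unsigned long addr,
                            unsigned long end, struct mm_walk *walk)
    {
            /* opt out of THP handling: break the huge pmd down to ptes */
            split_huge_page_pmd(walk->mm, pmd);

            /* ... then walk the ptes under this pmd as before ... */
            return 0;
    }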

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Dave Hansen
    Acked-by: Mel Gorman
    Acked-by: David Rientjes
    Reviewed-by: Eric B Munson
    Tested-by: Eric B Munson
    Cc: Michael J Wolf
    Cc: Andrea Arcangeli
    Cc: Johannes Weiner
    Cc: Matt Mackall
    Cc: Jeremy Fitzhardinge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • The rotate_reclaimable_page() function moves just-written-out pages,
    which the VM wanted to reclaim, to the end of the inactive list. That
    way the VM will find those pages first the next time it needs to free
    memory.

    This patch applies the same rule to memcg. It can help prevent
    unnecessary eviction of memcg working-set pages.

    Signed-off-by: Minchan Kim
    Acked-by: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Reviewed-by: Rik van Riel
    Cc: KOSAKI Motohiro
    Acked-by: Johannes Weiner
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Recently there have been reports of a thrashing problem
    (http://marc.info/?l=rsync&m=128885034930933&w=2). It happens with
    backup workloads (e.g. nightly rsync): the workload creates use-once
    pages but touches each page twice, which promotes the pages to the
    active list and results in working-set page eviction.

    Some application developers want POSIX_FADV_NOREUSE to be supported,
    but other OSes don't support it either.
    (http://marc.info/?l=linux-mm&m=128928979512086&w=2)

    As another approach, application developers use POSIX_FADV_DONTNEED,
    but it has a problem: if the kernel encounters a page that is being
    written during invalidate_mapping_pages(), it can't drop it. This makes
    the interface hard for application programmers to use, since they
    always have to sync the data before calling fadvise(...,
    POSIX_FADV_DONTNEED) to make sure the pages are discardable. In the end
    they can't take advantage of the kernel's deferred writes, and so they
    see a performance loss.
    (http://insights.oetiker.ch/linux/fadvise.html)

    In fact, invalidation is a very strong hint to the reclaimer: it means
    we don't use the page any more. So let's move a page that is being
    written to the head of the inactive list if we can't truncate it right
    now.

    The reason for moving the page to the head of the LRU in this patch is
    that a dirty/writeback page will be flushed sooner or later anyway;
    this prevents writeout via pageout(), which is less efficient than the
    flusher's writeout.

    Originally I reused Peter's lru_demote patch with some changes, so his
    Signed-off-by is added.
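
    The mechanism, sketched against invalidate_mapping_pages() (simplified;
    deactivate_page() is the new helper that moves the page toward the
    inactive list):

    ret = invalidate_inode_page(page);
    /*
     * Invalidation is a strong hint that the page is no longer of
     * interest; if we couldn't drop it, speed up its reclaim.
     */
    if (!ret)
            deactivate_page(page);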

    Signed-off-by: Minchan Kim
    Reported-by: Ben Gamari
    Signed-off-by: Peter Zijlstra
    Acked-by: Rik van Riel
    Acked-by: Mel Gorman
    Reviewed-by: KOSAKI Motohiro
    Cc: Wu Fengguang
    Acked-by: Johannes Weiner
    Cc: Nick Piggin
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Reorder mm_struct to remove 16 bytes of alignment padding on 64 bit
    builds. On my config this shrinks mm_struct by enough to fit in one
    fewer cache lines and allows more objects per slab in mm_struct
    kmem_cache under SLUB.

    slabinfo before patch :-
    Sizes (bytes)             Slabs
    --------------------------------
    Object :   848            Total  :  9
    SlabObj:   896            Full   :  2
    SlabSiz: 16384            Partial:  5
    Loss   :    48            CpuSlab:  2
    Align  :    64            Objects: 18

    slabinfo after :-
    Sizes (bytes)             Slabs
    --------------------------------
    Object :   832            Total  :  7
    SlabObj:   832            Full   :  2
    SlabSiz: 16384            Partial:  3
    Loss   :     0            CpuSlab:  2
    Align  :    64            Objects: 19

    Signed-off-by: Richard Kennedy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Richard Kennedy
     
  • TestSetPageLocked() isn't being used anywhere. Also, using it would
    likely be an error, since the proper interface trylock_page() provides
    stronger ordering guarantees.

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • This patch changes the anon_vma refcount to be 0 when the object is
    free. It does this by adding one reference for the anon_vma being in
    use (i.e. the anon_vma->head list is not empty).

    This allows a simpler release scheme without having to check both the
    refcount and the list as well as avoids taking a ref for each entry on the
    list.

    Signed-off-by: Peter Zijlstra
    Reviewed-by: KAMEZAWA Hiroyuki
    Acked-by: Hugh Dickins
    Acked-by: Mel Gorman
    Acked-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • We need the anon_vma refcount unconditionally to simplify the anon_vma
    lifetime rules.

    Signed-off-by: Peter Zijlstra
    Acked-by: Mel Gorman
    Reviewed-by: KAMEZAWA Hiroyuki
    Acked-by: Hugh Dickins
    Acked-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • The normal code pattern used in the kernel is: get/put.

    Signed-off-by: Peter Zijlstra
    Reviewed-by: KAMEZAWA Hiroyuki
    Acked-by: Hugh Dickins
    Reviewed-by: Rik van Riel
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Now that remove_from_page_cache() has been renamed to
    delete_from_page_cache(), change the name of the internal page cache
    handling function too, for consistency with __remove_from_swap_cache()
    and remove_from_swap_cache().

    Signed-off-by: Minchan Kim
    Cc: Christoph Hellwig
    Acked-by: Hugh Dickins
    Acked-by: Mel Gorman
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: Johannes Weiner
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim