24 Mar, 2011

40 commits

  • In struct page_cgroup, we have a full word for flags but only a few are
    reserved. Use the remaining upper bits to encode, depending on
    configuration, the node or the section, to enable page_cgroup-to-page
    lookups without a direct pointer.

    This saves a full word for every page in a system with memory cgroups
    enabled.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • The per-cgroup LRU lists string up 'struct page_cgroup's. To get from
    those structures to the page they represent, a lookup is required.
    Currently, the lookup is done through a direct pointer in struct
    page_cgroup, so a lot of functions down the callchain do this lookup by
    themselves instead of receiving the page pointer from their callers.

    The next patch removes this pointer, however, and the lookup is no longer
    that straight-forward. In preparation for that, this patch only leaves
    the non-optional lookups when coming directly from the LRU list and passes
    the page down the stack.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • It is one logical function, no need to have it split up.

    Also, get rid of some checks from the inner function that ensured the
    sanity of the outer function.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Instead of passing a whole struct page_cgroup to this function, let it
    take only what it really needs from it: the struct mem_cgroup and the
    page.

    This has the advantage that reading pc->mem_cgroup is now done at the same
    place where the ordering rules for this pointer are enforced and
    explained.

    It is also in preparation for removing the pc->page backpointer.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • This patch series removes the direct page pointer from struct page_cgroup,
    which saves 20% of per-page memcg memory overhead (Fedora and Ubuntu
    enable memcg per default, openSUSE apparently too).

    The node id or section number is encoded in the remaining free bits of
    pc->flags which allows calculating the corresponding page without the
    extra pointer.

    I ran, what I think is, a worst-case microbenchmark that just cats a large
    sparse file to /dev/null, because it means that walking the LRU list on
    behalf of per-cgroup reclaim and looking up pages from page_cgroups is
    happening constantly and at a high rate. But it made no measurable
    difference. A profile reported a 0.11% share of the new
    lookup_cgroup_page() function in this benchmark.

    This patch:

    All callsites check PCG_USED before passing pc->mem_cgroup, so the latter
    is never NULL.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Acked-by: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Add checks at allocating or freeing a page whether the page is used (iow,
    charged) from the view point of memcg.

    This check may be useful in debugging a problem and we did similar checks
    before the commit 52d4b9ac(memcg: allocate all page_cgroup at boot).

    This patch adds some overheads at allocating or freeing memory, so it's
    enabled only when CONFIG_DEBUG_VM is enabled.

    Signed-off-by: Daisuke Nishimura
    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
     
  • The page_cgroup array is set up before even fork is initialized. I
    seriously doubt that this code executes before the array is alloc'd.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • No callsite ever passes a NULL pointer for a struct mem_cgroup * to the
    committing function. There is no need to check for it.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • These definitions have been unused since '4b3bde4 memcg: remove the
    overhead associated with the root cgroup'.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Since transparent huge pages, checking whether memory cgroups are below
    their limits is no longer enough, but the actual amount of chargeable
    space is important.

    To not have more than one limit-checking interface, replace
    memory_cgroup_check_under_limit() and memory_cgroup_check_margin() with a
    single memory_cgroup_margin() that returns the chargeable space and leaves
    the comparison to the callsite.

    Soft limits are now checked the other way round, by using the already
    existing function that returns the amount by which soft limits are
    exceeded: res_counter_soft_limit_excess().

    Also remove all the corresponding functions on the res_counter side that
    are now no longer used.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Acked-by: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Soft limit reclaim continues until the usage is below the current soft
    limit, but the documented semantics are actually that soft limit reclaim
    will push usage back until the soft limits are met again.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Acked-by: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Remove initialization of vaiable in caller of memory cgroup function.
    Actually, it's return value of memcg function but it's initialized in
    caller.

    Some memory cgroup uses following style to bring the result of start
    function to the end function for avoiding races.

    mem_cgroup_start_A(&(*ptr))
    /* Something very complicated can happen here. */
    mem_cgroup_end_A(*ptr)

    In some calls, *ptr should be initialized to NULL be caller. But it's
    ugly. This patch fixes that *ptr is initialized by _start function.

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Johannes Weiner
    Acked-by: Daisuke Nishimura
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • res_counter_read_u64 reads u64 value without lock. It's dangerous in a
    32bit environment. Add locking.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Greg Thelen
    Cc: Johannes Weiner
    Cc: David Rientjes
    Cc: KOSAKI Motohiro
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • minix bit operations are only used by minix filesystem and useless by
    other modules. Because byte order of inode and block bitmaps is different
    on each architecture like below:

    m68k:
    big-endian 16bit indexed bitmaps

    h8300, microblaze, s390, sparc, m68knommu:
    big-endian 32 or 64bit indexed bitmaps

    m32r, mips, sh, xtensa:
    big-endian 32 or 64bit indexed bitmaps for big-endian mode
    little-endian bitmaps for little-endian mode

    Others:
    little-endian bitmaps

    In order to move minix bit operations from asm/bitops.h to architecture
    independent code in minix filesystem, this provides two config options.

    CONFIG_MINIX_FS_BIG_ENDIAN_16BIT_INDEXED is only selected by m68k.
    CONFIG_MINIX_FS_NATIVE_ENDIAN is selected by the architectures which use
    native byte order bitmaps (h8300, microblaze, s390, sparc, m68knommu,
    m32r, mips, sh, xtensa). The architectures which always use little-endian
    bitmaps do not select these options.

    Finally, we can remove minix bit operations from asm/bitops.h for all
    architectures.

    Signed-off-by: Akinobu Mita
    Acked-by: Arnd Bergmann
    Acked-by: Greg Ungerer
    Cc: Geert Uytterhoeven
    Cc: Roman Zippel
    Cc: Andreas Schwab
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Yoshinori Sato
    Cc: Michal Simek
    Cc: "David S. Miller"
    Cc: Hirokazu Takata
    Acked-by: Ralf Baechle
    Acked-by: Paul Mundt
    Cc: Chris Zankel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • As a preparation for moving minix bit operations from asm/bitops.h to
    architecture independent code in minix filesystem, this removes inline asm
    from minix_find_first_zero_bit() for m68k.

    Signed-off-by: Akinobu Mita
    Cc: Geert Uytterhoeven
    Cc: Roman Zippel
    Cc: Andreas Schwab
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • As the result of conversions, there are no users of ext2 non-atomic bit
    operations except for ext2 filesystem itself. Now we can put them into
    architecture independent code in ext2 filesystem, and remove from
    asm/bitops.h for all architectures.

    Signed-off-by: Akinobu Mita
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • As a preparation for removing ext2 non-atomic bit operations from
    asm/bitops.h. This converts ext2 non-atomic bit operations to
    little-endian bit operations.

    Signed-off-by: Akinobu Mita
    Cc: Alasdair Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • As a preparation for removing ext2 non-atomic bit operations from
    asm/bitops.h. This converts ext2 non-atomic bit operations to
    little-endian bit operations.

    Signed-off-by: Akinobu Mita
    Acked-by: NeilBrown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • As a preparation for removing ext2 non-atomic bit operations from
    asm/bitops.h. This converts ext2 non-atomic bit operations to
    little-endian bit operations.

    Signed-off-by: Akinobu Mita
    Cc: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • As a preparation for removing ext2 non-atomic bit operations from
    asm/bitops.h. This converts ext2 non-atomic bit operations to
    little-endian bit operations.

    Signed-off-by: Akinobu Mita
    Acked-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • As a preparation for removing ext2 non-atomic bit operations from
    asm/bitops.h. This converts ext2 non-atomic bit operations to
    little-endian bit operations.

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • As a preparation for removing ext2 non-atomic bit operations from
    asm/bitops.h. This converts ext2 non-atomic bit operations to
    little-endian bit operations.

    Signed-off-by: Akinobu Mita
    Acked-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • As a preparation for removing ext2 non-atomic bit operations from
    asm/bitops.h. This converts ext2 non-atomic bit operations to
    little-endian bit operations.

    Signed-off-by: Akinobu Mita
    Acked-by: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • As a preparation for removing ext2 non-atomic bit operations from
    asm/bitops.h. This converts ext2 non-atomic bit operations to
    little-endian bit operations.

    Signed-off-by: Akinobu Mita
    Acked-by: "Theodore Ts'o"
    Cc: Andreas Dilger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • As a preparation for removing ext2 non-atomic bit operations from
    asm/bitops.h. This converts ext2 non-atomic bit operations to
    little-endian bit operations.

    Signed-off-by: Akinobu Mita
    Acked-by: Jan Kara
    Cc: Andreas Dilger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • As a preparation for removing ext2 non-atomic bit operations from
    asm/bitops.h. This converts ext2 non-atomic bit operations to
    little-endian bit operations.

    Signed-off-by: Akinobu Mita
    Cc: Andy Grover
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • As a preparation for removing ext2 non-atomic bit operations from
    asm/bitops.h. This converts ext2 non-atomic bit operations to
    little-endian bit operations.

    Signed-off-by: Akinobu Mita
    Cc: Avi Kivity
    Cc: Marcelo Tosatti
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • As a preparation for removing ext2 non-atomic bit operations from
    asm/bitops.h. This converts ext2 non-atomic bit operations to
    little-endian bit operations.

    Signed-off-by: Akinobu Mita
    Acked-by: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Introduce little-endian bit operations to the big-endian architectures
    which do not have native little-endian bit operations and the
    little-endian architectures. (alpha, avr32, blackfin, cris, frv, h8300,
    ia64, m32r, mips, mn10300, parisc, sh, sparc, tile, x86, xtensa)

    These architectures can just include generic implementation
    (asm-generic/bitops/le.h).

    Signed-off-by: Akinobu Mita
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Mikael Starvik
    Cc: David Howells
    Cc: Yoshinori Sato
    Cc: "Luck, Tony"
    Cc: Ralf Baechle
    Cc: Kyle McMartin
    Cc: Matthew Wilcox
    Cc: Grant Grundler
    Cc: Paul Mundt
    Cc: Kazumoto Kojima
    Cc: Hirokazu Takata
    Cc: "David S. Miller"
    Cc: Chris Zankel
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Acked-by: Hans-Christian Egtvedt
    Acked-by: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Introduce little-endian bit operations by renaming native ext2 bit
    operations. The ext2 bit operations are kept as wrapper macros using
    little-endian bit operations to maintain bisectability until the
    conversions are finished.

    Signed-off-by: Akinobu Mita
    Acked-by: Greg Ungerer
    Cc: Geert Uytterhoeven
    Cc: Roman Zippel
    Cc: Andreas Schwab
    Cc: Arnd Bergmann
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • This introduces CONFIG_GENERIC_FIND_BIT_LE to tell whether to use generic
    implementation of find_*_bit_le() in lib/find_next_bit.c or not.

    For now we select CONFIG_GENERIC_FIND_BIT_LE for all architectures which
    enable CONFIG_GENERIC_FIND_NEXT_BIT.

    But m68knommu wants to define own faster find_next_zero_bit_le() and
    continues using generic find_next_{,zero_}bit().
    (CONFIG_GENERIC_FIND_NEXT_BIT and !CONFIG_GENERIC_FIND_BIT_LE)

    Signed-off-by: Akinobu Mita
    Cc: Greg Ungerer
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Introduce little-endian bit operations by renaming native ext2 bit
    operations and changing find_*_bit_le() to take a "void *". The ext2 bit
    operations are kept as wrapper macros using little-endian bit operations
    to maintain bisectability until the conversions are finished.

    Signed-off-by: Akinobu Mita
    Cc: Geert Uytterhoeven
    Cc: Roman Zippel
    Cc: Andreas Schwab
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Introduce little-endian bit operations by renaming native ext2 bit
    operations. The ext2 and minix bit operations are kept as wrapper macros
    using little-endian bit operations to maintain bisectability until the
    conversions are finished.

    Signed-off-by: Akinobu Mita
    Cc: Russell King
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Introduce little-endian bit operations by renaming native ext2 bit
    operations. The ext2 bit operations are kept as wrapper macros using
    little-endian bit operations to maintain bisectability until the
    conversions are finished.

    Signed-off-by: Akinobu Mita
    Cc: Arnd Bergmann
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Introduce little-endian bit operations by renaming existing powerpc native
    little-endian bit operations and changing them to take any pointer types.

    Signed-off-by: Akinobu Mita
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • This makes the little-endian bitops take any pointer types by changing the
    prototypes and adding casts in the preprocessor macros.

    That would seem to at least make all the filesystem code happier, and they
    can continue to do just something like

    #define ext2_set_bit __test_and_set_bit_le

    (or whatever the exact sequence ends up being).

    Signed-off-by: Akinobu Mita
    Cc: Arnd Bergmann
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Mikael Starvik
    Cc: David Howells
    Cc: Yoshinori Sato
    Cc: "Luck, Tony"
    Cc: Ralf Baechle
    Cc: Kyle McMartin
    Cc: Matthew Wilcox
    Cc: Grant Grundler
    Cc: Paul Mundt
    Cc: Kazumoto Kojima
    Cc: Hirokazu Takata
    Cc: "David S. Miller"
    Cc: Chris Zankel
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Hans-Christian Egtvedt
    Cc: "H. Peter Anvin"
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • As a preparation for providing little-endian bitops for all architectures,
    This renames generic implementation of little-endian bitops. (remove
    "generic_" prefix and postfix "_le")

    s/generic_find_next_le_bit/find_next_bit_le/
    s/generic_find_next_zero_le_bit/find_next_zero_bit_le/
    s/generic_find_first_zero_le_bit/find_first_zero_bit_le/
    s/generic___test_and_set_le_bit/__test_and_set_bit_le/
    s/generic___test_and_clear_le_bit/__test_and_clear_bit_le/
    s/generic_test_le_bit/test_bit_le/
    s/generic___set_le_bit/__set_bit_le/
    s/generic___clear_le_bit/__clear_bit_le/
    s/generic_test_and_set_le_bit/test_and_set_bit_le/
    s/generic_test_and_clear_le_bit/test_and_clear_bit_le/

    Signed-off-by: Akinobu Mita
    Acked-by: Arnd Bergmann
    Acked-by: Hans-Christian Egtvedt
    Cc: Geert Uytterhoeven
    Cc: Roman Zippel
    Cc: Andreas Schwab
    Cc: Greg Ungerer
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • This patch series introduces little-endian bit operations in asm/bitops.h
    for all architectures and converts all ext2 non-atomic and minix bit
    operations to use little-endian bit operations. It enables us to remove
    ext2 non-atomic and minix bit operations from asm/bitops.h. The reason
    they should be removed from asm/bitops.h is as follows:

    For ext2 non-atomic bit operations, they are used for little-endian byte
    order bitmap access by some filesystems and modules. But using ext2_*()
    functions on a module other than ext2 filesystem makes some feel strange.

    For minix bit operations, they are only used by minix filesystem and are
    useless by other modules. Because byte order of inode and block bitmap is

    This patch:

    In order to make the forthcoming changes smaller, this merges macro
    definisions in asm-generic/bitops/le.h for big-endian and little-endian as
    much as possible.

    This also removes unused BITOP_WORD macro.

    Signed-off-by: Akinobu Mita
    Acked-by: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • asm-generic/bitops/le.h is only intended to be included directly from
    asm-generic/bitops/ext2-non-atomic.h or asm-generic/bitops/minix-le.h
    which implements generic ext2 or minix bit operations.

    This stops including asm-generic/bitops/le.h directly and use ext2
    non-atomic bit operations instead.

    It seems odd to use ext2_*_bit() on rds, but it will replaced with
    __{set,clear,test}_bit_le() after introducing little endian bit operations
    for all architectures. This indirect step is necessary to maintain
    bisectability for some architectures which have their own little-endian
    bit operations.

    Signed-off-by: Akinobu Mita
    Cc: Andy Grover
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • asm-generic/bitops/le.h is only intended to be included directly from
    asm-generic/bitops/ext2-non-atomic.h or asm-generic/bitops/minix-le.h
    which implements generic ext2 or minix bit operations.

    This stops including asm-generic/bitops/le.h directly and use ext2
    non-atomic bit operations instead.

    It seems odd to use ext2_set_bit() on kvm, but it will replaced with
    __set_bit_le() after introducing little endian bit operations for all
    architectures. This indirect step is necessary to maintain bisectability
    for some architectures which have their own little-endian bit operations.

    Signed-off-by: Akinobu Mita
    Cc: Avi Kivity
    Cc: Marcelo Tosatti
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita