30 Mar, 2011

1 commit

  • * 'frv' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-frv:
    FRV: Use generic show_interrupts()
    FRV: Convert genirq namespace
    frv: Select GENERIC_HARDIRQS_NO_DEPRECATED
    frv: Convert cpu irq_chip to new functions
    frv: Convert mb93493 irq_chip to new functions
    frv: Convert mb93093 irq_chip to new functions
    frv: Convert mb93091 irq_chip to new functions
    frv: Fix typo from __do_IRQ overhaul
    frv: Remove stale irq_chip.end
    FRV: Do some cleanups
    FRV: Missing node arg in alloc_thread_info_node() macro
    NOMMU: implement access_remote_vm
    NOMMU: support SMP dynamic percpu_alloc
    NOMMU: percpu should use is_vmalloc_addr().

    Linus Torvalds
     

29 Mar, 2011

1 commit

  • Recent vm changes brought in a new function which the core procfs code
    utilizes. So implement it for nommu systems too to avoid link failures.
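
    A minimal sketch of the NOMMU side, assuming it simply delegates to the
    __access_remote_vm() helper factored out earlier in this series:

        /* mm/nommu.c (sketch, not the verbatim patch) */
        int access_remote_vm(struct mm_struct *mm, unsigned long addr,
                             void *buf, int len, int write)
        {
                return __access_remote_vm(NULL, mm, addr, buf, len, write);
        }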

    Signed-off-by: Mike Frysinger
    Signed-off-by: David Howells
    Tested-by: Simon Horman
    Tested-by: Ithamar Adema
    Acked-by: Greg Ungerer

    Mike Frysinger
     

28 Mar, 2011

2 commits

  • per_cpu_ptr_to_phys() uses VMALLOC_START and VMALLOC_END to determine if an
    address is in the vmalloc() region or not. This is incorrect on NOMMU as
    there is no real vmalloc() capability (vmalloc() is emulated by kmalloc()).

    The correct way to do this is to use is_vmalloc_addr(). This encapsulates the
    vmalloc() region test in MMU mode and just returns 0 in NOMMU mode.

    On FRV in NOMMU mode, the percpu compilation fails without this patch:

    mm/percpu.c: In function 'per_cpu_ptr_to_phys':
    mm/percpu.c:1011: error: 'VMALLOC_START' undeclared (first use in this function)
    mm/percpu.c:1011: error: (Each undeclared identifier is reported only once
    mm/percpu.c:1011: error: for each function it appears in.)
    mm/percpu.c:1012: error: 'VMALLOC_END' undeclared (first use in this function)
    mm/percpu.c:1018: warning: control reaches end of non-void function
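
    A sketch of the pattern (not the exact mm/percpu.c diff): the open-coded
    range test only compiles where VMALLOC_START/VMALLOC_END exist, while
    is_vmalloc_addr() works in both modes:

        /* Before: MMU-only, breaks the NOMMU build. */
        if (addr >= (void *)VMALLOC_START && addr < (void *)VMALLOC_END)
                return page_to_phys(vmalloc_to_page(addr));

        /* After: is_vmalloc_addr() simply returns 0 on NOMMU. */
        if (is_vmalloc_addr(addr))
                return page_to_phys(vmalloc_to_page(addr));
        return virt_to_phys(addr);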

    Signed-off-by: David Howells

    David Howells
     
  • Fix mm/memory.c incorrect kernel-doc function notation:

    Warning(mm/memory.c:3718): Cannot understand * @access_remote_vm - access another process' address space
    on line 3718 - I thought it was a doc line
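
    The kernel-doc header had an '@' before the function name, which the
    kernel-doc parser cannot understand. A sketch of the corrected comment
    (parameter list abbreviated):

        /**
         * access_remote_vm - access another process' address space
         * @mm:         the mm_struct of the target address space
         * @addr:       start address to access
         * ...
         */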

    Signed-off-by: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

25 Mar, 2011

8 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    fs: simplify iget & friends
    fs: pull inode->i_lock up out of writeback_single_inode
    fs: rename inode_lock to inode_hash_lock
    fs: move i_wb_list out from under inode_lock
    fs: move i_sb_list out from under inode_lock
    fs: remove inode_lock from iput_final and prune_icache
    fs: Lock the inode LRU list separately
    fs: factor inode disposal
    fs: protect inode->i_state with inode->i_lock
    autofs4: Do not potentially dereference NULL pointer returned by fget() in autofs_dev_ioctl_setpipefd()
    autofs4 - remove autofs4_lock
    autofs4 - fix d_manage() return on rcu-walk
    autofs4 - fix autofs4_expire_indirect() traversal
    autofs4 - fix dentry leak in autofs4_expire_direct()
    autofs4 - reinstate last used update on access
    vfs - check non-mountpoint dentry might block in __follow_mount_rcu()

    Linus Torvalds
     
  • Protect the inode writeback list with a new global lock
    inode_wb_list_lock and use it to protect the list manipulations and
    traversals. This lock replaces the inode_lock as the inodes on the
    list can be validity checked while holding the inode->i_lock and
    hence the inode_lock is no longer needed to protect the list.
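
    A minimal sketch of the new pattern, using the lock and list names from
    the commit message:

        static DEFINE_SPINLOCK(inode_wb_list_lock);

        /* List manipulation now takes the dedicated lock, not inode_lock. */
        static void inode_wb_list_del(struct inode *inode)
        {
                spin_lock(&inode_wb_list_lock);
                list_del_init(&inode->i_wb_list);
                spin_unlock(&inode_wb_list_lock);
        }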

    Signed-off-by: Dave Chinner
    Signed-off-by: Al Viro

    Dave Chinner
     
  • Protect inode state transitions and validity checks with the
    inode->i_lock. This enables us to make inode state transitions
    independently of the inode_lock and is the first step to peeling
    away the inode_lock from the code.

    This requires that __iget() is done atomically with i_state checks
    during list traversals so that we don't race with another thread
    marking the inode I_FREEING between the state check and grabbing the
    reference.

    Also remove the unlock_new_inode() memory barrier optimisation
    required to avoid taking the inode_lock when clearing I_NEW.
    Simplify the code by simply taking the inode->i_lock around the
    state change and wakeup. Because the wakeup is no longer tricky,
    remove the wake_up_inode() function and open code the wakeup where
    necessary.
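
    A sketch of the traversal pattern the message describes, inside e.g. a
    hash-list walk (simplified):

        /* Check i_state and take the reference atomically under i_lock
         * so another thread cannot mark the inode I_FREEING between the
         * check and __iget(). */
        spin_lock(&inode->i_lock);
        if (inode->i_state & (I_FREEING | I_WILL_FREE)) {
                spin_unlock(&inode->i_lock);
                continue;
        }
        __iget(inode);
        spin_unlock(&inode->i_lock);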

    Signed-off-by: Dave Chinner
    Signed-off-by: Al Viro

    Dave Chinner
     
  • * 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
    SLUB: Write to per cpu data when allocating it
    slub: Fix debugobjects with lockless fastpath

    Linus Torvalds
     
  • Commit ddd588b5dd55 ("oom: suppress nodes that are not allowed from
    meminfo on oom kill") moved lib/show_mem.o out of lib/lib.a, which
    resulted in build warnings on all architectures that implement their own
    versions of show_mem():

    lib/lib.a(show_mem.o): In function `show_mem':
    show_mem.c:(.text+0x1f4): multiple definition of `show_mem'
    arch/sparc/mm/built-in.o:(.text+0xd70): first defined here

    The fix is to remove __show_mem() and add its argument to show_mem() in
    all implementations to prevent this breakage.

    Architectures that implement their own show_mem() actually don't do
    anything with the argument yet, but they could be made to filter nodes
    that aren't allowed in the current context in the future just like the
    generic implementation.
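
    A sketch of the resulting interface, assuming SHOW_MEM_FILTER_NODES is
    the filter bit the generic version honours:

        /* Every architecture now implements this signature; arch
         * versions may simply ignore `filter` for now. */
        void show_mem(unsigned int filter);

        /* Generic callers, e.g. the oom killer, can then request: */
        show_mem(SHOW_MEM_FILTER_NODES);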

    Reported-by: Stephen Rothwell
    Reported-by: James Bottomley
    Suggested-by: Andrew Morton
    Signed-off-by: David Rientjes
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • It turns out that the cmpxchg16b emulation has to access vmalloced
    percpu memory with interrupts disabled. If the memory has never
    been touched before then the fault necessary to establish the
    mapping will not occur and the kernel will fail on boot.

    Fix that by reusing the CONFIG_PREEMPT code that writes the
    cpu number into a field on every cpu. Writing to the per cpu
    area beforehand causes the mapping to be established before we get
    to the cmpxchg16b emulation.
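
    Roughly what the fix does, reusing the CONFIG_PREEMPT tid initialisation
    (names as in mm/slub.c of that era, simplified):

        static void init_kmem_cache_cpus(struct kmem_cache *s)
        {
                int cpu;

                /* The write faults in the vmalloc'ed per cpu mapping
                 * now, while interrupts are still enabled. */
                for_each_possible_cpu(cpu)
                        per_cpu_ptr(s->cpu_slab, cpu)->tid = init_tid(cpu);
        }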

    Tested-by: Ingo Molnar
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • On Thu, 24 Mar 2011, Ingo Molnar wrote:
    > RIP: 0010:[] [] get_next_timer_interrupt+0x119/0x260

    That's a typical timer crash, but you were unable to debug it with
    debugobjects because commit d3f661d6 broke those.

    Cc: Christoph Lameter
    Tested-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Pekka Enberg

    Thomas Gleixner
     
  • * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
    Documentation/iostats.txt: bit-size reference etc.
    cfq-iosched: removing unnecessary think time checking
    cfq-iosched: Don't clear queue stats when preempt.
    blk-throttle: Reset group slice when limits are changed
    blk-cgroup: Only give unaccounted_time under debug
    cfq-iosched: Don't set active queue in preempt
    block: fix non-atomic access to genhd inflight structures
    block: attempt to merge with existing requests on plug flush
    block: NULL dereference on error path in __blkdev_get()
    cfq-iosched: Don't update group weights when on service tree
    fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
    block: Require subsystems to explicitly allocate bio_set integrity mempool
    jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    fs: make fsync_buffers_list() plug
    mm: make generic_writepages() use plugging
    blk-cgroup: Add unaccounted time to timeslice_used.
    block: fixup plugging stubs for !CONFIG_BLOCK
    block: remove obsolete comments for blkdev_issue_zeroout.
    blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
    ...

    Fix up conflicts in fs/{aio.c,super.c}

    Linus Torvalds
     

24 Mar, 2011

28 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    deal with races in /proc/*/{syscall,stack,personality}
    proc: enable writing to /proc/pid/mem
    proc: make check_mem_permission() return an mm_struct on success
    proc: hold cred_guard_mutex in check_mem_permission()
    proc: disable mem_write after exec
    mm: implement access_remote_vm
    mm: factor out main logic of access_process_vm
    mm: use mm_struct to resolve gate vma's in __get_user_pages
    mm: arch: rename in_gate_area_no_task to in_gate_area_no_mm
    mm: arch: make in_gate_area take an mm_struct instead of a task_struct
    mm: arch: make get_gate_vma take an mm_struct instead of a task_struct
    x86: mark associated mm when running a task in 32 bit compatibility mode
    x86: add context tag to mark mm when running a task in 32-bit compatibility mode
    auxv: require the target to be tracable (or yourself)
    close race in /proc/*/environ
    report errors in /proc/*/*map* sanely
    pagemap: close races with suid execve
    make sessionid permissions in /proc/*/task/* match those in /proc/*
    fix leaks in path_lookupat()

    Fix up trivial conflicts in fs/proc/base.c

    Linus Torvalds
     
  • …p_elfcorehdr and saved_max_pfn

    The Xen PV drivers in a crashed HVM guest cannot connect to the dom0
    backend drivers because both frontend and backend drivers are still in
    the connected state. To run the connection reset function only in case
    of a crashdump, the is_kdump_kernel() function needs to be available
    for the PV driver modules.

    Consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn into
    kernel/crash_dump.c. Also export elfcorehdr_addr to make is_kdump_kernel()
    usable for modules.

    Leave 'elfcorehdr' as an early_param(). This changes powerpc from
    __setup() to early_param(). It also adds x86's address range check on
    ia64 and powerpc.
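
    A sketch of what the export enables for a PV driver module (the reset
    hook name is hypothetical):

        #include <linux/crash_dump.h>

        static int __init xen_pv_frontend_init(void)
        {
                /* Only reset stale frontend/backend state when actually
                 * running in a crashdump kernel. */
                if (is_kdump_kernel())
                        xen_reset_frontend_state();  /* hypothetical hook */
                return 0;
        }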

    [akpm@linux-foundation.org: additional #includes]
    [akpm@linux-foundation.org: remove elfcorehdr_addr export]
    [akpm@linux-foundation.org: fix for Tejun's mm/nobootmem.c changes]
    Signed-off-by: Olaf Hering <olaf@aepfle.de>
    Cc: Russell King <rmk@arm.linux.org.uk>
    Cc: "Luck, Tony" <tony.luck@intel.com>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Paul Mundt <lethal@linux-sh.org>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Olaf Hering
     
  • When a memcg is oom and current has already received a SIGKILL, then give
    it access to memory reserves with a higher scheduling priority so that it
    may quickly exit and free its memory.

    This is identical to the global oom killer and is done even before
    checking for panic_on_oom: a pending SIGKILL here while panic_on_oom is
    selected is guaranteed to have come from userspace; the thread only needs
    access to memory reserves to exit and thus we don't unnecessarily panic
    the machine until the kernel has no last resort to free memory.
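
    A sketch of the check, mirroring the global oom killer (TIF_MEMDIE is
    what grants access to memory reserves):

        /* If current is already dying, let it use reserves and exit
         * quickly instead of selecting a victim or panicking. */
        if (fatal_signal_pending(current)) {
                set_thread_flag(TIF_MEMDIE);
                return;
        }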

    Signed-off-by: David Rientjes
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • fs/fuse/dev.c::fuse_try_move_page() does

    (1) remove a page by ->steal()
    (2) re-add the page to the page cache
    (3) link the page to the LRU if it was not on the LRU at (1)

    This implies the page is _on_ the LRU when it's added to the radix-tree,
    so the page is added to the memory cgroup while it's on the LRU, because
    the LRU is lazy and no one flushes it.

    This is the same behavior as SwapCache and needs special care:
    - remove the page from the LRU before overwriting pc->mem_cgroup.
    - add the page to the LRU after overwriting pc->mem_cgroup.

    We also need to take care of the pagevec.

    If PageLRU(page) is set before we add the PCG_USED bit, the page will not
    be added to the memcg's LRU (for a short period). So, regardless of the
    PageLRU(page) value before commit_charge(), we need to check PageLRU(page)
    after commit_charge().

    Addresses https://bugzilla.kernel.org/show_bug.cgi?id=30432

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: Johannes Weiner
    Acked-by: Daisuke Nishimura
    Cc: Miklos Szeredi
    Cc: Balbir Singh
    Reported-by: Daniel Poelzleithner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • KAMEZAWA Hiroyuki noted that free_pages_cgroup doesn't have to check for
    PageReserved because we never store the array on reserved pages (neither
    alloc_pages_exact nor vmalloc uses those pages).

    So we can replace the check with a BUG_ON().
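
    The resulting check, sketched:

        /* The array is never stored on reserved pages, so treat one
         * showing up here as a bug rather than silently skipping it. */
        BUG_ON(PageReserved(page));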

    Signed-off-by: Michal Hocko
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Currently we are allocating a single page_cgroup array per memory section
    (stored in mem_section->base) when CONFIG_SPARSEMEM is selected. This is
    correct but a memory-inefficient solution, because the allocated memory
    (unless we fall back to vmalloc) is not kmalloc-friendly:

    - 32b - 16384 entries (20B per entry) fit into 327680B, so the
    524288B slab cache is used
    - 32b with PAE - 131072 entries with 2621440B fit into the 4194304B cache
    - 64b - 32768 entries (40B per entry) fit into the 2097152B cache

    This is ~37% wasted space per memory section, and it sums up over the
    whole memory. On an x86_64 machine it is something like 6MB per 1GB of
    RAM.

    We can reduce the internal fragmentation by using alloc_pages_exact(),
    which allocates PAGE_SIZE-aligned blocks, so we will get down to …

    Cc: Dave Hansen
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • mm/memcontrol.c: In function 'mem_cgroup_force_empty':
    mm/memcontrol.c:2280: warning: 'flags' may be used uninitialized in this function

    It's a false positive.

    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Cc: Greg Thelen
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • The statistic counters are in units of pages, there is no reason to make
    them 64-bit wide on 32-bit machines.

    Make them native words. Since they are signed, this leaves 31 bits on
    32-bit machines, which can represent roughly 8TB assuming a page size
    of 4k.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Johannes Weiner
    Signed-off-by: Greg Thelen
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • For increasing and decreasing per-cpu cgroup usage counters it makes sense
    to use signed types, as single per-cpu values might go negative during
    updates. But this is not the case for only-ever-increasing event
    counters.

    All the counters have been signed 64-bit so far, which was enough to count
    events even with the sign bit wasted.

    This patch:
    - divides s64 counters into signed usage counters and unsigned
    monotonically increasing event counters.
    - converts unsigned event counters into 'unsigned long' rather than
    'u64'. This matches the type used by the /proc/vmstat event counters.

    The next patch narrows the signed usage counters type (on 32-bit CPUs,
    that is).
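
    A sketch of the resulting split; the structure and enum names are
    assumptions based on the memcg code of that era:

        struct mem_cgroup_stat_cpu {
                s64 count[MEM_CGROUP_STAT_NSTATS];      /* signed usage */
                unsigned long events[MEM_CGROUP_EVENTS_NSTATS]; /* monotonic */
        };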

    Signed-off-by: Johannes Weiner
    Signed-off-by: Greg Thelen
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • There is no clear pattern for when we pass a page count and when we pass
    a byte count that is a multiple of PAGE_SIZE.

    We never charge or uncharge subpage quantities, so convert it all to page
    counts.
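
    An illustrative shape of the conversion (exact signatures in the series
    may differ):

        /* Before: a byte count that is always a PAGE_SIZE multiple.
         * After: a plain page count. */
        static int __mem_cgroup_try_charge(struct mm_struct *mm,
                                           gfp_t gfp_mask,
                                           unsigned int nr_pages,
                                           struct mem_cgroup **memcg,
                                           bool oom);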

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • We never uncharge subpage quantities.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • We never keep subpage quantities in the per-cpu stock.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • We have two charge cancelling functions: one takes a page count, the other
    a page size. The second one just divides the parameter by PAGE_SIZE and
    then calls the first one. This is trivial, no need for an extra function.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • The reclaim_param_lock is only taken around single reads and writes to
    integer variables and is thus superfluous. Drop it.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Reviewed-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • page_cgroup_zoneinfo() will never return NULL for a charged page, remove
    the check for it in mem_cgroup_get_reclaim_stat_from_page().

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Reviewed-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • In struct page_cgroup, we have a full word for flags but only a few are
    reserved. Use the remaining upper bits to encode, depending on
    configuration, the node or the section, to enable page_cgroup-to-page
    lookups without a direct pointer.

    This saves a full word for every page in a system with memory cgroups
    enabled.
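
    A minimal sketch of the idea with hypothetical shift and helper names;
    the real code encodes either the node id or the section number in the
    upper, otherwise unused bits of pc->flags:

        #define PCG_ID_SHIFT    16      /* hypothetical: above the flag bits */

        static void set_page_cgroup_id(struct page_cgroup *pc,
                                       unsigned long id)
        {
                pc->flags |= id << PCG_ID_SHIFT;  /* node id or section nr */
        }

        static unsigned long page_cgroup_id(struct page_cgroup *pc)
        {
                return pc->flags >> PCG_ID_SHIFT;
        }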

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • The per-cgroup LRU lists string up 'struct page_cgroup's. To get from
    those structures to the page they represent, a lookup is required.
    Currently, the lookup is done through a direct pointer in struct
    page_cgroup, so a lot of functions down the callchain do this lookup by
    themselves instead of receiving the page pointer from their callers.

    The next patch removes this pointer, however, and the lookup is no longer
    that straightforward. In preparation for that, this patch only leaves
    the non-optional lookups when coming directly from the LRU list and passes
    the page down the stack.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • It is one logical function, no need to have it split up.

    Also, get rid of some checks from the inner function that ensured the
    sanity of the outer function.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Instead of passing a whole struct page_cgroup to this function, let it
    take only what it really needs from it: the struct mem_cgroup and the
    page.

    This has the advantage that reading pc->mem_cgroup is now done at the same
    place where the ordering rules for this pointer are enforced and
    explained.

    It is also in preparation for removing the pc->page backpointer.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • This patch series removes the direct page pointer from struct page_cgroup,
    which saves 20% of per-page memcg memory overhead (Fedora and Ubuntu
    enable memcg by default, openSUSE apparently too).

    The node id or section number is encoded in the remaining free bits of
    pc->flags which allows calculating the corresponding page without the
    extra pointer.

    I ran what I think is a worst-case microbenchmark that just cats a large
    sparse file to /dev/null, because it means that walking the LRU list on
    behalf of per-cgroup reclaim and looking up pages from page_cgroups is
    happening constantly and at a high rate. But it made no measurable
    difference. A profile reported a 0.11% share of the new
    lookup_cgroup_page() function in this benchmark.

    This patch:

    All callsites check PCG_USED before passing pc->mem_cgroup, so the latter
    is never NULL.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Acked-by: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Add checks at page allocation and freeing time for whether the page is
    used (i.e., charged) from the memcg point of view.

    This check may be useful in debugging a problem, and we did similar
    checks before commit 52d4b9ac ("memcg: allocate all page_cgroup at
    boot").

    This patch adds some overhead to allocating and freeing memory, so it's
    enabled only when CONFIG_DEBUG_VM is enabled.
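
    A sketch of the kind of check added; PageCgroupUsed() tests the
    PCG_USED bit, and VM_BUG_ON() compiles away without CONFIG_DEBUG_VM:

        /* A page entering the allocator's free path must no longer be
         * charged to any memcg. */
        VM_BUG_ON(PageCgroupUsed(lookup_page_cgroup(page)));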

    Signed-off-by: Daisuke Nishimura
    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
     
  • The page_cgroup array is set up before even fork is initialized. I
    seriously doubt that this code executes before the array is alloc'd.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • No callsite ever passes a NULL pointer for a struct mem_cgroup * to the
    committing function. There is no need to check for it.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • These definitions have been unused since commit 4b3bde4 ("memcg: remove
    the overhead associated with the root cgroup").

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Since the introduction of transparent huge pages, checking whether
    memory cgroups are below their limits is no longer enough; the actual
    amount of chargeable space is what matters.

    To not have more than one limit-checking interface, replace
    memory_cgroup_check_under_limit() and memory_cgroup_check_margin() with a
    single memory_cgroup_margin() that returns the chargeable space and leaves
    the comparison to the callsite.

    Soft limits are now checked the other way round, by using the already
    existing function that returns the amount by which soft limits are
    exceeded: res_counter_soft_limit_excess().

    Also remove all the corresponding functions on the res_counter side that
    are now no longer used.
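
    A sketch of the unified helper, assuming a res_counter_margin()
    primitive on the res_counter side:

        /* Returns the chargeable space; callers compare it against the
         * charge size instead of asking "under limit?". */
        static unsigned long long mem_cgroup_margin(struct mem_cgroup *mem)
        {
                unsigned long long margin;

                margin = res_counter_margin(&mem->res);
                if (do_swap_account)
                        margin = min(margin, res_counter_margin(&mem->memsw));
                return margin;
        }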

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Acked-by: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Soft limit reclaim continues until the usage is below the current soft
    limit, but the documented semantics are actually that soft limit reclaim
    will push usage back until the soft limits are met again.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Acked-by: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Remove the initialization of a variable in callers of memory cgroup
    functions. It is actually the return value of the memcg function, but
    it was initialized in the caller.

    Some memory cgroup code uses the following style to carry the result of
    a start function to the matching end function, to avoid races:

    mem_cgroup_start_A(&(*ptr))
    /* Something very complicated can happen here. */
    mem_cgroup_end_A(*ptr)

    In some calls, *ptr had to be initialized to NULL by the caller, which
    is ugly. This patch makes the _start function initialize *ptr instead.
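
    A sketch using the placeholder names above:

        /* The _start function now initializes *ptr itself, so callers
         * no longer need the `*ptr = NULL;` boilerplate. */
        void mem_cgroup_start_A(struct mem_cgroup **ptr)
        {
                *ptr = NULL;
                /* ... take the reference that mem_cgroup_end_A() drops ... */
        }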

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Johannes Weiner
    Acked-by: Daisuke Nishimura
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Provide an alternative to access_process_vm that allows the caller to obtain a
    reference to the supplied mm_struct.
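
    A sketch of the intended calling pattern, using the existing
    get_task_mm()/mmput() helpers:

        struct mm_struct *mm = get_task_mm(task);
        if (mm) {
                /* The caller holds its own mm reference, so the target
                 * task may exit while the access proceeds. */
                access_remote_vm(mm, addr, buf, len, 0);
                mmput(mm);
        }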

    Signed-off-by: Stephen Wilson
    Signed-off-by: Al Viro

    Stephen Wilson