18 Dec, 2009

4 commits

  • …/rusty/linux-2.6-for-linus

    * 'cpumask-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
    cpumask: rename tsk_cpumask to tsk_cpus_allowed
    cpumask: don't recommend set_cpus_allowed hack in Documentation/cpu-hotplug.txt
    cpumask: avoid dereferencing struct cpumask
    cpumask: convert drivers/idle/i7300_idle.c to cpumask_var_t
    cpumask: use modern cpumask style in drivers/scsi/fcoe/fcoe.c
    cpumask: avoid deprecated function in mm/slab.c
    cpumask: use cpu_online in kernel/perf_event.c

    Linus Torvalds
     
  • …s/security-testing-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
    Keys: KEYCTL_SESSION_TO_PARENT needs TIF_NOTIFY_RESUME architecture support
    NOMMU: Optimise away the {dac_,}mmap_min_addr tests
    security/min_addr.c: make init_mmap_min_addr() static
    keys: PTR_ERR return of wrong pointer in keyctl_get_security()

    Linus Torvalds
     
  • * 'kmemleak' of git://linux-arm.org/linux-2.6:
    kmemleak: fix kconfig for crc32 build error
    kmemleak: Reduce the false positives by checking for modified objects
    kmemleak: Show the age of an unreferenced object
    kmemleak: Release the object lock before calling put_object()
    kmemleak: Scan the _ftrace_events section in modules
    kmemleak: Simplify the kmemleak_scan_area() function prototype
    kmemleak: Do not use off-slab management with SLAB_NOLEAKTRACE

    Linus Torvalds
     
  • I added blk_run_backing_dev() to page_cache_async_readahead() so that readahead
    I/O is unplugged, improving throughput especially in RAID environments.
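
    A minimal sketch of the kind of change described, assuming function and
    field names from the readahead code of that era rather than the patch
    itself:

        /*
         * Sketch: with RAID, pages can become uptodate out of submission
         * order, so the implicit unplug via lock_page() may never happen;
         * kick the backing device explicitly after queueing readahead.
         */
        if (PageUptodate(page))
                blk_run_backing_dev(mapping->backing_dev_info, NULL);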

    The normal case is: if page N becomes uptodate at time T(N), then T(N) ...
    Acked-by: Wu Fengguang
    Cc: Jens Axboe
    Cc: KOSAKI Motohiro
    Tested-by: Ronald
    Cc: Bart Van Assche
    Cc: Vladislav Bolkhovitin
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hisashi Hifumi
     

17 Dec, 2009

11 commits

  • These days we use cpumask_empty(), which takes a pointer.
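
    For illustration, a hedged before/after sketch (the caller is
    hypothetical):

        static void example(struct cpumask *mask)
        {
                /* old, deprecated style: dereference the struct cpumask */
                if (cpus_empty(*mask))
                        return;

                /* modern style: operate through the pointer directly */
                if (cpumask_empty(mask))
                        return;
        }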

    Signed-off-by: Rusty Russell
    Acked-by: Christoph Lameter

    Rusty Russell
     
  • Replacing

        error = 0;
        if (error)
                op

    with nothing is not quite an equivalent transformation ;-)

    Signed-off-by: Al Viro

    Al Viro
     
  • In NOMMU mode, clamp dac_mmap_min_addr to zero so that the tests on it are
    optimized away by the compiler. We do this because a minimum mmap address
    makes no sense in NOMMU mode.

    mmap_min_addr and round_hint_to_min() can be discarded entirely in NOMMU mode.
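
    A hedged sketch of how such a clamp can be expressed so that the compiler
    folds the tests away (header placement assumed):

        #ifdef CONFIG_MMU
        extern unsigned long dac_mmap_min_addr;
        #else
        /* tests like "addr < dac_mmap_min_addr" become dead code */
        #define dac_mmap_min_addr       0UL
        #endif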

    Signed-off-by: David Howells
    Acked-by: Eric Paris
    Signed-off-by: James Morris

    David Howells
     
  • * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (34 commits)
    HWPOISON: Remove stray phrase in a comment
    HWPOISON: Try to allocate migration page on the same node
    HWPOISON: Don't do early filtering if filter is disabled
    HWPOISON: Add a madvise() injector for soft page offlining
    HWPOISON: Add soft page offline support
    HWPOISON: Undefine short-hand macros after use to avoid namespace conflict
    HWPOISON: Use new shake_page in memory_failure
    HWPOISON: Use correct name for MADV_HWPOISON in documentation
    HWPOISON: mention HWPoison in Kconfig entry
    HWPOISON: Use get_user_page_fast in hwpoison madvise
    HWPOISON: add an interface to switch off/on all the page filters
    HWPOISON: add memory cgroup filter
    memcg: add accessor to mem_cgroup.css
    memcg: rename and export try_get_mem_cgroup_from_page()
    HWPOISON: add page flags filter
    mm: export stable page flags
    HWPOISON: limit hwpoison injector to known page types
    HWPOISON: add fs/device filters
    HWPOISON: return 0 to indicate success reliably
    HWPOISON: make semantics of IGNORED/DELAYED clear
    ...

    Linus Torvalds
     
  • * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (38 commits)
    direct I/O fallback sync simplification
    ocfs: stop using do_sync_mapping_range
    cleanup blockdev_direct_IO locking
    make generic_acl slightly more generic
    sanitize xattr handler prototypes
    libfs: move EXPORT_SYMBOL for d_alloc_name
    vfs: force reval of target when following LAST_BIND symlinks (try #7)
    ima: limit imbalance msg
    Untangling ima mess, part 3: kill dead code in ima
    Untangling ima mess, part 2: deal with counters
    Untangling ima mess, part 1: alloc_file()
    O_TRUNC open shouldn't fail after file truncation
    ima: call ima_inode_free ima_inode_free
    IMA: clean up the IMA counts updating code
    ima: only insert at inode creation time
    ima: valid return code from ima_inode_alloc
    fs: move get_empty_filp() deffinition to internal.h
    Sanitize exec_permission_lite()
    Kill cached_lookup() and real_lookup()
    Kill path_lookup_open()
    ...

    Trivial conflicts in fs/direct-io.c

    Linus Torvalds
     
  • In the case of direct I/O falling back to buffered I/O, we currently sync
    data twice: once at the end of generic_file_buffered_write() using
    filemap_write_and_wait_range(), and once a little later in
    __generic_file_aio_write() using do_sync_mapping_range() with all flags set.

    The wait before the write in the do_sync_mapping_range() call does not make
    any sense, so just keep the filemap_write_and_wait_range() call and move it
    to the right spot.
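
    A sketch of the single remaining sync placed after the buffered fallback
    (the range arithmetic is assumed from the description):

        /*
         * Write back and wait on just the range the buffered fallback
         * wrote, once, instead of syncing it twice.
         */
        err = filemap_write_and_wait_range(file->f_mapping,
                                           pos, pos + written - 1);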

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Now that we cache the ACL pointers in the generic inode all the generic_acl
    cruft can go away and generic_acl.c can directly implement xattr handlers
    dealing with the full Posix ACL semantics for in-memory filesystems.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Add a flags argument to struct xattr_handler and pass it to all xattr
    handler methods. This allows using the same methods for multiple
    handlers, e.g. for the ACL methods which perform exactly the same action
    for the access and default ACLs, just using a different underlying
    attribute. With a little more groundwork it'll also allow sharing the
    methods for the regular user/trusted/secure handlers in extN, ocfs2 and
    jffs2 like it's already done for xfs in this patch.

    Also change the inode argument to the handlers to a dentry, to allow using
    the handler mechanism later for filesystems that require it, e.g. cifs.
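
    Roughly, the reshaped handler interface looks like the following sketch;
    field and parameter names are inferred from the description, not copied
    from the tree, and the new per-handler flags value is passed back to each
    method:

        struct xattr_handler {
                const char *prefix;
                int flags;      /* private flags passed back to the methods */
                size_t (*list)(struct dentry *dentry, char *list,
                               size_t list_size, const char *name,
                               size_t name_len, int handler_flags);
                int (*get)(struct dentry *dentry, const char *name,
                           void *buffer, size_t size, int handler_flags);
                int (*set)(struct dentry *dentry, const char *name,
                           const void *buffer, size_t size, int flags,
                           int handler_flags);
        };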

    [with GFS2 bits updated by Steven Whitehouse ]

    Signed-off-by: Christoph Hellwig
    Reviewed-by: James Morris
    Acked-by: Joel Becker
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • There are two groups of alloc_file() callers:
    * ones that are followed by ima_counts_get()
    * ones producing non-regular files
    So let's pull that ima_counts_get() into alloc_file(); it's a no-op for
    non-regular files.
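
    A hedged sketch of the consolidation; the signature shown is a plausible
    period-appropriate one, not taken verbatim from the patch:

        struct file *alloc_file(struct vfsmount *mnt, struct dentry *dentry,
                                fmode_t mode, const struct file_operations *fop)
        {
                struct file *file = get_empty_filp();

                /* ... mnt/dentry/mode/fop setup elided ... */

                /* ima_counts_get() is a no-op for non-regular files,
                 * so it is safe to call for every alloc_file() caller */
                ima_counts_get(file);
                return file;
        }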

    Signed-off-by: Al Viro

    Al Viro
     
  • ... and have the caller grab both mnt and dentry; kill
    leak in infiniband, while we are at it.

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     

16 Dec, 2009

25 commits

  • Variable `progress' isn't used in mem_cgroup_resize_limit() any more.
    Remove it.

    [akpm@linux-foundation.org: cleanup]
    Signed-off-by: Bob Liu
    Cc: Daisuke Nishimura
    Reviewed-by: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     
  • memcg_tasklist was introduced by commit 7f4d454d (memcg: avoid deadlock
    caused by race between oom and cpuset_attach) as a replacement for
    cgroup_mutex, to fix a deadlock problem. The cgroup_mutex in
    mem_cgroup_out_of_memory(), which that commit removed, was originally
    introduced by commit c7ba5c9e (Memory controller: OOM handling).

    IIUC, the intention of this cgroup_mutex was to prevent task moves during
    select_bad_process(), so that situations like the following can be avoided:

    Assume cgroup "foo" has exceeded its limit and is about to trigger oom.
    1. Process A, which has been in cgroup "baa" and uses a lot of memory, has
    just been moved to cgroup "foo". Process A can then become a candidate for
    being killed.
    2. Process B, which has been in cgroup "foo" and uses a lot of memory, has
    just been moved out of cgroup "foo". Process B can then be excluded from
    the candidates for being killed.

    But this race window exists anyway, even if we hold the lock, because
    __mem_cgroup_try_charge() decides whether it should trigger oom or not
    outside of the lock. So the original cgroup_mutex in
    mem_cgroup_out_of_memory(), and thus the current memcg_tasklist, has no
    use. And IMHO, those races are not so critical for users.

    This patch removes it and makes the code simpler.

    Signed-off-by: Daisuke Nishimura
    Cc: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
     
  • task_in_mem_cgroup(), which is called by select_bad_process() to check
    whether a task can be a candidate for being oom-killed from memcg's limit,
    checks "curr->use_hierarchy"("curr" is the mem_cgroup the task belongs
    to).

    But this check returns true (a false positive) when:

        /aa      use_hierarchy == 0
          /aa/00 use_hierarchy == 1
    ...
    Acked-by: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
     
  • mem_cgroup_move_parent() calls try_charge first and cancel_charge on
    failure. IMHO, charge/uncharge (especially charge) is a high-cost
    operation, so we should avoid it as much as possible.

    This patch tries to delay try_charge in mem_cgroup_move_parent() by
    re-ordering checks it does.

    And this patch renames mem_cgroup_move_account() to
    __mem_cgroup_move_account(), changes the return value of
    __mem_cgroup_move_account() from int to void, and adds a new wrapper
    (mem_cgroup_move_account()), which checks whether a @pc is valid for
    moving the accounting and then calls __mem_cgroup_move_account().

    This patch removes the last caller of trylock_page_cgroup(), so removes
    its definition too.
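
    A hedged sketch of the wrapper described above: the validity check runs
    under the page_cgroup lock, then the now-void worker does the move:

        /* sketch: 0 on success, -EINVAL if @pc isn't valid to move */
        static int mem_cgroup_move_account(struct page_cgroup *pc,
                        struct mem_cgroup *from, struct mem_cgroup *to)
        {
                int ret = -EINVAL;

                lock_page_cgroup(pc);
                if (PageCgroupUsed(pc) && pc->mem_cgroup == from) {
                        __mem_cgroup_move_account(pc, from, to);
                        ret = 0;
                }
                unlock_page_cgroup(pc);
                return ret;
        }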

    Signed-off-by: Daisuke Nishimura
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
     
  • There are some places that call both res_counter_uncharge() and css_put()
    to cancel the charge and drop the refcount taken by
    mem_cgroup_try_charge().

    This patch introduces mem_cgroup_cancel_charge() and calls it in those
    places.
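
    A minimal sketch of the helper, assuming a single-page charge and the
    usual swap-accounting counterpart:

        /* sketch: undo both halves of a successful try_charge */
        static void mem_cgroup_cancel_charge(struct mem_cgroup *mem)
        {
                res_counter_uncharge(&mem->res, PAGE_SIZE);
                if (do_swap_account)
                        res_counter_uncharge(&mem->memsw, PAGE_SIZE);
                css_put(&mem->css);
        }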

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Daisuke Nishimura
    Reviewed-by: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
     
  • In the global VM, FILE_MAPPED is used, but memcg uses MAPPED_FILE. This
    makes grepping difficult, so replace memcg's MAPPED_FILE with FILE_MAPPED.

    Also, in the global VM, mapped shared memory is accounted as FILE_MAPPED,
    but memcg doesn't do this; fix it.
    Note:
    page_is_file_cache() just checks whether a page is SwapBacked or not,
    so we also need to check PageAnon.

    Cc: Balbir Singh
    Reviewed-by: Daisuke Nishimura
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • This is a patch for coalescing accesses to res_counter at charge time via
    percpu caching. At charge, memcg charges 64 pages at once and remembers
    the surplus in a percpu cache. Because it's a cache, it is drained or
    flushed when necessary.

    This version uses the public percpu area, which has two benefits:
    1. The sum of stocked charges in the system is limited by the number of
    cpus, not by the number of memcgs. This gives better synchronization
    behavior.
    2. The drain code for flush/cpu-hotplug is very easy (and quick).

    The most important point of this patch is that we never touch res_counter
    in the fast path. The res_counter is a system-wide shared counter which is
    modified very frequently, so we should touch it as little as possible to
    avoid false sharing.
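
    A hedged sketch of the percpu stock; the type and function names are
    assumed from the description:

        #define CHARGE_SIZE     (64 * PAGE_SIZE) /* charge 64 pages at once */

        struct memcg_stock_pcp {
                struct mem_cgroup *cached; /* memcg this stock belongs to */
                int charge;                /* pre-charged bytes left */
        };
        static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);

        /* fast path: take one page's worth of charge from the local stock;
         * the shared res_counter cacheline is never touched here */
        static bool consume_stock(struct mem_cgroup *mem)
        {
                struct memcg_stock_pcp *stock = &get_cpu_var(memcg_stock);
                bool ret = true;

                if (mem == stock->cached && stock->charge)
                        stock->charge -= PAGE_SIZE;
                else
                        ret = false;
                put_cpu_var(memcg_stock);
                return ret;
        }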

    On an x86-64 8-cpu server, I tested the overhead of memcg at page fault by
    running a program which does map/fault/unmap in a loop, running one task
    per cpu via taskset, and summing the number of page faults in 60 seconds.

    [without memcg config]
    40156968 page-faults # 0.085 M/sec ( +- 0.046% )
    27.67 cache-miss/faults

    [root cgroup]
    36659599 page-faults # 0.077 M/sec ( +- 0.247% )
    31.58 cache miss/faults

    [in a child cgroup]
    18444157 page-faults # 0.039 M/sec ( +- 0.133% )
    69.96 cache miss/faults

    [ + coalescing uncharge patch]
    27133719 page-faults # 0.057 M/sec ( +- 0.155% )
    47.16 cache miss/faults

    [ + coalescing uncharge patch + this patch ]
    34224709 page-faults # 0.072 M/sec ( +- 0.173% )
    34.69 cache miss/faults

    Changelog (since Oct/2):
    - updated comments
    - replaced get_cpu_var() with __get_cpu_var() where possible.
    - removed the mutex for system-wide drain; added a counter instead of it.
    - removed CONFIG_HOTPLUG_CPU

    Changelog (old):
    - rebased onto the latest mmotm
    - moved the charge size check before the __GFP_WAIT check to avoid
    unnecessary ...
    - added an asynchronous flush routine.
    - fixed bugs pointed out by Nishimura-san.

    [akpm@linux-foundation.org: tweak comments]
    [nishimura@mxp.nes.nec.co.jp: don't do INIT_WORK() repeatedly against the same work_struct]
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Signed-off-by: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • In massively parallel environments, res_counter can be a performance
    bottleneck. One strong technique for reducing lock contention is to reduce
    the number of calls by coalescing several of them into one.

    Considering charge/uncharge characteristics:
    - charge is done one by one via demand paging.
    - uncharge is done
    - in chunks at munmap, truncate, exit, execve...
    - one by one via vmscan/paging.

    It seems we have a chance to coalesce uncharges to improve scalability
    at unmap/truncation.

    This patch is for coalescing uncharges. To avoid scattering memcg's
    structure into functions under /mm, this patch adds the memcg batch
    uncharge information to the task. The reason for per-task batching is to
    make use of the caller's context information. We do batched (delayed)
    uncharge when truncation/unmap occurs, but direct uncharge when the
    uncharge is called by memory reclaim (vmscan.c).

    The degree of coalescing depends on the caller:
    - at invalidate/truncate... pagevec size
    - at unmap ... ZAP_BLOCK_SIZE
    (memory itself will be freed at this granularity.)
    So we won't coalesce too much.
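
    The per-task batching state can be sketched roughly as below; the field
    names follow the changelog further down but are otherwise assumed:

        /* hedged sketch of the per-task batch info described above */
        struct memcg_batch_info {
                int do_batch;              /* batched uncharge is active */
                struct mem_cgroup *memcg;  /* memcg being uncharged */
                unsigned long bytes;       /* accumulated charge to return */
                unsigned long memsw_bytes; /* accumulated mem+swap charge */
        };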

    On an x86-64 8-cpu server, I tested the overhead of memcg at page fault by
    running a program which does map/fault/unmap in a loop, running one task
    per cpu via taskset, and summing the number of page faults in 60 seconds.

    [without memcg config]
    40156968 page-faults # 0.085 M/sec ( +- 0.046% )
    27.67 cache-miss/faults
    [root cgroup]
    36659599 page-faults # 0.077 M/sec ( +- 0.247% )
    31.58 miss/faults
    [in a child cgroup]
    18444157 page-faults # 0.039 M/sec ( +- 0.133% )
    69.96 miss/faults
    [child with this patch]
    27133719 page-faults # 0.057 M/sec ( +- 0.155% )
    47.16 miss/faults

    We can see some improvement.
    (The root cgroup is not affected by this patch.)
    Another patch, for "charge", will follow this one, and the above will be
    improved further.

    Changelog (since 2009/10/02):
    - renamed fields of memcg_batch (pages to bytes, memsw to memsw_bytes)
    - some cleanups and commentary/description updates.
    - added initialization code to copy_process(). (possible bug fix)

    Changelog (old):
    - fixed the !CONFIG_MEM_CGROUP case.
    - rebased onto the latest mmotm + softlimit fix patches.
    - unified the patch for callers.
    - added comments.
    - made ->do_batch a bool.
    - removed css_get() et al. We don't need it.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • A memory cgroup has a memory.memsw.usage_in_bytes file. It shows the sum
    of the usage of pages and swapents in the cgroup. Presently the root
    cgroup's memsw.usage_in_bytes shows the wrong value - the number of
    swapents is not added in.

    So take MEM_CGROUP_STAT_SWAPOUT into account.

    Signed-off-by: Kirill A. Shutemov
    Reviewed-by: Daisuke Nishimura
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Fix node-oriented allocation handling in oom-kill.c. I myself think of
    this as a bugfix, not an enhancement.

    These days, things have changed:
    - alloc_pages() takes a nodemask as its argument, via
    __alloc_pages_nodemask().
    - mempolicy doesn't maintain its own private zonelists.
    (And cpuset doesn't use a nodemask for __alloc_pages_nodemask().)

    So the current oom-killer's check function is wrong.

    This patch does the following (see the sketch after this list):
    - checks the nodemask: if a nodemask is given and it doesn't cover all of
    node_states[N_HIGH_MEMORY], this is CONSTRAINT_MEMORY_POLICY.
    - scans all zonelists under the nodemask; if it hits cpuset's wall,
    this failure is from cpuset.
    And
    - modifies the callers of out_of_memory() not to call oom if
    __GFP_THISNODE. This doesn't change the "current" behavior. If callers
    use __GFP_THISNODE, they should handle "page allocation failure" by
    themselves.

    - handles the __GFP_NOFAIL+__GFP_THISNODE path.
    This is something like a FIXME, but this gfpmask is not used now.
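
    A hedged sketch of the nodemask check from the first item above:

        /*
         * Sketch: if the allocation's nodemask doesn't cover every node
         * with memory, the failure comes from a memory policy, not from
         * a genuine shortage.
         */
        if (nodemask && !nodes_subset(node_states[N_HIGH_MEMORY], *nodemask))
                return CONSTRAINT_MEMORY_POLICY;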

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: David Rientjes
    Cc: Daisuke Nishimura
    Cc: KOSAKI Motohiro
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • In a typical oom analysis scenario, the first thing we want to know is
    whether the killed process had a memory leak. This patch adds vsz and rss
    information to the oom log to help that analysis and save debugging time.

    example:
    ===================================================================
    rsyslogd invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
    Pid: 1308, comm: rsyslogd Not tainted 2.6.32-rc6 #24
    Call Trace:
    [] ?_spin_unlock+0x2b/0x40
    [] oom_kill_process+0xbe/0x2b0

    (snip)

    492283 pages non-shared
    Out of memory: kill process 2341 (memhog) score 527276 or a child
    Killed process 2341 (memhog) vsz:1054552kB, anon-rss:970588kB, file-rss:4kB
    ===========================================================================
    ^
    |
    here

    [rientjes@google.com: fix race, add pid & comm to message]
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • Better to have complete sentences.

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Process-based injection is much easier to handle for test programs, which
    can first bring a page into a specific state and then test. So add a new
    MADV_SOFT_OFFLINE to soft offline a page, similar to the existing hard
    offline injector.
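
    A hedged userspace sketch of driving the injector; the MADV_SOFT_OFFLINE
    value is assumed here, so check your headers:

        #include <stdio.h>
        #include <sys/mman.h>
        #include <unistd.h>

        #ifndef MADV_SOFT_OFFLINE
        #define MADV_SOFT_OFFLINE 101   /* assumed; absent in old headers */
        #endif

        int main(void)
        {
                long psz = sysconf(_SC_PAGESIZE);
                char *p = mmap(NULL, psz, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

                if (p == MAP_FAILED) {
                        perror("mmap");
                        return 1;
                }
                p[0] = 1;       /* bring the page into a known state */
                /* ask the kernel to soft-offline it (needs privilege) */
                if (madvise(p, psz, MADV_SOFT_OFFLINE))
                        perror("madvise(MADV_SOFT_OFFLINE)");
                return 0;
        }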

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • This is a simpler, gentler variant of memory_failure() for soft page
    offlining, controlled from user space. It doesn't kill anything; it just
    tries to invalidate the page and, if that doesn't work, migrate it away.

    This is useful for predictive failure analysis, where a page has
    a high rate of corrected errors, but hasn't gone bad yet. Instead
    it can be offlined early and avoided.

    The offlining is controlled from sysfs, including, for symmetry, a new
    generic entry point for hard page offlining.

    We use the page isolation facility to prevent a re-allocation race.
    Normally this is only used by memory hotplug. To avoid races with memory
    allocation, I am using lock_system_sleep(). This avoids the situation
    where memory hotplug is about to isolate a page range and then hwpoison
    undoes that work. This is a big hammer, but it is the simplest solution
    for now.

    When the page is not free or LRU we try to free pages
    from slab and other caches. The slab freeing is currently
    quite dumb and does not try to focus on the specific slab
    cache which might own the page. This could be potentially
    improved later.

    Thanks to Fengguang Wu and Haicheng Li for some fixes.

    [Added fix from Andrew Morton to adapt to new migrate_pages prototype]
    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Signed-off-by: Andi Kleen

    Andi Kleen
     
  • shake_page handles more types of page caches than
    the much simpler lru_add_drain_all:

    - slab (quite inefficiently for now)
    - any other caches with a shrinker callback
    - per cpu page allocator pages
    - per CPU LRU

    Use this call to try to turn pages into free or LRU pages, then handle the
    case of the page becoming free after draining everything.

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Signed-off-by: Andi Kleen

    Andi Kleen
     
  • The previous version didn't take the mmap_sem before calling gup(),
    which is racy.

    Use get_user_pages_fast() instead, which doesn't need any locks. This is
    also faster, of course, but that doesn't really matter because this is
    just a testing path.
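
    A sketch of the replacement call, which needs no mmap_sem, using the
    get_user_pages_fast() signature of the time (surrounding variables
    assumed):

        struct page *page;
        int ret;

        /* no down_read(&mm->mmap_sem) needed around the fast variant */
        ret = get_user_pages_fast(vaddr, 1, 1 /* write */, &page);
        if (ret == 1)
                put_page(page); /* drop the reference when done */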

    Based on a report from Nick Piggin.
    Cc: npiggin@suse.de

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • In some use cases, the user doesn't need extra filtering. E.g. a user
    program can inject errors through the madvise syscall into its own pages,
    but it might not know what the page state exactly is or which inode the
    page belongs to.

    So introduce an on/off interface, "corrupt-filter-enable".

    Echo 0 to switch off the page filters, and echo 1 to switch them on.
    [AK: changed default to 0]

    Signed-off-by: Haicheng Li
    Signed-off-by: Wu Fengguang
    Signed-off-by: Andi Kleen

    Haicheng Li
     
  • The hwpoison test suite needs to inject hwpoison into a collection of
    selected task pages, and must not touch pages it doesn't own, which would
    kill important system processes such as init. (But it's OK to
    mis-hwpoison free/unowned pages as well as shared clean pages.
    Mis-hwpoisoning shared dirty pages would kill all tasks, so the test
    suite will target all or none of such tasks in the first place.)

    The memory cgroup serves this purpose well. We can put the target
    processes under the control of a memory cgroup, and tell the hwpoison
    injection code to only kill pages associated with some active memory
    cgroup.

    The prerequisite for doing hwpoison stress tests with mem_cgroup is that
    the mem_cgroup code tracks task pages _accurately_ (unless the page is
    locked), which we believe is/should be true.

    The benefits are simplification of hwpoison injector code. Also the
    mem_cgroup code will automatically be tested by hwpoison test cases.

    The alternative pin-pfn/unpin-pfn interfaces could also delegate the
    (process and page flags) filtering functions reliably to user space.
    However, a prototype implementation showed that this scheme adds more
    complexity than we wanted.

    Example test case:

    mkdir /cgroup/hwpoison

    usemem -m 100 -s 1000 &
    echo `jobs -p` > /cgroup/hwpoison/tasks

    memcg_ino=$(ls -id /cgroup/hwpoison | cut -f1 -d' ')
    echo $memcg_ino > /debug/hwpoison/corrupt-filter-memcg

    page-types -p `pidof init` --hwpoison # shall do nothing
    page-types -p `pidof usemem` --hwpoison # poison its pages

    [AK: Fix documentation]
    [Add fix for problem noticed by Li Zefan;
    dentry in the css could be NULL]

    CC: KOSAKI Motohiro
    CC: Hugh Dickins
    CC: Daisuke Nishimura
    CC: Balbir Singh
    CC: KAMEZAWA Hiroyuki
    CC: Li Zefan
    CC: Paul Menage
    CC: Nick Piggin
    CC: Andi Kleen
    Signed-off-by: Wu Fengguang
    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • So that an outside user can free the reference count grabbed by
    try_get_mem_cgroup_from_page().
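
    Such an accessor can be sketched as simply exposing the embedded css (a
    hedged sketch; the function name is assumed from the shortlog above):

        /* sketch: lets outside code css_put() the reference it was handed */
        struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *mem)
        {
                return &mem->css;
        }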

    CC: KOSAKI Motohiro
    CC: Hugh Dickins
    CC: Daisuke Nishimura
    CC: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Wu Fengguang
    Signed-off-by: Andi Kleen

    Wu Fengguang
     
  • So that the hwpoison injector can get the mem_cgroup for an arbitrary page
    and thus know whether it is owned by some mem_cgroup task(s).

    [AK: Merged with latest git tree]

    CC: KOSAKI Motohiro
    CC: Hugh Dickins
    CC: Daisuke Nishimura
    CC: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Wu Fengguang
    Signed-off-by: Andi Kleen

    Wu Fengguang
     
  • When specified, only poison pages if ((page_flags & mask) == value).

    - corrupt-filter-flags-mask
    - corrupt-filter-flags-value

    This allows stress testing of many kinds of pages.

    Strictly speaking, poisoning buddy pages requires taking the zone lock, to
    avoid setting PG_hwpoison on a "was buddy but now allocated to someone"
    page. However, we can just do nothing, because we set PG_locked at the
    beginning, and this prevents the page allocator from allocating the page
    to someone. (It will BUG() on the unexpected PG_locked, which is fine for
    hwpoison testing.)
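
    A hedged sketch of the check; the debugfs-backed mask/value variables and
    the helper name are assumed from the description:

        static u64 hwpoison_filter_flags_mask;  /* corrupt-filter-flags-mask */
        static u64 hwpoison_filter_flags_value; /* corrupt-filter-flags-value */

        /* sketch: 0 = page may be poisoned, -EINVAL = filtered out */
        static int hwpoison_filter_flags(struct page *p)
        {
                if (!hwpoison_filter_flags_mask)
                        return 0;       /* filter disabled */
                if ((stable_page_flags(p) & hwpoison_filter_flags_mask) ==
                    hwpoison_filter_flags_value)
                        return 0;
                return -EINVAL;
        }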

    [AK: Add select PROC_PAGE_MONITOR to satisfy dependency]

    CC: Nick Piggin
    Signed-off-by: Wu Fengguang
    Signed-off-by: Andi Kleen

    Wu Fengguang