22 Aug, 2012

1 commit

  • Occasionally an isolated BUG_ON(mm->nr_ptes) gets reported, indicating
    that not all the page tables allocated could be found and freed when
    exit_mmap() tore down the user address space.

    There's usually nothing we can say about it, beyond that it's probably a
    sign of some bad memory or memory corruption; though it might still
    indicate a bug in vma or page table management (and did recently reveal a
    race in THP, fixed a few months ago).

    But one overdue change we can make is from BUG_ON to WARN_ON.

    It's fairly likely that the system will crash shortly afterwards in some
    other way (for example, the BUG_ON(page_mapped(page)) in
    __delete_from_page_cache(), once an inode mapped into the lost page tables
    gets evicted); but might tell us more before that.

    Change the BUG_ON(page_mapped) to WARN_ON too? Later perhaps: I'm less
    eager, since that one has several times led to fixes.
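
    A minimal userspace sketch of the difference, not kernel code:
    MODEL_WARN_ON, struct model_mm and model_exit_mmap() are hypothetical
    stand-ins. A WARN_ON-style check reports the problem and lets teardown
    carry on, where a BUG_ON-style check would stop right there.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's WARN_ON(): report and carry on. */
static int model_warn_count;
#define MODEL_WARN_ON(cond) \
	((cond) ? (model_warn_count++, \
		   fprintf(stderr, "WARNING: %s\n", #cond), 1) : 0)

/* Toy address space: only the page-table counter matters here. */
struct model_mm {
	unsigned long nr_ptes;
};

/*
 * Toy teardown: returns 0 if every page table was freed, 1 if some were
 * lost. With a BUG_ON-style check the second case would never return.
 */
static int model_exit_mmap(struct model_mm *mm, unsigned long freed)
{
	assert(freed <= mm->nr_ptes);
	mm->nr_ptes -= freed;
	return MODEL_WARN_ON(mm->nr_ptes != 0);
}
```

    The caller gets a chance to log more state after the warning, which is
    the "might tell us more" part of the argument above.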

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

01 Aug, 2012

2 commits

  • Merge Andrew's second set of patches:
    - MM
    - a few random fixes
    - a couple of RTC leftovers

    * emailed patches from Andrew Morton: (120 commits)
    rtc/rtc-88pm80x: remove unneed devm_kfree
    rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
    mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
    tmpfs: distribute interleave better across nodes
    mm: remove redundant initialization
    mm: warn if pg_data_t isn't initialized with zero
    mips: zero out pg_data_t when it's allocated
    memcg: gix memory accounting scalability in shrink_page_list
    mm/sparse: remove index_init_lock
    mm/sparse: more checks on mem_section number
    mm/sparse: optimize sparse_index_alloc
    memcg: add mem_cgroup_from_css() helper
    memcg: further prevent OOM with too many dirty pages
    memcg: prevent OOM with too many dirty pages
    mm: mmu_notifier: fix freed page still mapped in secondary MMU
    mm: memcg: only check anon swapin page charges for swap cache
    mm: memcg: only check swap cache pages for repeated charging
    mm: memcg: split swapin charge function into private and public part
    mm: memcg: remove needless !mm fixup to init_mm when charging
    mm: memcg: remove unneeded shmem charge type
    ...

    Linus Torvalds
     
  • vm_stat_account() accounts the shared_vm, stack_vm and reserved_vm now.
    But we can also account for total_vm in the vm_stat_account() which makes
    the code tidy.

    Even for mprotect_fixup(), we can get the right result in the end.
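
    A simplified userspace model of the idea, assuming made-up flag values
    and a cut-down struct (the M_* constants, struct model_mm and the
    model_* functions are hypothetical, not the kernel's definitions). It
    shows why an mprotect-style flag change still ends with the right
    total: the pages are subtracted under the old flags and added back
    under the new ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for the model; the kernel's values differ. */
#define M_VM_WRITE     0x1UL
#define M_VM_EXEC      0x2UL
#define M_VM_GROWSDOWN 0x4UL
#define M_VM_RESERVED  0x8UL

struct model_mm {
	long total_vm, shared_vm, exec_vm, stack_vm, reserved_vm;
};

/* One helper updates every counter, including total_vm. */
static void model_vm_stat_account(struct model_mm *mm, unsigned long flags,
				  bool file_backed, long pages)
{
	assert(mm != NULL);
	mm->total_vm += pages;
	if (file_backed) {
		mm->shared_vm += pages;
		if ((flags & (M_VM_EXEC | M_VM_WRITE)) == M_VM_EXEC)
			mm->exec_vm += pages;
	} else if (flags & M_VM_GROWSDOWN) {
		mm->stack_vm += pages;
	}
	if (flags & M_VM_RESERVED)
		mm->reserved_vm += pages;
}

/* mprotect-style change: unaccount under old flags, account under new. */
static void model_change_flags(struct model_mm *mm, unsigned long oldflags,
			       unsigned long newflags, bool file_backed,
			       long pages)
{
	model_vm_stat_account(mm, oldflags, file_backed, -pages);
	model_vm_stat_account(mm, newflags, file_backed, pages);
}
```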

    Signed-off-by: Huang Shijie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Shijie
     

30 Jul, 2012

2 commits

  • Remove insert_vm_struct()->uprobe_mmap(). It is not needed, nobody
    except arch/ia64/kernel/perfmon.c uses insert_vm_struct(vma)
    with vma->vm_file != NULL.

    And it is wrong. Again, get_user_pages() can not succeed before
    vma_link(vma) makes it visible to find_vma(). And even if this
    worked, we must not insert the new bp before this mapping is
    visible to vma_prio_tree_foreach() for uprobe_unregister().

    Signed-off-by: Oleg Nesterov
    Acked-by: Srikar Dronamraju
    Cc: Anton Arapov
    Cc: Srikar Dronamraju
    Link: http://lkml.kernel.org/r/20120729182238.GA20349@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • Remove copy_vma()->uprobe_mmap(new_vma), it is absolutely wrong.

    This new_vma was just initialized to represent the new unmapped
    area, [vm_start, vm_end) was returned by get_unmapped_area() in
    the caller.

    This means that uprobe_mmap()->get_user_pages() will fail for
    sure, simply because find_vma() can never succeed. And I
    verified that sys_mremap()->mremap_to() indeed always fails with
    the wrong ENOMEM code if [addr, addr+old_len] is probed.

    And why was this uprobe_mmap() added? I believe the intent was
    wrong. Note that the caller is going to do move_page_tables(),
    all registered uprobes are already faulted in, we only change
    the virtual addresses.

    NOTE: However, somehow we need to close the race with
    uprobe_register() which relies on map_info->vaddr. This needs
    another fix I'll try to do later. Probably we need uprobe_mmap()
    in move_vma() but we can not do this right now, this can confuse
    uprobes_state.counter (which I still hope we are going to kill).

    Signed-off-by: Oleg Nesterov
    Acked-by: Srikar Dronamraju
    Cc: Anton Arapov
    Cc: Srikar Dronamraju
    Link: http://lkml.kernel.org/r/20120729182236.GA20342@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     

02 Jun, 2012

1 commit

  • Pull vfs changes from Al Viro.
    "A lot of misc stuff. The obvious groups:
    * Miklos' atomic_open series; kills the damn abuse of
    ->d_revalidate() by NFS, which was the major stumbling block for
    all work in that area.
    * ripping security_file_mmap() and dealing with deadlocks in the
    area; sanitizing the neighborhood of vm_mmap()/vm_munmap() in
    general.
    * ->encode_fh() switched to saner API; insane fake dentry in
    mm/cleancache.c gone.
    * assorted annotations in fs (endianness, __user)
    * parts of Artem's ->s_dirty work (jffs2 and reiserfs parts)
    * ->update_time() work from Josef.
    * other bits and pieces all over the place.

    Normally it would've been in two or three pull requests, but
    signal.git stuff had eaten a lot of time during this cycle ;-/"

    Fix up trivial conflicts in Documentation/filesystems/vfs.txt (the
    'truncate_range' inode method was removed by the VM changes, the VFS
    update adds an 'update_time()' method), and in fs/btrfs/ulist.[ch] (due
    to sparse fix added twice, with other changes nearby).

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (95 commits)
    nfs: don't open in ->d_revalidate
    vfs: retry last component if opening stale dentry
    vfs: nameidata_to_filp(): don't throw away file on error
    vfs: nameidata_to_filp(): inline __dentry_open()
    vfs: do_dentry_open(): don't put filp
    vfs: split __dentry_open()
    vfs: do_last() common post lookup
    vfs: do_last(): add audit_inode before open
    vfs: do_last(): only return EISDIR for O_CREAT
    vfs: do_last(): check LOOKUP_DIRECTORY
    vfs: do_last(): make ENOENT exit RCU safe
    vfs: make follow_link check RCU safe
    vfs: do_last(): use inode variable
    vfs: do_last(): inline walk_component()
    vfs: do_last(): make exit RCU safe
    vfs: split do_lookup()
    Btrfs: move over to use ->update_time
    fs: introduce inode operation ->update_time
    reiserfs: get rid of resierfs_sync_super
    reiserfs: mark the superblock as dirty a bit later
    ...

    Linus Torvalds
     

01 Jun, 2012

7 commits


31 May, 2012

1 commit


30 May, 2012

1 commit

  • The "if (mm)" check is not required in find_vma, as the kernel code
    calls find_vma only when it is absolutely sure that the mm_struct arg to
    it is non-NULL.

    Remove the if (mm) check and add a WARN_ONCE(!mm) for now. This
    will serve the purpose of mandating that the execution context
    (user-mode/kernel-mode) be known before find_vma is called. Also
    fix 2 checkpatch.pl errors in the declaration of the rb_node and
    vma_tmp local variables.

    I was browsing through the internet and read a discussion at
    https://lkml.org/lkml/2012/3/27/342 which discusses removal of the
    validation check within find_vma. Since no-one responded, I decided to
    send this patch with Andrew's suggestions.

    [akpm@linux-foundation.org: add remove-me comment]
    Signed-off-by: Rajman Mekaco
    Cc: Kautuk Consul
    Cc: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Acked-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rajman Mekaco
     

25 May, 2012

1 commit

  • Pull user-space probe instrumentation from Ingo Molnar:
    "The uprobes code originates from SystemTap and has been used for years
    in Fedora and RHEL kernels. This version is much rewritten, reviews
    from PeterZ, Oleg and myself shaped the end result.

    This tree includes uprobes support in 'perf probe' - but SystemTap
    (and other tools) can take advantage of user probe points as well.

    Sample usage of uprobes via perf, for example to profile malloc()
    calls without modifying user-space binaries.

    First boot a new kernel with CONFIG_UPROBE_EVENT=y enabled.

    If you don't know which function you want to probe you can pick one
    from 'perf top' or can get a list of all functions that can be probed
    within libc (binaries can be specified as well):

    $ perf probe -F -x /lib/libc.so.6

    To probe libc's malloc():

    $ perf probe -x /lib64/libc.so.6 malloc
    Added new event:
    probe_libc:malloc (on 0x7eac0)

    You can now use it in all perf tools, such as:

    perf record -e probe_libc:malloc -aR sleep 1

    Make use of it to create a call graph (as the flat profile is going to
    look very boring):

    $ perf record -e probe_libc:malloc -gR make
    [ perf record: Woken up 173 times to write data ]
    [ perf record: Captured and wrote 44.190 MB perf.data (~1930712

    $ perf report | less

    32.03% git libc-2.15.so [.] malloc
    |
    --- malloc

    29.49% cc1 libc-2.15.so [.] malloc
    |
    --- malloc
    |
    |--0.95%-- 0x208eb1000000000
    |
    |--0.63%-- htab_traverse_noresize

    11.04% as libc-2.15.so [.] malloc
    |
    --- malloc
    |

    7.15% ld libc-2.15.so [.] malloc
    |
    --- malloc
    |

    5.07% sh libc-2.15.so [.] malloc
    |
    --- malloc
    |
    4.99% python-config libc-2.15.so [.] malloc
    |
    --- malloc
    |
    4.54% make libc-2.15.so [.] malloc
    |
    --- malloc
    |
    |--7.34%-- glob
    | |
    | |--93.18%-- 0x41588f
    | |
    | --6.82%-- glob
    | 0x41588f

    ...

    Or:

    $ perf report -g flat | less

    # Overhead Command Shared Object Symbol
    # ........ ............. ............. ..........
    #
    32.03% git libc-2.15.so [.] malloc
    27.19%
    malloc

    29.49% cc1 libc-2.15.so [.] malloc
    24.77%
    malloc

    11.04% as libc-2.15.so [.] malloc
    11.02%
    malloc

    7.15% ld libc-2.15.so [.] malloc
    6.57%
    malloc

    ...

    The core uprobes design is fairly straightforward: uprobes probe
    points register themselves at (inode:offset) addresses of
    libraries/binaries, after which all existing (or new) vmas that map
    that address will have a software breakpoint injected at that address.
    vmas are COW-ed to preserve original content. The probe points are
    kept in an rbtree.

    If user-space executes the probed inode:offset instruction address
    then an event is generated which can be recovered from the regular
    perf event channels and mmap-ed ring-buffer.

    Multiple probes at the same address are supported, they create a
    dynamic callback list of event consumers.

    The basic model is further complicated by the XOL speedup: the
    original instruction that is probed is copied (in an architecture
    specific fashion) and executed out of line when the probe triggers.
    The XOL area is a single vma per process, with a fixed number of
    entries (which limits probe execution parallelism).

    The API: uprobes are installed/removed via
    /sys/kernel/debug/tracing/uprobe_events, the API is integrated to
    align with the kprobes interface as much as possible, but is separate
    from it.

    Injecting a probe point is a privileged operation, which can be relaxed
    by setting perf_paranoid to -1.

    You can use multiple probes as well and mix them with kprobes and
    regular PMU events or tracepoints, when instrumenting a task."

    Fix up trivial conflicts in mm/memory.c due to previous cleanup of
    unmap_single_vma().

    * 'perf-uprobes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
    perf probe: Detect probe target when m/x options are absent
    perf probe: Provide perf interface for uprobes
    tracing: Fix kconfig warning due to a typo
    tracing: Provide trace events interface for uprobes
    tracing: Extract out common code for kprobes/uprobes trace events
    tracing: Modify is_delete, is_return from int to bool
    uprobes/core: Decrement uprobe count before the pages are unmapped
    uprobes/core: Make background page replacement logic account for rss_stat counters
    uprobes/core: Optimize probe hits with the help of a counter
    uprobes/core: Allocate XOL slots for uprobes use
    uprobes/core: Handle breakpoint and singlestep exceptions
    uprobes/core: Rename bkpt to swbp
    uprobes/core: Make order of function parameters consistent across functions
    uprobes/core: Make macro names consistent
    uprobes: Update copyright notices
    uprobes/core: Move insn to arch specific structure
    uprobes/core: Remove uprobe_opcode_sz
    uprobes/core: Make instruction tables volatile
    uprobes: Move to kernel/events/
    uprobes/core: Clean up, refactor and improve the code
    ...

    Linus Torvalds
     

14 May, 2012

1 commit


07 May, 2012

2 commits

  • The VM accounting makes no sense at this level, and half of the callers
    didn't ever actually use the end result. The only time we want to
    unaccount the memory is when we actually remove the vma, so do the
    accounting at that point instead.

    This simplifies the interfaces (no need to pass down that silly page
    counter to functions that really don't care), and also makes it much
    more obvious what is actually going on: we do vm_[un]acct_memory() when
    adding or removing the vma, not on random page walking.
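
    A small userspace sketch of accounting at vma removal time, under
    stated assumptions: struct model_vma, M_VM_ACCOUNT, the 4 KiB page
    size and the model_* helpers are hypothetical stand-ins rather than
    the kernel's code. The walk over the vmas being removed sums the
    charged pages and unaccounts them once; nothing that walks page
    tables has to carry the counter.

```c
#include <assert.h>
#include <stddef.h>

#define M_VM_ACCOUNT 0x1UL	/* hypothetical bit value for the model */
#define M_PAGE_SHIFT 12		/* assumes 4 KiB pages */

struct model_vma {
	unsigned long vm_start, vm_end, vm_flags;
	struct model_vma *vm_next;
};

static long model_committed;	/* stand-in for the global commit counter */

static void model_vm_unacct_memory(long pages)
{
	model_committed -= pages;
}

/*
 * Walk the list of vmas being removed, sum up the pages that were
 * charged (VM_ACCOUNT), and unaccount them once at the end.
 */
static long model_remove_vma_list(struct model_vma *vma)
{
	long nr_accounted = 0;

	for (; vma != NULL; vma = vma->vm_next) {
		assert(vma->vm_end > vma->vm_start);
		if (vma->vm_flags & M_VM_ACCOUNT)
			nr_accounted += (long)((vma->vm_end - vma->vm_start)
					       >> M_PAGE_SHIFT);
	}
	model_vm_unacct_memory(nr_accounted);
	return nr_accounted;
}
```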

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • None of the callers want to pass in 'zap_details', and it doesn't even
    make sense for the case of actually unmapping vma's. So remove the
    argument, and clean up the interface.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

21 Apr, 2012

4 commits

  • it's always current->mm

    Signed-off-by: Al Viro

    Al Viro
     
  • This continues the theme started with vm_brk() and vm_munmap():
    vm_mmap() does the same thing as do_mmap(), but additionally does the
    required VM locking.
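
    The wrapper pattern can be sketched in userspace as follows. This is
    a toy model only: struct model_mm, the integer "lock" and the
    model_* functions are hypothetical, and the real functions take more
    arguments and operate on current->mm.

```c
#include <assert.h>

struct model_mm {
	int mmap_sem_held;		/* 1 while the write lock is held */
	unsigned long next_addr;	/* next free address handed out */
};

/* Stand-ins for down_write()/up_write() on mm->mmap_sem. */
static void model_down_write(struct model_mm *mm)
{
	assert(!mm->mmap_sem_held);
	mm->mmap_sem_held = 1;
}

static void model_up_write(struct model_mm *mm)
{
	assert(mm->mmap_sem_held);
	mm->mmap_sem_held = 0;
}

/* Internal worker: the caller must already hold the lock. */
static unsigned long model_do_mmap(struct model_mm *mm, unsigned long len)
{
	unsigned long addr;

	assert(mm->mmap_sem_held);
	addr = mm->next_addr;
	mm->next_addr += len;
	return addr;
}

/* Exported wrapper: same result, but takes and drops the lock itself. */
static unsigned long model_vm_mmap(struct model_mm *mm, unsigned long len)
{
	unsigned long addr;

	model_down_write(mm);
	addr = model_do_mmap(mm, len);
	model_up_write(mm);
	return addr;
}
```

    Callers of the wrapper cannot forget the locking, which is the point
    of exporting it instead of the worker.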

    This uninlines (and rewrites it to be clearer) do_mmap(), which sadly
    duplicates it in mm/mmap.c and mm/nommu.c. But that way we don't have
    to export our internal do_mmap_pgoff() function.

    Some day we hopefully don't have to export do_mmap() either, if all
    modular users can become the simpler vm_mmap() instead. We're actually
    very close to that already, with the notable exception of the (broken)
    use in i810, and a couple of stragglers in binfmt_elf.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Like the vm_brk() function, this is the same as "do_munmap()", except it
    does the VM locking for the caller.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • It does the same thing as "do_brk()", except it handles the VM locking
    too.

    It turns out that all external callers want that anyway, so we can make
    do_brk() static to just mm/mmap.c while at it.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

14 Apr, 2012

2 commits

  • Uprobes has a callback (uprobe_munmap()) in the unmap path to
    maintain the uprobes count.

    In the exit path this callback gets called in unlink_file_vma().
    However by the time unlink_file_vma() is called, the pages would
    have been unmapped (in unmap_vmas()) and the task->rss_stat counts
    accounted (in zap_pte_range()).

    If the exiting process has probepoints, uprobe_munmap() checks if
    the breakpoint instruction was around before decrementing the probe
    count.

    This results in a file backed page being reread by uprobe_munmap()
    and hence it does not find the breakpoint.

    This patch fixes this problem by moving the callback to
    unmap_single_vma(). Since unmap_single_vma() may not unmap the
    complete vma, add start and end parameters to uprobe_munmap().

    This bug became apparent courtesy of commit c3f0327f8e9d
    ("mm: add rss counters consistency check").

    Signed-off-by: Srikar Dronamraju
    Cc: Linus Torvalds
    Cc: Ananth N Mavinakayanahalli
    Cc: Jim Keniston
    Cc: Linux-mm
    Cc: Oleg Nesterov
    Cc: Andi Kleen
    Cc: Christoph Hellwig
    Cc: Steven Rostedt
    Cc: Arnaldo Carvalho de Melo
    Cc: Masami Hiramatsu
    Cc: Anton Arapov
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120411103527.23245.9835.sendpatchset@srdronam.in.ibm.com
    Signed-off-by: Ingo Molnar

    Srikar Dronamraju
     
  • Merge in latest upstream (and the latest perf development tree),
    to prepare for tooling changes, and also to pick up v3.4 MM
    changes that the uprobes code needs to take care of.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

31 Mar, 2012

1 commit

  • Maintain a per-mm counter: number of uprobes that are inserted
    on this process address space.

    This counter can be used at probe hit time to determine if we
    need a lookup in the uprobes rbtree. Every time a probe gets
    inserted successfully, the probe count is incremented and
    every time a probe gets removed, the probe count is decremented.
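
    A userspace sketch of the fast path this counter enables, assuming a
    cut-down struct model_mm and a dummy lookup (all model_* names are
    hypothetical; the real lookup searches the uprobes rbtree):

```c
#include <assert.h>
#include <stdbool.h>

struct model_mm {
	int uprobes_count;	/* probes currently inserted in this mm */
};

static int model_tree_lookups;	/* counts the expensive lookups */

/* Stand-in for the rbtree search; always "finds" a probe here. */
static bool model_find_uprobe(unsigned long vaddr)
{
	(void)vaddr;
	model_tree_lookups++;
	return true;
}

static void model_probe_inserted(struct model_mm *mm)
{
	mm->uprobes_count++;
}

static void model_probe_removed(struct model_mm *mm)
{
	assert(mm->uprobes_count > 0);
	mm->uprobes_count--;
}

/* Breakpoint hit: skip the tree lookup when this mm has no probes. */
static bool model_handle_breakpoint(struct model_mm *mm, unsigned long vaddr)
{
	if (mm->uprobes_count == 0)
		return false;	/* not ours, e.g. a debugger's breakpoint */
	return model_find_uprobe(vaddr);
}
```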

    The new uprobe_munmap hook ensures the count is correct on an
    unmap or remap of a region. We expect that once
    uprobe_munmap() is called, the vma goes away. So
    uprobe_unregister() finding a probe to unregister would either
    mean the unmap event hasn't occurred yet or an mmap event on the
    same executable file occurred after an unmap event.

    Additionally, uprobe_mmap hook now also gets called:

    a. on every executable vma that is COWed at fork.
    b. a vma of interest is newly mapped; breakpoint insertion also
    happens at the required address.

    On process creation, make sure the probes count in the child is
    set correctly.

    Special cases that are taken care of include:

    a. mremap
    b. VM_DONTCOPY vmas on fork()
    c. insertion/removal races in the parent during fork().

    Signed-off-by: Srikar Dronamraju
    Cc: Linus Torvalds
    Cc: Ananth N Mavinakayanahalli
    Cc: Jim Keniston
    Cc: Linux-mm
    Cc: Oleg Nesterov
    Cc: Andi Kleen
    Cc: Christoph Hellwig
    Cc: Steven Rostedt
    Cc: Arnaldo Carvalho de Melo
    Cc: Masami Hiramatsu
    Cc: Anton Arapov
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120330182646.10018.85805.sendpatchset@srdronam.in.ibm.com
    Signed-off-by: Ingo Molnar

    Srikar Dronamraju
     

23 Mar, 2012

1 commit

  • Merge first batch of patches from Andrew Morton:
    "A few misc things and all the MM queue"

    * emailed from Andrew Morton: (92 commits)
    memcg: avoid THP split in task migration
    thp: add HPAGE_PMD_* definitions for !CONFIG_TRANSPARENT_HUGEPAGE
    memcg: clean up existing move charge code
    mm/memcontrol.c: remove unnecessary 'break' in mem_cgroup_read()
    mm/memcontrol.c: remove redundant BUG_ON() in mem_cgroup_usage_unregister_event()
    mm/memcontrol.c: s/stealed/stolen/
    memcg: fix performance of mem_cgroup_begin_update_page_stat()
    memcg: remove PCG_FILE_MAPPED
    memcg: use new logic for page stat accounting
    memcg: remove PCG_MOVE_LOCK flag from page_cgroup
    memcg: simplify move_account() check
    memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat)
    memcg: kill dead prev_priority stubs
    memcg: remove PCG_CACHE page_cgroup flag
    memcg: let css_get_next() rely upon rcu_read_lock()
    cgroup: revert ss_id_lock to spinlock
    idr: make idr_get_next() good for rcu_read_lock()
    memcg: remove unnecessary thp check in page stat accounting
    memcg: remove redundant returns
    memcg: enum lru_list lru
    ...

    Linus Torvalds
     

22 Mar, 2012

6 commits

  • The comment above __insert_vm_struct seems to suggest that this function
    is also going to link the VMA with the anon_vma, but this is not true.
    This function only links the VMA to the mm->mm_rb tree and the mm->mmap
    linked list.

    [akpm@linux-foundation.org: improve comment layout and text]
    Signed-off-by: Kautuk Consul
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kautuk Consul
     
  • When calling shmget() with SHM_HUGETLB, shmget aligns the request size to
    PAGE_SIZE, but this is not sufficient.

    Modify hugetlb_file_setup() to align requests to the huge page size, and
    to accept an address argument so that all alignment checks can be
    performed in hugetlb_file_setup(), rather than in its callers. Change
    newseg() and mmap_pgoff() to match the new prototype and eliminate a now
    redundant alignment check.
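
    The arithmetic behind the fix, as a userspace sketch. The 4 KiB and
    2 MiB sizes are assumptions for illustration (huge page size depends
    on the architecture and hstate), and the model_* helpers are
    hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define M_PAGE_SIZE      (4UL << 10)	/* assumed 4 KiB base page */
#define M_HUGE_PAGE_SIZE (2UL << 20)	/* assumed 2 MiB huge page */

/* Round x up to a multiple of a; a must be a power of two. */
static unsigned long model_align_up(unsigned long x, unsigned long a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Number of huge pages needed to back a request of `size` bytes. */
static unsigned long model_huge_pages_for(size_t size)
{
	return model_align_up(size, M_HUGE_PAGE_SIZE) / M_HUGE_PAGE_SIZE;
}
```

    A 3 MiB request is already a multiple of PAGE_SIZE, so aligning to
    PAGE_SIZE leaves it at 3 MiB; only alignment to the huge page size
    rounds it up to the 4 MiB that two huge pages actually cover.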

    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Steven Truelove
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Truelove
     
  • If the required size is bigger than cached_hole_size it is better to
    search from free_area_cache - it is easier to get a free region,
    specifically for the 64 bit process whose address space is large enough.

    Do it just as hugetlb_get_unmapped_area_topdown() in
    arch/x86/mm/hugetlbpage.c does.

    Signed-off-by: Xiao Guangrong
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Michal Hocko
    Cc: Hillf Danton
    Cc: Andrea Arcangeli
    Cc: KAMEZAWA Hiroyuki
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xiao Guangrong
     
  • In the current code, cached_hole_size is set to the maximum value if the
    unmapped vma is less than free_area_cache so the next search will search
    from the base address.

    Actually, we can keep cached_hole_size so that if the next required size
    is more than cached_hole_size, it can search from free_area_cache.
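
    The search-start rule this relies on can be modelled in userspace
    roughly as below. The base address, struct model_mm and
    model_search_start() are hypothetical; only the comparison against
    cached_hole_size is taken from the description above.

```c
#include <assert.h>

#define M_TASK_UNMAPPED_BASE 0x40000000UL	/* hypothetical base address */

struct model_mm {
	unsigned long free_area_cache;	/* where the last search ended */
	unsigned long cached_hole_size;	/* largest hole known below it */
};

/*
 * Pick where a bottom-up search for `len` bytes should start. A request
 * larger than every hole known below free_area_cache cannot fit in any
 * of them, so it may start at the cache; otherwise restart from the base
 * and recompute the hole size during the walk.
 */
static unsigned long model_search_start(struct model_mm *mm, unsigned long len)
{
	assert(len > 0);
	if (len > mm->cached_hole_size)
		return mm->free_area_cache;
	mm->cached_hole_size = 0;
	return M_TASK_UNMAPPED_BASE;
}
```

    Resetting cached_hole_size to the maximum on unmap makes the first
    branch unreachable for the next request, which is the cost the
    change avoids.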

    Signed-off-by: Xiao Guangrong
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Michal Hocko
    Cc: Hillf Danton
    Cc: Andrea Arcangeli
    Cc: KAMEZAWA Hiroyuki
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xiao Guangrong
     
  • Pull munmap/truncate race fixes from Al Viro:
    "Fixes for racy use of unmap_vmas() on truncate-related codepaths"

    * 'vm' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    VM: make zap_page_range() callers that act on a single VMA use separate helper
    VM: make unmap_vmas() return void
    VM: don't bother with feeding upper limit to tlb_finish_mmu() in exit_mmap()
    VM: make zap_page_range() return void
    VM: can't go through the inner loop in unmap_vmas() more than once...
    VM: unmap_page_range() can return void

    Linus Torvalds
     
  • Pull security subsystem updates for 3.4 from James Morris:
    "The main addition here is the new Yama security module from Kees Cook,
    which was discussed at the Linux Security Summit last year. Its
    purpose is to collect miscellaneous DAC security enhancements in one
    place. This also marks a departure in policy for LSM modules, which
    were previously limited to being standalone access control systems.
    Chromium OS is using Yama, and I believe there are plans for Ubuntu,
    at least.

    This patchset also includes maintenance updates for AppArmor, TOMOYO
    and others."

    Fix trivial conflict due to the jump_label->static_key
    rename.

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (38 commits)
    AppArmor: Fix location of const qualifier on generated string tables
    TOMOYO: Return error if fails to delete a domain
    AppArmor: add const qualifiers to string arrays
    AppArmor: Add ability to load extended policy
    TOMOYO: Return appropriate value to poll().
    AppArmor: Move path failure information into aa_get_name and rename
    AppArmor: Update dfa matching routines.
    AppArmor: Minor cleanup of d_namespace_path to consolidate error handling
    AppArmor: Retrieve the dentry_path for error reporting when path lookup fails
    AppArmor: Add const qualifiers to generated string tables
    AppArmor: Fix oops in policy unpack auditing
    AppArmor: Fix error returned when a path lookup is disconnected
    KEYS: testing wrong bit for KEY_FLAG_REVOKED
    TOMOYO: Fix mount flags checking order.
    security: fix ima kconfig warning
    AppArmor: Fix the error case for chroot relative path name lookup
    AppArmor: fix mapping of META_READ to audit and quiet flags
    AppArmor: Fix underflow in xindex calculation
    AppArmor: Fix dropping of allowed operations that are force audited
    AppArmor: Add mising end of structure test to caps unpacking
    ...

    Linus Torvalds
     

21 Mar, 2012

2 commits


07 Mar, 2012

2 commits

  • Commit 6bd4837de96e ("mm: simplify find_vma_prev()") broke memory
    management on PA-RISC.

    After application of the patch, programs that allocate big arrays on the
    stack crash with a segfault. For example, this will crash if compiled
    without optimization:

        int main(void)
        {
                char array[200000];     /* much larger than the initial stack */
                array[199999] = 0;      /* faults on a page above the current stack */
                return 0;
        }

    The reason is that PA-RISC has up-growing stack and the stack is usually
    the last memory area. In the above example, a page fault happens above
    the stack.

    Previously, if we passed too high an address to find_vma_prev, it returned
    NULL and stored the last VMA in *pprev. After "simplify find_vma_prev"
    change, it stores NULL in *pprev. Consequently, the stack area is not
    found and it is not expanded, as it used to be before the change.

    This patch restores the old behavior and makes it return the last VMA in
    *pprev if the requested address is higher than address of any other VMA.
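
    The restored contract, sketched over a sorted singly linked list in
    userspace. struct model_vma and model_find_vma_prev() are
    hypothetical stand-ins; the kernel version searches the mm's rbtree
    rather than walking a list.

```c
#include <assert.h>
#include <stddef.h>

struct model_vma {
	unsigned long vm_start, vm_end;	/* covers [vm_start, vm_end) */
	struct model_vma *vm_next;	/* sorted by address */
};

/*
 * Return the first vma with addr < vm_end, or NULL if addr lies above
 * every vma. *pprev receives the vma just before the returned one; when
 * nothing is found it receives the last vma, which is what an
 * upward-growing stack needs in order to be expanded.
 */
static struct model_vma *model_find_vma_prev(struct model_vma *head,
					     unsigned long addr,
					     struct model_vma **pprev)
{
	struct model_vma *prev = NULL, *vma;

	assert(pprev != NULL);
	for (vma = head; vma != NULL; prev = vma, vma = vma->vm_next) {
		if (addr < vma->vm_end)
			break;
	}
	*pprev = prev;
	return vma;
}
```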

    Signed-off-by: Mikulas Patocka
    Acked-by: KOSAKI Motohiro
    Signed-off-by: Linus Torvalds

    Mikulas Patocka
     
  • Currently error is -ENOMEM when rejecting VM_GROWSDOWN|VM_GROWSUP
    from shared anonymous: hoist the file case's -EINVAL up for both.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

06 Mar, 2012

1 commit


17 Feb, 2012

2 commits

  • Make the uprobes code readable to me:

    - improve the Kconfig text so that a mere mortal gets some idea
    what CONFIG_UPROBES=y is really about

    - do trivial renames to standardize around the uprobes_*() namespace

    - clean up and simplify various code flow details

    - separate basic blocks of functionality

    - line break artifact and white space related removal

    - use standard local variable definition blocks

    - use vertical spacing to make things more readable

    - remove unnecessary volatile

    - restructure comment blocks to make them more uniform and
    more readable in general

    Cc: Srikar Dronamraju
    Cc: Jim Keniston
    Cc: Peter Zijlstra
    Cc: Oleg Nesterov
    Cc: Masami Hiramatsu
    Cc: Arnaldo Carvalho de Melo
    Cc: Anton Arapov
    Cc: Ananth N Mavinakayanahalli
    Link: http://lkml.kernel.org/n/tip-ewbwhb8o6navvllsauu7k07p@git.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Add uprobes support to the core kernel, with x86 support.

    This commit adds the kernel facilities, the actual uprobes
    user-space ABI and perf probe support comes in later commits.

    General design:

    Uprobes are maintained in an rb-tree indexed by inode and offset
    (the offset here is from the start of the mapping). For a unique
    (inode, offset) tuple, there can be at most one uprobe in the
    rb-tree.
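
    An ordering on (inode, offset) keys is all the tree needs. A minimal
    userspace sketch, assuming the inode is identified by its address
    (struct model_key and model_key_cmp() are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct model_key {
	const void *inode;	/* identity of the file; compared by address */
	uint64_t offset;	/* offset of the probed instruction */
};

/* Total order on keys: by inode first, then by offset. */
static int model_key_cmp(const struct model_key *l, const struct model_key *r)
{
	assert(l != NULL && r != NULL);
	if ((uintptr_t)l->inode != (uintptr_t)r->inode)
		return (uintptr_t)l->inode < (uintptr_t)r->inode ? -1 : 1;
	if (l->offset != r->offset)
		return l->offset < r->offset ? -1 : 1;
	return 0;	/* same tuple: same uprobe */
}
```

    A result of 0 during insertion means the uprobe already exists, so
    the tree never holds two nodes for the same tuple.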

    Since the (inode, offset) tuple identifies a unique uprobe, more
    than one user may be interested in the same uprobe. This provides
    the ability to connect multiple 'consumers' to the same uprobe.

    Each consumer defines a handler and a filter (optional). The
    'handler' is run every time the uprobe is hit, if it matches the
    'filter' criteria.

    The first consumer of a uprobe causes the breakpoint to be
    inserted at the specified address and subsequent consumers are
    appended to this list. On subsequent probes, the consumer gets
    appended to the existing list of consumers. The breakpoint is
    removed when the last consumer unregisters. For all other
    unregistrations, the consumer is removed from the list of
    consumers.
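
    The first-in/last-out rule can be sketched in userspace like this.
    The structs and model_* functions are hypothetical and leave out
    locking, handlers and filters; the return values only say when the
    breakpoint would have to be inserted or removed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_consumer {
	struct model_consumer *next;
};

struct model_uprobe {
	struct model_consumer *consumers;	/* NULL when nobody listens */
};

/* Add a consumer; returns true if the breakpoint must now be inserted. */
static bool model_register(struct model_uprobe *u, struct model_consumer *c)
{
	bool first = (u->consumers == NULL);

	c->next = u->consumers;
	u->consumers = c;
	return first;
}

/* Drop a consumer; returns true if the breakpoint must now be removed. */
static bool model_unregister(struct model_uprobe *u, struct model_consumer *c)
{
	struct model_consumer **pp;

	for (pp = &u->consumers; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == c) {
			*pp = c->next;
			c->next = NULL;
			return u->consumers == NULL;
		}
	}
	assert(!"consumer was not registered");
	return false;
}
```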

    Given an inode, we get a list of the mms that have mapped the
    inode. Do the actual registration if mm maps the page where a
    probe needs to be inserted/removed.

    We use a temporary list to walk through the vmas that map the
    inode.

    - The number of maps that map the inode is not known before we
    walk the rmap and keeps changing.
    - extending vm_area_struct wasn't recommended, it's a
    size-critical data structure.
    - There can be more than one map of the inode in the same mm.

    We add callbacks to the mmap methods to keep an eye on text vmas
    that are of interest to uprobes. When a vma of interest is mapped,
    we insert the breakpoint at the right address.

    Uprobe works by replacing the instruction at the address defined
    by (inode, offset) with the arch specific breakpoint
    instruction. We save a copy of the original instruction at the
    uprobed address.

    This is needed for:

    a. executing the instruction out-of-line (xol).
    b. instruction analysis for any subsequent fixups.
    c. restoring the instruction back when the uprobe is unregistered.

    We insert or delete a breakpoint instruction, and this
    breakpoint instruction is assumed to be the smallest instruction
    available on the platform. For fixed size instruction platforms
    this is trivially true, for variable size instruction platforms
    the breakpoint instruction is typically the smallest (often a
    single byte).

    Writing the instruction is done by COWing the page and changing
    the instruction during the copy, even though most platforms
    allow atomic writes of the breakpoint instruction. This also
    mirrors the behaviour of a ptrace() memory write to a PRIVATE
    file map.

    The core worker is derived from KSM's replace_page() logic.

    In essence, similar to KSM:

    a. allocate a new page and copy over contents of the page that
    has the uprobed vaddr
    b. modify the copy and insert the breakpoint at the required
    address
    c. switch the original page with the copy containing the
    breakpoint
    d. flush page tables.
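
    Steps a-d above can be modelled in userspace on plain heap buffers.
    This is only an illustration: struct model_mapping and
    model_write_opcode() are hypothetical, the 4096-byte page size is an
    assumption, and the breakpoint byte is the x86 int3 opcode. Step d
    has no userspace equivalent and is left as a comment.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define M_PAGE_SIZE 4096u	/* assumed page size */
#define M_SWBP_INSN 0xccu	/* x86 int3, a one-byte breakpoint */

struct model_mapping {
	uint8_t *page;		/* page currently mapped at this address */
};

/*
 * Replace the mapped page with a private copy that carries a breakpoint
 * at `off`. The original page is left untouched and returned to the
 * caller; the displaced instruction byte is saved in *saved.
 * Returns NULL (and changes nothing) if the copy cannot be allocated.
 */
static uint8_t *model_write_opcode(struct model_mapping *map, size_t off,
				   uint8_t *saved)
{
	uint8_t *old = map->page;
	uint8_t *copy;

	assert(off < M_PAGE_SIZE);
	copy = malloc(M_PAGE_SIZE);		/* a. allocate a new page ... */
	if (copy == NULL)
		return NULL;
	memcpy(copy, old, M_PAGE_SIZE);		/*    ... and copy contents   */
	*saved = copy[off];
	copy[off] = M_SWBP_INSN;		/* b. patch the copy          */
	map->page = copy;			/* c. switch the pages        */
	/* d. the kernel would flush stale translations at this point */
	return old;
}
```

    The saved byte is what later allows the original instruction to be
    restored when the uprobe is unregistered.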

    replace_page() is being replicated here because of some minor
    changes in the type of pages and also because Hugh Dickins had
    plans to improve replace_page() for KSM specific work.

    Instruction analysis on x86 is based on instruction decoder and
    determines if an instruction can be probed and determines the
    necessary fixups after singlestep. Instruction analysis is done
    at probe insertion time so that we avoid having to repeat the
    same analysis every time a probe is hit.

    A lot of code here is due to the improvement/suggestions/inputs
    from Peter Zijlstra.

    Changelog:

    (v10):
    - Add code to clear REX.B prefix as suggested by Denys Vlasenko
    and Masami Hiramatsu.

    (v9):
    - Use insn_offset_modrm as suggested by Masami Hiramatsu.

    (v7):

    Handle comments from Peter Zijlstra:

    - Don't take reference to inode. (expect inode to uprobe_register to be sane).
    - Use PTR_ERR to set the return value.
    - No need to take reference to inode.
    - use PTR_ERR to return error value.
    - register and uprobe_unregister share code.

    (v5):

    - Modified del_consumer as per comments from Peter.
    - Drop reference to inode before dropping reference to uprobe.
    - Use i_size_read(inode) instead of inode->i_size.
    - Ensure uprobe->consumers is NULL, before __uprobe_unregister() is called.
    - Includes errno.h as recommended by Stephen Rothwell to fix a build issue
    on sparc defconfig
    - Remove restrictions while unregistering.
    - Earlier code leaked inode references under some conditions while
    registering/unregistering.
    - Continue the vma-rmap walk even if the intermediate vma doesn't
    meet the requirements.
    - Validate the vma found by find_vma before inserting/removing the
    breakpoint
    - Call del_consumer under mutex_lock.
    - Use hash locks.
    - Handle mremap.
    - Introduce find_least_offset_node() instead of close match logic in
    find_uprobe
    - Uprobes no longer depends on MM_OWNER; No reference to task_structs
    while inserting/removing a probe.
    - Uses read_mapping_page instead of grab_cache_page so that the pages
    have valid content.
    - pass NULL to get_user_pages for the task parameter.
    - call SetPageUptodate on the new page allocated in write_opcode.
    - fix leaking a reference to the new page under certain conditions.
    - Include Instruction Decoder if Uprobes gets defined.
    - Remove const attributes for instruction prefix arrays.
    - Uses mm_context to know if the application is 32 bit.

    Signed-off-by: Srikar Dronamraju
    Also-written-by: Jim Keniston
    Reviewed-by: Peter Zijlstra
    Cc: Oleg Nesterov
    Cc: Andi Kleen
    Cc: Christoph Hellwig
    Cc: Steven Rostedt
    Cc: Roland McGrath
    Cc: Masami Hiramatsu
    Cc: Arnaldo Carvalho de Melo
    Cc: Anton Arapov
    Cc: Ananth N Mavinakayanahalli
    Cc: Stephen Rothwell
    Cc: Denys Vlasenko
    Cc: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Linux-mm
    Link: http://lkml.kernel.org/r/20120209092642.GE16600@linux.vnet.ibm.com
    [ Made various small edits to the commit log ]
    Signed-off-by: Ingo Molnar

    Srikar Dronamraju