12 May, 2013

2 commits

  • Pull tracing/kprobes update from Steven Rostedt:
    "The majority of these changes are from Masami Hiramatsu bringing
    kprobes up to par with the latest changes to ftrace (multi buffering
    and the new function probes).

    He also discovered and fixed some bugs in doing so. When pulling in
    his patches, I also found a few minor bugs as well and fixed them.

    This also includes a compile fix for some archs that select the ring
    buffer but not tracing.

    I based this off of the last patch you took from me that fixed the
    merge conflict error, as that was the commit that had all the changes
    I needed for this set of changes."

    * tag 'trace-fixes-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing/kprobes: Support soft-mode disabling
    tracing/kprobes: Support ftrace_event_file base multibuffer
    tracing/kprobes: Pass trace_probe directly from dispatcher
    tracing/kprobes: Increment probe hit-count even if it is used by perf
    tracing/kprobes: Use bool for retprobe checker
    ftrace: Fix function probe when more than one probe is added
    ftrace: Fix the output of enabled_functions debug file
    ftrace: Fix locking in register_ftrace_function_probe()
    tracing: Add helper function trace_create_new_event() to remove duplicate code
    tracing: Modify soft-mode only if there's no other referrer
    tracing: Indicate enabled soft-mode in enable file
    tracing/kprobes: Fix to increment return event probe hit-count
    ftrace: Cleanup regex_lock and ftrace_lock around hash updating
    ftrace, kprobes: Fix a deadlock on ftrace_regex_lock
    ftrace: Have ftrace_regex_write() return either read or error
    tracing: Return error if register_ftrace_function_probe() fails for event_enable_func()
    tracing: Don't succeed if event_enable_func did not register anything
    ring-buffer: Select IRQ_WORK

    Linus Torvalds
     
  • Pull audit changes from Eric Paris:
    "Al used to send pull requests every couple of years but he told me to
    just start pushing them to you directly.

    Our touching outside of core audit code is pretty straightforward. A
    couple of interface changes which hit net/. A simple argument bug
    calling audit functions in namei.c and the removal of some assembly
    branch prediction code on ppc"

    * git://git.infradead.org/users/eparis/audit: (31 commits)
    audit: fix message spacing printing auid
    Revert "audit: move kaudit thread start from auditd registration to kaudit init"
    audit: vfs: fix audit_inode call in O_CREAT case of do_last
    audit: Make testing for a valid loginuid explicit.
    audit: fix event coverage of AUDIT_ANOM_LINK
    audit: use spin_lock in audit_receive_msg to process tty logging
    audit: do not needlessly take a lock in tty_audit_exit
    audit: do not needlessly take a spinlock in copy_signal
    audit: add an option to control logging of passwords with pam_tty_audit
    audit: use spin_lock_irqsave/restore in audit tty code
    helper for some session id stuff
    audit: use a consistent audit helper to log lsm information
    audit: push loginuid and sessionid processing down
    audit: stop pushing loginid, uid, sessionid as arguments
    audit: remove the old depricated kernel interface
    audit: make validity checking generic
    audit: allow checking the type of audit message in the user filter
    audit: fix build break when AUDIT_DEBUG == 2
    audit: remove duplicate export of audit_enabled
    Audit: do not print error when LSMs disabled
    ...

    Linus Torvalds
     

11 May, 2013

2 commits

  • Pull stray syscall bits from Al Viro:
    "Several syscall-related commits that were missing from the original"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
    switch compat_sys_sysctl to COMPAT_SYSCALL_DEFINE
    unicore32: just use mmap_pgoff()...
    unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE
    x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...)

    Linus Torvalds
     
  • Pull misc fixes from David Woodhouse:
    "This is some miscellaneous cleanups that don't really belong anywhere
    else (or were ignored), that have been sitting in linux-next for some
    time. Two of them are fixes resulting from my audit of krealloc()
    usage that don't seem to have elicited any response when I posted
    them, and the other three are patches from Artem removing dead code."

    * tag 'for-linus-20130509' of git://git.infradead.org/~dwmw2/random-2.6:
    pcmcia: remove RPX board stuff
    m68k: remove rpxlite stuff
    pcmcia: remove Motorola MBX860 support
    params: Fix potential memory leak in add_sysfs_param()
    dell-laptop: Fix krealloc() misuse in parse_da_table()

    Linus Torvalds
     

10 May, 2013

16 commits

  • Support soft-mode disabling on kprobe-based dynamic events.
    Soft-disabling simply skips recording when the soft-disabled
    flag is set.
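
    A minimal sketch of that idea follows; the names are simplified and
    invented for illustration, this is not the actual kprobe code.

    /* Illustrative sketch only -- simplified names, not the real code. */
    #define FL_SOFT_DISABLED (1u << 0)

    struct event_file { unsigned int flags; };

    static void kprobe_hit(struct event_file *file)
    {
        if (file->flags & FL_SOFT_DISABLED)
            return;             /* probe stays armed, recording is skipped */
        /* ... write the trace entry into the buffer ... */
    }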

    Link: http://lkml.kernel.org/r/20130509054454.30398.7237.stgit@mhiramat-M0-7522

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Tom Zanussi
    Cc: Oleg Nesterov
    Cc: Srikar Dronamraju
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Support multi-buffer on kprobe-based dynamic events by
    using ftrace_event_file.

    Link: http://lkml.kernel.org/r/20130509054449.30398.88343.stgit@mhiramat-M0-7522

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Tom Zanussi
    Cc: Oleg Nesterov
    Cc: Srikar Dronamraju
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Pass the struct trace_probe pointer directly from the probe
    dispatcher to the handlers. This removes redundant container_of()
    uses. The same thing has already been done in trace_uprobe.
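
    Roughly what that shape looks like -- a hand-written sketch with
    simplified types, not the kernel code: the dispatcher resolves the
    enclosing object once and hands the trace_probe pointer down.

    struct trace_probe  { unsigned long nhit; };
    struct trace_kprobe { struct trace_probe tp; /* kprobe fields ... */ };

    static void trace_handler(struct trace_probe *tp) { tp->nhit++; /* record */ }
    static void perf_handler(struct trace_probe *tp)  { (void)tp;   /* record */ }

    static void dispatcher(struct trace_kprobe *tk)
    {
        struct trace_probe *tp = &tk->tp;  /* no container_of() in handlers */

        trace_handler(tp);
        perf_handler(tp);
    }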

    Link: http://lkml.kernel.org/r/20130509054441.30398.69112.stgit@mhiramat-M0-7522

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Tom Zanussi
    Cc: Oleg Nesterov
    Cc: Srikar Dronamraju
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Increment the probe hit-count for profiling even if the probe is
    used by the perf tool. The same thing has already been done in
    trace_uprobe.

    Link: http://lkml.kernel.org/r/20130509054436.30398.21133.stgit@mhiramat-M0-7522

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Tom Zanussi
    Cc: Oleg Nesterov
    Cc: Srikar Dronamraju
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Use bool instead of int for kretprobe checker.

    Link: http://lkml.kernel.org/r/20130509054431.30398.38561.stgit@mhiramat-M0-7522

    Cc: Srikar Dronamraju
    Cc: Oleg Nesterov
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Tom Zanussi
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • When the first function probe is added and the function tracer
    is updated, the functions are modified to call the probe.
    But when a second probe is added, the function records are
    updated to include it, while the actual functions themselves are
    not updated.

    This prevents the second (or third or fourth and so on) probes
    from having their functions called.

    # echo vfs_symlink:enable_event:sched:sched_switch > set_ftrace_filter
    # echo vfs_unlink:enable_event:sched:sched_switch > set_ftrace_filter
    # cat trace
    # tracer: nop
    #
    # entries-in-buffer/entries-written: 0/0 #P:4
    #
    # _-----=> irqs-off
    # / _----=> need-resched
    # | / _---=> hardirq/softirq
    # || / _--=> preempt-depth
    # ||| / delay
    # TASK-PID CPU# |||| TIMESTAMP FUNCTION
    # | | | |||| | |
    # touch /tmp/a
    # rm /tmp/a
    # cat trace
    # tracer: nop
    #
    # entries-in-buffer/entries-written: 0/0 #P:4
    #
    # _-----=> irqs-off
    # / _----=> need-resched
    # | / _---=> hardirq/softirq
    # || / _--=> preempt-depth
    # ||| / delay
    # TASK-PID CPU# |||| TIMESTAMP FUNCTION
    # | | | |||| | |
    # ln -s /tmp/a
    # cat trace
    # tracer: nop
    #
    # entries-in-buffer/entries-written: 414/414 #P:4
    #
    # _-----=> irqs-off
    # / _----=> need-resched
    # | / _---=> hardirq/softirq
    # || / _--=> preempt-depth
    # ||| / delay
    # TASK-PID CPU# |||| TIMESTAMP FUNCTION
    # | | | |||| | |
    <idle>-0 [000] d..3 2847.923031: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=2786 next_prio=120
    <...>-3114 [001] d..4 2847.923035: sched_switch: prev_comm=ln prev_pid=3114 prev_prio=120 prev_state=x ==> next_comm=swapper/1 next_pid=0 next_prio=120
    bash-2786 [000] d..3 2847.923535: sched_switch: prev_comm=bash prev_pid=2786 prev_prio=120 prev_state=S ==> next_comm=kworker/0:1 next_pid=34 next_prio=120
    kworker/0:1-34 [000] d..3 2847.923552: sched_switch: prev_comm=kworker/0:1 prev_pid=34 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
    <idle>-0 [002] d..3 2847.923554: sched_switch: prev_comm=swapper/2 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=sshd next_pid=2783 next_prio=120
    sshd-2783 [002] d..3 2847.923660: sched_switch: prev_comm=sshd prev_pid=2783 prev_prio=120 prev_state=S ==> next_comm=swapper/2 next_pid=0 next_prio=120

    We still need to update the functions even though the probe itself
    does not need to be registered again when a new probe is added.

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • The enabled_functions debugfs file was created to be able to see
    what functions have been modified from nops to calling a tracer.

    The current method uses the counter in the function record:
    when an ftrace_ops is registered to a function, its count
    increases. But that doesn't mean the function is actively
    being traced. /proc/sys/kernel/ftrace_enabled can be set to
    zero, which would disable it, or something can go wrong and
    we can think it's enabled when only the counter is set.

    The record's FTRACE_FL_ENABLED flag is set or cleared when its
    function is modified. That is a much more accurate way of knowing
    which functions are enabled or not.
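
    A simplified sketch of the distinction (illustrative layout, not the
    real dyn_ftrace record): a registration count says somebody wants the
    function traced, while the ENABLED bit is only set once the call site
    has actually been patched.

    #define REC_FL_ENABLED (1ul << 31)

    struct func_rec { unsigned long ip; unsigned long flags; };

    static int show_in_enabled_functions(const struct func_rec *rec)
    {
        return (rec->flags & REC_FL_ENABLED) != 0;   /* not: "count != 0" */
    }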

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • The iteration of the ftrace function list and the call to
    ftrace_match_record() need to be protected by the ftrace_lock.

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Both __trace_add_new_event() and __trace_early_add_new_event() do
    basically the same thing, except that __trace_add_new_event() does
    a little more.

    Instead of having duplicate code between the two functions, add
    a helper function trace_create_new_event() that both can use.
    This helps avoid having a bug fixed in one function but not
    the other.
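
    A hypothetical sketch of the refactoring shape, with simplified names:
    both entry points call one helper for the shared allocation and
    initialization, and only the regular path does the extra work.

    #include <stdlib.h>

    struct event_call { const char *name; };
    struct event_file { struct event_call *call; void *tr; };

    static struct event_file *create_new_event(struct event_call *call, void *tr)
    {
        struct event_file *file = calloc(1, sizeof(*file));

        if (!file)
            return NULL;
        file->call = call;
        file->tr = tr;
        return file;
    }

    static int add_new_event(struct event_call *call, void *tr)
    {
        struct event_file *file = create_new_event(call, tr);

        if (!file)
            return -1;
        /* ... extra work that only the regular (non-early) path does ... */
        return 0;
    }

    static int early_add_new_event(struct event_call *call, void *tr)
    {
        return create_new_event(call, tr) ? 0 : -1;
    }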

    Cc: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Modify the soft-mode flag only if there is no other soft-mode
    referrer (currently only the ftrace triggers), by using a
    reference counter in each ftrace_event_file.

    Without this fix, adding and removing several different
    enable/disable_event triggers on the same event clears the
    soft-mode bit from the ftrace_event_file. This also happens
    with a typo in the glob when setting triggers.

    e.g.

    # echo vfs_symlink:enable_event:net:netif_rx > set_ftrace_filter
    # cat events/net/netif_rx/enable
    0*
    # echo typo_func:enable_event:net:netif_rx > set_ftrace_filter
    # cat events/net/netif_rx/enable
    0
    # cat set_ftrace_filter
    #### all functions enabled ####
    vfs_symlink:enable_event:net:netif_rx:unlimited

    As above, we still have a trigger, but soft-mode is gone.
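
    An illustrative sketch of the reference-counting idea, with invented
    names: each trigger takes a reference, and the soft-mode flag is only
    cleared when the last referrer goes away.

    #define FL_SOFT_MODE (1u << 0)

    struct event_file { unsigned int flags; int soft_mode_refs; };

    static void soft_mode_get(struct event_file *file)
    {
        if (file->soft_mode_refs++ == 0)
            file->flags |= FL_SOFT_MODE;
    }

    static void soft_mode_put(struct event_file *file)
    {
        if (--file->soft_mode_refs == 0)
            file->flags &= ~FL_SOFT_MODE;  /* only the last put drops soft mode */
    }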

    Link: http://lkml.kernel.org/r/20130509054429.30398.7464.stgit@mhiramat-M0-7522

    Cc: Srikar Dronamraju
    Cc: Oleg Nesterov
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: David Sharp
    Cc: Hiraku Toyooka
    Cc: Tom Zanussi
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Indicate an enabled soft-mode event as "1*" in the "enable" file
    for each event, because it can be soft-disabled when the
    disable_event trigger is hit.

    Link: http://lkml.kernel.org/r/20130509054426.30398.28202.stgit@mhiramat-M0-7522

    Cc: Srikar Dronamraju
    Cc: Oleg Nesterov
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Tom Zanussi
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Fix to increment probe hit-count for function return event.

    Link: http://lkml.kernel.org/r/20130509054424.30398.34058.stgit@mhiramat-M0-7522

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Tom Zanussi
    Cc: Oleg Nesterov
    Cc: Srikar Dronamraju
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Clean up the regex_lock and ftrace_lock locking points around
    the ftrace_ops hash update code.

    The new rule is that regex_lock protects the ops->*_hash
    read-update-write code for each ftrace_ops. Usually, a hash
    update is done by the following sequence:

    1. allocate a new local hash and copy the original hash.
    2. update the local hash.
    3. move (actually, copy) back the local hash to ftrace_ops.
    4. update ftrace entries if needed.
    5. release the local hash.

    This makes regex_lock protect #1-#4, and ftrace_lock protect
    #3 and #4 as well as the adding and removing of ftrace_ops from
    the ftrace_ops_list. ftrace_lock covers #3 as well because the
    move functions update the entries too.
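
    A user-space sketch of the locking shape only (pthread mutexes stand
    in for the kernel mutexes; this is not the kernel code): the per-ops
    regex_lock wraps the whole read-update-write, and the global
    ftrace_lock wraps publishing the hash and patching the entries.

    #include <pthread.h>
    #include <stdlib.h>
    #include <string.h>

    struct hash { int entries[8]; };

    struct ops {
        pthread_mutex_t regex_lock;   /* per-ops lock */
        struct hash *filter_hash;
    };

    static pthread_mutex_t ftrace_lock = PTHREAD_MUTEX_INITIALIZER;  /* global */

    static void set_filter(struct ops *ops, int new_entry)
    {
        struct hash *new_hash, *old_hash;

        pthread_mutex_lock(&ops->regex_lock);

        new_hash = malloc(sizeof(*new_hash));             /* 1. copy the hash */
        if (!new_hash)
            goto out;
        memcpy(new_hash, ops->filter_hash, sizeof(*new_hash));

        new_hash->entries[0] = new_entry;                 /* 2. update the copy */

        pthread_mutex_lock(&ftrace_lock);                 /* 3.-4. under both locks */
        old_hash = ops->filter_hash;
        ops->filter_hash = new_hash;                      /* 3. move the hash back */
        /* ... 4. update the traced call sites here ... */
        pthread_mutex_unlock(&ftrace_lock);

        free(old_hash);                                   /* 5. release the old hash */
    out:
        pthread_mutex_unlock(&ops->regex_lock);
    }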

    Link: http://lkml.kernel.org/r/20130509054421.30398.83411.stgit@mhiramat-M0-7522

    Cc: Srikar Dronamraju
    Cc: Oleg Nesterov
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Tom Zanussi
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Fix a deadlock on ftrace_regex_lock which happens when setting
    an enable_event trigger on a dynamic kprobe event, as shown below.

    ----
    sh-2.05b# echo p vfs_symlink > kprobe_events
    sh-2.05b# echo vfs_symlink:enable_event:kprobes:p_vfs_symlink_0 > set_ftrace_filter

    =============================================
    [ INFO: possible recursive locking detected ]
    3.9.0+ #35 Not tainted
    ---------------------------------------------
    sh/72 is trying to acquire lock:
    (ftrace_regex_lock){+.+.+.}, at: [] ftrace_set_hash+0x81/0x1f0

    but task is already holding lock:
    (ftrace_regex_lock){+.+.+.}, at: [] ftrace_regex_write.isra.29.part.30+0x3d/0x220

    other info that might help us debug this:
    Possible unsafe locking scenario:

    CPU0
    ----
    lock(ftrace_regex_lock);
    lock(ftrace_regex_lock);

    *** DEADLOCK ***
    ----

    To fix that, introduce a finer-grained regex_lock for each
    ftrace_ops. ftrace_regex_lock is too big a lock, protecting all
    filter/notrace_hash operations; it no longer needs to be global
    now that multiple ftrace_ops are supported, because each
    ftrace_ops has its own filter/notrace_hash.

    Link: http://lkml.kernel.org/r/20130509054417.30398.84254.stgit@mhiramat-M0-7522

    Cc: Srikar Dronamraju
    Cc: Oleg Nesterov
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Tom Zanussi
    Signed-off-by: Masami Hiramatsu
    [ Added initialization flag and automate mutex initialization for
    non ftrace.c ftrace_probes. ]
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     

09 May, 2013

5 commits

  • As ftrace_regex_write() reads the result of ftrace_process_regex(),
    which can sometimes return a positive number, only consider it a
    failure if the return is negative. Otherwise, it will skip possible
    other registered probes, and by returning a positive number that
    wasn't read, it will confuse the user process doing the writing.

    Cc: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • register_ftrace_function_probe() returns the number of functions
    it registered, which can be zero; it can also return a negative
    number if something went wrong. But event_enable_func() only checks
    for the case that it didn't register anything; it also needs to
    check for the case that something went wrong and return that error
    code as well.

    Added some comments about the code as well, to make it more
    understandable.
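
    A sketch of the intended return-value handling; the helper name below
    is a hypothetical stand-in for register_ftrace_function_probe(), not
    the real call.

    #include <errno.h>

    static int register_probe_on_matches(void) { return 1; }  /* stand-in */

    static int enable_func(void)
    {
        int ret = register_probe_on_matches();  /* >0: matched, 0: none, <0: error */

        if (ret < 0)
            return ret;        /* propagate the real error code */
        if (ret == 0)
            return -ENOENT;    /* registering nothing is a failure, not success */
        return 0;              /* success: don't leak the count into write()'s return */
    }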

    Cc: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Return 0 instead of the number of activated ftrace function probes
    if event_enable_func() succeeded, and return an error code if it
    failed or did not register any functions. Currently it returns the
    number of registered functions, and if it didn't register anything
    it returns 0, which is treated as success.

    This also fixes the return value: on success, the number of enabled
    functions was returned back to the user in ftrace_regex_write() (the
    write() return code). If only one function is enabled, then the
    return code of the write is one, which can confuse the user program
    into thinking it only wrote 1 byte.

    Link: http://lkml.kernel.org/r/20130509054413.30398.55650.stgit@mhiramat-M0-7522

    Cc: Srikar Dronamraju
    Cc: Oleg Nesterov
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Tom Zanussi
    Signed-off-by: Masami Hiramatsu
    [ Rewrote change log to reflect that this fixes two bugs - SR ]
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Pull block driver updates from Jens Axboe:
    "It might look big in volume, but when categorized, not a lot of
    drivers are touched. The pull request contains:

    - mtip32xx fixes from Micron.

    - A slew of drbd updates, this time in a nicer series.

    - bcache, a flash/ssd caching framework from Kent.

    - Fixes for cciss"

    * 'for-3.10/drivers' of git://git.kernel.dk/linux-block: (66 commits)
    bcache: Use bd_link_disk_holder()
    bcache: Allocator cleanup/fixes
    cciss: bug fix to prevent cciss from loading in kdump crash kernel
    cciss: add cciss_allow_hpsa module parameter
    drivers/block/mg_disk.c: add CONFIG_PM_SLEEP to suspend/resume functions
    mtip32xx: Workaround for unaligned writes
    bcache: Make sure blocksize isn't smaller than device blocksize
    bcache: Fix merge_bvec_fn usage for when it modifies the bvm
    bcache: Correctly check against BIO_MAX_PAGES
    bcache: Hack around stuff that clones up to bi_max_vecs
    bcache: Set ra_pages based on backing device's ra_pages
    bcache: Take data offset from the bdev superblock.
    mtip32xx: mtip32xx: Disable TRIM support
    mtip32xx: fix a smatch warning
    bcache: Disable broken btree fuzz tester
    bcache: Fix a format string overflow
    bcache: Fix a minor memory leak on device teardown
    bcache: Documentation updates
    bcache: Use WARN_ONCE() instead of __WARN()
    bcache: Add missing #include
    ...

    Linus Torvalds
     
  • Pull block core updates from Jens Axboe:

    - The major bit is Kent's prep work for immutable bio vecs.

    - Stable candidate fix for a scheduling-while-atomic in the queue
    bypass operation.

    - Fix for the hang on exceeded rq->datalen 32-bit unsigned when merging
    discard bios.

    - Tejun's changes to convert the writeback thread pool to the generic
    workqueue mechanism.

    - Runtime PM framework; SCSI patches exist on top of these in James'
    tree.

    - A few random fixes.

    * 'for-3.10/core' of git://git.kernel.dk/linux-block: (40 commits)
    relay: move remove_buf_file inside relay_close_buf
    partitions/efi.c: replace useless kzalloc's by kmalloc's
    fs/block_dev.c: fix iov_shorten() criteria in blkdev_aio_read()
    block: fix max discard sectors limit
    blkcg: fix "scheduling while atomic" in blk_queue_bypass_start
    Documentation: cfq-iosched: update documentation help for cfq tunables
    writeback: expose the bdi_wq workqueue
    writeback: replace custom worker pool implementation with unbound workqueue
    writeback: remove unused bdi_pending_list
    aoe: Fix unitialized var usage
    bio-integrity: Add explicit field for owner of bip_buf
    block: Add an explicit bio flag for bios that own their bvec
    block: Add bio_alloc_pages()
    block: Convert some code to bio_for_each_segment_all()
    block: Add bio_for_each_segment_all()
    bounce: Refactor __blk_queue_bounce to not use bi_io_vec
    raid1: use bio_copy_data()
    pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage
    pktcdvd: use bio_copy_data()
    block: Add bio_copy_data()
    ...

    Linus Torvalds
     

08 May, 2013

5 commits

  • The helper function didn't include a leading space, so it was jammed
    against the previous text in the audit record.

    Signed-off-by: Eric Paris

    Eric Paris
     
  • Merge more incoming from Andrew Morton:

    - Various fixes which were stalled or which I picked up recently

    - A large rotorooting of the AIO code. Allegedly to improve
    performance but I don't really have good performance numbers (I might
    have lost the email) and I can't raise Kent today. I held this out
    of 3.9 and we could give it another cycle if it's all too late/scary.

    I ended up taking only the first two thirds of the AIO rotorooting. I
    left the percpu parts and the batch completion for later. - Linus

    * emailed patches from Andrew Morton : (33 commits)
    aio: don't include aio.h in sched.h
    aio: kill ki_retry
    aio: kill ki_key
    aio: give shared kioctx fields their own cachelines
    aio: kill struct aio_ring_info
    aio: kill batch allocation
    aio: change reqs_active to include unreaped completions
    aio: use cancellation list lazily
    aio: use flush_dcache_page()
    aio: make aio_read_evt() more efficient, convert to hrtimers
    wait: add wait_event_hrtimeout()
    aio: refcounting cleanup
    aio: make aio_put_req() lockless
    aio: do fget() after aio_get_req()
    aio: dprintk() -> pr_debug()
    aio: move private stuff out of aio.h
    aio: add kiocb_cancel()
    aio: kill return value of aio_complete()
    char: add aio_{read,write} to /dev/{null,zero}
    aio: remove retry-based AIO
    ...

    Linus Torvalds
     
  • Faster kernel compiles by way of fewer unnecessary includes.

    [akpm@linux-foundation.org: fix fallout]
    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     
  • This reverts commit 6ff5e45985c2fcb97947818f66d1eeaf9d6600b2.

    Conflicts:
    kernel/audit.c

    The reverted patch started a kthread that ran all the time. Since the
    follow-on patches that required it didn't get finished in time for
    3.10, we shouldn't ship this change in 3.10.

    Signed-off-by: Eric Paris

    Eric Paris
     
  • Audit rule additions containing "-F auid!=4294967295" were failing
    with EINVAL because of a regression caused by e1760bd.

    Apparently some userland audit rule sets want to know if the
    loginuid has been set and are using a test for auid != 4294967295
    to determine that.

    In practice that is a horrible way to ask if a value has been set,
    because it relies on subtle implementation details and will break
    every time the uid implementation in the kernel changes.

    So add a clean way to test if the audit loginuid has been set, and
    silently convert the old idiom to the cleaner and more comprehensible
    new idiom.
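
    The new test is essentially an explicit validity check on the stored
    loginuid. Roughly, as a sketch relying on the existing uid_valid() and
    audit_get_loginuid() helpers (not necessarily identical to the in-tree
    version):

    #include <linux/audit.h>

    /* sketch of the explicit "has the loginuid been set?" question */
    static inline bool audit_loginuid_set(struct task_struct *tsk)
    {
        return uid_valid(audit_get_loginuid(tsk));
    }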

    Cc: # 3.7
    Reported-By: Richard Guy Briggs
    Signed-off-by: "Eric W. Biederman"
    Tested-by: Richard Guy Briggs
    Signed-off-by: Eric Paris

    Eric W. Biederman
     

06 May, 2013

4 commits

  • Some interrupt controllers refuse to map interrupts marked as
    "protected" by firmware. Since we try to map everything in the
    device-tree on some platforms, we end up with a lot of nasty
    WARNs in the boot log for what is a normal situation on those
    machines.

    This defines a specific return code (-EPERM) from the host map()
    callback which causes irqdomain to fail silently.

    MPIC is updated to return this when hitting a protected source,
    printing only a single-line message for diagnostic purposes.
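
    A hedged sketch of what a host's map() callback looks like under the
    new convention; hw_is_protected() is a made-up placeholder for the
    controller's firmware check, not a real kernel function.

    #include <linux/errno.h>
    #include <linux/irqdomain.h>

    static bool hw_is_protected(irq_hw_number_t hw);  /* hypothetical check */

    static int my_host_map(struct irq_domain *d, unsigned int virq,
                           irq_hw_number_t hw)
    {
        if (hw_is_protected(hw))
            return -EPERM;      /* irqdomain now fails this mapping silently */
        /* ... normal chip/handler setup for virq ... */
        return 0;
    }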

    Signed-off-by: Benjamin Herrenschmidt

    Benjamin Herrenschmidt
     
  • Pull 'full dynticks' support from Ingo Molnar:
    "This tree from Frederic Weisbecker adds a new, (exciting! :-) core
    kernel feature to the timer and scheduler subsystems: 'full dynticks',
    or CONFIG_NO_HZ_FULL=y.

    This feature extends the nohz variable-size timer tick feature from
    idle to busy CPUs (running at most one task) as well, potentially
    reducing the number of timer interrupts significantly.

    This feature got motivated by real-time folks and the -rt tree, but
    the general utility and motivation of full-dynticks runs wider than
    that:

    - HPC workloads get faster: CPUs running a single task should be able
    to utilize a maximum amount of CPU power. A periodic timer tick at
    HZ=1000 can cause a constant overhead of up to 1.0%. This feature
    removes that overhead - and speeds up the system by 0.5%-1.0% on
    typical distro configs even on modern systems.

    - Real-time workload latency reduction: CPUs running critical tasks
    should experience as little jitter as possible. The last remaining
    source of kernel-related jitter was the periodic timer tick.

    - A single task executing on a CPU is a pretty common situation,
    especially with an increasing number of cores/CPUs, so this feature
    helps desktop and mobile workloads as well.

    The cost of the feature is mainly related to increased timer
    reprogramming overhead when a CPU switches its tick period, and thus
    slightly longer to-idle and from-idle latency.

    Configuration-wise a third mode of operation is added to the existing
    two NOHZ kconfig modes:

    - CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named
    as a config option. This is the traditional Linux periodic tick
    design: there's a HZ tick going on all the time, regardless of
    whether a CPU is idle or not.

    - CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the
    periodic tick when a CPU enters idle mode.

    - CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the
    tick when a CPU is idle, also slows the tick down to 1 Hz (one
    timer interrupt per second) when only a single task is running on a
    CPU.

    The .config behavior is compatible: existing !CONFIG_NO_HZ and
    CONFIG_NO_HZ=y settings get translated to the new values, without the
    user having to configure anything. CONFIG_NO_HZ_FULL is turned off by
    default.

    This feature is based on a lot of infrastructure work that has been
    steadily going upstream in the last 2-3 cycles: related RCU support
    and non-periodic cputime support in particular is upstream already.

    This tree adds the final pieces and activates the feature. The pull
    request is marked RFC because:

    - it's marked 64-bit only at the moment - the 32-bit support patch is
    small but did not get ready in time.

    - it has a number of fresh commits that came in after the merge
    window. The overwhelming majority of commits are from before the
    merge window, but still some aspects of the tree are fresh and so I
    marked it RFC.

    - it's a pretty wide-reaching feature with lots of effects - and
    while the components have been in testing for some time, the full
    combination is still not very widely used. That it's default-off
    should reduce its regression abilities and obviously there are no
    known regressions with CONFIG_NO_HZ_FULL=y enabled either.

    - the feature is not completely idempotent: there is no 100%
    equivalent replacement for a periodic scheduler/timer tick. In
    particular there's ongoing work to map out and reduce its effects
    on scheduler load-balancing and statistics. This should not impact
    correctness though, there are no known regressions related to this
    feature at this point.

    - it's a pretty ambitious feature that with time will likely be
    enabled by most Linux distros, and we'd like you to make input on
    its design/implementation, if you dislike some aspect we missed.
    Without flaming us to crisp! :-)

    Future plans:

    - there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off
    the periodic tick altogether when there's a single busy task on a
    CPU. We'd first like 1 Hz to be exposed more widely before we go
    for the 0 Hz target though.

    - once we reach 0 Hz we can remove the periodic tick assumption from
    nr_running>=2 as well, by essentially interrupting busy tasks only
    as frequently as the sched_latency constraints require us to do -
    once every 4-40 msecs, depending on nr_running.

    I am personally leaning towards biting the bullet and doing this in
    v3.10, like the -rt tree this effort has been going on for too long -
    but the final word is up to you as usual.

    More technical details can be found in Documentation/timers/NO_HZ.txt"

    * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
    sched: Keep at least 1 tick per second for active dynticks tasks
    rcu: Fix full dynticks' dependency on wide RCU nocb mode
    nohz: Protect smp_processor_id() in tick_nohz_task_switch()
    nohz_full: Add documentation.
    cputime_nsecs: use math64.h for nsec resolution conversion helpers
    nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config
    nohz: Reduce overhead under high-freq idling patterns
    nohz: Remove full dynticks' superfluous dependency on RCU tree
    nohz: Fix unavailable tick_stop tracepoint in dynticks idle
    nohz: Add basic tracing
    nohz: Select wide RCU nocb for full dynticks
    nohz: Disable the tick when irq resume in full dynticks CPU
    nohz: Re-evaluate the tick for the new task after a context switch
    nohz: Prepare to stop the tick on irq exit
    nohz: Implement full dynticks kick
    nohz: Re-evaluate the tick from the scheduler IPI
    sched: New helper to prevent from stopping the tick in full dynticks
    sched: Kick full dynticks CPU that have more than one task enqueued.
    perf: New helper to prevent full dynticks CPUs from stopping tick
    perf: Kick full dynticks CPU if events rotation is needed
    ...

    Linus Torvalds
     
  • Pull perf fixes from Ingo Molnar:
    "Misc fixes plus a small hw-enablement patch for Intel IB model 58
    uncore events"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/x86/intel/lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL
    perf/x86/intel/lbr: Fix LBR filter
    perf/x86: Blacklist all MEM_*_RETIRED events for Ivy Bridge
    perf: Fix vmalloc ring buffer pages handling
    perf/x86/intel: Fix unintended variable name reuse
    perf/x86/intel: Add support for IvyBridge model 58 Uncore
    perf/x86/intel: Fix typo in perf_event_intel_uncore.c
    x86: Eliminate irq_mis_count counted in arch_irq_stat

    Linus Torvalds
     
  • Pull module updates from Rusty Russell:
    "We get rid of the general module prefix confusion with a binary config
    option, fix a remove/insert race which Never Happens, and (my
    favorite) handle the case when we have too many modules for a single
    commandline. Seriously, the kernel is full, please go away!"

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    modpost: fix unwanted VMLINUX_SYMBOL_STR expansion
    X.509: Support parse long form of length octets in Authority Key Identifier
    module: don't unlink the module until we've removed all exposure.
    kernel: kallsyms: memory override issue, need check destination buffer length
    MODSIGN: do not send garbage to stderr when enabling modules signature
    modpost: handle huge numbers of modules.
    modpost: add -T option to read module names from file/stdin.
    modpost: minor cleanup.
    genksyms: pass symbol-prefix instead of arch
    module: fix symbol versioning with symbol prefixes
    CONFIG_SYMBOL_PREFIX: cleanup.

    Linus Torvalds
     

05 May, 2013

3 commits

  • Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro

    Al Viro
     
  • Pull second round of VFS updates from Al Viro:
    "Assorted fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    xtensa simdisk: fix braino in "xtensa simdisk: switch to proc_create_data()"
    hostfs: use kmalloc instead of kzalloc
    hostfs: move HOSTFS_SUPER_MAGIC to
    hostfs: remove "will unlock" comment
    vfs: use list_move instead of list_del/list_add
    proc_devtree: Replace include linux/module.h with linux/export.h
    create_mnt_ns: unidiomatic use of list_add()
    fs: remove dentry_lru_prune()
    Removed unused typedef to avoid "unused local typedef" warnings.
    kill fs/read_write.h
    fs: Fix hang with BSD accounting on frozen filesystem
    sun3_scsi: add ->show_info()
    nubus: Kill nubus_proc_detach_device()
    more mode_t whack-a-mole...
    do_coredump(): don't wait for thaw if coredump has already been interrupted
    do_mount(): fix a leak introduced in 3.9 ("mount: consolidate permission checks")

    Linus Torvalds
     
  • When BSD process accounting is enabled and logs information to a
    filesystem which gets frozen, the system easily becomes unusable
    because each attempt to account process information blocks. Thus,
    e.g., every task gets blocked in exit.

    It seems better to drop accounting information (which can already
    happen when the filesystem is running out of space) instead of
    locking the system up. So we just skip the write if the filesystem
    is frozen.
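
    One way to express that, assuming the filesystem freeze-protection
    helpers file_start_write_trylock()/file_end_write(); the exact calls
    used by the patch may differ.

    /* sketch: take the write protection without blocking, or drop the record */
    if (!file_start_write_trylock(file))
        return;                 /* filesystem is frozen: skip this record */
    /* ... write the struct acct record to the file ... */
    file_end_write(file);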

    Reported-by: Nikola Ciprich
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     

04 May, 2013

3 commits

  • The scheduler doesn't yet fully support environments
    with a single task running without a periodic tick.

    In order to ensure we still maintain the duties of scheduler_tick(),
    keep at least 1 tick per second.

    This makes sure that we keep the progression of various scheduler
    accounting and background maintenance even with a very low granularity.
    Examples include cpu load, sched average, CFS entity vruntime,
    avenrun and events such as load balancing, amongst other details
    handled in sched_class::task_tick().

    This limitation will be removed in the future once we get
    these individual items to work in full dynticks CPUs.

    Suggested-by: Ingo Molnar
    Signed-off-by: Frederic Weisbecker
    Cc: Christoph Lameter
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     
  • Commit 0637e029392386e6996f5d6574aadccee8315efa
    ("nohz: Select wide RCU nocb for full dynticks") intended
    to force CONFIG_RCU_NOCB_CPU_ALL=y when full dynticks is
    enabled.

    However this option is part of a choice menu and Kconfig's
    "select" instruction has no effect on such targets.

    Fix this by using reverse dependencies on the targets we
    don't want instead.

    Reviewed-by: Paul E. McKenney
    Signed-off-by: Frederic Weisbecker
    Cc: Christoph Lameter
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     
  • As the wake-up logic for waiters on the buffer has been moved
    from the tracing code to the ring buffer, the ring buffer also
    needs to select IRQ_WORK, since the wake-up code is performed via
    irq_work.

    This fixes compile breakage when a user of the ring buffer is
    selected but tracing and irq_work are not.

    Link: http://lkml.kernel.org/r/20130503115332.GT8356@rric.localhost

    Cc: Arnd Bergmann
    Reported-by: Robert Richter
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)