25 Apr, 2009

1 commit

  • Commit c751085943362143f84346d274e0011419c84202 ("PM/Hibernate: Wait for
    SCSI devices scan to complete during resume") added a call to
    scsi_complete_async_scans() to software_resume(), so that it waited for
    the SCSI scanning to complete, but the call was added in the wrong place.

    Namely, it should have been added after wait_for_device_probe(), which
    is called only if the image partition hasn't been specified yet. Also,
    it makes sense to check whether the image partition is present and to
    wait for device probing and SCSI scanning to complete only if it is
    not.

    Additionally, since noresume is checked right at the beginning of
    software_resume() and the function returns immediately if it's set, it
    doesn't make sense to check it once again later.

    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
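
The corrected ordering in software_resume() can be modeled in user space roughly as follows. This is a minimal sketch, not the kernel code: wait_for_device_probe() and scsi_complete_async_scans() are replaced by stand-ins that only count calls, and noresume, resume_dev and lookup_resume_dev() are hypothetical models of the boot option and the image-partition lookup.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins: the real kernel functions block until device probing and
 * async SCSI scanning are done; these only count how often they run. */
static int probe_waits, scsi_waits;
static void wait_for_device_probe(void) { probe_waits++; }
static void scsi_complete_async_scans(void) { scsi_waits++; }

static bool noresume;   /* models the "noresume" boot option */
static int resume_dev;  /* 0 = image partition not known yet (modeled) */

/* Hypothetical lookup; pretends the partition showed up as device 42. */
static int lookup_resume_dev(void) { return 42; }

/* Returns 0 if resume would proceed, -1 if it is skipped. */
int software_resume_sketch(void)
{
	if (noresume)
		return -1;	/* checked once, at the top; no re-check later */

	if (resume_dev == 0) {
		/* Wait only when the image partition is not known yet. */
		wait_for_device_probe();
		scsi_complete_async_scans();	/* after the probe wait */
		resume_dev = lookup_resume_dev();
	}
	return resume_dev != 0 ? 0 : -1;
}
```

A second call with the partition already resolved does not wait again, which is the point of checking for the partition first.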
     

24 Apr, 2009

1 commit

  • Slow-work appears to delete its timer as soon as the first user
    unregisters, even though other users could be active. At the same time, it
    never seems to delete slow_work_oom_timer. Arrange for both to happen in
    the shutdown path.

    Signed-off-by: Jonathan Corbet
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    Jonathan Corbet
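
The intended shutdown rule (tear down shared timers only when the last user goes away) can be sketched as below. Timers are modeled as plain "armed" flags; slow_work_cull_timer is a hypothetical name for "its timer", and the *_sketch functions are illustrative, not the slow-work API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a "timer" here is just an armed flag. */
struct timer_model { bool armed; };

static struct timer_model slow_work_cull_timer = { true };
static struct timer_model slow_work_oom_timer = { true };
static int slow_work_user_count;

static void del_timer_model(struct timer_model *t) { t->armed = false; }

void slow_work_register_user_sketch(void) { slow_work_user_count++; }

void slow_work_unregister_user_sketch(void)
{
	assert(slow_work_user_count > 0);
	/* Other users may still be active: only the last one shuts down. */
	if (--slow_work_user_count == 0) {
		del_timer_model(&slow_work_cull_timer);
		del_timer_model(&slow_work_oom_timer);	/* previously never deleted */
	}
}
```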
     

22 Apr, 2009

2 commits

  • Add enable() and disable() callbacks for clocksources.

    This allows us to put unused clocksources in power save mode. The
    functions clocksource_enable() and clocksource_disable() wrap the
    callbacks and are inserted in the timekeeping code to enable before use
    and disable after switching to a new clocksource.

    Signed-off-by: Magnus Damm
    Acked-by: John Stultz
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Magnus Damm
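
The wrapper pattern described above (optional callbacks, enable before use, disable the old source after the switch) might look like this in a simplified model. struct clocksource_model and the *_sketch functions are hypothetical; the real struct clocksource has many more fields. Keeping the current clocksource when enabling the new one fails is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: enable/disable are optional and may be NULL. */
struct clocksource_model {
	const char *name;
	int (*enable)(struct clocksource_model *cs);
	void (*disable)(struct clocksource_model *cs);
	int powered;
};

static int clocksource_enable_sketch(struct clocksource_model *cs)
{
	return cs->enable ? cs->enable(cs) : 0;
}

static void clocksource_disable_sketch(struct clocksource_model *cs)
{
	if (cs->disable)
		cs->disable(cs);
}

/* Enable the new clocksource before use; disable the old one only after
 * switching. Returns the clocksource now in use. */
struct clocksource_model *change_clocksource_sketch(struct clocksource_model *cur,
						   struct clocksource_model *next)
{
	if (clocksource_enable_sketch(next) != 0)
		return cur;	/* assumption: keep the old one on failure */
	clocksource_disable_sketch(cur);	/* old one can enter power save */
	return next;
}

static int demo_enable(struct clocksource_model *cs) { cs->powered = 1; return 0; }
static void demo_disable(struct clocksource_model *cs) { cs->powered = 0; }
```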
     
  • Pass clocksource pointer to the read() callback for clocksources. This
    allows us to share the callback between multiple instances.

    [hugh@veritas.com: fix powerpc build of clocksource pass clocksource mods]
    [akpm@linux-foundation.org: cleanup]
    Signed-off-by: Magnus Damm
    Acked-by: John Stultz
    Cc: Thomas Gleixner
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Magnus Damm
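
Passing the clocksource pointer lets one read() function serve several hardware instances by recovering the enclosing device structure. A user-space sketch, assuming a hypothetical struct timer_dev whose counter field stands in for a memory-mapped register; container_of is redefined here because the kernel's version is not available in user space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t cycle_t;

struct clocksource_model {
	cycle_t (*read)(struct clocksource_model *cs);
};

/* Hypothetical timer device; several instances share one read(). */
struct timer_dev {
	volatile uint32_t counter;	/* stands in for a hardware register */
	struct clocksource_model cs;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Shared callback: the cs argument identifies which instance to read. */
static cycle_t timer_dev_read(struct clocksource_model *cs)
{
	struct timer_dev *dev = container_of(cs, struct timer_dev, cs);
	return dev->counter;
}
```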
     

21 Apr, 2009

1 commit


20 Apr, 2009

1 commit

  • Commit 900af0d973856d6feb6fc088c2d0d3fde57707d3 (PM: Change suspend
    code ordering) changed the ordering of suspend code in such a way
    that the platform .prepare() callback is now executed after the
    device drivers' late suspend callbacks have run. Unfortunately, this
    turns out to break ARM platforms that need to talk via I2C to power
    control devices during the .prepare() callback.

    For this reason introduce two new platform suspend callbacks,
    .prepare_late() and .wake(), that will be called just prior to
    disabling non-boot CPUs and right after bringing them back on line,
    respectively, and use them instead of .prepare() and .finish() for
    ACPI suspend. Make the PM core execute the .prepare() and .finish()
    platform suspend callbacks where they were executed previously (that
    is, right after calling the regular suspend methods provided by
    device drivers and right before executing their regular resume
    methods, respectively).

    It is not necessary to make analogous changes to the hibernation
    code and data structures at the moment, because they are only used
    by ACPI platforms.

    Signed-off-by: Rafael J. Wysocki
    Reported-by: Russell King
    Acked-by: Len Brown

    Rafael J. Wysocki
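
The resulting callback order can be illustrated with a simplified model that records each step. struct platform_suspend_ops_model, the step names and the demo callbacks are hypothetical; error handling is omitted, and the device phases are reduced to labels.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_STEPS 16
static const char *steps[MAX_STEPS];
static size_t nsteps;

static void step(const char *name)
{
	assert(nsteps < MAX_STEPS);
	steps[nsteps++] = name;
}

/* Simplified ops table: every callback is optional; errors ignored. */
struct platform_suspend_ops_model {
	int (*prepare)(void);
	int (*prepare_late)(void);
	void (*wake)(void);
	void (*finish)(void);
};

void suspend_sequence_sketch(const struct platform_suspend_ops_model *ops)
{
	step("devices: suspend");
	if (ops->prepare)
		ops->prepare();		/* before late suspend, e.g. I2C still usable */
	step("devices: late suspend");
	if (ops->prepare_late)
		ops->prepare_late();	/* what ACPI uses instead of .prepare() */
	step("cpus: disable non-boot");
	step("enter sleep state");
	step("cpus: enable non-boot");
	if (ops->wake)
		ops->wake();		/* what ACPI uses instead of .finish() */
	step("devices: early resume");
	if (ops->finish)
		ops->finish();		/* right before regular resume */
	step("devices: resume");
}

static int demo_prepare(void) { step("platform: prepare"); return 0; }
static int demo_prepare_late(void) { step("platform: prepare_late"); return 0; }
static void demo_wake(void) { step("platform: wake"); }
static void demo_finish(void) { step("platform: finish"); }

const struct platform_suspend_ops_model demo_ops = {
	demo_prepare, demo_prepare_late, demo_wake, demo_finish
};
```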
     

19 Apr, 2009

1 commit

  • This function is not actually used right now, since the original use
    case for it was done with insert_resource_expand_to_fit() instead.

    However, we now have another use case that basically wants to do a
    "reserve IO resource, splitting around existing resources", and that
    one doesn't want the "recurse into the conflicting resource" logic at
    all.

    And since recursing into the conflicting resource was the most complex
    part, and isn't wanted, just remove it. Maybe we'll some day want both
    versions, but we can just resurrect the logic then.

    Tested-by: Yinghai Lu
    Signed-off-by: Linus Torvalds

    Linus Torvalds
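
"Reserve, splitting around existing resources" without recursion can be illustrated on a flat, sorted array of busy ranges. This is not the kernel implementation, which works on its struct resource tree; struct range and split_around() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct range { unsigned long start, end; };	/* inclusive bounds */

/* Write the parts of req not covered by busy[] (sorted by start and
 * non-overlapping) into out[]. Conflicting ranges are skipped around;
 * we never recurse into them. Returns the number of pieces written. */
size_t split_around(struct range req, const struct range *busy, size_t nbusy,
		    struct range *out, size_t max_out)
{
	size_t n = 0;
	unsigned long cur = req.start;
	int done = 0;

	for (size_t i = 0; i < nbusy && !done; i++) {
		if (busy[i].end < cur || busy[i].start > req.end)
			continue;	/* no overlap with what is left */
		if (busy[i].start > cur && n < max_out)
			out[n++] = (struct range){ cur, busy[i].start - 1 };
		if (busy[i].end >= req.end)
			done = 1;	/* rest of the request is taken */
		else
			cur = busy[i].end + 1;
	}
	if (!done && n < max_out)
		out[n++] = (struct range){ cur, req.end };
	return n;
}
```

For a request of 0-99 with 10-19 and 50-59 already busy, this yields 0-9, 20-49 and 60-99.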
     

17 Apr, 2009

5 commits


16 Apr, 2009

1 commit

  • Don't try to predeclare inline functions like this:

        static inline void wait_migrated_callbacks(void);
        ...
        static void _rcu_barrier(enum rcu_barrier type)
        {
            ...
            wait_migrated_callbacks();
        }
        ...
        static inline void wait_migrated_callbacks(void)
        {
            wait_event(rcu_migrate_wq, !atomic_read(&rcu_migrate_type_count));
        }

    as it upsets some versions of gcc under some circumstances:

    kernel/rcupdate.c: In function `_rcu_barrier':
    kernel/rcupdate.c:125: sorry, unimplemented: inlining failed in call to 'wait_migrated_callbacks': function body not available
    kernel/rcupdate.c:152: sorry, unimplemented: called from here

    This can be dealt with by simply putting the static variables (rcu_migrate_*)
    at the top, and moving the implementation of the function up so that it
    replaces its forward declaration.

    Signed-off-by: David Howells
    Cc: Dipankar Sarma
    Cc: Paul E. McKenney
    Signed-off-by: Linus Torvalds

    David Howells
     

15 Apr, 2009

1 commit


14 Apr, 2009

10 commits

  • This patch fixes a hierarchical-RCU performance bug located by Anton
    Blanchard. The problem stems from a misguided attempt to provide a
    work-around for jiffies-counter failure. This work-around uses a per-CPU
    n_rcu_pending counter, which is incremented on each call to rcu_pending(),
    which in turn is called from each scheduling-clock interrupt. Each CPU
    then treats this counter as a surrogate for the jiffies counter, so
    that if the jiffies counter fails to advance, the per-CPU n_rcu_pending
    counter will cause RCU to invoke force_quiescent_state(), which in turn
    will (among other things) send resched IPIs to CPUs that have thus far
    failed to pass through an RCU quiescent state.

    Unfortunately, each CPU resets only its own counter after sending a
    batch of IPIs. This means that the other CPUs will also (needlessly)
    send -another- round of IPIs, for a full N-squared set of IPIs in the
    worst case every three scheduler-clock ticks until the grace period
    finally ends. It is not reasonable for a given CPU to reset each and
    every n_rcu_pending for all the other CPUs, so this patch instead simply
    disables the jiffies-counter "training wheels", thus eliminating the
    excessive IPIs.

    Note that the jiffies-counter IPIs do not have this problem, because
    the jiffies counter is global, so the CPU sending the IPIs can easily
    reset things, thus preventing the other CPUs from sending redundant
    IPIs.

    Note also that the n_rcu_pending counter remains, as it will continue to
    be used for tracing. It may also see use to update the jiffies counter,
    should an appropriate kick-the-jiffies-counter API appear.

    Located-by: Anton Blanchard
    Tested-by: Anton Blanchard
    Signed-off-by: Paul E. McKenney
    Cc: anton@samba.org
    Cc: akpm@linux-foundation.org
    Cc: dipankar@in.ibm.com
    Cc: manfred@colorfullife.com
    Cc: cl@linux-foundation.org
    Cc: josht@linux.vnet.ibm.com
    Cc: schamp@sgi.com
    Cc: niv@us.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: ego@in.ibm.com
    Cc: laijs@cn.fujitsu.com
    Cc: rostedt@goodmis.org
    Cc: peterz@infradead.org
    Cc: penberg@cs.helsinki.fi
    Cc: andi@firstfloor.org
    Cc: "Paul E. McKenney"
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
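
The difference between per-CPU and global counters can be checked with a toy model in which no CPU ever passes a quiescent state. The threshold of 3 ticks comes from the text above; everything else (function names, one tick per loop iteration, IPIs sent to all other CPUs) is a simplifying assumption.

```c
#include <assert.h>

#define RCU_PENDING_THRESHOLD 3	/* "every three scheduler-clock ticks" */
#define MAX_CPUS 64

/* Per-CPU counters: each CPU resets only its own after sending IPIs. */
unsigned long ipis_per_cpu_counters(int ncpus, int ticks)
{
	int n_rcu_pending[MAX_CPUS] = { 0 };
	unsigned long ipis = 0;

	assert(ncpus > 0 && ncpus <= MAX_CPUS);
	for (int t = 0; t < ticks; t++) {
		for (int cpu = 0; cpu < ncpus; cpu++) {
			if (++n_rcu_pending[cpu] >= RCU_PENDING_THRESHOLD) {
				ipis += ncpus - 1;	/* resched IPI to every other CPU */
				n_rcu_pending[cpu] = 0;	/* others keep counting */
			}
		}
	}
	return ipis;
}

/* Global counter: the sending CPU resets it for everyone. */
unsigned long ipis_global_counter(int ncpus, int ticks)
{
	int pending = 0;
	unsigned long ipis = 0;

	assert(ncpus > 0);
	for (int t = 0; t < ticks; t++) {
		if (++pending >= RCU_PENDING_THRESHOLD) {
			ipis += ncpus - 1;	/* one CPU sends the batch */
			pending = 0;
		}
	}
	return ipis;
}
```

With 4 CPUs over 3 ticks, the per-CPU variant sends 4 × 3 = 12 IPIs versus 3 for the global one, i.e. N(N-1) against N-1 per round.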
     
  • Before patch:

    # tracer: branch
    #
    # TASK-PID CPU# TIMESTAMP FUNCTION
    # | | | | |
    -2981 [000] 24008.872738: [ ok ] trace_irq_handler_exit:irq_event_types.h:41
    -2981 [000] 24008.872742: [ ok ] note_interrupt:spurious.c:229
    ...

    After patch:

    # tracer: branch
    #
    # TASK-PID CPU# TIMESTAMP CORRECT FUNC:FILE:LINE
    # | | | | | |
    -2985 [000] 26329.142970: [ ok ] slab_free:slub.c:1776
    -2985 [000] 26329.142972: [ ok ] trace_kmem_cache_free:kmem_event_types.h:191
    ...

    Signed-off-by: Zhao Lei
    Acked-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Zhaolei
     
  • Impact: remove overly redundant tracing entries

    When the tracer is "function" or "function_graph", far too many
    "get_parent_ip" entries are recorded in the ring buffer.

    Signed-off-by: Lai Jiangshan
    Acked-by: Frederic Weisbecker
    Acked-by: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Lai Jiangshan
     
  • Impact: cleanup, fix

    Clean up sys_shutdown() exit path. Factor out common code. Return
    correct error code instead of always 0 on failure.

    Signed-off-by: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • Pointed out by Roland. The bug was recently introduced by me in
    "forget_original_parent: split out the un-ptrace part", commit
    39c626ae47c469abdfd30c6e42eff884931380d6.

    Since that patch we have a window after exit_ptrace() drops tasklist and
    before forget_original_parent() takes it again. In this window the child
    can do ptrace(PTRACE_TRACEME) and nobody can untrace this child after
    that.

    Change ptrace_traceme() so that it does not attach to an exiting
    ->real_parent. We don't report an error in this case; instead, we
    pretend that we attached right before ->real_parent called
    exit_ptrace(), which would have untraced us anyway.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
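
The new check in ptrace_traceme() can be modeled roughly as below. struct task_model is a hypothetical simplification: the kernel tests PF_EXITING in the parent's flags while holding tasklist_lock, both of which are omitted here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified task model; locking and the real flag bits are omitted. */
struct task_model {
	bool exiting;
	bool ptraced;
	struct task_model *real_parent;
};

int ptrace_traceme_sketch(struct task_model *current)
{
	if (current->ptraced)
		return -EPERM;
	/* Exiting parent: don't attach. Its exit_ptrace() would untrace us
	 * anyway, so report success as if we attached just before that. */
	if (!current->real_parent->exiting)
		current->ptraced = true;
	return 0;
}
```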
     
  • vm knobs should go in the vm table. Probably too late for
    randomize_va_space though.

    Signed-off-by: Peter Zijlstra
    Acked-by: Lee Schermerhorn
    Acked-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Before patch:
    # tracer: power
    #
    # TASK-PID CPU# TIMESTAMP FUNCTION
    # | | | | |
    [ 676.875865889] CSTATE: Going to C1 on cpu 0 for 0.005911463
    [ 676.882938805] CSTATE: Going to C1 on cpu 0 for 0.104796532
    ...

    After patch:
    # tracer: power
    #
    # TIMESTAMP STATE EVENT
    # | | |
    [ 676.875865889] CSTATE: Going to C1 on cpu 0 for 0.005911463
    [ 676.882938805] CSTATE: Going to C1 on cpu 0 for 0.104796532
    ...

    v2: Use seq_puts instead of seq_printf

    Signed-off-by: Zhao Lei
    Cc: Arjan van de Ven
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Zhaolei
     
  • There is a race between resume from hibernation and the asynchronous
    scanning of SCSI devices; to prevent it, we need to call
    scsi_complete_async_scans() during resume from hibernation.

    In addition, if the resume from hibernation is userland-driven, it's
    better to wait for all device probes in the kernel to complete before
    attempting to open the resume device.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Arjan van de Ven
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • …nel/git/tip/linux-2.6-tip

    * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    tracing/filters: return proper error code when writing filter file
    tracing/filters: allow user input integer to be oct or hex
    tracing/filters: fix NULL pointer dereference
    tracing/filters: NIL-terminate user input filter
    ftrace: Output REC->var instead of __entry->var for trace format
    Make __stringify support variable argument macros too
    tracing: fix document references
    tracing: fix splice return too large
    tracing: update file->f_pos when splice(2) it
    tracing: allocate page when needed
    tracing: disable seeking for trace_pipe_raw

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    lockdep: continue lock debugging despite some taints
    lockdep: warn about lockdep disabling after kernel taint

    Linus Torvalds
     

13 Apr, 2009

1 commit


12 Apr, 2009

7 commits

  • Impact: broaden lockdep checks

    Lockdep is disabled after any kernel taint. This might be convenient
    for ignoring bad locking issues whose sources come from outside the
    kernel tree. Nevertheless, it can be a frustrating experience for
    staging developers, or for those who hit a warning but are focused
    on other things that require lockdep.

    v2 of this patch simply no longer disables lockdep on TAINT_CRAP and
    TAINT_WARN events.

    Signed-off-by: Frederic Weisbecker
    Cc: LTP
    Cc: Peter Zijlstra
    Cc: Greg KH
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Impact: provide useful missing info for developers

    Kernel taint can occur in several situations, such as warnings,
    loading of proprietary or staging modules, bad pages, etc.

    But when such a taint happens, a developer might still be working on
    the kernel, expecting lockdep to still be enabled. A taint, however,
    disables lockdep without ever warning about it. Such behaviour
    doesn't really help kernel development.

    This patch adds the missing warning.

    Since the taint is usually applied after the main message that
    explains the real source of the issue, it seems safe to warn about
    it inside add_taint(), so that the warning appears last without
    obscuring the main information.

    v2: Use a generic helper to disable lockdep instead of an
    open coded xchg().

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
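
Taken together, the two lockdep changes above amount to roughly the following add_taint() logic. In this sketch C11 atomic_exchange() and fprintf() stand in for the kernel's debug-locks helper and printk(); the taint bit values and the message wording are illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Taint bit numbers; values are illustrative, only identity matters. */
enum {
	TAINT_PROPRIETARY_MODULE = 0,
	TAINT_BAD_PAGE = 5,
	TAINT_WARN = 9,
	TAINT_CRAP = 10,
};

static atomic_int debug_locks = 1;	/* 1 = lockdep checks active */
static unsigned long tainted_mask;

void add_taint_sketch(unsigned int flag)
{
	/*
	 * Staging (TAINT_CRAP) and warning (TAINT_WARN) taints leave lockdep
	 * on. atomic_exchange() returns the old value, so the warning is
	 * printed only by the call that actually turns lockdep off.
	 */
	if (flag != TAINT_CRAP && flag != TAINT_WARN &&
	    atomic_exchange(&debug_locks, 0))
		fprintf(stderr, "Disabling lock debugging due to kernel taint\n");

	tainted_mask |= 1UL << flag;
}
```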
     
  • - propagate the return value of filter_add_pred() to the user

    - return -ENOSPC, rather than -ENOMEM or -EINVAL, when the filter
    array is full

    Signed-off-by: Li Zefan
    Acked-by: Tom Zanussi
    Acked-by: Frederic Weisbecker
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
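
The error-code distinction can be shown with a fixed-size predicate array. The structures and function are hypothetical models, and the capacity of 8 is an assumption for illustration; the returned negative errno is what would be propagated back to the writer of the filter file.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_FILTER_PRED 8	/* assumed capacity, for illustration */

struct filter_pred_model { const char *field; unsigned long long val; };

struct event_filter_model {
	struct filter_pred_model preds[MAX_FILTER_PRED];
	int n_preds;
};

/* Returns 0 or a negative errno to pass back to user space unchanged. */
int filter_add_pred_sketch(struct event_filter_model *f,
			   const char *field, unsigned long long val)
{
	if (field == NULL)
		return -EINVAL;		/* malformed predicate */
	if (f->n_preds >= MAX_FILTER_PRED)
		return -ENOSPC;		/* array full: no space, not no memory */
	f->preds[f->n_preds].field = field;
	f->preds[f->n_preds].val = val;
	f->n_preds++;
	return 0;
}
```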
     
  • Before patch:

    # echo 'parent_pid == 0x10' > events/sched/sched_process_fork/filter
    # cat sched/sched_process_fork/filter
    parent_pid == 0

    After patch:

    # cat sched/sched_process_fork/filter
    parent_pid == 16

    Also check the input more strictly.

    Signed-off-by: Li Zefan
    Acked-by: Tom Zanussi
    Acked-by: Frederic Weisbecker
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
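
Accepting octal and hex comes down to parsing with base 0, which infers the base from a "0x" or "0" prefix. The sketch below is a user-space analogue using strtoull(); it does not reproduce the kernel's own string helpers. parse_filter_u64() is hypothetical; it also rejects a missing operand, empty input, trailing characters and out-of-range values.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Base 0: "16" is decimal, "020" octal, "0x10" hex. Returns 0 or -EINVAL. */
int parse_filter_u64(const char *s, unsigned long long *out)
{
	char *end;

	if (s == NULL || *s == '\0')
		return -EINVAL;		/* missing operand: never parse NULL */
	errno = 0;
	unsigned long long v = strtoull(s, &end, 0);
	if (errno != 0 || *end != '\0')
		return -EINVAL;		/* overflow or trailing characters */
	*out = v;
	return 0;
}
```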
     
  • Try this, and you'll see a NULL pointer dereference bug:

    # echo -n 'parent_comm ==' > sched/sched_process_fork/filter

    This happens because we passed a NULL pointer to simple_strtoull().

    Signed-off-by: Li Zefan
    Acked-by: Tom Zanussi
    Acked-by: Frederic Weisbecker
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     
  • Make sure messages from user space are NUL-terminated strings;
    otherwise we could dump random memory while reading the filter file.

    Try this:
    # echo 'parent_comm ==' > events/sched/sched_process_fork/filter
    # cat events/sched/sched_process_fork/filter
    parent_comm == �

    Signed-off-by: Li Zefan
    Acked-by: Tom Zanussi
    Acked-by: Frederic Weisbecker
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
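
The usual fix for a user buffer that carries no terminator is to allocate one extra byte and terminate explicitly. A user-space analogue, where dup_user_input() is hypothetical and memcpy() stands in for copy_from_user():

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy cnt bytes of user input into a new buffer and NUL-terminate it,
 * so later string handling stops at the end of the user's data. */
char *dup_user_input(const char *ubuf, size_t cnt)
{
	char *buf = malloc(cnt + 1);

	if (buf == NULL)
		return NULL;
	memcpy(buf, ubuf, cnt);	/* the kernel would use copy_from_user() */
	buf[cnt] = '\0';
	return buf;
}
```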
     
  • Several drivers use asynchronous work to do device discovery, and we
    synchronize with them in the compiled-in case before we actually try to
    mount root filesystems etc.

    However, when compiled as modules, that synchronization is missing - the
    module loading completes, but the driver hasn't actually finished
    probing for devices, and that means that any user mode that expects to
    use the devices after the 'insmod' is now potentially broken.

    We already saw one case of a similar issue in the ACPI battery code,
    where the kernel itself expected the module to be all done, and unmapped
    the init memory - but the async device discovery was still running.
    That got hacked around by just removing the "__init" (see commit
    5d38258ec026921a7b266f4047ebeaa75db358e5 "ACPI battery: fix async boot
    oops"), but the real fix is to just make the module loading wait for all
    async work to be completed.

    It will slow down module loading, but since common devices should be
    built in anyway, and since the bug is really annoying and hard to handle
    from user space (and caused several S3 resume regressions), the simple
    fix to wait is the right one.

    This fixes at least

    http://bugzilla.kernel.org/show_bug.cgi?id=13063

    but probably a few other bugzilla entries too (12936, for example), and
    is confirmed to fix Rafael's storage driver breakage after resume bug
    report (no bugzilla entry).

    We should also be able to now revert that ACPI battery fix.

    Reported-and-tested-by: Rafael J. Wysocki
    Tested-by: Heinz Diehl
    Acked-by: Arjan van de Ven
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

10 Apr, 2009

8 commits

  • print fmt: "irq=%d return=%s", __entry->irq, __entry->ret ? \"handled\" : \"unhandled\"

    "__entry" should be converted to "REC" by the __stringify() macro.

    Signed-off-by: Zhao Lei
    Acked-by: Frederic Weisbecker
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Zhaolei
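
The conversion relies on two-level stringification: the outer macro's arguments are macro-expanded before the inner one applies #, so defining __entry as REC turns __entry->irq into REC->irq in the printed format. A variadic version also copes with arguments containing commas, as in the "Make __stringify support variable argument macros too" change listed earlier. The macro names below are stand-ins for the kernel's __stringify_1/__stringify.

```c
#include <assert.h>
#include <string.h>

/* Inner level stringifies; outer level forces argument expansion first. */
#define stringify_1(...) #__VA_ARGS__
#define stringify(...) stringify_1(__VA_ARGS__)

/* While printing the format, __entry is made to expand to REC. */
#define __entry REC

static const char print_fmt_args[] =
	stringify(__entry->irq, __entry->ret ? "handled" : "unhandled");
```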
     
  • When moving documents to Documentation/trace/, I forgot to
    grep Kconfig to find out those references.

    Signed-off-by: Li Zefan
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Pekka Enberg
    Cc: Pekka Paalanen
    Cc: eduard.munteanu@linux360.ro
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     
  • I got these from strace:

    splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288
    splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288
    splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288
    splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 16384
    splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192
    splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192
    splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192

    I wanted to splice_read 4096 bytes, but it returned 8192 or more.

    This is because the return value of tracing_buffers_splice_read()
    does not include the "zero out any left over data" bytes.

    tracing_buffers_read() does include these bytes, so make the two
    consistent.

    Signed-off-by: Lai Jiangshan
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Lai Jiangshan
     
  • Impact: Cleanup

    These two lines:

        if (unlikely(*ppos))
            return -ESPIPE;

    in tracing_buffers_splice_read() are not needed, because the VFS
    layer has already disabled seek(2).

    Remove these two lines so that we can update file->f_pos.

    tracing_buffers_read() updates file->f_pos; with this fix,
    tracing_buffers_splice_read() updates file->f_pos too.

    Signed-off-by: Lai Jiangshan
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Lai Jiangshan
     
  • Impact: Cleanup

    Sometimes we open trace_pipe_raw but never read(2) it; we only
    splice(2) it, so the page is never used.

    Signed-off-by: Lai Jiangshan
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Lai Jiangshan
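
Allocating the page only when needed can be sketched as lazy allocation on the first read(2), so an open that is only ever spliced never allocates it. struct buffer_info_model, the *_sketch functions and the 4096-byte page size are hypothetical; memcpy() from the zeroed page stands in for copying trace data.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE_MODEL 4096

/* Hypothetical per-open state for trace_pipe_raw. */
struct buffer_info_model {
	void *spare;	/* page used only by the read(2) path */
};

void buffers_open_sketch(struct buffer_info_model *info)
{
	info->spare = NULL;	/* nothing allocated at open time */
}

/* read(2) path: allocate the spare page on first use. */
long buffers_read_sketch(struct buffer_info_model *info, char *ubuf, size_t count)
{
	if (info->spare == NULL) {
		info->spare = calloc(1, PAGE_SIZE_MODEL);
		if (info->spare == NULL)
			return -ENOMEM;
	}
	size_t n = count < PAGE_SIZE_MODEL ? count : PAGE_SIZE_MODEL;
	memcpy(ubuf, info->spare, n);	/* stands in for copying trace data */
	return (long)n;
}

void buffers_release_sketch(struct buffer_info_model *info)
{
	free(info->spare);	/* free(NULL) is a no-op for splice-only users */
	info->spare = NULL;
}
```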
     
  • Impact: disable pread()

    We set tracing_buffers_fops.llseek to no_llseek, but pread() can
    still be used to read this file.

    That is not expected.

    This fix uses nonseekable_open() to disable it.

    tracing_buffers_fops.llseek is still set to no_llseek; it marks this
    file as a "non-seekable device" and is used by sys_splice(). See also
    do_splice() or the splice(2) manual page:

    ERRORS
    EINVAL Target file system doesn't support splicing;
    neither of the descriptors refers to a pipe;
    or offset given for non-seekable device.

    Signed-off-by: Lai Jiangshan
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Lai Jiangshan
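
From user space, a file opened with nonseekable_open() should reject pread() the way a pipe does. The sketch below demonstrates that behaviour on a pipe rather than on trace_pipe_raw itself; pread_errno_on_pipe() is a hypothetical helper, and ESPIPE is the error POSIX specifies for pread() on a pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Returns the errno from pread() on a pipe (expected: ESPIPE),
 * 0 if pread() unexpectedly succeeded, or -1 on setup failure. */
int pread_errno_on_pipe(void)
{
	int fds[2];
	char c = 'x', buf;
	int err = 0;

	if (pipe(fds) != 0)
		return -1;
	/* Put a byte in so pread() cannot block if it were allowed. */
	if (write(fds[1], &c, 1) != 1) {
		err = -1;
	} else if (pread(fds[0], &buf, 1, 0) < 0) {
		err = errno;	/* non-seekable: positional read rejected */
	}
	close(fds[0]);
	close(fds[1]);
	return err;
}
```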
     
  • …nel/git/tip/linux-2.6-tip

    * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    tracing: consolidate documents
    blktrace: pass the right pointer to kfree()
    tracing/syscalls: use a dedicated file header
    tracing: append a comma to INIT_FTRACE_GRAPH

    Linus Torvalds
     
  • …l/git/tip/linux-2.6-tip

    * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sched: do not count frozen tasks toward load
    sched: refresh MAINTAINERS entry
    sched: Print sched_group::__cpu_power in sched_domain_debug
    cpuacct: add per-cgroup utime/stime statistics
    posixtimers, sched: Fix posix clock monotonicity
    sched_rt: don't allocate cpumask in fastpath
    cpuacct: make cpuacct hierarchy walk in cpuacct_charge() safe when rcupreempt is used -v2

    Linus Torvalds