19 Aug, 2012

1 commit

  • New helper: current_thread_info(). It allows a bunch of odd syscalls to
    be done in C. While we are at it, there had never been a reason to do
    osf_getpriority() in assembler. We also get "namespace"-aware (read:
    consistent with getuid(2), etc.) behaviour from getx?id() syscalls now.

    Signed-off-by: Al Viro
    Signed-off-by: Michael Cree
    Acked-by: Matt Turner
    Signed-off-by: Linus Torvalds

    Al Viro
     

06 Jun, 2012

4 commits

  • Gilad reported at

    http://lkml.kernel.org/r/1336056962-10465-2-git-send-email-gilad@benyossef.com

    "Current timer code fails to correctly return a value meaning that
    there is no future timer event, with the result that the timer keeps
    getting re-armed in HZ one shot mode even when we could turn it off,
    generating unneeded interrupts.

    What is happening is that when __next_timer_interrupt() wishes
    to return a value that signifies "there is no future timer
    event", it returns (base->timer_jiffies + NEXT_TIMER_MAX_DELTA).

    However, the code in tick_nohz_stop_sched_tick(), which called
    __next_timer_interrupt() via get_next_timer_interrupt(),
    compares the return value to (last_jiffies + NEXT_TIMER_MAX_DELTA)
    to see if the timer needs to be re-armed.

    base->timer_jiffies != last_jiffies and so tick_nohz_stop_sched_tick()
    interprets the return value as indication that there is a distant
    future event 12 days from now and programs the timer to fire next
    after KTIME_MAX nsecs instead of avoiding to arm it. This ends up
    causing a needless interrupt once every KTIME_MAX nsecs."

    Fix this by using the new active timer accounting. This avoids scans
    when no active timer is enqueued completely, so we don't have to rely
    on base->timer_next and base->timer_jiffies anymore.

    Reported-by: Gilad Ben-Yossef
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/20120525214819.317535385@linutronix.de

    Thomas Gleixner
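
    The mismatch Gilad describes can be reproduced in a few lines of
    userspace C. This is a minimal sketch, not kernel code: struct
    sketch_base, SKETCH_MAX_DELTA and the sketch_* functions are
    hypothetical stand-ins for the timer base, NEXT_TIMER_MAX_DELTA and
    the functions named above, and the value of the constant is an
    assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for NEXT_TIMER_MAX_DELTA; the value is an assumption. */
#define SKETCH_MAX_DELTA ((1UL << 30) - 1)

/* Hypothetical, stripped-down model of a per-CPU timer base. */
struct sketch_base {
	unsigned long timer_jiffies;  /* last jiffy the base has processed */
	unsigned long next_timer;     /* cached earliest expiry */
	unsigned long active_timers;  /* number of enqueued timers */
};

/* Old scheme: "no future event" is encoded relative to base->timer_jiffies. */
static unsigned long sketch_next_event_old(const struct sketch_base *base)
{
	return base->timer_jiffies + SKETCH_MAX_DELTA;
}

/*
 * New scheme: consult the active timer count first and build the
 * "no future event" value from the caller's own notion of now.
 */
static unsigned long sketch_next_event_new(const struct sketch_base *base,
					   unsigned long now)
{
	if (base->active_timers == 0)
		return now + SKETCH_MAX_DELTA;
	return base->next_timer;
}

/* The caller's test: it compares against last_jiffies, not timer_jiffies. */
static bool sketch_tick_can_stop(unsigned long next_event,
				 unsigned long last_jiffies)
{
	return next_event == last_jiffies + SKETCH_MAX_DELTA;
}
```

    With timer_jiffies lagging behind last_jiffies, the old encoding is
    not recognised by the caller, while the new one is.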
     
  • The code in get_next_timer_interrupt() is suboptimal as it has to run
    through the cascade to find the next expiring timer. On a completely
    idle core we should only do that when there is an active timer
    enqueued and base->next_timer does not give us a fast answer.

    Add accounting of the active timers to the now consolidated
    attach/detach code. I deliberately avoided sanity checks because the
    code is fully symmetric and any fiddling with timers w/o using the API
    functions will lead to cute explosions anyway. ulong is big enough
    even on 32bit and if we really run into the situation to have more
    than 1<
    Cc: Peter Zijlstra
    Cc: Gilad Ben-Yossef
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/20120525214819.236377028@linutronix.de

    Thomas Gleixner
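
    A minimal sketch of the accounting idea, assuming a simplified base
    with nothing but a counter. The names struct sketch_base, struct
    sketch_timer and the sketch_* helpers are hypothetical; the real
    attach/detach code also manages list linkage, which is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced timer: only the pending state is modelled. */
struct sketch_timer {
	bool pending;
};

/* Hypothetical, reduced base: only the active timer count is modelled. */
struct sketch_base {
	unsigned long active_timers;
};

/* Attach and detach are symmetric, so no sanity checks are needed. */
static void sketch_attach(struct sketch_base *base, struct sketch_timer *timer)
{
	timer->pending = true;
	base->active_timers++;
}

static void sketch_detach(struct sketch_base *base, struct sketch_timer *timer)
{
	timer->pending = false;
	base->active_timers--;
}

/* An idle CPU only has to walk the cascade when something is enqueued. */
static bool sketch_must_scan(const struct sketch_base *base)
{
	return base->active_timers != 0;
}
```

    The counter turns "is anything enqueued at all?" into a single
    comparison instead of a walk over the timer wheel.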
     
  • Another bunch of mindlessly copied code. All callers of
    internal_add_timer() except the recascading code update
    base->next_timer.

    Move this into internal_add_timer() and let the cascading code call
    __internal_add_timer().

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Gilad Ben-Yossef
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/20120525214819.189946224@linutronix.de

    Thomas Gleixner
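
    The split described above can be sketched as a raw helper plus a
    wrapper. This is an illustration under assumptions: the sketch_*
    names are hypothetical, the enqueue step is reduced to a counter, and
    a plain "<" is used where real code would need a wraparound-safe
    jiffies comparison.

```c
#include <assert.h>

/* Hypothetical, reduced base: cached earliest expiry plus a counter. */
struct sketch_base {
	unsigned long next_timer;
	unsigned long enqueued;
};

struct sketch_timer {
	unsigned long expires;
};

/* Raw insertion, as the cascading code would use it: no cache update. */
static void sketch_raw_add(struct sketch_base *base,
			   const struct sketch_timer *timer)
{
	(void)timer;  /* bucket selection is omitted in this sketch */
	base->enqueued++;
}

/* Wrapper for all other callers: insert, then refresh the cached expiry. */
static void sketch_add(struct sketch_base *base,
		       const struct sketch_timer *timer)
{
	sketch_raw_add(base, timer);
	if (timer->expires < base->next_timer)  /* ignores wraparound */
		base->next_timer = timer->expires;
}
```

    Callers no longer repeat the cache update themselves, and the
    cascading path keeps its behaviour by calling the raw helper.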
     
  • Most callers of detach_timer() have the same pattern around
    them: check whether the timer is pending and possibly update
    base->next_timer.

    Create detach_if_pending() and replace the duplicated code.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Gilad Ben-Yossef
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/20120525214819.131246037@linutronix.de

    Thomas Gleixner
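
    A minimal sketch of such a consolidated helper follows. The
    sketch_* names and the reduced structures are hypothetical, and
    resetting next_timer to timer_jiffies to mark the cache as stale is
    an assumption about how the update is done.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced base and timer. */
struct sketch_base {
	unsigned long timer_jiffies;
	unsigned long next_timer;
};

struct sketch_timer {
	bool pending;
	unsigned long expires;
};

/*
 * Returns 1 if the timer was pending and has been detached, 0 otherwise.
 * If the detached timer was the cached earliest one, the cache is reset
 * so that the next lookup recomputes it (assumed behaviour).
 */
static int sketch_detach_if_pending(struct sketch_timer *timer,
				    struct sketch_base *base)
{
	if (!timer->pending)
		return 0;

	timer->pending = false;
	if (timer->expires == base->next_timer)
		base->next_timer = base->timer_jiffies;
	return 1;
}
```

    Each former call site collapses to one call plus a check of the
    return value.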
     

24 May, 2012

1 commit

  • Pull user namespace enhancements from Eric Biederman:
    "This is a course correction for the user namespace, so that we can
    reach an inexpensive, maintainable, and reasonably complete
    implementation.

    Highlights:
    - Config guards make it impossible to enable the user namespace and
    code that has not been converted to be user namespace safe.

    - Use of the new kuid_t type ensures that if you somehow get past the
    config guards the kernel will encounter type errors if you enable
    user namespaces and attempt to compile in code whose permission
    checks have not been updated to be user namespace safe.

    - All uids from child user namespaces are mapped into the initial
    user namespace before they are processed. Removing the need to add
    an additional check to see if the user namespace of the compared
    uids remains the same.

    - With the user namespaces compiled out the performance is as good or
    better than it is today.

    - For most operations absolutely nothing changes performance or
    operationally with the user namespace enabled.

    - The worst case performance I could come up with was timing 1
    billion cache cold stat operations with the user namespace code
    enabled. This went from 156s to 164s on my laptop (or 156ns to
    164ns per stat operation).

    - (uid_t)-1 and (gid_t)-1 are reserved as an internal error value.
    Most uid/gid setting system calls treat these values specially
    anyway so attempting to use -1 as a uid would likely cause
    entertaining failures in userspace.

    - If setuid is called with a uid that can not be mapped setuid fails.
    I have looked at sendmail, login, ssh and every other program I
    could think of that would call setuid and they all check for and
    handle the case where setuid fails.

    - If stat or a similar system call is called from a context in which
    we can not map a uid we lie and return overflowuid. The LFS
    experience suggests not lying and returning an error code might be
    better, but the historical precedent with uids is different and I
    can not think of anything that would break by lying about a uid we
    can't map.

    - Capabilities are localized to the current user namespace making it
    safe to give the initial user in a user namespace all capabilities.

    My git tree covers all of the modifications needed to convert the core
    kernel and enough changes to make a system bootable to runlevel 1."

    Fix up trivial conflicts due to nearby independent changes in fs/stat.c

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
    userns: Silence silly gcc warning.
    cred: use correct cred accessor with regards to rcu read lock
    userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq
    userns: Convert cgroup permission checks to use uid_eq
    userns: Convert tmpfs to use kuid and kgid where appropriate
    userns: Convert sysfs to use kgid/kuid where appropriate
    userns: Convert sysctl permission checks to use kuid and kgids.
    userns: Convert proc to use kuid/kgid where appropriate
    userns: Convert ext4 to user kuid/kgid where appropriate
    userns: Convert ext3 to use kuid/kgid where appropriate
    userns: Convert ext2 to use kuid/kgid where appropriate.
    userns: Convert devpts to use kuid/kgid where appropriate
    userns: Convert binary formats to use kuid/kgid where appropriate
    userns: Add negative depends on entries to avoid building code that is userns unsafe
    userns: signal remove unnecessary map_cred_ns
    userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
    userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
    userns: Convert stat to return values mapped from kuids and kgids
    userns: Convert user specfied uids and gids in chown into kuids and kgid
    userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
    ...

    Linus Torvalds
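
    Three of the highlights above (a distinct kuid type, mapping into the
    initial namespace before comparing, and returning overflowuid for
    unmappable ids) can be sketched in userspace C. Everything named
    sketch_* or SKETCH_* is hypothetical, the single-extent map is a
    simplification, and 65534 as the overflow value is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Wrapping the id in a struct makes accidental raw comparisons a type error. */
typedef struct {
	unsigned int val;
} sketch_kuid_t;

#define SKETCH_INVALID_UID  ((unsigned int)-1)  /* reserved error value */
#define SKETCH_OVERFLOW_UID 65534u              /* assumed overflowuid */

/* Hypothetical one-extent map: ids [first, first + count) in the child
 * namespace correspond to [lower_first, lower_first + count) in the
 * initial namespace. */
struct sketch_uid_extent {
	unsigned int first;
	unsigned int lower_first;
	unsigned int count;
};

static bool sketch_uid_eq(sketch_kuid_t a, sketch_kuid_t b)
{
	return a.val == b.val;
}

/* Map a namespace-local uid into the initial namespace. */
static sketch_kuid_t sketch_make_kuid(const struct sketch_uid_extent *map,
				      unsigned int uid)
{
	sketch_kuid_t kuid = { SKETCH_INVALID_UID };

	if (uid != SKETCH_INVALID_UID &&
	    uid >= map->first && uid - map->first < map->count)
		kuid.val = map->lower_first + (uid - map->first);
	return kuid;
}

/* Map back for reporting (stat-like paths): lie instead of failing. */
static unsigned int sketch_from_kuid_munged(const struct sketch_uid_extent *map,
					    sketch_kuid_t kuid)
{
	if (kuid.val != SKETCH_INVALID_UID &&
	    kuid.val >= map->lower_first &&
	    kuid.val - map->lower_first < map->count)
		return map->first + (kuid.val - map->lower_first);
	return SKETCH_OVERFLOW_UID;
}
```

    A setuid-like path would fail when sketch_make_kuid() yields the
    invalid value, while a stat-like path would report the overflow id.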
     

23 May, 2012

1 commit

  • Pull workqueue changes from Tejun Heo:
    "Nothing exciting. Most are updates to debug stuff and related fixes.
    Two not-too-critical bugs are fixed - WARN_ON() triggering spuriously
    during cpu offlining and unlikely lockdep related oops."

    * 'for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    lockdep: fix oops in processing workqueue
    workqueue: skip nr_running sanity check in worker_enter_idle() if trustee is active
    workqueue: Catch more locking problems with flush_work()
    workqueue: change BUG_ON() to WARN_ON()
    trace: Remove unused workqueue tracer

    Linus Torvalds
     

15 May, 2012

1 commit

  • Under memory load, on x86_64, with lockdep enabled, the workqueue's
    process_one_work() has been seen to oops in __lock_acquire(), barfing
    on a 0xffffffff00000000 pointer in the lockdep_map's class_cache[].

    Because it's permissible to free a work_struct from its callout function,
    the map used is an onstack copy of the map given in the work_struct: and
    that copy is made without any locking.

    Surprisingly, gcc (4.5.1 in Hugh's case) uses "rep movsl" rather than
    "rep movsq" for that structure copy: which might race with a workqueue
    user's wait_on_work() doing lock_map_acquire() on the source of the
    copy, putting a pointer into the class_cache[], but only in time for
    the top half of that pointer to be copied to the destination map.

    Boom when process_one_work() subsequently does lock_map_acquire()
    on its onstack copy of the lockdep_map.

    Fix this, and a similar instance in call_timer_fn(), with a
    lockdep_copy_map() function which additionally NULLs the class_cache[].

    Note: this oops was actually seen on 3.4-next, where flush_work() newly
    does the racing lock_map_acquire(); but Tejun points out that 3.4 and
    earlier are already vulnerable to the same through wait_on_work().

    * Patch originally from Peter. Hugh modified it a bit and wrote the
    description.

    Signed-off-by: Peter Zijlstra
    Reported-by: Hugh Dickins
    LKML-Reference:
    Signed-off-by: Tejun Heo

    Peter Zijlstra
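
    The shape of the fix can be sketched as a copy helper that clears the
    cache after the structure copy. This is a userspace illustration: the
    sketch_* types are hypothetical reductions of the lockdep structures,
    and the number of cache slots is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CACHING_CLASSES 2  /* assumed number of cache slots */

struct sketch_lock_class {
	int id;
};

/* Hypothetical, reduced lockdep map: a name plus a class cache. */
struct sketch_lockdep_map {
	const char *name;
	struct sketch_lock_class *class_cache[SKETCH_CACHING_CLASSES];
};

/*
 * Copy the map, then NULL the cached class pointers. The cache may have
 * been written concurrently while the copy was in flight, so any pointer
 * value that landed in the copy cannot be trusted. A NULL entry just
 * means the class is looked up again on first use.
 */
static void sketch_lockdep_copy_map(struct sketch_lockdep_map *to,
				    const struct sketch_lockdep_map *from)
{
	size_t i;

	*to = *from;
	for (i = 0; i < SKETCH_CACHING_CLASSES; i++)
		to->class_cache[i] = NULL;
}
```

    The source map is left untouched; only the on-stack copy loses its
    cached pointers.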
     

03 May, 2012

1 commit


27 Apr, 2012

1 commit

  • The mod_timer_pinned() header comment states that it prevents timers
    from being migrated to a different CPU. This is not the case: instead,
    it ensures that the timer is posted to the current CPU, but does nothing
    to prevent CPU-hotplug operations from migrating the timer.

    This commit therefore brings the comment header into alignment with
    reality.

    Signed-off-by: Paul E. McKenney
    Acked-by: Steven Rostedt

    Paul E. McKenney
     

06 Jan, 2012

1 commit


09 Dec, 2011

1 commit


24 Nov, 2011

2 commits

  • del_timer_sync() calls debug_object_assert_init() to assert that
    a timer has been initialized before calling lock_timer_base().
    lock_timer_base() would spin forever on a NULL(uninit-ed) base.
    The check is added to del_timer() to prevent silent failure, even
    though it would not get stuck in an infinite loop.

    [ sboyd@codeaurora.org: Remove WARN, initialize timer function]

    Signed-off-by: Christine Chan
    Signed-off-by: Stephen Boyd
    Cc: John Stultz
    Link: http://lkml.kernel.org/r/1320724108-20788-4-git-send-email-sboyd@codeaurora.org
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Christine Chan
     
  • Remove the WARN_ON() in timer_fixup_activate() as we now get the
    debugobjects printout in the debugobjects activate check.

    We also assign a dummy timer callback so that if the timer is
    actually set to fire we don't oops.

    [ tglx@linutronix.de: Split out the debugobjects vs. the timer change ]

    Signed-off-by: Stephen Boyd
    Cc: Christine Chan
    Cc: John Stultz
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/1320724108-20788-2-git-send-email-sboyd@codeaurora.org
    Signed-off-by: Thomas Gleixner

    Stephen Boyd
     

31 Oct, 2011

1 commit

  • The changed files were only including linux/module.h for the
    EXPORT_SYMBOL infrastructure, and nothing else. Revector them
    onto the isolated export header for faster compile times.

    Nothing to see here but a whole lot of instances of:

    -#include <linux/module.h>
    +#include <linux/export.h>

    This commit is only changing the kernel dir; next targets
    will probably be mm, fs, the arch dirs, etc.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

03 Jun, 2011

1 commit

  • There is an optimization which does not update the timer if the timer
    was pending and the expiration time was unchanged.

    Since commit 3bbb9ec9 ("timers: Introduce the concept of timer slack
    for legacy timers") this optimization is no longer applied for timers
    where the expiration time got extended due to the slack value. So we
    need to check again after the expiration time might have been updated.

    [ tglx: Made it a single check by applying slack first and sorting
    out the slack = 0 value (all timeouts < 256 jiffies) early ]

    Signed-off-by: Sebastian Andrzej Siewior
    Link: http://lkml.kernel.org/r/20110521105828.GA29442@Chamillionaire.breakpoint.cc
    Signed-off-by: Thomas Gleixner

    Sebastian Andrzej Siewior
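
    The ordering issue can be sketched as follows: apply the slack first,
    then compare against the pending expiry. This is a userspace sketch
    under assumptions. The sketch_* names are hypothetical, and the slack
    rule used here (1/256 of the remaining time, rounded down to a coarse
    boundary) is an assumption chosen to match the "timeouts < 256
    jiffies" remark above.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_timer {
	bool pending;
	unsigned long expires;
};

/* Index of the highest set bit, or -1 for zero. */
static int sketch_highest_bit(unsigned long v)
{
	int bit = -1;

	while (v) {
		v >>= 1;
		bit++;
	}
	return bit;
}

/* Assumed slack rule: allow 1/256 of the timeout, then round to a
 * coarse boundary. Timeouts below 256 jiffies get no slack at all. */
static unsigned long sketch_apply_slack(unsigned long expires,
					unsigned long now)
{
	unsigned long delta, limit, mask;

	if (expires <= now)
		return expires;
	delta = expires - now;
	if (delta < 256)
		return expires;

	limit = expires + delta / 256;
	mask = expires ^ limit;
	if (mask == 0)
		return expires;
	mask = (1UL << sketch_highest_bit(mask)) - 1;
	return limit & ~mask;
}

/* Returns true if the timer had to be (re)queued, false if unchanged. */
static bool sketch_mod_timer(struct sketch_timer *timer,
			     unsigned long expires, unsigned long now)
{
	expires = sketch_apply_slack(expires, now);

	/* Single check, done after the slack has been applied. */
	if (timer->pending && timer->expires == expires)
		return false;

	timer->expires = expires;
	timer->pending = true;
	return true;
}
```

    Re-arming a pending timer with the same requested expiry is then
    detected as a no-op even when slack has extended the stored value.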
     

16 Mar, 2011

2 commits

  • …l/git/tip/linux-2.6-tip

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (62 commits)
    posix-clocks: Check write permissions in posix syscalls
    hrtimer: Remove empty hrtimer_init_hres_timer()
    hrtimer: Update hrtimer->state documentation
    hrtimer: Update base[CLOCK_BOOTTIME].offset correctly
    timers: Export CLOCK_BOOTTIME via the posix timers interface
    timers: Add CLOCK_BOOTTIME hrtimer base
    time: Extend get_xtime_and_monotonic_offset() to also return sleep
    time: Introduce get_monotonic_boottime and ktime_get_boottime
    hrtimers: extend hrtimer base code to handle more then 2 clockids
    ntp: Remove redundant and incorrect parameter check
    mn10300: Switch do_timer() to xtimer_update()
    posix clocks: Introduce dynamic clocks
    posix-timers: Cleanup namespace
    posix-timers: Add support for fd based clocks
    x86: Add clock_adjtime for x86
    posix-timers: Introduce a syscall for clock tuning.
    time: Splitout compat timex accessors
    ntp: Add ADJ_SETOFFSET mode bit
    time: Introduce timekeeping_inject_offset
    posix-timer: Update comment
    ...

    Fix up new system-call-related conflicts in
    arch/x86/ia32/ia32entry.S
    arch/x86/include/asm/unistd_32.h
    arch/x86/include/asm/unistd_64.h
    arch/x86/kernel/syscall_table_32.S
    (name_to_handle_at()/open_by_handle_at() vs clock_adjtime()), and some
    due to movement of get_jiffies_64() in:
    kernel/time.c

    Linus Torvalds
     
  • …el/git/tip/linux-2.6-tip

    * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    rtmutex: tester: Remove the remaining BKL leftovers
    lockdep/timers: Explain in detail the locking problems del_timer_sync() may cause
    rtmutex: Simplify PI algorithm and make highest prio task get lock
    rwsem: Remove redundant asmregparm annotation
    rwsem: Move duplicate function prototypes to linux/rwsem.h
    rwsem: Unify the duplicate rwsem_is_locked() inlines
    rwsem: Move duplicate init macros and functions to linux/rwsem.h
    rwsem: Move duplicate struct rwsem declaration to linux/rwsem.h
    x86: Cleanup rwsem_count_t typedef
    rwsem: Cleanup includes
    locking: Remove deprecated lock initializers
    cred: Replace deprecated spinlock initialization
    kthread: Replace deprecated spinlock initialization
    xtensa: Replace deprecated spinlock initialization
    um: Replace deprecated spinlock initialization
    sparc: Replace deprecated spinlock initialization
    mips: Replace deprecated spinlock initialization
    cris: Replace deprecated spinlock initialization
    alpha: Replace deprecated spinlock initialization
    rtmutex-tester: Remove BKL tests

    Linus Torvalds
     

08 Mar, 2011

1 commit

  • In complex subsystems like mac80211 structures can contain several
    timers and work structs, so identifying a specific instance from the
    call trace and object type output of debugobjects can be hard.

    Allow the subsystems which support debugobjects to provide a hint
    function. This function returns a pointer to a kernel address
    (preferably the object's callback function) which is printed along
    with the debugobjects type.

    Add hint methods for timer_list, work_struct and hrtimer.

    [ tglx: Massaged changelog, made it compile ]

    Signed-off-by: Stanislaw Gruszka
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Stanislaw Gruszka
     

16 Feb, 2011

1 commit

  • Twice I had to explain the output about why lockdep gives an error with
    locks in IRQ context and with del_timer_sync(). Might as well write it
    up and place it in the comments above the code in del_timer_sync().
    Perhaps the next time this lockdep dump triggers people will understand
    the issues.

    It is a tricky and very subtle issue; explaining it in detail in the
    code may help others understand the issue when they stumble upon the
    bug again.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     

08 Feb, 2011

1 commit

  • Both attempts at trying to allow softirq usage for
    del_timer_sync() failed (produced bogus warnings),
    so revert the commit for this release:

    f266a5110d45: lockdep, timer: Fix del_timer_sync() annotation

    and try again later.

    Reported-by: Borislav Petkov
    Signed-off-by: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Yong Zhang
    Cc: Thomas Gleixner
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

04 Feb, 2011

1 commit

  • Calling local_bh_enable() will want to actually start processing
    softirqs, which isn't a good idea since this can get called with IRQs
    disabled.

    Cure this by using _local_bh_enable() which doesn't start processing
    softirqs, and use raw_local_irq_save() to avoid any softirqs from
    happening without letting lockdep think IRQs are in fact disabled.

    Reported-by: Nick Bowler
    Signed-off-by: Peter Zijlstra
    Reviewed-by: Yong Zhang
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
     

31 Jan, 2011

1 commit

  • do_timer() is primarily timekeeping related. calc_global_load() is
    called from do_timer() as well, but that's more for historical
    reasons.

    [ tglx: Fixed up the calc_global_load() reject and massaged changelog ]

    Signed-off-by: Torben Hohn
    Cc: Peter Zijlstra
    Cc: johnstul@us.ibm.com
    Cc: yong.zhang0@gmail.com
    Cc: hch@infradead.org
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Torben Hohn
     

07 Jan, 2011

1 commit

  • * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    MAINTAINERS: Update timer related entries
    timers: Use this_cpu_read
    timerqueue: Make timerqueue_getnext() static inline
    hrtimer: fix timerqueue conversion flub
    hrtimers: Convert hrtimers to use timerlist infrastructure
    timers: Fixup allmodconfig build issue
    timers: Rename timerlist infrastructure to timerqueue
    timers: Introduce timerlist infrastructure.
    hrtimer: Remove stale comment on curr_timer
    timer: Warn when del_timer_sync() is called in hardirq context
    timer: Del_timer_sync() can be used in softirq context
    timer: Make try_to_del_timer_sync() the same on SMP and UP
    posix-timers: Annotate lock_timer()
    timer: Permit statically-declared work with deferrable timers
    time: Use ARRAY_SIZE macro in timecompare.c
    timer: Initialize the field slack of timer_list
    timer_list: Remove alignment padding on 64 bit when CONFIG_TIMER_STATS
    time: Compensate for rounding on odd-frequency clocksources

    Fix up trivial conflict in MAINTAINERS

    Linus Torvalds
     

13 Dec, 2010

1 commit

  • Eric asked for this.

    [tglx: Because it generates faster code according to Eric ]

    Signed-off-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Eric Dumazet
    Cc: Mathieu Desnoyers
    Cc: Tejun Heo
    Cc: linux-mm@kvack.org
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Christoph Lameter
     

09 Dec, 2010

2 commits

  • This fixes a bug as seen on 2.6.32 based kernels where timers got
    enqueued on offline cpus.

    If a cpu goes offline it might still have pending timers. These will
    be migrated during CPU_DEAD handling after the cpu is offline.
    However while the cpu is going offline it will schedule the idle task
    which will then call tick_nohz_stop_sched_tick().

    That function in turn will call get_next_timer_interrupt() to figure
    out if the tick of the cpu can be stopped or not. If it turns out that
    the next tick is just one jiffy off (delta_jiffies == 1)
    tick_nohz_stop_sched_tick() incorrectly assumes that the tick should
    not stop and takes an early exit and thus it won't update the load
    balancer cpu.

    Just afterwards the cpu will be killed and the load balancer cpu could
    be the offline cpu.

    On 2.6.32 based kernels get_nohz_load_balancer() gets called to decide
    on which cpu a timer should be enqueued (see __mod_timer()), which
    leads to the possibility that timers get enqueued on an offline cpu.
    These will never expire and can cause a system hang.

    This has been observed on 2.6.32 kernels. On current kernels
    __mod_timer() uses get_nohz_timer_target() which doesn't have that
    problem. However there might be other problems because of the too
    early exit from tick_nohz_stop_sched_tick() in case a cpu goes offline.

    The easiest and probably safest fix seems to be to let
    get_next_timer_interrupt() just lie and let it say there isn't any
    pending timer if the current cpu is offline.

    I also thought of moving migrate_[hr]timers() from CPU_DEAD to
    CPU_DYING, but seeing that there already have been fixes at least in
    the hrtimer code in this area I'm afraid that this could add new
    subtle bugs.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Cc: stable@kernel.org
    Signed-off-by: Ingo Molnar

    Heiko Carstens
     
  • There's a long-running regression that proved difficult to fix and
    which is hitting certain people and is rather annoying in its effects.

    Damien reported that after 74f5187ac8 (sched: Cure load average vs
    NO_HZ woes) his load average is unnaturally high; he also noted that
    even with that patch reverted the load average numbers are not
    correct.

    The problem is that the previous patch only solved half the NO_HZ
    problem: it addressed the part of going into NO_HZ mode, not of
    coming out of NO_HZ mode. This patch implements that missing half.

    When coming out of NO_HZ mode there are two important things to take
    care of:

    - Folding the pending idle delta into the global active count.
    - Correctly aging the averages for the idle-duration.

    So with this patch the NO_HZ interaction should be complete and
    behaviour between CONFIG_NO_HZ=[yn] should be equivalent.

    Furthermore, this patch slightly changes the load average computation
    by adding a rounding term to the fixed point multiplication.

    Reported-by: Damien Wyart
    Reported-by: Tim McGrath
    Tested-by: Damien Wyart
    Tested-by: Orion Poplawski
    Tested-by: Kyle McMartin
    Signed-off-by: Peter Zijlstra
    Cc: stable@kernel.org
    Cc: Chase Douglas
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
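
    The rounding term mentioned in the last paragraph can be illustrated
    with a small fixed-point sketch. The sketch_* names are hypothetical,
    and the 11-bit fraction and the decay factor 1884 are assumptions
    chosen for the example.

```c
#include <assert.h>

#define SKETCH_FSHIFT  11                      /* assumed fraction bits */
#define SKETCH_FIXED_1 (1UL << SKETCH_FSHIFT)  /* 1.0 in fixed point */
#define SKETCH_EXP_1   1884UL                  /* assumed decay factor */

/* load = load * exp + active * (1 - exp), truncating the result. */
static unsigned long sketch_calc_load_trunc(unsigned long load,
					    unsigned long exp,
					    unsigned long active)
{
	load *= exp;
	load += active * (SKETCH_FIXED_1 - exp);
	return load >> SKETCH_FSHIFT;
}

/* Same update, but with half an LSB added before the shift so the
 * fixed-point multiplication rounds to nearest instead of truncating. */
static unsigned long sketch_calc_load_rounded(unsigned long load,
					      unsigned long exp,
					      unsigned long active)
{
	load *= exp;
	load += active * (SKETCH_FIXED_1 - exp);
	load += 1UL << (SKETCH_FSHIFT - 1);
	return load >> SKETCH_FSHIFT;
}
```

    For a load of 1000 decaying with nothing active, the exact product is
    about 919.9; truncation reports 919 while the rounded variant reports
    920.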
     

22 Oct, 2010

3 commits

  • Add explicit warning when del_timer_sync() is called in hardirq
    context.

    Signed-off-by: Yong Zhang
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Acked-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Yong Zhang
     
  • Actually we have used del_timer_sync() in softirq context for a long time,
    e.g. in __dst_free()::cancel_delayed_work().

    So change the comments of it to warn on hardirq context only, and make
    lockdep know about this change.

    Signed-off-by: Yong Zhang
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Yong Zhang
     
  • On UP try_to_del_timer_sync() is mapped to del_timer() which does not
    take the running timer callback into account, so it has different
    semantics.

    Remove the SMP dependency of try_to_del_timer_sync() by using
    base->running_timer in the UP case as well.

    [ tglx: Removed set_running_timer() inline and tweaked the changelog ]

    Signed-off-by: Yong Zhang
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Acked-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Yong Zhang
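
    The semantic difference can be sketched as follows: a plain delete
    ignores a running callback, while the try-variant reports it. The
    sketch_* names and reduced structures are hypothetical, and locking
    is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_timer {
	bool pending;
};

/* Hypothetical, reduced base: only the currently running timer. */
struct sketch_base {
	struct sketch_timer *running_timer;
};

/*
 * Returns -1 if the timer's callback is currently running (the caller
 * has to retry), 1 if a pending timer was deactivated, 0 if the timer
 * was not pending. Tracking running_timer on UP as well gives the same
 * answers there as on SMP.
 */
static int sketch_try_to_del_timer_sync(struct sketch_base *base,
					struct sketch_timer *timer)
{
	if (base->running_timer == timer)
		return -1;
	if (!timer->pending)
		return 0;
	timer->pending = false;
	return 1;
}
```

    A caller that must wait for the callback to finish would loop until
    the result is non-negative.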
     

21 Oct, 2010

1 commit

  • Currently, you have to just define a delayed_work uninitialised, and then
    initialise it before first use. That's a tad clumsy. At risk of playing
    mind-games with the compiler, fooling it into doing pointer arithmetic
    with compile-time-constants, this lets clients properly initialise delayed
    work with deferrable timers statically.

    This patch was inspired by the issues which led Artem Bityutskiy to
    commit 8eab945c5616fc984 ("sunrpc: make the cache cleaner workqueue
    deferrable").

    Signed-off-by: Phil Carmody
    Acked-by: Artem Bityutskiy
    Cc: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Phil Carmody
     

19 Oct, 2010

1 commit

  • Provide a mechanism that allows running code in IRQ context. It is
    most useful for NMI code that needs to interact with the rest of the
    system, such as waking up a task to drain buffers.

    Perf currently has such a mechanism, so extract that and provide it as
    a generic feature, independent of perf so that others may also
    benefit.

    The IRQ context callback is generated through self-IPIs where
    possible, or on architectures like powerpc the decrementer (the
    built-in timer facility) is set to generate an interrupt immediately.

    Architectures that don't have anything like this have to make do with a
    callback from the timer tick. These architectures can call
    irq_work_run() at the tail of any IRQ handlers that might enqueue such
    work (like the perf IRQ handler) to avoid undue latencies in
    processing the work.

    Signed-off-by: Peter Zijlstra
    Acked-by: Kyle McMartin
    Acked-by: Martin Schwidefsky
    [ various fixes ]
    Signed-off-by: Huang Ying
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

11 Aug, 2010

1 commit


07 Aug, 2010

3 commits

  • * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    Documentation: Add timers/timers-howto.txt
    timer: Added usleep_range timer
    Revert "timer: Added usleep[_range] timer"
    clockevents: Remove the per cpu tick skew
    posix_timer: Move copy_to_user(created_timer_id) down in timer_create()
    timer: Added usleep[_range] timer
    timers: Document meaning of deferrable timer

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits)
    sched: Use correct macro to display sched_child_runs_first in /proc/sched_debug
    sched: No need for bootmem special cases
    sched: Revert nohz_ratelimit() for now
    sched: Reduce update_group_power() calls
    sched: Update rq->clock for nohz balanced cpus
    sched: Fix spelling of sibling
    sched, cpuset: Drop __cpuexit from cpu hotplug callbacks
    sched: Fix the racy usage of thread_group_cputimer() in fastpath_timer_check()
    sched: run_posix_cpu_timers: Don't check ->exit_state, use lock_task_sighand()
    sched: thread_group_cputime: Simplify, document the "alive" check
    sched: Remove the obsolete exit_state/signal hacks
    sched: task_tick_rt: Remove the obsolete ->signal != NULL check
    sched: __sched_setscheduler: Read the RLIMIT_RTPRIO value lockless
    sched: Fix comments to make them DocBook happy
    sched: Fix fix_small_capacity
    powerpc: Exclude arch_sd_sibiling_asym_packing() on UP
    powerpc: Enable asymmetric SMT scheduling on POWER7
    sched: Add asymmetric group packing option for sibling domain
    sched: Fix capacity calculations for SMT4
    sched: Change nohz idle load balancing logic to push model
    ...

    Linus Torvalds
     
  • …git/tip/linux-2.6-tip

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (162 commits)
    tracing/kprobes: unregister_trace_probe needs to be called under mutex
    perf: expose event__process function
    perf events: Fix mmap offset determination
    perf, powerpc: fsl_emb: Restore setting perf_sample_data.period
    perf, powerpc: Convert the FSL driver to use local64_t
    perf tools: Don't keep unreferenced maps when unmaps are detected
    perf session: Invalidate last_match when removing threads from rb_tree
    perf session: Free the ref_reloc_sym memory at the right place
    x86,mmiotrace: Add support for tracing STOS instruction
    perf, sched migration: Librarize task states and event headers helpers
    perf, sched migration: Librarize the GUI class
    perf, sched migration: Make the GUI class client agnostic
    perf, sched migration: Make it vertically scrollable
    perf, sched migration: Parameterize cpu height and spacing
    perf, sched migration: Fix key bindings
    perf, sched migration: Ignore unhandled task states
    perf, sched migration: Handle ignored migrate out events
    perf: New migration tool overview
    tracing: Drop cpparg() macro
    perf: Use tracepoint_synchronize_unregister() to flush any pending tracepoint call
    ...

    Fix up trivial conflicts in Makefile and drivers/cpufreq/cpufreq.c

    Linus Torvalds
     

05 Aug, 2010

1 commit


04 Aug, 2010

2 commits

  • usleep_range is a finer precision implementation of msleep
    and is designed to be a drop-in replacement for udelay where
    a precise sleep / busy-wait is unnecessary.

    Since an easy interface to hrtimers could lead to an undesired
    proliferation of interrupts, we provide only a "range" API,
    forcing the caller to think about an acceptable tolerance on
    both ends and hopefully avoiding introducing another interrupt.

    INTRO

    As discussed here ( http://lkml.org/lkml/2007/8/3/250 ), msleep(1) is not
    precise enough for many drivers (yes, sleep precision is an unfair notion,
    but consistently sleeping for ~an order of magnitude greater than requested
    is worth fixing). This patch adds a usleep API so that udelay does not have
    to be used. Obviously not every udelay can be replaced (those in atomic
    contexts or being used for simple bitbanging come to mind), but there are
    many, many examples of

    mydriver_write(...)
    /* Wait for hardware to latch */
    udelay(100)

    in various drivers where a busy-wait loop is neither beneficial nor
    necessary, but msleep simply does not provide enough precision and people
    are using a busy-wait loop instead.

    CONCERNS FROM THE RFC

    Why is udelay a problem / necessary? Most callers of udelay are in device/
    driver initialization code, which is serial...

    As I see it, there is only benefit to sleeping over a delay; the
    notion of "refactoring" areas that use udelay was presented, but
    I see usleep as the refactoring. Consider i2c: if the bus is busy,
    you need to wait a bit (say 100us) before trying again. Your
    current options are:

    * udelay(100)
    * msleep(1) = | COUNT
    1000 | 319
    500 | 414
    100 | 1146
    20 | 1832

    I am working on Android, so that is my focus for this. The following table
    is a modified usleep that simply printk's the amount of time requested to
    sleep; these tests were run on a kernel with udelay >= 20 --> usleep

    "boot" is power-on to lock screen
    "power collapse" is when the power button is pushed and the device suspends
    "resume" is when the power button is pushed and the lock screen is displayed
    (no touchscreen events or anything, just turning on the display)
    "use device" is from the unlock swipe to clicking around a bit; there is no
    sd card in this phone, so fail loading music, video, camera

    ACTION         | TOTAL NUMBER OF USLEEP CALLS | NET TIME (us)
    boot           | 22                           | 1250
    power-collapse | 9                            | 1200
    resume         | 5                            | 500
    use device     | 59                           | 7700

    The most interesting category to me is the "use device" field; 7700us of
    busy-wait time that could be put towards better responsiveness, or at the
    least less power usage.

    Signed-off-by: Patrick Pannuto
    Cc: apw@canonical.com
    Cc: corbet@lwn.net
    Cc: arjan@linux.intel.com
    Cc: Randy Dunlap
    Cc: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Patrick Pannuto
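
    The call-site change this message argues for can be sketched with
    stubs. This is not driver code: sketch_udelay() and
    sketch_usleep_range() are hypothetical stand-ins that only record
    what was requested, and the 200us upper bound is an assumed
    tolerance picked for the example.

```c
#include <assert.h>

/* Records how much time a code path asked to spin or to sleep. */
struct sketch_wait_stats {
	unsigned long busy_us;       /* time spent busy-waiting */
	unsigned long sleep_min_us;  /* lower bounds passed to the sleep */
	unsigned long sleep_max_us;  /* upper bounds passed to the sleep */
};

/* Stub for a busy-wait delay: the CPU is held for the whole time. */
static void sketch_udelay(struct sketch_wait_stats *stats, unsigned long us)
{
	stats->busy_us += us;
}

/* Stub for a range sleep: the caller states both ends of the tolerance. */
static void sketch_usleep_range(struct sketch_wait_stats *stats,
				unsigned long min_us, unsigned long max_us)
{
	assert(min_us <= max_us);
	stats->sleep_min_us += min_us;
	stats->sleep_max_us += max_us;
}

/* Before: write, then spin while the hardware latches. */
static void sketch_write_then_spin(struct sketch_wait_stats *stats)
{
	/* a hypothetical register write would go here */
	sketch_udelay(stats, 100);
}

/* After: write, then sleep for at least 100us and at most 200us. */
static void sketch_write_then_sleep(struct sketch_wait_stats *stats)
{
	/* a hypothetical register write would go here */
	sketch_usleep_range(stats, 100, 200);
}
```

    The second variant spends no time busy-waiting, and the stated range
    leaves room for the wakeup to be merged with another one.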
     
  • This reverts commit 22b8f15c2f7130bb0386f548428df2ffd4e81903 to merge
    an advanced version.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

03 Aug, 2010

1 commit