19 Oct, 2010

1 commit

  • Provide a mechanism that allows running code in IRQ context. It is
    most useful for NMI code that needs to interact with the rest of the
    system -- for example, to wake up a task to drain buffers.

    Perf currently has such a mechanism, so extract that and provide it as
    a generic feature, independent of perf so that others may also
    benefit.

    The IRQ context callback is generated through self-IPIs where
    possible, or on architectures like powerpc the decrementer (the
    built-in timer facility) is set to generate an interrupt immediately.

    Architectures that don't have anything like this have to make do with a
    callback from the timer tick. These architectures can call
    irq_work_run() at the tail of any IRQ handlers that might enqueue such
    work (like the perf IRQ handler) to avoid undue latencies in
    processing the work.
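
    The enqueue-then-drain idea described above can be modelled in userspace.
    This is only a sketch under stated assumptions: the names work_item,
    work_queue_add() and work_queue_run() are hypothetical and are not the
    kernel's irq_work API, and a counter stands in for the self-IPI. It shows
    a lock-free push that is usable from a context that cannot take locks,
    and a drain step that runs later in a friendlier context.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; names are illustrative, not the kernel API. */
struct work_item {
    atomic_bool pending;                 /* true while the item is queued */
    void (*func)(struct work_item *);    /* callback run by the drain step */
    struct work_item *next;
};

static _Atomic(struct work_item *) work_queue = NULL;
static atomic_int interrupts_raised;     /* stands in for raising a self-IPI */

/* Producer side ("NMI"): no locks, only atomic operations. */
static bool work_queue_add(struct work_item *w)
{
    bool expected = false;

    /* Claim the item; if it is already pending there is nothing to do. */
    if (!atomic_compare_exchange_strong(&w->pending, &expected, true))
        return false;

    /* Lock-free push onto the head of the list. */
    struct work_item *head = atomic_load(&work_queue);
    do {
        w->next = head;
    } while (!atomic_compare_exchange_weak(&work_queue, &head, w));

    /* Only the empty -> non-empty transition needs a new interrupt. */
    if (head == NULL)
        atomic_fetch_add(&interrupts_raised, 1);
    return true;
}

/* Consumer side ("IRQ"): detach the whole list, then run each item. */
static int work_queue_run(void)
{
    int ran = 0;
    struct work_item *w = atomic_exchange(&work_queue, NULL);

    while (w) {
        struct work_item *next = w->next;  /* read before re-queueing is possible */
        atomic_store(&w->pending, false);
        w->func(w);
        ran++;
        w = next;
    }
    return ran;
}

/* Sample callback used to observe that work ran. */
static int calls;
static void count_call(struct work_item *w) { (void)w; calls++; }
```

    In this model, an architecture without a self-interrupt would simply call
    work_queue_run() from its periodic tick, or at the tail of a handler that
    may have queued work.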

    Signed-off-by: Peter Zijlstra
    Acked-by: Kyle McMartin
    Acked-by: Martin Schwidefsky
    [ various fixes ]
    Signed-off-by: Huang Ying
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

11 Aug, 2010

1 commit


07 Aug, 2010

3 commits

  • * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    Documentation: Add timers/timers-howto.txt
    timer: Added usleep_range timer
    Revert "timer: Added usleep[_range] timer"
    clockevents: Remove the per cpu tick skew
    posix_timer: Move copy_to_user(created_timer_id) down in timer_create()
    timer: Added usleep[_range] timer
    timers: Document meaning of deferrable timer

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits)
    sched: Use correct macro to display sched_child_runs_first in /proc/sched_debug
    sched: No need for bootmem special cases
    sched: Revert nohz_ratelimit() for now
    sched: Reduce update_group_power() calls
    sched: Update rq->clock for nohz balanced cpus
    sched: Fix spelling of sibling
    sched, cpuset: Drop __cpuexit from cpu hotplug callbacks
    sched: Fix the racy usage of thread_group_cputimer() in fastpath_timer_check()
    sched: run_posix_cpu_timers: Don't check ->exit_state, use lock_task_sighand()
    sched: thread_group_cputime: Simplify, document the "alive" check
    sched: Remove the obsolete exit_state/signal hacks
    sched: task_tick_rt: Remove the obsolete ->signal != NULL check
    sched: __sched_setscheduler: Read the RLIMIT_RTPRIO value lockless
    sched: Fix comments to make them DocBook happy
    sched: Fix fix_small_capacity
    powerpc: Exclude arch_sd_sibiling_asym_packing() on UP
    powerpc: Enable asymmetric SMT scheduling on POWER7
    sched: Add asymmetric group packing option for sibling domain
    sched: Fix capacity calculations for SMT4
    sched: Change nohz idle load balancing logic to push model
    ...

    Linus Torvalds
     
  • …git/tip/linux-2.6-tip

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (162 commits)
    tracing/kprobes: unregister_trace_probe needs to be called under mutex
    perf: expose event__process function
    perf events: Fix mmap offset determination
    perf, powerpc: fsl_emb: Restore setting perf_sample_data.period
    perf, powerpc: Convert the FSL driver to use local64_t
    perf tools: Don't keep unreferenced maps when unmaps are detected
    perf session: Invalidate last_match when removing threads from rb_tree
    perf session: Free the ref_reloc_sym memory at the right place
    x86,mmiotrace: Add support for tracing STOS instruction
    perf, sched migration: Librarize task states and event headers helpers
    perf, sched migration: Librarize the GUI class
    perf, sched migration: Make the GUI class client agnostic
    perf, sched migration: Make it vertically scrollable
    perf, sched migration: Parameterize cpu height and spacing
    perf, sched migration: Fix key bindings
    perf, sched migration: Ignore unhandled task states
    perf, sched migration: Handle ignored migrate out events
    perf: New migration tool overview
    tracing: Drop cpparg() macro
    perf: Use tracepoint_synchronize_unregister() to flush any pending tracepoint call
    ...

    Fix up trivial conflicts in Makefile and drivers/cpufreq/cpufreq.c

    Linus Torvalds
     

05 Aug, 2010

1 commit


04 Aug, 2010

2 commits

  • usleep_range is a finer precision implementation of msleep
    and is designed to be a drop-in replacement for udelay where
    a precise sleep / busy-wait is unnecessary.

    Since an easy interface to hrtimers could lead to an undesired
    proliferation of interrupts, we provide only a "range" API,
    forcing the caller to think about an acceptable tolerance on
    both ends and hopefully avoiding introducing another interrupt.
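
    Why a range helps can be sketched with a small model. This is an
    illustration only, not the hrtimer implementation: sleep_range and
    pick_wakeup() are hypothetical names, times are absolute microseconds,
    and at most one already-scheduled interrupt is considered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: absolute times in microseconds. */
struct sleep_range {
    uint64_t min_us;   /* earliest acceptable wakeup */
    uint64_t max_us;   /* latest acceptable wakeup */
};

/*
 * Choose a wakeup time for a new sleeper, given the next interrupt that is
 * already scheduled (0 means none). Returns true if a new interrupt had to
 * be programmed, false if the sleeper can share the existing one.
 */
static bool pick_wakeup(struct sleep_range r, uint64_t scheduled_irq_us,
                        uint64_t *wakeup_us)
{
    if (scheduled_irq_us != 0 &&
        scheduled_irq_us >= r.min_us && scheduled_irq_us <= r.max_us) {
        *wakeup_us = scheduled_irq_us;   /* piggyback on the existing interrupt */
        return false;
    }

    /* Nothing to share: wake at the latest time the caller tolerates. */
    *wakeup_us = r.max_us;
    return true;
}
```

    With a range of 100..200 and an interrupt already due at 150, no new
    interrupt is needed; a single-value API would have had to program one.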

    INTRO

    As discussed here ( http://lkml.org/lkml/2007/8/3/250 ), msleep(1) is not
    precise enough for many drivers (yes, sleep precision is an unfair notion,
    but consistently sleeping for ~an order of magnitude greater than requested
    is worth fixing). This patch adds a usleep API so that udelay does not have
    to be used. Obviously not every udelay can be replaced (those in atomic
    contexts or being used for simple bitbanging come to mind), but there are
    many, many examples of

    mydriver_write(...)
    /* Wait for hardware to latch */
    udelay(100)

    in various drivers where a busy-wait loop is neither beneficial nor
    necessary, but msleep simply does not provide enough precision and people
    are using a busy-wait loop instead.

    CONCERNS FROM THE RFC

    Why is udelay a problem / necessary? Most callers of udelay are in device/
    driver initialization code, which is serial...

    As I see it, there is only benefit to sleeping over a delay; the
    notion of "refactoring" areas that use udelay was presented, but
    I see usleep as the refactoring. Consider i2c, if the bus is busy,
    you need to wait a bit (say 100us) before trying again, your
    current options are:

    * udelay(100)
    * msleep(1) [...]

    [...] | COUNT
    1000  | 319
    500   | 414
    100   | 1146
    20    | 1832

    I am working on Android, so that is my focus for this. The following table
    comes from a modified usleep that simply printk's the amount of time
    requested to sleep; these tests were run on a kernel with udelay >= 20 --> usleep

    "boot" is power-on to lock screen
    "power collapse" is when the power button is pushed and the device suspends
    "resume" is when the power button is pushed and the lock screen is displayed
    (no touchscreen events or anything, just turning on the display)
    "use device" is from the unlock swipe to clicking around a bit; there is no
    sd card in this phone, so loading music, video, and camera fails

    ACTION | TOTAL NUMBER OF USLEEP CALLS | NET TIME (us)
    boot | 22 | 1250
    power-collapse | 9 | 1200
    resume | 5 | 500
    use device | 59 | 7700

    The most interesting category to me is the "use device" field; 7700us of
    busy-wait time that could be put towards better responsiveness, or at the
    least less power usage.

    Signed-off-by: Patrick Pannuto
    Cc: apw@canonical.com
    Cc: corbet@lwn.net
    Cc: arjan@linux.intel.com
    Cc: Randy Dunlap
    Cc: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Patrick Pannuto
     
  • This reverts commit 22b8f15c2f7130bb0386f548428df2ffd4e81903 to merge
    an advanced version.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

03 Aug, 2010

1 commit


23 Jul, 2010

2 commits

  • usleep[_range] are finer precision implementations of msleep
    and are designed to be drop-in replacements for udelay where
    a precise sleep / busy-wait is unnecessary. They also allow
    an easy interface to specify slack when a precise (ish)
    wakeup is unnecessary, to help minimize wakeups.

    Signed-off-by: Patrick Pannuto
    Cc: akinobu.mita@gmail.com
    Cc: sboyd@codeaurora.org
    Acked-by: Arjan van de Ven
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Patrick Pannuto
     
  • Steal some text from 6e453a67510 "Add support for deferrable timers". A
    reader shouldn't have to dig through the git logs for the basic
    description of a deferrable timer.

    Signed-off-by: J. Bruce Fields
    Cc: johnstul@us.ibm.com
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    J. Bruce Fields
     

09 Jun, 2010

1 commit

  • In the new push model, all idle CPUs indeed go into nohz mode. There is
    still the concept of an idle load balancer (performing the load balancing
    on behalf of all the idle CPUs in the system). A busy CPU kicks the nohz
    balancer when any of the nohz CPUs need idle load balancing.
    The kickee CPU does the idle load balancing on behalf of all idle CPUs
    instead of the normal idle balance.

    This addresses the below two problems with the current nohz ilb logic:
    * the idle load balancer continued to have periodic ticks during idle and
    woke up frequently, even though it did not have any rebalancing to do on
    behalf of any of the idle CPUs.
    * On x86 and CPUs that have APIC timer stoppage on idle CPUs, this
    periodic wakeup can result in a periodic additional interrupt on a CPU
    doing the timer broadcast.

    Also, currently we migrate the unpinned timers from an idle cpu to the cpu
    doing idle load balancing (when all the cpus in the system are idle,
    there is no idle load balancing cpu and timers get added to the same idle cpu
    where the request was made, so the existing optimization works only on a
    semi-idle system).

    And in a semi-idle system, we no longer have periodic ticks on the idle load
    balancer CPU. Using that cpu will add more delays to the timers than intended
    (as that cpu's timer base may not be up to date wrt jiffies etc). This was
    causing mysterious slowdowns during boot etc.

    For now, in the semi idle case, use the nearest busy cpu for migrating timers
    from an idle cpu. This is good for power-savings anyway.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Suresh Siddha
    Signed-off-by: Peter Zijlstra
    Cc: Thomas Gleixner
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Venkatesh Pallipadi
     

05 Jun, 2010

1 commit

  • The commit 80b5184cc537718122e036afe7e62d202b70d077 ("kernel/: convert cpu
    notifier to return encapsulate errno value") changed the return value of
    cpu notifier callbacks.

    Those callbacks don't return NOTIFY_BAD on failures anymore. But there
    are a few callbacks which are called directly at init time, and their
    callers check the return value.

    I forgot to change the BUG_ON checks in the direct callers in that commit.

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

29 May, 2010

1 commit


28 May, 2010

1 commit


26 May, 2010

2 commits

  • Fix nit-picking coding style detail.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • commit f00e047ef (timers: Fix slack calculation for expired timers)
    fixed the issue of slack on expired timers only partially. Linus
    noticed that jiffies is volatile so it is reloaded twice, which
    generates bad code.

    But it's worse. This can defeat the time_after() check if jiffies are
    incremented between time_after() and the slack calculation.

    Fix it by reading jiffies into a local variable, which prevents the
    compiler from loading it twice. While at it make the > -1 check into
    >= 0 which is easier to read.
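
    The snapshot pattern can be sketched in userspace as follows. The names
    ticks, tick_after() and ticks_until() are hypothetical stand-ins for
    jiffies, time_after() and the slack code, and the signed cast relies on
    the usual two's complement conversion. The point is that the counter is
    read exactly once, so the comparison and the subtraction always see the
    same value.

```c
#include <stdbool.h>

/* Stand-in for the tick counter; volatile forces a reload on every read. */
static volatile unsigned long ticks;

/* Wrap-safe "a is after b", in the style of the kernel's time_after(). */
static bool tick_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/*
 * Ticks remaining until 'expires', or 0 if it has already expired.
 * Reading 'ticks' into a local means the check and the arithmetic cannot
 * disagree even if the counter advances in between.
 */
static unsigned long ticks_until(unsigned long expires)
{
    unsigned long now = ticks;   /* single snapshot */

    if (!tick_after(expires, now))
        return 0;
    return expires - now;
}
```

    Had ticks been read twice, an increment between the two reads could let
    the check pass and the subtraction still underflow.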

    Signed-off-by: Thomas Gleixner
    Cc: Arjan van de Ven
    Cc: Linus Torvalds

    Thomas Gleixner
     

24 May, 2010

1 commit

  • commit 3bbb9ec946 (timers: Introduce the concept of timer slack for
    legacy timers) does not take the case into account when the timer is
    already expired. This broke wireless drivers.

    The solution is not to apply slack to already expired timers.

    Signed-off-by: Thomas Gleixner
    Cc: Arjan van de Ven

    Jeff Chua
     

13 May, 2010

1 commit

  • Just some code cleanup to make touch_softlockup clearer and remove the
    softlockup_tick function as it is no longer needed.

    Also remove the /proc softlockup_thres call as it has been changed to
    watchdog_thres.

    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Cyrill Gorcunov
    Cc: Eric Paris
    Cc: Randy Dunlap
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Don Zickus
     

10 May, 2010

1 commit


07 Apr, 2010

1 commit

  • While HR timers have had the concept of timer slack for quite some time
    now, the legacy timers lacked this concept, and had to make do with
    round_jiffies() and friends.

    Timer slack is important for power management; grouping timers reduces the
    number of wakeups which in turn reduces power consumption.

    This patch introduces timer slack to the legacy timers using the following
    pieces:
    * A slack field in the timer struct
    * An api (set_timer_slack) that callers can use to set explicit timer slack
    * A default slack of 0.4% of the requested delay for callers that do not set
    any explicit slack
    * Rounding code that is part of mod_timer() that tries to
    group timers around jiffies values every 'power of two'
    (so quick timers will group around every 2, but longer timers
    will group around every 4, 8, 16, 32 etc)
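
    The default slack and the power-of-two rounding listed above can be
    sketched in plain C. This is a userspace approximation, not the code in
    kernel/timer.c: apply_slack_sketch() and highest_bit() are hypothetical
    names, 'now' is passed in rather than read from jiffies, and a negative
    slack is assumed to mean "no explicit slack". The default uses
    delay / 256, which is roughly 0.4%.

```c
/* Index of the highest set bit; 'mask' must be non-zero. */
static int highest_bit(unsigned long mask)
{
    int bit = 0;

    while (mask >>= 1)
        bit++;
    return bit;
}

/*
 * Return a possibly later expiry that lies within the allowed slack and has
 * as many trailing zero bits as possible, so that timers with similar
 * expiries land on the same tick.
 */
static unsigned long apply_slack_sketch(unsigned long expires,
                                        unsigned long now, long slack)
{
    unsigned long limit = expires;

    if (slack >= 0)
        limit = expires + (unsigned long)slack;       /* explicit slack */
    else if ((long)(now - expires) < 0)               /* not yet expired */
        limit = expires + (expires - now) / 256;      /* about 0.4% */

    unsigned long diff = expires ^ limit;
    if (diff == 0)
        return expires;                               /* no room to move */

    /* Clear every bit below the highest bit in which the two values differ. */
    unsigned long low = (1UL << highest_bit(diff)) - 1;
    return limit & ~low;
}
```

    For example, with now = 0 a timer asked for at tick 100000 is moved to
    100352, while one asked for at tick 1000 only moves to 1002. An already
    expired timer is returned unchanged.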

    Signed-off-by: Arjan van de Ven
    Cc: johnstul@us.ibm.com
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Arjan van de Ven
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

13 Mar, 2010

4 commits

  • If a timer callback leaks preempt_count we currently assert a
    BUG(). That makes it unnecessarily hard to retrieve information about
    the problem especially on laptops and headless stations.

    There is a decent chance to survive the preempt_count leak by
    restoring the preempt_count to the value before the callback. That
    allows in many cases to get valuable information about the root cause
    of the problem.

    We carried that fixup in preempt-rt for years and were able to decode
    such wreckage quite a few times.
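
    The check-and-restore idea can be modelled in a few lines. This is a
    sketch, not the kernel code: preempt_count_model is a plain integer
    standing in for the real preempt count, and call_timer_fn_sketch() and
    the two callbacks are hypothetical names.

```c
#include <stdio.h>

static int preempt_count_model;   /* stand-in for the real preempt count */

typedef void (*timer_fn)(unsigned long);

/*
 * Run a timer callback and verify that it left the count as it found it.
 * On a leak, report it and restore the old value instead of giving up.
 * Returns 1 if a leak was detected and repaired, 0 otherwise.
 */
static int call_timer_fn_sketch(timer_fn fn, unsigned long data)
{
    int before = preempt_count_model;

    fn(data);

    if (preempt_count_model != before) {
        fprintf(stderr, "timer: callback leaked preempt count: %d -> %d\n",
                before, preempt_count_model);
        preempt_count_model = before;   /* try to survive the leak */
        return 1;
    }
    return 0;
}

/* A well-behaved callback: every increment is paired with a decrement. */
static void balanced_cb(unsigned long data)
{
    (void)data;
    preempt_count_model++;
    preempt_count_model--;
}

/* A buggy callback that forgets to drop the count again. */
static void leaky_cb(unsigned long data)
{
    (void)data;
    preempt_count_model++;
}
```

    Restoring the count does not fix the buggy callback; it only keeps the
    system alive long enough for the diagnostic to be read.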

    Signed-off-by: Thomas Gleixner
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Arjan van de Ven

    Thomas Gleixner
     
  • The indent level is starting to be annoying. More white space than
    actual code. Split out the timer function call into its own function.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • A function scheduled with a timer must not exit with a different
    preempt count than it was entered with. To make it easier to help users
    who run into the corresponding BUG(), also print the name of the bad
    function, not only its address.

    [ tglx: Sanitized printk ]

    Signed-off-by: Uwe Kleine-König
    Cc: johnstul@us.ibm.com
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Uwe Kleine-König
     
  • These functions forgot to run timer_stats_timer_clear_start_info(). It's
    unobvious what effect this has and whether it matters much - we won't be
    printing it out anyway if the timer's detached.

    Untested, just an Ingo trollpatch.

    [ Nevertheless correct - tglx ]

    Signed-off-by: Andrew Morton
    Cc: Ingo Molnar
    Cc: johnstul@us.ibm.com
    Signed-off-by: Thomas Gleixner

    Andrew Morton
     

21 Jan, 2010

1 commit

  • Paul questioned the context in which we should call
    perf_event_do_pending(). After looking at that I found that it should be
    called from IRQ context these days, however the fallback call-site is
    placed in softirq context. Amend this by placing the callback in the IRQ
    timer path.

    Reported-by: Paul Mackerras
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

17 Dec, 2009

1 commit


24 Sep, 2009

1 commit


21 Sep, 2009

1 commit

  • Bye-bye Performance Counters, welcome Performance Events!

    In the past few months the perfcounters subsystem has outgrown its
    initial role of counting hardware events, and has become (and is
    becoming) a much broader generic event enumeration, reporting, logging,
    monitoring, analysis facility.

    Naming its core object 'perf_counter' and naming the subsystem
    'perfcounters' has become more and more of a misnomer. With pending
    code like hw-breakpoints support the 'counter' name is less and
    less appropriate.

    All in all, we've decided to rename the subsystem to 'performance
    events' and to propagate this rename through all fields, variables
    and API names. (in an ABI compatible fashion)

    The word 'event' is also a bit shorter than 'counter' - which makes
    it slightly more convenient to write/handle as well.

    Thanks go to Stephane Eranian who first observed this misnomer and
    suggested a rename.

    User-space tooling and ABI compatibility is not affected - this patch
    should be function-invariant. (Also, defconfigs were not touched to
    keep the size down.)

    This patch has been generated via the following script:

    FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

    sed -i \
      -e 's/PERF_EVENT_/PERF_RECORD_/g' \
      -e 's/PERF_COUNTER/PERF_EVENT/g' \
      -e 's/perf_counter/perf_event/g' \
      -e 's/nb_counters/nb_events/g' \
      -e 's/swcounter/swevent/g' \
      -e 's/tpcounter_event/tp_event/g' \
      $FILES

    for N in $(find . -name perf_counter.[ch]); do
      M=$(echo $N | sed 's/perf_counter/perf_event/g')
      mv $N $M
    done

    FILES=$(find . -name perf_event.*)

    sed -i \
      -e 's/COUNTER_MASK/REG_MASK/g' \
      -e 's/COUNTER/EVENT/g' \
      -e 's/\<event\>/event_id/g' \
      -e 's/counter/event/g' \
      -e 's/Counter/Event/g' \
      $FILES

    ... to keep it as correct as possible. This script can also be
    used by anyone who has pending perfcounters patches - it converts
    a Linux kernel tree over to the new naming. We tried to time this
    change to the point in time where the amount of pending patches
    is the smallest: the end of the merge window.

    Namespace clashes were fixed up in a preparatory patch - and some
    stylistic fallout will be fixed up in a subsequent patch.

    ( NOTE: 'counters' are still the proper terminology when we deal
    with hardware registers - and these sed scripts are a bit
    over-eager in renaming them. I've undone some of that, but
    in case there's something left where 'counter' would be
    better than 'event' we can undo that on an individual basis
    instead of touching an otherwise nicely automated patch. )

    Suggested-by: Stephane Eranian
    Acked-by: Peter Zijlstra
    Acked-by: Paul Mackerras
    Reviewed-by: Arjan van de Ven
    Cc: Mike Galbraith
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Benjamin Herrenschmidt
    Cc: David Howells
    Cc: Kyle McMartin
    Cc: Martin Schwidefsky
    Cc: "David S. Miller"
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

19 Sep, 2009

1 commit

  • * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (34 commits)
    time: Prevent 32 bit overflow with set_normalized_timespec()
    clocksource: Delay clocksource down rating to late boot
    clocksource: clocksource_select must be called with mutex locked
    clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash
    timers: Drop a function prototype
    clocksource: Resolve cpu hotplug dead lock with TSC unstable
    timer.c: Fix S/390 comments
    timekeeping: Fix invalid getboottime() value
    timekeeping: Fix up read_persistent_clock() breakage on sh
    timekeeping: Increase granularity of read_persistent_clock(), build fix
    time: Introduce CLOCK_REALTIME_COARSE
    x86: Do not unregister PIT clocksource on PIT oneshot setup/shutdown
    clocksource: Avoid clocksource watchdog circular locking dependency
    clocksource: Protect the watchdog rating changes with clocksource_mutex
    clocksource: Call clocksource_change_rating() outside of watchdog_lock
    timekeeping: Introduce read_boot_clock
    timekeeping: Increase granularity of read_persistent_clock()
    timekeeping: Update clocksource with stop_machine
    timekeeping: Add timekeeper read_clock helper functions
    timekeeping: Move NTP adjusted clock multiplier to struct timekeeper
    ...

    Fix trivial conflict due to MIPS lemote -> loongson renaming.

    Linus Torvalds
     

29 Aug, 2009

1 commit

  • Add tracepoints which cover the timer life cycle. The tracepoints are
    integrated with the already existing debug_object debug points as far
    as possible.

    Based on patches from
    Mathieu: http://marc.info/?l=linux-kernel&m=123791201816247&w=2
    and
    Anton: http://marc.info/?l=linux-kernel&m=124331396919301&w=2

    [ tglx: Fixed timeout value in timer_start tracepoint, massaged
    comments and made the printk's more readable ]

    Signed-off-by: Xiao Guangrong
    Cc: Anton Blanchard
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Mathieu Desnoyers
    Cc: Peter Zijlstra
    Cc: KOSAKI Motohiro
    Cc: Zhaolei
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Xiao Guangrong
     

26 Aug, 2009

1 commit


23 Aug, 2009

1 commit

  • All calls from outside RCU are of the form:

    if (rcu_pending(cpu))
        rcu_check_callbacks(cpu, user);

    This is silly; instead we put a call to rcu_pending() in
    rcu_check_callbacks(), and then make the outside calls be to
    rcu_check_callbacks(). This cuts down on the code a bit and
    also gives the compiler a better chance of optimizing.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josht@linux.vnet.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

14 Aug, 2009

1 commit


05 Aug, 2009

1 commit

  • Each time a cpu goes to sleep on a NOHZ=y system the timer
    wheel is searched for the next timer interrupt. It can take
    quite a few cycles to find the next pending timer.

    This patch adds a field to tvec_base that caches the result of
    __next_timer_interrupt.

    The hit ratio is around 80% on my thinkpad under normal use, on
    a server I've seen hit ratios from 5% to 95% dependent on the
    workload.
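
    The caching idea can be sketched with a simplified model. This is an
    illustration only: timer_base_model is a flat array rather than the timer
    wheel, all names are hypothetical, expiries are compared without
    handling counter wrap, and timer removal (which would have to invalidate
    the cache) is left out.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TIMERS 16

struct timer_base_model {
    unsigned long expires[MAX_TIMERS];
    size_t count;
    unsigned long cached_next;   /* cached result of the slow search */
    bool cache_valid;
    unsigned long searches;      /* how many slow searches were needed */
};

/* Slow path: scan every pending timer. Stands in for the wheel search. */
static unsigned long search_next(struct timer_base_model *b)
{
    unsigned long next = ULONG_MAX;   /* "no timer pending" */

    b->searches++;
    for (size_t i = 0; i < b->count; i++)
        if (b->expires[i] < next)
            next = b->expires[i];
    return next;
}

/* Adding a timer keeps the cache correct by lowering it when needed. */
static bool add_timer_model(struct timer_base_model *b, unsigned long expires)
{
    if (b->count == MAX_TIMERS)
        return false;
    b->expires[b->count++] = expires;
    if (b->cache_valid && expires < b->cached_next)
        b->cached_next = expires;
    return true;
}

/* Fast path: reuse the cached value whenever it is still valid. */
static unsigned long next_timer_model(struct timer_base_model *b)
{
    if (!b->cache_valid) {
        b->cached_next = search_next(b);
        b->cache_valid = true;
    }
    return b->cached_next;
}
```

    In this model, repeated queries between timer changes cost a single
    comparison, which is the effect the hit ratios above measure.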

    -v2: jiffies wrap fixes

    Signed-off-by: Martin Schwidefsky
    Acked-by: Thomas Gleixner
    Cc: john stultz
    Cc: Venki Pallipadi
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Martin Schwidefsky
     

19 Jul, 2009

1 commit


24 Jun, 2009

1 commit

  • When the kernel is configured with CONFIG_TIMER_STATS but timer
    stats are runtime disabled we still get calls to
    __timer_stats_timer_set_start_info which initializes some
    fields in the corresponding struct timer_list.

    So add some quick checks in the timer stats setup functions
    to avoid function calls to __timer_stats_timer_set_start_info
    when timer stats are disabled.

    In an artificial workload that does nothing but play ping
    pong with a single tcp packet via loopback this decreases cpu
    consumption by 1 - 1.5%.

    This is part of a modified function trace output on SLES11:

    perl-2497 [00] 28630647177732388 [+ 125]: sk_reset_timer
    Cc: Andrew Morton
    Cc: Martin Schwidefsky
    Cc: Mustafa Mesanovic
    Cc: Arjan van de Ven
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Heiko Carstens
     

16 Jun, 2009

1 commit

  • …kernel/git/tip/linux-2.6-tip

    * 'timers-for-linus-migration' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    timers: Logic to move non pinned timers
    timers: /proc/sys sysctl hook to enable timer migration
    timers: Identifying the existing pinned timers
    timers: Framework for identifying pinned timers
    timers: allow deferrable timers for intervals tv2-tv5 to be deferred

    Fix up conflicts in kernel/sched.c and kernel/timer.c manually

    Linus Torvalds
     

12 Jun, 2009

1 commit