21 Jan, 2010

1 commit

  • Paul questioned the context in which we should call
    perf_event_do_pending(). After looking at that I found that it should be
    called from IRQ context these days, however the fallback call-site is
    placed in softirq context. Amend this by placing the callback in the IRQ
    timer path.

    Reported-by: Paul Mackerras
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

17 Dec, 2009

1 commit


24 Sep, 2009

1 commit


21 Sep, 2009

1 commit

  • Bye-bye Performance Counters, welcome Performance Events!

    In the past few months the perfcounters subsystem has grown out of its
    initial role of counting hardware events, and has become (and is
    becoming) a much broader generic event enumeration, reporting, logging,
    monitoring, analysis facility.

    Naming its core object 'perf_counter' and naming the subsystem
    'perfcounters' has become more and more of a misnomer. With pending
    code like hw-breakpoints support the 'counter' name is less and
    less appropriate.

    All in all, we've decided to rename the subsystem to 'performance
    events' and to propagate this rename through all fields, variables
    and API names. (in an ABI compatible fashion)

    The word 'event' is also a bit shorter than 'counter' - which makes
    it slightly more convenient to write/handle as well.

    Thanks go to Stephane Eranian who first observed this misnomer and
    suggested a rename.

    User-space tooling and ABI compatibility are not affected - this patch
    should be function-invariant. (Also, defconfigs were not touched to
    keep the size down.)

    This patch has been generated via the following script:

    FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

    sed -i \
    -e 's/PERF_EVENT_/PERF_RECORD_/g' \
    -e 's/PERF_COUNTER/PERF_EVENT/g' \
    -e 's/perf_counter/perf_event/g' \
    -e 's/nb_counters/nb_events/g' \
    -e 's/swcounter/swevent/g' \
    -e 's/tpcounter_event/tp_event/g' \
    $FILES

    for N in $(find . -name perf_counter.[ch]); do
      M=$(echo $N | sed 's/perf_counter/perf_event/g')
      mv $N $M
    done

    FILES=$(find . -name perf_event.*)

    sed -i \
    -e 's/COUNTER_MASK/REG_MASK/g' \
    -e 's/COUNTER/EVENT/g' \
    -e 's/\<event\>/event_id/g' \
    -e 's/counter/event/g' \
    -e 's/Counter/Event/g' \
    $FILES

    ... to keep it as correct as possible. This script can also be
    used by anyone who has pending perfcounters patches - it converts
    a Linux kernel tree over to the new naming. We tried to time this
    change to the point in time where the number of pending patches
    is the smallest: the end of the merge window.

    Namespace clashes were fixed up in a preparatory patch - and some
    stylistic fallout will be fixed up in a subsequent patch.

    ( NOTE: 'counters' are still the proper terminology when we deal
    with hardware registers - and these sed scripts are a bit
    over-eager in renaming them. I've undone some of that, but
    in case there's something left where 'counter' would be
    better than 'event' we can undo that on an individual basis
    instead of touching an otherwise nicely automated patch. )

    Suggested-by: Stephane Eranian
    Acked-by: Peter Zijlstra
    Acked-by: Paul Mackerras
    Reviewed-by: Arjan van de Ven
    Cc: Mike Galbraith
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Benjamin Herrenschmidt
    Cc: David Howells
    Cc: Kyle McMartin
    Cc: Martin Schwidefsky
    Cc: "David S. Miller"
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

19 Sep, 2009

1 commit

  • * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (34 commits)
    time: Prevent 32 bit overflow with set_normalized_timespec()
    clocksource: Delay clocksource down rating to late boot
    clocksource: clocksource_select must be called with mutex locked
    clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash
    timers: Drop a function prototype
    clocksource: Resolve cpu hotplug dead lock with TSC unstable
    timer.c: Fix S/390 comments
    timekeeping: Fix invalid getboottime() value
    timekeeping: Fix up read_persistent_clock() breakage on sh
    timekeeping: Increase granularity of read_persistent_clock(), build fix
    time: Introduce CLOCK_REALTIME_COARSE
    x86: Do not unregister PIT clocksource on PIT oneshot setup/shutdown
    clocksource: Avoid clocksource watchdog circular locking dependency
    clocksource: Protect the watchdog rating changes with clocksource_mutex
    clocksource: Call clocksource_change_rating() outside of watchdog_lock
    timekeeping: Introduce read_boot_clock
    timekeeping: Increase granularity of read_persistent_clock()
    timekeeping: Update clocksource with stop_machine
    timekeeping: Add timekeeper read_clock helper functions
    timekeeping: Move NTP adjusted clock multiplier to struct timekeeper
    ...

    Fix trivial conflict due to MIPS lemote -> loongson renaming.

    Linus Torvalds
     

29 Aug, 2009

1 commit

  • Add tracepoints which cover the timer life cycle. The tracepoints are
    integrated with the already existing debug_object debug points as far
    as possible.

    Based on patches from
    Mathieu: http://marc.info/?l=linux-kernel&m=123791201816247&w=2
    and
    Anton: http://marc.info/?l=linux-kernel&m=124331396919301&w=2

    [ tglx: Fixed timeout value in timer_start tracepoint, massaged
    comments and made the printk's more readable ]

    Signed-off-by: Xiao Guangrong
    Cc: Anton Blanchard
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Mathieu Desnoyers
    Cc: Peter Zijlstra
    Cc: KOSAKI Motohiro
    Cc: Zhaolei
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Xiao Guangrong
     

26 Aug, 2009

1 commit


23 Aug, 2009

1 commit

  • All calls from outside RCU are of the form:

    if (rcu_pending(cpu))
            rcu_check_callbacks(cpu, user);

    This is silly. Instead, we put a call to rcu_pending() in
    rcu_check_callbacks(), and then make the outside calls go straight
    to rcu_check_callbacks(). This cuts down on the code a bit and
    also gives the compiler a better chance of optimizing.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josht@linux.vnet.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
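
The refactoring described in the commit above can be illustrated with a small user-space sketch. The per-CPU arrays and the `tick()` caller are hypothetical stand-ins, not the kernel's RCU state; only the function names `rcu_pending()` and `rcu_check_callbacks()` come from the commit message.

```c
#include <stdbool.h>

#define TOY_NR_CPUS 4

/* Hypothetical stand-ins for per-CPU RCU state (not the kernel's). */
static bool pending_work[TOY_NR_CPUS];
static int  callbacks_run[TOY_NR_CPUS];

/* Stub: does this CPU have RCU work to do? */
static bool rcu_pending(int cpu)
{
    return pending_work[cpu];
}

/* After the change, the pending test lives inside the callee. */
static void rcu_check_callbacks(int cpu, int user)
{
    (void)user;                 /* unused in this sketch */
    if (!rcu_pending(cpu))
        return;                 /* cheap early exit, nothing to do */
    pending_work[cpu] = false;
    callbacks_run[cpu]++;
}

/* An outside caller no longer needs its own rcu_pending() test. */
static void tick(int cpu, int user)
{
    rcu_check_callbacks(cpu, user);
}
```

With the test folded into the callee, every call site shrinks to one line and cannot forget the check.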
     

14 Aug, 2009

1 commit


05 Aug, 2009

1 commit

  • Each time a cpu goes to sleep on a NOHZ=y system the timer
    wheel is searched for the next timer interrupt. It can take
    quite a few cycles to find the next pending timer.

    This patch adds a field to tvec_base that caches the result of
    __next_timer_interrupt.

    The hit ratio is around 80% on my thinkpad under normal use; on
    a server I've seen hit ratios from 5% to 95% depending on the
    workload.

    -v2: jiffies wrap fixes

    Signed-off-by: Martin Schwidefsky
    Acked-by: Thomas Gleixner
    Cc: john stultz
    Cc: Venki Pallipadi
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Martin Schwidefsky
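
The caching idea in the commit above can be sketched in user-space C. This is a much simplified model under stated assumptions: `struct toy_base`, its flat array of expiry times and the `scans` counter are hypothetical and stand in for the real timer wheel, and timer removal (which would have to invalidate the cache) is not modelled.

```c
#include <stddef.h>

#define MAX_TIMERS 8

/* Hypothetical, simplified stand-in for the kernel's tvec_base. */
struct toy_base {
    unsigned long now;                  /* current time in ticks */
    unsigned long expires[MAX_TIMERS];  /* pending expiry times */
    size_t        count;
    unsigned long next_timer;           /* cached earliest expiry */
    int           cache_valid;
    unsigned long scans;                /* how often the slow path ran */
};

/* Slow path: walk every pending timer to find the earliest one. */
static unsigned long scan_next_timer(struct toy_base *b)
{
    unsigned long best = b->now + 1000; /* arbitrary far-future bound */

    b->scans++;
    for (size_t i = 0; i < b->count; i++)
        if ((long)(b->expires[i] - best) < 0)   /* wrap-safe compare */
            best = b->expires[i];
    return best;
}

/* Adding a timer keeps the cache correct instead of discarding it. */
static void toy_add_timer(struct toy_base *b, unsigned long expires)
{
    if (b->count == MAX_TIMERS)
        return;
    b->expires[b->count++] = expires;
    if (b->cache_valid && (long)(expires - b->next_timer) < 0)
        b->next_timer = expires;
}

/* Fast path: reuse the cached result when it is still valid. */
static unsigned long toy_next_timer_interrupt(struct toy_base *b)
{
    if (!b->cache_valid) {
        b->next_timer = scan_next_timer(b);
        b->cache_valid = 1;
    }
    return b->next_timer;
}
```

The signed-difference comparisons mirror the kind of care the "-v2: jiffies wrap fixes" note hints at: comparing tick counts by subtraction stays correct when the counter wraps.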
     

19 Jul, 2009

1 commit


24 Jun, 2009

1 commit

  • When the kernel is configured with CONFIG_TIMER_STATS but timer
    stats are runtime disabled we still get calls to
    __timer_stats_timer_set_start_info which initializes some
    fields in the corresponding struct timer_list.

    So add some quick checks in the timer stats setup functions
    to avoid function calls to __timer_stats_timer_set_start_info
    when timer stats are disabled.

    In an artificial workload that does nothing but play ping-pong
    with a single tcp packet via loopback this decreases cpu
    consumption by 1 - 1.5%.

    This is part of a modified function trace output on SLES11:

    perl-2497 [00] 28630647177732388 [+ 125]: sk_reset_timer
    Cc: Andrew Morton
    Cc: Martin Schwidefsky
    Cc: Mustafa Mesanovic
    Cc: Arjan van de Ven
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Heiko Carstens
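
The "quick check" pattern from the commit above looks roughly like the following sketch. All `toy_` names, the struct layout and the call counter are hypothetical; the point is only that an inline wrapper tests a runtime flag before paying for the out-of-line call.

```c
#include <stddef.h>

/* Runtime switch; assumed to be off by default in this sketch. */
static int toy_timer_stats_active;

/* Counts how often the expensive out-of-line function runs. */
static unsigned long toy_slow_path_calls;

struct toy_timer {
    void *start_site;   /* who armed the timer */
    int   start_pid;
};

/* Out-of-line worker: fills in the statistics fields. */
static void toy_set_start_info_slow(struct toy_timer *t, void *addr)
{
    toy_slow_path_calls++;
    t->start_site = addr;
    t->start_pid = 1;   /* placeholder value for the sketch */
}

/* Inline wrapper: skips the function call while stats are disabled. */
static inline void toy_set_start_info(struct toy_timer *t, void *addr)
{
    if (!toy_timer_stats_active)
        return;
    toy_set_start_info_slow(t, addr);
}
```

When the flag is off, the hot path costs one load and one branch rather than a function call plus stores to the timer structure.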
     

16 Jun, 2009

1 commit

  • …kernel/git/tip/linux-2.6-tip

    * 'timers-for-linus-migration' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    timers: Logic to move non pinned timers
    timers: /proc/sys sysctl hook to enable timer migration
    timers: Identifying the existing pinned timers
    timers: Framework for identifying pinned timers
    timers: allow deferrable timers for intervals tv2-tv5 to be deferred

    Fix up conflicts in kernel/sched.c and kernel/timer.c manually

    Linus Torvalds
     

12 Jun, 2009

1 commit


11 Jun, 2009

1 commit


29 May, 2009

1 commit


15 May, 2009

2 commits

  • avenrun is a rough estimate so we don't have to worry about
    consistency of the three avenrun values. Remove the xtime lock
    dependency and provide a function to scale the values. Clean up the
    users.

    [ Impact: cleanup ]

    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra

    Thomas Gleixner
     
  • Dimitri Sivanich noticed that xtime_lock is held write locked across
    calc_load() which iterates over all online CPUs. That can cause long
    latencies for xtime_lock readers on large SMP systems.

    The load average calculation is a rough estimate anyway so there is
    no real need to protect the readers vs. the update. It's not a problem
    when the avenrun array is updated while a reader copies the values.

    Instead of iterating over all online CPUs let the scheduler_tick code
    update the number of active tasks shortly before the avenrun update
    happens. The avenrun update itself is handled by the CPU which calls
    do_timer().

    [ Impact: reduce xtime_lock write locked section ]

    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra

    Thomas Gleixner
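
To make the two avenrun commits above more concrete, here is a small fixed-point sketch of a load-average decay step and a lock-free, scaled read. The `toy_` names are hypothetical, and the constants (11 fractional bits, decay factor 1884 for the one-minute average) are the values I recall from the kernel headers, so treat them as assumptions.

```c
#define FSHIFT   11                 /* bits of fixed-point precision */
#define FIXED_1  (1UL << FSHIFT)    /* 1.0 in fixed point */
#define EXP_1    1884UL             /* assumed 1-minute decay factor */

/* The three load averages; readers copy them without taking a lock. */
static unsigned long toy_avenrun[3];

/* One decay step: load = load * exp + active * (1 - exp), fixed point. */
static unsigned long toy_calc_load(unsigned long load, unsigned long exp,
                                   unsigned long active)
{
    load *= exp;
    load += active * (FIXED_1 - exp);
    return load >> FSHIFT;
}

/* Copy out the averages, applying the caller's rounding offset and shift. */
static void toy_get_avenrun(unsigned long *loads, unsigned long offset,
                            int shift)
{
    for (int i = 0; i < 3; i++)
        loads[i] = (toy_avenrun[i] + offset) << shift;
}
```

Because each value is only an estimate, a reader that sees one element updated and another not yet updated is harmless, which is the argument the commits make for dropping the lock.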
     

13 May, 2009

2 commits

  • * Arun R Bharadwaj [2009-04-16 12:11:36]:

    This patch migrates all non pinned timers and hrtimers to the current
    idle load balancer, from all the idle CPUs. Timers firing on busy CPUs
    are not migrated.

    While migrating hrtimers, care should be taken to check if migrating
    a hrtimer would result in a latency or not. So we compare the expiry of the
    hrtimer with the next timer interrupt on the target cpu and migrate the
    hrtimer only if it expires *after* the next interrupt on the target cpu.
    So, added a clockevents_get_next_event() helper function to return the
    next_event on the target cpu's clock_event_device.

    [ tglx: cleanups and simplifications ]

    Signed-off-by: Arun R Bharadwaj
    Signed-off-by: Thomas Gleixner

    Arun R Bharadwaj
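
The migration rule for hrtimers in the commit above reduces to a single comparison. The sketch below assumes nanosecond timestamps and a hypothetical helper name; it is not the kernel's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a timer may move to the target CPU.
 * A pinned timer never moves. An unpinned timer moves only if it expires
 * after the target's next programmed event; otherwise the target would
 * fire too late for it, adding latency.
 */
static bool toy_should_migrate(int64_t timer_expires_ns,
                               int64_t target_next_event_ns,
                               bool pinned)
{
    if (pinned)
        return false;
    return timer_expires_ns > target_next_event_ns;
}
```

For example, a timer expiring at 5 ms is not moved to a CPU whose next event is at 8 ms, while a timer expiring at 12 ms is.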
     
  • * Arun R Bharadwaj [2009-04-16 12:11:36]:

    This patch creates a new framework for identifying cpu-pinned timers
    and hrtimers.

    This framework is needed because pinned timers are expected to fire on
    the same CPU on which they are queued. So it is essential to identify
    these and not migrate them, in case there are any.

    For regular timers, the currently existing add_timer_on() can be used
    to queue pinned timers and subsequently mod_timer_pinned() can be used
    to modify the 'expires' field.

    For hrtimers, new modes HRTIMER_ABS_PINNED and HRTIMER_REL_PINNED are
    added to queue cpu-pinned hrtimers.

    [ tglx: use .._PINNED mode argument instead of creating tons of new
    functions ]

    Signed-off-by: Arun R Bharadwaj
    Signed-off-by: Thomas Gleixner

    Arun R Bharadwaj
     

02 May, 2009

1 commit

  • In the current kernel implementation only kernel timers for time interval
    tv1 are being deferred. This patch allows any timer that is configured as
    deferrable to be deferred regardless of time interval.

    This patch was previously discussed in
    http://marc.info/?l=linux-kernel&m=123196343531966&w=2 and was acked by
    Venki Pallipadi, the author of the original deferrable timer patch.

    Signed-off-by: Jon Hunter
    Acked-by: Venkatesh Pallipadi
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Jon Hunter
     

29 Apr, 2009

1 commit


10 Apr, 2009

1 commit

  • …or-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    printk: fix wrong format string iter for printk
    futex: comment requeue key reference semantics

    * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    irq: fix cpumask memory leak on offstack cpumask kernels

    * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    posix-timers: fix RLIMIT_CPU && setitimer(CPUCLOCK_PROF)
    posix-timers: fix RLIMIT_CPU && fork()
    timers: add missing kernel-doc

    Linus Torvalds
     

06 Apr, 2009

1 commit

  • While going over the wakeup code I noticed delayed wakeups only work
    for hardware counters but basically all software counters rely on
    them.

    This patch unifies and generalizes the delayed wakeup to fix this
    issue.

    Since we're dealing with NMI context bits here, use a cmpxchg() based
    singly linked list implementation to track counters that have pending
    wakeups.

    [ This should really be generic code for delayed wakeups, but since we
    cannot use cmpxchg()/xchg() in generic code, I've let it live in the
    perf_counter code. -- Eric Dumazet could use it to aggregate the
    network wakeups. ]

    Furthermore, the x86 method of using TIF flags was flawed in that it's
    quite possible to end up setting the bit on the idle task, losing the
    wakeup.

    The powerpc method uses per-cpu storage and does appear to be
    sufficient.

    Signed-off-by: Peter Zijlstra
    Acked-by: Paul Mackerras
    Orig-LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
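
A compare-and-swap based singly linked list of the kind the commit above mentions can be sketched with C11 atomics. This is a user-space illustration with hypothetical names; it omits the guard against queueing the same entry twice that real code would need.

```c
#include <stdatomic.h>
#include <stddef.h>

/* One entry with a pending wakeup; 'id' is just for illustration. */
struct pending_entry {
    struct pending_entry *next;
    int id;
};

/* Head of the lock-free list; static storage starts out as NULL. */
static _Atomic(struct pending_entry *) pending_head;

/* Push without locks: retry the compare-and-swap until the head is ours. */
static void pending_push(struct pending_entry *e)
{
    struct pending_entry *old = atomic_load(&pending_head);

    do {
        e->next = old;  /* 'old' is refreshed by a failed exchange */
    } while (!atomic_compare_exchange_weak(&pending_head, &old, e));
}

/* Detach the whole list in one atomic step; the caller then walks it. */
static struct pending_entry *pending_take_all(void)
{
    return atomic_exchange(&pending_head, (struct pending_entry *)NULL);
}
```

A producer that cannot take locks only ever calls `pending_push()`, while the consumer later calls `pending_take_all()` from a safer context and processes the detached entries, newest first.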
     

02 Apr, 2009

1 commit

  • Add missing kernel-doc parameter notation and change function
    name to its new name:

    Warning(kernel/timer.c:543): No description found for parameter 'name'
    Warning(kernel/timer.c:543): No description found for parameter 'key'

    Signed-off-by: Randy Dunlap
    Cc: akpm
    Cc: Johannes Berg
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Randy Dunlap
     

31 Mar, 2009

1 commit

  • * 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (33 commits)
    lockdep: fix deadlock in lockdep_trace_alloc
    lockdep: annotate reclaim context (__GFP_NOFS), fix SLOB
    lockdep: annotate reclaim context (__GFP_NOFS), fix
    lockdep: build fix for !PROVE_LOCKING
    lockstat: warn about disabled lock debugging
    lockdep: use stringify.h
    lockdep: simplify check_prev_add_irq()
    lockdep: get_user_chars() redo
    lockdep: simplify get_user_chars()
    lockdep: add comments to mark_lock_irq()
    lockdep: remove macro usage from mark_held_locks()
    lockdep: fully reduce mark_lock_irq()
    lockdep: merge the !_READ mark_lock_irq() helpers
    lockdep: merge the _READ mark_lock_irq() helpers
    lockdep: simplify mark_lock_irq() helpers #3
    lockdep: further simplify mark_lock_irq() helpers
    lockdep: simplify the mark_lock_irq() helpers
    lockdep: split up mark_lock_irq()
    lockdep: generate usage strings
    lockdep: generate the state bit definitions
    ...

    Linus Torvalds
     

19 Feb, 2009

1 commit

  • Impact: new timer API

    Based on an idea from Martin Josefsson with the help of
    Patrick McHardy and Stephen Hemminger:

    introduce the mod_timer_pending() API which is a mod_timer()
    offspring that is a no-op on already removed timers.

    (regular mod_timer() re-activates non-pending timers.)

    This is useful for the networking code in that it can
    allow unserialized mod_timer_pending() timer-forwarding
    calls, but a single del_timer*() will stop the timer
    from being reactivated again.

    Also while at it:

    - optimize the regular mod_timer() path some more, the
      timer-stat and a debug check were needlessly duplicated
      in __mod_timer().

    - make the exports come straight after the function, as
    most other exports in timer.c already did.

    - eliminate __mod_timer() as an external API, change the
    users to mod_timer().

    The regular mod_timer() code path is not impacted
    significantly, due to inlining optimizations and due to
    the simplifications.

    Based-on-patch-from: Stephen Hemminger
    Acked-by: Stephen Hemminger
    Cc: "David S. Miller"
    Cc: Patrick McHardy
    Cc: netdev@vger.kernel.org
    Cc: Oleg Nesterov
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar

    Ingo Molnar
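
The difference between the two calls in the commit above can be shown with a toy model. The `toy_` functions and the two-field timer are hypothetical; they only mimic the re-arming behaviour the commit message describes.

```c
#include <stdbool.h>

/* Minimal timer model: armed or not, plus an expiry time. */
struct toy_timer {
    bool          pending;
    unsigned long expires;
};

/* Like mod_timer(): always (re)arms; returns 1 if it was pending. */
static int toy_mod_timer(struct toy_timer *t, unsigned long expires)
{
    int was_pending = t->pending;

    t->expires = expires;
    t->pending = true;
    return was_pending;
}

/* Like mod_timer_pending(): only touches a timer that is still armed. */
static int toy_mod_timer_pending(struct toy_timer *t, unsigned long expires)
{
    if (!t->pending)
        return 0;       /* deleted timers stay deleted */
    t->expires = expires;
    return 1;
}

/* Like del_timer(): disarms; returns 1 if it was pending. */
static int toy_del_timer(struct toy_timer *t)
{
    int was_pending = t->pending;

    t->pending = false;
    return was_pending;
}
```

After a single `toy_del_timer()`, any number of racing `toy_mod_timer_pending()` calls leave the timer stopped, which is the property the networking code relies on.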
     

15 Feb, 2009

1 commit


14 Jan, 2009

4 commits


31 Dec, 2008

2 commits

  • The cpu time spent by the idle process actually doing something is
    currently accounted as idle time. This is plain wrong; the architectures
    that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the
    time spent doing nothing and the time spent by idle doing work. The first
    is accounted with account_idle_time and the second with account_system_time.
    The architectures that use the account_xxx_time interface directly and not
    the account_xxx_ticks interface now need to do the check for the idle
    process in their arch code. In particular to improve the system vs true
    idle time accounting the arch code needs to measure the true idle time
    instead of just testing for the idle process.
    To improve the tick based accounting as well we would need an architecture
    primitive that can tell us if the pt_regs of the interrupted context
    points to the magic instruction that halts the cpu.

    In addition, idle time is no longer added to the stime of the idle process.
    This field now contains the system time of the idle process as it should
    be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as
    every tick that occurs while idle is running will be accounted as idle
    time.

    This patch contains the necessary common code changes to be able to
    distinguish idle system time and true idle time. The architectures with
    support for VIRT_CPU_ACCOUNTING need some changes to exploit this.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • The utimescaled / stimescaled fields in the task structure and the
    global cpustat should be set on all architectures. On s390 the calls
    to account_user_time_scaled and account_system_time_scaled have never
    been added. In addition system time that is accounted as guest time
    to the user time of a process is accounted to the scaled system time
    instead of the scaled user time.
    To fix the bugs and to prevent future forgetfulness this patch merges
    account_system_time_scaled into account_system_time and
    account_user_time_scaled into account_user_time.

    Cc: Benjamin Herrenschmidt
    Cc: Hidetoshi Seto
    Cc: Tony Luck
    Cc: Jeremy Fitzhardinge
    Cc: Chris Wright
    Cc: Michael Neuling
    Acked-by: Paul Mackerras
    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     

14 Nov, 2008

2 commits

  • Conflicts:
    security/keys/internal.h
    security/keys/process_keys.c
    security/keys/request_key.c

    Fixed conflicts above by using the non 'tsk' versions.

    Signed-off-by: James Morris

    James Morris
     
  • Wrap access to task credentials so that they can be separated more easily from
    the task_struct during the introduction of COW creds.

    Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

    Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
    sense to use RCU directly rather than a convenient wrapper; these will be
    addressed by later patches.

    Signed-off-by: David Howells
    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Cc: Al Viro
    Cc: linux-audit@redhat.com
    Cc: containers@lists.linux-foundation.org
    Cc: linux-mm@kvack.org
    Signed-off-by: James Morris

    David Howells
     

06 Nov, 2008

1 commit

  • This patch (as1158b) adds round_jiffies_up() and friends. These
    routines work like the analogous round_jiffies() functions, except
    that they will never round down.

    The new routines will be useful for timeouts where we don't care
    exactly when the timer expires, provided it doesn't expire too soon.

    Signed-off-by: Alan Stern
    Signed-off-by: Jens Axboe

    Alan Stern
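
The difference between rounding that may go down and rounding that never does can be sketched as follows. This is a simplified model, assuming 100 ticks per second and rounding to whole seconds; the `toy_` names are hypothetical and any per-CPU skew or "already in the past" handling is left out.

```c
#include <stdbool.h>

#define TOY_HZ 100UL    /* assumed ticks per second for this sketch */

/*
 * Round a tick count to a whole-second boundary.
 * Without force_up, values in the first quarter of a second round down.
 * With force_up, the result is never earlier than the input.
 */
static unsigned long toy_round_jiffies(unsigned long j, bool force_up)
{
    unsigned long rem = j % TOY_HZ;

    if (rem < TOY_HZ / 4 && !force_up)
        return j - rem;             /* round down */
    return j - rem + TOY_HZ;        /* round up */
}
```

For a timeout that must not fire early, the `force_up` variant is the safe choice: 1010 becomes 1100 rather than 1000.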
     

20 Oct, 2008

1 commit


21 Aug, 2008

1 commit

  • Add a comment to explain why the double lock in migrate_timers()
    can't deadlock.

    Change the code to use spin_lock_irq() instead of local_irq_disable()
    + spin_lock().

    Signed-off-by: Oleg Nesterov
    Acked-by: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     

11 Aug, 2008

1 commit