25 Jan, 2014

1 commit


21 Jan, 2014

1 commit

  • Pull timer changes from Ingo Molnar:
    - ARM clocksource/clockevent improvements and fixes
    - generic timekeeping updates: TAI fixes/improvements, cleanups
    - Posix cpu timer cleanups and improvements
    - dynticks updates: full dynticks bugfixes, optimizations and cleanups

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
    clocksource: Timer-sun5i: Switch to sched_clock_register()
    timekeeping: Remove comment that's mostly out of date
    rtc-cmos: Add an alarm disable quirk
    timekeeper: fix comment typo for tk_setup_internals()
    timekeeping: Fix missing timekeeping_update in suspend path
    timekeeping: Fix CLOCK_TAI timer/nanosleep delays
    tick/timekeeping: Call update_wall_time outside the jiffies lock
    timekeeping: Avoid possible deadlock from clock_was_set_delayed
    timekeeping: Fix potential lost pv notification of time change
    timekeeping: Fix lost updates to tai adjustment
    clocksource: sh_cmt: Add clk_prepare/unprepare support
    clocksource: bcm_kona_timer: Remove unused bcm_timer_ids
    clocksource: vt8500: Remove deprecated IRQF_DISABLED
    clocksource: tegra: Remove deprecated IRQF_DISABLED
    clocksource: misc drivers: Remove deprecated IRQF_DISABLED
    clocksource: sh_mtu2: Remove unnecessary platform_set_drvdata()
    clocksource: sh_tmu: Remove unnecessary platform_set_drvdata()
    clocksource: armada-370-xp: Enable timer divider only when needed
    clocksource: clksrc-of: Warn if no clock sources are found
    clocksource: orion: Switch to sched_clock_register()
    ...

    Linus Torvalds
     

16 Jan, 2014

3 commits

  • In the kernel, code conventionally starts with a tab instead of seven spaces.

    Signed-off-by: Alex Shi
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Alex Shi
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: John Stultz
    Cc: Kevin Hilman
    Link: http://lkml.kernel.org/r/1386074112-30754-2-git-send-email-alex.shi@linaro.org
    Signed-off-by: Frederic Weisbecker

    Alex Shi
     
  • We don't need to fetch the timekeeping max deferment under the
    jiffies_lock seqlock.

    If the clocksource is updated concurrently while we stop the tick,
    stop_machine() is called and the tick will be re-evaluated along with
    up-to-date jiffies and its related values (see the sketch below).

    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Alex Shi
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: John Stultz
    Cc: Kevin Hilman
    Link: http://lkml.kernel.org/r/1387320692-28460-9-git-send-email-fweisbec@gmail.com
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
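
    A minimal sketch of the resulting pattern, assuming the names used in
    kernel/time/tick-sched.c of that era (the exact surrounding code may
    differ):

        u64 max_deferment;
        unsigned long seq, last_jiffies;
        ktime_t last_update;

        /* Snapshot jiffies state under the seqlock retry loop. */
        do {
                seq = read_seqbegin(&jiffies_lock);
                last_update = last_jiffies_update;
                last_jiffies = jiffies;
        } while (read_seqretry(&jiffies_lock, seq));

        /*
         * Moved outside the loop: a concurrent clocksource update goes
         * through stop_machine(), which forces the tick to be
         * re-evaluated with fresh values anyway.
         */
        max_deferment = timekeeping_max_deferment();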
     
  • This makes the code more symmetric with the existing tick functions
    called on irq exit: tick_irq_exit() and tick_nohz_irq_exit().

    These functions are also symmetric in that they mirror each other's
    actions: we start to account idle time on irq exit and we stop this
    accounting on irq entry. Also the tick is stopped on irq exit and
    timekeeping catches up with the tickless time elapsed until we reach
    irq entry (see the sketch below).

    This rename was suggested by Peter Zijlstra a long while ago but it
    got lost in the shuffle.

    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Alex Shi
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: John Stultz
    Cc: Kevin Hilman
    Link: http://lkml.kernel.org/r/1387320692-28460-2-git-send-email-fweisbec@gmail.com
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
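
    A condensed, hedged view of the symmetry this rename produces, based
    on the kernel/softirq.c callers of that era (simplified; details may
    differ):

        void irq_enter(void)
        {
                rcu_irq_enter();
                if (is_idle_task(current) && !in_interrupt()) {
                        /*
                         * Catch up timekeeping with the tickless time
                         * elapsed while idle; was tick_check_idle().
                         */
                        local_bh_disable();
                        tick_irq_enter();
                        local_bh_enable();
                }
                __irq_enter();
        }

        static inline void tick_irq_exit(void)
        {
                int cpu = smp_processor_id();

                /* Restart idle accounting, maybe stop the tick. */
                if ((idle_cpu(cpu) && !need_resched()) ||
                    tick_nohz_full_cpu(cpu)) {
                        if (!in_interrupt())
                                tick_nohz_irq_exit();
                }
        }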
     

13 Jan, 2014

1 commit

  • In order to avoid the runtime conditional and variable load, turn
    sched_clock_stable into a static_key (see the sketch below).

    Also provide a shorter implementation of local_clock() and
    cpu_clock(int) when sched_clock_stable==1.

                            MAINLINE       PRE      POST

    sched_clock_stable:            1         1         1
    (cold) sched_clock:       329841    221876    215295
    (cold) local_clock:       301773    234692    220773
    (warm) sched_clock:        38375     25602     25659
    (warm) local_clock:       100371     33265     27242
    (warm) rdtsc:              27340     24214     24208
    sched_clock_stable:            0         0         0
    (cold) sched_clock:       382634    235941    237019
    (cold) local_clock:       396890    297017    294819
    (warm) sched_clock:        38194     25233     25609
    (warm) local_clock:       143452     71234     71232
    (warm) rdtsc:              27345     24245     24243

    Signed-off-by: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/n/tip-eummbdechzz37mwmpags1gjr@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
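
    A minimal sketch of the static_key technique, using the jump-label
    API of that era; the key name and setter below are hypothetical, and
    the actual patch may invert the key's sense so the stable case is
    the straight-line fast path:

        #include <linux/jump_label.h>

        /* Key disabled (the common case) means the clock is stable. */
        static struct static_key __sched_clock_unstable = STATIC_KEY_INIT_FALSE;

        int sched_clock_stable(void)
        {
                /* Patched branch: no memory load on the likely path. */
                return !static_key_false(&__sched_clock_unstable);
        }

        void set_sched_clock_unstable(void)
        {
                static_key_slow_inc(&__sched_clock_unstable);
        }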
     

12 Jan, 2014

2 commits


24 Dec, 2013

1 commit

  • Since the xtime lock was split into the timekeeping lock and
    the jiffies lock, we no longer need to call update_wall_time()
    while holding the jiffies lock.

    Thus, this patch splits update_wall_time() out from do_timer()
    (see the sketch below).

    This allows us to get away from calling clock_was_set_delayed()
    in update_wall_time() and instead use the standard clock_was_set()
    call, which previously would deadlock because it causes the jiffies
    lock to be acquired.

    Cc: Sasha Levin
    Cc: Thomas Gleixner
    Cc: Prarit Bhargava
    Cc: Richard Cochran
    Cc: Ingo Molnar
    Signed-off-by: John Stultz

    John Stultz
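
    A hedged sketch of the split, based on kernel/time/tick-common.c of
    that era (condensed; the highres path does the same):

        static void tick_periodic(int cpu)
        {
                if (tick_do_timer_cpu == cpu) {
                        write_seqlock(&jiffies_lock);

                        /* Keep track of the next tick event */
                        tick_next_period = ktime_add(tick_next_period,
                                                     tick_period);

                        do_timer(1);
                        write_sequnlock(&jiffies_lock);

                        /* Now done outside the jiffies lock: */
                        update_wall_time();
                }

                update_process_times(user_mode(get_irq_regs()));
                profile_tick(CPU_PROFILING);
        }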
     

03 Dec, 2013

1 commit

  • A few functions use remote per-CPU access APIs when they
    deal with local values.

    Just do the right conversion to improve performance, code
    readability and debug checks (see the sketch below).

    While at it, let's extend some of these function names with a
    *_this_cpu() suffix in order to display their purpose more clearly.

    Signed-off-by: Frederic Weisbecker
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Oleg Nesterov
    Cc: Steven Rostedt

    Frederic Weisbecker
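
    An illustrative before/after of the conversion, using the
    tick_cpu_sched per-CPU data as a hypothetical example:

        /* Before: remote-style access to a value that is always local */
        struct tick_sched *ts = &per_cpu(tick_cpu_sched, smp_processor_id());

        /* After: local accessor; cheaper and covered by debug checks */
        struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);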
     

29 Nov, 2013

1 commit

  • If CONFIG_NO_HZ=n tick_nohz_get_sleep_length() returns NSEC_PER_SEC/HZ.

    If CONFIG_NO_HZ=y and the nohz functionality is disabled via the
    command line option "nohz=off" or not enabled due to missing hardware
    support, then tick_nohz_get_sleep_length() returns 0. That happens
    because ts->sleep_length is never set in that case.

    Set it to NSEC_PER_SEC/HZ when the NOHZ mode is inactive (see the
    sketch below).

    Reported-by: Michal Hocko
    Reported-by: Borislav Petkov
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
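
    A hedged sketch of the fix; the placement in can_stop_idle_tick()
    and the field names are assumptions based on the commit text:

        static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
        {
                /* ... earlier offline/jiffies checks elided ... */

                if (unlikely(ts->nohz_mode == NOHZ_MODE_INACTIVE)) {
                        /* Report one tick period instead of a stale 0. */
                        ts->sleep_length = ktime_set(0, NSEC_PER_SEC / HZ);
                        return false;
                }

                return true;
        }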
     

19 Nov, 2013

1 commit

  • RCU and the fine grained idle time accounting functions check
    tick_nohz_enabled. But that variable merely tells us that NOHZ has
    been enabled in the config and not been disabled on the command line.

    It does not tell us anything about nohz being active. That's what all
    this should check for.

    Matthew reported that the idle accounting on his old P1 machine
    showed bogus values when he enabled NOHZ in the config and did not
    disable it on the kernel command line. The reason is that his machine
    uses (refined) jiffies as a clocksource, which explains why the "fine"
    grained accounting went into lala land, because it depends on when the
    system enters and leaves idle relative to the jiffies increment.

    Provide a tick_nohz_active indicator and let RCU and the accounting
    code use this instead of tick_nohz_enabled (see the sketch below).

    Reported-and-tested-by: Matthew Whitehead
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Steven Rostedt
    Reviewed-by: Paul E. McKenney
    Cc: john.stultz@linaro.org
    Cc: mwhitehe@redhat.com
    Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1311132052240.30673@ionos.tec.linutronix.de

    Thomas Gleixner
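
    A condensed sketch of the distinction, assuming the variable names
    from the commit (surrounding code simplified):

        /* Set from config/command line: NOHZ is allowed to run. */
        int tick_nohz_enabled __read_mostly = 1;

        /* Set only once NOHZ mode has actually been activated. */
        int tick_nohz_active;

        u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
        {
                if (!tick_nohz_active)    /* was: !tick_nohz_enabled */
                        return -1;

                /* ... compute and return the accounted idle time ... */
        }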
     

16 Aug, 2013

1 commit

  • tick_nohz_full_kick_all() is useful to notify all full dynticks
    CPUs that there is a system state change to check out before
    re-evaluating the need for the tick.

    Unfortunately this is implemented using smp_call_function_many(),
    which ignores the local CPU. This CPU also needs to re-evaluate
    the tick.

    on_each_cpu_mask() is not useful either because we don't want to
    re-evaluate the tick state in place but asynchronously from an IPI,
    to avoid messing with any random locking scenario.

    So let's call tick_nohz_full_kick() from tick_nohz_full_kick_all()
    so that the usual irq work takes care of it (see the sketch below).

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1375460996-16329-4-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
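
    A sketch of the resulting function, following the
    kernel/time/tick-sched.c implementation of that era:

        static void nohz_full_kick_ipi(void *info)
        {
                tick_nohz_full_kick();
        }

        /*
         * Kick all full dynticks CPUs, including this one, to force
         * them to re-evaluate their dependency on the tick.
         */
        void tick_nohz_full_kick_all(void)
        {
                if (!tick_nohz_full_running)
                        return;

                preempt_disable();
                /* Remote CPUs get the IPI... */
                smp_call_function_many(tick_nohz_full_mask,
                                       nohz_full_kick_ipi, NULL, false);
                /* ...and the local CPU goes through the irq work kick. */
                tick_nohz_full_kick();
                preempt_enable();
        }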
     

14 Aug, 2013

4 commits

  • …rederic/linux-dynticks into timers/nohz

    Pull nohz improvements from Frederic Weisbecker:

    " It mostly contains fixes and full dynticks off-case optimizations. I believe that
    distros want to enable this feature so it seems important to optimize the case
    where the "nohz_full=" parameter is empty. ie: I'm trying to remove any performance
    regression that comes with NO_HZ_FULL=y when the feature is not used.

    This patchset improves the current situation a lot (off-case appears to be around 11% faster
    with hackbench, although I guess it may vary depending on the configuration but it should be
    significantly faster in any case) now there is still some work to do: I can still observe a
    remaining loss of 1.6% throughput seen with hackbench compared to CONFIG_NO_HZ_FULL=n. "

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     
  • Scheduler IPIs and task context switches are serious fast paths.
    Let's try to hide as much as we can of the impact of the full
    dynticks APIs' off case at these call sites, through the use of
    static keys.

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman

    Frederic Weisbecker
     
  • These APIs are frequently accessed and priority is given
    to optimizing the full dynticks off case in order to let
    distros enable this feature without suffering from
    significant performance regressions.

    Let's inline these APIs and optimize them with static keys.

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman

    Frederic Weisbecker
     
  • Rename the full dynticks' cpumask and cpumask state variables
    to more exportable names.

    These will be used later from global headers to optimize
    the main full dynticks APIs in conjunction with static keys.

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman

    Frederic Weisbecker
     

13 Aug, 2013

1 commit

  • The context tracking subsystem has the ability to selectively
    enable the tracking on any defined subset of CPUs. This means that
    we can define a CPU range that doesn't run the context tracking
    and another range that does.

    Now what we want in practice is to enable the tracking on full
    dynticks CPUs only. In order to perform this, we just need to pass
    our full dynticks CPU range selection from the full dynticks
    subsystem to the context tracking subsystem (see the sketch below).

    This way we can spare the overhead of the RCU user extended quiescent
    state and vtime maintenance on the CPUs that are outside the
    full dynticks range. Just keep in mind that the raw context tracking
    itself is still necessary everywhere.

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman

    Frederic Weisbecker
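
    A minimal sketch of how the range is passed along, assuming the
    tick_nohz_init() flow and pre-rename mask names of that era
    (simplified):

        void __init tick_nohz_init(void)
        {
                int cpu;

                if (!have_nohz_full_mask)
                        return;

                /* Hand the full dynticks range to context tracking. */
                for_each_cpu(cpu, nohz_full_mask)
                        context_tracking_cpu_set(cpu);

                cpu_notifier(tick_nohz_cpu_down_callback, 0);
        }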
     

29 Jul, 2013

1 commit

  • Revert commit 69a37bea (cpuidle: Quickly notice prediction failure for
    repeat mode), because it has been identified as the source of a
    significant performance regression in v3.8 and later as explained by
    Jeremy Eder:

    We believe we've identified a particular commit to the cpuidle code
    that seems to be impacting performance of a variety of workloads.
    The simplest way to reproduce is using the netperf TCP_RR test, so
    we're using that, on a pair of Sandy Bridge based servers. We also
    have data from a large database setup where performance is also
    measurably/positively impacted, though that test data isn't easily
    share-able.

    Included below are test results from 3 test kernels:

    kernel reverts
    -----------------------------------------------------------
    1) vanilla upstream (no reverts)

    2) perfteam2 reverts e11538d1f03914eb92af5a1a378375c05ae8520c

    3) test reverts 69a37beabf1f0a6705c08e879bdd5d82ff6486c4
    e11538d1f03914eb92af5a1a378375c05ae8520c

    In summary, netperf TCP_RR numbers improve by approximately 4%
    after reverting 69a37beabf1f0a6705c08e879bdd5d82ff6486c4. When
    69a37beabf1f0a6705c08e879bdd5d82ff6486c4 is included, C0 residency
    never seems to get above 40%. Taking that patch out gets C0 near
    100% quite often, and performance increases.

    The below data are histograms representing the %c0 residency @
    1-second sample rates (using turbostat), while under netperf test.

    - If you look at the first 4 histograms, you can see %c0 residency
    almost entirely in the 30,40% bin.
    - The last pair, which reverts 69a37beabf1f0a6705c08e879bdd5d82ff6486c4,
    shows %c0 in the 80,90,100% bins.

    Below each kernel name are netperf TCP_RR trans/s numbers for the
    particular kernel that can be disclosed publicly, comparing the 3
    test kernels. We ran a 4th test with the vanilla kernel where
    we've also set /dev/cpu_dma_latency=0 to show overall impact
    boosting single-threaded TCP_RR performance over 11% above
    baseline.

    3.10-rc2 vanilla RX + c0 lock (/dev/cpu_dma_latency=0):
    TCP_RR trans/s 54323.78

    -----------------------------------------------------------
    3.10-rc2 vanilla RX (no reverts)
    TCP_RR trans/s 48192.47

    Receiver %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 0]:
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 59]: ***********************************************************
    40.0000 - 50.0000 [ 1]: *
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 0]:

    Sender %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 0]:
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 11]: ***********
    40.0000 - 50.0000 [ 49]: *************************************************
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 0]:

    -----------------------------------------------------------
    3.10-rc2 perfteam2 RX (reverts commit
    e11538d1f03914eb92af5a1a378375c05ae8520c)
    TCP_RR trans/s 49698.69

    Receiver %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 1]: *
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 59]: ***********************************************************
    40.0000 - 50.0000 [ 0]:
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 0]:

    Sender %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 0]:
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 2]: **
    40.0000 - 50.0000 [ 58]: **********************************************************
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 0]:

    -----------------------------------------------------------
    3.10-rc2 test RX (reverts 69a37beabf1f0a6705c08e879bdd5d82ff6486c4
    and e11538d1f03914eb92af5a1a378375c05ae8520c)
    TCP_RR trans/s 47766.95

    Receiver %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 1]: *
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 27]: ***************************
    40.0000 - 50.0000 [ 2]: **
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 2]: **
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 28]: ****************************

    Sender:
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 0]:
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 11]: ***********
    40.0000 - 50.0000 [ 0]:
    50.0000 - 60.0000 [ 1]: *
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 3]: ***
    80.0000 - 90.0000 [ 7]: *******
    90.0000 - 100.0000 [ 38]: **************************************

    These results demonstrate gaining back the tendency of the CPU to
    stay in more responsive, performant C-states (and thus yield
    measurably better performance), by reverting commit
    69a37beabf1f0a6705c08e879bdd5d82ff6486c4.

    Requested-by: Jeremy Eder
    Tested-by: Len Brown
    Cc: 3.8+
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

25 Jul, 2013

2 commits

  • The 'cpu' variable is no longer used after commit
    5b8621a68fdcd2baf1d3b413726f913a5254d46a, so drop it.

    Signed-off-by: Li Zhong
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman
    Signed-off-by: Frederic Weisbecker

    Li Zhong
     
  • If the user enables CONFIG_NO_HZ_FULL and runs the kernel on a machine
    with an unstable TSC, it will produce a WARN_ON dump as well as taint
    the kernel. This is a bit extreme for a kernel that just enables a
    feature but doesn't use it.

    The warning should only happen if the user tries to use the feature by
    either adding nohz_full to the kernel command line, or by enabling
    CONFIG_NO_HZ_FULL_ALL that makes nohz used on all CPUs at boot up. Note,
    this second feature should not (yet) be used by distros or anyone that
    doesn't care if NO_HZ is used or not.

    Signed-off-by: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman
    Signed-off-by: Frederic Weisbecker

    Steven Rostedt
     

15 Jul, 2013

1 commit

  • The __cpuinit type of throwaway sections might have made sense
    some time ago when RAM was more constrained, but now the savings
    do not offset the cost and complications. For example, the fix in
    commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
    is a good example of the nasty type of bugs that can be created
    with improper use of the various __init prefixes.

    After a discussion on LKML[1] it was decided that cpuinit should go
    the way of devinit and be phased out. Once all the users are gone,
    we can then finally remove the macros themselves from linux/init.h.

    This removes all the uses of the __cpuinit macros from C files in
    the core kernel directories (kernel, init, lib, mm, and include)
    that don't really have a specific maintainer; a representative hunk
    is sketched below.

    [1] https://lkml.org/lkml/2013/5/20/589

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
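
    An illustrative hunk of the mechanical change (hypothetical example
    function; the sweep touched many such hotplug notifiers):

        -static int __cpuinit timer_cpu_notify(struct notifier_block *self,
        -                                      unsigned long action, void *hcpu)
        +static int timer_cpu_notify(struct notifier_block *self,
        +                            unsigned long action, void *hcpu)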
     

10 Jul, 2013

1 commit

  • …eric/linux-dynticks into timers/urgent

    Pull nohz updates/fixes from Frederic Weisbecker:

    ' Note that "watchdog: Boot-disable by default on full dynticks" is a temporary
    solution to solve the issue with the watchdog that prevents the tick from
    stopping. This is to make sure that 3.11 doesn't have that problem as several
    people complained about it.

    A proper and longer term solution has been proposed by Peterz:

    http://lkml.kernel.org/r/20130618103632.GO3204@twins.programming.kicks-ass.net
    '

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

20 Jun, 2013

2 commits

  • Building full dynticks now implies that all CPUs are forced
    into RCU nocb mode through CONFIG_RCU_NOCB_CPU_ALL.

    The dynamic check has become useless.

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Li Zhong
    Cc: Borislav Petkov

    Frederic Weisbecker
     
  • If the user configures NO_HZ_FULL and defines nohz_full=XXX on the
    kernel command line, or enables NO_HZ_FULL_ALL, but nohz fails
    due to the machine having an unstable clock, warn about it.

    We do not want users thinking that they are getting the benefit
    of nohz when their machine can not support it.

    Signed-off-by: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Cc: H. Peter Anvin
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Signed-off-by: Frederic Weisbecker

    Steven Rostedt
     

31 May, 2013

1 commit

  • In tick_nohz_cpu_down_callback(), if the cpu is the one handling
    timekeeping, we must return something that stops the CPU_DOWN_PREPARE
    notifiers and then starts CPU_DOWN_FAILED notification on the notifier
    callbacks that were already called.

    However, traditional errno values are not handled by the notifier
    layer unless they are encapsulated using notifier_from_errno().

    Hence the current -EINVAL is misinterpreted and converted to junk after
    notifier_to_errno(), leaving the notifier subsystem open to random
    behaviour, such as eventually allowing the cpu to go down.

    Fix this by using the standard NOTIFY_BAD instead (see the sketch
    below).

    Signed-off-by: Li Zhong
    Reviewed-by: Srivatsa S. Bhat
    Acked-by: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Ingo Molnar

    Li Zhong
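
    A condensed sketch of the fixed callback, following the
    kernel/time/tick-sched.c code of that era:

        static int tick_nohz_cpu_down_callback(struct notifier_block *nfb,
                                               unsigned long action,
                                               void *hcpu)
        {
                unsigned int cpu = (unsigned long)hcpu;

                switch (action & ~CPU_TASKS_FROZEN) {
                case CPU_DOWN_PREPARE:
                        /*
                         * If we handle the timekeeping duty for full
                         * dynticks CPUs, we can't safely shut down
                         * that CPU.
                         */
                        if (have_nohz_full_mask && tick_do_timer_cpu == cpu)
                                return NOTIFY_BAD;   /* was: -EINVAL */
                        break;
                }
                return NOTIFY_OK;
        }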
     

16 May, 2013

1 commit

  • Pull timer fixes from Thomas Gleixner:

    - Cure for not using zalloc in the first place, which leads to random
    crashes with CPUMASK_OFF_STACK.

    - Revert a user space visible change which broke udev

    - Add a missing cpu_online early return introduced by the new full
    dyntick conversions

    - Plug a long standing race in the timer wheel cpu hotplug code.
    Sigh...

    - Cleanup NOHZ per cpu data on cpu down to prevent stale data on cpu
    up.

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons
    timer: Don't reinitialize the cpu base lock during CPU_UP_PREPARE
    tick: Don't invoke tick_nohz_stop_sched_tick() if the cpu is offline
    tick: Cleanup NOHZ per cpu data on cpu down
    tick: Use zalloc_cpumask_var for allocating offstack cpumasks

    Linus Torvalds
     

14 May, 2013

1 commit

  • Commit 5b39939a4 (nohz: Move ts->idle_calls incrementation into strict
    idle logic) moved code out of tick_nohz_stop_sched_tick() and failed
    to bail out when the cpu is offline. That's causing subsequent
    failures, as an offline CPU is supposed to die and not to fiddle with
    nohz magic.

    Return false in can_stop_idle_tick() if the cpu is offline (see the
    sketch below).

    Reported-and-tested-by: Jiri Kosina
    Reported-and-tested-by: Prarit Bhargava
    Cc: Frederic Weisbecker
    Cc: Borislav Petkov
    Cc: Tony Luck
    Cc: x86@kernel.org
    Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305132138160.2863@ionos
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
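
    A sketch of the fix, following the can_stop_idle_tick() code of that
    era (condensed):

        static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
        {
                /*
                 * If this cpu is offline and it is the one which updates
                 * jiffies, give up the assignment so the next tick can
                 * take it over. Then bail out: an offlined CPU must not
                 * fiddle with the tick.
                 */
                if (unlikely(!cpu_online(cpu))) {
                        if (cpu == tick_do_timer_cpu)
                                tick_do_timer_cpu = TICK_DO_TIMER_BOOT;
                        return false;   /* the missing bail-out */
                }

                /* ... remaining checks elided ... */
                return true;
        }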
     

12 May, 2013

1 commit

  • Prarit reported a crash on CPU offline/online. The reason is that on
    CPU down the NOHZ related per-cpu data of the dead cpu is not cleaned
    up. If an interrupt happens at cpu online before the per-cpu tick
    device is registered, the irq_enter() check potentially sees stale
    data and dereferences a NULL pointer.

    Clean up the data after the cpu is dead (see the sketch below).

    Reported-by: Prarit Bhargava
    Cc: stable@vger.kernel.org
    Cc: Mike Galbraith
    Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305031451561.2886@ionos
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
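
    A hedged sketch of the cleanup, assuming it lands in
    tick_cancel_sched_timer() on the hotplug path (placement is an
    assumption based on the commit text):

        void tick_cancel_sched_timer(int cpu)
        {
                struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);

        # ifdef CONFIG_HIGH_RES_TIMERS
                if (ts->sched_timer.base)
                        hrtimer_cancel(&ts->sched_timer);
        # endif

                /*
                 * Clear everything, not just nohz_mode, so a later
                 * online doesn't see stale per-cpu state.
                 */
                memset(ts, 0, sizeof(*ts));
        }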
     

06 May, 2013

1 commit

  • Pull 'full dynticks' support from Ingo Molnar:
    "This tree from Frederic Weisbecker adds a new, (exciting! :-) core
    kernel feature to the timer and scheduler subsystems: 'full dynticks',
    or CONFIG_NO_HZ_FULL=y.

    This feature extends the nohz variable-size timer tick feature from
    idle to busy CPUs (running at most one task) as well, potentially
    reducing the number of timer interrupts significantly.

    This feature got motivated by real-time folks and the -rt tree, but
    the general utility and motivation of full-dynticks runs wider than
    that:

    - HPC workloads get faster: CPUs running a single task should be able
    to utilize a maximum amount of CPU power. A periodic timer tick at
    HZ=1000 can cause a constant overhead of up to 1.0%. This feature
    removes that overhead - and speeds up the system by 0.5%-1.0% on
    typical distro configs even on modern systems.

    - Real-time workload latency reduction: CPUs running critical tasks
    should experience as little jitter as possible. The last remaining
    source of kernel-related jitter was the periodic timer tick.

    - A single task executing on a CPU is a pretty common situation,
    especially with an increasing number of cores/CPUs, so this feature
    helps desktop and mobile workloads as well.

    The cost of the feature is mainly related to increased timer
    reprogramming overhead when a CPU switches its tick period, and thus
    slightly longer to-idle and from-idle latency.

    Configuration-wise a third mode of operation is added to the existing
    two NOHZ kconfig modes:

    - CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named
    as a config option. This is the traditional Linux periodic tick
    design: there's a HZ tick going on all the time, regardless of
    whether a CPU is idle or not.

    - CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the
    periodic tick when a CPU enters idle mode.

    - CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the
    tick when a CPU is idle, also slows the tick down to 1 Hz (one
    timer interrupt per second) when only a single task is running on a
    CPU.

    The .config behavior is compatible: existing !CONFIG_NO_HZ and
    CONFIG_NO_HZ=y settings get translated to the new values, without the
    user having to configure anything. CONFIG_NO_HZ_FULL is turned off by
    default.

    This feature is based on a lot of infrastructure work that has been
    steadily going upstream in the last 2-3 cycles: related RCU support
    and non-periodic cputime support in particular is upstream already.

    This tree adds the final pieces and activates the feature. The pull
    request is marked RFC because:

    - it's marked 64-bit only at the moment - the 32-bit support patch is
    small but did not get ready in time.

    - it has a number of fresh commits that came in after the merge
    window. The overwhelming majority of commits are from before the
    merge window, but still some aspects of the tree are fresh and so I
    marked it RFC.

    - it's a pretty wide-reaching feature with lots of effects - and
    while the components have been in testing for some time, the full
    combination is still not very widely used. That it's default-off
    should reduce its regression abilities and obviously there are no
    known regressions with CONFIG_NO_HZ_FULL=y enabled either.

    - the feature is not completely idempotent: there is no 100%
    equivalent replacement for a periodic scheduler/timer tick. In
    particular there's ongoing work to map out and reduce its effects
    on scheduler load-balancing and statistics. This should not impact
    correctness though, there are no known regressions related to this
    feature at this point.

    - it's a pretty ambitious feature that with time will likely be
    enabled by most Linux distros, and we'd like you to make input on
    its design/implementation, if you dislike some aspect we missed.
    Without flaming us to crisp! :-)

    Future plans:

    - there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off
    the periodic tick altogether when there's a single busy task on a
    CPU. We'd first like 1 Hz to be exposed more widely before we go
    for the 0 Hz target though.

    - once we reach 0 Hz we can remove the periodic tick assumption from
    nr_running>=2 as well, by essentially interrupting busy tasks only
    as frequently as the sched_latency constraints require us to do -
    once every 4-40 msecs, depending on nr_running.

    I am personally leaning towards biting the bullet and doing this in
    v3.10, like the -rt tree this effort has been going on for too long -
    but the final word is up to you as usual.

    More technical details can be found in Documentation/timers/NO_HZ.txt"

    * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
    sched: Keep at least 1 tick per second for active dynticks tasks
    rcu: Fix full dynticks' dependency on wide RCU nocb mode
    nohz: Protect smp_processor_id() in tick_nohz_task_switch()
    nohz_full: Add documentation.
    cputime_nsecs: use math64.h for nsec resolution conversion helpers
    nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config
    nohz: Reduce overhead under high-freq idling patterns
    nohz: Remove full dynticks' superfluous dependency on RCU tree
    nohz: Fix unavailable tick_stop tracepoint in dynticks idle
    nohz: Add basic tracing
    nohz: Select wide RCU nocb for full dynticks
    nohz: Disable the tick when irq resume in full dynticks CPU
    nohz: Re-evaluate the tick for the new task after a context switch
    nohz: Prepare to stop the tick on irq exit
    nohz: Implement full dynticks kick
    nohz: Re-evaluate the tick from the scheduler IPI
    sched: New helper to prevent from stopping the tick in full dynticks
    sched: Kick full dynticks CPU that have more than one task enqueued.
    perf: New helper to prevent full dynticks CPUs from stopping tick
    perf: Kick full dynticks CPU if events rotation is needed
    ...

    Linus Torvalds
     

04 May, 2013

1 commit

  • The scheduler doesn't yet fully support environments
    with a single task running without a periodic tick.

    In order to ensure we still maintain the duties of scheduler_tick(),
    keep at least 1 tick per second (see the sketch below).

    This makes sure that we keep the progression of various scheduler
    accounting and background maintenance even with a very low granularity.
    Examples include cpu load, sched average, CFS entity vruntime,
    avenrun and events such as load balancing, amongst other details
    handled in sched_class::task_tick().

    This limitation will be removed in the future once we get
    these individual items to work in full dynticks CPUs.

    Suggested-by: Ingo Molnar
    Signed-off-by: Frederic Weisbecker
    Cc: Christoph Lameter
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
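
    A sketch of the mechanism, following the scheduler_tick_max_deferment()
    helper added around this time (condensed; return value in nanoseconds):

        u64 scheduler_tick_max_deferment(void)
        {
                struct rq *rq = this_rq();
                unsigned long next, now = ACCESS_ONCE(jiffies);

                /* Allow the tick to be deferred at most one second. */
                next = rq->last_sched_tick + HZ;

                if (time_before_eq(next, now))
                        return 0;

                return jiffies_to_usecs(next - now) * NSEC_PER_USEC;
        }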
     

29 Apr, 2013

1 commit

  • I saw the following error when testing the latest nohz code on
    Power:

    [ 85.295384] BUG: using smp_processor_id() in preemptible [00000000] code: rsyslogd/3493
    [ 85.295396] caller is .tick_nohz_task_switch+0x1c/0xb8
    [ 85.295402] Call Trace:
    [ 85.295408] [c0000001fababab0] [c000000000012dc4] .show_stack+0x110/0x25c (unreliable)
    [ 85.295420] [c0000001fababba0] [c0000000007c4b54] .dump_stack+0x20/0x30
    [ 85.295430] [c0000001fababc10] [c00000000044eb74] .debug_smp_processor_id+0xf4/0x124
    [ 85.295438] [c0000001fababca0] [c0000000000d7594] .tick_nohz_task_switch+0x1c/0xb8
    [ 85.295447] [c0000001fababd20] [c0000000000b9748] .finish_task_switch+0x13c/0x160
    [ 85.295455] [c0000001fababdb0] [c0000000000bbe50] .schedule_tail+0x50/0x124
    [ 85.295463] [c0000001fababe30] [c000000000009dc8] .ret_from_fork+0x4/0x54

    The change moves the test into a local_irq_save/restore section to
    avoid the above complaint (see the sketch below).

    Signed-off-by: Li Zhong
    Acked-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Paul McKenney
    Link: http://lkml.kernel.org/r/1367119558.6391.34.camel@ThinkPad-T5421.cn.ibm.com
    Signed-off-by: Ingo Molnar

    Li Zhong
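
    A condensed sketch of the fixed function, following the
    tick_nohz_task_switch() code of that era:

        void tick_nohz_task_switch(struct task_struct *tsk)
        {
                unsigned long flags;

                /* Disable irqs so smp_processor_id() is safe here. */
                local_irq_save(flags);

                if (!tick_nohz_full_cpu(smp_processor_id()))
                        goto out;

                if (tick_nohz_tick_stopped() && !can_stop_full_tick())
                        tick_nohz_full_kick();

        out:
                local_irq_restore(flags);
        }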
     

26 Apr, 2013

1 commit

  • One testbox of mine (Intel Nehalem, 16-way) uses MWAIT for its idle
    routine, which apparently can break out of its idle loop rather
    frequently.

    In that case NO_HZ_FULL=y kernels show high ksoftirqd overhead and
    constant context switching, because tick_nohz_stop_sched_tick() will,
    if delta_jiffies == 0, mis-identify this as a timer event, activating
    the TIMER_SOFTIRQ, which wakes up ksoftirqd.

    Fix this by treating delta_jiffies == 0 the same way we treat other
    short wakeups, delta_jiffies == 1 (see the sketch below).

    Cc: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Ingo Molnar
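
    The change itself appears to be a one-liner in
    tick_nohz_stop_sched_tick() (illustrative hunk):

        	/* Do not stop the tick if we are only one off (or less) */
        -	if (!ts->tick_stopped && delta_jiffies == 1)
        +	if (!ts->tick_stopped && delta_jiffies <= 1)
        		goto out;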
     

23 Apr, 2013

5 commits

  • It's not obvious why the full dynticks subsystem
    doesn't always stop the tick: whether this is due to kthreads,
    posix timers, perf events, etc...

    These new tracepoints are here to help the user diagnose
    the failures and test this feature.

    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     
  • When a task is scheduled in, it may have some properties
    of its own that could make the CPU reconsider the need for
    the tick: posix cpu timers, perf events, ...

    So notify the full dynticks subsystem when a task gets
    scheduled in and re-check the tick dependency at this
    stage. This is done through a self IPI to avoid messing
    with any current locking scenario.

    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     
  • Interrupt exit is a natural place to stop the tick: it runs after
    all events that happened before and during the irq and that are
    liable to update the tick dependency have occurred. It also makes
    sure that any check on the tick dependency is well ordered against
    dynticks kick IPIs.

    Bring in the infrastructure that performs the tick dependency
    checks on irq exit and shut it down if these checks show that we
    can do it safely.

    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     
  • Implement the full dynticks kick that is performed from
    IPIs sent by various subsystems (scheduler, posix timers, ...)
    when they want to notify about a new event that may
    require reconsidering the dependency on the tick.

    Most of the time, such an event ends up restarting the tick
    (see the sketch below).

    (Part of the design, with subsystems providing *_can_stop_tick()
    helpers, was suggested by Peter Zijlstra a while ago.)

    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
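
    A sketch of the irq work based kick, following the tick-sched.c code
    of that era (condensed; __get_cpu_var was the per-CPU accessor then):

        static void nohz_full_kick_work_func(struct irq_work *work)
        {
                /* Empty: the tick gets re-evaluated on irq exit. */
        }

        static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
                .func = nohz_full_kick_work_func,
        };

        /*
         * Kick the current CPU if it's full dynticks in order to force
         * it to re-evaluate its dependency on the tick and restart it
         * if necessary.
         */
        void tick_nohz_full_kick(void)
        {
                if (tick_nohz_full_cpu(smp_processor_id()))
                        irq_work_queue(&__get_cpu_var(nohz_full_kick_work));
        }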
     
  • The scheduler IPI is used by the scheduler to kick
    full dynticks CPUs asynchronously when more than one
    task is running or when a new timer list timer is
    enqueued. This way the destination CPU can decide
    to restart the tick to handle this new situation.

    Now let's call that kick in the scheduler IPI.

    (Reusing the scheduler IPI rather than implementing
    a new IPI was suggested by Peter Zijlstra a while ago)

    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

21 Apr, 2013

1 commit


19 Apr, 2013

1 commit

  • Provide a new kernel config that defaults all CPUs to be part
    of the full dynticks range, except the boot one for timekeeping.

    This default setting is overridden by the nohz_full= boot option
    if passed by the user (see the sketch below).

    This is helpful for those who don't need a fine-grained range
    of full dynticks CPUs and also for automated testing.

    Suggested-by: Ingo Molnar
    Reviewed-by: Paul E. McKenney
    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
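
    A condensed sketch of the default-all initialization, assuming the
    pre-rename mask names of that era:

        static int tick_nohz_init_all(void)
        {
                int err = -1;

        #ifdef CONFIG_NO_HZ_FULL_ALL
                if (!alloc_cpumask_var(&nohz_full_mask, GFP_KERNEL)) {
                        pr_err("NO_HZ: Can't allocate full dynticks cpumask\n");
                        return err;
                }
                err = 0;
                /* Every CPU but the boot CPU is full dynticks. */
                cpumask_setall(nohz_full_mask);
                cpumask_clear_cpu(smp_processor_id(), nohz_full_mask);
                have_nohz_full_mask = true;
        #endif
                return err;
        }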