21 Jan, 2014

2 commits

  • Pull timer changes from Ingo Molnar:
    - ARM clocksource/clockevent improvements and fixes
    - generic timekeeping updates: TAI fixes/improvements, cleanups
    - Posix cpu timer cleanups and improvements
    - dynticks updates: full dynticks bugfixes, optimizations and cleanups

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
    clocksource: Timer-sun5i: Switch to sched_clock_register()
    timekeeping: Remove comment that's mostly out of date
    rtc-cmos: Add an alarm disable quirk
    timekeeper: fix comment typo for tk_setup_internals()
    timekeeping: Fix missing timekeeping_update in suspend path
    timekeeping: Fix CLOCK_TAI timer/nanosleep delays
    tick/timekeeping: Call update_wall_time outside the jiffies lock
    timekeeping: Avoid possible deadlock from clock_was_set_delayed
    timekeeping: Fix potential lost pv notification of time change
    timekeeping: Fix lost updates to tai adjustment
    clocksource: sh_cmt: Add clk_prepare/unprepare support
    clocksource: bcm_kona_timer: Remove unused bcm_timer_ids
    clocksource: vt8500: Remove deprecated IRQF_DISABLED
    clocksource: tegra: Remove deprecated IRQF_DISABLED
    clocksource: misc drivers: Remove deprecated IRQF_DISABLED
    clocksource: sh_mtu2: Remove unnecessary platform_set_drvdata()
    clocksource: sh_tmu: Remove unnecessary platform_set_drvdata()
    clocksource: armada-370-xp: Enable timer divider only when needed
    clocksource: clksrc-of: Warn if no clock sources are found
    clocksource: orion: Switch to sched_clock_register()
    ...

    Linus Torvalds
     
  • Pull scheduler changes from Ingo Molnar:

    - Add the initial implementation of SCHED_DEADLINE support: a
    real-time scheduling policy in which tasks specify a runtime quota,
    a period and a deadline; tasks that stay within their quota each
    period are guaranteed to meet their deadlines, while tasks that
    exceed their quota get throttled. (Available to privileged users
    for now; a usage sketch follows this list.)

    - Clean up and fix preempt_enable_no_resched() abuse all around the
    tree

    - Do sched_clock() performance optimizations on x86 and elsewhere

    - Fix and improve auto-NUMA balancing

    - Fix and clean up the idle loop

    - Apply various cleanups and fixes
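
    As a usage illustration (an editor's sketch, not part of the original
    pull request): a task could opt into SCHED_DEADLINE with the
    sched_setattr() syscall that shipped alongside this policy. The struct
    layout matches what user space had to declare by hand at the time; the
    runtime/deadline/period values are made-up examples.

    #include <stdint.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    #ifndef SCHED_DEADLINE
    #define SCHED_DEADLINE 6
    #endif

    struct sched_attr {
            uint32_t size;
            uint32_t sched_policy;
            uint64_t sched_flags;
            int32_t  sched_nice;
            uint32_t sched_priority;
            uint64_t sched_runtime;    /* ns of CPU time per period */
            uint64_t sched_deadline;   /* ns, relative deadline     */
            uint64_t sched_period;     /* ns, activation period     */
    };

    static int become_deadline_task(void)
    {
            struct sched_attr attr = {
                    .size           = sizeof(attr),
                    .sched_policy   = SCHED_DEADLINE,
                    .sched_runtime  = 10ULL * 1000 * 1000,  /* 10 ms */
                    .sched_deadline = 30ULL * 1000 * 1000,  /* 30 ms */
                    .sched_period   = 30ULL * 1000 * 1000,  /* 30 ms */
            };

            /* no glibc wrapper at the time: raw syscall, pid 0 == self */
            return syscall(__NR_sched_setattr, 0, &attr, 0);
    }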

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
    sched: Fix __sched_setscheduler() nice test
    sched: Move SCHED_RESET_ON_FORK into attr::sched_flags
    sched: Fix up attr::sched_priority warning
    sched: Fix up scheduler syscall LTP fails
    sched: Preserve the nice level over sched_setscheduler() and sched_setparam() calls
    sched/core: Fix htmldocs warnings
    sched/deadline: No need to check p if dl_se is valid
    sched/deadline: Remove unused variables
    sched/deadline: Fix sparse static warnings
    m68k: Fix build warning in mac_via.h
    sched, thermal: Clean up preempt_enable_no_resched() abuse
    sched, net: Fixup busy_loop_us_clock()
    sched, net: Clean up preempt_enable_no_resched() abuse
    sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding
    sched/preempt, locking: Rework local_bh_{dis,en}able()
    sched/clock, x86: Avoid a runtime condition in native_sched_clock()
    sched/clock: Fix up clear_sched_clock_stable()
    sched/clock, x86: Use a static_key for sched_clock_stable
    sched/clock: Remove local_irq_disable() from the clocks
    sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs
    ...

    Linus Torvalds
     

16 Jan, 2014

1 commit

  • This makes the code more symmetric with the existing tick functions
    called on irq exit: tick_irq_exit() and tick_nohz_irq_exit().

    These functions are also symmetric in that they mirror each other's
    actions: we start to account idle time on irq exit and stop this
    accounting on irq entry. Likewise, the tick is stopped on irq exit and
    timekeeping catches up with the tickless time elapsed once we reach
    irq entry.

    This rename was suggested by Peter Zijlstra a long while ago but it
    got forgotten in the mass of changes.

    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Alex Shi
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: John Stultz
    Cc: Kevin Hilman
    Link: http://lkml.kernel.org/r/1387320692-28460-2-git-send-email-fweisbec@gmail.com
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     

14 Jan, 2014

1 commit

  • Currently local_bh_disable() is out-of-line for no apparent reason.
    So inline it to save a few cycles on call/return nonsense; the
    function body is a single add on x86 (plus a few extra loads and
    stores on load/store archs).

    Also expose two new local_bh functions:

    __local_bh_{dis,en}able_ip(unsigned long ip, unsigned int cnt);

    Which implement the actual local_bh_{dis,en}able() behaviour.

    The next patch uses the exposed @cnt argument to optimize bh lock
    functions.
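
    For illustration (an editor's sketch of the intended shape, not a
    verbatim copy of the patch), the inline wrappers are expected to boil
    down to something like:

    /* the out-of-line helpers carry the caller IP and the count to apply */
    extern void __local_bh_disable_ip(unsigned long ip, unsigned int cnt);
    extern void __local_bh_enable_ip(unsigned long ip, unsigned int cnt);

    static inline void local_bh_disable(void)
    {
            __local_bh_disable_ip(_RET_IP_, SOFTIRQ_DISABLE_OFFSET);
    }

    static inline void local_bh_enable(void)
    {
            __local_bh_enable_ip(_RET_IP_, SOFTIRQ_DISABLE_OFFSET);
    }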

    With build fixes from Jacob Pan.

    Cc: rjw@rjwysocki.net
    Cc: rui.zhang@intel.com
    Cc: jacob.jun.pan@linux.intel.com
    Cc: Mike Galbraith
    Cc: hpa@zytor.com
    Cc: Arjan van de Ven
    Cc: lenb@kernel.org
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20131119151338.GF3694@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

13 Jan, 2014

1 commit

  • Currently all _bh_ lock functions do two preempt_count operations:

    local_bh_disable();
    preempt_disable();

    and for the unlock:

    preempt_enable_no_resched();
    local_bh_enable();

    Since it's a waste of perfectly good cycles to modify the same variable
    twice when you can do it in one go, use the new
    __local_bh_{dis,en}able_ip() functions that allow us to provide a
    preempt_count value to add/sub.

    So define SOFTIRQ_LOCK_OFFSET as the offset a _bh_ lock needs to
    add/sub to be done in one go.

    As a bonus it gets rid of the preempt_enable_no_resched() usage.
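
    A sketch of the resulting shape (an editor's illustration following the
    description above; the *_sketch names are hypothetical and details
    differ from the actual patch):

    /* one preempt_count delta covering both the softirq-disable and the
       preempt-disable part of a _bh_ lock */
    #define SOFTIRQ_LOCK_OFFSET  (SOFTIRQ_DISABLE_OFFSET + PREEMPT_OFFSET)

    static inline void raw_spin_lock_bh_sketch(raw_spinlock_t *lock)
    {
            /* replaces local_bh_disable(); preempt_disable(); */
            __local_bh_disable_ip(_RET_IP_, SOFTIRQ_LOCK_OFFSET);
            do_raw_spin_lock(lock);
    }

    static inline void raw_spin_unlock_bh_sketch(raw_spinlock_t *lock)
    {
            do_raw_spin_unlock(lock);
            /* replaces preempt_enable_no_resched(); local_bh_enable(); */
            __local_bh_enable_ip(_RET_IP_, SOFTIRQ_LOCK_OFFSET);
    }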

    This reduces a 1000 loops of:

    spin_lock_bh(&bh_lock);
    spin_unlock_bh(&bh_lock);

    from 53596 cycles to 51995 cycles. I didn't do enough measurements to
    say with absolute certainty that the result is significant, but the few
    runs I did for each suggest it is.

    Reviewed-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra
    Cc: jacob.jun.pan@linux.intel.com
    Cc: Mike Galbraith
    Cc: hpa@zytor.com
    Cc: Arjan van de Ven
    Cc: lenb@kernel.org
    Cc: rjw@rjwysocki.net
    Cc: rui.zhang@intel.com
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20131119151338.GF3694@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

03 Dec, 2013

1 commit

  • A few functions use remote per-CPU access APIs when they
    deal with local values.

    Just do the right conversion to improve performance, code
    readability and debug checks.

    While at it, let's extend some of these function names with a
    *_this_cpu() suffix in order to make their purpose clearer.
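
    A made-up before/after to illustrate the kind of conversion meant here
    (editor's example; "foo" is a hypothetical per-CPU variable that is
    only ever touched from its own CPU):

    DEFINE_PER_CPU(int, foo);

    /* before: remote-style accessor, even though we run on that CPU */
    static int old_way(void)
    {
            return per_cpu(foo, smp_processor_id());
    }

    /* after: local accessor - cheaper, clearer, and covered by the
       this_cpu debug checks */
    static int new_way(void)
    {
            return __this_cpu_read(foo);
    }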

    Signed-off-by: Frederic Weisbecker
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Oleg Nesterov
    Cc: Steven Rostedt

    Frederic Weisbecker
     

27 Nov, 2013

2 commits

  • Instead of saving the hardirq state in a per-CPU variable, which
    requires an explicit call before the softirq handling and some
    complication, just save and restore the hardirq tracing state through
    function return values and parameters.

    This simplifies a bit the black magic that works around the fact that
    softirqs can be called from hardirqs while hardirqs can nest on
    softirqs; the two cases have very different semantics and only the
    latter case assumes both states.
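
    The resulting shape is roughly the following (an editor's sketch;
    helper names as they ended up in mainline around that time):

    /* the hardirq tracing state travels through a return value
       instead of a per-CPU variable */
    static bool lockdep_softirq_start(void)
    {
            bool in_hardirq = false;

            if (trace_hardirq_context(current)) {
                    in_hardirq = true;
                    trace_hardirq_exit();
            }
            lockdep_softirq_enter();
            return in_hardirq;
    }

    static void lockdep_softirq_end(bool in_hardirq)
    {
            lockdep_softirq_exit();
            if (in_hardirq)
                    trace_hardirq_enter();
    }

    /* in __do_softirq():
     *     in_hardirq = lockdep_softirq_start();
     *     ... run the pending softirq handlers ...
     *     lockdep_softirq_end(in_hardirq);
     */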

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Peter Zijlstra
    Cc: Sebastian Andrzej Siewior
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Paul E. McKenney
    Link: http://lkml.kernel.org/r/1384906054-30676-1-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Prepare for dependent patch.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

20 Nov, 2013

1 commit

  • There was a reported deadlock on -rt which lockdep didn't report.

    It turns out that in irq_exit() we tell lockdep that the hardirq
    context ends and then do all kinds of locking afterwards.

    To fix it, move trace_hardirq_exit() to the very end of irq_exit(), which
    ensures all locking in tick_irq_exit() and rcu_irq_exit() is properly
    recorded as happening from hardirq context.

    This however leads to the 'fun' little problem of running softirqs
    while in hardirq context. To cure this make the softirq code a little
    more complex (in the CONFIG_TRACE_IRQFLAGS case).

    Due to arch-dependent stack-swizzling trickery we cannot pass an
    argument to __do_softirq() to tell it whether it was invoked from
    hardirq context or not, so use a side-band per-CPU flag instead.

    When we do __do_softirq() from hardirq context, 'atomically' flip to
    softirq context and back, so that no locking goes without being in
    either hard- or soft-irq context.

    I didn't find any new problems in mainline using this patch, but it
    did show the -rt problem.
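
    For reference, the resulting ordering in irq_exit() looks roughly like
    this (an editor's sketch; the point is only that the hardirq tracing
    exit moves after the tick and RCU work):

    void irq_exit(void)
    {
            account_irq_exit_time(current);
            preempt_count_sub(HARDIRQ_OFFSET);
            if (!in_interrupt() && local_softirq_pending())
                    invoke_softirq();

            tick_irq_exit();
            rcu_irq_exit();
            trace_hardirq_exit();   /* must be last, so lockdep sees the
                                       calls above as hardirq context */
    }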

    Reported-by: Sebastian Andrzej Siewior
    Cc: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-dgwc5cdksbn0jk09vbmcc9sa@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

15 Nov, 2013

1 commit

  • This commit was incomplete in that the code to remove items from the
    per-cpu lists was missing, and it never acquired a user in the 5 years
    it has been in the tree. We're going to implement what it seems to try
    to achieve in a simpler way, and this code is in the way of doing so.

    Signed-off-by: Christoph Hellwig
    Cc: Jan Kara
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

12 Nov, 2013

1 commit

  • Pull scheduler changes from Ingo Molnar:
    "The main changes in this cycle are:

    - (much) improved CONFIG_NUMA_BALANCING support from Mel Gorman, Rik
    van Riel, Peter Zijlstra et al. Yay!

    - optimize preemption counter handling: merge the NEED_RESCHED flag
    into the preempt_count variable, by Peter Zijlstra.

    - wait.h fixes and code reorganization from Peter Zijlstra

    - cfs_bandwidth fixes from Ben Segall

    - SMP load-balancer cleanups from Peter Zijlstra

    - idle balancer improvements from Jason Low

    - other fixes and cleanups"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits)
    ftrace, sched: Add TRACE_FLAG_PREEMPT_RESCHED
    stop_machine: Fix race between stop_two_cpus() and stop_cpus()
    sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus
    sched: Fix asymmetric scheduling for POWER7
    sched: Move completion code from core.c to completion.c
    sched: Move wait code from core.c to wait.c
    sched: Move wait.c into kernel/sched/
    sched/wait: Fix __wait_event_interruptible_lock_irq_timeout()
    sched: Avoid throttle_cfs_rq() racing with period_timer stopping
    sched: Guarantee new group-entities always have weight
    sched: Fix hrtimer_cancel()/rq->lock deadlock
    sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining
    sched: Fix race on toggling cfs_bandwidth_used
    sched: Remove extra put_online_cpus() inside sched_setaffinity()
    sched/rt: Fix task_tick_rt() comment
    sched/wait: Fix build breakage
    sched/wait: Introduce prepare_to_wait_event()
    sched/wait: Add ___wait_cond_timeout() to wait_event*_timeout() too
    sched: Remove get_online_cpus() usage
    sched: Fix race in migrate_swap_stop()
    ...

    Linus Torvalds
     

01 Oct, 2013

6 commits

  • If irq_exit() is called on the arch's specified irq stack,
    it should be safe to run softirqs inline under that same
    irq stack as it is near empty by the time we call irq_exit().

    For example if we use the same stack for both hard and soft irqs here,
    the worst case scenario is:
    hardirq -> softirq -> hardirq. But then the softirq supersedes the
    first hardirq as the stack user, since irq_exit() is called on a
    mostly empty stack. So the stack merge in this case looks acceptable.

    Stack overruns still have a chance to happen if hardirqs have more
    opportunities to nest, but that's another problem to solve.

    So let's base the irq exit softirq stack choice on a new Kconfig symbol
    that archs can define when irq_exit() runs on the irq stack. That way
    we can spare a stack switch on irq processing and all the cache
    issues that come with it.
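
    In outline (an editor's sketch; the mainline symbol for this is
    CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK, and the real invoke_softirq() also
    handles the forced-irqthreads case, which is left out here):

    static inline void invoke_softirq(void)
    {
    #ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
            /* irq_exit() already runs on the irq stack, which is nearly
               empty by now: run the softirqs inline, no stack switch */
            __do_softirq();
    #else
            /* otherwise switch to the dedicated softirq stack */
            do_softirq_own_stack();
    #endif
    }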

    Acked-by: Linus Torvalds
    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Paul Mackerras
    Cc: James Hogan
    Cc: James E.J. Bottomley
    Cc: Helge Deller
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: David S. Miller
    Cc: Andrew Morton

    Frederic Weisbecker
     
  • For clarity, comment the various stack choices for softirqs
    processing, whether we execute them from ksoftirqd or
    local_irq_enable() calls.

    Their use on irq_exit() is already commented.

    Acked-by: Linus Torvalds
    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Paul Mackerras
    Cc: James Hogan
    Cc: James E.J. Bottomley
    Cc: Helge Deller
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: David S. Miller
    Cc: Andrew Morton

    Frederic Weisbecker
     
  • do_softirq() has a debug check that verifies that it is not nesting
    on softirq processing, nor miscounting the softirq part of the preempt
    count.

    But making sure that softirq processing doesn't nest is actually a more
    generic concern that applies to any caller of __do_softirq().

    So take it one step further and generalize that debug check to
    any softirq processing.

    Acked-by: Linus Torvalds
    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Paul Mackerras
    Cc: James Hogan
    Cc: James E.J. Bottomley
    Cc: Helge Deller
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: David S. Miller
    Cc: Andrew Morton

    Frederic Weisbecker
     
  • Before processing softirqs on hardirq exit, we already
    do the check for pending softirqs while hardirqs are
    guaranteed to be disabled.

    So we can take a shortcut and safely jump to the arch
    specific implementation directly.

    Acked-by: Linus Torvalds
    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Paul Mackerras
    Cc: James Hogan
    Cc: James E.J. Bottomley
    Cc: Helge Deller
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: David S. Miller
    Cc: Andrew Morton

    Frederic Weisbecker
     
  • All arch-overridden implementations of do_softirq() share the following
    common code: disable irqs (to avoid races with the pending check),
    check if there are softirqs pending, then execute __do_softirq() on
    a specific stack.

    Consolidate the common parts such that archs only worry about the
    stack switch.
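
    The consolidated generic entry point then looks roughly like this (an
    editor's sketch; archs only have to provide do_softirq_own_stack(),
    which runs __do_softirq() on their softirq stack):

    asmlinkage void do_softirq(void)
    {
            unsigned long flags;

            if (in_interrupt())
                    return;

            /* disable irqs to avoid racing with the pending check */
            local_irq_save(flags);

            if (local_softirq_pending())
                    do_softirq_own_stack();  /* arch-provided stack switch */

            local_irq_restore(flags);
    }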

    Acked-by: Linus Torvalds
    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Paul Mackerras
    Cc: James Hogan
    Cc: James E.J. Bottomley
    Cc: Helge Deller
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: David S. Miller
    Cc: Andrew Morton

    Frederic Weisbecker
     
  • The commit facd8b80c67a3cf64a467c4a2ac5fb31f2e6745b
    ("irq: Sanitize invoke_softirq") converted irq exit
    calls of do_softirq() to __do_softirq() on all architectures,
    assuming it was only used there for its irq disablement
    properties.

    But as a side effect, the softirqs processed at the end of the hardirq
    are always called on the current stack that irq_exit() runs on,
    instead of the softirq stack provided by the archs that override
    do_softirq().

    The result is mostly safe if the architecture runs irq_exit()
    on a separate irq stack, because then softirqs are processed
    on that same stack, which is near empty at this stage (assuming
    hardirqs aren't nesting).

    Otherwise irq_exit() runs on the task stack, and so does the softirq.
    The interrupted call stack can be randomly deep already and
    the softirq can dig through it even further. To add insult to
    injury, this softirq can be interrupted by a new hardirq, maximizing
    the chances for a stack overrun, as reported on powerpc for example:

    do_IRQ: stack overflow: 1920
    CPU: 0 PID: 1602 Comm: qemu-system-ppc Not tainted 3.10.4-300.1.fc19.ppc64p7 #1
    Call Trace:
    [c0000000050a8740] .show_stack+0x130/0x200 (unreliable)
    [c0000000050a8810] .dump_stack+0x28/0x3c
    [c0000000050a8880] .do_IRQ+0x2b8/0x2c0
    [c0000000050a8930] hardware_interrupt_common+0x154/0x180
    --- Exception: 501 at .cp_start_xmit+0x3a4/0x820 [8139cp]
    LR = .cp_start_xmit+0x390/0x820 [8139cp]
    [c0000000050a8d40] .dev_hard_start_xmit+0x394/0x640
    [c0000000050a8e00] .sch_direct_xmit+0x110/0x260
    [c0000000050a8ea0] .dev_queue_xmit+0x260/0x630
    [c0000000050a8f40] .br_dev_queue_push_xmit+0xc4/0x130 [bridge]
    [c0000000050a8fc0] .br_dev_xmit+0x198/0x270 [bridge]
    [c0000000050a9070] .dev_hard_start_xmit+0x394/0x640
    [c0000000050a9130] .dev_queue_xmit+0x428/0x630
    [c0000000050a91d0] .ip_finish_output+0x2a4/0x550
    [c0000000050a9290] .ip_local_out+0x50/0x70
    [c0000000050a9310] .ip_queue_xmit+0x148/0x420
    [c0000000050a93b0] .tcp_transmit_skb+0x4e4/0xaf0
    [c0000000050a94a0] .__tcp_ack_snd_check+0x7c/0xf0
    [c0000000050a9520] .tcp_rcv_established+0x1e8/0x930
    [c0000000050a95f0] .tcp_v4_do_rcv+0x21c/0x570
    [c0000000050a96c0] .tcp_v4_rcv+0x734/0x930
    [c0000000050a97a0] .ip_local_deliver_finish+0x184/0x360
    [c0000000050a9840] .ip_rcv_finish+0x148/0x400
    [c0000000050a98d0] .__netif_receive_skb_core+0x4f8/0xb00
    [c0000000050a99d0] .netif_receive_skb+0x44/0x110
    [c0000000050a9a70] .br_handle_frame_finish+0x2bc/0x3f0 [bridge]
    [c0000000050a9b20] .br_nf_pre_routing_finish+0x2ac/0x420 [bridge]
    [c0000000050a9bd0] .br_nf_pre_routing+0x4dc/0x7d0 [bridge]
    [c0000000050a9c70] .nf_iterate+0x114/0x130
    [c0000000050a9d30] .nf_hook_slow+0xb4/0x1e0
    [c0000000050a9e00] .br_handle_frame+0x290/0x330 [bridge]
    [c0000000050a9ea0] .__netif_receive_skb_core+0x34c/0xb00
    [c0000000050a9fa0] .netif_receive_skb+0x44/0x110
    [c0000000050aa040] .napi_gro_receive+0xe8/0x120
    [c0000000050aa0c0] .cp_rx_poll+0x31c/0x590 [8139cp]
    [c0000000050aa1d0] .net_rx_action+0x1dc/0x310
    [c0000000050aa2b0] .__do_softirq+0x158/0x330
    [c0000000050aa3b0] .irq_exit+0xc8/0x110
    [c0000000050aa430] .do_IRQ+0xdc/0x2c0
    [c0000000050aa4e0] hardware_interrupt_common+0x154/0x180
    --- Exception: 501 at .bad_range+0x1c/0x110
    LR = .get_page_from_freelist+0x908/0xbb0
    [c0000000050aa7d0] .list_del+0x18/0x50 (unreliable)
    [c0000000050aa850] .get_page_from_freelist+0x908/0xbb0
    [c0000000050aa9e0] .__alloc_pages_nodemask+0x21c/0xae0
    [c0000000050aaba0] .alloc_pages_vma+0xd0/0x210
    [c0000000050aac60] .handle_pte_fault+0x814/0xb70
    [c0000000050aad50] .__get_user_pages+0x1a4/0x640
    [c0000000050aae60] .get_user_pages_fast+0xec/0x160
    [c0000000050aaf10] .__gfn_to_pfn_memslot+0x3b0/0x430 [kvm]
    [c0000000050aafd0] .kvmppc_gfn_to_pfn+0x64/0x130 [kvm]
    [c0000000050ab070] .kvmppc_mmu_map_page+0x94/0x530 [kvm]
    [c0000000050ab190] .kvmppc_handle_pagefault+0x174/0x610 [kvm]
    [c0000000050ab270] .kvmppc_handle_exit_pr+0x464/0x9b0 [kvm]
    [c0000000050ab320] kvm_start_lightweight+0x1ec/0x1fc [kvm]
    [c0000000050ab4f0] .kvmppc_vcpu_run_pr+0x168/0x3b0 [kvm]
    [c0000000050ab9c0] .kvmppc_vcpu_run+0xc8/0xf0 [kvm]
    [c0000000050aba50] .kvm_arch_vcpu_ioctl_run+0x5c/0x1a0 [kvm]
    [c0000000050abae0] .kvm_vcpu_ioctl+0x478/0x730 [kvm]
    [c0000000050abc90] .do_vfs_ioctl+0x4ec/0x7c0
    [c0000000050abd80] .SyS_ioctl+0xd4/0xf0
    [c0000000050abe30] syscall_exit+0x0/0x98

    Since this is a regression, this patch proposes a minimalistic
    and low-risk solution: blindly force the hardirq exit processing of
    softirqs onto the softirq stack. This should significantly reduce
    the opportunities for task stack overflows dug by softirqs.

    Longer term solutions may involve extending the hardirq stack coverage to
    irq_exit(), etc...

    Reported-by: Benjamin Herrenschmidt
    Acked-by: Linus Torvalds
    Signed-off-by: Frederic Weisbecker
    Cc: #3.9..
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Paul Mackerras
    Cc: James Hogan
    Cc: James E.J. Bottomley
    Cc: Helge Deller
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: David S. Miller
    Cc: Andrew Morton

    Frederic Weisbecker
     

25 Sep, 2013

2 commits

  • Rewrite the preempt_count macros in order to extract the 3 basic
    preempt_count value modifiers:

    __preempt_count_add()
    __preempt_count_sub()

    and the new:

    __preempt_count_dec_and_test()

    And since we're at it anyway, replace the unconventional
    $op_preempt_count names with the more conventional preempt_count_$op.

    Since these basic operators are equivalent to the previous _notrace()
    variants, do away with the _notrace() versions.
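
    The generic (non arch-optimized) forms are expected to reduce to plain
    preempt_count arithmetic, roughly (an editor's sketch):

    static __always_inline void __preempt_count_add(int val)
    {
            *preempt_count_ptr() += val;
    }

    static __always_inline void __preempt_count_sub(int val)
    {
            *preempt_count_ptr() -= val;
    }

    /* true when the count drops to zero and a reschedule is pending */
    static __always_inline bool __preempt_count_dec_and_test(void)
    {
            return !--*preempt_count_ptr() && tif_need_resched();
    }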

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-ewbpdbupy9xpsjhg960zwbv8@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Replace the single preempt_count() 'function' that's an lvalue with
    two proper functions:

    preempt_count() - returns the preempt_count value as an rvalue
    preempt_count_set() - allows setting the preempt_count value

    Also provide preempt_count_ptr() as a convenience wrapper to implement
    all modifying operations.
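
    A sketch of the new accessors (an editor's illustration of the generic
    variant, assuming preempt_count lives in thread_info):

    static __always_inline int *preempt_count_ptr(void)
    {
            return &current_thread_info()->preempt_count;
    }

    /* rvalue read - no more "preempt_count() = ..." assignments */
    static __always_inline int preempt_count(void)
    {
            return *preempt_count_ptr();
    }

    static __always_inline void preempt_count_set(int pc)
    {
            *preempt_count_ptr() = pc;
    }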

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-orxrbycjozopqfhb4dxdkdvb@git.kernel.org
    [ Fixed build failure. ]
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

15 Jul, 2013

1 commit

  • The __cpuinit type of throwaway sections might have made sense
    some time ago when RAM was more constrained, but now the savings
    do not offset the cost and complications. For example, the fix in
    commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
    is a good example of the nasty type of bugs that can be created
    with improper use of the various __init prefixes.

    After a discussion on LKML[1] it was decided that cpuinit should go
    the way of devinit and be phased out. Once all the users are gone,
    we can then finally remove the macros themselves from linux/init.h.

    This removes all the uses of the __cpuinit macros from C files in
    the core kernel directories (kernel, init, lib, mm, and include)
    that don't really have a specific maintainer.

    [1] https://lkml.org/lkml/2013/5/20/589

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

03 Jul, 2013

1 commit

  • Pull core irq changes from Ingo Molnar:
    "The main changes:

    - generic-irqchip driver additions, cleanups and fixes

    - 3 new irqchip drivers: ARMv7-M NVIC, TB10x and Marvell Orion SoCs

    - irq_get_trigger_type() simplification and cross-arch cleanup

    - various cleanups, simplifications

    - documentation updates"

    * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
    softirq: Use _RET_IP_
    genirq: Add the generic chip to the genirq docbook
    genirq: generic-chip: Export some irq_gc_ functions
    genirq: Fix can_request_irq() for IRQs without an action
    irqchip: exynos-combiner: Staticize combiner_init
    irqchip: Add support for ARMv7-M NVIC
    irqchip: Add TB10x interrupt controller driver
    irqdomain: Use irq_get_trigger_type() to get IRQ flags
    MIPS: octeon: Use irq_get_trigger_type() to get IRQ flags
    arm: orion: Use irq_get_trigger_type() to get IRQ flags
    mfd: stmpe: use irq_get_trigger_type() to get IRQ flags
    mfd: twl4030-irq: Use irq_get_trigger_type() to get IRQ flags
    gpio: mvebu: Use irq_get_trigger_type() to get IRQ flags
    genirq: Add irq_get_trigger_type() to get IRQ flags
    genirq: Irqchip: document gcflags arg of irq_alloc_domain_generic_chips
    genirq: Set irq thread to RT priority on creation
    irqchip: Add support for Marvell Orion SoCs
    genirq: Add kerneldoc for irq_disable.
    genirq: irqchip: Add mask to block out invalid irqs
    genirq: Generic chip: Add linear irq domain support
    ...

    Linus Torvalds
     

28 Jun, 2013

1 commit

  • Use the already defined macro to pass the function return address.
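
    That is, roughly (an editor's illustration of the kind of change):

    /* before */
    __local_bh_disable((unsigned long)__builtin_return_address(0),
                       SOFTIRQ_DISABLE_OFFSET);

    /* after: _RET_IP_ is the already-defined shorthand for the same */
    __local_bh_disable(_RET_IP_, SOFTIRQ_DISABLE_OFFSET);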

    Signed-off-by: Davidlohr Bueso
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/1367347569.1784.3.camel@buesod1.americas.hpqcorp.net
    Signed-off-by: Thomas Gleixner

    Davidlohr Bueso
     

11 Jun, 2013

1 commit

  • The stop machine logic can lock up if all but one of the migration
    threads make it through the disable-irq step and the one remaining
    thread gets stuck in __do_softirq. The reason __do_softirq can hang is
    that it has a bail-out based on jiffies timeout, but in the lockup case,
    jiffies itself is not incremented.

    To work around this, re-add the max_restart counter in __do_softirq
    and stop softirq processing after 10 restarts.
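
    In outline (an editor's sketch of the fix described above;
    MAX_SOFTIRQ_TIME is the existing jiffies-based bail-out):

    #define MAX_SOFTIRQ_RESTART 10

    asmlinkage void __do_softirq(void)
    {
            unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
            int max_restart = MAX_SOFTIRQ_RESTART;
            __u32 pending;

    restart:
            /* ... run the pending softirq handlers ... */

            pending = local_softirq_pending();
            if (pending) {
                    /* the jiffies check alone can spin forever when
                       jiffies stops ticking; the counter cannot */
                    if (time_before(jiffies, end) && !need_resched() &&
                        --max_restart)
                            goto restart;

                    wakeup_softirqd();
            }
    }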

    Thanks to Tejun Heo and Rusty Russell and others for helping me track
    this down.

    This was introduced in 3.9 by commit c10d73671ad3 ("softirq: reduce
    latencies").

    It may be worth looking into ath9k to see if it has issues with its irq
    handler at a later date.

    The hang stack traces look something like this:

    ------------[ cut here ]------------
    WARNING: at kernel/watchdog.c:245 watchdog_overflow_callback+0x9c/0xa7()
    Watchdog detected hard LOCKUP on cpu 2
    Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
    Pid: 23, comm: migration/2 Tainted: G C 3.9.4+ #11
    Call Trace:
    warn_slowpath_common+0x85/0x9f
    warn_slowpath_fmt+0x46/0x48
    watchdog_overflow_callback+0x9c/0xa7
    __perf_event_overflow+0x137/0x1cb
    perf_event_overflow+0x14/0x16
    intel_pmu_handle_irq+0x2dc/0x359
    perf_event_nmi_handler+0x19/0x1b
    nmi_handle+0x7f/0xc2
    do_nmi+0xbc/0x304
    end_repeat_nmi+0x1e/0x2e
    <>
    cpu_stopper_thread+0xae/0x162
    smpboot_thread_fn+0x258/0x260
    kthread+0xc7/0xcf
    ret_from_fork+0x7c/0xb0
    ---[ end trace 4947dfa9b0a4cec3 ]---
    BUG: soft lockup - CPU#1 stuck for 22s! [migration/1:17]
    Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
    irq event stamp: 835637905
    hardirqs last enabled at (835637904): __do_softirq+0x9f/0x257
    hardirqs last disabled at (835637905): apic_timer_interrupt+0x6d/0x80
    softirqs last enabled at (5654720): __do_softirq+0x1ff/0x257
    softirqs last disabled at (5654725): irq_exit+0x5f/0xbb
    CPU 1
    Pid: 17, comm: migration/1 Tainted: G WC 3.9.4+ #11 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
    RIP: tasklet_hi_action+0xf0/0xf0
    Process migration/1
    Call Trace:

    __do_softirq+0x117/0x257
    irq_exit+0x5f/0xbb
    smp_apic_timer_interrupt+0x8a/0x98
    apic_timer_interrupt+0x72/0x80

    printk+0x4d/0x4f
    stop_machine_cpu_stop+0x22c/0x274
    cpu_stopper_thread+0xae/0x162
    smpboot_thread_fn+0x258/0x260
    kthread+0xc7/0xcf
    ret_from_fork+0x7c/0xb0

    Signed-off-by: Ben Greear
    Acked-by: Tejun Heo
    Acked-by: Pekka Riikonen
    Cc: Eric Dumazet
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Ben Greear
     

06 May, 2013

1 commit

  • Pull 'full dynticks' support from Ingo Molnar:
    "This tree from Frederic Weisbecker adds a new, (exciting! :-) core
    kernel feature to the timer and scheduler subsystems: 'full dynticks',
    or CONFIG_NO_HZ_FULL=y.

    This feature extends the nohz variable-size timer tick feature from
    idle to busy CPUs (running at most one task) as well, potentially
    reducing the number of timer interrupts significantly.

    This feature got motivated by real-time folks and the -rt tree, but
    the general utility and motivation of full-dynticks runs wider than
    that:

    - HPC workloads get faster: CPUs running a single task should be able
    to utilize a maximum amount of CPU power. A periodic timer tick at
    HZ=1000 can cause a constant overhead of up to 1.0%. This feature
    removes that overhead - and speeds up the system by 0.5%-1.0% on
    typical distro configs even on modern systems.

    - Real-time workload latency reduction: CPUs running critical tasks
    should experience as little jitter as possible. The last remaining
    source of kernel-related jitter was the periodic timer tick.

    - A single task executing on a CPU is a pretty common situation,
    especially with an increasing number of cores/CPUs, so this feature
    helps desktop and mobile workloads as well.

    The cost of the feature is mainly related to increased timer
    reprogramming overhead when a CPU switches its tick period, and thus
    slightly longer to-idle and from-idle latency.

    Configuration-wise a third mode of operation is added to the existing
    two NOHZ kconfig modes:

    - CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named
    as a config option. This is the traditional Linux periodic tick
    design: there's a HZ tick going on all the time, regardless of
    whether a CPU is idle or not.

    - CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the
    periodic tick when a CPU enters idle mode.

    - CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the
    tick when a CPU is idle, also slows the tick down to 1 Hz (one
    timer interrupt per second) when only a single task is running on a
    CPU.

    The .config behavior is compatible: existing !CONFIG_NO_HZ and
    CONFIG_NO_HZ=y settings get translated to the new values, without the
    user having to configure anything. CONFIG_NO_HZ_FULL is turned off by
    default.

    This feature is based on a lot of infrastructure work that has been
    steadily going upstream in the last 2-3 cycles: related RCU support
    and non-periodic cputime support in particular is upstream already.

    This tree adds the final pieces and activates the feature. The pull
    request is marked RFC because:

    - it's marked 64-bit only at the moment - the 32-bit support patch is
    small but did not get ready in time.

    - it has a number of fresh commits that came in after the merge
    window. The overwhelming majority of commits are from before the
    merge window, but still some aspects of the tree are fresh and so I
    marked it RFC.

    - it's a pretty wide-reaching feature with lots of effects - and
    while the components have been in testing for some time, the full
    combination is still not very widely used. That it's default-off
    should reduce its regression abilities and obviously there are no
    known regressions with CONFIG_NO_HZ_FULL=y enabled either.

    - the feature is not completely idempotent: there is no 100%
    equivalent replacement for a periodic scheduler/timer tick. In
    particular there's ongoing work to map out and reduce its effects
    on scheduler load-balancing and statistics. This should not impact
    correctness though, there are no known regressions related to this
    feature at this point.

    - it's a pretty ambitious feature that with time will likely be
    enabled by most Linux distros, and we'd like you to make input on
    its design/implementation, if you dislike some aspect we missed.
    Without flaming us to crisp! :-)

    Future plans:

    - there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off
    the periodic tick altogether when there's a single busy task on a
    CPU. We'd first like 1 Hz to be exposed more widely before we go
    for the 0 Hz target though.

    - once we reach 0 Hz we can remove the periodic tick assumption from
    nr_running>=2 as well, by essentially interrupting busy tasks only
    as frequently as the sched_latency constraints require us to do -
    once every 4-40 msecs, depending on nr_running.

    I am personally leaning towards biting the bullet and doing this in
    v3.10, like the -rt tree this effort has been going on for too long -
    but the final word is up to you as usual.

    More technical details can be found in Documentation/timers/NO_HZ.txt"

    * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
    sched: Keep at least 1 tick per second for active dynticks tasks
    rcu: Fix full dynticks' dependency on wide RCU nocb mode
    nohz: Protect smp_processor_id() in tick_nohz_task_switch()
    nohz_full: Add documentation.
    cputime_nsecs: use math64.h for nsec resolution conversion helpers
    nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config
    nohz: Reduce overhead under high-freq idling patterns
    nohz: Remove full dynticks' superfluous dependency on RCU tree
    nohz: Fix unavailable tick_stop tracepoint in dynticks idle
    nohz: Add basic tracing
    nohz: Select wide RCU nocb for full dynticks
    nohz: Disable the tick when irq resume in full dynticks CPU
    nohz: Re-evaluate the tick for the new task after a context switch
    nohz: Prepare to stop the tick on irq exit
    nohz: Implement full dynticks kick
    nohz: Re-evaluate the tick from the scheduler IPI
    sched: New helper to prevent from stopping the tick in full dynticks
    sched: Kick full dynticks CPU that have more than one task enqueued.
    perf: New helper to prevent full dynticks CPUs from stopping tick
    perf: Kick full dynticks CPU if events rotation is needed
    ...

    Linus Torvalds
     

23 Apr, 2013

1 commit

  • Finally try to disable the tick on irq exit, now that the
    fundamental infrastructure is in place.

    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

03 Apr, 2013

1 commit

  • We are planning to convert the dynticks Kconfig options layout
    into a choice menu. The user must be able to easily pick
    any of the following implementations: constant periodic tick,
    idle dynticks, full dynticks.

    As this implies a mutual exclusion, the two dynticks implementations
    need to converge on the selection of a common Kconfig option in order
    to ease the sharing of a common infrastructure.

    It would thus seem pretty natural to reuse CONFIG_NO_HZ to
    that end. It already implements all the idle dynticks code
    and the full dynticks depends on all that code for now.
    So ideally the choice menu would propose CONFIG_NO_HZ_IDLE and
    CONFIG_NO_HZ_EXTENDED then both would select CONFIG_NO_HZ.

    On the other hand we want to stay backward compatible: if
    CONFIG_NO_HZ is set in an older config file, we want to
    enable CONFIG_NO_HZ_IDLE by default.

    But we can't afford both at the same time or we run into
    a circular dependency:

    1) CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED both select
    CONFIG_NO_HZ
    2) If CONFIG_NO_HZ is set, we default to CONFIG_NO_HZ_IDLE

    We might be able to support that from Kconfig/Kbuild but it
    may not be wise to introduce such a confusing behaviour.

    So to solve this, create a new CONFIG_NO_HZ_COMMON option
    which gathers the common code between idle and full dynticks
    (that common code for now is simply the idle dynticks code)
    and select it from their referring Kconfig.

    Then we'll later create CONFIG_NO_HZ_IDLE and map CONFIG_NO_HZ
    to it for backward compatibility.

    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Namhyung Kim
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

06 Mar, 2013

1 commit

  • Pull irq fixes and cleanups from Thomas Gleixner:
    "Commit e5ab012c3271 ("nohz: Make tick_nohz_irq_exit() irq safe") is
    the first commit in the series and the minimal necessary bugfix, which
    needs to go back into stable.

    The remaining commits enforce irq disabling in irq_exit(), sanitize
    the hardirq/softirq preempt count transition and remove a bunch of no
    longer necessary conditionals."

    I personally love getting rid of the very subtle and confusing
    IRQ_EXIT_OFFSET thing. Even apart from the whole "more lines removed
    than added" thing.

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    irq: Don't re-enable interrupts at the end of irq_exit
    irq: Remove IRQ_EXIT_OFFSET workaround
    Revert "nohz: Make tick_nohz_irq_exit() irq safe"
    irq: Sanitize invoke_softirq
    irq: Ensure irq_exit() code runs with interrupts disabled
    nohz: Make tick_nohz_irq_exit() irq safe

    Linus Torvalds
     

01 Mar, 2013

1 commit

  • Commit 74eed0163d0def3fce27228d9ccf3d36e207b286
    "irq: Ensure irq_exit() code runs with interrupts disabled"
    restores the interrupt flags at the end of irq_exit() for archs
    that don't define __ARCH_IRQ_EXIT_IRQS_DISABLED.

    However always returning from irq_exit() with interrupts
    disabled should not be a problem for these archs. Prior to
    this commit this was already happening anytime we processed
    pending softirqs anyway.

    Suggested-by: Linus Torvalds
    Signed-off-by: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Paul E. McKenney

    Frederic Weisbecker
     

22 Feb, 2013

2 commits

  • The IRQ_EXIT_OFFSET trick was used to make sure the irq
    doesn't get preempted after we subtract the HARDIRQ_OFFSET
    until we are entirely done with any code in irq_exit().

    This workaround was necessary because some archs may call
    irq_exit() with irqs enabled, and there is still some code
    at the end of this function that is not covered by the
    HARDIRQ_OFFSET but wants to stay non-preemptible.

    Now that irqs are always disabled in irq_exit(), the whole code
    is guaranteed not to be preempted. We can thus remove this hack.
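
    For reference (an editor's sketch; on preemptible kernels
    IRQ_EXIT_OFFSET was HARDIRQ_OFFSET - 1, i.e. it kept one preempt count
    reference alive until the trailing no-resched enable):

    /* before: keep the tail of irq_exit() non-preemptible even if the
       arch re-enabled irqs */
    sub_preempt_count(IRQ_EXIT_OFFSET);      /* HARDIRQ_OFFSET - 1 */
    /* ... invoke_softirq(), tick_nohz_irq_exit(), rcu_irq_exit() ... */
    sched_preempt_enable_no_resched();

    /* after: irqs stay disabled across the whole of irq_exit(), so the
       plain offset is enough and the trailing enable goes away */
    sub_preempt_count(HARDIRQ_OFFSET);
    /* ... invoke_softirq(), tick_nohz_irq_exit(), rcu_irq_exit() ... */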

    Signed-off-by: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Paul E. McKenney

    Frederic Weisbecker
     
  • With the irq protection in irq_exit(), we can remove the #ifdeffery and
    the bh_disable/enable dance in invoke_softirq().

    Signed-off-by: Thomas Gleixner
    Cc: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Paul E. McKenney
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1302202155320.22263@ionos

    Thomas Gleixner