16 Oct, 2016

1 commit

  • Pull gcc plugins update from Kees Cook:
    "This adds a new gcc plugin named "latent_entropy". It is designed to
    extract as much possible uncertainty from a running system at boot
    time as possible, hoping to capitalize on any possible variation in
    CPU operation (due to runtime data differences, hardware differences,
    SMP ordering, thermal timing variation, cache behavior, etc).

    At the very least, this plugin is a much more comprehensive example
    for how to manipulate kernel code using the gcc plugin internals"

    * tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    latent_entropy: Mark functions with __latent_entropy
    gcc-plugins: Add latent_entropy plugin

    Linus Torvalds
     

11 Oct, 2016

1 commit

  • The __latent_entropy gcc attribute can be used only on functions and
    variables. If it is on a function then the plugin will instrument it for
    gathering control-flow entropy. If the attribute is on a variable then
    the plugin will initialize it with random contents. The variable must
    be an integer, an integer array type or a structure with integer fields.

    These specific functions have been selected because they are init
    functions (to help gather boot-time entropy), are called at unpredictable
    times, or they have variable loops, each of which provides some level of
    latent entropy.

    Signed-off-by: Emese Revfy
    [kees: expanded commit message]
    Signed-off-by: Kees Cook

    Emese Revfy
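    A minimal sketch of how the attribute described above might be applied
    (the function and variable names here are hypothetical; the macro is
    assumed to expand to the gcc attribute only when the latent_entropy
    plugin is enabled):

        /* Assumed definition: expands to __attribute__((latent_entropy))
         * when the plugin is enabled, and to nothing otherwise. */

        /* On a function: the plugin instruments its control flow so that
         * running it mixes "latent entropy" into a global pool. */
        static __latent_entropy void example_softirq_action(struct softirq_action *h)
        {
                /* ... ordinary handler body ... */
        }

        /* On a variable: the plugin seeds it with random contents at build
         * time. It must be an integer, an integer array, or a structure
         * with integer fields. */
        static u32 example_seed[4] __latent_entropy;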
     

04 Oct, 2016

1 commit

  • Pull CPU hotplug updates from Thomas Gleixner:
    "Yet another batch of cpu hotplug core updates and conversions:

    - Provide core infrastructure for multi instance drivers so the
    drivers do not have to keep custom lists.

    - Convert custom lists to the new infrastructure. The block-mq custom
    list conversion comes through the block tree and makes the diffstat
    tip over to more lines removed than added.

    - Handle unbalanced hotplug enable/disable calls more gracefully.

    - Remove the obsolete CPU_STARTING/DYING notifier support.

    - Convert another batch of notifier users.

    The relayfs changes which conflicted with the conversion have been
    shipped to me by Andrew.

    The remaining lot is targeted for 4.10 so that we finally can remove
    the rest of the notifiers"

    * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
    cpufreq: Fix up conversion to hotplug state machine
    blk/mq: Reserve hotplug states for block multiqueue
    x86/apic/uv: Convert to hotplug state machine
    s390/mm/pfault: Convert to hotplug state machine
    mips/loongson/smp: Convert to hotplug state machine
    mips/octeon/smp: Convert to hotplug state machine
    fault-injection/cpu: Convert to hotplug state machine
    padata: Convert to hotplug state machine
    cpufreq: Convert to hotplug state machine
    ACPI/processor: Convert to hotplug state machine
    virtio scsi: Convert to hotplug state machine
    oprofile/timer: Convert to hotplug state machine
    block/softirq: Convert to hotplug state machine
    lib/irq_poll: Convert to hotplug state machine
    x86/microcode: Convert to hotplug state machine
    sh/SH-X3 SMP: Convert to hotplug state machine
    ia64/mca: Convert to hotplug state machine
    ARM/OMAP/wakeupgen: Convert to hotplug state machine
    ARM/shmobile: Convert to hotplug state machine
    arm64/FP/SIMD: Convert to hotplug state machine
    ...

    Linus Torvalds
     

30 Sep, 2016

1 commit

  • A while back, Paolo and Hannes sent an RFC patch adding threaded-able
    napi poll loop support : (https://patchwork.ozlabs.org/patch/620657/)

    The problem seems to be that softirqs are very aggressive and are often
    handled by the current process, even when the system is under stress and
    ksoftirqd has been scheduled; deferring that work to ksoftirqd would give
    innocent threads a better chance to make progress.

    This patch makes sure that if ksoftirqd is running, we let it
    perform the softirq work.

    Jonathan Corbet summarized the issue in https://lwn.net/Articles/687617/

    Tested:

    - NIC receiving traffic handled by CPU 0
    - UDP receiver running on CPU 0, using a single UDP socket.
    - Incoming flood of UDP packets targeting the UDP socket.

    Before the patch, the UDP receiver could almost never get CPU cycles and
    could only receive ~2,000 packets per second.

    After the patch, CPU cycles are split 50/50 between the user application
    and ksoftirqd/0, and we can effectively read ~900,000 packets per second,
    a huge improvement in a DoS situation. (Note that more packets are now
    dropped by the NIC itself, since the BH handlers get fewer CPU cycles to
    drain the RX ring buffer.)

    Since the load now runs in a well-identified thread context, an admin can
    more easily tune process scheduling parameters if needed.

    Reported-by: Paolo Abeni
    Reported-by: Hannes Frederic Sowa
    Signed-off-by: Eric Dumazet
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: David Miller
    Cc: Hannes Frederic Sowa
    Cc: Jesper Dangaard Brouer
    Cc: Jonathan Corbet
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1472665349.14381.356.camel@edumazet-glaptop3.roam.corp.google.com
    Signed-off-by: Ingo Molnar

    Eric Dumazet
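    A minimal sketch of the check this change introduces (the helper name and
    call sites below are assumptions based on the description above, not the
    literal patch):

        /* Sketch: if the per-CPU ksoftirqd thread is already runnable, skip
         * inline softirq processing and let ksoftirqd drain the pending
         * work under the scheduler's control. */
        static bool ksoftirqd_running(void)
        {
                struct task_struct *tsk = __this_cpu_read(ksoftirqd);

                return tsk && (tsk->state == TASK_RUNNING);
        }

        /* Assumed call site on the irq-exit / raise-softirq paths: */
        static inline void invoke_softirq(void)
        {
                if (ksoftirqd_running())
                        return;         /* defer to ksoftirqd */
                /* ... otherwise handle softirqs inline as before ... */
        }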
     

26 Mar, 2016

1 commit

  • KASAN needs to know whether the allocation happens in an IRQ handler.
    This lets us strip everything below the IRQ entry point to reduce the
    number of unique stack traces that need to be stored.

    Move the definition of __irq_entry to <linux/interrupt.h> so that the
    users don't need to pull in <linux/ftrace.h>. Also introduce the
    __softirq_entry macro which is similar to __irq_entry, but puts the
    corresponding functions into the .softirqentry.text section.

    Signed-off-by: Alexander Potapenko
    Acked-by: Steven Rostedt
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
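    A sketch of what the new macro boils down to (treat the exact definition
    and the annotated function as an approximation of the upstream change):

        /* Put the function into the dedicated .softirqentry.text section,
         * mirroring __irq_entry / .irqentry.text, so KASAN can trim its
         * stack traces at the softirq entry point. */
        #define __softirq_entry \
                __attribute__((__section__(".softirqentry.text")))

        /* Example: the softirq dispatch loop is such an entry point. */
        asmlinkage __visible void __softirq_entry __do_softirq(void)
        {
                /* ... */
        }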
     

29 Feb, 2016

1 commit

  • The preempt_disable() invokes preempt_count_add(), which saves the caller
    in ->preempt_disable_ip. It uses CALLER_ADDR1, which does not look for
    its caller but for the parent of the caller. This means we get the correct
    caller for something like spin_lock(), unless the architecture inlines
    those invocations. It is always wrong for preempt_disable() or
    local_bh_disable().

    This patch adds the function get_lock_parent_ip(), which tries
    CALLER_ADDR0, 1 and 2 in turn, moving one level up whenever the former
    is a locking function. This seems to record the preempt_disable() caller
    properly for preempt_disable() itself as well as for get_cpu_var() or
    local_bh_disable().

    Steven asked for the get_parent_ip() -> get_lock_parent_ip() rename.

    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20160226135456.GB18244@linutronix.de
    Signed-off-by: Ingo Molnar

    Sebastian Andrzej Siewior
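    The resulting helper boils down to something like this (a sketch;
    in_lock_functions() is the kernel's existing test for whether an address
    falls inside the locking primitives):

        /* Sketch: walk up the return addresses until we leave the locking
         * primitives, so ->preempt_disable_ip records the real caller of
         * e.g. spin_lock() or local_bh_disable() rather than the primitive
         * itself. */
        static inline unsigned long get_lock_parent_ip(void)
        {
                unsigned long addr = CALLER_ADDR0;

                if (!in_lock_functions(addr))
                        return addr;
                addr = CALLER_ADDR1;
                if (!in_lock_functions(addr))
                        return addr;
                return CALLER_ADDR2;
        }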
     

10 Feb, 2015

1 commit

  • Pull core locking updates from Ingo Molnar:
    "The main changes are:

    - mutex, completions and rtmutex micro-optimizations
    - lock debugging fix
    - various cleanups in the MCS and the futex code"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    locking/rtmutex: Optimize setting task running after being blocked
    locking/rwsem: Use task->state helpers
    sched/completion: Add lock-free checking of the blocking case
    sched/completion: Remove unnecessary ->wait.lock serialization when reading completion state
    locking/mutex: Explicitly mark task as running after wakeup
    futex: Fix argument handling in futex_lock_pi() calls
    doc: Fix misnamed FUTEX_CMP_REQUEUE_PI op constants
    locking/Documentation: Update code path
    softirq/preempt: Add missing current->preempt_disable_ip update
    locking/osq: No need for load/acquire when acquire-polling
    locking/mcs: Better differentiate between MCS variants
    locking/mutex: Introduce ww_mutex_set_context_slowpath()
    locking/mutex: Move MCS related comments to proper location
    locking/mutex: Checking the stamp is WW only

    Linus Torvalds
     

15 Jan, 2015

2 commits

  • Simplify run_ksoftirqd() by using the new cond_resched_rcu_qs() function
    that conditionally reschedules, but unconditionally supplies an RCU
    quiescent state. This commit is separate from the previous commit by
    Calvin Owens because Calvin's approach can be backported, while this
    commit cannot be. The reason that this commit cannot be backported is
    that cond_resched_rcu_qs() does not always provide the needed quiescent
    state in earlier kernels.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • While debugging an issue with excessive softirq usage, I encountered the
    following note in commit 3e339b5dae24a706 ("softirq: Use hotplug thread
    infrastructure"):

    [ paulmck: Call rcu_note_context_switch() with interrupts enabled. ]

    ...but despite this note, the patch still calls RCU with IRQs disabled.

    This seemingly innocuous change caused a significant regression in softirq
    CPU usage on the sending side of a large TCP transfer (~1 GB/s): when
    introducing 0.01% packet loss, the softirq usage would jump to around 25%,
    spiking as high as 50%. Before the change, the usage would never exceed 5%.

    Moving the call to rcu_note_context_switch() after the cond_resched()
    call, as it was originally before the hotplug patch, completely
    eliminated this problem.

    Signed-off-by: Calvin Owens
    Cc: stable@vger.kernel.org
    Signed-off-by: Paul E. McKenney

    Calvin Owens
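    Taken together, the two 15 Jan entries above leave run_ksoftirqd()
    looking roughly like this (a sketch, not the literal upstream code; the
    key points are that interrupts are re-enabled before RCU is poked, and
    that cond_resched_rcu_qs() supplies the quiescent state unconditionally):

        static void run_ksoftirqd(unsigned int cpu)
        {
                local_irq_disable();
                if (local_softirq_pending()) {
                        /* Drain the pending softirqs. */
                        __do_softirq();
                        /* Re-enable interrupts *before* poking RCU... */
                        local_irq_enable();
                        /* ...then reschedule if needed and, either way,
                         * report an RCU quiescent state. */
                        cond_resched_rcu_qs();
                        return;
                }
                local_irq_enable();
        }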
     

14 Jan, 2015

1 commit

  • While debugging a "sleeping function called from invalid context" bug I
    realized that the debugging message "Preemption disabled at:" pointed to
    an incorrect function.

    In particular, if the last function/action that disabled preemption was
    spin_lock_bh(), then current->preempt_disable_ip won't be updated.

    The reason for this is that __local_bh_disable_ip() will increase
    preempt_count manually instead of calling preempt_count_add(), which
    would handle the update correctly.

    It looks like the manual handling was done to work around a lockdep issue.

    So add the missing update of current->preempt_disable_ip to
    __local_bh_disable_ip() as well.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Paul E. McKenney
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/20150107090441.GC4365@osiris
    Signed-off-by: Ingo Molnar

    Heiko Carstens
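    A sketch of the fix (field and helper names follow the description above;
    the surrounding function manipulates preempt_count by hand, so the
    bookkeeping that preempt_count_add() would normally do has to be added
    explicitly):

        void __local_bh_disable_ip(unsigned long ip, unsigned int cnt)
        {
                /* ... raise preempt_count by 'cnt' manually, without
                 * preempt_count_add(), to avoid lockdep recursion ... */

                if (preempt_count() == cnt) {
        #ifdef CONFIG_DEBUG_PREEMPT
                        /* The missing piece: remember who disabled preemption
                         * so "Preemption disabled at:" points at the right
                         * caller. */
                        current->preempt_disable_ip =
                                get_parent_ip(CALLER_ADDR1);
        #endif
                        trace_preempt_off(CALLER_ADDR0,
                                          get_parent_ip(CALLER_ADDR1));
                }
        }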
     

04 Nov, 2014

1 commit

  • The "cpu" argument to rcu_note_context_switch() is always the current
    CPU, so drop it. This in turn allows the "cpu" argument to
    rcu_preempt_note_context_switch() to be removed, which allows the sole
    use of "cpu" in both functions to be replaced with a this_cpu_ptr().
    Again, the anticipated cross-CPU uses of these functions has been
    replaced by NO_HZ_FULL.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Pranith Kumar

    Paul E. McKenney
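    In effect this is a signature cleanup, sketched below (the bodies now
    derive the CPU-local data themselves):

        /* Before: callers passed smp_processor_id() explicitly. */
        void rcu_note_context_switch(int cpu);

        /* After: no argument; the function uses this_cpu_ptr() internally,
         * since it is only ever invoked on the current CPU. */
        void rcu_note_context_switch(void);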
     

15 Oct, 2014

1 commit

  • Pull percpu consistent-ops changes from Tejun Heo:
    "Way back, before the current percpu allocator was implemented, static
    and dynamic percpu memory areas were allocated and handled separately
    and had their own accessors. The distinction has been gone for many
    years now; however, the now duplicate two sets of accessors remained
    with the pointer based ones - this_cpu_*() - evolving various other
    operations over time. During the process, we also accumulated other
    inconsistent operations.

    This pull request contains Christoph's patches to clean up the
    duplicate accessor situation. __get_cpu_var() uses are replaced with
    this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().

    Unfortunately, the former sometimes is tricky thanks to C being a bit
    messy with the distinction between lvalues and pointers, which led to
    a rather ugly solution for cpumask_var_t involving the introduction of
    this_cpu_cpumask_var_ptr().

    This converts most of the uses but not all. Christoph will follow up
    with the remaining conversions in this merge window and hopefully
    remove the obsolete accessors"

    * 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
    irqchip: Properly fetch the per cpu offset
    percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
    ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
    percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
    Revert "powerpc: Replace __get_cpu_var uses"
    percpu: Remove __this_cpu_ptr
    clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
    sparc: Replace __get_cpu_var uses
    avr32: Replace __get_cpu_var with __this_cpu_write
    blackfin: Replace __get_cpu_var uses
    tile: Use this_cpu_ptr() for hardware counters
    tile: Replace __get_cpu_var uses
    powerpc: Replace __get_cpu_var uses
    alpha: Replace __get_cpu_var
    ia64: Replace __get_cpu_var uses
    s390: cio driver &__get_cpu_var replacements
    s390: Replace __get_cpu_var uses
    mips: Replace __get_cpu_var uses
    MIPS: Replace __get_cpu_var uses in FPU emulator.
    arm: Replace __this_cpu_ptr with raw_cpu_ptr
    ...

    Linus Torvalds
     

08 Sep, 2014

1 commit

  • The rcu_bh_qs(), rcu_preempt_qs(), and rcu_sched_qs() functions use
    old-style per-CPU variable access and write to ->passed_quiesce even
    if it is already set. This commit therefore updates to use the new-style
    per-CPU variable access functions and avoids the spurious writes.
    This commit also eliminates the "cpu" argument to these functions because
    they are always invoked on the indicated CPU.

    Reported-by: Peter Zijlstra
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
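    A sketch of what the reworked helper looks like after this change
    (per-CPU structure and field names as implied above; the tracing details
    are omitted and the exact body should be treated as an approximation):

        void rcu_bh_qs(void)
        {
                /* New-style per-CPU accessors; skip the store when the
                 * quiescent state has already been recorded on this CPU. */
                if (!__this_cpu_read(rcu_bh_data.passed_quiesce))
                        __this_cpu_write(rcu_bh_data.passed_quiesce, 1);
        }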
     

27 Aug, 2014

1 commit

  • Convert uses of __get_cpu_var for creating an address from a percpu
    offset to this_cpu_ptr.

    The two cases where get_cpu_var is used to actually access a percpu
    variable are changed to use this_cpu_read/raw_cpu_read.

    Reviewed-by: Thomas Gleixner
    Signed-off-by: Christoph Lameter
    Signed-off-by: Tejun Heo

    Christoph Lameter
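    The two conversion patterns described above look roughly like this (a
    sketch; the per-CPU variable names are purely illustrative, not taken
    from the actual diff):

        /* Taking the address of a per-CPU object: */
        struct list_head *list;
        list = &__get_cpu_var(example_percpu_list);    /* old accessor */
        list = this_cpu_ptr(&example_percpu_list);     /* new accessor */

        /* Reading the value of a per-CPU scalar: */
        pending = __get_cpu_var(example_percpu_count);    /* old */
        pending = this_cpu_read(example_percpu_count);    /* new */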
     

22 May, 2014

1 commit

  • …/linux-rcu into core/rcu

    Pull RCU updates from Paul E. McKenney:

    " 1. Update RCU documentation. These were posted to LKML at
    https://lkml.org/lkml/2014/4/28/634.

    2. Miscellaneous fixes. These were posted to LKML at
    https://lkml.org/lkml/2014/4/28/645.

    3. Torture-test changes. These were posted to LKML at
    https://lkml.org/lkml/2014/4/28/667.

    4. Variable-name renaming cleanup, sent separately due to conflicts.
    This was posted to LKML at https://lkml.org/lkml/2014/5/13/854.

    5. Patch to suppress RCU stall warnings while sysrq requests are
    being processed. This patch is the RCU portions of the patch
    that Rik posted to LKML at https://lkml.org/lkml/2014/4/29/457.
    The reason for pushing this patch ahead instead of waiting until
    3.17 is that the NMI-based stack traces are messing up sysrq
    output, and in some cases also messing up the system as well."

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

29 Apr, 2014

1 commit

  • Calling rcu_bh_qs() after every softirq action is not really needed.
    What RCU needs is at least one rcu_bh_qs() per softirq round to note a
    quiescent state was passed for rcu_bh.

    Note for Paul and myself: this could be inlined as a single instruction
    and avoid smp_processor_id()
    (some this_cpu_write(rcu_bh_data.passed_quiesce, 1))

    Signed-off-by: Eric Dumazet
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Eric Dumazet
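    Schematically, the change moves the call out of the per-action loop in
    __do_softirq() (a sketch, not the literal diff; the real loop walks the
    pending bitmask with ffs()):

        /* Before: one rcu_bh_qs() per softirq action handled. */
        while (pending) {
                /* ... run one softirq action ... */
                rcu_bh_qs();
                pending >>= 1;
        }

        /* After: a single rcu_bh_qs() once the round is done, which is all
         * RCU-bh needs to note a quiescent state for this pass. */
        while (pending) {
                /* ... run one softirq action ... */
                pending >>= 1;
        }
        rcu_bh_qs();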
     

28 Apr, 2014

1 commit

  • On x86 the allocation of irq descriptors may allocate interrupts which
    are in the range of the GSI interrupts. That's wrong as those
    interrupts are hardwired and we don't have the irq domain translation
    like PPC. So one of these interrupts can be hooked up later to one of
    the devices which are hard wired to it and the io_apic init code for
    that particular interrupt line happily reuses that descriptor with a
    completely different configuration so all hell breaks loose.

    Inside x86 we allocate dynamic interrupts from above nr_gsi_irqs,
    except for a few usage sites which have not yet blown up in our face
    for whatever reason. But for drivers which need an irq range, like the
    GPIO drivers, we have no limit in place and we don't want to expose
    such a detail to a driver.

    To cure this introduce a function which an architecture can implement
    to impose a lower bound on the dynamic interrupt allocations.

    Implement it for x86 and set the lower bound to nr_gsi_irqs, which is
    the end of the hardwired interrupt space, so all dynamic allocations
    happen above.

    That not only allows the GPIO driver to work sanely, it also protects
    the bogus callsites of create_irq_nr() in hpet, uv, irq_remapping and
    htirq code. They need to be cleaned up as well, but that's a separate
    issue.

    Reported-by: Jin Yao
    Signed-off-by: Thomas Gleixner
    Tested-by: Mika Westerberg
    Cc: Mathias Nyman
    Cc: Linus Torvalds
    Cc: Grant Likely
    Cc: H. Peter Anvin
    Cc: Rafael J. Wysocki
    Cc: Andy Shevchenko
    Cc: Krogerus Heikki
    Cc: Linus Walleij
    Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1404241617360.28206@ionos.tec.linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
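    A sketch of the hook this introduces: a weak generic default that an
    architecture can override (the x86 symbol name below follows the wording
    of the message; the tree may spell it differently):

        /* Generic weak default: no restriction on where dynamic irq
         * descriptors may be allocated (would live in generic irq code). */
        unsigned int __weak arch_dynirq_lower_bound(unsigned int from)
        {
                return from;
        }

        /* x86-style override (would live in arch code): never hand out
         * descriptors from the hardwired GSI space, so io_apic setup cannot
         * collide with dynamically allocated ones. */
        unsigned int arch_dynirq_lower_bound(unsigned int from)
        {
                return from < nr_gsi_irqs ? nr_gsi_irqs : from;
        }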
     

21 Jan, 2014

2 commits

  • Pull timer changes from Ingo Molnar:
    - ARM clocksource/clockevent improvements and fixes
    - generic timekeeping updates: TAI fixes/improvements, cleanups
    - Posix cpu timer cleanups and improvements
    - dynticks updates: full dynticks bugfixes, optimizations and cleanups

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
    clocksource: Timer-sun5i: Switch to sched_clock_register()
    timekeeping: Remove comment that's mostly out of date
    rtc-cmos: Add an alarm disable quirk
    timekeeper: fix comment typo for tk_setup_internals()
    timekeeping: Fix missing timekeeping_update in suspend path
    timekeeping: Fix CLOCK_TAI timer/nanosleep delays
    tick/timekeeping: Call update_wall_time outside the jiffies lock
    timekeeping: Avoid possible deadlock from clock_was_set_delayed
    timekeeping: Fix potential lost pv notification of time change
    timekeeping: Fix lost updates to tai adjustment
    clocksource: sh_cmt: Add clk_prepare/unprepare support
    clocksource: bcm_kona_timer: Remove unused bcm_timer_ids
    clocksource: vt8500: Remove deprecated IRQF_DISABLED
    clocksource: tegra: Remove deprecated IRQF_DISABLED
    clocksource: misc drivers: Remove deprecated IRQF_DISABLED
    clocksource: sh_mtu2: Remove unnecessary platform_set_drvdata()
    clocksource: sh_tmu: Remove unnecessary platform_set_drvdata()
    clocksource: armada-370-xp: Enable timer divider only when needed
    clocksource: clksrc-of: Warn if no clock sources are found
    clocksource: orion: Switch to sched_clock_register()
    ...

    Linus Torvalds
     
  • Pull scheduler changes from Ingo Molnar:

    - Add the initial implementation of SCHED_DEADLINE support: a real-time
    scheduling policy where tasks that meet their deadlines and
    periodically execute their instances in less than their runtime quota
    see real-time scheduling and won't miss any of their deadlines.
    Tasks that go over their quota get delayed (Available to privileged
    users for now)

    - Clean up and fix preempt_enable_no_resched() abuse all around the
    tree

    - Do sched_clock() performance optimizations on x86 and elsewhere

    - Fix and improve auto-NUMA balancing

    - Fix and clean up the idle loop

    - Apply various cleanups and fixes

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
    sched: Fix __sched_setscheduler() nice test
    sched: Move SCHED_RESET_ON_FORK into attr::sched_flags
    sched: Fix up attr::sched_priority warning
    sched: Fix up scheduler syscall LTP fails
    sched: Preserve the nice level over sched_setscheduler() and sched_setparam() calls
    sched/core: Fix htmldocs warnings
    sched/deadline: No need to check p if dl_se is valid
    sched/deadline: Remove unused variables
    sched/deadline: Fix sparse static warnings
    m68k: Fix build warning in mac_via.h
    sched, thermal: Clean up preempt_enable_no_resched() abuse
    sched, net: Fixup busy_loop_us_clock()
    sched, net: Clean up preempt_enable_no_resched() abuse
    sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding
    sched/preempt, locking: Rework local_bh_{dis,en}able()
    sched/clock, x86: Avoid a runtime condition in native_sched_clock()
    sched/clock: Fix up clear_sched_clock_stable()
    sched/clock, x86: Use a static_key for sched_clock_stable
    sched/clock: Remove local_irq_disable() from the clocks
    sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs
    ...

    Linus Torvalds
     

16 Jan, 2014

1 commit

  • This makes the code more symmetric with the existing tick functions
    called on irq exit: tick_irq_exit() and tick_nohz_irq_exit().

    These functions are also symmetric as they mirror each other's action:
    we start to account idle time on irq exit and we stop this accounting
    on irq entry. Also the tick is stopped on irq exit and timekeeping
    catches up with the tickless time elapsed until we reach irq entry.

    This rename was suggested by Peter Zijlstra a long while ago but it
    got forgotten in the mass.

    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Alex Shi
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: John Stultz
    Cc: Kevin Hilman
    Link: http://lkml.kernel.org/r/1387320692-28460-2-git-send-email-fweisbec@gmail.com
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
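    For context, the renamed hook sits on the irq entry path, mirroring
    tick_irq_exit() on the exit path. A sketch of irq_enter() from around
    that time (details approximate):

        void irq_enter(void)
        {
                rcu_irq_enter();
                if (is_idle_task(current) && !in_interrupt()) {
                        /* Catch timekeeping up with the tickless idle time
                         * before hardirq accounting starts. */
                        local_bh_disable();
                        tick_irq_enter();   /* formerly tick_check_idle() */
                        _local_bh_enable();
                }
                __irq_enter();
        }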
     

14 Jan, 2014

1 commit

  • Currently local_bh_disable() is out-of-line for no apparent reason.
    So inline it to save a few cycles on call/return nonsense; the
    function body is a single add on x86 (a few loads and a store extra
    on load/store archs).

    Also expose two new local_bh functions:

    __local_bh_{dis,en}able_ip(unsigned long ip, unsigned int cnt);

    Which implement the actual local_bh_{dis,en}able() behaviour.

    The next patch uses the exposed @cnt argument to optimize bh lock
    functions.

    With build fixes from Jacob Pan.

    Cc: rjw@rjwysocki.net
    Cc: rui.zhang@intel.com
    Cc: jacob.jun.pan@linux.intel.com
    Cc: Mike Galbraith
    Cc: hpa@zytor.com
    Cc: Arjan van de Ven
    Cc: lenb@kernel.org
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20131119151338.GF3694@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
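    The resulting interface looks roughly like this (a sketch; _THIS_IP_ and
    the *_OFFSET constants are the kernel's existing preempt-count plumbing):

        /* The workers take an explicit ip (for tracing/lockdep) and a
         * preempt-count delta: */
        extern void __local_bh_disable_ip(unsigned long ip, unsigned int cnt);
        extern void __local_bh_enable_ip(unsigned long ip, unsigned int cnt);

        /* ...so the common wrappers become trivial inlines: */
        static inline void local_bh_disable(void)
        {
                __local_bh_disable_ip(_THIS_IP_, SOFTIRQ_DISABLE_OFFSET);
        }

        static inline void local_bh_enable(void)
        {
                __local_bh_enable_ip(_THIS_IP_, SOFTIRQ_DISABLE_OFFSET);
        }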
     

13 Jan, 2014

1 commit

  • Currently all _bh_ lock functions do two preempt_count operations:

    local_bh_disable();
    preempt_disable();

    and for the unlock:

    preempt_enable_no_resched();
    local_bh_enable();

    Since it's a waste of perfectly good cycles to modify the same variable
    twice when you can do it in one go, use the new
    __local_bh_{dis,en}able_ip() functions that allow us to provide a
    preempt_count value to add/sub.

    So define SOFTIRQ_LOCK_OFFSET as the offset a _bh_ lock needs to
    add/sub to be done in one go.

    As a bonus it gets rid of the preempt_enable_no_resched() usage.

    This reduces a 1000 loops of:

    spin_lock_bh(&bh_lock);
    spin_unlock_bh(&bh_lock);

    from 53596 cycles to 51995 cycles. I didn't do enough measurements to
    say for absolutely sure that the result is significant, but the few
    runs I did for each suggest it is.

    Reviewed-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra
    Cc: jacob.jun.pan@linux.intel.com
    Cc: Mike Galbraith
    Cc: hpa@zytor.com
    Cc: Arjan van de Ven
    Cc: lenb@kernel.org
    Cc: rjw@rjwysocki.net
    Cc: rui.zhang@intel.com
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20131119151338.GF3694@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
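    With the @cnt argument from the previous patch, a _bh_ lock can fold both
    adjustments into one preempt_count operation. A sketch of the resulting
    pattern (the exact constant making up the "preemption disabled" part of
    SOFTIRQ_LOCK_OFFSET is an assumption; lockdep annotations are elided):

        /* One delta covering "softirqs disabled" + "preemption disabled": */
        #define SOFTIRQ_LOCK_OFFSET  (SOFTIRQ_DISABLE_OFFSET + PREEMPT_OFFSET)

        static inline void __raw_spin_lock_bh(raw_spinlock_t *lock)
        {
                /* Single preempt_count update instead of
                 * local_bh_disable(); preempt_disable(); */
                __local_bh_disable_ip(_RET_IP_, SOFTIRQ_LOCK_OFFSET);
                /* ... acquire the lock ... */
        }

        static inline void __raw_spin_unlock_bh(raw_spinlock_t *lock)
        {
                /* ... release the lock ... */
                /* Single decrement instead of
                 * preempt_enable_no_resched(); local_bh_enable(); */
                __local_bh_enable_ip(_RET_IP_, SOFTIRQ_LOCK_OFFSET);
        }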
     

03 Dec, 2013

1 commit

  • A few functions use remote per CPU access APIs when they
    deal with local values.

    Just do the right conversion to improve performance, code
    readability and debug checks.

    While at it, let's extend some of these function names with a *_this_cpu()
    suffix in order to display their purpose more clearly.

    Signed-off-by: Frederic Weisbecker
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Oleg Nesterov
    Cc: Steven Rostedt

    Frederic Weisbecker
     

27 Nov, 2013

2 commits

  • Instead of saving the hardirq state in a per-CPU variable, which requires
    an explicit call before the softirq handling and some complication,
    just save and restore the hardirq tracing state through function
    return values and parameters.

    This simplifies a bit the black magic that works around the fact that
    softirqs can be called from hardirqs while hardirqs can nest on softirqs,
    but those two cases have very different semantics and only the latter
    case assumes both states.

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Peter Zijlstra
    Cc: Sebastian Andrzej Siewior
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Paul E. McKenney
    Link: http://lkml.kernel.org/r/1384906054-30676-1-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Prepare for dependent patch.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
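    A sketch of the shape the first 27 Nov entry takes (helper names as
    recalled from the patch; treat the details as an approximation):
    __do_softirq() saves the hardirq-tracing state in a local bool and hands
    it back when it is done.

        static inline bool lockdep_softirq_start(void)
        {
                bool in_hardirq = false;

                if (trace_hardirq_context(current)) {
                        /* Temporarily leave the traced hardirq context... */
                        in_hardirq = true;
                        trace_hardirq_exit();
                }
                lockdep_softirq_enter();
                return in_hardirq;
        }

        static inline void lockdep_softirq_end(bool in_hardirq)
        {
                lockdep_softirq_exit();
                if (in_hardirq)
                        /* ...and re-enter it once softirqs are done. */
                        trace_hardirq_enter();
        }

        /* In __do_softirq():
         *      in_hardirq = lockdep_softirq_start();
         *      ... handle pending softirqs ...
         *      lockdep_softirq_end(in_hardirq);
         */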
     

20 Nov, 2013

1 commit

  • There was a reported deadlock on -rt which lockdep didn't report.

    It turns out that in irq_exit() we tell lockdep that the hardirq
    context ends and then do all kinds of locking afterwards.

    To fix it, move trace_hardirq_exit() to the very end of irq_exit(), this
    ensures all locking in tick_irq_exit() and rcu_irq_exit() are properly
    recorded as happening from hardirq context.

    This however leads to the 'fun' little problem of running softirqs
    while in hardirq context. To cure this make the softirq code a little
    more complex (in the CONFIG_TRACE_IRQFLAGS case).

    Due to stack swizzling arch dependent trickery we cannot pass an
    argument to __do_softirq() to tell it if it was done from hardirq
    context or not; so use a side-band argument.

    When we do __do_softirq() from hardirq context, 'atomically' flip to
    softirq context and back, so that no locking goes without being in
    either hard- or soft-irq context.

    I didn't find any new problems in mainline using this patch, but it
    did show the -rt problem.

    Reported-by: Sebastian Andrzej Siewior
    Cc: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-dgwc5cdksbn0jk09vbmcc9sa@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

15 Nov, 2013

1 commit

  • This commit was incomplete in that code to remove items from the per-cpu
    lists was missing and never acquired a user in the 5 years it has been in
    the tree. We're going to implement what it seems to be trying to achieve
    in a simpler way, and this code is in the way of doing so.

    Signed-off-by: Christoph Hellwig
    Cc: Jan Kara
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

12 Nov, 2013

1 commit

  • Pull scheduler changes from Ingo Molnar:
    "The main changes in this cycle are:

    - (much) improved CONFIG_NUMA_BALANCING support from Mel Gorman, Rik
    van Riel, Peter Zijlstra et al. Yay!

    - optimize preemption counter handling: merge the NEED_RESCHED flag
    into the preempt_count variable, by Peter Zijlstra.

    - wait.h fixes and code reorganization from Peter Zijlstra

    - cfs_bandwidth fixes from Ben Segall

    - SMP load-balancer cleanups from Peter Zijlstra

    - idle balancer improvements from Jason Low

    - other fixes and cleanups"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits)
    ftrace, sched: Add TRACE_FLAG_PREEMPT_RESCHED
    stop_machine: Fix race between stop_two_cpus() and stop_cpus()
    sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus
    sched: Fix asymmetric scheduling for POWER7
    sched: Move completion code from core.c to completion.c
    sched: Move wait code from core.c to wait.c
    sched: Move wait.c into kernel/sched/
    sched/wait: Fix __wait_event_interruptible_lock_irq_timeout()
    sched: Avoid throttle_cfs_rq() racing with period_timer stopping
    sched: Guarantee new group-entities always have weight
    sched: Fix hrtimer_cancel()/rq->lock deadlock
    sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining
    sched: Fix race on toggling cfs_bandwidth_used
    sched: Remove extra put_online_cpus() inside sched_setaffinity()
    sched/rt: Fix task_tick_rt() comment
    sched/wait: Fix build breakage
    sched/wait: Introduce prepare_to_wait_event()
    sched/wait: Add ___wait_cond_timeout() to wait_event*_timeout() too
    sched: Remove get_online_cpus() usage
    sched: Fix race in migrate_swap_stop()
    ...

    Linus Torvalds
     

01 Oct, 2013

2 commits

  • If irq_exit() is called on the arch's specified irq stack,
    it should be safe to run softirqs inline under that same
    irq stack as it is near empty by the time we call irq_exit().

    For example if we use the same stack for both hard and soft irqs here,
    the worst case scenario is:
    hardirq -> softirq -> hardirq. But then the softirq supersedes the
    first hardirq as the stack user, since irq_exit() is called on
    a mostly empty stack, so the stack merge in this case looks acceptable.

    Stack overruns still have a chance to happen if hardirqs have more
    opportunities to nest, but then that's another problem to solve.

    So let's adapt the irq exit's softirq stack on top of a new Kconfig symbol
    that can be defined when irq_exit() runs on the irq stack. That way
    we can spare some stack switching on irq processing and all the cache
    issues that come along.

    Acked-by: Linus Torvalds
    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Paul Mackerras
    Cc: James Hogan
    Cc: James E.J. Bottomley
    Cc: Helge Deller
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: David S. Miller
    Cc: Andrew Morton

    Frederic Weisbecker
     
  • For clarity, comment the various stack choices for softirq
    processing, whether we execute them from ksoftirqd or from
    local_irq_enable() calls.

    Their use in irq_exit() is already commented.

    Acked-by: Linus Torvalds
    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Paul Mackerras
    Cc: James Hogan
    Cc: James E.J. Bottomley
    Cc: Helge Deller
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: David S. Miller
    Cc: Andrew Morton

    Frederic Weisbecker
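    The first 01 Oct entry above translates into a choice in invoke_softirq()
    (a sketch; the arch/Kconfig symbol and the own-stack helper name are
    assumptions based on the description):

        static inline void invoke_softirq(void)
        {
                if (!force_irqthreads) {
        #ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
                        /* irq_exit() already runs on the (near empty) irq
                         * stack, so softirqs can safely run inline here. */
                        __do_softirq();
        #else
                        /* Otherwise irq_exit() runs on the task stack, which
                         * may already be deep, so switch to a dedicated
                         * softirq stack to avoid overruns. */
                        do_softirq_own_stack();
        #endif
                } else {
                        wakeup_softirqd();
                }
        }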