15 Jul, 2013

2 commits

  • The __cpuinit type of throwaway sections might have made sense
    some time ago when RAM was more constrained, but now the savings
    do not offset the cost and complications. For example, the fix in
    commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
    shows the nasty type of bugs that can be created with improper use
    of the various __init prefixes.

    After a discussion on LKML[1] it was decided that cpuinit should go
    the way of devinit and be phased out. Once all the users are gone,
    we can then finally remove the macros themselves from linux/init.h.

    This removes all the uses of the __cpuinit macros from C files in
    the core kernel directories (kernel, init, lib, mm, and include)
    that don't really have a specific maintainer.

    [1] https://lkml.org/lkml/2013/5/20/589

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     
  • Pull slab update from Pekka Enberg:
    "Highlights:

    - Fix for boot-time problems on some architectures due to
    init_lock_keys() not respecting kmalloc_caches boundaries
    (Christoph Lameter)

    - CONFIG_SLUB_CPU_PARTIAL requested by RT folks (Joonsoo Kim)

    - Fix for excessive slab freelist draining (Wanpeng Li)

    - SLUB and SLOB cleanups and fixes (various people)"

    I ended up editing the branch, and this avoids two commits at the end
    that were immediately reverted, and I instead just applied the oneliner
    fix in between myself.

    * 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
    slub: Check for page NULL before doing the node_match check
    mm/slab: Give s_next and s_stop slab-specific names
    slob: Check for NULL pointer before calling ctor()
    slub: Make cpu partial slab support configurable
    slab: add kmalloc() to kernel API documentation
    slab: fix init_lock_keys
    slob: use DIV_ROUND_UP where possible
    slub: do not put a slab to cpu partial list when cpu_partial is 0
    mm/slub: Use node_nr_slabs and node_nr_objs in get_slabinfo
    mm/slub: Drop unnecessary nr_partials
    mm/slab: Fix /proc/slabinfo unwriteable for slab
    mm/slab: Sharing s_next and s_stop between slab and slub
    mm/slab: Fix drain freelist excessively
    slob: Rework #ifdeffery in slab.h
    mm, slab: moved kmem_cache_alloc_node comment to correct place

    Linus Torvalds
     

10 Jul, 2013

1 commit

  • Add support for extracting LZ4-compressed kernel images, as well as
    LZ4-compressed ramdisk images in the kernel boot process.

    Signed-off-by: Kyungsik Lee
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Russell King
    Cc: Borislav Petkov
    Cc: Florian Fainelli
    Cc: Yann Collet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kyungsik Lee
     

08 Jul, 2013

1 commit

  • CPU partial support can introduce a level of indeterminism that is not
    wanted in certain contexts (like a realtime kernel). Make it
    configurable.

    This patch is based on Christoph Lameter's "slub: Make cpu partial slab
    support configurable V2".

    Acked-by: Christoph Lameter
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim
     

07 Jul, 2013

1 commit

  • Pull timer core updates from Thomas Gleixner:
    "The timer changes contain:

    - posix timer code consolidation and fixes for odd corner cases

    - sched_clock implementation moved from ARM to core code to avoid
    duplication by other architectures

    - alarm timer updates

    - clocksource and clockevents unregistration facilities

    - clocksource/events support for new hardware

    - precise nanoseconds RTC readout (Xen feature)

    - generic support for Xen suspend/resume oddities

    - the usual lot of fixes and cleanups all over the place

    The parts which touch other areas (ARM/XEN) have been coordinated with
    the relevant maintainers. Though this results in a handful of
    trivial to solve merge conflicts, which we preferred over nasty cross
    tree merge dependencies.

    The patches which have been committed in the last few days are bug
    fixes plus the posix timer lot. The latter was in akpm's queue and
    next for quite some time; they just got forgotten and Frederic
    collected them last minute."

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits)
    hrtimer: Remove unused variable
    hrtimers: Move SMP function call to thread context
    clocksource: Reselect clocksource when watchdog validated high-res capability
    posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
    posix_timers: fix racy timer delta caching on task exit
    posix-timers: correctly get dying task time sample in posix_cpu_timer_schedule()
    selftests: add basic posix timers selftests
    posix_cpu_timers: consolidate expired timers check
    posix_cpu_timers: consolidate timer list cleanups
    posix_cpu_timer: consolidate expiry time type
    tick: Sanitize broadcast control logic
    tick: Prevent uncontrolled switch to oneshot mode
    tick: Make oneshot broadcast robust vs. CPU offlining
    x86: xen: Sync the CMOS RTC as well as the Xen wallclock
    x86: xen: Sync the wallclock when the system time is set
    timekeeping: Indicate that clock was set in the pvclock gtod notifier
    timekeeping: Pass flags instead of multiple bools to timekeeping_update()
    xen: Remove clock_was_set() call in the resume path
    hrtimers: Support resuming with two or more CPUs online (but stopped)
    timer: Fix jiffies wrap behavior of round_jiffies_common()
    ...

    Linus Torvalds
     

05 Jul, 2013

1 commit


04 Jul, 2013

3 commits

  • Trivial, but it really looks better.

    Signed-off-by: Toralf Förster
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toralf Förster
     
  • do_one_initcall() uses a 64 byte string buffer to save a message. This
    buffer is declared static and is only used at boot up and when a module
    is loaded. As 64 bytes is very small, and this function has very limited
    scope, there's no reason to waste permanent memory with this string and
    not just simply put it on the stack.

    Signed-off-by: Steven Rostedt
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Rostedt
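
    The trade-off described in this commit can be sketched in plain C. This
    is a minimal userspace illustration, not the kernel's code:
    check_initcall(), its parameters, and the message strings are
    hypothetical stand-ins. The point is only that a small, short-lived
    message buffer can live on the stack instead of in static storage.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MSGBUF_LEN 64   /* same size as the buffer the commit describes */

/*
 * Hypothetical stand-in for the checking done after an initcall returns.
 * Returns 1 if a problem was reported, 0 otherwise.
 */
static int check_initcall(const char *name, int preempt_imbalance,
                          int irqs_disabled, FILE *log)
{
    /*
     * Stack buffer: it exists only for the duration of this call, so no
     * memory stays reserved once boot or module loading is done. A
     * "static char msgbuf[64]" would occupy 64 bytes permanently.
     */
    char msgbuf[MSGBUF_LEN];

    assert(name != NULL && log != NULL);

    msgbuf[0] = '\0';
    if (preempt_imbalance)
        snprintf(msgbuf, sizeof(msgbuf), "preemption imbalance ");
    if (irqs_disabled) {
        size_t used = strlen(msgbuf);

        /* Append after whatever is already in the buffer. */
        snprintf(msgbuf + used, sizeof(msgbuf) - used, "disabled interrupts ");
    }

    if (msgbuf[0] == '\0')
        return 0;

    fprintf(log, "initcall %s returned with %s\n", name, msgbuf);
    return 1;
}
```

    A caller would use it as check_initcall("demo_init", 1, 0, stderr). The
    stack version is also reentrant, which a shared static buffer is not.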
     
  • Now there are only 2 members in struct page_cgroup. Update config MEMCG
    description accordingly.

    Signed-off-by: Sergey Dyasly
    Acked-by: Michal Hocko
    Acked-by: KOSAKI Motohiro
    Acked-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Dyasly
     

03 Jul, 2013

2 commits

  • Pull perf updates from Ingo Molnar:
    "Kernel improvements:

    - watchdog driver improvements by Li Zefan
    - Power7 CPI stack events related improvements by Sukadev Bhattiprolu
    - event multiplexing via hrtimers and other improvements by Stephane
    Eranian
    - kernel stack use optimization by Andrew Hunter
    - AMD IOMMU uncore PMU support by Suravee Suthikulpanit
    - NMI handling rate-limits by Dave Hansen
    - various hw_breakpoint fixes by Oleg Nesterov
    - hw_breakpoint overflow period sampling and related signal handling
    fixes by Jiri Olsa
    - Intel Haswell PMU support by Andi Kleen

    Tooling improvements:

    - Reset SIGTERM handler in workload child process, fix from David
    Ahern.
    - Makefile reorganization, prep work for Kconfig patches, from Jiri
    Olsa.
    - Add automated make test suite, from Jiri Olsa.
    - Add --percent-limit option to 'top' and 'report', from Namhyung
    Kim.
    - Sorting improvements, from Namhyung Kim.
    - Expand definition of sysfs format attribute, from Michael Ellerman.

    Tooling fixes:

    - 'perf tests' fixes from Jiri Olsa.
    - Make Power7 CPI stack events available in sysfs, from Sukadev
    Bhattiprolu.
    - Handle death by SIGTERM in 'perf record', fix from David Ahern.
    - Fix printing of perf_event_paranoid message, from David Ahern.
    - Handle realloc failures in 'perf kvm', from David Ahern.
    - Fix divide by 0 in variance, from David Ahern.
    - Save parent pid in thread struct, from David Ahern.
    - Handle JITed code in shared memory, from Andi Kleen.
    - Fixes for 'perf diff', from Jiri Olsa.
    - Remove some unused struct members, from Jiri Olsa.
    - Add missing liblk.a dependency for python/perf.so, fix from Jiri
    Olsa.
    - Respect CROSS_COMPILE in liblk.a, from Rabin Vincent.
    - No need to do locking when adding hists in perf report, only 'top'
    needs that, from Namhyung Kim.
    - Fix alignment of symbol column in the hists browser (top,
    report) when -v is given, from Namhyung Kim.
    - Fix 'perf top' -E option behavior, from Namhyung Kim.
    - Fix bug in isupper() and islower(), from Sukadev Bhattiprolu.
    - Fix compile errors in bp_signal 'perf test', from Sukadev
    Bhattiprolu.

    ... and more things"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (102 commits)
    perf/x86: Disable PEBS-LL in intel_pmu_pebs_disable()
    perf/x86: Fix shared register mutual exclusion enforcement
    perf/x86/intel: Support full width counting
    x86: Add NMI duration tracepoints
    perf: Drop sample rate when sampling is too slow
    x86: Warn when NMI handlers take large amounts of time
    hw_breakpoint: Introduce "struct bp_cpuinfo"
    hw_breakpoint: Simplify *register_wide_hw_breakpoint()
    hw_breakpoint: Introduce cpumask_of_bp()
    hw_breakpoint: Simplify the "weight" usage in toggle_bp_slot() paths
    hw_breakpoint: Simplify list/idx mess in toggle_bp_slot() paths
    perf/x86/intel: Add mem-loads/stores support for Haswell
    perf/x86/intel: Support Haswell/v4 LBR format
    perf/x86/intel: Move NMI clearing to end of PMI handler
    perf/x86/intel: Add Haswell PEBS support
    perf/x86/intel: Add simple Haswell PMU support
    perf/x86/intel: Add Haswell PEBS record support
    perf/x86/intel: Fix sparse warning
    perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation
    perf/x86/amd: Add IOMMU Performance Counter resource management
    ...

    Linus Torvalds
     
  • Pull RCU updates from Ingo Molnar:
    "The major changes:

    - Simplify RCU's grace-period and callback processing based on the new
    numbering for callbacks.

    - Removal of TINY_PREEMPT_RCU in favor of TREE_PREEMPT_RCU for
    single-CPU low-latency systems.

    - SRCU-related changes and fixes.

    - Miscellaneous fixes, including converting a few remaining printk()
    calls to pr_*().

    - Documentation updates"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
    rcu: Shrink TINY_RCU by reworking CPU-stall ifdefs
    rcu: Shrink TINY_RCU by moving exit_rcu()
    rcu: Remove TINY_PREEMPT_RCU tracing documentation
    rcu: Consolidate rcutiny_plugin.h ifdefs
    rcu: Remove rcu_preempt_note_context_switch()
    rcu: Remove the CONFIG_TINY_RCU ifdefs in rcutiny.h
    rcu: Remove check_cpu_stall_preempt()
    rcu: Simplify RCU_TINY RCU callback invocation
    rcu: Remove rcu_preempt_process_callbacks()
    rcu: Remove rcu_preempt_remove_callbacks()
    rcu: Remove rcu_preempt_check_callbacks()
    rcu: Remove show_tiny_preempt_stats()
    rcu: Remove TINY_PREEMPT_RCU
    powerpc,kvm: fix imbalance srcu_read_[un]lock()
    rcu: Remove srcu_read_lock_raw() and srcu_read_unlock_raw().
    rcu: Apply Dave Jones's NOCB Kconfig help feedback
    rcu: Merge adjacent identical ifdefs
    rcu: Drive quiescent-state-forcing delay from HZ
    rcu: Remove "Experimental" flags
    kthread: Add kworker kthreads to OS-jitter documentation
    ...

    Linus Torvalds
     

25 Jun, 2013

1 commit

  • Some drivers can be built on more platforms than they run on. This is
    a burden for users and distributors who package a kernel. They have to
    manually deselect some (for them useless) drivers when updating their
    configs via oldconfig. And yet, sometimes it is even impossible to
    disable the drivers without patching the kernel.

    Introduce a new config option COMPILE_TEST and make all those drivers
    depend on the platform they run on, or on the COMPILE_TEST option.
    Now, when users/distributors choose COMPILE_TEST=n they will not have
    the drivers in their allmodconfig setups, but developers still can
    compile-test them with COMPILE_TEST=y.

    Now the drivers where we use this new option:
    * PTP_1588_CLOCK_PCH: The PCH EG20T is only compatible with Intel Atom
    processors so it should depend on x86.
    * FB_GEODE: Geode is 32-bit only so only enable it for X86_32.
    * USB_CHIPIDEA_IMX: The OF_DEVICE dependency will be met on powerpc
    systems -- which do not actually support the hardware via that
    method.
    * INTEL_MID_PTI: It is specific to the Penwell type of Intel Atom
    device.

    [v2]
    * remove EXPERT dependency

    [gregkh - remove chipidea portion, as it's incorrect, and also doesn't
    apply to my driver-core tree]

    Signed-off-by: Jiri Slaby
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Jeff Mahoney
    Cc: Alexander Shishkin
    Cc: linux-usb@vger.kernel.org
    Cc: Florian Tobias Schandinat
    Cc: linux-geode@lists.infradead.org
    Cc: linux-fbdev@vger.kernel.org
    Cc: Richard Cochran
    Cc: netdev@vger.kernel.org
    Cc: Ben Hutchings
    Cc: "Keller, Jacob E"
    Signed-off-by: Greg Kroah-Hartman

    Jiri Slaby
     

18 Jun, 2013

1 commit


13 Jun, 2013

1 commit


11 Jun, 2013

5 commits

  • …u.2013.06.10a' and 'tiny.2013.06.10a' into HEAD

    cbnum.2013.06.10a: Apply simplifications stemming from the new callback
    numbering.

    doc.2013.06.10a: Documentation updates.

    fixes.2013.06.10a: Miscellaneous fixes.

    srcu.2013.06.10a: Updates to SRCU.

    tiny.2013.06.10a: Eliminate TINY_PREEMPT_RCU.

    Paul E. McKenney
     
  • TINY_PREEMPT_RCU adds significant code and complexity, but does not
    offer commensurate benefits. People currently using TINY_PREEMPT_RCU
    can get much better memory footprint with TINY_RCU, or, if they really
    need preemptible RCU, they can use TREE_PREEMPT_RCU with a relatively
    minor degradation in memory footprint. Please note that this move
    has been widely publicized on LKML (https://lkml.org/lkml/2012/11/12/545)
    and on LWN (http://lwn.net/Articles/541037/).

    This commit therefore removes TINY_PREEMPT_RCU.

    Signed-off-by: Paul E. McKenney
    [ paulmck: Updated to eliminate #else in rcutiny.h as suggested by Josh ]
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The Kconfig help text for the RCU_NOCB_CPU_NONE, RCU_NOCB_CPU_ZERO,
    and RCU_NOCB_CPU_ALL Kconfig options was unclear, so this commit
    adds a bit more detail.

    Reported-by: Dave Jones
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • After a release or two, features are no longer experimental. Therefore,
    this commit removes the "Experimental" tag from them.

    Reported-by: Paul Gortmaker
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • This commit fixes a lockdep-detected deadlock by moving a wake_up()
    call out from a rnp->lock critical section. Please see below for
    the long version of this story.

    On Tue, 2013-05-28 at 16:13 -0400, Dave Jones wrote:

    > [12572.705832] ======================================================
    > [12572.750317] [ INFO: possible circular locking dependency detected ]
    > [12572.796978] 3.10.0-rc3+ #39 Not tainted
    > [12572.833381] -------------------------------------------------------
    > [12572.862233] trinity-child17/31341 is trying to acquire lock:
    > [12572.870390] (rcu_node_0){..-.-.}, at: [] rcu_read_unlock_special+0x9f/0x4c0
    > [12572.878859]
    > but task is already holding lock:
    > [12572.894894] (&ctx->lock){-.-...}, at: [] perf_lock_task_context+0x7d/0x2d0
    > [12572.903381]
    > which lock already depends on the new lock.
    >
    > [12572.927541]
    > the existing dependency chain (in reverse order) is:
    > [12572.943736]
    > -> #4 (&ctx->lock){-.-...}:
    > [12572.960032] [] lock_acquire+0x91/0x1f0
    > [12572.968337] [] _raw_spin_lock+0x40/0x80
    > [12572.976633] [] __perf_event_task_sched_out+0x2e7/0x5e0
    > [12572.984969] [] perf_event_task_sched_out+0x93/0xa0
    > [12572.993326] [] __schedule+0x2cf/0x9c0
    > [12573.001652] [] schedule_user+0x2e/0x70
    > [12573.009998] [] retint_careful+0x12/0x2e
    > [12573.018321]
    > -> #3 (&rq->lock){-.-.-.}:
    > [12573.034628] [] lock_acquire+0x91/0x1f0
    > [12573.042930] [] _raw_spin_lock+0x40/0x80
    > [12573.051248] [] wake_up_new_task+0xb7/0x260
    > [12573.059579] [] do_fork+0x105/0x470
    > [12573.067880] [] kernel_thread+0x26/0x30
    > [12573.076202] [] rest_init+0x23/0x140
    > [12573.084508] [] start_kernel+0x3f1/0x3fe
    > [12573.092852] [] x86_64_start_reservations+0x2a/0x2c
    > [12573.101233] [] x86_64_start_kernel+0xcc/0xcf
    > [12573.109528]
    > -> #2 (&p->pi_lock){-.-.-.}:
    > [12573.125675] [] lock_acquire+0x91/0x1f0
    > [12573.133829] [] _raw_spin_lock_irqsave+0x4b/0x90
    > [12573.141964] [] try_to_wake_up+0x31/0x320
    > [12573.150065] [] default_wake_function+0x12/0x20
    > [12573.158151] [] autoremove_wake_function+0x18/0x40
    > [12573.166195] [] __wake_up_common+0x58/0x90
    > [12573.174215] [] __wake_up+0x39/0x50
    > [12573.182146] [] rcu_start_gp_advanced.isra.11+0x4a/0x50
    > [12573.190119] [] rcu_start_future_gp+0x1c9/0x1f0
    > [12573.198023] [] rcu_nocb_kthread+0x114/0x930
    > [12573.205860] [] kthread+0xed/0x100
    > [12573.213656] [] ret_from_fork+0x7c/0xb0
    > [12573.221379]
    > -> #1 (&rsp->gp_wq){..-.-.}:
    > [12573.236329] [] lock_acquire+0x91/0x1f0
    > [12573.243783] [] _raw_spin_lock_irqsave+0x4b/0x90
    > [12573.251178] [] __wake_up+0x23/0x50
    > [12573.258505] [] rcu_start_gp_advanced.isra.11+0x4a/0x50
    > [12573.265891] [] rcu_start_future_gp+0x1c9/0x1f0
    > [12573.273248] [] rcu_nocb_kthread+0x114/0x930
    > [12573.280564] [] kthread+0xed/0x100
    > [12573.287807] [] ret_from_fork+0x7c/0xb0

    Notice the above call chain.

    rcu_start_future_gp() is called with the rnp->lock held. Then it calls
    rcu_start_gp_advanced(), which does a wakeup.

    You can't do wakeups while holding the rnp->lock, as that would mean
    that you could not do an rcu_read_unlock() while holding the rq lock, or
    any lock that was taken while holding the rq lock. This is because...
    (See below).

    > [12573.295067]
    > -> #0 (rcu_node_0){..-.-.}:
    > [12573.309293] [] __lock_acquire+0x1786/0x1af0
    > [12573.316568] [] lock_acquire+0x91/0x1f0
    > [12573.323825] [] _raw_spin_lock+0x40/0x80
    > [12573.331081] [] rcu_read_unlock_special+0x9f/0x4c0
    > [12573.338377] [] __rcu_read_unlock+0x96/0xa0
    > [12573.345648] [] perf_lock_task_context+0x143/0x2d0
    > [12573.352942] [] find_get_context+0x4e/0x1f0
    > [12573.360211] [] SYSC_perf_event_open+0x514/0xbd0
    > [12573.367514] [] SyS_perf_event_open+0x9/0x10
    > [12573.374816] [] tracesys+0xdd/0xe2

    Notice the above trace.

    perf took its own ctx->lock, which can be taken while holding the rq
    lock. While holding this lock, it did an rcu_read_unlock(). The
    perf_lock_task_context() basically looks like:

    rcu_read_lock();
    raw_spin_lock(ctx->lock);
    rcu_read_unlock();

    Now, what looks to have happened, is that we scheduled after taking that
    first rcu_read_lock() but before taking the spin lock. When we scheduled
    back in and took the ctx->lock, the following rcu_read_unlock()
    triggered the "special" code.

    The rcu_read_unlock_special() takes the rnp->lock, which gives us a
    possible deadlock scenario.

    CPU0                CPU1                        CPU2
    ----                ----                        ----

                                                    rcu_nocb_kthread()
    lock(rq->lock);
                        lock(ctx->lock);
                                                    lock(rnp->lock);

                                                    wake_up();

                                                    lock(rq->lock);

                        rcu_read_unlock();

                        rcu_read_unlock_special();

                        lock(rnp->lock);
    lock(ctx->lock);

    **** DEADLOCK ****

    > [12573.382068]
    > other info that might help us debug this:
    >
    > [12573.403229] Chain exists of:
    > rcu_node_0 --> &rq->lock --> &ctx->lock
    >
    > [12573.424471] Possible unsafe locking scenario:
    >
    > [12573.438499]        CPU0                    CPU1
    > [12573.445599]        ----                    ----
    > [12573.452691]   lock(&ctx->lock);
    > [12573.459799]                                lock(&rq->lock);
    > [12573.467010]                                lock(&ctx->lock);
    > [12573.474192]   lock(rcu_node_0);
    > [12573.481262]
    > *** DEADLOCK ***
    >
    > [12573.501931] 1 lock held by trinity-child17/31341:
    > [12573.508990] #0: (&ctx->lock){-.-...}, at: [] perf_lock_task_context+0x7d/0x2d0
    > [12573.516475]
    > stack backtrace:
    > [12573.530395] CPU: 1 PID: 31341 Comm: trinity-child17 Not tainted 3.10.0-rc3+ #39
    > [12573.545357] ffffffff825b4f90 ffff880219f1dbc0 ffffffff816e375b ffff880219f1dc00
    > [12573.552868] ffffffff816dfa5d ffff880219f1dc50 ffff88023ce4d1f8 ffff88023ce4ca40
    > [12573.560353] 0000000000000001 0000000000000001 ffff88023ce4d1f8 ffff880219f1dcc0
    > [12573.567856] Call Trace:
    > [12573.575011] [] dump_stack+0x19/0x1b
    > [12573.582284] [] print_circular_bug+0x200/0x20f
    > [12573.589637] [] __lock_acquire+0x1786/0x1af0
    > [12573.596982] [] ? sched_clock_cpu+0xb5/0x100
    > [12573.604344] [] lock_acquire+0x91/0x1f0
    > [12573.611652] [] ? rcu_read_unlock_special+0x9f/0x4c0
    > [12573.619030] [] _raw_spin_lock+0x40/0x80
    > [12573.626331] [] ? rcu_read_unlock_special+0x9f/0x4c0
    > [12573.633671] [] rcu_read_unlock_special+0x9f/0x4c0
    > [12573.640992] [] ? perf_lock_task_context+0x7d/0x2d0
    > [12573.648330] [] ? put_lock_stats.isra.29+0xe/0x40
    > [12573.655662] [] ? delay_tsc+0x90/0xe0
    > [12573.662964] [] __rcu_read_unlock+0x96/0xa0
    > [12573.670276] [] perf_lock_task_context+0x143/0x2d0
    > [12573.677622] [] ? __perf_event_enable+0x370/0x370
    > [12573.684981] [] find_get_context+0x4e/0x1f0
    > [12573.692358] [] SYSC_perf_event_open+0x514/0xbd0
    > [12573.699753] [] ? get_parent_ip+0xd/0x50
    > [12573.707135] [] ? trace_hardirqs_on_caller+0xfd/0x1c0
    > [12573.714599] [] SyS_perf_event_open+0x9/0x10
    > [12573.721996] [] tracesys+0xdd/0xe2

    This commit delays the wakeup via irq_work(), which is what
    perf and ftrace use to perform wakeups in critical sections.

    Reported-by: Dave Jones
    Signed-off-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney

    Steven Rostedt
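
    The shape of the fix (record that a wakeup is needed while the lock is
    held, perform it only after the lock is dropped) can be sketched in
    userspace C. This is a simplified single-threaded model under stated
    assumptions: every type and function name below is hypothetical, and
    the deferral is modeled as "run after unlock", whereas the commit
    itself defers through irq_work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are kernel APIs. */
struct node_lock { bool held; };
struct waitq { int wakeups; };
struct deferred_wake { bool pending; struct waitq *wq; };

static void node_lock_acquire(struct node_lock *l)
{
    assert(!l->held);
    l->held = true;
}

static void node_lock_release(struct node_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* The rule from the commit message: never wake up under the node lock. */
static void wake_up_waiters(struct waitq *wq, const struct node_lock *l)
{
    assert(!l->held);
    wq->wakeups++;
}

/* Called with the lock held: only notes that a wakeup is wanted. */
static void request_wake(struct deferred_wake *dw, struct waitq *wq)
{
    dw->pending = true;
    dw->wq = wq;
}

/* Called once the lock has been dropped: performs the deferred wakeup. */
static void run_deferred_wake(struct deferred_wake *dw,
                              const struct node_lock *l)
{
    if (!dw->pending)
        return;
    dw->pending = false;
    wake_up_waiters(dw->wq, l);
}

/* Model of a grace-period start; returns the total wakeups issued so far. */
static int start_grace_period(struct node_lock *l, struct waitq *wq)
{
    struct deferred_wake dw = { false, NULL };

    node_lock_acquire(l);
    request_wake(&dw, wq);      /* no wakeup inside the critical section */
    node_lock_release(l);

    run_deferred_wake(&dw, l);  /* wakeup happens with the lock dropped */
    return wq->wakeups;
}
```

    Calling wake_up_waiters() between node_lock_acquire() and
    node_lock_release() would trip the assertion, which mirrors the
    lock-ordering problem lockdep reported above.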
     

04 Jun, 2013

1 commit

  • Ever since commit 45f035ab9b8f ("CONFIG_HOTPLUG should be always on"),
    it has been basically impossible to build a kernel with CONFIG_HOTPLUG
    turned off. Remove all the remaining references to it.

    Cc: Russell King
    Cc: Doug Thompson
    Cc: Bjorn Helgaas
    Cc: Steven Whitehouse
    Cc: Arnd Bergmann
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: Andrew Morton
    Signed-off-by: Stephen Rothwell
    Acked-by: Mauro Carvalho Chehab
    Acked-by: Hans Verkuil
    Signed-off-by: Greg Kroah-Hartman

    Stephen Rothwell
     

28 May, 2013

1 commit

  • The current scheme of using the timer tick was fine for per-thread
    events. However, it was causing bias issues in system-wide mode
    (including for uncore PMUs). Event groups would not get their fair
    share of runtime on the PMU. With tickless kernels, if a core is idle
    there is no timer tick, and thus no event rotation (multiplexing).
    However, there are events (especially uncore events) which do count
    even though cores are asleep.

    This patch changes the timer source for multiplexing. It introduces a
    per-PMU per-cpu hrtimer. The advantage is that even when a core goes
    idle, it will come back to service the hrtimer, thus multiplexing on
    system-wide events works much better.

    The per-PMU implementation (suggested by PeterZ) enables adjusting the
    multiplexing interval per PMU. The preferred interval is stashed into
    the struct pmu. If not set, it will be forced to the default interval
    value.

    In order to minimize the impact of the hrtimer, it is turned on and
    off on demand. When the PMU on a CPU is overcommitted, the hrtimer is
    activated. It is stopped when the PMU is not overcommitted.

    In order for this to work properly, we had to change the order of
    initialization in start_kernel() such that hrtimer_init() is run
    before perf_event_init().

    The default interval in milliseconds is set to a timer tick just like
    with the old code. We will provide a sysctl to tune this in another
    patch.

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Arnaldo Carvalho de Melo
    Link: http://lkml.kernel.org/r/1364991694-5876-2-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar

    Stephane Eranian
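
    The on-demand behaviour described in this commit can be modeled with a
    few lines of C. This is a sketch, not the perf implementation: the
    struct and function names are hypothetical, "overcommitted" is assumed
    to mean more events than hardware counters, and DEMO_HZ is an assumed
    tick rate used only to derive a default interval of one timer tick.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_HZ 250                                 /* assumed tick rate */
#define DEFAULT_MUX_INTERVAL_MS (1000 / DEMO_HZ)    /* one timer tick */

/* Hypothetical model of a PMU: preferred interval and counter count. */
struct pmu_model {
    int mux_interval_ms;    /* 0 means "not set" */
    int nr_counters;
};

/* Hypothetical per-CPU context for one PMU. */
struct cpu_ctx {
    const struct pmu_model *pmu;
    int nr_events;          /* events currently scheduled on this PMU */
    int interval_ms;        /* interval in use while the timer runs */
    bool timer_active;
};

/* Use the PMU's preferred interval, falling back to the default. */
static int effective_interval_ms(const struct pmu_model *pmu)
{
    return pmu->mux_interval_ms > 0 ? pmu->mux_interval_ms
                                    : DEFAULT_MUX_INTERVAL_MS;
}

/* Start the timer when overcommitted, stop it when no longer needed. */
static void ctx_update_timer(struct cpu_ctx *c)
{
    bool overcommitted;

    assert(c != NULL && c->pmu != NULL);
    overcommitted = c->nr_events > c->pmu->nr_counters;

    if (overcommitted && !c->timer_active) {
        c->interval_ms = effective_interval_ms(c->pmu);
        c->timer_active = true;
    } else if (!overcommitted && c->timer_active) {
        c->timer_active = false;
    }
}
```

    Keeping the interval in the PMU description is what allows a different
    multiplexing rate per PMU, as the commit message notes.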
     

06 May, 2013

1 commit

  • Pull 'full dynticks' support from Ingo Molnar:
    "This tree from Frederic Weisbecker adds a new, (exciting! :-) core
    kernel feature to the timer and scheduler subsystems: 'full dynticks',
    or CONFIG_NO_HZ_FULL=y.

    This feature extends the nohz variable-size timer tick feature from
    idle to busy CPUs (running at most one task) as well, potentially
    reducing the number of timer interrupts significantly.

    This feature got motivated by real-time folks and the -rt tree, but
    the general utility and motivation of full-dynticks runs wider than
    that:

    - HPC workloads get faster: CPUs running a single task should be able
    to utilize a maximum amount of CPU power. A periodic timer tick at
    HZ=1000 can cause a constant overhead of up to 1.0%. This feature
    removes that overhead - and speeds up the system by 0.5%-1.0% on
    typical distro configs even on modern systems.

    - Real-time workload latency reduction: CPUs running critical tasks
    should experience as little jitter as possible. The last remaining
    source of kernel-related jitter was the periodic timer tick.

    - A single task executing on a CPU is a pretty common situation,
    especially with an increasing number of cores/CPUs, so this feature
    helps desktop and mobile workloads as well.

    The cost of the feature is mainly related to increased timer
    reprogramming overhead when a CPU switches its tick period, and thus
    slightly longer to-idle and from-idle latency.

    Configuration-wise a third mode of operation is added to the existing
    two NOHZ kconfig modes:

    - CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named
    as a config option. This is the traditional Linux periodic tick
    design: there's a HZ tick going on all the time, regardless of
    whether a CPU is idle or not.

    - CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the
    periodic tick when a CPU enters idle mode.

    - CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the
    tick when a CPU is idle, also slows the tick down to 1 Hz (one
    timer interrupt per second) when only a single task is running on a
    CPU.

    The .config behavior is compatible: existing !CONFIG_NO_HZ and
    CONFIG_NO_HZ=y settings get translated to the new values, without the
    user having to configure anything. CONFIG_NO_HZ_FULL is turned off by
    default.

    This feature is based on a lot of infrastructure work that has been
    steadily going upstream in the last 2-3 cycles: related RCU support
    and non-periodic cputime support in particular is upstream already.

    This tree adds the final pieces and activates the feature. The pull
    request is marked RFC because:

    - it's marked 64-bit only at the moment - the 32-bit support patch is
    small but did not get ready in time.

    - it has a number of fresh commits that came in after the merge
    window. The overwhelming majority of commits are from before the
    merge window, but still some aspects of the tree are fresh and so I
    marked it RFC.

    - it's a pretty wide-reaching feature with lots of effects - and
    while the components have been in testing for some time, the full
    combination is still not very widely used. That it's default-off
    should reduce its regression abilities and obviously there are no
    known regressions with CONFIG_NO_HZ_FULL=y enabled either.

    - the feature is not completely idempotent: there is no 100%
    equivalent replacement for a periodic scheduler/timer tick. In
    particular there's ongoing work to map out and reduce its effects
    on scheduler load-balancing and statistics. This should not impact
    correctness though, there are no known regressions related to this
    feature at this point.

    - it's a pretty ambitious feature that with time will likely be
    enabled by most Linux distros, and we'd like your input on
    its design/implementation, if you dislike some aspect we missed.
    Without flaming us to a crisp! :-)

    Future plans:

    - there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off
    the periodic tick altogether when there's a single busy task on a
    CPU. We'd first like 1 Hz to be exposed more widely before we go
    for the 0 Hz target though.

    - once we reach 0 Hz we can remove the periodic tick assumption from
    nr_running>=2 as well, by essentially interrupting busy tasks only
    as frequently as the sched_latency constraints require us to do -
    once every 4-40 msecs, depending on nr_running.

    I am personally leaning towards biting the bullet and doing this in
    v3.10; like the -rt tree, this effort has been going on for too long -
    but the final word is up to you as usual.

    More technical details can be found in Documentation/timers/NO_HZ.txt"

    * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
    sched: Keep at least 1 tick per second for active dynticks tasks
    rcu: Fix full dynticks' dependency on wide RCU nocb mode
    nohz: Protect smp_processor_id() in tick_nohz_task_switch()
    nohz_full: Add documentation.
    cputime_nsecs: use math64.h for nsec resolution conversion helpers
    nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config
    nohz: Reduce overhead under high-freq idling patterns
    nohz: Remove full dynticks' superfluous dependency on RCU tree
    nohz: Fix unavailable tick_stop tracepoint in dynticks idle
    nohz: Add basic tracing
    nohz: Select wide RCU nocb for full dynticks
    nohz: Disable the tick when irq resume in full dynticks CPU
    nohz: Re-evaluate the tick for the new task after a context switch
    nohz: Prepare to stop the tick on irq exit
    nohz: Implement full dynticks kick
    nohz: Re-evaluate the tick from the scheduler IPI
    sched: New helper to prevent from stopping the tick in full dynticks
    sched: Kick full dynticks CPU that have more than one task enqueued.
    perf: New helper to prevent full dynticks CPUs from stopping tick
    perf: Kick full dynticks CPU if events rotation is needed
    ...

    Linus Torvalds
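
    The three modes listed in the pull request can be summarized as a small
    decision function. This is an illustrative model only: the enum, the
    function name, and DEMO_HZ are hypothetical, and the real kernel makes
    this decision from many more inputs than the number of runnable tasks.

```c
#include <assert.h>

#define DEMO_HZ 1000    /* assumed periodic tick rate */

/* Hypothetical labels for the three Kconfig modes named above. */
enum tick_mode {
    TICK_HZ_PERIODIC,   /* CONFIG_HZ_PERIODIC */
    TICK_NO_HZ_IDLE,    /* CONFIG_NO_HZ_IDLE */
    TICK_NO_HZ_FULL     /* CONFIG_NO_HZ_FULL */
};

/* Returns the tick rate in Hz for a CPU with nr_running runnable tasks. */
static int tick_rate_hz(enum tick_mode mode, int nr_running)
{
    assert(nr_running >= 0);

    /* Periodic mode ticks all the time, idle or not. */
    if (mode == TICK_HZ_PERIODIC)
        return DEMO_HZ;

    /* Both NO_HZ modes turn the tick off on an idle CPU. */
    if (nr_running == 0)
        return 0;

    /* Full dynticks slows the tick to 1 Hz for a single running task. */
    if (mode == TICK_NO_HZ_FULL && nr_running == 1)
        return 1;

    return DEMO_HZ;
}
```

    For example, tick_rate_hz(TICK_NO_HZ_FULL, 1) yields 1, while the same
    CPU with two runnable tasks falls back to the full DEMO_HZ rate.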
     

04 May, 2013

1 commit

  • Commit 0637e029392386e6996f5d6574aadccee8315efa
    ("nohz: Select wide RCU nocb for full dynticks") intended
    to force CONFIG_RCU_NOCB_CPU_ALL=y when full dynticks is
    enabled.

    However this option is part of a choice menu and Kconfig's
    "select" instruction has no effect on such targets.

    Fix this by using reverse dependencies on the targets we
    don't want instead.

    Reviewed-by: Paul E. McKenney
    Signed-off-by: Frederic Weisbecker
    Cc: Christoph Lameter
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

02 May, 2013

4 commits

  • The full dynticks tree needs the latest RCU and sched
    upstream updates in order to fix some dependencies.

    Merge a common upstream merge point that has these
    updates.

    Conflicts:
    include/linux/perf_event.h
    kernel/rcutree.h
    kernel/rcutree_plugin.h

    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     
  • Pull VFS updates from Al Viro,

    Misc cleanups all over the place, mainly wrt /proc interfaces (switch
    create_proc_entry to proc_create(), get rid of the deprecated
    create_proc_read_entry() in favor of using proc_create_data() and
    seq_file etc).

    7kloc removed.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
    don't bother with deferred freeing of fdtables
    proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
    proc: Make the PROC_I() and PDE() macros internal to procfs
    proc: Supply a function to remove a proc entry by PDE
    take cgroup_open() and cpuset_open() to fs/proc/base.c
    ppc: Clean up scanlog
    ppc: Clean up rtas_flash driver somewhat
    hostap: proc: Use remove_proc_subtree()
    drm: proc: Use remove_proc_subtree()
    drm: proc: Use minor->index to label things, not PDE->name
    drm: Constify drm_proc_list[]
    zoran: Don't print proc_dir_entry data in debug
    reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
    proc: Supply an accessor for getting the data from a PDE's parent
    airo: Use remove_proc_subtree()
    rtl8192u: Don't need to save device proc dir PDE
    rtl8187se: Use a dir under /proc/net/r8180/
    proc: Add proc_mkdir_data()
    proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
    proc: Move PDE_NET() to fs/proc/proc_net.c
    ...

    Linus Torvalds
     
  • Split the proc namespace stuff out into linux/proc_ns.h.

    Signed-off-by: David Howells
    cc: netdev@vger.kernel.org
    cc: Serge E. Hallyn
    cc: Eric W. Biederman
    Signed-off-by: Al Viro

    David Howells
     
  • Commit f91eb62f71b3 ("init: scream bloody murder if interrupts are
    enabled too early") added three new warnings. The first two seemed
    reasonable, but the third included a warning when an initcall returned
    non-zero. Although the third WARN() also covers an imbalanced preempt
    count or disabled irqs, it shouldn't warn when an initcall merely
    returns non-zero.

    In fact, according to Linus, it shouldn't print at all, as it only
    prints with initcall_debug set, and that already shows enough
    information to fix things.
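
    The intended behavior can be sketched as a small userspace model. This is
    not the code in init/main.c: CpuState, run_initcall() and the two sample
    initcalls are hypothetical names, and the message wording is only an
    approximation. The point is that a non-zero return value alone stays
    silent, while a preempt or irq imbalance still produces a warning.

```python
from dataclasses import dataclass


@dataclass
class CpuState:
    """Toy stand-in for the state an initcall can leave unbalanced."""
    preempt_count: int = 0
    irqs_disabled: bool = False


def run_initcall(fn, cpu):
    """Run fn(cpu) and return (return_value, warning_or_None).

    Only an imbalance is reported; a non-zero return value alone is silent.
    """
    saved_count = cpu.preempt_count
    ret = fn(cpu)
    problems = []
    if cpu.preempt_count != saved_count:
        problems.append("preemption imbalance")
        cpu.preempt_count = saved_count  # restore the saved count
    if cpu.irqs_disabled:
        problems.append("disabled interrupts")
        cpu.irqs_disabled = False  # re-enable interrupts
    warning = None
    if problems:
        warning = f"initcall {fn.__name__} returned with " + " ".join(problems)
    return ret, warning


def failing_probe(cpu):
    """Hypothetical initcall that fails cleanly (19 is ENODEV on Linux)."""
    return -19


def leaky_init(cpu):
    """Hypothetical initcall that succeeds but leaves interrupts disabled."""
    cpu.irqs_disabled = True
    return 0
```

    With this model, run_initcall(failing_probe, CpuState()) yields no
    warning, while run_initcall(leaky_init, CpuState()) does.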

    Link: http://lkml.kernel.org/r/CA+55aFzaBC5SFi7=F2mfm+KWY5qTsBmOqgbbs8E+LUS8JK-sBg@mail.gmail.com

    Suggested-by: Linus Torvalds
    Reported-by: Konrad Rzeszutek Wilk
    Signed-off-by: Steven Rostedt
    Signed-off-by: Linus Torvalds

    Steven Rostedt
     

01 May, 2013

2 commits

  • The kconfig language requires that dependent options all follow the
    menuconfig symbol in order to be collapsed below it. Recently some hidden
    options were added below the EXPERT menuconfig, but did not depend on
    EXPERT (because hidden options can't). This broke the display. So
    re-order all these options, and while we're here stick the PCI quirks
    under the EXPERT menu (since it isn't sitting with any related options).

    Before this commit, we get:
    [*] Configure standard kernel features (expert users) --->
    [ ] Sysctl syscall support
    [*] Load all symbols for debugging/ksymoops
    ...
    [ ] Embedded system

    Now we get the older (and correct) behavior:
    [*] Configure standard kernel features (expert users) --->
    [ ] Embedded system
    And if you go into the expert menu you get the expert options:
    [ ] Sysctl syscall support
    [*] Load all symbols for debugging/ksymoops
    ...

    Signed-off-by: Mike Frysinger
    Acked-by: Randy Dunlap
    Cc: zhangwei(Jovi)
    Cc: Michal Marek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Frysinger
     
  • These are the only users of call_usermodehelper_fns(). This function
    suffers from not being able to determine if the cleanup is called. Even
    though the cleanup pointer is NULL in these places, convert them to use the
    separate call_usermodehelper_setup() + call_usermodehelper_exec()
    functions so we can remove the _fns variant.

    Signed-off-by: Lucas De Marchi
    Cc: Oleg Nesterov
    Cc: David Howells
    Cc: James Morris
    Cc: Al Viro
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lucas De Marchi
     

30 Apr, 2013

7 commits

  • Pull core timer updates from Ingo Molnar:
    "The main changes in this cycle's merge are:

    - Implement shadow timekeeper to shorten in-kernel reader-side
    blocking, by Thomas Gleixner.

    - Posix timers enhancements by Pavel Emelyanov:

    - allocate timer ID per process, so that exact timer ID allocations
    can be re-created by checkpoint/restore code.

    - debuggability and tooling (/proc/PID/timers, etc.) improvements.

    - suspend/resume enhancements by Feng Tang: on certain new Intel Atom
    processors (Penwell and Cloverview), there is a feature that the
    TSC won't stop in S3 state, so the TSC value won't be reset to 0
    after resume. This can be taken advantage of by the generic code via
    the CLOCK_SOURCE_SUSPEND_NONSTOP flag: instead of using the RTC to
    recover/approximate sleep time, the main (and precise) clocksource
    can be used.

    - Fix /proc/timer_list for 4096 CPUs by Nathan Zimmer: on so many
    CPUs the file goes beyond 4MB of size and thus the current
    simplistic seqfile approach fails. Convert /proc/timer_list to a
    proper seq_file with its own iterator.

    - Cleanups and refactorings of the core timekeeping code by John
    Stultz.

    - International Atomic Clock time is managed by the NTP code
    internally currently but not exposed externally. Separate the TAI
    code out and add CLOCK_TAI support and TAI support to the hrtimer
    and posix-timer code, by John Stultz.

    - Add deep idle support enhancement to the broadcast clockevents core
    timer code, by Daniel Lezcano: add an opt-in CLOCK_EVT_FEAT_DYNIRQ
    clockevents feature (which will be utilized by future clockevents
    driver updates), which allows the use of IRQ affinities to avoid
    spurious wakeups of idle CPUs - the right CPU with an expiring
    timer will be woken.

    - Add new ARM bcm281xx clocksource driver, by Christian Daudt

    - ... various other fixes and cleanups"
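
    As a rough illustration of the suspend/resume item above: when the
    clocksource keeps counting through suspend, sleep time is the cycle delta
    converted to nanoseconds. The sketch below uses the usual mult/shift
    scaling; the function names and the 1 MHz example values are assumptions
    for illustration, not the kernel's resume path.

```python
def cyc2ns(cycles, mult, shift):
    """Convert a cycle count to nanoseconds using mult/shift scaling."""
    return (cycles * mult) >> shift


def sleep_time_ns(cycle_before, cycle_after, mask, mult, shift):
    """Time spent suspended, measured by a counter that kept running."""
    # Masking the difference tolerates a single counter wraparound.
    delta = (cycle_after - cycle_before) & mask
    return cyc2ns(delta, mult, shift)


# Hypothetical 1 MHz, 32-bit counter: one cycle is 1000 ns.
SHIFT = 10
MULT = 1000 << SHIFT
MASK = 0xFFFFFFFF
```

    For example, sleep_time_ns(0, 5_000_000, MASK, MULT, SHIFT) gives
    5 seconds expressed in nanoseconds, with no RTC read involved.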

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
    clockevents: Set dummy handler on CPU_DEAD shutdown
    timekeeping: Update tk->cycle_last in resume
    posix-timers: Remove unused variable
    clockevents: Switch into oneshot mode even if broadcast registered late
    timer_list: Convert timer list to be a proper seq_file
    timer_list: Split timer_list_show_tickdevices
    posix-timers: Show sigevent info in proc file
    posix-timers: Introduce /proc/PID/timers file
    posix timers: Allocate timer id per process (v2)
    timekeeping: Make sure to notify hrtimers when TAI offset changes
    hrtimer: Fix ktime_add_ns() overflow on 32bit architectures
    hrtimer: Add expiry time overflow check in hrtimer_interrupt
    timekeeping: Shorten seq_count region
    timekeeping: Implement a shadow timekeeper
    timekeeping: Delay update of clock->cycle_last
    timekeeping: Store cycle_last value in timekeeper struct as well
    ntp: Remove ntp_lock, using the timekeeping locks to protect ntp state
    timekeeping: Simplify tai updating from do_adjtimex
    timekeeping: Hold timekeepering locks in do_adjtimex and hardpps
    timekeeping: Move ADJ_SETOFFSET to top level do_adjtimex()
    ...

    Linus Torvalds
     
  • Pull SMP/hotplug changes from Ingo Molnar:
    "This is a pretty large, multi-arch series unifying and generalizing
    the various disjunct pieces of idle routines that architectures have
    historically copied from each other and have grown in random, wildly
    inconsistent and sometimes buggy directions:

    101 files changed, 455 insertions(+), 1328 deletions(-)

    this went through a number of review and test iterations before it was
    committed, it was tested on various architectures, was exposed to
    linux-next for quite some time - nevertheless it might cause problems
    on architectures that don't read the mailing lists and don't regularly
    test linux-next.

    This cat herding exercise was motivated by the -rt kernel, and was
    brought to you by Thomas "the Whip" Gleixner."

    * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
    idle: Remove GENERIC_IDLE_LOOP config switch
    um: Use generic idle loop
    ia64: Make sure interrupts enabled when we "safe_halt()"
    sparc: Use generic idle loop
    idle: Remove unused ARCH_HAS_DEFAULT_IDLE
    bfin: Fix typo in arch_cpu_idle()
    xtensa: Use generic idle loop
    x86: Use generic idle loop
    unicore: Use generic idle loop
    tile: Use generic idle loop
    tile: Enter idle with preemption disabled
    sh: Use generic idle loop
    score: Use generic idle loop
    s390: Use generic idle loop
    powerpc: Use generic idle loop
    parisc: Use generic idle loop
    openrisc: Use generic idle loop
    mn10300: Use generic idle loop
    mips: Use generic idle loop
    microblaze: Use generic idle loop
    ...

    Linus Torvalds
     
  • Pull scheduler changes from Ingo Molnar:
    "The main changes in this development cycle were:

    - full dynticks preparatory work by Frederic Weisbecker

    - factor out the cpu time accounting code better, by Li Zefan

    - multi-CPU load balancer cleanups and improvements by Joonsoo Kim

    - various smaller fixes and cleanups"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
    sched: Fix init NOHZ_IDLE flag
    sched: Prevent to re-select dst-cpu in load_balance()
    sched: Rename load_balance_tmpmask to load_balance_mask
    sched: Move up affinity check to mitigate useless redoing overhead
    sched: Don't consider other cpus in our group in case of NEWLY_IDLE
    sched: Explicitly cpu_idle_type checking in rebalance_domains()
    sched: Change position of resched_cpu() in load_balance()
    sched: Fix wrong rq's runnable_avg update with rt tasks
    sched: Document task_struct::personality field
    sched/cpuacct/UML: Fix header file dependency bug on the UML build
    cgroup: Kill subsys.active flag
    sched/cpuacct: No need to check subsys active state
    sched/cpuacct: Initialize cpuacct subsystem earlier
    sched/cpuacct: Initialize root cpuacct earlier
    sched/cpuacct: Allocate per_cpu cpuusage for root cpuacct statically
    sched/cpuacct: Clean up cpuacct.h
    sched/cpuacct: Remove redundant NULL checks in cpuacct_acount_field()
    sched/cpuacct: Remove redundant NULL checks in cpuacct_charge()
    sched/cpuacct: Add cpuacct_acount_field()
    sched/cpuacct: Add cpuacct_init()
    ...

    Linus Torvalds
     
  • Pull RCU updates from Ingo Molnar:
    "The main changes in this cycle are mostly related to preparatory work
    for the full-dynticks work:

    - Remove restrictions on no-CBs CPUs, make RCU_FAST_NO_HZ take
    advantage of numbered callbacks, do callback accelerations based on
    numbered callbacks. Posted to LKML at
    https://lkml.org/lkml/2013/3/18/960

    - RCU documentation updates. Posted to LKML at
    https://lkml.org/lkml/2013/3/18/570

    - Miscellaneous fixes. Posted to LKML at
    https://lkml.org/lkml/2013/3/18/594"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
    rcu: Make rcu_accelerate_cbs() note need for future grace periods
    rcu: Abstract rcu_start_future_gp() from rcu_nocb_wait_gp()
    rcu: Rename n_nocb_gp_requests to need_future_gp
    rcu: Push lock release to rcu_start_gp()'s callers
    rcu: Repurpose no-CBs event tracing to future-GP events
    rcu: Rearrange locking in rcu_start_gp()
    rcu: Make RCU_FAST_NO_HZ take advantage of numbered callbacks
    rcu: Accelerate RCU callbacks at grace-period end
    rcu: Export RCU_FAST_NO_HZ parameters to sysfs
    rcu: Distinguish "rcuo" kthreads by RCU flavor
    rcu: Add event tracing for no-CBs CPUs' grace periods
    rcu: Add event tracing for no-CBs CPUs' callback registration
    rcu: Introduce proper blocking to no-CBs kthreads GP waits
    rcu: Provide compile-time control for no-CBs CPUs
    rcu: Tone down debugging during boot-up and shutdown.
    rcu: Add softirq-stall indications to stall-warning messages
    rcu: Documentation update
    rcu: Make bugginess of code sample more evident
    rcu: Fix hlist_bl_set_first_rcu() annotation
    rcu: Delete unused rcu_node "wakemask" field
    ...

    Linus Torvalds
     
  • Also enables cleanup of some 80-col trickery.

    Cc: Richard Weinberger
    Cc: Uwe Kleine-König
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • If the kernel was booted with the "quiet" boot option we currently have no
    chance to see why an initrd fails. Change KERN_WARNING to KERN_ERR to see
    what is going on.

    Signed-off-by: Richard Weinberger
    Cc: "H. Peter Anvin"
    Cc: Rusty Russell
    Cc: Jim Cromie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Richard Weinberger
     
  • As I was testing a lot of my code recently, and having several
    "successes", I accidentally noticed in the dmesg this little line:

    start_kernel(): bug: interrupts were enabled *very* early, fixing it

    Sure enough, one of my patches two commits ago enabled interrupts early.
    The sad part here is that I never noticed it, and I ran several tests with
    ktest too, and ktest did not notice this line.

    What ktest looks for (and so do many other automated testing scripts) is
    a back trace produced by a WARN_ON() or BUG(). As a back trace was never
    produced, my buggy patch could have slipped into linux-next, or even
    worse, mainline.

    Adding a WARN(!irqs_disabled()) makes this bug a little more obvious:

    PID hash table entries: 4096 (order: 3, 32768 bytes)
    __ex_table already sorted, skipping sort
    Checking aperture...
    No AGP bridge found
    Calgary: detecting Calgary via BIOS EBDA area
    Calgary: Unable to locate Rio Grande table in EBDA - bailing!
    Memory: 2003252k/2054848k available (4857k kernel code, 460k absent, 51136k reserved, 6210k data, 1096k init)
    ------------[ cut here ]------------
    WARNING: at /home/rostedt/work/git/linux-trace.git/init/main.c:543 start_kernel+0x21e/0x415()
    Hardware name: To Be Filled By O.E.M.
    Interrupts were enabled *very* early, fixing it
    Modules linked in:
    Pid: 0, comm: swapper/0 Not tainted 3.8.0-test+ #286
    Call Trace:
    warn_slowpath_common+0x83/0x9b
    warn_slowpath_fmt+0x46/0x48
    start_kernel+0x21e/0x415
    x86_64_start_reservations+0x10e/0x112
    x86_64_start_kernel+0x102/0x111
    ---[ end trace 007d8b0491b4f5d8 ]---
    Preemptible hierarchical RCU implementation.
    RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=4.
    NR_IRQS:4352 nr_irqs:712 16
    Console: colour VGA+ 80x25
    console [ttyS0] enabled, bootconsole disabled

    Do you see it?

    The original version of this patch just slapped a WARN_ON() in there and
    kept the printk(). Ard van Breemen suggested using the WARN() interface,
    which makes the code a bit cleaner.

    Also, while examining other warnings in init/main.c, I found two other
    locations that deserve a bloody murder scream if their conditions are hit,
    and updated them accordingly.
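
    Why the WARN() matters to tooling can be sketched with a tiny log
    scanner. This is not ktest's implementation: the marker patterns and the
    has_backtrace() helper are assumptions, and the two logs are shortened
    excerpts of the output quoted above.

```python
import re

# Hypothetical markers that a WARN()/BUG() style report leaves in the log.
BACKTRACE_MARKERS = re.compile(r"cut here|^WARNING: |Call Trace:")


def has_backtrace(log_text):
    """Return True if any line of the log looks like part of a back trace."""
    return any(BACKTRACE_MARKERS.search(line) for line in log_text.splitlines())


# Old behavior: a plain printk() that is easy to miss.
PRINTK_ONLY = """\
Checking aperture...
start_kernel(): bug: interrupts were enabled *very* early, fixing it
Console: colour VGA+ 80x25
"""

# New behavior: WARN() wraps the same message in a back trace.
WITH_WARN = """\
Checking aperture...
------------[ cut here ]------------
WARNING: at /home/rostedt/work/git/linux-trace.git/init/main.c:543 start_kernel+0x21e/0x415()
Interrupts were enabled *very* early, fixing it
Call Trace:
Console: colour VGA+ 80x25
"""
```

    A scanner of this kind passes over PRINTK_ONLY and flags WITH_WARN, which
    is the difference the patch is after.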

    Signed-off-by: Steven Rostedt
    Cc: Ard van Breemen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Rostedt
     

27 Apr, 2013

1 commit

  • Turn the full dynticks passive dependency on VIRT_CPU_ACCOUNTING_GEN
    to an active one.

    The full dynticks Kconfig is currently hidden behind the full dynticks
    cputime accounting, which is an awkward and counter-intuitive layout:
    the user first has to select the dynticks cputime accounting in order
    to make the full dynticks feature visible.

    We definitely want it the other way around. The usual way to perform
    this kind of active dependency is to use "select" on the depended target.
    Now we can't use the Kconfig "select" instruction when the target is
    a "choice".

    So this patch takes inspiration from how the RCU subsystem Kconfig
    interacts with its dependencies on SMP and PREEMPT: we make sure that
    cputime accounting can't propose any option other than
    VIRT_CPU_ACCOUNTING_GEN when NO_HZ_FULL is selected, by using the right
    "depends on" instruction for each cputime accounting choice.

    v2: Keep full dynticks cputime accounting available even without
    full dynticks, as per Paul McKenney's suggestion.
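
    The effect of the reverse dependencies can be modeled in a few lines.
    This is a simplified sketch, not a copy of init/Kconfig: the real entries
    carry additional architecture dependencies that are left out here, and
    CHOICE and visible_members() are hypothetical names.

```python
# Each member of the cputime accounting choice with its visibility rule.
# The rules are a simplified assumption about the "depends on" lines.
CHOICE = [
    ("TICK_CPU_ACCOUNTING", lambda cfg: not cfg.get("NO_HZ_FULL", False)),
    ("VIRT_CPU_ACCOUNTING_NATIVE", lambda cfg: not cfg.get("NO_HZ_FULL", False)),
    ("VIRT_CPU_ACCOUNTING_GEN", lambda cfg: True),
]


def visible_members(cfg):
    """Return the choice members a user could still pick under cfg."""
    return [name for name, depends_on in CHOICE if depends_on(cfg)]
```

    With NO_HZ_FULL set, only VIRT_CPU_ACCOUNTING_GEN remains selectable;
    without it, all three members are offered, which matches the v2 note.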

    Reported-by: Ingo Molnar
    Signed-off-by: Frederic Weisbecker
    Cc: Christoph Lameter
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

25 Apr, 2013

1 commit


19 Apr, 2013

1 commit

  • We need full dynticks CPUs to also be RCU nocb so
    that we don't have to keep the tick to handle RCU
    callbacks.

    Make sure the range passed to nohz_full= boot
    parameter is a subset of rcu_nocbs=

    The CPUs that fail to meet this requirement will be
    excluded from the nohz_full range. This is checked
    early in boot time, before any CPU has the opportunity
    to stop its tick.
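
    The check amounts to a set intersection over the two boot parameters. A
    minimal userspace sketch, assuming plain "1-3,5" style CPU lists;
    parse_cpulist() and restrict_nohz_full() are hypothetical helpers, not
    kernel functions.

```python
def parse_cpulist(text):
    """Parse a CPU list such as "1-3,5" into a set of CPU numbers."""
    cpus = set()
    for part in text.split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def restrict_nohz_full(nohz_full, rcu_nocbs):
    """Return (kept, excluded): nohz_full CPUs that are / are not RCU nocb."""
    requested = parse_cpulist(nohz_full)
    nocb = parse_cpulist(rcu_nocbs)
    return requested & nocb, requested - nocb
```

    Booting with nohz_full=1-3,5 and rcu_nocbs=1-3 would, in this model,
    keep CPUs 1 to 3 and drop CPU 5 from the nohz_full range.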

    Suggested-by: Steven Rostedt
    Reviewed-by: Paul E. McKenney
    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

10 Apr, 2013

1 commit

  • …/linux-rcu into core/rcu

    Pull RCU updates from Paul E. McKenney:

    * Remove restrictions on no-CBs CPUs, make RCU_FAST_NO_HZ
    take advantage of numbered callbacks, do additional callback
    accelerations based on numbered callbacks. Posted to LKML
    at https://lkml.org/lkml/2013/3/18/960.

    * RCU documentation updates. Posted to LKML at
    https://lkml.org/lkml/2013/3/18/570.

    * Miscellaneous fixes. Posted to LKML at
    https://lkml.org/lkml/2013/3/18/594.

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar