06 May, 2013

1 commit

  • Pull 'full dynticks' support from Ingo Molnar:
    "This tree from Frederic Weisbecker adds a new, (exciting! :-) core
    kernel feature to the timer and scheduler subsystems: 'full dynticks',
    or CONFIG_NO_HZ_FULL=y.

    This feature extends the nohz variable-size timer tick feature from
    idle to busy CPUs (running at most one task) as well, potentially
    reducing the number of timer interrupts significantly.

    This feature got motivated by real-time folks and the -rt tree, but
    the general utility and motivation of full-dynticks runs wider than
    that:

    - HPC workloads get faster: CPUs running a single task should be able
    to utilize a maximum amount of CPU power. A periodic timer tick at
    HZ=1000 can cause a constant overhead of up to 1.0%. This feature
    removes that overhead - and speeds up the system by 0.5%-1.0% on
    typical distro configs even on modern systems.

    - Real-time workload latency reduction: CPUs running critical tasks
    should experience as little jitter as possible. The last remaining
    source of kernel-related jitter was the periodic timer tick.

    - A single task executing on a CPU is a pretty common situation,
    especially with an increasing number of cores/CPUs, so this feature
    helps desktop and mobile workloads as well.
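
The 1.0% overhead figure in the HPC item above can be sanity-checked with simple arithmetic. The sketch below is a reader's illustration, not code from this tree: `tick_overhead_ppm()` is a hypothetical helper, and the per-tick cost of 10 microseconds is an assumed value chosen so that HZ=1000 yields the quoted upper bound.

```c
#include <assert.h>

/*
 * Share of one CPU-second spent in the periodic tick, in parts per million.
 * hz is the tick frequency; tick_cost_ns is an ASSUMED cost of one tick in
 * nanoseconds (the pull message only states the resulting percentage).
 */
unsigned long tick_overhead_ppm(unsigned long hz, unsigned long tick_cost_ns)
{
    assert(hz > 0);
    /* hz * tick_cost_ns = ns of tick work per second; 1 s = 1e9 ns,
     * so dividing by 1000 converts ns-per-second into ppm. */
    return hz * tick_cost_ns / 1000;
}
```

Under that assumption, `tick_overhead_ppm(1000, 10000)` gives 10000 ppm (1.0%), while a 1 Hz residual tick with the same per-tick cost gives 10 ppm.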

    The cost of the feature is mainly related to increased timer
    reprogramming overhead when a CPU switches its tick period, and thus
    slightly longer to-idle and from-idle latency.

    Configuration-wise a third mode of operation is added to the existing
    two NOHZ kconfig modes:

    - CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named
    as a config option. This is the traditional Linux periodic tick
    design: there's a HZ tick going on all the time, regardless of
    whether a CPU is idle or not.

    - CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the
    periodic tick when a CPU enters idle mode.

    - CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the
    tick when a CPU is idle, also slows the tick down to 1 Hz (one
    timer interrupt per second) when only a single task is running on a
    CPU.

    The .config behavior is compatible: existing !CONFIG_NO_HZ and
    CONFIG_NO_HZ=y settings get translated to the new values, without the
    user having to configure anything. CONFIG_NO_HZ_FULL is turned off by
    default.
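
The three modes described above can be summarized as a small decision function. This is a simplified user-space model, not the kernel's implementation: the enum and `tick_rate_hz()` are hypothetical names, and the model ignores everything except the mode and the number of runnable tasks (for instance, it assumes the CPU is eligible for full dynticks).

```c
#include <assert.h>

/* Hypothetical stand-ins for the three Kconfig choices. */
enum tick_mode { HZ_PERIODIC, NO_HZ_IDLE, NO_HZ_FULL };

/*
 * Tick frequency a CPU would see under each mode, given the configured HZ
 * and the number of runnable tasks on that CPU (0 means idle).
 */
unsigned int tick_rate_hz(enum tick_mode mode, unsigned int hz,
                          unsigned int nr_running)
{
    assert(hz > 0);
    if (mode == HZ_PERIODIC)
        return hz;          /* tick runs all the time, idle or not */
    if (nr_running == 0)
        return 0;           /* both dynticks modes stop the tick when idle */
    if (mode == NO_HZ_FULL && nr_running == 1)
        return 1;           /* single busy task: slowed down to 1 Hz */
    return hz;              /* otherwise the regular periodic tick */
}
```

With HZ=1000, a CPU running exactly one task gets 1000 ticks per second under HZ_PERIODIC and NO_HZ_IDLE, and 1 under NO_HZ_FULL.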

    This feature is based on a lot of infrastructure work that has been
    steadily going upstream in the last 2-3 cycles: related RCU support
    and non-periodic cputime support in particular is upstream already.

    This tree adds the final pieces and activates the feature. The pull
    request is marked RFC because:

    - it's marked 64-bit only at the moment - the 32-bit support patch is
    small but did not get ready in time.

    - it has a number of fresh commits that came in after the merge
    window. The overwhelming majority of commits are from before the
    merge window, but still some aspects of the tree are fresh and so I
    marked it RFC.

    - it's a pretty wide-reaching feature with lots of effects - and
    while the components have been in testing for some time, the full
    combination is still not very widely used. That it's default-off
    should reduce its regression abilities and obviously there are no
    known regressions with CONFIG_NO_HZ_FULL=y enabled either.

    - the feature is not completely idempotent: there is no 100%
    equivalent replacement for a periodic scheduler/timer tick. In
    particular there's ongoing work to map out and reduce its effects
    on scheduler load-balancing and statistics. This should not impact
    correctness though, there are no known regressions related to this
    feature at this point.

    - it's a pretty ambitious feature that with time will likely be
    enabled by most Linux distros, and we'd like your input on its
    design/implementation if you dislike some aspect we missed.
    Without flaming us to a crisp! :-)

    Future plans:

    - there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off
    the periodic tick altogether when there's a single busy task on a
    CPU. We'd first like 1 Hz to be exposed more widely before we go
    for the 0 Hz target though.

    - once we reach 0 Hz we can remove the periodic tick assumption from
    nr_running>=2 as well, by essentially interrupting busy tasks only
    as frequently as the sched_latency constraints require us to do -
    once every 4-40 msecs, depending on nr_running.

    I am personally leaning towards biting the bullet and doing this in
    v3.10: like the -rt tree, this effort has been going on for too long -
    but the final word is up to you as usual.

    More technical details can be found in Documentation/timers/NO_HZ.txt"

    * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
    sched: Keep at least 1 tick per second for active dynticks tasks
    rcu: Fix full dynticks' dependency on wide RCU nocb mode
    nohz: Protect smp_processor_id() in tick_nohz_task_switch()
    nohz_full: Add documentation.
    cputime_nsecs: use math64.h for nsec resolution conversion helpers
    nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config
    nohz: Reduce overhead under high-freq idling patterns
    nohz: Remove full dynticks' superfluous dependency on RCU tree
    nohz: Fix unavailable tick_stop tracepoint in dynticks idle
    nohz: Add basic tracing
    nohz: Select wide RCU nocb for full dynticks
    nohz: Disable the tick when irq resume in full dynticks CPU
    nohz: Re-evaluate the tick for the new task after a context switch
    nohz: Prepare to stop the tick on irq exit
    nohz: Implement full dynticks kick
    nohz: Re-evaluate the tick from the scheduler IPI
    sched: New helper to prevent from stopping the tick in full dynticks
    sched: Kick full dynticks CPU that have more than one task enqueued.
    perf: New helper to prevent full dynticks CPUs from stopping tick
    perf: Kick full dynticks CPU if events rotation is needed
    ...

    Linus Torvalds
     

04 May, 2013

1 commit

  • Commit 0637e029392386e6996f5d6574aadccee8315efa
    ("nohz: Select wide RCU nocb for full dynticks") intended
    to force CONFIG_RCU_NOCB_CPU_ALL=y when full dynticks is
    enabled.

    However this option is part of a choice menu and Kconfig's
    "select" instruction has no effect on such targets.

    Fix this by using reverse dependencies on the targets we
    don't want instead.

    Reviewed-by: Paul E. McKenney
    Signed-off-by: Frederic Weisbecker
    Cc: Christoph Lameter
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

02 May, 2013

4 commits

  • The full dynticks tree needs the latest RCU and sched
    upstream updates in order to fix some dependencies.

    Merge a common upstream merge point that has these
    updates.

    Conflicts:
    include/linux/perf_event.h
    kernel/rcutree.h
    kernel/rcutree_plugin.h

    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     
  • Pull VFS updates from Al Viro,

    Misc cleanups all over the place, mainly wrt /proc interfaces (switch
    create_proc_entry to proc_create(), get rid of the deprecated
    create_proc_read_entry() in favor of using proc_create_data() and
    seq_file etc).

    7kloc removed.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
    don't bother with deferred freeing of fdtables
    proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
    proc: Make the PROC_I() and PDE() macros internal to procfs
    proc: Supply a function to remove a proc entry by PDE
    take cgroup_open() and cpuset_open() to fs/proc/base.c
    ppc: Clean up scanlog
    ppc: Clean up rtas_flash driver somewhat
    hostap: proc: Use remove_proc_subtree()
    drm: proc: Use remove_proc_subtree()
    drm: proc: Use minor->index to label things, not PDE->name
    drm: Constify drm_proc_list[]
    zoran: Don't print proc_dir_entry data in debug
    reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
    proc: Supply an accessor for getting the data from a PDE's parent
    airo: Use remove_proc_subtree()
    rtl8192u: Don't need to save device proc dir PDE
    rtl8187se: Use a dir under /proc/net/r8180/
    proc: Add proc_mkdir_data()
    proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
    proc: Move PDE_NET() to fs/proc/proc_net.c
    ...

    Linus Torvalds
     
  • Split the proc namespace stuff out into linux/proc_ns.h.

    Signed-off-by: David Howells
    cc: netdev@vger.kernel.org
    cc: Serge E. Hallyn
    cc: Eric W. Biederman
    Signed-off-by: Al Viro

    David Howells
     
  • Commit f91eb62f71b3 ("init: scream bloody murder if interrupts are
    enabled too early") added three new warnings. The first two seemed
    reasonable, but the third included a warning when an initcall returned
    non-zero. Although the third WARN() also covers an imbalanced preempt
    disable or an irqs disable, it shouldn't warn when the only problem is
    an initcall that returns non-zero.

    In fact, according to Linus, it shouldn't print at all, as it only
    prints with initcall_debug set, and that already shows enough
    information to fix things.

    Link: http://lkml.kernel.org/r/CA+55aFzaBC5SFi7=F2mfm+KWY5qTsBmOqgbbs8E+LUS8JK-sBg@mail.gmail.com

    Suggested-by: Linus Torvalds
    Reported-by: Konrad Rzeszutek Wilk
    Signed-off-by: Steven Rostedt
    Signed-off-by: Linus Torvalds

    Steven Rostedt
     

01 May, 2013

2 commits

  • The kconfig language requires that dependent options all follow the
    menuconfig symbol in order to be collapsed below it. Recently some hidden
    options were added below the EXPERT menuconfig, but did not depend on
    EXPERT (because hidden options can't). This broke the display. So
    re-order all these options, and while we're here stick the PCI quirks
    under the EXPERT menu (since it isn't sitting with any related options).

    Before this commit, we get:
    [*] Configure standard kernel features (expert users) --->
    [ ] Sysctl syscall support
    [*] Load all symbols for debugging/ksymoops
    ...
    [ ] Embedded system

    Now we get the older (and correct) behavior:
    [*] Configure standard kernel features (expert users) --->
    [ ] Embedded system
    And if you go into the expert menu you get the expert options:
    [ ] Sysctl syscall support
    [*] Load all symbols for debugging/ksymoops
    ...

    Signed-off-by: Mike Frysinger
    Acked-by: Randy Dunlap
    Cc: zhangwei(Jovi)
    Cc: Michal Marek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Frysinger
     
  • These are the only users of call_usermodehelper_fns(). This function
    suffers from not being able to determine if the cleanup is called. Even
    though in these places the cleanup pointer is NULL, convert them to use the
    separate call_usermodehelper_setup() + call_usermodehelper_exec()
    functions so we can remove the _fns variant.

    Signed-off-by: Lucas De Marchi
    Cc: Oleg Nesterov
    Cc: David Howells
    Cc: James Morris
    Cc: Al Viro
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lucas De Marchi
     

30 Apr, 2013

7 commits

  • Pull core timer updates from Ingo Molnar:
    "The main changes in this cycle's merge are:

    - Implement shadow timekeeper to shorten in-kernel reader-side
    blocking, by Thomas Gleixner.

    - Posix timers enhancements by Pavel Emelyanov:

    - allocate timer ID per process, so that exact timer ID allocations
    can be re-created by checkpoint/restore code.

    - debuggability and tooling (/proc/PID/timers, etc.) improvements.

    - suspend/resume enhancements by Feng Tang: on certain new Intel Atom
    processors (Penwell and Cloverview), there is a feature that the
    TSC won't stop in S3 state, so the TSC value won't be reset to 0
    after resume. The generic code can take advantage of this via
    the CLOCK_SOURCE_SUSPEND_NONSTOP flag: instead of using the RTC to
    recover/approximate sleep time, the main (and precise) clocksource
    can be used.

    - Fix /proc/timer_list for 4096 CPUs by Nathan Zimmer: on so many
    CPUs the file goes beyond 4MB of size and thus the current
    simplistic seqfile approach fails. Convert /proc/timer_list to a
    proper seq_file with its own iterator.

    - Cleanups and refactorings of the core timekeeping code by John
    Stultz.

    - International Atomic Clock time is managed by the NTP code
    internally currently but not exposed externally. Separate the TAI
    code out and add CLOCK_TAI support and TAI support to the hrtimer
    and posix-timer code, by John Stultz.

    - Add deep idle support enhancement to the broadcast clockevents core
    timer code, by Daniel Lezcano: add an opt-in CLOCK_EVT_FEAT_DYNIRQ
    clockevents feature (which will be utilized by future clockevents
    driver updates), which allows the use of IRQ affinities to avoid
    spurious wakeups of idle CPUs - the right CPU with an expiring
    timer will be woken.

    - Add new ARM bcm281xx clocksource driver, by Christian Daudt

    - ... various other fixes and cleanups"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
    clockevents: Set dummy handler on CPU_DEAD shutdown
    timekeeping: Update tk->cycle_last in resume
    posix-timers: Remove unused variable
    clockevents: Switch into oneshot mode even if broadcast registered late
    timer_list: Convert timer list to be a proper seq_file
    timer_list: Split timer_list_show_tickdevices
    posix-timers: Show sigevent info in proc file
    posix-timers: Introduce /proc/PID/timers file
    posix timers: Allocate timer id per process (v2)
    timekeeping: Make sure to notify hrtimers when TAI offset changes
    hrtimer: Fix ktime_add_ns() overflow on 32bit architectures
    hrtimer: Add expiry time overflow check in hrtimer_interrupt
    timekeeping: Shorten seq_count region
    timekeeping: Implement a shadow timekeeper
    timekeeping: Delay update of clock->cycle_last
    timekeeping: Store cycle_last value in timekeeper struct as well
    ntp: Remove ntp_lock, using the timekeeping locks to protect ntp state
    timekeeping: Simplify tai updating from do_adjtimex
    timekeeping: Hold timekeepering locks in do_adjtimex and hardpps
    timekeeping: Move ADJ_SETOFFSET to top level do_adjtimex()
    ...

    Linus Torvalds
     
  • Pull SMP/hotplug changes from Ingo Molnar:
    "This is a pretty large, multi-arch series unifying and generalizing
    the various disjunct pieces of idle routines that architectures have
    historically copied from each other and have grown in random, wildly
    inconsistent and sometimes buggy directions:

    101 files changed, 455 insertions(+), 1328 deletions(-)

    this went through a number of review and test iterations before it was
    committed, it was tested on various architectures, was exposed to
    linux-next for quite some time - nevertheless it might cause problems
    on architectures that don't read the mailing lists and don't regularly
    test linux-next.

    This cat herding exercise was motivated by the -rt kernel, and was
    brought to you by Thomas "the Whip" Gleixner."

    * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
    idle: Remove GENERIC_IDLE_LOOP config switch
    um: Use generic idle loop
    ia64: Make sure interrupts enabled when we "safe_halt()"
    sparc: Use generic idle loop
    idle: Remove unused ARCH_HAS_DEFAULT_IDLE
    bfin: Fix typo in arch_cpu_idle()
    xtensa: Use generic idle loop
    x86: Use generic idle loop
    unicore: Use generic idle loop
    tile: Use generic idle loop
    tile: Enter idle with preemption disabled
    sh: Use generic idle loop
    score: Use generic idle loop
    s390: Use generic idle loop
    powerpc: Use generic idle loop
    parisc: Use generic idle loop
    openrisc: Use generic idle loop
    mn10300: Use generic idle loop
    mips: Use generic idle loop
    microblaze: Use generic idle loop
    ...

    Linus Torvalds
     
  • Pull scheduler changes from Ingo Molnar:
    "The main changes in this development cycle were:

    - full dynticks preparatory work by Frederic Weisbecker

    - factor out the cpu time accounting code better, by Li Zefan

    - multi-CPU load balancer cleanups and improvements by Joonsoo Kim

    - various smaller fixes and cleanups"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
    sched: Fix init NOHZ_IDLE flag
    sched: Prevent to re-select dst-cpu in load_balance()
    sched: Rename load_balance_tmpmask to load_balance_mask
    sched: Move up affinity check to mitigate useless redoing overhead
    sched: Don't consider other cpus in our group in case of NEWLY_IDLE
    sched: Explicitly cpu_idle_type checking in rebalance_domains()
    sched: Change position of resched_cpu() in load_balance()
    sched: Fix wrong rq's runnable_avg update with rt tasks
    sched: Document task_struct::personality field
    sched/cpuacct/UML: Fix header file dependency bug on the UML build
    cgroup: Kill subsys.active flag
    sched/cpuacct: No need to check subsys active state
    sched/cpuacct: Initialize cpuacct subsystem earlier
    sched/cpuacct: Initialize root cpuacct earlier
    sched/cpuacct: Allocate per_cpu cpuusage for root cpuacct statically
    sched/cpuacct: Clean up cpuacct.h
    sched/cpuacct: Remove redundant NULL checks in cpuacct_acount_field()
    sched/cpuacct: Remove redundant NULL checks in cpuacct_charge()
    sched/cpuacct: Add cpuacct_acount_field()
    sched/cpuacct: Add cpuacct_init()
    ...

    Linus Torvalds
     
  • Pull RCU updates from Ingo Molnar:
    "The main changes in this cycle are mostly related to preparatory work
    for the full-dynticks work:

    - Remove restrictions on no-CBs CPUs, make RCU_FAST_NO_HZ take
    advantage of numbered callbacks, do callback accelerations based on
    numbered callbacks. Posted to LKML at
    https://lkml.org/lkml/2013/3/18/960

    - RCU documentation updates. Posted to LKML at
    https://lkml.org/lkml/2013/3/18/570

    - Miscellaneous fixes. Posted to LKML at
    https://lkml.org/lkml/2013/3/18/594"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
    rcu: Make rcu_accelerate_cbs() note need for future grace periods
    rcu: Abstract rcu_start_future_gp() from rcu_nocb_wait_gp()
    rcu: Rename n_nocb_gp_requests to need_future_gp
    rcu: Push lock release to rcu_start_gp()'s callers
    rcu: Repurpose no-CBs event tracing to future-GP events
    rcu: Rearrange locking in rcu_start_gp()
    rcu: Make RCU_FAST_NO_HZ take advantage of numbered callbacks
    rcu: Accelerate RCU callbacks at grace-period end
    rcu: Export RCU_FAST_NO_HZ parameters to sysfs
    rcu: Distinguish "rcuo" kthreads by RCU flavor
    rcu: Add event tracing for no-CBs CPUs' grace periods
    rcu: Add event tracing for no-CBs CPUs' callback registration
    rcu: Introduce proper blocking to no-CBs kthreads GP waits
    rcu: Provide compile-time control for no-CBs CPUs
    rcu: Tone down debugging during boot-up and shutdown.
    rcu: Add softirq-stall indications to stall-warning messages
    rcu: Documentation update
    rcu: Make bugginess of code sample more evident
    rcu: Fix hlist_bl_set_first_rcu() annotation
    rcu: Delete unused rcu_node "wakemask" field
    ...

    Linus Torvalds
     
  • Also enables cleanup of some 80-col trickery.

    Cc: Richard Weinberger
    Cc: Uwe Kleine-König
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • If the kernel was booted with the "quiet" boot option we have currently no
    chance to see why an initrd fails. Change KERN_WARNING to KERN_ERR to see
    what is going on.

    Signed-off-by: Richard Weinberger
    Cc: "H. Peter Anvin"
    Cc: Rusty Russell
    Cc: Jim Cromie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Richard Weinberger
     
  • As I was testing a lot of my code recently, and having several
    "successes", I accidentally noticed in the dmesg this little line:

    start_kernel(): bug: interrupts were enabled *very* early, fixing it

    Sure enough, one of my patches two commits ago enabled interrupts early.
    The sad part here is that I never noticed it, and I ran several tests with
    ktest too, and ktest did not notice this line.

    What ktest looks for (and so do many other automated testing scripts) is
    a back trace produced by a WARN_ON() or BUG(). As a back trace was never
    produced, my buggy patch could have slipped into linux-next, or even
    worse, mainline.

    Adding a WARN(!irqs_disabled()) makes this bug a little more obvious:

    PID hash table entries: 4096 (order: 3, 32768 bytes)
    __ex_table already sorted, skipping sort
    Checking aperture...
    No AGP bridge found
    Calgary: detecting Calgary via BIOS EBDA area
    Calgary: Unable to locate Rio Grande table in EBDA - bailing!
    Memory: 2003252k/2054848k available (4857k kernel code, 460k absent, 51136k reserved, 6210k data, 1096k init)
    ------------[ cut here ]------------
    WARNING: at /home/rostedt/work/git/linux-trace.git/init/main.c:543 start_kernel+0x21e/0x415()
    Hardware name: To Be Filled By O.E.M.
    Interrupts were enabled *very* early, fixing it
    Modules linked in:
    Pid: 0, comm: swapper/0 Not tainted 3.8.0-test+ #286
    Call Trace:
    warn_slowpath_common+0x83/0x9b
    warn_slowpath_fmt+0x46/0x48
    start_kernel+0x21e/0x415
    x86_64_start_reservations+0x10e/0x112
    x86_64_start_kernel+0x102/0x111
    ---[ end trace 007d8b0491b4f5d8 ]---
    Preemptible hierarchical RCU implementation.
    RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=4.
    NR_IRQS:4352 nr_irqs:712 16
    Console: colour VGA+ 80x25
    console [ttyS0] enabled, bootconsole disabled

    Do you see it?

    The original version of this patch just slapped a WARN_ON() in there and
    kept the printk(). Ard van Breemen suggested using the WARN() interface,
    which makes the code a bit cleaner.

    Also, while examining other warnings in init/main.c, I found two other
    locations that deserve a bloody murder scream if their conditions are hit,
    and updated them accordingly.
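
The pattern described above (warn loudly, then fix the state) can be modelled in user space. This is only a mock under stated assumptions: `mock_warn()`, `mock_irqs_on`, `mock_warn_count` and `early_irq_check()` are hypothetical names standing in for the kernel's WARN(), the interrupt state and the check in start_kernel(). The one property relied on is that a WARN-style helper returns its condition, so it can sit directly inside an if statement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

bool mock_irqs_on = true;       /* pretend a buggy patch enabled interrupts */
unsigned int mock_warn_count;   /* number of warnings emitted so far */

/* Print a conspicuous banner when cond is true and hand cond back. */
bool mock_warn(bool cond, const char *msg)
{
    assert(msg != NULL);
    if (cond) {
        mock_warn_count++;
        fprintf(stderr, "------------[ cut here ]------------\n");
        fprintf(stderr, "WARNING: %s\n", msg);
    }
    return cond;
}

/* Warn if interrupts are on too early, then repair the state. */
void early_irq_check(void)
{
    if (mock_warn(mock_irqs_on,
                  "Interrupts were enabled *very* early, fixing it"))
        mock_irqs_on = false;
}
```

Calling `early_irq_check()` twice emits the banner only once, because the first call already repaired the state; the banner is what a log-scanning test script would key on.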

    Signed-off-by: Steven Rostedt
    Cc: Ard van Breemen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Rostedt
     

27 Apr, 2013

1 commit

  • Turn the full dynticks passive dependency on VIRT_CPU_ACCOUNTING_GEN
    to an active one.

    The full dynticks Kconfig is currently hidden behind the full dynticks
    cputime accounting, which is an awkward and counter-intuitive layout:
    the user first has to select the dynticks cputime accounting in order
    to make the full dynticks feature visible.

    We definitely want it the other way around. The usual way to perform
    this kind of active dependency is to use "select" on the depended target.
    However, we can't use the Kconfig "select" instruction when the target is
    a "choice".

    So this patch takes inspiration from how the RCU subsystem Kconfig
    interacts with its dependencies on SMP and PREEMPT: we make sure that
    cputime accounting can't propose any option other than
    VIRT_CPU_ACCOUNTING_GEN when NO_HZ_FULL is selected, by using the right
    "depends on" instruction for each cputime accounting choice.

    v2: Keep full dynticks cputime accounting available even without
    full dynticks, as per Paul McKenney's suggestion.

    Reported-by: Ingo Molnar
    Signed-off-by: Frederic Weisbecker
    Cc: Christoph Lameter
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

25 Apr, 2013

1 commit


19 Apr, 2013

1 commit

  • We need full dynticks CPU to also be RCU nocb so
    that we don't have to keep the tick to handle RCU
    callbacks.

    Make sure the range passed to nohz_full= boot
    parameter is a subset of rcu_nocbs=

    The CPUs that fail to meet this requirement will be
    excluded from the nohz_full range. This is checked
    early in boot time, before any CPU has the opportunity
    to stop its tick.

    Suggested-by: Steven Rostedt
    Reviewed-by: Paul E. McKenney
    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

10 Apr, 2013

1 commit

  • …/linux-rcu into core/rcu

    Pull RCU updates from Paul E. McKenney:

    * Remove restrictions on no-CBs CPUs, make RCU_FAST_NO_HZ
    take advantage of numbered callbacks, do additional callback
    accelerations based on numbered callbacks. Posted to LKML
    at https://lkml.org/lkml/2013/3/18/960.

    * RCU documentation updates. Posted to LKML at
    https://lkml.org/lkml/2013/3/18/570.

    * Miscellaneous fixes. Posted to LKML at
    https://lkml.org/lkml/2013/3/18/594.

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

08 Apr, 2013

1 commit

  • For now this calls cpu_idle(), but in the long run we want to move the
    cpu bringup code to the core and therefore we add a state argument.

    Signed-off-by: Thomas Gleixner
    Cc: Linus Torvalds
    Cc: Rusty Russell
    Cc: Paul McKenney
    Cc: Peter Zijlstra
    Reviewed-by: Srivatsa S. Bhat
    Cc: Magnus Damm
    Link: http://lkml.kernel.org/r/20130321215233.583190032@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

03 Apr, 2013

1 commit

  • We are planning to convert the dynticks Kconfig options layout
    into a choice menu. The user must be able to easily pick
    any of the following implementations: constant periodic tick,
    idle dynticks, full dynticks.

    As this implies a mutual exclusion, the two dynticks implementations
    need to converge on the selection of a common Kconfig option in order
    to ease the sharing of a common infrastructure.

    It would thus seem pretty natural to reuse CONFIG_NO_HZ to
    that end. It already implements all the idle dynticks code
    and the full dynticks depends on all that code for now.
    So ideally the choice menu would propose CONFIG_NO_HZ_IDLE and
    CONFIG_NO_HZ_EXTENDED then both would select CONFIG_NO_HZ.

    On the other hand we want to stay backward compatible: if
    CONFIG_NO_HZ is set in an older config file, we want to
    enable CONFIG_NO_HZ_IDLE by default.

    But we can't afford both at the same time or we run into
    a circular dependency:

    1) CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED both select
    CONFIG_NO_HZ
    2) If CONFIG_NO_HZ is set, we default to CONFIG_NO_HZ_IDLE

    We might be able to support that from Kconfig/Kbuild but it
    may not be wise to introduce such a confusing behaviour.

    So to solve this, create a new CONFIG_NO_HZ_COMMON option
    which gathers the common code between idle and full dynticks
    (that common code for now is simply the idle dynticks code)
    and select it from their referring Kconfig.

    Then we'll later create CONFIG_NO_HZ_IDLE and map CONFIG_NO_HZ
    to it for backward compatibility.

    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Namhyung Kim
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

26 Mar, 2013

3 commits

  • Because RCU callbacks are now associated with the number of the grace
    period that they must wait for, CPUs can now advance callbacks
    corresponding to grace periods that ended while a given CPU was in
    dyntick-idle mode. This eliminates the need to try forcing the RCU
    state machine while entering idle, thus reducing the CPU intensiveness
    of RCU_FAST_NO_HZ, which should increase its energy efficiency.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, the per-no-CBs-CPU kthreads are named "rcuo" followed by
    the CPU number, for example, "rcuo". This is problematic given that
    there are either two or three RCU flavors, each of which gets a per-CPU
    kthread with exactly the same name. This commit therefore introduces
    a one-letter abbreviation for each RCU flavor, namely 'b' for RCU-bh,
    'p' for RCU-preempt, and 's' for RCU-sched. This abbreviation is used
    to distinguish the "rcuo" kthreads, for example, for CPU 0 we would have
    "rcuob/0", "rcuop/0", and "rcuos/0".

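The naming scheme above can be sketched with a single format string. This is an illustration only: `rcuo_kthread_name()` is a hypothetical helper, and the "rcuo%c/%d" format is inferred from the example names given in the message rather than copied from the patch.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Build a per-flavor kthread name such as "rcuob/0".
 * flavor is 'b' (RCU-bh), 'p' (RCU-preempt) or 's' (RCU-sched).
 * Returns the length snprintf() reports for the formatted name.
 */
int rcuo_kthread_name(char *buf, size_t len, char flavor, int cpu)
{
    assert(buf != NULL && len > 0);
    assert(flavor == 'b' || flavor == 'p' || flavor == 's');
    return snprintf(buf, len, "rcuo%c/%d", flavor, cpu);
}
```

Called with flavor 'b', 'p' and 's' for CPU 0, the helper produces the three names listed in the message.
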
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Tested-by: Dietmar Eggemann

    Paul E. McKenney
     
  • Currently, the only way to specify no-CBs CPUs is via the rcu_nocbs
    kernel command-line parameter. This is inconvenient in some cases,
    particularly for randconfig testing, so this commit adds a new set of
    kernel configuration parameters. CONFIG_RCU_NOCB_CPU_NONE (the default)
    retains the old behavior, CONFIG_RCU_NOCB_CPU_ZERO offloads callback
    processing from CPU 0 (along with any other CPUs specified by the
    rcu_nocbs boot-time parameter), and CONFIG_RCU_NOCB_CPU_ALL offloads
    callback processing from all CPUs.
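
How the three options combine with the rcu_nocbs boot parameter can be modelled as follows. This is a user-space sketch with 64-bit masks, not the kernel code: the enum and `nocb_cpu_mask()` are hypothetical, and `boot_mask` stands for whatever CPUs rcu_nocbs named.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the three Kconfig choices. */
enum rcu_nocb_config { RCU_NOCB_CPU_NONE, RCU_NOCB_CPU_ZERO, RCU_NOCB_CPU_ALL };

/* Resulting set of no-CBs CPUs on a system with nr_cpus CPUs (1..64). */
uint64_t nocb_cpu_mask(enum rcu_nocb_config cfg, uint64_t boot_mask,
                       unsigned int nr_cpus)
{
    uint64_t all;

    assert(nr_cpus >= 1 && nr_cpus <= 64);
    all = (nr_cpus == 64) ? UINT64_MAX : ((UINT64_C(1) << nr_cpus) - 1);

    switch (cfg) {
    case RCU_NOCB_CPU_ALL:
        return all;                     /* offload every CPU */
    case RCU_NOCB_CPU_ZERO:
        return (boot_mask | 1) & all;   /* CPU 0 plus the boot parameter */
    default:
        return boot_mask & all;         /* old behavior: boot parameter only */
    }
}
```

On a 4-CPU system with rcu_nocbs naming CPUs 1-2 (mask 0x6), the three options yield 0x6, 0x7 and 0xF respectively.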

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

13 Mar, 2013

2 commits

  • Remove "config EXPERIMENTAL" itself, now that every "depends on" it has
    been removed from the tree.

    Signed-off-by: Kees Cook
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     
  • Currently, CPU 0 is constrained to not be a no-CBs CPU, and furthermore
    at least one no-CBs CPU must remain online at any given time. These
    restrictions are problematic in some situations, such as cases where
    all CPUs must run a real-time workload that needs to be insulated from
    OS jitter and latencies due to RCU callback invocation. This commit
    therefore provides no-CBs CPUs a (very crude and energy-inefficient)
    way to start and to wait for grace periods independently of the normal
    RCU callback mechanisms. This approach allows any or all of the CPUs to
    be designated as no-CBs CPUs, and allows any proper subset of the CPUs
    (whether no-CBs CPUs or not) to be offlined.

    This commit also provides a fix for a locking bug spotted by Xie
    ChanglongX.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

08 Mar, 2013

1 commit

  • Until we provide the nohz_mask boot parameter, keeping
    the context tracking probes disabled by default is pointless
    since what we want is to runtime test this code anyway.

    It's furthermore confusing for users who don't expect
    the probes to be off when they select RCU user mode or full
    dynticks cputime accounting.

    Let's enable these probe selftests by default for now.

    Suggested-by: Steven Rostedt
    Signed-off-by: Frederic Weisbecker
    Cc: Li Zhong
    Cc: Kevin Hilman
    Cc: Mats Liljegren
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Cc: Namhyung Kim
    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Cc: Paul E. McKenney

    Frederic Weisbecker
     

07 Mar, 2013

1 commit

  • To convert the clockevents code to cpumask_var_t we need to move the
    init call after the allocator setup.

    Clockevents are registered from time_init() at the earliest, because
    they need interrupts to be set up, so this is safe.

    Signed-off-by: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20130306111537.304379448@linutronix.de
    Cc: Rusty Russell

    Thomas Gleixner
     

02 Mar, 2013

1 commit

  • Pull new ARC architecture from Vineet Gupta:
    "Initial ARC Linux port with some fixes on top for 3.9-rc1:

    I would like to introduce the Linux port to ARC Processors (from
    Synopsys) for 3.9-rc1. The patch-set has been discussed on the public
    lists since Nov and has received a fair bit of review, especially from
    Arnd, tglx, Al and other subsystem maintainers for DeviceTree, kgdb...

    The arch bits are in arch/arc, some asm-generic changes (acked by
    Arnd), a minor change to PARISC (acked by Helge).

    The series is a touch bigger for a new port for 2 main reasons:

    1. It enables a basic kernel in first sub-series and adds
    ptrace/kgdb/.. later

    2. Some of the fallout of review (DeviceTree support, multi-platform
    image support) was added on top of the original series, primarily to
    record the revision history.

    This updated pull request additionally contains

    - fixes due to our GNU tools catching up with the new syscall/ptrace
    ABI

    - some (minor) cross-arch Kconfig updates."

    * tag 'arc-v3.9-rc1-late' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (82 commits)
    ARC: split elf.h into uapi and export it for userspace
    ARC: Fixup the current ABI version
    ARC: gdbserver using regset interface possibly broken
    ARC: Kconfig cleanup tracking cross-arch Kconfig pruning in merge window
    ARC: make a copy of flat DT
    ARC: [plat-arcfpga] DT arc-uart bindings change: "baud" => "current-speed"
    ARC: Ensure CONFIG_VIRT_TO_BUS is not enabled
    ARC: Fix pt_orig_r8 access
    ARC: [3.9] Fallout of hlist iterator update
    ARC: 64bit RTSC timestamp hardware issue
    ARC: Don't fiddle with non-existent caches
    ARC: Add self to MAINTAINERS
    ARC: Provide a default serial.h for uart drivers needing BASE_BAUD
    ARC: [plat-arcfpga] defconfig for fully loaded ARC Linux
    ARC: [Review] Multi-platform image #8: platform registers SMP callbacks
    ARC: [Review] Multi-platform image #7: SMP common code to use callbacks
    ARC: [Review] Multi-platform image #6: cpu-to-dma-addr optional
    ARC: [Review] Multi-platform image #5: NR_IRQS defined by ARC core
    ARC: [Review] Multi-platform image #4: Isolate platform headers
    ARC: [Review] Multi-platform image #3: switch to board callback
    ...

    Linus Torvalds
     

26 Feb, 2013

2 commits

  • Pull user namespace and namespace infrastructure changes from Eric W Biederman:
    "This set of changes starts with a few small enhancements to the user
    namespace: reboot support, allowing more arbitrary mappings, and
    support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the
    user namespace root.

    I do my best to document that, if you care about limiting your
    unprivileged users, you will need to enable memory control groups
    when user namespace support is enabled.

    There is a minor bug fix to prevent overflowing the stack if someone
    creates way too many user namespaces.

    The bulk of the changes are a continuation of the kuid/kgid push down
    work through the filesystems. These changes make using uids and gids
    typesafe which ensures that these filesystems are safe to use when
    multiple user namespaces are in use. The filesystems converted for
    3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs. The
    changes for these filesystems were a little more involved so I split
    the changes into smaller hopefully obviously correct changes.

    XFS is the only filesystem that remains. I was hoping I could get
    that in this release so that user namespace support would be enabled
    with an allyesconfig or an allmodconfig but it looks like the xfs
    changes need another couple of days before they are ready."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits)
    cifs: Enable building with user namespaces enabled.
    cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t
    cifs: Convert struct cifs_sb_info to use kuids and kgids
    cifs: Modify struct smb_vol to use kuids and kgids
    cifs: Convert struct cifsFileInfo to use a kuid
    cifs: Convert struct cifs_fattr to use kuid and kgids
    cifs: Convert struct tcon_link to use a kuid.
    cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t
    cifs: Convert from a kuid before printing current_fsuid
    cifs: Use kuids and kgids SID to uid/gid mapping
    cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc
    cifs: Use BUILD_BUG_ON to validate uids and gids are the same size
    cifs: Override unmappable incoming uids and gids
    nfsd: Enable building with user namespaces enabled.
    nfsd: Properly compare and initialize kuids and kgids
    nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids
    nfsd: Modify nfsd4_cb_sec to use kuids and kgids
    nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion
    nfsd: Convert nfsxdr to use kuids and kgids
    nfsd: Convert nfs3xdr to use kuids and kgids
    ...

    Linus Torvalds
     
  • Pull module update from Rusty Russell:
    "The sweeping change is to make add_taint() explicitly indicate whether
    to disable lockdep, but it's a mechanical change."

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    MODSIGN: Add option to not sign modules during modules_install
    MODSIGN: Add -s option to sign-file
    MODSIGN: Specify the hash algorithm on sign-file command line
    MODSIGN: Simplify Makefile with a Kconfig helper
    module: clean up load_module a little more.
    modpost: Ignore ARC specific non-alloc sections
    module: constify within_module_*
    taint: add explicit flag to show whether lock dep is still OK.
    module: printk message when module signature fail taints kernel.

    Linus Torvalds
     

22 Feb, 2013

2 commits

  • Pull misc ia64 bits from Tony Luck.

    * tag 'please-pull-misc-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
    MAINTAINERS: update SGI & ia64 Altix stuff
    sysctl: Enable IA64 "ignore-unaligned-usertrap" to be used cross-arch

    Linus Torvalds
     
  • Pull driver core patches from Greg Kroah-Hartman:
    "Here is the big driver core merge for 3.9-rc1

    There are two major series here, both of which touch lots of drivers
    all over the kernel, and will cause you some merge conflicts:

    - add a new function called devm_ioremap_resource() to properly be
    able to check return values.

    - remove CONFIG_EXPERIMENTAL

    Other than those patches, there's not much here, some minor fixes and
    updates"

    Fix up trivial conflicts

    * tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits)
    base: memory: fix soft/hard_offline_page permissions
    drivercore: Fix ordering between deferred_probe and exiting initcalls
    backlight: fix class_find_device() arguments
    TTY: mark tty_get_device call with the proper const values
    driver-core: constify data for class_find_device()
    firmware: Ignore abort check when no user-helper is used
    firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER
    firmware: Make user-mode helper optional
    firmware: Refactoring for splitting user-mode helper code
    Driver core: treat unregistered bus_types as having no devices
    watchdog: Convert to devm_ioremap_resource()
    thermal: Convert to devm_ioremap_resource()
    spi: Convert to devm_ioremap_resource()
    power: Convert to devm_ioremap_resource()
    mtd: Convert to devm_ioremap_resource()
    mmc: Convert to devm_ioremap_resource()
    mfd: Convert to devm_ioremap_resource()
    media: Convert to devm_ioremap_resource()
    iommu: Convert to devm_ioremap_resource()
    drm: Convert to devm_ioremap_resource()
    ...

    Linus Torvalds
     

20 Feb, 2013

3 commits

  • Pull async changes from Tejun Heo:
    "These are followups for the earlier deadlock issue involving async
    ending up waiting for itself through block requesting module[1]. The
    following changes are made by these commits.

    - Instead of requesting default elevator on each request_queue init,
    block now requests it once early during boot.

    - Kmod triggers a warning if invoked from an async worker.

    - Async synchronization has been reimplemented. It's a lot simpler
    now."

    * 'for-3.9-async' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    async: initialise list heads to fix crash
    async: replace list of active domains with global list of pending items
    async: keep pending tasks on async_domain and remove async_pending
    async: use ULLONG_MAX for infinity cookie value
    async: bring sanity to the use of words domain and running
    async, kmod: warn on synchronous request_module() from async workers
    block: don't request module during elevator init
    init, block: try to load default elevator module early during boot

    Linus Torvalds
     
  • Pull scheduler changes from Ingo Molnar:
    "Main changes:

    - scheduler side full-dynticks (user-space execution is undisturbed
    and receives no timer IRQs) preparation changes that convert the
    cputime accounting code to be full-dynticks ready, from Frederic
    Weisbecker.

    - Initial sched.h split-up changes, by Clark Williams

    - select_idle_sibling() performance improvement by Mike Galbraith:

    " 1 tbench pair (worst case) in a 10 core + SMT package:

    pre 15.22 MB/sec 1 procs
    post 252.01 MB/sec 1 procs "

    - sched_rr_get_interval() ABI fix/change. We think this detail is not
    used by apps (so it's not an ABI in practice), but let's keep it
    under observation.

    - misc RT scheduling cleanups, optimizations"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
    sched/rt: Add header to
    cputime: Remove irqsave from seqlock readers
    sched, powerpc: Fix sched.h split-up build failure
    cputime: Restore CPU_ACCOUNTING config defaults for PPC64
    sched/rt: Move rt specific bits into new header file
    sched/rt: Add a tuning knob to allow changing SCHED_RR timeslice
    sched: Move sched.h sysctl bits into separate header
    sched: Fix signedness bug in yield_to()
    sched: Fix select_idle_sibling() bouncing cow syndrome
    sched/rt: Further simplify pick_rt_task()
    sched/rt: Do not account zero delta_exec in update_curr_rt()
    cputime: Safely read cputime of full dynticks CPUs
    kvm: Prepare to add generic guest entry/exit callbacks
    cputime: Use accessors to read task cputime stats
    cputime: Allow dynamic switch between tick/virtual based cputime accounting
    cputime: Generic on-demand virtual cputime accounting
    cputime: Move default nsecs_to_cputime() to jiffies based cputime file
    cputime: Librarize per nsecs resolution cputime definitions
    cputime: Avoid multiplication overflow on utime scaling
    context_tracking: Export context state for generic vtime
    ...

    Fix up conflict in kernel/context_tracking.c due to comment additions.

    Linus Torvalds
     
  • Pull irq core changes from Ingo Molnar:
    "The biggest changes are the IRQ-work and printk changes from Frederic
    Weisbecker, which prepare the code for 'full dynticks' (the ability to
    stop or slow down the periodic tick arbitrarily, not just in idle time
    as today):

    - Don't stop tick with irq works pending. This fix is generally
    useful and concerns archs that can't raise self IPIs.

    - Flush irq works before CPU offlining.

    - Introduce "lazy" irq works that can wait for the next tick to be
    executed, unless it's stopped.

    - Implement klogd wake up using irq work. This removes the ad-hoc
    printk_tick()/printk_needs_cpu() hooks and makes it work even in
    dynticks mode.

    - Cleanups and fixes."

    * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    genirq: Export enable/disable_percpu_irq()
    arch Kconfig: Remove references to IRQ_PER_CPU
    irq_work: Remove return value from the irq_work_queue() function
    genirq: Avoid deadlock in spurious handling
    printk: Wake up klogd using irq_work
    irq_work: Make self-IPIs optable
    irq_work: Warn if there's still work on cpu_down
    irq_work: Flush work on CPU_DYING
    irq_work: Don't stop the tick with pending works
    nohz: Add API to check tick state
    irq_work: Remove CONFIG_HAVE_IRQ_WORK
    irq_work: Fix racy check on work pending flag
    irq_work: Fix racy IRQ_WORK_BUSY flag setting

    Linus Torvalds
     

16 Feb, 2013

1 commit

  • PARISC defines /proc/sys/kernel/unaligned-trap to runtime toggle
    unaligned access emulation.

    The exact mechanics of enabling/disabling are still arch specific, but
    we can make the sysctl usable by other arches.

    Signed-off-by: Vineet Gupta
    Acked-by: Helge Deller
    Cc: "James E.J. Bottomley"
    Cc: Helge Deller
    Cc: "Eric W. Biederman"
    Cc: Serge Hallyn

    Vineet Gupta
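
A minimal userspace sketch of how a program might read the toggle named in the commit above. It assumes the sysctl file holds a single decimal integer, which this log does not confirm; read_sysctl_int() is a hypothetical helper, not an existing API. Because the commit message only names PARISC as defining the sysctl before this change, a missing file is treated as a normal outcome.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one integer from a sysctl-style file.
 * Returns 0 and stores the value in *out on success; returns -1 if
 * the file is missing or does not start with an integer. */
static int read_sysctl_int(const char *path, int *out)
{
    FILE *f;
    int ok;

    assert(path != NULL && out != NULL);
    f = fopen(path, "r");
    if (!f)
        return -1; /* e.g. the architecture does not provide it */
    ok = (fscanf(f, "%d", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

A caller would pass "/proc/sys/kernel/unaligned-trap" as the path and fall back to a default when the helper returns -1.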
     

13 Feb, 2013

3 commits