Eric Lee / smarc-fsl-linux-kernel

11 Jun, 2013

1 commit

016a8d5be rcu: Don't call wakeup() with rcu_node structure ->lock held

This commit fixes a lockdep-detected deadlock by moving a wake_up()
call out from a rnp->lock critical section. Please see below for
the long version of this story.

On Tue, 2013-05-28 at 16:13 -0400, Dave Jones wrote:

> [12572.705832] ======================================================
> [12572.750317] [ INFO: possible circular locking dependency detected ]
> [12572.796978] 3.10.0-rc3+ #39 Not tainted
> [12572.833381] -------------------------------------------------------
> [12572.862233] trinity-child17/31341 is trying to acquire lock:
> [12572.870390] (rcu_node_0){..-.-.}, at: [] rcu_read_unlock_special+0x9f/0x4c0
> [12572.878859]
> but task is already holding lock:
> [12572.894894] (&ctx->lock){-.-...}, at: [] perf_lock_task_context+0x7d/0x2d0
> [12572.903381]
> which lock already depends on the new lock.
>
> [12572.927541]
> the existing dependency chain (in reverse order) is:
> [12572.943736]
> -> #4 (&ctx->lock){-.-...}:
> [12572.960032] [] lock_acquire+0x91/0x1f0
> [12572.968337] [] _raw_spin_lock+0x40/0x80
> [12572.976633] [] __perf_event_task_sched_out+0x2e7/0x5e0
> [12572.984969] [] perf_event_task_sched_out+0x93/0xa0
> [12572.993326] [] __schedule+0x2cf/0x9c0
> [12573.001652] [] schedule_user+0x2e/0x70
> [12573.009998] [] retint_careful+0x12/0x2e
> [12573.018321]
> -> #3 (&rq->lock){-.-.-.}:
> [12573.034628] [] lock_acquire+0x91/0x1f0
> [12573.042930] [] _raw_spin_lock+0x40/0x80
> [12573.051248] [] wake_up_new_task+0xb7/0x260
> [12573.059579] [] do_fork+0x105/0x470
> [12573.067880] [] kernel_thread+0x26/0x30
> [12573.076202] [] rest_init+0x23/0x140
> [12573.084508] [] start_kernel+0x3f1/0x3fe
> [12573.092852] [] x86_64_start_reservations+0x2a/0x2c
> [12573.101233] [] x86_64_start_kernel+0xcc/0xcf
> [12573.109528]
> -> #2 (&p->pi_lock){-.-.-.}:
> [12573.125675] [] lock_acquire+0x91/0x1f0
> [12573.133829] [] _raw_spin_lock_irqsave+0x4b/0x90
> [12573.141964] [] try_to_wake_up+0x31/0x320
> [12573.150065] [] default_wake_function+0x12/0x20
> [12573.158151] [] autoremove_wake_function+0x18/0x40
> [12573.166195] [] __wake_up_common+0x58/0x90
> [12573.174215] [] __wake_up+0x39/0x50
> [12573.182146] [] rcu_start_gp_advanced.isra.11+0x4a/0x50
> [12573.190119] [] rcu_start_future_gp+0x1c9/0x1f0
> [12573.198023] [] rcu_nocb_kthread+0x114/0x930
> [12573.205860] [] kthread+0xed/0x100
> [12573.213656] [] ret_from_fork+0x7c/0xb0
> [12573.221379]
> -> #1 (&rsp->gp_wq){..-.-.}:
> [12573.236329] [] lock_acquire+0x91/0x1f0
> [12573.243783] [] _raw_spin_lock_irqsave+0x4b/0x90
> [12573.251178] [] __wake_up+0x23/0x50
> [12573.258505] [] rcu_start_gp_advanced.isra.11+0x4a/0x50
> [12573.265891] [] rcu_start_future_gp+0x1c9/0x1f0
> [12573.273248] [] rcu_nocb_kthread+0x114/0x930
> [12573.280564] [] kthread+0xed/0x100
> [12573.287807] [] ret_from_fork+0x7c/0xb0

Notice the above call chain.

rcu_start_future_gp() is called with the rnp->lock held. Then it calls
rcu_start_gp_advanced(), which does a wakeup.

You can't do wakeups while holding the rnp->lock, as that would mean
that you could not do a rcu_read_unlock() while holding the rq lock, or
any lock that was taken while holding the rq lock. This is because...
(See below).

> [12573.295067]
> -> #0 (rcu_node_0){..-.-.}:
> [12573.309293] [] __lock_acquire+0x1786/0x1af0
> [12573.316568] [] lock_acquire+0x91/0x1f0
> [12573.323825] [] _raw_spin_lock+0x40/0x80
> [12573.331081] [] rcu_read_unlock_special+0x9f/0x4c0
> [12573.338377] [] __rcu_read_unlock+0x96/0xa0
> [12573.345648] [] perf_lock_task_context+0x143/0x2d0
> [12573.352942] [] find_get_context+0x4e/0x1f0
> [12573.360211] [] SYSC_perf_event_open+0x514/0xbd0
> [12573.367514] [] SyS_perf_event_open+0x9/0x10
> [12573.374816] [] tracesys+0xdd/0xe2

Notice the above trace.

perf took its own ctx->lock, which can be taken while holding the rq
lock. While holding this lock, it did a rcu_read_unlock(). The
perf_lock_task_context() basically looks like:

rcu_read_lock();
raw_spin_lock(ctx->lock);
rcu_read_unlock();

Now, what looks to have happened, is that we scheduled after taking that
first rcu_read_lock() but before taking the spin lock. When we scheduled
back in and took the ctx->lock, the following rcu_read_unlock()
triggered the "special" code.

The rcu_read_unlock_special() takes the rnp->lock, which gives us a
possible deadlock scenario.

CPU0                    CPU1                    CPU2
----                    ----                    ----

                                                rcu_nocb_kthread()
                        lock(rq->lock);
lock(ctx->lock);
                                                lock(rnp->lock);

                                                wake_up();

                                                lock(rq->lock);

rcu_read_unlock();

rcu_read_unlock_special();

lock(rnp->lock);
                        lock(ctx->lock);

**** DEADLOCK ****
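
The cycle that lockdep reports can be reproduced with a very small model of dependency tracking. The following is a minimal sketch, not lockdep's actual implementation: the enum names, `struct lock_graph`, and `record_dependency()` are hypothetical and exist only for illustration. It records "lock B was taken while holding lock A" edges and refuses an edge that would close a cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes, named after the locks in the report. */
enum lock_class {
    RNP_LOCK, GP_WQ_LOCK, PI_LOCK, RQ_LOCK, CTX_LOCK, NR_LOCK_CLASSES
};

/* dep[a][b] is true when lock b has been acquired while holding lock a. */
struct lock_graph {
    bool dep[NR_LOCK_CLASSES][NR_LOCK_CLASSES];
};

static void graph_init(struct lock_graph *g)
{
    memset(g, 0, sizeof(*g));
}

/* Depth-first search: can 'to' be reached from 'from' over recorded edges? */
static bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < NR_LOCK_CLASSES; next++)
        if (g->dep[from][next] && !seen[next] &&
            reachable(g, next, to, seen))
            return true;
    return false;
}

/*
 * Record that 'acquired' was taken while 'held' was held.
 * Returns false, without recording anything, if the new edge would
 * close a cycle (a possible circular locking dependency).
 */
static bool record_dependency(struct lock_graph *g,
                              enum lock_class held, enum lock_class acquired)
{
    bool seen[NR_LOCK_CLASSES] = { false };

    assert(held < NR_LOCK_CLASSES && acquired < NR_LOCK_CLASSES);
    if (reachable(g, acquired, held, seen))
        return false;
    g->dep[held][acquired] = true;
    return true;
}
```

Feeding in the chain from the report (rnp->lock, then gp_wq, pi_lock, rq->lock, ctx->lock) succeeds edge by edge; the final attempt to take rnp->lock while holding ctx->lock is the one that gets rejected.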

> [12573.382068]
> other info that might help us debug this:
>
> [12573.403229] Chain exists of:
> rcu_node_0 --> &rq->lock --> &ctx->lock
>
> [12573.424471] Possible unsafe locking scenario:
>
> [12573.438499]        CPU0                    CPU1
> [12573.445599]        ----                    ----
> [12573.452691]   lock(&ctx->lock);
> [12573.459799]                                lock(&rq->lock);
> [12573.467010]                                lock(&ctx->lock);
> [12573.474192]   lock(rcu_node_0);
> [12573.481262]
> *** DEADLOCK ***
>
> [12573.501931] 1 lock held by trinity-child17/31341:
> [12573.508990] #0: (&ctx->lock){-.-...}, at: [] perf_lock_task_context+0x7d/0x2d0
> [12573.516475]
> stack backtrace:
> [12573.530395] CPU: 1 PID: 31341 Comm: trinity-child17 Not tainted 3.10.0-rc3+ #39
> [12573.545357] ffffffff825b4f90 ffff880219f1dbc0 ffffffff816e375b ffff880219f1dc00
> [12573.552868] ffffffff816dfa5d ffff880219f1dc50 ffff88023ce4d1f8 ffff88023ce4ca40
> [12573.560353] 0000000000000001 0000000000000001 ffff88023ce4d1f8 ffff880219f1dcc0
> [12573.567856] Call Trace:
> [12573.575011] [] dump_stack+0x19/0x1b
> [12573.582284] [] print_circular_bug+0x200/0x20f
> [12573.589637] [] __lock_acquire+0x1786/0x1af0
> [12573.596982] [] ? sched_clock_cpu+0xb5/0x100
> [12573.604344] [] lock_acquire+0x91/0x1f0
> [12573.611652] [] ? rcu_read_unlock_special+0x9f/0x4c0
> [12573.619030] [] _raw_spin_lock+0x40/0x80
> [12573.626331] [] ? rcu_read_unlock_special+0x9f/0x4c0
> [12573.633671] [] rcu_read_unlock_special+0x9f/0x4c0
> [12573.640992] [] ? perf_lock_task_context+0x7d/0x2d0
> [12573.648330] [] ? put_lock_stats.isra.29+0xe/0x40
> [12573.655662] [] ? delay_tsc+0x90/0xe0
> [12573.662964] [] __rcu_read_unlock+0x96/0xa0
> [12573.670276] [] perf_lock_task_context+0x143/0x2d0
> [12573.677622] [] ? __perf_event_enable+0x370/0x370
> [12573.684981] [] find_get_context+0x4e/0x1f0
> [12573.692358] [] SYSC_perf_event_open+0x514/0xbd0
> [12573.699753] [] ? get_parent_ip+0xd/0x50
> [12573.707135] [] ? trace_hardirqs_on_caller+0xfd/0x1c0
> [12573.714599] [] SyS_perf_event_open+0x9/0x10
> [12573.721996] [] tracesys+0xdd/0xe2

This commit delays the wakeup via irq_work(), which is what
perf and ftrace use to perform wakeups in critical sections.
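
The idea of deferring the wakeup can be illustrated in plain user-space C. This is only a sketch under stated assumptions: `struct deferred_work`, `struct node`, `work_queue()`, `work_run()` and `start_grace_period()` are hypothetical stand-ins, not the kernel's irq_work API, and the "lock" is a simple flag. The point it shows is that the code inside the critical section only marks the wakeup as pending, and the wakeup itself runs after the lock has been dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a deferred-work item. */
struct deferred_work {
    void (*func)(struct deferred_work *work);
    bool pending;
};

/* Hypothetical stand-in for an rcu_node-like structure. */
struct node {
    bool locked;                 /* models rnp->lock being held */
    struct deferred_work wakeup; /* wakeup postponed until after unlock */
    int wakeups_done;
    bool woke_under_lock;        /* set if a wakeup ever ran with the lock held */
};

#define node_of(work_ptr) \
    ((struct node *)((char *)(work_ptr) - offsetof(struct node, wakeup)))

/* The deferred callback: performs the "wakeup" and checks the lock state. */
static void do_wakeup(struct deferred_work *work)
{
    struct node *n = node_of(work);

    if (n->locked)
        n->woke_under_lock = true;
    n->wakeups_done++;
}

static void node_init(struct node *n)
{
    n->locked = false;
    n->wakeup.func = do_wakeup;
    n->wakeup.pending = false;
    n->wakeups_done = 0;
    n->woke_under_lock = false;
}

/* Mark the work pending; returns false if it was already queued. */
static bool work_queue(struct deferred_work *work)
{
    if (work->pending)
        return false;
    work->pending = true;
    return true;
}

/* Run the work if pending; models the later point where deferred work fires. */
static void work_run(struct deferred_work *work)
{
    if (!work->pending)
        return;
    work->pending = false;
    work->func(work);
}

/* Critical section that needs a wakeup but must not perform it under the lock. */
static void start_grace_period(struct node *n)
{
    assert(!n->locked);
    n->locked = true;
    work_queue(&n->wakeup);  /* instead of calling the wakeup here */
    n->locked = false;
    work_run(&n->wakeup);    /* wakeup happens with the lock released */
}
```

In the kernel the deferred part runs from interrupt context at a later time rather than immediately after the unlock; the sketch collapses that delay into a direct call to keep the example short.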

Reported-by: Dave Jones
Signed-off-by: Steven Rostedt
Signed-off-by: Paul E. McKenney

Steven Rostedt
2013-06-11 04:37:11 +0800

06 May, 2013

1 commit

534c97b09 Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull 'full dynticks' support from Ingo Molnar:
"This tree from Frederic Weisbecker adds a new, (exciting! :-) core
kernel feature to the timer and scheduler subsystems: 'full dynticks',
or CONFIG_NO_HZ_FULL=y.

This feature extends the nohz variable-size timer tick feature from
idle to busy CPUs (running at most one task) as well, potentially
reducing the number of timer interrupts significantly.

This feature got motivated by real-time folks and the -rt tree, but
the general utility and motivation of full-dynticks runs wider than
that:

- HPC workloads get faster: CPUs running a single task should be able
to utilize a maximum amount of CPU power. A periodic timer tick at
HZ=1000 can cause a constant overhead of up to 1.0%. This feature
removes that overhead - and speeds up the system by 0.5%-1.0% on
typical distro configs even on modern systems.

- Real-time workload latency reduction: CPUs running critical tasks
should experience as little jitter as possible. The last remaining
source of kernel-related jitter was the periodic timer tick.

- A single task executing on a CPU is a pretty common situation,
especially with an increasing number of cores/CPUs, so this feature
helps desktop and mobile workloads as well.
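
As a rough check of the overhead figure above: a 1.0% overhead at HZ=1000 corresponds to about 10 microseconds of CPU time per tick. That per-tick cost is an assumption derived from the quoted numbers, not a measured value, and `tick_overhead()` is a hypothetical helper written only to show the arithmetic.

```c
#include <assert.h>

/*
 * Fraction of CPU time consumed by the periodic tick:
 * (ticks per second) * (seconds spent per tick).
 * tick_cost_usec is an assumed per-tick cost in microseconds.
 */
static double tick_overhead(unsigned int hz, double tick_cost_usec)
{
    assert(tick_cost_usec >= 0.0);
    return hz * tick_cost_usec / 1e6;
}
```

With the same assumed cost, `tick_overhead(1000, 10.0)` gives 0.01 (1.0%), while a 1 Hz tick gives 0.00001, which is why reducing the tick rate removes nearly all of that overhead.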

The cost of the feature is mainly related to increased timer
reprogramming overhead when a CPU switches its tick period, and thus
slightly longer to-idle and from-idle latency.

Configuration-wise a third mode of operation is added to the existing
two NOHZ kconfig modes:

- CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named
as a config option. This is the traditional Linux periodic tick
design: there's a HZ tick going on all the time, regardless of
whether a CPU is idle or not.

- CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the
periodic tick when a CPU enters idle mode.

- CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the
tick when a CPU is idle, also slows the tick down to 1 Hz (one
timer interrupt per second) when only a single task is running on a
CPU.

The .config behavior is compatible: existing !CONFIG_NO_HZ and
CONFIG_NO_HZ=y settings get translated to the new values, without the
user having to configure anything. CONFIG_NO_HZ_FULL is turned off by
default.

This feature is based on a lot of infrastructure work that has been
steadily going upstream in the last 2-3 cycles: related RCU support
and non-periodic cputime support in particular is upstream already.

This tree adds the final pieces and activates the feature. The pull
request is marked RFC because:

- it's marked 64-bit only at the moment - the 32-bit support patch is
small but did not get ready in time.

- it has a number of fresh commits that came in after the merge
window. The overwhelming majority of commits are from before the
merge window, but still some aspects of the tree are fresh and so I
marked it RFC.

- it's a pretty wide-reaching feature with lots of effects - and
while the components have been in testing for some time, the full
combination is still not very widely used. That it's default-off
should reduce its regression abilities and obviously there are no
known regressions with CONFIG_NO_HZ_FULL=y enabled either.

- the feature is not completely idempotent: there is no 100%
equivalent replacement for a periodic scheduler/timer tick. In
particular there's ongoing work to map out and reduce its effects
on scheduler load-balancing and statistics. This should not impact
correctness though, there are no known regressions related to this
feature at this point.

- it's a pretty ambitious feature that with time will likely be
enabled by most Linux distros, and we'd like you to make input on
its design/implementation, if you dislike some aspect we missed.
Without flaming us to crisp! :-)

Future plans:

- there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off
the periodic tick altogether when there's a single busy task on a
CPU. We'd first like 1 Hz to be exposed more widely before we go
for the 0 Hz target though.

- once we reach 0 Hz we can remove the periodic tick assumption from
nr_running>=2 as well, by essentially interrupting busy tasks only
as frequently as the sched_latency constraints require us to do -
once every 4-40 msecs, depending on nr_running.

I am personally leaning towards biting the bullet and doing this in
v3.10, like the -rt tree this effort has been going on for too long -
but the final word is up to you as usual.

More technical details can be found in Documentation/timers/NO_HZ.txt"

* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
sched: Keep at least 1 tick per second for active dynticks tasks
rcu: Fix full dynticks' dependency on wide RCU nocb mode
nohz: Protect smp_processor_id() in tick_nohz_task_switch()
nohz_full: Add documentation.
cputime_nsecs: use math64.h for nsec resolution conversion helpers
nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config
nohz: Reduce overhead under high-freq idling patterns
nohz: Remove full dynticks' superfluous dependency on RCU tree
nohz: Fix unavailable tick_stop tracepoint in dynticks idle
nohz: Add basic tracing
nohz: Select wide RCU nocb for full dynticks
nohz: Disable the tick when irq resume in full dynticks CPU
nohz: Re-evaluate the tick for the new task after a context switch
nohz: Prepare to stop the tick on irq exit
nohz: Implement full dynticks kick
nohz: Re-evaluate the tick from the scheduler IPI
sched: New helper to prevent from stopping the tick in full dynticks
sched: Kick full dynticks CPU that have more than one task enqueued.
perf: New helper to prevent full dynticks CPUs from stopping tick
perf: Kick full dynticks CPU if events rotation is needed
...

Linus Torvalds
2013-05-06 04:23:27 +0800

04 May, 2013

1 commit

73c308287 rcu: Fix full dynticks' dependency on wide RCU nocb mode

Commit 0637e029392386e6996f5d6574aadccee8315efa
("nohz: Select wide RCU nocb for full dynticks") intended
to force CONFIG_RCU_NOCB_CPU_ALL=y when full dynticks is
enabled.

However this option is part of a choice menu and Kconfig's
"select" instruction has no effect on such targets.

Fix this by using reverse dependencies on the targets we
don't want instead.

Reviewed-by: Paul E. McKenney
Signed-off-by: Frederic Weisbecker
Cc: Christoph Lameter
Cc: Hakan Akkan
Cc: Ingo Molnar
Cc: Kevin Hilman
Cc: Li Zhong
Cc: Paul Gortmaker
Cc: Peter Zijlstra
Cc: Steven Rostedt
Cc: Thomas Gleixner

Frederic Weisbecker
2013-05-04 14:30:34 +0800

02 May, 2013

1 commit

c032862fb Merge commit '8700c95adb03' into timers/nohz

The full dynticks tree needs the latest RCU and sched
upstream updates in order to fix some dependencies.

Merge a common upstream merge point that has these
updates.

Conflicts:
include/linux/perf_event.h
kernel/rcutree.h
kernel/rcutree_plugin.h

Signed-off-by: Frederic Weisbecker

Frederic Weisbecker
2013-05-02 23:54:19 +0800

01 May, 2013

1 commit

657a52095 init/Kconfig: re-order CONFIG_EXPERT options to fix menuconfig display

The kconfig language requires that dependent options all follow the
menuconfig symbol in order to be collapsed below it. Recently some hidden
options were added below the EXPERT menuconfig, but did not depend on
EXPERT (because hidden options can't). This broke the display. So
re-order all these options, and while we're here stick the PCI quirks
under the EXPERT menu (since it isn't sitting with any related options).

Before this commit, we get:
[*] Configure standard kernel features (expert users) --->
[ ] Sysctl syscall support
[*] Load all symbols for debugging/ksymoops
...
[ ] Embedded system

Now we get the older (and correct) behavior:
[*] Configure standard kernel features (expert users) --->
[ ] Embedded system
And if you go into the expert menu you get the expert options:
[ ] Sysctl syscall support
[*] Load all symbols for debugging/ksymoops
...

Signed-off-by: Mike Frysinger
Acked-by: Randy Dunlap
Cc: zhangwei(Jovi)
Cc: Michal Marek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mike Frysinger
2013-05-01 08:04:09 +0800

30 Apr, 2013

1 commit

16fa94b53 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler changes from Ingo Molnar:
"The main changes in this development cycle were:

- full dynticks preparatory work by Frederic Weisbecker

- factor out the cpu time accounting code better, by Li Zefan

- multi-CPU load balancer cleanups and improvements by Joonsoo Kim

- various smaller fixes and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
sched: Fix init NOHZ_IDLE flag
sched: Prevent to re-select dst-cpu in load_balance()
sched: Rename load_balance_tmpmask to load_balance_mask
sched: Move up affinity check to mitigate useless redoing overhead
sched: Don't consider other cpus in our group in case of NEWLY_IDLE
sched: Explicitly cpu_idle_type checking in rebalance_domains()
sched: Change position of resched_cpu() in load_balance()
sched: Fix wrong rq's runnable_avg update with rt tasks
sched: Document task_struct::personality field
sched/cpuacct/UML: Fix header file dependency bug on the UML build
cgroup: Kill subsys.active flag
sched/cpuacct: No need to check subsys active state
sched/cpuacct: Initialize cpuacct subsystem earlier
sched/cpuacct: Initialize root cpuacct earlier
sched/cpuacct: Allocate per_cpu cpuusage for root cpuacct statically
sched/cpuacct: Clean up cpuacct.h
sched/cpuacct: Remove redundant NULL checks in cpuacct_acount_field()
sched/cpuacct: Remove redundant NULL checks in cpuacct_charge()
sched/cpuacct: Add cpuacct_acount_field()
sched/cpuacct: Add cpuacct_init()
...

Linus Torvalds
2013-04-30 22:43:28 +0800

27 Apr, 2013

1 commit

c58b0df12 nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config

Turn the full dynticks passive dependency on VIRT_CPU_ACCOUNTING_GEN
to an active one.

The full dynticks Kconfig is currently hidden behind the full dynticks
cputime accounting, which is an awkward and counter-intuitive layout:
the user first has to select the dynticks cputime accounting in order
to make the full dynticks feature visible.

We definitely want it the other way around. The usual way to perform
this kind of active dependency is to use "select" on the depended target.
However, we can't use the Kconfig "select" instruction when the target is
a "choice".

So this patch takes inspiration from how the RCU subsystem Kconfig interacts
with its dependencies on SMP and PREEMPT: we make sure that cputime
accounting can't propose another option than VIRT_CPU_ACCOUNTING_GEN
when NO_HZ_FULL is selected by using the right "depends on" instruction
for each of the cputime accounting choices.

v2: Keep full dynticks cputime accounting available even without
full dynticks, as per Paul McKenney's suggestion.

Reported-by: Ingo Molnar
Signed-off-by: Frederic Weisbecker
Cc: Christoph Lameter
Cc: Hakan Akkan
Cc: Ingo Molnar
Cc: Kevin Hilman
Cc: Li Zhong
Cc: Paul E. McKenney
Cc: Paul Gortmaker
Cc: Peter Zijlstra
Cc: Steven Rostedt
Cc: Thomas Gleixner

Frederic Weisbecker
2013-04-27 00:56:59 +0800

10 Apr, 2013

1 commit

8fcfae317 Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu

Pull RCU updates from Paul E. McKenney:

* Remove restrictions on no-CBs CPUs, make RCU_FAST_NO_HZ
take advantage of numbered callbacks, do additional callback
accelerations based on numbered callbacks. Posted to LKML
at https://lkml.org/lkml/2013/3/18/960.

* RCU documentation updates. Posted to LKML at
https://lkml.org/lkml/2013/3/18/570.

* Miscellaneous fixes. Posted to LKML at
https://lkml.org/lkml/2013/3/18/594.

Signed-off-by: Ingo Molnar <mingo@kernel.org>

Ingo Molnar
2013-04-10 18:55:49 +0800

03 Apr, 2013

1 commit

3451d0243 nohz: Rename CONFIG_NO_HZ to CONFIG_NO_HZ_COMMON

We are planning to convert the dynticks Kconfig options layout
into a choice menu. The user must be able to easily pick
any of the following implementations: constant periodic tick,
idle dynticks, full dynticks.

As this implies a mutual exclusion, the two dynticks implementations
need to converge on the selection of a common Kconfig option in order
to ease the sharing of a common infrastructure.

It would thus seem pretty natural to reuse CONFIG_NO_HZ to
that end. It already implements all the idle dynticks code
and the full dynticks depends on all that code for now.
So ideally the choice menu would propose CONFIG_NO_HZ_IDLE and
CONFIG_NO_HZ_EXTENDED then both would select CONFIG_NO_HZ.

On the other hand we want to stay backward compatible: if
CONFIG_NO_HZ is set in an older config file, we want to
enable CONFIG_NO_HZ_IDLE by default.

But we can't afford both at the same time or we run into
a circular dependency:

1) CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED both select
CONFIG_NO_HZ
2) If CONFIG_NO_HZ is set, we default to CONFIG_NO_HZ_IDLE

We might be able to support that from Kconfig/Kbuild but it
may not be wise to introduce such a confusing behaviour.

So to solve this, create a new CONFIG_NO_HZ_COMMON option
which gathers the common code between idle and full dynticks
(that common code for now is simply the idle dynticks code)
and select it from their referring Kconfig.

Then we'll later create CONFIG_NO_HZ_IDLE and map CONFIG_NO_HZ
to it for backward compatibility.

Signed-off-by: Frederic Weisbecker
Cc: Andrew Morton
Cc: Chris Metcalf
Cc: Christoph Lameter
Cc: Geoff Levand
Cc: Gilad Ben Yossef
Cc: Hakan Akkan
Cc: Ingo Molnar
Cc: Kevin Hilman
Cc: Li Zhong
Cc: Namhyung Kim
Cc: Paul E. McKenney
Cc: Paul Gortmaker
Cc: Peter Zijlstra
Cc: Steven Rostedt
Cc: Thomas Gleixner

Frederic Weisbecker
2013-04-03 19:56:03 +0800

26 Mar, 2013

3 commits

c0f4dfd4f rcu: Make RCU_FAST_NO_HZ take advantage of numbered callbacks

Because RCU callbacks are now associated with the number of the grace
period that they must wait for, CPUs can now advance callbacks
corresponding to grace periods that ended while a given CPU was in
dyntick-idle mode. This eliminates the need to try forcing the RCU
state machine while entering idle, thus reducing the CPU intensiveness
of RCU_FAST_NO_HZ, which should increase its energy efficiency.
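
The effect of numbering callbacks can be sketched as follows. This is a simplified illustration, not the kernel's data structure: `struct callback` and `advance_callbacks()` are hypothetical, the list is assumed to be sorted by grace-period number, and counter wraparound is ignored. A CPU leaving dyntick-idle mode only needs to compare each callback's number with the most recently completed grace period.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified callback record. */
struct callback {
    unsigned long gp_num;   /* grace period this callback must wait for */
    struct callback *next;
};

/*
 * Split a list sorted by ascending gp_num at the first callback that is
 * still waiting. On return, *ready holds the callbacks whose grace period
 * has completed and *list holds the remainder.
 * Returns the number of callbacks that became ready.
 * Note: a plain <= comparison ignores counter wraparound.
 */
static size_t advance_callbacks(struct callback **list,
                                unsigned long completed,
                                struct callback **ready)
{
    struct callback **pp = list;
    struct callback *rest;
    size_t n = 0;

    assert(list && ready);
    while (*pp && (*pp)->gp_num <= completed) {
        pp = &(*pp)->next;
        n++;
    }
    if (n == 0) {
        *ready = NULL;
        return 0;
    }
    *ready = *list;   /* ready prefix starts at the old head */
    rest = *pp;       /* first callback still waiting, or NULL */
    *pp = NULL;       /* terminate the ready prefix */
    *list = rest;
    return n;
}
```

Because the check is a simple comparison against a number the CPU can read when it wakes up, nothing has to be forced through the grace-period machinery on the way into idle.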

Signed-off-by: Paul E. McKenney

Paul E. McKenney
2013-03-26 23:04:51 +0800

a48898585 rcu: Distinguish "rcuo" kthreads by RCU flavor

Currently, the per-no-CBs-CPU kthreads are named "rcuo" followed by
the CPU number, for example, "rcuo0". This is problematic given that
there are either two or three RCU flavors, each of which gets a per-CPU
kthread with exactly the same name. This commit therefore introduces
a one-letter abbreviation for each RCU flavor, namely 'b' for RCU-bh,
'p' for RCU-preempt, and 's' for RCU-sched. This abbreviation is used
to distinguish the "rcuo" kthreads, for example, for CPU 0 we would have
"rcuob/0", "rcuop/0", and "rcuos/0".
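
The naming scheme amounts to one format string. A minimal sketch, assuming a hypothetical helper `nocb_kthread_name()` (the kernel passes its format string to its kthread creation routine instead of building the name like this):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a no-CBs kthread name from the flavor abbreviation and CPU number.
 * abbr is 'b' (RCU-bh), 'p' (RCU-preempt) or 's' (RCU-sched).
 * Returns the snprintf() result: the length the full name needs.
 */
static int nocb_kthread_name(char *buf, size_t len, char abbr, int cpu)
{
    assert(abbr == 'b' || abbr == 'p' || abbr == 's');
    return snprintf(buf, len, "rcuo%c/%d", abbr, cpu);
}
```

Calling it with 'b', 'p' and 's' for CPU 0 produces the three names listed above, so the kthreads for one CPU no longer collide.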

Signed-off-by: Paul E. McKenney
Tested-by: Dietmar Eggemann

Paul E. McKenney
2013-03-26 23:04:48 +0800

911af505e rcu: Provide compile-time control for no-CBs CPUs

Currently, the only way to specify no-CBs CPUs is via the rcu_nocbs
kernel command-line parameter. This is inconvenient in some cases,
particularly for randconfig testing, so this commit adds a new set of
kernel configuration parameters. CONFIG_RCU_NOCB_CPU_NONE (the default)
retains the old behavior, CONFIG_RCU_NOCB_CPU_ZERO offloads callback
processing from CPU 0 (along with any other CPUs specified by the
rcu_nocbs boot-time parameter), and CONFIG_RCU_NOCB_CPU_ALL offloads
callback processing from all CPUs.
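
How the three settings combine with the boot parameter can be sketched as a mask computation. The names below (`enum nocb_config`, `nocb_mask()`) are hypothetical and the sketch is limited to 64 CPUs; the kernel uses its own cpumask types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the three Kconfig choices. */
enum nocb_config { NOCB_CPU_NONE, NOCB_CPU_ZERO, NOCB_CPU_ALL };

/*
 * Compute the set of no-CBs CPUs as a bitmask (bit n set = CPU n offloaded).
 * boot_mask models the CPUs named by the rcu_nocbs boot parameter.
 */
static uint64_t nocb_mask(enum nocb_config cfg, uint64_t boot_mask,
                          unsigned int nr_cpus)
{
    uint64_t all;

    assert(nr_cpus >= 1 && nr_cpus <= 64);
    all = nr_cpus == 64 ? UINT64_MAX : (UINT64_C(1) << nr_cpus) - 1;

    switch (cfg) {
    case NOCB_CPU_ALL:
        return all;                     /* every CPU is offloaded */
    case NOCB_CPU_ZERO:
        return (boot_mask | 1u) & all;  /* CPU 0 plus boot-time CPUs */
    case NOCB_CPU_NONE:
        break;
    }
    return boot_mask & all;             /* old behavior: boot parameter only */
}
```

For a four-CPU system with CPUs 1 and 2 named at boot, the three settings yield the masks 0x6, 0x7 and 0xF respectively.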

Signed-off-by: Paul E. McKenney

Paul E. McKenney
2013-03-26 23:04:43 +0800

13 Mar, 2013

2 commits

3d374d09f final removal of CONFIG_EXPERIMENTAL

Remove "config EXPERIMENTAL" itself, now that every "depends on" it has
been removed from the tree.

Signed-off-by: Kees Cook
Signed-off-by: Greg Kroah-Hartman

Kees Cook
2013-03-13 07:30:27 +0800

34ed62461 rcu: Remove restrictions on no-CBs CPUs

Currently, CPU 0 is constrained to not be a no-CBs CPU, and furthermore
at least one no-CBs CPU must remain online at any given time. These
restrictions are problematic in some situations, such as cases where
all CPUs must run a real-time workload that needs to be insulated from
OS jitter and latencies due to RCU callback invocation. This commit
therefore provides no-CBs CPUs a (very crude and energy-inefficient)
way to start and to wait for grace periods independently of the normal
RCU callback mechanisms. This approach allows any or all of the CPUs to
be designated as no-CBs CPUs, and allows any proper subset of the CPUs
(whether no-CBs CPUs or not) to be offlined.

This commit also provides a fix for a locking bug spotted by Xie
ChanglongX.

Signed-off-by: Paul E. McKenney

Paul E. McKenney
2013-03-13 02:17:51 +0800

08 Mar, 2013

1 commit

8b4387664 context_tracking: Enable probes by default for selftesting

Until we provide the nohz_mask boot parameter, keeping
the context tracking probes disabled by default is pointless
since what we want is to runtime test this code anyway.

It's furthermore confusing for users, who don't expect
the probes to be off when they select RCU user mode or full
dynticks cputime accounting.

Let's enable these probes selftests by default for now.

Suggested: Steven Rostedt
Signed-off-by: Frederic Weisbecker
Cc: Li Zhong
Cc: Kevin Hilman
Cc: Mats Liljegren
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Steven Rostedt
Cc: Namhyung Kim
Cc: Andrew Morton
Cc: Thomas Gleixner
Cc: Paul E. McKenney

Frederic Weisbecker
2013-03-08 00:10:41 +0800

02 Mar, 2013

1 commit

e23b62256 Merge tag 'arc-v3.9-rc1-late' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull new ARC architecture from Vineet Gupta:
"Initial ARC Linux port with some fixes on top for 3.9-rc1:

I would like to introduce the Linux port to ARC Processors (from
Synopsys) for 3.9-rc1. The patch-set has been discussed on the public
lists since Nov and has received a fair bit of review, especially from
Arnd, tglx, Al and other subsystem maintainers for DeviceTree, kgdb...

The arch bits are in arch/arc, some asm-generic changes (acked by
Arnd), a minor change to PARISC (acked by Helge).

The series is a touch bigger for a new port for 2 main reasons:

1. It enables a basic kernel in first sub-series and adds
ptrace/kgdb/.. later

2. Some of the fallout of review (DeviceTree support, multi-platform-
image support) were added on top of orig series, primarily to
record the revision history.

This updated pull request additionally contains

- fixes due to our GNU tools catching up with the new syscall/ptrace
ABI

- some (minor) cross-arch Kconfig updates."

* tag 'arc-v3.9-rc1-late' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (82 commits)
ARC: split elf.h into uapi and export it for userspace
ARC: Fixup the current ABI version
ARC: gdbserver using regset interface possibly broken
ARC: Kconfig cleanup tracking cross-arch Kconfig pruning in merge window
ARC: make a copy of flat DT
ARC: [plat-arcfpga] DT arc-uart bindings change: "baud" => "current-speed"
ARC: Ensure CONFIG_VIRT_TO_BUS is not enabled
ARC: Fix pt_orig_r8 access
ARC: [3.9] Fallout of hlist iterator update
ARC: 64bit RTSC timestamp hardware issue
ARC: Don't fiddle with non-existent caches
ARC: Add self to MAINTAINERS
ARC: Provide a default serial.h for uart drivers needing BASE_BAUD
ARC: [plat-arcfpga] defconfig for fully loaded ARC Linux
ARC: [Review] Multi-platform image #8: platform registers SMP callbacks
ARC: [Review] Multi-platform image #7: SMP common code to use callbacks
ARC: [Review] Multi-platform image #6: cpu-to-dma-addr optional
ARC: [Review] Multi-platform image #5: NR_IRQS defined by ARC core
ARC: [Review] Multi-platform image #4: Isolate platform headers
ARC: [Review] Multi-platform image #3: switch to board callback
...

Linus Torvalds
2013-03-02 23:58:56 +0800

26 Feb, 2013

2 commits

94f2f1423 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull user namespace and namespace infrastructure changes from Eric W Biederman:
"This set of changes starts with a few small enhancements to the user
namespace: reboot support, allowing more arbitrary mappings, and
support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the
user namespace root.

I do my best to document that if you care about limiting your
unprivileged users that when you have the user namespace support
enabled you will need to enable memory control groups.

There is a minor bug fix to prevent overflowing the stack if someone
creates way too many user namespaces.

The bulk of the changes are a continuation of the kuid/kgid push down
work through the filesystems. These changes make using uids and gids
typesafe which ensures that these filesystems are safe to use when
multiple user namespaces are in use. The filesystems converted for
3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs. The
changes for these filesystems were a little more involved so I split
the changes into smaller hopefully obviously correct changes.

XFS is the only filesystem that remains. I was hoping I could get
that in this release so that user namespace support would be enabled
with an allyesconfig or an allmodconfig but it looks like the xfs
changes need another couple of days before they are ready."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits)
cifs: Enable building with user namespaces enabled.
cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t
cifs: Convert struct cifs_sb_info to use kuids and kgids
cifs: Modify struct smb_vol to use kuids and kgids
cifs: Convert struct cifsFileInfo to use a kuid
cifs: Convert struct cifs_fattr to use kuid and kgids
cifs: Convert struct tcon_link to use a kuid.
cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t
cifs: Convert from a kuid before printing current_fsuid
cifs: Use kuids and kgids SID to uid/gid mapping
cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc
cifs: Use BUILD_BUG_ON to validate uids and gids are the same size
cifs: Override unmappable incoming uids and gids
nfsd: Enable building with user namespaces enabled.
nfsd: Properly compare and initialize kuids and kgids
nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids
nfsd: Modify nfsd4_cb_sec to use kuids and kgids
nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion
nfsd: Convert nfsxdr to use kuids and kgids
nfsd: Convert nfs3xdr to use kuids and kgids
...

Linus Torvalds
2013-02-26 08:00:49 +0800

9043a2650 Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module update from Rusty Russell:
"The sweeping change is to make add_taint() explicitly indicate whether
to disable lockdep, but it's a mechanical change."
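
The shape of that change can be sketched in user-space C. This is an illustration only: `struct kernel_state` and `add_taint_sketch()` are hypothetical, the taint flag numbers used with it are arbitrary, and the enumerator names are modelled on the kernel's but should be treated as assumptions here. The caller states explicitly whether lock debugging can still be trusted after the taint.

```c
#include <assert.h>
#include <stdbool.h>

/* Whether lock debugging remains trustworthy after a given taint. */
enum lockdep_ok { LOCKDEP_STILL_OK, LOCKDEP_NOW_UNRELIABLE };

/* Hypothetical container for the state the real function updates globally. */
struct kernel_state {
    unsigned long tainted_mask;
    bool lockdep_enabled;
};

/* Set a taint bit and, only if the caller says so, turn lock debugging off. */
static void add_taint_sketch(struct kernel_state *ks, unsigned int flag,
                             enum lockdep_ok ok)
{
    assert(flag < sizeof(ks->tainted_mask) * 8);
    if (ok == LOCKDEP_NOW_UNRELIABLE)
        ks->lockdep_enabled = false;
    ks->tainted_mask |= 1UL << flag;
}
```

Making the second argument mandatory means every call site documents its own decision, which is why the change is sweeping but mechanical.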

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
MODSIGN: Add option to not sign modules during modules_install
MODSIGN: Add -s option to sign-file
MODSIGN: Specify the hash algorithm on sign-file command line
MODSIGN: Simplify Makefile with a Kconfig helper
module: clean up load_module a little more.
modpost: Ignore ARC specific non-alloc sections
module: constify within_module_*
taint: add explicit flag to show whether lock dep is still OK.
module: printk message when module signature fail taints kernel.

Linus Torvalds
2013-02-26 07:41:43 +0800

22 Feb, 2013

2 commits

27ea6dfdc Merge tag 'please-pull-misc-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux

Pull misc ia64 bits from Tony Luck.

* tag 'please-pull-misc-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
MAINTAINERS: update SGI & ia64 Altix stuff
sysctl: Enable IA64 "ignore-unaligned-usertrap" to be used cross-arch

Linus Torvalds
2013-02-22 09:55:48 +0800

06991c28f Merge tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core patches from Greg Kroah-Hartman:
"Here is the big driver core merge for 3.9-rc1

There are two major series here, both of which touch lots of drivers
all over the kernel, and will cause you some merge conflicts:

- add a new function called devm_ioremap_resource() to properly be
able to check return values.

- remove CONFIG_EXPERIMENTAL

Other than those patches, there's not much here, some minor fixes and
updates"

Fix up trivial conflicts

* tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits)
base: memory: fix soft/hard_offline_page permissions
drivercore: Fix ordering between deferred_probe and exiting initcalls
backlight: fix class_find_device() arguments
TTY: mark tty_get_device call with the proper const values
driver-core: constify data for class_find_device()
firmware: Ignore abort check when no user-helper is used
firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER
firmware: Make user-mode helper optional
firmware: Refactoring for splitting user-mode helper code
Driver core: treat unregistered bus_types as having no devices
watchdog: Convert to devm_ioremap_resource()
thermal: Convert to devm_ioremap_resource()
spi: Convert to devm_ioremap_resource()
power: Convert to devm_ioremap_resource()
mtd: Convert to devm_ioremap_resource()
mmc: Convert to devm_ioremap_resource()
mfd: Convert to devm_ioremap_resource()
media: Convert to devm_ioremap_resource()
iommu: Convert to devm_ioremap_resource()
drm: Convert to devm_ioremap_resource()
...

Linus Torvalds
2013-02-22 04:05:51 +0800

20 Feb, 2013

2 commits

d652e1eb8 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull scheduler changes from Ingo Molnar:
"Main changes:

- scheduler side full-dynticks (user-space execution is undisturbed
and receives no timer IRQs) preparation changes that convert the
cputime accounting code to be full-dynticks ready, from Frederic
Weisbecker.

- Initial sched.h split-up changes, by Clark Williams

- select_idle_sibling() performance improvement by Mike Galbraith:

" 1 tbench pair (worst case) in a 10 core + SMT package:

pre 15.22 MB/sec 1 procs
post 252.01 MB/sec 1 procs "

- sched_rr_get_interval() ABI fix/change. We think this detail is not
used by apps (so it's not an ABI in practice), but let's keep it
under observation.

- misc RT scheduling cleanups, optimizations"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
sched/rt: Add header to
cputime: Remove irqsave from seqlock readers
sched, powerpc: Fix sched.h split-up build failure
cputime: Restore CPU_ACCOUNTING config defaults for PPC64
sched/rt: Move rt specific bits into new header file
sched/rt: Add a tuning knob to allow changing SCHED_RR timeslice
sched: Move sched.h sysctl bits into separate header
sched: Fix signedness bug in yield_to()
sched: Fix select_idle_sibling() bouncing cow syndrome
sched/rt: Further simplify pick_rt_task()
sched/rt: Do not account zero delta_exec in update_curr_rt()
cputime: Safely read cputime of full dynticks CPUs
kvm: Prepare to add generic guest entry/exit callbacks
cputime: Use accessors to read task cputime stats
cputime: Allow dynamic switch between tick/virtual based cputime accounting
cputime: Generic on-demand virtual cputime accounting
cputime: Move default nsecs_to_cputime() to jiffies based cputime file
cputime: Librarize per nsecs resolution cputime definitions
cputime: Avoid multiplication overflow on utime scaling
context_tracking: Export context state for generic vtime
...

Fix up conflict in kernel/context_tracking.c due to comment additions.

Linus Torvalds
2013-02-20 10:19:48 +0800
b7133a9a1 Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull irq core changes from Ingo Molnar:
"The biggest changes are the IRQ-work and printk changes from Frederic
Weisbecker, which prepare the code for 'full dynticks' (the ability to
stop or slow down the periodic tick arbitrarily, not just in idle time
as today):

- Don't stop tick with irq works pending. This fix is generally
useful and concerns archs that can't raise self IPIs.

- Flush irq works before CPU offlining.

- Introduce "lazy" irq works that can wait for the next tick to be
executed, unless it's stopped.

- Implement klogd wake up using irq work. This removes the ad-hoc
printk_tick()/printk_needs_cpu() hooks and makes it work even in
dynticks mode.

- Cleanups and fixes."
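
As a rough userspace illustration of the "lazy" irq work idea described
above (not the kernel's actual irq_work implementation; every name
prefixed with sketch_ is hypothetical), the queueing decision could look
like this: lazy work waits for the next tick, unless the tick is stopped,
in which case an immediate self-interrupt is requested instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work item: "lazy" items may wait for the next tick. */
struct sketch_work {
    void (*func)(struct sketch_work *);
    bool lazy;
    struct sketch_work *next;
};

/* Hypothetical per-CPU queue state. */
struct sketch_queue {
    struct sketch_work *head;
    bool tick_stopped;   /* true when the periodic tick is not running */
    unsigned ipis;       /* count of self-interrupts requested */
};

/*
 * Queue a work item. Returns true when an immediate self-interrupt is
 * requested: always for non-lazy work, and for lazy work only when the
 * tick is stopped and therefore cannot be relied on to run it.
 */
static bool sketch_queue_work(struct sketch_queue *q, struct sketch_work *w)
{
    w->next = q->head;
    q->head = w;
    if (!w->lazy || q->tick_stopped) {
        q->ipis++;
        return true;
    }
    return false;
}

/* Called from the tick or the self-interrupt: run and drain everything. */
static unsigned sketch_run(struct sketch_queue *q)
{
    unsigned n = 0;
    struct sketch_work *w = q->head;

    q->head = NULL;
    while (w) {
        struct sketch_work *next = w->next;
        w->next = NULL;
        if (w->func)
            w->func(w);
        n++;
        w = next;
    }
    return n;
}
```

In this sketch a klogd-style wakeup would be queued as lazy work, since
it can tolerate waiting for the next tick while the tick is running.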

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Export enable/disable_percpu_irq()
arch Kconfig: Remove references to IRQ_PER_CPU
irq_work: Remove return value from the irq_work_queue() function
genirq: Avoid deadlock in spurious handling
printk: Wake up klogd using irq_work
irq_work: Make self-IPIs optable
irq_work: Warn if there's still work on cpu_down
irq_work: Flush work on CPU_DYING
irq_work: Don't stop the tick with pending works
nohz: Add API to check tick state
irq_work: Remove CONFIG_HAVE_IRQ_WORK
irq_work: Fix racy check on work pending flag
irq_work: Fix racy IRQ_WORK_BUSY flag setting

Linus Torvalds
2013-02-20 09:47:58 +0800

16 Feb, 2013

1 commit

bf14e3b97 sysctl: Enable PARISC "unaligned-trap" to be used cross-arch ... Browse Code »

PARISC defines /proc/sys/kernel/unaligned-trap to runtime toggle
unaligned access emulation.

The exact mechanics of enabling/disabling are still arch specific, but we
can make the sysctl usable by other arches.

Signed-off-by: Vineet Gupta
Acked-by: Helge Deller
Cc: "James E.J. Bottomley"
Cc: Helge Deller
Cc: "Eric W. Biederman"
Cc: Serge Hallyn

Vineet Gupta
2013-02-16 01:46:05 +0800

13 Feb, 2013

8 commits

139321c65 cifs: Enable building with user namespaces enabled. ... Browse Code »

Cc: Steve French
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2013-02-13 23:28:56 +0800
c9617a44b nfsd: Enable building with user namespaces enabled. ... Browse Code »

Now that the kuid and kgid conversions have propagated
through net/sunrpc/ and fs/nfsd/, it is safe to enable
building nfsd when user namespaces are enabled.

Cc: "J. Bruce Fields"
Cc: Trond Myklebust
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2013-02-13 22:16:10 +0800
4277bbf75 nfs: Enable building with user namespaces enabled. ... Browse Code »

Now that the kuid and kgid conversions have propagated
through net/sunrpc/ and fs/nfs/, it is safe to enable
building nfs when user namespaces are enabled.

Cc: "J. Bruce Fields"
Cc: Trond Myklebust
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2013-02-13 22:15:34 +0800
1ac7fd819 ncpfs: Support interacting with multiple user namespaces ... Browse Code »

ncpfs does not natively support uids and gids so this conversion was
simply a matter of updating the type of the mounteduid, the uid
and the gid on the superblock, fixing the ioctls that read them, and
updating the mount option parser and the mount option printer.

Cc: Petr Vandrovec
Acked-by: Serge Hallyn
Signed-off-by: Eric W. Biederman

Eric W. Biederman
2013-02-13 22:15:13 +0800
0f07bd375 gfs2: Enable building with user namespaces enabled ... Browse Code »

Now that all of the necessary work has been done to push kuids and
kgids throughout gfs2 and to convert between kuids and kgids when
reading and writing the on disk structures it is safe to enable gfs2
when multiple user namespaces are enabled.

Cc: Steven Whitehouse
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2013-02-13 22:15:12 +0800
ecb528e3e ocfs2: Enable building with user namespaces enabled ... Browse Code »

Now that ocfs2 has been converted to store uids and gids in
kuid_t and kgid_t and all of the conversions have been added
to the appropriate places it is safe to allow building and
using ocfs2 with user namespace support enabled.

Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2013-02-13 22:14:32 +0800
515ee7bd9 coda: Allow coda to be built when user namespace support is enabled ... Browse Code »

Now that the coda kernel to userspace has been modified to convert
between kuids and kgids and uids and gids, and all internal
coda structures have been modified to store uids and gids as
kuids and kgids, it is safe to allow coda to be built with
user namespace support enabled.

Cc: Jan Harkes
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2013-02-13 22:00:55 +0800
a0a5386ac afs: Support interacting with multiple user namespaces ... Browse Code »

Modify struct afs_file_status to store owner as a kuid_t and group as
a kgid_t.

In xdr_decode_AFSFetchStatus, as owner is now a kuid_t and group is now
a kgid_t, don't use the EXTRACT macro. Instead perform the work of
the EXTRACT macro explicitly: read the value with ntohl and
convert it to the appropriate type with make_kuid or make_kgid,
test if the value is different from what is stored in status and
update changed, then update the value in status.

In xdr_encode_AFS_StoreStatus call from_kuid or from_kgid as
we are computing the on-the-wire encoding.

Initialize uids with GLOBAL_ROOT_UID instead of 0.
Initialize gids with GLOBAL_ROOT_GID instead of 0.

Cc: David Howells
Acked-by: Serge Hallyn
Signed-off-by: Eric W. Biederman
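
The decode and encode steps described in this message can be sketched in
userspace C as follows. This is only an illustration under stated
assumptions: the sketch_ types and helpers are hypothetical stand-ins for
the kernel's kuid_t, make_kuid, from_kuid and uid_eq (the real helpers
also take a user namespace argument), and struct sketch_status is not the
real struct afs_file_status.

```c
#include <arpa/inet.h>  /* ntohl, htonl */
#include <stdint.h>

/* Stand-in for kuid_t: a struct wrapper, so a raw integer cannot be
 * assigned to it by accident. */
typedef struct { uint32_t val; } sketch_kuid_t;

/* Simplified stand-ins; the kernel versions also take a user namespace. */
static sketch_kuid_t sketch_make_kuid(uint32_t uid)
{
    sketch_kuid_t k = { uid };
    return k;
}

static uint32_t sketch_from_kuid(sketch_kuid_t k)
{
    return k.val;
}

static int sketch_uid_eq(sketch_kuid_t a, sketch_kuid_t b)
{
    return a.val == b.val;
}

struct sketch_status {
    sketch_kuid_t owner;
};

/*
 * Decode one big-endian owner field: read it with ntohl, convert it,
 * compare against the stored value, flag a change, then store it.
 * Returns the advanced buffer pointer.
 */
static const uint32_t *sketch_decode_owner(const uint32_t *bp,
                                           struct sketch_status *st,
                                           int *changed)
{
    sketch_kuid_t owner = sketch_make_kuid(ntohl(*bp++));

    if (!sketch_uid_eq(owner, st->owner))
        *changed = 1;
    st->owner = owner;
    return bp;
}

/* Encode: convert back to a raw id only at the wire boundary. */
static uint32_t *sketch_encode_owner(uint32_t *bp,
                                     const struct sketch_status *st)
{
    *bp++ = htonl(sketch_from_kuid(st->owner));
    return bp;
}
```

The point of the wrapper type is that conversions happen only at the wire
boundary, while everything in between handles the typed value.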

Eric W. Biederman
2013-02-13 22:00:51 +0800

12 Feb, 2013

2 commits

4fa814be2 9p: Allow building 9p with user namespaces enabled. ... Browse Code »

Now that the uid_t -> kuid_t, gid_t -> kgid_t conversion
has been completed in 9p, allow 9p to be built when user
namespaces are enabled.

Cc: Eric Van Hensbergen
Cc: Ron Minnich
Cc: Latchesar Ionkov
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2013-02-12 19:19:35 +0800
d5ea055f1 ceph: Enable building when user namespaces are enabled. ... Browse Code »

Now that conversions happen from kuids and kgids when generating ceph
messages and conversions happen to kuids and kgids after receiving
ceph messages, and all intermediate data structures store uids and
gids as type kuid_t and kgid_t, it is safe to enable ceph with
user namespace support enabled.

Cc: Sage Weil
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2013-02-12 19:19:28 +0800

08 Feb, 2013

1 commit

02fc8d372 cputime: Restore CPU_ACCOUNTING config defaults for PPC64 ... Browse Code »

Commit abf917cd91cb ("cputime: Generic on-demand virtual cputime
accounting") inadvertently changed the default CPU_ACCOUNTING
config for PPC64. Repair that.

Signed-off-by: Stephen Rothwell
Acked-by: Frederic Weisbecker
Cc: Li Zhong
Cc: Namhyung Kim
Cc: Paul E. McKenney
Cc: Paul Gortmaker
Cc: Peter Zijlstra
Cc: Steven Rostedt
Cc: ppc-dev
Cc: Benjamin Herrenschmidt
Link: http://lkml.kernel.org/r/20130208141938.f31b7b9e1acac5bbe769ee4c@canb.auug.org.au
Signed-off-by: Ingo Molnar

Stephen Rothwell
2013-02-08 22:23:12 +0800

05 Feb, 2013

2 commits

077931446 Merge branch 'nohz/printk-v8' into irq/core ... Browse Code »

Conflicts:
kernel/irq_work.c

Add support for printk in full dynticks CPU.

* Don't stop tick with irq works pending. This
fix is generally useful and concerns archs that
can't raise self IPIs.

* Flush irq works before CPU offlining.

* Introduce "lazy" irq works that can wait for the
next tick to be executed, unless it's stopped.

* Implement klogd wake up using irq work. This
removes the ad-hoc printk_tick()/printk_needs_cpu()
hooks and makes it work even in dynticks mode.

Signed-off-by: Frederic Weisbecker

Frederic Weisbecker
2013-02-05 07:48:46 +0800
9228b5f24 Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu ... Browse Code »

Pull RCU updates from Paul E. McKenney:

1. Changes to rcutorture and to RCU documentation. Posted to LKML at
https://lkml.org/lkml/2013/1/26/188.

2. Enhancements to uniprocessor handling in tiny RCU. Posted to LKML
at https://lkml.org/lkml/2013/1/27/2.

3. Tag RCU callbacks with grace-period number to simplify callback
advancement. Posted to LKML at https://lkml.org/lkml/2013/1/26/203.

4. Miscellaneous fixes. Posted to LKML at https://lkml.org/lkml/2013/1/26/204.

Signed-off-by: Ingo Molnar <mingo@kernel.org>

Ingo Molnar
2013-02-05 02:06:34 +0800

29 Jan, 2013

2 commits

9fc52d832 rcu: Allow TREE_PREEMPT_RCU on UP systems ... Browse Code »

TINY_PREEMPT_RCU is complex and does not provide that much memory
savings, so TREE_PREEMPT_RCU should be used instead. The difference
between TINY_PREEMPT_RCU and TREE_PREEMPT_RCU is quite small compared
to the memory footprint of CONFIG_PREEMPT.

This commit therefore takes a first step towards eliminating
TINY_PREEMPT_RCU by allowing TREE_PREEMPT_RCU to be configured on !SMP
systems.

Signed-off-by: Paul E. McKenney
Reviewed-by: Josh Triplett

Paul E. McKenney
2013-01-29 14:06:22 +0800
6bfc09e23 rcu: Provide RCU CPU stall warnings for tiny RCU ... Browse Code »

Tiny RCU has historically omitted RCU CPU stall warnings in order to
reduce memory requirements. However, lack of these warnings caused
Thomas Gleixner some debugging pain recently. Therefore, this commit
adds RCU CPU stall warnings to tiny RCU if RCU_TRACE=y. This keeps
the memory footprint small, while still enabling CPU stall warnings
in kernels built to enable them.

Updated to include Josh Triplett's suggested use of RCU_STALL_COMMON
config variable to simplify #if expressions.

Reported-by: Thomas Gleixner
Signed-off-by: Paul E. McKenney
Reviewed-by: Josh Triplett

Paul E. McKenney
2013-01-29 14:06:21 +0800

28 Jan, 2013

1 commit

abf917cd9 cputime: Generic on-demand virtual cputime accounting ... Browse Code »

If we want to stop the tick beyond idle, we need to be
able to account the cputime without using the tick.

Virtual based cputime accounting solves that problem by
hooking into kernel/user boundaries.

However, implementing CONFIG_VIRT_CPU_ACCOUNTING requires
low level hooks and involves more overhead. But we already
have a generic context tracking subsystem that is required
for RCU needs by archs which plan to shut down the tick
outside idle.

This patch implements a generic virtual based cputime
accounting that relies on these generic kernel/user hooks.

There are some upsides of doing this:

- This requires no arch code to implement CONFIG_VIRT_CPU_ACCOUNTING
if context tracking is already built (already necessary for RCU in full
tickless mode).

- We can rely on the generic context tracking subsystem to dynamically
(de)activate the hooks, so that we can switch anytime between virtual
and tick based accounting. This way we don't have the overhead
of the virtual accounting when the tick is running periodically.

And one downside:

- There is probably more overhead than a native virtual based cputime
accounting. But this relies on hooks that are already set anyway.
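
To make the idea of accounting at kernel/user boundaries concrete, here
is a small userspace sketch. It is not the kernel's vtime code: the
sketch_ names are hypothetical, and timestamps are passed in by the
caller rather than read from a clock. Instead of sampling on every tick,
elapsed time is charged to the context being left each time a boundary
hook fires.

```c
#include <stdint.h>

enum sketch_ctx { SKETCH_USER, SKETCH_KERNEL };

/* Hypothetical per-task accounting state. */
struct sketch_acct {
    enum sketch_ctx ctx;   /* context the task is currently in */
    uint64_t last_ns;      /* timestamp of the last boundary crossing */
    uint64_t utime_ns;     /* accumulated user time */
    uint64_t stime_ns;     /* accumulated kernel (system) time */
};

static void sketch_acct_init(struct sketch_acct *a, enum sketch_ctx ctx,
                             uint64_t now_ns)
{
    a->ctx = ctx;
    a->last_ns = now_ns;
    a->utime_ns = 0;
    a->stime_ns = 0;
}

/* Charge the time since the last crossing to the context being left. */
static void sketch_account(struct sketch_acct *a, uint64_t now_ns)
{
    uint64_t delta = now_ns - a->last_ns;

    if (a->ctx == SKETCH_USER)
        a->utime_ns += delta;
    else
        a->stime_ns += delta;
    a->last_ns = now_ns;
}

/* Hook: the task is about to return to userspace. */
static void sketch_user_enter(struct sketch_acct *a, uint64_t now_ns)
{
    sketch_account(a, now_ns);
    a->ctx = SKETCH_USER;
}

/* Hook: the task has just entered the kernel from userspace. */
static void sketch_user_exit(struct sketch_acct *a, uint64_t now_ns)
{
    sketch_account(a, now_ns);
    a->ctx = SKETCH_KERNEL;
}
```

Because the totals are updated only at crossings, no periodic tick is
needed to keep them correct, which is the property the commit relies on.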

Signed-off-by: Frederic Weisbecker
Cc: Andrew Morton
Cc: Ingo Molnar
Cc: Li Zhong
Cc: Namhyung Kim
Cc: Paul E. McKenney
Cc: Paul Gortmaker
Cc: Peter Zijlstra
Cc: Steven Rostedt
Cc: Thomas Gleixner

Frederic Weisbecker
2013-01-28 02:23:27 +0800

27 Jan, 2013

1 commit

e11f0ae38 userns: Recommend use of memory control groups. ... Browse Code »

In the help text describing user namespaces recommend use of memory
control groups. In many cases memory control groups are the only
mechanism there is to limit how much memory a user who can create
user namespaces can use.

Acked-by: Serge Hallyn
Signed-off-by: Eric W. Biederman

Eric W. Biederman
2013-01-27 14:20:06 +0800