14 Apr, 2011

8 commits

  • In preparation for calling select_task_rq() without rq->lock held, drop
    the dependency on the rq argument.

    Reviewed-by: Frank Rowand
    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20110405152729.031077745@chello.nl
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Currently p->pi_lock already serializes p->sched_class; also put
    p->cpus_allowed and try_to_wake_up() under it. This prepares the way
    to do the first part of ttwu() without holding rq->lock.

    By having p->sched_class and p->cpus_allowed serialized by p->pi_lock,
    we prepare the way to call select_task_rq() without holding rq->lock
    (a simplified sketch of the resulting locking order follows this entry).

    Reviewed-by: Frank Rowand
    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20110405152728.990364093@chello.nl
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
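
    A simplified sketch of the locking order these two patches work toward
    (illustrative only, not the actual try_to_wake_up() body; the
    select_task_rq() call is shown with the post-series signature, without
    the rq argument):

    static int ttwu_sketch(struct task_struct *p, unsigned int state,
                           int wake_flags)
    {
            unsigned long flags;
            struct rq *rq;
            int cpu, success = 0;

            /* p->pi_lock stabilizes p->state, p->sched_class and
             * p->cpus_allowed without touching any rq->lock. */
            raw_spin_lock_irqsave(&p->pi_lock, flags);
            if (!(p->state & state))
                    goto out;

            success = 1;
            cpu = select_task_rq(p, SD_BALANCE_WAKE, wake_flags);

            /* Only now do we know which rq to lock for the enqueue. */
            rq = cpu_rq(cpu);
            raw_spin_lock(&rq->lock);
            /* ... activate_task(), mark p TASK_RUNNING ... */
            raw_spin_unlock(&rq->lock);
    out:
            raw_spin_unlock_irqrestore(&p->pi_lock, flags);
            return success;
    }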
     
  • Provide a generic p->on_rq because the p->se.on_rq semantics are
    unfavourable for lockless wakeups but needed for sched_fair.

    In particular, p->on_rq is only cleared when we actually dequeue the
    task in schedule() and not on any random dequeue as done by things
    like __migrate_task() and __sched_setscheduler().

    This also allows us to remove p->se usage from !sched_fair code.

    Reviewed-by: Frank Rowand
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110405152728.949545047@chello.nl

    Peter Zijlstra
     
  • Collect all ttwu() stat code into a single function and ensure it is
    always called for an actual wakeup (changing p->state to
    TASK_RUNNING).

    Reviewed-by: Frank Rowand
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110405152728.908177058@chello.nl

    Peter Zijlstra
     
  • try_to_wake_up() would only return success when it had to place a task
    on a rq; change that to return success every time we change p->state
    to TASK_RUNNING, because that is the real measure of a wakeup.

    As a result, success is always true for the tracepoints.

    Reviewed-by: Frank Rowand
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110405152728.866866929@chello.nl

    Peter Zijlstra
     
  • wq_worker_waking_up() needs to match wq_worker_sleeping(); since the
    latter is only called on deactivate, move the former near activate.

    Signed-off-by: Peter Zijlstra
    Cc: Tejun Heo
    Link: http://lkml.kernel.org/n/top-t3m7n70n9frmv4pv2n5fwmov@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Since we now have p->on_cpu unconditionally available, use it to
    re-implement mutex_spin_on_owner.

    Requested-by: Thomas Gleixner
    Reviewed-by: Frank Rowand
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110405152728.826338173@chello.nl

    Peter Zijlstra
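
    A rough sketch of the owner-spinning idea, assuming a task_struct with
    the now-unconditional on_cpu field; barriers and owner re-validation
    are omitted, and the function name is illustrative:

    static bool spin_on_owner_sketch(struct mutex *lock,
                                     struct task_struct *owner)
    {
            while (lock->owner == owner) {
                    if (!owner->on_cpu)     /* owner preempted or asleep */
                            return false;   /* stop spinning, go block   */
                    if (need_resched())     /* we should yield the cpu   */
                            return false;
                    cpu_relax();            /* polite busy-wait          */
            }
            return true;                    /* owner released the lock   */
    }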
     
  • Always provide p->on_cpu so that we can determine whether a task is on
    a cpu without having to lock the rq.

    Reviewed-by: Frank Rowand
    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20110405152728.785452014@chello.nl
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

13 Apr, 2011

1 commit

  • We really only want to unplug the pending IO when the process actually
    goes to sleep. So move the test for flushing the plug up to the place
    where we actually deactivate the task - where we have properly checked
    for preemption and for the process really sleeping.

    Acked-by: Jens Axboe
    Acked-by: Peter Zijlstra
    Signed-off-by: Linus Torvalds

    Linus Torvalds
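
    A hedged sketch of the placement described above (simplified; the real
    code also drops rq->lock around the flush): the plug is only flushed on
    the path where the task is genuinely deactivated, not on preemption.

    if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
            /* really going to sleep, not merely preempted */
            deactivate_task(rq, prev, DEQUEUE_SLEEP);
            if (blk_needs_flush_plug(prev))
                    blk_schedule_flush_plug(prev);
    }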
     

08 Apr, 2011

2 commits

  • …-linus', 'irq-fixes-for-linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86-32, fpu: Fix FPU exception handling on non-SSE systems
    x86, hibernate: Initialize mmu_cr4_features during boot
    x86-32, NUMA: Fix ACPI NUMA init broken by recent x86-64 change
    x86: visws: Fixup irq overhaul fallout

    * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sched: Clean up rebalance_domains() load-balance interval calculation

    * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86/mrst/vrtc: Fix boot crash in mrst_rtc_init()
    rtc, x86/mrst/vrtc: Fix boot crash in rtc_read_alarm()

    * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    genirq: Fix cpumask leak in __setup_irq()

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf probe: Fix listing incorrect line number with inline function
    perf probe: Fix to find recursively inlined function
    perf probe: Fix multiple --vars options behavior
    perf probe: Fix to remove redundant close
    perf probe: Fix to ensure function declared file

    Linus Torvalds
     
  • * 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6:
    Fix common misspellings

    Linus Torvalds
     

05 Apr, 2011

1 commit

  • Instead of the possible multiple evaluation of num_online_cpus() in
    rebalance_domains() that Linus reported, avoid it altogether in the
    normal case, since it is implemented with a Hamming weight function
    over a cpu bitmask, which can be darn expensive for those with big
    iron.

    This also makes the code cleaner, smaller and better documented (a
    sketch of the idea follows this entry).

    Reported-by: Linus Torvalds
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
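
    A sketch of the idea (names simplified, illustrative only): compute
    the expensive bound once, e.g. on cpu hotplug, instead of evaluating
    num_online_cpus() for every domain on every rebalance pass.

    static unsigned long max_load_balance_interval;

    static void update_max_interval(void)
    {
            max_load_balance_interval = HZ * num_online_cpus() / 10;
    }

    /* ... inside rebalance_domains(), per sched domain ... */
    interval = msecs_to_jiffies(sd->balance_interval);
    interval = clamp(interval, 1UL, max_load_balance_interval);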
     

31 Mar, 2011

2 commits

  • Fixes generated by 'codespell' and manually reviewed.

    Signed-off-by: Lucas De Marchi

    Lucas De Marchi
     
  • sched_setscheduler() (in sched.c) is called to change the scheduling
    policy and/or the real-time priority of a task. Thus, if we find out
    that neither of those is actually being modified, it is possible to
    return earlier and save the overhead of a full deactivate+activate
    cycle of the task in question (a sketch follows this entry).

    Besides that, if we have more than one SCHED_FIFO task with the same
    priority on the same rq (which means they share the same priority queue),
    having one of them change its position in the priority queue because of
    a sched_setscheduler() (as happens by means of the deactivate+activate)
    that does not actually change the priority violates POSIX, which states,
    for SCHED_FIFO:

    "If a thread whose policy or priority has been modified by
    pthread_setschedprio() is a running thread or is runnable, the effect on
    its position in the thread list depends on the direction of the
    modification, as follows: a. [...] b. If the priority is unchanged, the
    thread does not change position in the thread list. c. [...]"

    http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_08.html

    (ed: And the POSIX specification here does, briefly and somewhat
    unexpectedly, match what common sense tells us as well.)

    Signed-off-by: Dario Faggioli
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Dario Faggioli
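
    A hedged sketch of the early-out described above (locking and helper
    names simplified): if neither the policy nor the effective priority
    changes, skip the dequeue/enqueue cycle entirely.

    /* inside __sched_setscheduler(), after validating the request */
    if (unlikely(policy == p->policy &&
                 (!rt_policy(policy) ||
                  param->sched_priority == p->rt_priority))) {
            task_rq_unlock(rq, &flags);     /* nothing to change */
            return 0;
    }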
     

26 Mar, 2011

1 commit


25 Mar, 2011

1 commit

  • * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
    Documentation/iostats.txt: bit-size reference etc.
    cfq-iosched: removing unnecessary think time checking
    cfq-iosched: Don't clear queue stats when preempt.
    blk-throttle: Reset group slice when limits are changed
    blk-cgroup: Only give unaccounted_time under debug
    cfq-iosched: Don't set active queue in preempt
    block: fix non-atomic access to genhd inflight structures
    block: attempt to merge with existing requests on plug flush
    block: NULL dereference on error path in __blkdev_get()
    cfq-iosched: Don't update group weights when on service tree
    fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
    block: Require subsystems to explicitly allocate bio_set integrity mempool
    jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    fs: make fsync_buffers_list() plug
    mm: make generic_writepages() use plugging
    blk-cgroup: Add unaccounted time to timeslice_used.
    block: fixup plugging stubs for !CONFIG_BLOCK
    block: remove obsolete comments for blkdev_issue_zeroout.
    blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
    ...

    Fix up conflicts in fs/{aio.c,super.c}

    Linus Torvalds
     

24 Mar, 2011

1 commit

  • CAP_IPC_OWNER and CAP_IPC_LOCK can be checked against current_user_ns(),
    because the resource comes from current's own ipc namespace.

    setuid/setgid apply to uids in the caller's own namespace, so again the
    checks can be against current_user_ns().

    Changelog:
    Jan 11: Use task_ns_capable() in place of sched_capable().
    Jan 11: Use nsown_capable() as suggested by Bastian Blank.
    Jan 11: Clarify (hopefully) some logic in futex and sched.c
    Feb 15: use ns_capable for ipc, not nsown_capable
    Feb 23: let copy_ipcs handle setting ipc_ns->user_ns
    Feb 23: pass ns down rather than taking it from current

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Serge E. Hallyn
    Acked-by: "Eric W. Biederman"
    Acked-by: Daniel Lezcano
    Acked-by: David Howells
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
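
    An illustrative sketch of the pattern (the ipc namespace pointer and
    surrounding context are hypothetical; ns_capable()/nsown_capable()
    are the interfaces named in the changelog):

    /* IPC: check the capability against the namespace owning the object */
    if (!ns_capable(ipc_ns->user_ns, CAP_IPC_LOCK))
            return -EPERM;

    /* setuid/setgid: uids live in the caller's own user namespace */
    if (!nsown_capable(CAP_SETUID))
            return -EPERM;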
     

23 Mar, 2011

1 commit


20 Mar, 2011

1 commit

  • Add missing function parameters for yield_to():

    Warning(kernel/sched.c:5470): No description found for parameter 'p'
    Warning(kernel/sched.c:5470): No description found for parameter 'preempt'

    Signed-off-by: Randy Dunlap
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Randy Dunlap
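
    The shape of the kernel-doc block that silences such warnings (the
    wording here is illustrative, not necessarily the exact text added):

    /**
     * yield_to - yield the current processor to another thread in the
     *            caller's thread group
     * @p:       target task to hand the cpu to
     * @preempt: whether task preemption due to this yield is allowed
     */
    bool yield_to(struct task_struct *p, bool preempt);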
     

17 Mar, 2011

2 commits


16 Mar, 2011

4 commits

  • Fix kernel-doc warning for runqueue_is_locked():

    Warning(kernel/sched.c:664): missing initial short description

    Signed-off-by: Randy Dunlap
    Cc: Linus Torvalds
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Randy Dunlap
     
  • * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (116 commits)
    x86: Enable forced interrupt threading support
    x86: Mark low level interrupts IRQF_NO_THREAD
    x86: Use generic show_interrupts
    x86: ioapic: Avoid redundant lookup of irq_cfg
    x86: ioapic: Use new move_irq functions
    x86: Use the proper accessors in fixup_irqs()
    x86: ioapic: Use irq_data->state
    x86: ioapic: Simplify irq chip and handler setup
    x86: Cleanup the genirq name space
    genirq: Add chip flag to force mask on suspend
    genirq: Add desc->irq_data accessor
    genirq: Add comments to Kconfig switches
    genirq: Fixup fasteoi handler for oneshot mode
    genirq: Provide forced interrupt threading
    sched: Switch wait_task_inactive to schedule_hrtimeout()
    genirq: Add IRQF_NO_THREAD
    genirq: Allow shared oneshot interrupts
    genirq: Prepare the handling of shared oneshot interrupts
    genirq: Make warning in handle_percpu_event useful
    x86: ioapic: Move trigger defines to io_apic.h
    ...

    Fix up trivial(?) conflicts in arch/x86/pci/xen.c due to genirq name
    space changes clashing with the Xen cleanups. The set_irq_msi() had
    moved to xen_bind_pirq_msi_to_irq().

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (26 commits)
    sched: Resched proper CPU on yield_to()
    sched: Allow users with sufficient RLIMIT_NICE to change from SCHED_IDLE policy
    sched: Allow SCHED_BATCH to preempt SCHED_IDLE tasks
    sched: Clean up the IRQ_TIME_ACCOUNTING code
    sched: Add #ifdef around irq time accounting functions
    sched, autogroup: Stop claiming ownership of the root task group
    sched, autogroup: Stop going ahead if autogroup is disabled
    sched, autogroup, sysctl: Use proc_dointvec_minmax() instead
    sched: Fix the group_imb logic
    sched: Clean up some f_b_g() comments
    sched: Clean up remnants of sd_idle
    sched: Wholesale removal of sd_idle logic
    sched: Add yield_to(task, preempt) functionality
    sched: Use a buddy to implement yield_task_fair()
    sched: Limit the scope of clear_buddies
    sched: Check the right ->nr_running in yield_task_fair()
    sched: Avoid expensive initial update_cfs_load(), on UP too
    sched: Fix switch_from_fair()
    sched: Simplify the idle scheduling class
    softirqs: Account ksoftirqd time as cpustat softirq
    ...

    Linus Torvalds
     
  • …git/tip/linux-2.6-tip

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (184 commits)
    perf probe: Clean up probe_point_lazy_walker() return value
    tracing: Fix irqoff selftest expanding max buffer
    tracing: Align 4 byte ints together in struct tracer
    tracing: Export trace_set_clr_event()
    tracing: Explain about unstable clock on resume with ring buffer warning
    ftrace/graph: Trace function entry before updating index
    ftrace: Add .ref.text as one of the safe areas to trace
    tracing: Adjust conditional expression latency formatting.
    tracing: Fix event alignment: skb:kfree_skb
    tracing: Fix event alignment: mce:mce_record
    tracing: Fix event alignment: kvm:kvm_hv_hypercall
    tracing: Fix event alignment: module:module_request
    tracing: Fix event alignment: ftrace:context_switch and ftrace:wakeup
    tracing: Remove lock_depth from event entry
    perf header: Stop using 'self'
    perf session: Use evlist/evsel for managing perf.data attributes
    perf top: Don't let events to eat up whole header line
    perf top: Fix events overflow in top command
    ring-buffer: Remove unused #include <linux/trace_irq.h>
    tracing: Add an 'overwrite' trace_option.
    ...

    Linus Torvalds
     

11 Mar, 2011

1 commit

  • Although they run as rpciod background tasks, under normal operation
    (i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck()
    and nfs4_do_close() want to be fully synchronous. This means that when we
    exit, we want all references to the rpc_task to be gone, and we want
    any dentry references etc. held by that task to be released.

    For this reason these functions call __rpc_wait_for_completion_task(),
    followed by rpc_put_task() in the expectation that the latter will be
    releasing the last reference to the rpc_task, and thus ensuring that the
    callback_ops->rpc_release() has been called synchronously.

    This patch fixes a race which exists due to the fact that
    rpciod calls rpc_complete_task() (in order to wake up the callers of
    __rpc_wait_for_completion_task()) and then subsequently calls
    rpc_put_task() without ensuring that these two steps are done atomically.

    In order to avoid adding new spin locks, the patch uses the existing
    waitqueue spin lock to order the rpc_task reference count releases between
    the waiting process and rpciod.
    The common case where nobody is waiting for completion is optimised for by
    checking if the RPC_TASK_ASYNC flag is cleared and/or if the rpc_task
    reference count is 1: in those cases we skip taking the spin lock and
    immediately free up the rpc_task.

    Those few processes that need to put the rpc_task from inside an
    asynchronous context and that do not care about ordering are given a new
    helper: rpc_put_task_async().

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

10 Mar, 2011

1 commit

  • This patch adds support for creating a queuing context outside
    of the queue itself. This enables us to batch up pieces of IO
    before grabbing the block device queue lock and submitting them to
    the IO scheduler.

    The context is created on the stack of the process and assigned in
    the task structure, so that we can auto-unplug it if we hit a schedule
    event.

    The current queue plugging happens implicitly if IO is submitted to
    an empty device, yet callers have to remember to unplug that IO when
    they are going to wait for it. This is an ugly API and has caused bugs
    in the past. Additionally, it requires hacks in the vm (->sync_page()
    callback) to handle that logic. By switching to an explicit plugging
    scheme we make the API a lot nicer and can get rid of the ->sync_page()
    hack in the vm.

    Signed-off-by: Jens Axboe

    Jens Axboe
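
    A hedged usage sketch of the interface this series introduces
    (blk_start_plug()/blk_finish_plug(); submit_my_bios() is a
    hypothetical helper standing in for whatever issues the IO):

    struct blk_plug plug;

    blk_start_plug(&plug);          /* requests collect on the on-stack plug */
    submit_my_bios();               /* hypothetical: issue a burst of IO     */
    blk_finish_plug(&plug);         /* hand the whole batch to the queue     */

    /* If the task blocks before blk_finish_plug(), the scheduler flushes
     * the plug automatically, as described above. */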
     

05 Mar, 2011

1 commit

  • This removes the implementation of the big kernel lock,
    at last. A lot of people have worked on this in the
    past, so the credit for this patch should be with
    everyone who participated in the hunt.

    The names on the Cc list are the people that were the
    most active in this, according to the recorded git
    history, in alphabetical order.

    Signed-off-by: Arnd Bergmann
    Acked-by: Alan Cox
    Cc: Alessio Igor Bogani
    Cc: Al Viro
    Cc: Andrew Hendry
    Cc: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Eric W. Biederman
    Cc: Frederic Weisbecker
    Cc: Hans Verkuil
    Acked-by: Ingo Molnar
    Cc: Jan Blunck
    Cc: John Kacur
    Cc: Jonathan Corbet
    Cc: Linus Torvalds
    Cc: Matthew Wilcox
    Cc: Oliver Neukum
    Cc: Paul Menage
    Acked-by: Thomas Gleixner
    Cc: Trond Myklebust

    Arnd Bergmann
     

04 Mar, 2011

2 commits

  • yield_to_task_fair() has code to resched the CPU of the yielding task
    when the intention is to resched the CPU of the task that is being
    yielded to.

    The change here fixes the problem and also makes the resched
    conditional on rq != p_rq.

    Signed-off-by: Venkatesh Pallipadi
    Reviewed-by: Rik van Riel
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Venkatesh Pallipadi
     
  • The current scheduler implementation returns -EPERM when trying to
    change from SCHED_IDLE to SCHED_OTHER or SCHED_BATCH. Since SCHED_IDLE
    is considered to be a nice 20 on steroids, changing to another policy
    should be allowed provided the RLIMIT_NICE is accounted for.

    This patch allows the following test-case to pass with RLIMIT_NICE=40,
    but still fail with RLIMIT_NICE=10 when the calling process is run
    from a typical shell (nice 0, or 20 in rlimit terms).

    int main()
    {
        int ret;
        struct sched_param sp;
        sp.sched_priority = 0;

        /* switch to SCHED_IDLE */
        ret = sched_setscheduler(0, SCHED_IDLE, &sp);
        printf("setscheduler IDLE: %d\n", ret);
        if (ret)
            return ret;

        /* switch back to SCHED_OTHER */
        ret = sched_setscheduler(0, SCHED_OTHER, &sp);
        printf("setscheduler OTHER: %d\n", ret);

        return ret;
    }

    $ ulimit -e
    40
    $ ./test
    setscheduler IDLE: 0
    setscheduler OTHER: 0

    $ ulimit -e 10
    $ ulimit -e
    10
    $ ./test
    setscheduler IDLE: 0
    setscheduler OTHER: -1

    Signed-off-by: Darren Hart
    Signed-off-by: Peter Zijlstra
    Cc: Richard Purdie
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Darren Hart
     

26 Feb, 2011

2 commits

  • Fix this warning:

    lkml.org/lkml/2011/1/30/124

    kernel/sched.c:3719: warning: 'irqtime_account_idle_ticks' defined but not used
    kernel/sched.c:3720: warning: 'irqtime_account_process_tick' defined but not used

    In a cleaner way than:

    7e9498705e81: sched: Add #ifdef around irq time accounting functions

    This patch will not have any functional impact.

    Signed-off-by: Venkatesh Pallipadi
    Cc: heiko.carstens@de.ibm.com
    Cc: a.p.zijlstra@chello.nl
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Venkatesh Pallipadi
     
  • When we force-thread hard and soft interrupts, the startup of ksoftirqd
    would hang in kthread_bind() when wait_task_inactive() calls
    schedule_timeout_uninterruptible(), because there is no softirq yet
    which will wake us up.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    LKML-Reference:

    Thomas Gleixner
     

25 Feb, 2011

1 commit


16 Feb, 2011

1 commit

  • Make the ::exit method act like ::attach; it is, after all, very nearly
    the same thing.

    The bug had no effect on correctness - fixing it is an optimization for
    the scheduler. Also, later perf-cgroups patches rely on it.

    Signed-off-by: Peter Zijlstra
    Acked-by: Paul Menage
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

14 Feb, 2011

1 commit


12 Feb, 2011

1 commit

  • When the function graph tracer starts, it needs to make a special
    stack for each task to save the real return values of the tasks.
    All running tasks have this stack created, as well as any new
    tasks.

    On CPU hotplug, the new idle task will allocate a stack as well
    when init_idle() is called. The problem is that CPU hotplug does
    not create a new idle_task. Instead it uses the idle task that
    existed when the cpu went down.

    ftrace_graph_init_task() will add a new ret_stack to the task
    that is given to it. Because a clone inherits the stack of its
    parent, that function does not check whether the task's
    ret_stack is already NULL or not. When the CPU hotplug code
    starts a CPU up again, it will allocate a new stack even
    though one already existed for it.

    The solution is to treat the idle_task specially. In fact, the
    function_graph code already does, just not at init_idle().
    Instead of using ftrace_graph_init_task() for the idle task
    (that function expects the task to be a clone), have a
    separate ftrace_graph_init_idle_task(), sketched after this
    entry. Also, we will create a per_cpu ret_stack that is used
    by the idle task. When we call ftrace_graph_init_idle_task() it
    will check if the idle task's ret_stack is NULL; if it is, it
    will assign it the per_cpu ret_stack.

    Reported-by: Benjamin Herrenschmidt
    Suggested-by: Peter Zijlstra
    Cc: Stable Tree
    Signed-off-by: Steven Rostedt

    Steven Rostedt
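
    A loose sketch of the idea (internals elided; names follow the
    description above): idle tasks reuse a per-cpu ret_stack instead of
    having ftrace_graph_init_task() allocate a fresh one on every cpu-up.

    void ftrace_graph_init_idle_task(struct task_struct *t, int cpu)
    {
            if (!t->ret_stack)
                    t->ret_stack = per_cpu(idle_ret_stack, cpu);
            /* ... plus the usual per-task index/counter initialisation ... */
    }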
     

03 Feb, 2011

3 commits

  • Currently only implemented for fair class tasks.

    Add a yield_to_task() method to the fair scheduling class, allowing the
    caller of yield_to() to accelerate another thread in its thread group /
    task group (a usage sketch follows this entry).

    Implemented via a scheduler hint, using cfs_rq->next to encourage the
    target being selected. We can rely on pick_next_entity to keep things
    fair, so no one can accelerate a thread that has already used its fair
    share of CPU time.

    This also means callers should only call yield_to() when they really
    mean it. Calling it too often can result in the scheduler just
    ignoring the hint.

    Signed-off-by: Rik van Riel
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Galbraith
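
    A hypothetical usage sketch (the caller and task pointer are made up;
    the classic consumer is a hypervisor boosting the vcpu thread that
    holds a spinlock another vcpu is waiting on):

    /* Donate our timeslice to the thread we are waiting on. */
    if (!yield_to(lock_holder_task, true /* allow preemption */))
            cpu_relax();    /* hint was ignored; keep waiting politely */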
     
  • Use the buddy mechanism to implement yield_task_fair. This
    allows us to skip onto the next highest priority se at every
    level in the CFS tree, unless doing so would introduce gross
    unfairness in CPU time distribution.

    We order the buddy selection in pick_next_entity to check
    yield first, then last, then next. We need next to be able
    to override yield, because it is possible for the "next" and
    "yield" tasks to be different processes in the same sub-tree
    of the CFS tree. When they are, we need to go into that
    sub-tree regardless of the "yield" hint, and pick the correct
    entity once we get to the right level.

    Signed-off-by: Rik van Riel
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Rik van Riel
     
  • Oleg reported that on architectures with
    __ARCH_WANT_INTERRUPTS_ON_CTXSW the IPI from
    task_oncpu_function_call() can land before perf_event_task_sched_in()
    and cause interesting situations for e.g. perf_install_in_context().

    This patch reworks the task_oncpu_function_call() interface to give a
    more usable primitive, and reworks all its users to hopefully be
    more obvious as well as to remove the races.

    While looking at the code I also found a number of races against
    perf_event_task_sched_out(), which can flip contexts between tasks, so
    plug those too.

    Reported-and-reviewed-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

27 Jan, 2011

1 commit