14 Jan, 2011
1 commit
-
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
Documentation/trace/events.txt: Remove obsolete sched_signal_send.
writeback: fix global_dirty_limits comment runtime -> real-time
ppc: fix comment typo singal -> signal
drivers: fix comment typo diable -> disable.
m68k: fix comment typo diable -> disable.
wireless: comment typo fix diable -> disable.
media: comment typo fix diable -> disable.
remove doc for obsolete dynamic-printk kernel-parameter
remove extraneous 'is' from Documentation/iostats.txt
Fix spelling milisec -> ms in snd_ps3 module parameter description
Fix spelling mistakes in comments
Revert conflicting V4L changes
i7core_edac: fix typos in comments
mm/rmap.c: fix comment
sound, ca0106: Fix assignment to 'channel'.
hrtimer: fix a typo in comment
init/Kconfig: fix typo
anon_inodes: fix wrong function name in comment
fix comment typos concerning "consistent"
poll: fix a typo in comment
...
Fix up trivial conflicts in:
- drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c)
- fs/ext4/ext4.h
Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.
07 Jan, 2011
5 commits
-
One of the operands, buf, is incorrect, since it has been stripped and
the correct address for the subsequent string comparison can change if
any leading white space is removed from buf. Fix this by replacing buf with cmp.
Signed-off-by: Hillf Danton
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
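A minimal sketch of the pattern at issue (names modeled on sched_feat_write(); an illustration, not the verbatim patch). strstrip() trims trailing whitespace in place and returns a pointer past any leading whitespace, so cmp may differ from buf and all later comparisons must use cmp:

    static ssize_t feat_write(struct file *filp, const char __user *ubuf,
                              size_t cnt, loff_t *ppos)
    {
            char buf[64];
            char *cmp;
            int neg = 0;

            if (cnt > 63)
                    cnt = 63;
            if (copy_from_user(buf, ubuf, cnt))
                    return -EFAULT;
            buf[cnt] = 0;

            cmp = strstrip(buf);            /* may point past buf[0] */

            if (strncmp(cmp, "NO_", 3) == 0) {      /* test cmp, not buf */
                    neg = 1;
                    cmp += 3;
            }
            /* ... look up cmp in the feature table, honouring neg ... */
            return cnt;
    }
-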
sched: fix struct autogroup memory leak
Seems I lost a change somewhere, leaking memory.
Add missing change to actually use autogroup_free().
Signed-off-by: Mike Galbraith
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
root_task_group is the leftover of USER_SCHED; now it's always the
same as init_task_group.
But as Mike suggested, root_task_group is maybe the more suitable name
to keep for a tree.
So in this patch:
init_task_group --> root_task_group
init_task_group_load --> root_task_group_load
INIT_TASK_GROUP_LOAD --> ROOT_TASK_GROUP_LOAD
Suggested-by: Mike Galbraith
Signed-off-by: Yong Zhang
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
…/git/tip/linux-2.6-tip
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (30 commits)
sched: Change wait_for_completion_*_timeout() to return a signed long
sched, autogroup: Fix reference leak
sched, autogroup: Fix potential access to freed memory
sched: Remove redundant CONFIG_CGROUP_SCHED ifdef
sched: Fix interactivity bug by charging unaccounted run-time on entity re-weight
sched: Move periodic share updates to entity_tick()
printk: Use this_cpu_{read|write} api on printk_pending
sched: Make pushable_tasks CONFIG_SMP dependant
sched: Add 'autogroup' scheduling feature: automated per session task groups
sched: Fix unregister_fair_sched_group()
sched: Remove unused argument dest_cpu to migrate_task()
mutexes, sched: Introduce arch_mutex_cpu_relax()
sched: Add some clock info to sched_debug
cpu: Remove incorrect BUG_ON
cpu: Remove unused variable
sched: Fix UP build breakage
sched: Make task dump print all 15 chars of proc comm
sched: Update tg->shares after cpu.shares write
sched: Allow update_cfs_load() to update global load
sched: Implement demand based update_cfs_load()
...
-
…git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (146 commits)
tools, perf: Documentation for the power events API
perf: Add calls to suspend trace point
perf script: Make some lists static
perf script: Use the default lost event handler
perf session: Warn about errors when processing pipe events too
perf tools: Fix perf_event.h header usage
perf test: Clarify some error reports in the open syscall test
x86, NMI: Add touch_nmi_watchdog to io_check_error delay
x86: Avoid calling arch_trigger_all_cpu_backtrace() at the same time
x86: Only call smp_processor_id in non-preempt cases
perf timechart: Adjust perf timechart to the new power events
perf: Clean up power events by introducing new, more generic ones
perf: Do not export power_frequency, but power_start event
perf test: Add test for counting open syscalls
perf evsel: Auto allocate resources needed for some methods
perf evsel: Use {cpu,thread}_map to shorten list of parameters
perf tools: Refactor all_tids to hold nr and the map
perf tools: Refactor cpumap to hold nr and the map
perf evsel: Introduce per cpu and per thread open helpers
perf evsel: Steal the counter reading routines from stat
...
05 Jan, 2011
2 commits
-
wait_for_completion_*_timeout() can return:
0: if the wait timed out
-ve: if the wait was interrupted
+ve: if the completion was completed.
As they currently return an 'unsigned long', the last two cases
are not easily distinguished, which can easily result in buggy
code, as is the case for the recently added
wait_for_completion_interruptible_timeout() call in
net/sunrpc/cache.c.
So change them both to return 'long'. As MAX_SCHEDULE_TIMEOUT
is LONG_MAX, a large +ve return value should never overflow.
Signed-off-by: NeilBrown
Cc: Peter Zijlstra
Cc: J. Bruce Fields
Cc: Andrew Morton
Cc: Linus Torvalds
LKML-Reference:
Signed-off-by: Ingo Molnar
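A hedged sketch of how a caller can use the now-signed return value (the -ETIMEDOUT mapping is this sketch's choice, not part of the commit):

    /* All three outcomes are now distinguishable: */
    long ret = wait_for_completion_interruptible_timeout(&done,
                                            msecs_to_jiffies(500));
    if (ret < 0)            /* interrupted by a signal */
            return ret;
    if (ret == 0)           /* timed out */
            return -ETIMEDOUT;
    /* ret > 0: completed, with 'ret' jiffies of the timeout left */
-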
Merge reason: Merge the final .37 tree.
Signed-off-by: Ingo Molnar
04 Jan, 2011
1 commit
-
CONFIG_[FAIR|RT]_GROUP_SCHED always means CONFIG_CGROUP_SCHED
Signed-off-by: Yong Zhang
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
23 Dec, 2010
2 commits
-
…/linux-2.6-rcu into core/rcu
-
Conflicts:
MAINTAINERS
arch/arm/mach-omap2/pm24xx.c
drivers/scsi/bfa/bfa_fcpim.c
Needed to update to apply fixes for which the old branch was too
outdated.
22 Dec, 2010
1 commit
-
Merge reason: Pick up the latest -rc.
Signed-off-by: Ingo Molnar
20 Dec, 2010
1 commit
-
Linus reported that the new warning introduced by commit f26f9aff6aaf
"Sched: fix skip_clock_update optimization" triggers. The need_resched
flag can be set by other CPUs asynchronously, so this debug check is
bogus - remove it.
Reported-by: Linus Torvalds
Cc: Peter Zijlstra
Cc: Mike Galbraith
LKML-Reference:
Signed-off-by: Ingo Molnar
16 Dec, 2010
3 commits
-
Currently we call perf_event_init() from sched_init(). In order to
make it more obvious, move it to the canonical location.
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Since the irqtime accounting is using non-atomic u64 and can be read
from remote cpus (writes are strictly cpu local, reads are not) we
have to deal with observing partial updates.
When we do observe partial updates the clock movement (in particular,
->clock_task movement) will go funny (in either direction); a
subsequent clock update (observing the full update) will make it go
funny in the opposite direction.
Since we rely on these clocks to be strictly monotonic we cannot
suffer backwards motion. One possible solution would be to simply
ignore all backwards deltas, but that will lead to accounting
artefacts, most notably clock_task + irq_time != clock; this
inaccuracy would end up in user visible stats.
Therefore serialize the reads using a seqcount.
Reviewed-by: Venkatesh Pallipadi
Reported-by: Mikael Pettersson
Tested-by: Mikael Pettersson
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
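A sketch of the seqcount pattern described above (the per-cpu variable names are assumptions based on the description, not necessarily the patch's):

    static DEFINE_PER_CPU(seqcount_t, irq_time_seq);
    static DEFINE_PER_CPU(u64, cpu_hardirq_time);
    static DEFINE_PER_CPU(u64, cpu_softirq_time);

    /* Writer: runs only on the local cpu, so a plain += under the seqcount. */
    static void irq_time_write(int cpu, u64 delta)
    {
            write_seqcount_begin(&per_cpu(irq_time_seq, cpu));
            per_cpu(cpu_hardirq_time, cpu) += delta;
            write_seqcount_end(&per_cpu(irq_time_seq, cpu));
    }

    /* Reader: may run on a remote cpu; retry until a consistent snapshot. */
    static u64 irq_time_read(int cpu)
    {
            unsigned int seq;
            u64 total;

            do {
                    seq = read_seqcount_begin(&per_cpu(irq_time_seq, cpu));
                    total = per_cpu(cpu_hardirq_time, cpu) +
                            per_cpu(cpu_softirq_time, cpu);
            } while (read_seqcount_retry(&per_cpu(irq_time_seq, cpu), seq));

            return total;
    }
-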
Some ARM systems have a short sched_clock() [ which needs to be fixed
too ], but this exposed a bug in the irq_time code as well: it doesn't
deal with wraps at all.
Fix the irq_time code to deal with u64 wraps by re-writing the code to
only use delta increments, which avoids the whole issue.
Reviewed-by: Venkatesh Pallipadi
Reported-by: Mikael Pettersson
Tested-by: Mikael Pettersson
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
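A small illustrative sketch of the delta-increment approach (names invented for this sketch): u64 subtraction is well-defined modulo 2^64, so accumulated deltas stay correct even when the underlying clock wraps.

    static u64 last_clock;
    static u64 irq_time_total;

    static void account_irq_time(u64 now)
    {
            /* Correct even if 'now' has wrapped past last_clock. */
            u64 delta = now - last_clock;

            last_clock = now;
            irq_time_total += delta;
    }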
09 Dec, 2010
3 commits
-
As noted by Peter Zijlstra at https://lkml.org/lkml/2010/11/10/391
(while reviewing other stuff, though), tracking pushable tasks
only makes sense on SMP systems.
Signed-off-by: Dario Faggioli
Acked-by: Steven Rostedt
Acked-by: Gregory Haskins
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
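A simplified sketch of the result (surrounding members omitted; the real field sits with the RT scheduling data):

    struct rt_rq {
            /* ... */
    #ifdef CONFIG_SMP
            /*
             * Tasks that could be pushed to another runqueue; tracking
             * them is pointless on UP, where there is no other cpu.
             */
            struct plist_head pushable_tasks;
    #endif
            /* ... */
    };
-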
idle_balance() drops/retakes rq->lock, leaving the previous task
vulnerable to set_tsk_need_resched(). Clear it after we return
from balancing instead, and in setup_thread_stack() as well, so
no successfully descheduled or never scheduled task has it set.
Need_resched confused the skip_clock_update logic, which assumes
that the next call to update_rq_clock() will come nearly immediately
after being set. Make the optimization robust against the case of
waking a sleeper before it successfully deschedules by checking that
the current task has not been dequeued before setting the flag,
since it is that useless clock update we're trying to save, and
clear it unconditionally in schedule() proper instead of conditionally
in put_prev_task().
Signed-off-by: Mike Galbraith
Reported-by: Bjoern B. Brandenburg
Tested-by: Yong Zhang
Signed-off-by: Peter Zijlstra
Cc: stable@kernel.org
LKML-Reference:
Signed-off-by: Ingo Molnar
-
There's a long-running regression that proved difficult to fix and
which is hitting certain people and is rather annoying in its effects.
Damien reported that after 74f5187ac8 (sched: Cure load average vs
NO_HZ woes) his load average is unnaturally high; he also noted that
even with that patch reverted the load average numbers are not
correct.
The problem is that the previous patch only solved half the NO_HZ
problem: it addressed the part of going into NO_HZ mode, not of
coming out of NO_HZ mode. This patch implements that missing half.
When coming out of NO_HZ mode there are two important things to take
care of:
- Folding the pending idle delta into the global active count.
- Correctly aging the averages for the idle-duration.
So with this patch the NO_HZ interaction should be complete and
behaviour between CONFIG_NO_HZ=[yn] should be equivalent.
Furthermore, this patch slightly changes the load average computation
by adding a rounding term to the fixed point multiplication (sketched
below).
Reported-by: Damien Wyart
Reported-by: Tim McGrath
Tested-by: Damien Wyart
Tested-by: Orion Poplawski
Tested-by: Kyle McMartin
Signed-off-by: Peter Zijlstra
Cc: stable@kernel.org
Cc: Chase Douglas
LKML-Reference:
Signed-off-by: Ingo Molnar
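A hedged sketch of a fixed-point average with a rounding term (FSHIFT and FIXED_1 are the kernel's load-average constants; the real patch may differ in detail):

    #define FSHIFT  11                      /* bits of fractional precision */
    #define FIXED_1 (1 << FSHIFT)           /* 1.0 in fixed point */

    /* a1 = a0 * e + a * (1 - e), rounded to nearest instead of down. */
    static unsigned long calc_load(unsigned long load, unsigned long exp,
                                   unsigned long active)
    {
            load *= exp;
            load += active * (FIXED_1 - exp);
            load += 1UL << (FSHIFT - 1);    /* the added rounding term */
            return load >> FSHIFT;
    }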
30 Nov, 2010
3 commits
-
A recurring complaint from CFS users is that parallel kbuild has
a negative impact on desktop interactivity. This patch
implements an idea from Linus, to automatically create task
groups. Currently, only per session autogroups are implemented,
but the patch leaves the way open for enhancement.
Implementation: each task's signal struct contains an inherited
pointer to a refcounted autogroup struct containing a task group
pointer, the default for all tasks pointing to the
init_task_group. When a task calls setsid(), a new task group
is created, the process is moved into the new task group, and a
reference to the previous task group is dropped. Child
processes inherit this task group thereafter, and increase its
refcount. When the last thread of a process exits, the
process's reference is dropped, such that when the last process
referencing an autogroup exits, the autogroup is destroyed.
At runqueue selection time, IFF a task has no cgroup assignment,
its current autogroup is used.
Autogroup bandwidth is controllable via setting its nice level
through the proc filesystem:
cat /proc/<pid>/autogroup
Displays the task's group and the group's nice level.
echo <nice level> > /proc/<pid>/autogroup
Sets the task group's shares to the weight of a nice <level> task.
Setting nice level is rate limited for !admin users due to the
abuse risk of task group locking.
The feature is enabled from boot by default if
CONFIG_SCHED_AUTOGROUP=y is selected, but can be disabled via
the boot option noautogroup, and can also be turned on/off on
the fly via:
echo [01] > /proc/sys/kernel/sched_autogroup_enabled
... which will automatically move tasks to/from the root task group.
Signed-off-by: Mike Galbraith
Acked-by: Linus Torvalds
Acked-by: Peter Zijlstra
Cc: Markus Trippelsdorf
Cc: Mathieu Desnoyers
Cc: Paul Turner
Cc: Oleg Nesterov
[ Removed the task_group_path() debug code, and fixed !EVENTFD build failure. ]
Signed-off-by: Ingo Molnar
LKML-Reference:
Signed-off-by: Ingo Molnar
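A minimal sketch of the data structure described above (reduced field set; the real struct carries more state, e.g. locking):

    /* Refcounted wrapper tying a session to its own task group. */
    struct autogroup {
            struct kref        kref;        /* dropped as described above */
            struct task_group *tg;          /* the per-session task group */
    };
-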
In the flipping and flopping between calling
unregister_fair_sched_group() on a per-cpu versus per-group basis
we ended up in a bad state.
Remove from the list for the passed cpu as opposed to some
arbitrary index.
( This fixes explosions w/ autogroup as well as a group
creation/destruction stress test. )
Reported-by: Stephen Rothwell
Signed-off-by: Paul Turner
Cc: Peter Zijlstra
Cc: Mike Galbraith
LKML-Reference:
Signed-off-by: Ingo Molnar
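A hedged sketch of the fix's shape (close to, but not guaranteed verbatim from, that era's kernel/sched_fair.c):

    /* Remove the group's cfs_rq for the cpu actually passed in,
     * not for some arbitrary index. */
    static void unregister_fair_sched_group(struct task_group *tg, int cpu)
    {
            struct rq *rq = cpu_rq(cpu);
            unsigned long flags;

            raw_spin_lock_irqsave(&rq->lock, flags);
            list_del_leaf_cfs_rq(tg->cfs_rq[cpu]);
            raw_spin_unlock_irqrestore(&rq->lock, flags);
    }
-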
The first version of synchronize_sched_expedited() used the migration
code in the scheduler, and was therefore implemented in kernel/sched.c.
However, the more recent version of this code no longer uses the
migration code, so this commit moves it to the main RCU source files.
Signed-off-by: Lai Jiangshan
Signed-off-by: Paul E. McKenney
26 Nov, 2010
3 commits
-
Remove the unused argument 'dest_cpu' of migrate_task(), and pass the
runqueue, as it is always known at the call site.
Signed-off-by: Nikanth Karthikesan
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
The spinning mutex implementation uses cpu_relax() in busy loops as a
compiler barrier. Depending on the architecture, cpu_relax() may do more
than needed in these specific mutex spin loops. On System z we also give
up the time slice of the virtual cpu in cpu_relax(), which prevents
effective spinning on the mutex.
This patch replaces cpu_relax() in the spinning mutex code with
arch_mutex_cpu_relax(), which can be defined by each architecture that
selects HAVE_ARCH_MUTEX_CPU_RELAX. The default is still cpu_relax(), so
this patch should not affect architectures other than System z for now.
Signed-off-by: Gerald Schaefer
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
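A sketch of the described hook: the fallback define mirrors the commit text, while the loop body is purely illustrative.

    /* Architectures selecting HAVE_ARCH_MUTEX_CPU_RELAX provide their own
     * arch_mutex_cpu_relax(); everyone else keeps plain cpu_relax(). */
    #ifndef CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX
    #define arch_mutex_cpu_relax()  cpu_relax()
    #endif

    /* Illustrative mutex spin loop (try_acquire() is a stand-in,
     * not a kernel API): */
    while (!try_acquire(lock))
            arch_mutex_cpu_relax();
-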
Merge reason: Pick up latest fixes.
Signed-off-by: Ingo Molnar
23 Nov, 2010
1 commit
-
Signed-off-by: Erik Gilling
Signed-off-by: John Stultz
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
18 Nov, 2010
8 commits
-
Formerly sched_group_set_shares would force a rebalance by overflowing domain
share sums. Now that per-cpu averages are maintained we can set the true value
by issuing an update_cfs_shares() following a tg->shares update.
Also initialize tg se->load to 0 for consistency since we'll now set correct
weights on enqueue.
Signed-off-by: Paul Turner
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
When the system is busy, dilation of rq->next_balance makes lb->update_shares()
insufficiently frequent for threads which don't sleep (no dequeue/enqueue
updates). Adjust for this by making demand based updates based on the
accumulation of execution time sufficient to wrap our averaging window.
Signed-off-by: Paul Turner
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Using cfs_rq->nr_running is not sufficient to synchronize update_cfs_load with
the put path since nr_running accounting occurs at deactivation.
It's also not safe to make the removal decision based on load_avg as this fails
with both high periods and low shares. Resolve this by clipping history after
4 periods without activity.
Note: the above will always occur from update_shares() since in the
last-task-sleep-case that task will still be cfs_rq->curr when update_cfs_load
is called.
Signed-off-by: Paul Turner
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Make tg_shares_up() use the active cgroup list; this means we cannot
do a strict bottom-up walk of the hierarchy, but assuming it's a very
wide tree with a small number of active groups it should be a win.
Signed-off-by: Paul Turner
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Make certain load-balance actions scale per number of active cgroups
instead of the number of existing cgroups.
This makes wakeup/sleep paths more expensive, but is a win for systems
where the vast majority of existing cgroups are idle.
Signed-off-by: Paul Turner
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
By tracking a per-cpu load-avg for each cfs_rq and folding it into a
global task_group load on each tick we can rework tg_shares_up to be
strictly per-cpu.
This should improve cpu-cgroup performance for smp systems
significantly.
[ Paul: changed to use queueing cfs_rq + bug fixes ]
Signed-off-by: Paul Turner
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
While discussing the need for sched_idle_next(), Oleg remarked that
since try_to_wake_up() ensures sleeping tasks will end up running on a
sane cpu, we can do away with migrate_live_tasks().
If we then extend the existing hack of migrating current from
CPU_DYING to migrating the full rq worth of tasks from CPU_DYING, the
need for the sched_idle_next() abomination disappears as well, since
idle will be the only possible thread left after the migration thread
stops.
This greatly simplifies the hot-unplug task migration path, as can be
seen from the resulting code reduction (and about half the new lines
are comments).
Suggested-by: Oleg Nesterov
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Merge reason: Move to a .37-rc base.
Signed-off-by: Ingo Molnar
11 Nov, 2010
2 commits
-
Instead of dealing with sched classes inside each check_preempt_curr()
implementation, pull out this logic into the generic wakeup preemption
path.
This fixes a hang in KVM (and others) where we are waiting for the
stop machine thread to run ...
Reported-by: Markus Trippelsdorf
Tested-by: Marcelo Tosatti
Tested-by: Sergey Senozhatsky
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
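A simplified sketch of what the generic wakeup preemption path can look like (details may differ from the actual patch):

    static void check_preempt_curr(struct rq *rq, struct task_struct *p,
                                   int flags)
    {
            const struct sched_class *class;

            if (p->sched_class == rq->curr->sched_class) {
                    /* Same class: let the class decide. */
                    rq->curr->sched_class->check_preempt_curr(rq, p, flags);
            } else {
                    /* Cross class: walk classes in priority order. */
                    for_each_class(class) {
                            if (class == rq->curr->sched_class)
                                    break;                  /* curr wins */
                            if (class == p->sched_class) {
                                    resched_task(rq->curr); /* p preempts */
                                    break;
                            }
                    }
            }
    }
-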
Currently we consider a sched domain to be well balanced when the imbalance
is less than the domain's imbalance_pct. As the number of cores and threads
are increasing, current values of imbalance_pct (for example 25% for a
NUMA domain) are not enough to detect imbalances like:
a) On a WSM-EP system (two sockets, each having 6 cores and 12 logical threads),
24 cpu-hogging tasks get scheduled as 13 on one socket and 11 on another
socket, leading to an idle HT cpu.
b) On a hypothetical 2 socket NHM-EX system (each socket having 8 cores and
16 logical threads), 16 cpu-hogging tasks can get scheduled as 9 on one
socket and 7 on another socket, leaving one core in a socket idle
whereas in the other socket we have a core with both its HT siblings busy.
While this issue can be fixed by decreasing the domain's imbalance_pct
(by making it a function of the number of logical cpus in the domain), it
can potentially cause more task migrations across sched groups in an
overloaded case.
Fix this by using imbalance_pct only during newly_idle and busy
load balancing. And during idle load balancing, check if there
is an imbalance in the number of idle cpus across the busiest and this
sched_group, or if the busiest group has more tasks than its weight that
the idle cpu in this_group can pull.
Reported-by: Nikhil Rao
Signed-off-by: Suresh Siddha
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
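A hedged sketch of the kind of threshold imbalance_pct feeds into (names simplified; imbalance_pct = 125 encodes the 25% slack mentioned above):

    /* Treat the domain as balanced when the busiest group is within
     * imbalance_pct of this group's load. */
    if (100 * busiest_load <= sd->imbalance_pct * this_load)
            goto out_balanced;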
02 Nov, 2010
1 commit
-
"gadget", "through", "command", "maintain", "maintain", "controller", "address",
"between", "initiali[zs]e", "instead", "function", "select", "already",
"equal", "access", "management", "hierarchy", "registration", "interest",
"relative", "memory", "offset", "already",Signed-off-by: Uwe Kleine-König
Signed-off-by: Jiri Kosina
29 Oct, 2010
1 commit
-
…l/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched_stat: Update sched_info_queue/dequeue() code comments
sched, cgroup: Fixup broken cgroup movement
23 Oct, 2010
1 commit
-
Andrew Morton pointed out almost all sched_setscheduler() callers are
using fixed parameters and can be converted to static. It reduces runtime
memory use a little.
Signed-off-by: KOSAKI Motohiro
Reported-by: Andrew Morton
Acked-by: James Morris
Cc: Ingo Molnar
Cc: Steven Rostedt
Signed-off-by: Andrew Morton
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
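A sketch of the conversion being described (caller and priority are illustrative):

    /* Before: a struct sched_param built on the stack at each call site.
     * After: one static const object shared by every invocation. */
    static const struct sched_param param = {
            .sched_priority = MAX_RT_PRIO - 1,      /* illustrative */
    };

    static void make_fifo(struct task_struct *task)
    {
            sched_setscheduler(task, SCHED_FIFO, &param);
    }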
22 Oct, 2010
1 commit
-
Dima noticed that we fail to correct the ->vruntime of sleeping tasks
when we move them between cgroups.
Reported-by: Dima Zavin
Signed-off-by: Peter Zijlstra
Tested-by: Mike Galbraith
LKML-Reference:
Signed-off-by: Ingo Molnar
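A hedged sketch of the renormalization idea for a sleeping task moved between groups (simplified; the real fix lives in the fair-class group-move path):

    /* A sleeping task's vruntime is only meaningful relative to its
     * cfs_rq's min_vruntime, so re-base it when switching groups. */
    static void move_sleeping_task(struct sched_entity *se,
                                   struct cfs_rq *old_cfs_rq,
                                   struct cfs_rq *new_cfs_rq)
    {
            se->vruntime -= old_cfs_rq->min_vruntime;   /* make relative */
            se->cfs_rq = new_cfs_rq;                    /* illustrative */
            se->vruntime += new_cfs_rq->min_vruntime;   /* re-base */
    }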