31 Mar, 2011
1 commit
-
Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: Lucas De Marchi
26 Mar, 2011
1 commit
-
…l/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched, doc: Update sched-design-CFS.txt
sched: Remove unused 'rq' variable and cpu_rq() call from alloc_fair_sched_group()
sched.h: Fix a typo ("its")
sched: Fix yield_to kernel-doc
25 Mar, 2011
1 commit
-
* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
Documentation/iostats.txt: bit-size reference etc.
cfq-iosched: removing unnecessary think time checking
cfq-iosched: Don't clear queue stats when preempt.
blk-throttle: Reset group slice when limits are changed
blk-cgroup: Only give unaccounted_time under debug
cfq-iosched: Don't set active queue in preempt
block: fix non-atomic access to genhd inflight structures
block: attempt to merge with existing requests on plug flush
block: NULL dereference on error path in __blkdev_get()
cfq-iosched: Don't update group weights when on service tree
fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
block: Require subsystems to explicitly allocate bio_set integrity mempool
jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
fs: make fsync_buffers_list() plug
mm: make generic_writepages() use plugging
blk-cgroup: Add unaccounted time to timeslice_used.
block: fixup plugging stubs for !CONFIG_BLOCK
block: remove obsolete comments for blkdev_issue_zeroout.
blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
...
Fix up conflicts in fs/{aio.c,super.c}
24 Mar, 2011
1 commit
-
We never uncharge subpage quantities.
Signed-off-by: Johannes Weiner
Acked-by: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
23 Mar, 2011
2 commits
-
The sentence uses the possessive pronoun, which is spelled
without an apostrophe.
Signed-off-by: Jonathan Neuschäfer
Cc: Jiri Kosina
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Since all kthreads are created from a single helper task, they all use
memory from a single node for their kernel stack and task struct.
This patch suite creates kthread_create_on_node(), adding a 'cpu' parameter
to the parameters already used by kthread_create().
This parameter serves to allocate memory for the new kthread on its
memory node if possible.
Signed-off-by: Eric Dumazet
Acked-by: David S. Miller
Reviewed-by: Andi Kleen
Acked-by: Rusty Russell
Cc: Tejun Heo
Cc: Tony Luck
Cc: Fenghua Yu
Cc: David Howells
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
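A minimal usage sketch of the new API; the worker function, 'cpu' variable
and thread name are illustrative, only kthread_create_on_node() itself comes
from this series:

        struct task_struct *tsk;

        tsk = kthread_create_on_node(my_worker_fn, NULL,
                                     cpu_to_node(cpu),  /* stack/task_struct allocated on this node */
                                     "my_worker/%d", cpu);
        if (!IS_ERR(tsk)) {
                kthread_bind(tsk, cpu);
                wake_up_process(tsk);
        }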
16 Mar, 2011
2 commits
-
…l/git/tip/linux-2.6-tip
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (62 commits)
posix-clocks: Check write permissions in posix syscalls
hrtimer: Remove empty hrtimer_init_hres_timer()
hrtimer: Update hrtimer->state documentation
hrtimer: Update base[CLOCK_BOOTTIME].offset correctly
timers: Export CLOCK_BOOTTIME via the posix timers interface
timers: Add CLOCK_BOOTTIME hrtimer base
time: Extend get_xtime_and_monotonic_offset() to also return sleep
time: Introduce get_monotonic_boottime and ktime_get_boottime
hrtimers: extend hrtimer base code to handle more then 2 clockids
ntp: Remove redundant and incorrect parameter check
mn10300: Switch do_timer() to xtime_update()
posix clocks: Introduce dynamic clocks
posix-timers: Cleanup namespace
posix-timers: Add support for fd based clocks
x86: Add clock_adjtime for x86
posix-timers: Introduce a syscall for clock tuning.
time: Splitout compat timex accessors
ntp: Add ADJ_SETOFFSET mode bit
time: Introduce timekeeping_inject_offset
posix-timer: Update comment
...
Fix up new system-call-related conflicts in
arch/x86/ia32/ia32entry.S
arch/x86/include/asm/unistd_32.h
arch/x86/include/asm/unistd_64.h
arch/x86/kernel/syscall_table_32.S
(name_to_handle_at()/open_by_handle_at() vs clock_adjtime()), and some
due to movement of get_jiffies_64() in:
kernel/time.c
-
…/git/tip/linux-2.6-tip
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (26 commits)
sched: Resched proper CPU on yield_to()
sched: Allow users with sufficient RLIMIT_NICE to change from SCHED_IDLE policy
sched: Allow SCHED_BATCH to preempt SCHED_IDLE tasks
sched: Clean up the IRQ_TIME_ACCOUNTING code
sched: Add #ifdef around irq time accounting functions
sched, autogroup: Stop claiming ownership of the root task group
sched, autogroup: Stop going ahead if autogroup is disabled
sched, autogroup, sysctl: Use proc_dointvec_minmax() instead
sched: Fix the group_imb logic
sched: Clean up some f_b_g() comments
sched: Clean up remnants of sd_idle
sched: Wholesale removal of sd_idle logic
sched: Add yield_to(task, preempt) functionality
sched: Use a buddy to implement yield_task_fair()
sched: Limit the scope of clear_buddies
sched: Check the right ->nr_running in yield_task_fair()
sched: Avoid expensive initial update_cfs_load(), on UP too
sched: Fix switch_from_fair()
sched: Simplify the idle scheduling class
softirqs: Account ksoftirqd time as cpustat softirq
...
10 Mar, 2011
1 commit
-
This patch adds support for creating a queuing context outside
of the queue itself. This enables us to batch up pieces of IO
before grabbing the block device queue lock and submitting them to
the IO scheduler.
The context is created on the stack of the process and assigned in
the task structure, so that we can auto-unplug it if we hit a schedule
event.
The current queue plugging happens implicitly if IO is submitted to
an empty device, yet callers have to remember to unplug that IO when
they are going to wait for it. This is an ugly API and has caused bugs
in the past. Additionally, it requires hacks in the vm (->sync_page()
callback) to handle that logic. By switching to an explicit plugging
scheme we make the API a lot nicer and can get rid of the ->sync_page()
hack in the vm.
Signed-off-by: Jens Axboe
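The resulting pattern for submitters, as a minimal sketch (the bio setup is
assumed to exist elsewhere):

        struct blk_plug plug;

        blk_start_plug(&plug);          /* on-stack context; IO is batched from here */
        submit_bio(READ, bio);          /* held in the plug list, not yet dispatched */
        blk_finish_plug(&plug);         /* take the queue lock once and dispatch all */

If the task schedules with the plug still open, it is flushed automatically,
as described above.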
04 Mar, 2011
1 commit
-
Merge reason: Pick up updates before queueing up dependent patches.
Signed-off-by: Ingo Molnar
23 Feb, 2011
1 commit
-
Merge reason: Pick up the latest fixes before queueing up new changes.
Signed-off-by: Ingo Molnar
17 Feb, 2011
1 commit
-
There are two spellings in use for 'freeze' + 'able' - 'freezable' and
'freezeable'. The former is the more prominent one. The latter is
mostly used by workqueue and in a few other odd places. Unify the
spelling to 'freezable'.
Signed-off-by: Tejun Heo
Reported-by: Alan Stern
Acked-by: "Rafael J. Wysocki"
Acked-by: Greg Kroah-Hartman
Acked-by: Dmitry Torokhov
Cc: David Woodhouse
Cc: Alex Dubov
Cc: "David S. Miller"
Cc: Steven Whitehouse
03 Feb, 2011
3 commits
-
Currently only implemented for fair class tasks.
Add a yield_to_task() method to the fair scheduling class, allowing the
caller of yield_to() to accelerate another thread in its thread group,
task group.
Implemented via a scheduler hint, using cfs_rq->next to encourage the
target being selected. We can rely on pick_next_entity to keep things
fair, so no one can accelerate a thread that has already used its fair
share of CPU time.
This also means callers should only call yield_to when they really
mean it. Calling it too often can result in the scheduler just
ignoring the hint.
Signed-off-by: Rik van Riel
Signed-off-by: Marcelo Tosatti
Signed-off-by: Mike Galbraith
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
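A hedged usage sketch of the new primitive: a hypervisor that notices a vcpu
spinning on a lock may donate its slice to the lock holder. The
'lock_holder_task' lookup is illustrative; only yield_to() comes from this
patch.

        struct task_struct *target = lock_holder_task; /* assumed known to the caller */

        /* preempt=true asks for the target's CPU to be rescheduled at once */
        if (yield_to(target, true)) {
                /* we gave up our slice and the target was accelerated */
        }
-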
Use the buddy mechanism to implement yield_task_fair. This
allows us to skip onto the next highest priority se at every
level in the CFS tree, unless doing so would introduce gross
unfairness in CPU time distribution.
We order the buddy selection in pick_next_entity to check
yield first, then last, then next. We need next to be able
to override yield, because it is possible for the "next" and
"yield" task to be different processes in the same sub-tree
of the CFS tree. When they are, we need to go into that
sub-tree regardless of the "yield" hint, and pick the correct
entity once we get to the right level.
Signed-off-by: Rik van Riel
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
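The ordering described above, as a simplified sketch of pick_next_entity()
(fairness tests use wakeup_preempt_entity() as in the kernel; details and
corner cases are abridged):

        static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq)
        {
                struct sched_entity *left = __pick_first_entity(cfs_rq);
                struct sched_entity *se = left;

                /* yield: avoid the skip buddy if another choice is not too unfair */
                if (cfs_rq->skip == se) {
                        struct sched_entity *second = __pick_next_entity(se);
                        if (second && wakeup_preempt_entity(second, left) < 1)
                                se = second;
                }

                /* last: try to return the CPU to a preempted task */
                if (cfs_rq->last && wakeup_preempt_entity(cfs_rq->last, left) < 1)
                        se = cfs_rq->last;

                /* next: checked last so it can override "yield" and "last" */
                if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
                        se = cfs_rq->next;

                return se;
        }
-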
Oleg reported that on architectures with
__ARCH_WANT_INTERRUPTS_ON_CTXSW the IPI from
task_oncpu_function_call() can land before perf_event_task_sched_in()
and cause interesting situations for e.g. perf_install_in_context().
This patch reworks the task_oncpu_function_call() interface to give a
more usable primitive, reworking all its users to hopefully be more
obvious as well as to remove the races.
While looking at the code I also found a number of races against
perf_event_task_sched_out() which can flip contexts between tasks, so
plug those too.
Reported-and-reviewed-by: Oleg Nesterov
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
01 Feb, 2011
1 commit
-
All callers of do_timer() are converted to xtime_update(). The only
users of xtime_lock are in kernel/time/. Make both local to
kernel/time/ and remove them from the global header files.
[ tglx: Reuse tick-internal.h instead of creating another local header
file. Massaged changelog ]
Signed-off-by: Torben Hohn
Cc: Peter Zijlstra
Cc: johnstul@us.ibm.com
Cc: yong.zhang0@gmail.com
Cc: hch@infradead.org
Signed-off-by: Thomas Gleixner
31 Jan, 2011
1 commit
-
xtime_update() takes xtime_lock write locked and calls
do_timer(). It is provided to replace the do_timer() calls in the
architecture code.
Signed-off-by: Torben Hohn
Cc: Peter Zijlstra
Cc: johnstul@us.ibm.com
Cc: yong.zhang0@gmail.com
Cc: hch@infradead.org
LKML-Reference:
Signed-off-by: Thomas Gleixner
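In effect the new helper is just this (a sketch of exactly the locking
described above):

        void xtime_update(unsigned long ticks)
        {
                write_seqlock(&xtime_lock);
                do_timer(ticks);
                write_sequnlock(&xtime_lock);
        }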
26 Jan, 2011
2 commits
-
When a task is taken out of the fair class we must ensure the vruntime
is properly normalized because when we put it back in it will assume
to be normalized.
The case that goes wrong is when changing away from the fair class
while sleeping. Sleeping tasks have non-normalized vruntime in order
to make sleeper-fairness work. So treat the switch away from fair as a
wakeup and preserve the relative vruntime.
Also update sysrq-n to call the ->switch_{to,from} methods.
Reported-by: Onkalo Samu
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
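A sketch of the core of the fix, assuming the patch follows the description
above (exact conditions may differ in the final code):

        static void switched_from_fair(struct rq *rq, struct task_struct *p)
        {
                struct sched_entity *se = &p->se;
                struct cfs_rq *cfs_rq = cfs_rq_of(se);

                /*
                 * A sleeping task carries non-normalized vruntime. Treat the
                 * switch away from fair as a wakeup and keep only the part
                 * relative to min_vruntime, so switching back in can
                 * re-normalize it.
                 */
                if (!se->on_rq && p->state != TASK_RUNNING) {
                        place_entity(cfs_rq, se, 0);
                        se->vruntime -= cfs_rq->min_vruntime;
                }
        }
-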
Cleanup patch, freeing up PF_KSOFTIRQD and using the per_cpu ksoftirqd
pointer instead, as suggested by Eric Dumazet.
Tested-by: Shaun Ruffell
Signed-off-by: Venkatesh Pallipadi
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
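A minimal sketch of the replacement test; is_ksoftirqd() is a hypothetical
wrapper, the per-cpu variable and accessor are the kernel's:

        DECLARE_PER_CPU(struct task_struct *, ksoftirqd);

        /* replaces checking the PF_KSOFTIRQD task flag */
        static inline int is_ksoftirqd(struct task_struct *p)
        {
                return p == __this_cpu_read(ksoftirqd);
        }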
14 Jan, 2011
4 commits
-
Add khugepaged to relocate fragmented pages into hugepages if new
hugepages become available. (This is independent of the defrag logic that
will have to make new hugepages available.)
The fundamental reason why khugepaged is unavoidable is that some memory
can be fragmented and not everything can be relocated. So when a virtual
machine quits and releases gigabytes of hugepages, we want to use those
freely available hugepages to create huge-pmd in the other virtual
machines that may be running on fragmented memory, to maximize the CPU
efficiency at all times. The scan is slow; it takes nearly zero cpu time,
except when it copies data (in which case it means we definitely want to
pay for that cpu time), so it seems a good tradeoff.
In addition to the hugepages being released by other processes releasing
memory, we have the strong suspicion that the performance impact of
potentially defragmenting hugepages during or before each page fault could
lead to more performance inconsistency than allocating small pages at
first and having them collapsed into large pages later... if they prove
themselves to be long lived mappings (the khugepaged scan is slow, so short
lived mappings have a low probability of running into khugepaged compared to
long lived mappings).
Signed-off-by: Andrea Arcangeli
Acked-by: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
We'd like to be able to oom_score_adj a process up/down as it
enters/leaves the foreground. Currently, it is not possible to oom_adj
down without CAP_SYS_RESOURCE. This patch allows a task to decrease its
oom_score_adj back to the value that a CAP_SYS_RESOURCE thread set it to
or its inherited value at fork. Assuming the thread that has forked it
has oom_score_adj of 0, each process could decrease it back from 0 upon
activation unless a CAP_SYS_RESOURCE thread elevated it to something
higher.
Alternatives considered:
* a setuid binary
* a daemon with CAP_SYS_RESOURCE
Since you don't want all processes to be able to reduce their oom_adj, a
setuid or daemon implementation would be complex. The alternatives also
have much higher overhead.
This patch was updated from the original based on feedback from David
Rientjes.
Signed-off-by: Mandeep Singh Baines
Acked-by: David Rientjes
Cc: KAMEZAWA Hiroyuki
Cc: KOSAKI Motohiro
Cc: Rik van Riel
Cc: Ying Han
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
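For illustration only, the userspace side of such foreground/background
management might look like this; the procfs path is real, the adjustment
values are made up:

        #include <stdio.h>

        static void set_oom_score_adj(int adj)
        {
                FILE *f = fopen("/proc/self/oom_score_adj", "w");
                if (f) {
                        fprintf(f, "%d", adj); /* e.g. 500 in background, 0 in foreground */
                        fclose(f);
                }
        }

With this patch, lowering the value back to its inherited 0 no longer
requires CAP_SYS_RESOURCE.
-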
This warning was added in commit bdff746a3915 ("clone: prepare to recycle
CLONE_STOPPED") three years ago. 2.6.26 came and went. As far as I know,
no-one is actually using CLONE_STOPPED.
Signed-off-by: Dave Jones
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
On a 16TB machine, max_user_watches has an integer overflow. Convert it
to use a long and handle the associated fallout.
Signed-off-by: Robin Holt
Cc: "Eric W. Biederman"
Acked-by: Davide Libenzi
Cc: Pekka Enberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
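Back-of-the-envelope arithmetic for the overflow; the per-watch cost here is
illustrative, and epoll budgets roughly 1/25 of low memory for watches:

        long lowmem  = 16L << 40;        /* 16TB of low memory */
        long watches = lowmem / 25 / 90; /* ~90 bytes per watch (illustrative) */
        /* watches ~= 7.8 billion, far beyond INT_MAX (~2.1 billion) */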
11 Jan, 2011
2 commits
-
Remove kobject.h from files which don't need it, notably,
sched.h and fs.h.
Signed-off-by: Alexey Dobriyan
Signed-off-by: Linus Torvalds
-
Remove path.h from sched.h and other files.
Signed-off-by: Alexey Dobriyan
Signed-off-by: Linus Torvalds
07 Jan, 2011
3 commits
-
root_task_group is a leftover of USER_SCHED; it is now always the
same as init_task_group.
But as Mike suggested, root_task_group is perhaps the more suitable
name to keep for a tree.
So in this patch:
init_task_group --> root_task_group
init_task_group_load --> root_task_group_load
INIT_TASK_GROUP_LOAD --> ROOT_TASK_GROUP_LOAD
Suggested-by: Mike Galbraith
Signed-off-by: Yong Zhang
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
…/git/tip/linux-2.6-tip
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (30 commits)
sched: Change wait_for_completion_*_timeout() to return a signed long
sched, autogroup: Fix reference leak
sched, autogroup: Fix potential access to freed memory
sched: Remove redundant CONFIG_CGROUP_SCHED ifdef
sched: Fix interactivity bug by charging unaccounted run-time on entity re-weight
sched: Move periodic share updates to entity_tick()
printk: Use this_cpu_{read|write} api on printk_pending
sched: Make pushable_tasks CONFIG_SMP dependant
sched: Add 'autogroup' scheduling feature: automated per session task groups
sched: Fix unregister_fair_sched_group()
sched: Remove unused argument dest_cpu to migrate_task()
mutexes, sched: Introduce arch_mutex_cpu_relax()
sched: Add some clock info to sched_debug
cpu: Remove incorrect BUG_ON
cpu: Remove unused variable
sched: Fix UP build breakage
sched: Make task dump print all 15 chars of proc comm
sched: Update tg->shares after cpu.shares write
sched: Allow update_cfs_load() to update global load
sched: Implement demand based update_cfs_load()
...
-
…git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (146 commits)
tools, perf: Documentation for the power events API
perf: Add calls to suspend trace point
perf script: Make some lists static
perf script: Use the default lost event handler
perf session: Warn about errors when processing pipe events too
perf tools: Fix perf_event.h header usage
perf test: Clarify some error reports in the open syscall test
x86, NMI: Add touch_nmi_watchdog to io_check_error delay
x86: Avoid calling arch_trigger_all_cpu_backtrace() at the same time
x86: Only call smp_processor_id in non-preempt cases
perf timechart: Adjust perf timechart to the new power events
perf: Clean up power events by introducing new, more generic ones
perf: Do not export power_frequency, but power_start event
perf test: Add test for counting open syscalls
perf evsel: Auto allocate resources needed for some methods
perf evsel: Use {cpu,thread}_map to shorten list of parameters
perf tools: Refactor all_tids to hold nr and the map
perf tools: Refactor cpumap to hold nr and the map
perf evsel: Introduce per cpu and per thread open helpers
perf evsel: Steal the counter reading routines from stat
...
05 Jan, 2011
1 commit
-
Merge reason: Merge the final .37 tree.
Signed-off-by: Ingo Molnar
23 Dec, 2010
1 commit
-
…/linux-2.6-rcu into core/rcu
22 Dec, 2010
1 commit
-
Merge reason: Pick up the latest -rc.
Signed-off-by: Ingo Molnar
09 Dec, 2010
2 commits
-
As noted by Peter Zijlstra at https://lkml.org/lkml/2010/11/10/391
(while reviewing other stuff, though), tracking pushable tasks
only makes sense on SMP systems.
Signed-off-by: Dario Faggioli
Acked-by: Steven Rostedt
Acked-by: Gregory Haskins
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
There's a long-running regression that proved difficult to fix and
which is hitting certain people and is rather annoying in its effects.
Damien reported that after 74f5187ac8 (sched: Cure load average vs
NO_HZ woes) his load average is unnaturally high; he also noted that
even with that patch reverted the load average numbers are not
correct.
The problem is that the previous patch only solved half the NO_HZ
problem: it addressed the part of going into NO_HZ mode, not of
coming out of NO_HZ mode. This patch implements that missing half.
When coming out of NO_HZ mode there are two important things to take
care of:
- Folding the pending idle delta into the global active count.
- Correctly aging the averages for the idle-duration.
So with this patch the NO_HZ interaction should be complete and
behaviour between CONFIG_NO_HZ=[yn] should be equivalent.
Furthermore, this patch slightly changes the load average computation
by adding a rounding term to the fixed point multiplication.
Reported-by: Damien Wyart
Reported-by: Tim McGrath
Tested-by: Damien Wyart
Tested-by: Orion Poplawski
Tested-by: Kyle McMartin
Signed-off-by: Peter Zijlstra
Cc: stable@kernel.org
Cc: Chase Douglas
LKML-Reference:
Signed-off-by: Ingo Molnar
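The fixed-point average with the new rounding term, as a sketch (FSHIFT and
FIXED_1 are the existing load-average constants from sched.h):

        static unsigned long
        calc_load(unsigned long load, unsigned long exp, unsigned long active)
        {
                load *= exp;
                load += active * (FIXED_1 - exp);
                load += 1UL << (FSHIFT - 1); /* the new rounding term */
                return load >> FSHIFT;
        }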
30 Nov, 2010
2 commits
-
A recurring complaint from CFS users is that parallel kbuild has
a negative impact on desktop interactivity. This patch
implements an idea from Linus, to automatically create task
groups. Currently, only per session autogroups are implemented,
but the patch leaves the way open for enhancement.
Implementation: each task's signal struct contains an inherited
pointer to a refcounted autogroup struct containing a task group
pointer, the default for all tasks pointing to the
init_task_group. When a task calls setsid(), a new task group
is created, the process is moved into the new task group, and a
reference to the previous task group is dropped. Child
processes inherit this task group thereafter, and increase its
refcount. When the last thread of a process exits, the
process's reference is dropped, such that when the last process
referencing an autogroup exits, the autogroup is destroyed.
At runqueue selection time, IFF a task has no cgroup assignment,
its current autogroup is used.
Autogroup bandwidth is controllable by setting its nice level
through the proc filesystem:
cat /proc/<pid>/autogroup
Displays the task's group and the group's nice level.
echo <nice level> > /proc/<pid>/autogroup
Sets the task group's shares to the weight of a nice <level> task.
Setting the nice level is rate limited for !admin users due to the
abuse risk of task group locking.
The feature is enabled from boot by default if
CONFIG_SCHED_AUTOGROUP=y is selected, but can be disabled via
the boot option noautogroup, and can also be turned on/off on
the fly via:
echo [01] > /proc/sys/kernel/sched_autogroup_enabled
... which will automatically move tasks to/from the root task group.
Signed-off-by: Mike Galbraith
Acked-by: Linus Torvalds
Acked-by: Peter Zijlstra
Cc: Markus Trippelsdorf
Cc: Mathieu Desnoyers
Cc: Paul Turner
Cc: Oleg Nesterov
[ Removed the task_group_path() debug code, and fixed !EVENTFD build failure. ]
Signed-off-by: Ingo Molnar
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Add priority boosting, but only for TINY_PREEMPT_RCU. This is enabled
by the default-off RCU_BOOST kernel parameter. The priority to which to
boost preempted RCU readers is controlled by the RCU_BOOST_PRIO kernel
parameter (defaulting to real-time priority 1) and the time to wait
before boosting the readers blocking a given grace period is controlled
by the RCU_BOOST_DELAY kernel parameter (defaulting to 500 milliseconds).
Signed-off-by: Paul E. McKenney
Signed-off-by: Paul E. McKenney
26 Nov, 2010
2 commits
-
The perf hardware pmu got initialized at various points in the boot,
some before early_initcall(), some after (notably arch_initcall).
The problem is that the NMI lockup detector is run from early_initcall()
and expects the hardware pmu to be present.
Sanitize this by moving all architecture hardware pmu implementations to
initialize at early_initcall() and move the lockup detector to an explicit
initcall right after that.
Cc: paulus
Cc: davem
Cc: Michael Cree
Cc: Deng-Cheng Zhu
Acked-by: Paul Mundt
Acked-by: Will Deacon
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Merge reason: Pick up latest fixes.
Signed-off-by: Ingo Molnar
18 Nov, 2010
3 commits
-
Introduce a new sysctl for the shares window and disambiguate it from
sched_time_avg.
A 10ms window appears to be a good compromise between accuracy and performance.
Signed-off-by: Paul Turner
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
By tracking a per-cpu load-avg for each cfs_rq and folding it into a
global task_group load on each tick we can rework tg_shares_up to be
strictly per-cpu.
This should improve cpu-cgroup performance for smp systems
significantly.
[ Paul: changed to use queueing cfs_rq + bug fixes ]
Signed-off-by: Paul Turner
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
While discussing the need for sched_idle_next(), Oleg remarked that
since try_to_wake_up() ensures sleeping tasks will end up running on a
sane cpu, we can do away with migrate_live_tasks().
If we then extend the existing hack of migrating current from
CPU_DYING to migrating the full rq worth of tasks from CPU_DYING, the
need for the sched_idle_next() abomination disappears as well, since
idle will be the only possible thread left after the migration thread
stops.
This greatly simplifies the hot-unplug task migration path, as can be
seen from the resulting code reduction (and about half the new lines
are comments).
Suggested-by: Oleg Nesterov
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar