Eric Lee / smarc-fsl-linux-kernel

19 Sep, 2009

2 commits

6952b61de headers: taskstats_kern.h trim

Remove the net/genetlink.h inclusion, so that sched.c is no longer
recompiled because of networking changes.

Signed-off-by: Alexey Dobriyan
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-09-19 00:48:52 +0800
a03fdb761 Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (34 commits)
time: Prevent 32 bit overflow with set_normalized_timespec()
clocksource: Delay clocksource down rating to late boot
clocksource: clocksource_select must be called with mutex locked
clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash
timers: Drop a function prototype
clocksource: Resolve cpu hotplug dead lock with TSC unstable
timer.c: Fix S/390 comments
timekeeping: Fix invalid getboottime() value
timekeeping: Fix up read_persistent_clock() breakage on sh
timekeeping: Increase granularity of read_persistent_clock(), build fix
time: Introduce CLOCK_REALTIME_COARSE
x86: Do not unregister PIT clocksource on PIT oneshot setup/shutdown
clocksource: Avoid clocksource watchdog circular locking dependency
clocksource: Protect the watchdog rating changes with clocksource_mutex
clocksource: Call clocksource_change_rating() outside of watchdog_lock
timekeeping: Introduce read_boot_clock
timekeeping: Increase granularity of read_persistent_clock()
timekeeping: Update clocksource with stop_machine
timekeeping: Add timekeeper read_clock helper functions
timekeeping: Move NTP adjusted clock multiplier to struct timekeeper
...

Fix trivial conflict due to MIPS lemote -> loongson renaming.

Linus Torvalds
2009-09-19 00:15:24 +0800

18 Sep, 2009

4 commits

dcbf77b9e Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (37 commits)
sched: Fix SD_POWERSAVING_BALANCE|SD_PREFER_LOCAL vs SD_WAKE_AFFINE
sched: Stop buddies from hogging the system
sched: Add new wakeup preemption mode: WAKEUP_RUNNING
sched: Fix TASK_WAKING & loadaverage breakage
sched: Disable wakeup balancing
sched: Rename flags to wake_flags
sched: Clean up the load_idx selection in select_task_rq_fair
sched: Optimize cgroup vs wakeup a bit
sched: x86: Name old_perf in a unique way
sched: Implement a gentler fair-sleepers feature
sched: Add SD_PREFER_LOCAL
sched: Add a few SYNC hint knobs to play with
sched: Fix sync wakeups again
sched: Add WF_FORK
sched: Rename sync arguments
sched: Rename select_task_rq() argument
sched: Feature to disable APERF/MPERF cpu_power
x86: sched: Provide arch implementations using aperf/mperf
x86: Add generic aperf/mperf code
x86: Move APERF/MPERF into a X86_FEATURE
...

Fix up trivial conflict in arch/x86/include/asm/processor.h due to
nearby addition of amd_get_nb_id() declaration from the EDAC merge.

Linus Torvalds
2009-09-18 12:00:02 +0800
5dd4de587 softirq: add BLOCK_IOPOLL to softirq_to_name

With BLOCK_IOPOLL_SOFTIRQ added, softirq_to_name[] and
show_softirq_name() need to be updated.
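A minimal user-space sketch of why the name table has to track the vector enum, assuming this era's ordering with BLOCK_IOPOLL_SOFTIRQ between BLOCK_SOFTIRQ and TASKLET_SOFTIRQ. The designated initializers and the softirq_vec_name()/softirq_vec_nr() helpers are hypothetical, not the kernel's code; binding each name to its enum slot means a vector added without a name reports "UNKNOWN" instead of silently shifting every later name by one.

```c
#include <string.h>

/* Softirq vectors in the order assumed for this sketch. */
enum {
	HI_SOFTIRQ = 0,
	TIMER_SOFTIRQ,
	NET_TX_SOFTIRQ,
	NET_RX_SOFTIRQ,
	BLOCK_SOFTIRQ,
	BLOCK_IOPOLL_SOFTIRQ,	/* the newly added vector */
	TASKLET_SOFTIRQ,
	SCHED_SOFTIRQ,
	HRTIMER_SOFTIRQ,
	RCU_SOFTIRQ,
	NR_SOFTIRQS
};

/* Each name is bound to its enum slot, so a missing entry stays NULL
 * instead of shifting every later name by one position. */
static const char *const softirq_to_name[NR_SOFTIRQS] = {
	[HI_SOFTIRQ]		= "HI",
	[TIMER_SOFTIRQ]		= "TIMER",
	[NET_TX_SOFTIRQ]	= "NET_TX",
	[NET_RX_SOFTIRQ]	= "NET_RX",
	[BLOCK_SOFTIRQ]		= "BLOCK",
	[BLOCK_IOPOLL_SOFTIRQ]	= "BLOCK_IOPOLL",
	[TASKLET_SOFTIRQ]	= "TASKLET",
	[SCHED_SOFTIRQ]		= "SCHED",
	[HRTIMER_SOFTIRQ]	= "HRTIMER",
	[RCU_SOFTIRQ]		= "RCU",
};

/* Hypothetical helper: vector number -> name, "UNKNOWN" if unnamed. */
const char *softirq_vec_name(unsigned int nr)
{
	if (nr >= NR_SOFTIRQS || softirq_to_name[nr] == NULL)
		return "UNKNOWN";
	return softirq_to_name[nr];
}

/* Hypothetical helper: name -> vector number, or -1 if not found. */
int softirq_vec_nr(const char *name)
{
	for (int i = 0; i < NR_SOFTIRQS; i++)
		if (softirq_to_name[i] && strcmp(softirq_to_name[i], name) == 0)
			return i;
	return -1;
}
```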

Signed-off-by: Li Zefan
LKML-Reference:
Signed-off-by: Steven Rostedt

Li Zefan
2009-09-18 03:53:44 +0800
b375a11a2 tracing: switch function prints from %pf to %ps

For direct function pointers (like what mcount provides) PowerPC64
requires the use of %ps, otherwise nothing is printed.

This patch converts all prints of functions retrieved through mcount
to use the %ps format instead of %pf.

Signed-off-by: Steven Rostedt

Steven Rostedt
2009-09-18 03:53:40 +0800
45bd00d31 Merge branch 'linus' into tracing/core

Merge reason: Pick up kernel/softirq.c update for dependent fix.

Signed-off-by: Ingo Molnar

Ingo Molnar
2009-09-18 02:53:10 +0800

17 Sep, 2009

4 commits

29cd8bae3 sched: Fix SD_POWERSAVING_BALANCE|SD_PREFER_LOCAL vs SD_WAKE_AFFINE

The SD_POWERSAVING_BALANCE|SD_PREFER_LOCAL code can break out of
the domain iteration early, making us miss the SD_WAKE_AFFINE bits.

Fix this by continuing iteration until there is no need for a
larger domain.

This also cleans up the cgroup stuff a bit, by not having two
update_shares() invocations.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-17 16:40:31 +0800
de69a80be sched: Stop buddies from hogging the system

Clear buddies more aggressively.

The (theoretical, haven't actually observed any of this) problem is
that when we do not select either buddy in pick_next_entity()
because they are too far ahead of the left-most task, we do not
clear the buddies.

This means that as soon as we service the left-most task, these
same buddies will be tried again on the next schedule. Now if the
left-most task was a pure hog, it wouldn't have done any wakeups
and it wouldn't have set buddies of its own. That leads to the old
buddies dominating, which would lead to bad latencies.
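The clearing rule can be modelled in a few lines of user-space C. This is a hypothetical simplification: struct entity, struct runqueue and the single gran threshold are assumptions, loosely following the pick_next_entity() logic the message describes. Each buddy is cleared as soon as it has been considered, so a buddy that loses to the left-most task is not retried on the next schedule.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified scheduling entity. */
struct entity {
	long long vruntime;
};

struct runqueue {
	struct entity *leftmost;	/* smallest vruntime */
	struct entity *next;		/* "next" buddy */
	struct entity *last;		/* "last" buddy */
	long long gran;			/* max lead a buddy may have */
};

/* A buddy may run instead of the left-most task only if it is at most
 * @gran ahead of it in virtual runtime. */
static bool buddy_acceptable(const struct entity *buddy,
			     const struct entity *left, long long gran)
{
	return buddy->vruntime - left->vruntime <= gran;
}

struct entity *pick_next(struct runqueue *rq)
{
	struct entity *se = rq->leftmost;
	struct entity *buddy;

	if (rq->next) {
		buddy = rq->next;
		rq->next = NULL;	/* cleared whether or not it wins */
		if (buddy_acceptable(buddy, se, rq->gran))
			return buddy;
	}
	if (rq->last) {
		buddy = rq->last;
		rq->last = NULL;	/* likewise */
		if (buddy_acceptable(buddy, se, rq->gran))
			return buddy;
	}
	return se;
}
```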

Signed-off-by: Peter Zijlstra
Cc: Mike Galbraith
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-17 16:40:30 +0800
ad4b78bbc sched: Add new wakeup preemption mode: WAKEUP_RUNNING

Create a new wakeup preemption mode: preempt towards tasks that run
shorter on average. It sets the next buddy to make sure we actually
run the task we preempted for.

Test results:

root@twins:~# while :; do :; done &
[1] 6537
root@twins:~# while :; do :; done &
[2] 6538
root@twins:~# while :; do :; done &
[3] 6539
root@twins:~# while :; do :; done &
[4] 6540

root@twins:/home/peter# ./latt -c4 sleep 4
Entries: 48 (clients=4)

Averages:
------------------------------
Max 4750 usec
Avg 497 usec
Stdev 737 usec

root@twins:/home/peter# echo WAKEUP_RUNNING > /debug/sched_features

root@twins:/home/peter# ./latt -c4 sleep 4
Entries: 48 (clients=4)

Averages:
------------------------------
Max 14 usec
Avg 5 usec
Stdev 3 usec

Disabled by default - needs more testing.
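The preemption rule described at the top of this commit can be sketched as follows. avg_running stands in for the per-task average run time the message refers to, and the feature_on parameter stands in for the WAKEUP_RUNNING sched_features bit (off by default); both, along with the struct names, are assumptions of this model rather than the kernel's code.

```c
#include <stdbool.h>

/* Hypothetical simplified entity: average time run between sleeps. */
struct entity {
	unsigned long long avg_running;
};

struct cfs_rq_model {
	const struct entity *next_buddy;
};

/* Returns true if @curr should be preempted by @wakee. When it should,
 * the wakee also becomes the next buddy, so the task we preempted for
 * is the one that actually runs next. */
bool wakeup_running_preempt(struct cfs_rq_model *rq,
			    const struct entity *curr,
			    const struct entity *wakee, bool feature_on)
{
	if (!feature_on)
		return false;
	if (wakee->avg_running < curr->avg_running) {
		rq->next_buddy = wakee;
		return true;
	}
	return false;
}
```

In the test session above, the busy loops have a long average run time, so a short-running wakee such as the latt client wins under this rule.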

Signed-off-by: Peter Zijlstra
Acked-by: Mike Galbraith
Signed-off-by: Ingo Molnar
LKML-Reference:

Peter Zijlstra
2009-09-17 16:17:25 +0800
eb24073bc sched: Fix TASK_WAKING & loadaverage breakage

Fix this:

top - 21:54:00 up 2:59, 1 user, load average: 432512.33, 426421.74, 417432.74

This happens because we now set TASK_WAKING before activate_task().
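A user-space model of the breakage, under stated assumptions: the load average counts uninterruptible sleepers through a per-runqueue nr_uninterruptible counter, which activation decrements only if the task's state still shows TASK_UNINTERRUPTIBLE. The struct names, helpers and the TASK_WAKING value are hypothetical. Writing TASK_WAKING first hides the uninterruptible bit, so the counter only ever grows; inspecting the state before marking the task as waking keeps it balanced.

```c
/* State values for this sketch; TASK_WAKING's value is hypothetical. */
#define TASK_RUNNING		0x000
#define TASK_UNINTERRUPTIBLE	0x002
#define TASK_WAKING		0x100

struct task { unsigned int state; };
struct rq { long nr_uninterruptible; };	/* feeds the load average */

static int contributes_to_load(const struct task *p)
{
	return (p->state & TASK_UNINTERRUPTIBLE) != 0;
}

void sleep_uninterruptible(struct rq *rq, struct task *p)
{
	p->state = TASK_UNINTERRUPTIBLE;
	rq->nr_uninterruptible++;
}

/* Activation undoes the sleep accounting only if the state still
 * shows the task as uninterruptible. */
static void activate(struct rq *rq, struct task *p)
{
	if (contributes_to_load(p))
		rq->nr_uninterruptible--;
	p->state = TASK_RUNNING;
}

/* Broken order: TASK_WAKING hides the uninterruptible bit. */
void wake_broken(struct rq *rq, struct task *p)
{
	p->state = TASK_WAKING;
	activate(rq, p);		/* decrement is skipped */
}

/* Fixed order: account first, then mark the task as waking. */
void wake_fixed(struct rq *rq, struct task *p)
{
	if (contributes_to_load(p))
		rq->nr_uninterruptible--;
	p->state = TASK_WAKING;
	activate(rq, p);		/* no double decrement */
}
```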

Cc: Peter Zijlstra
Cc: Mike Galbraith
LKML-Reference:
Signed-off-by: Ingo Molnar

Ingo Molnar
2009-09-17 15:51:20 +0800

16 Sep, 2009

18 commits

ab86e5765 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
Driver Core: devtmpfs - kernel-maintained tmpfs-based /dev
debugfs: Modify default debugfs directory for debugging pktcdvd.
debugfs: Modified default dir of debugfs for debugging UHCI.
debugfs: Change debugfs directory of IWMC3200
debugfs: Change debuhgfs directory of trace-events-sample.h
debugfs: Fix mount directory of debugfs by default in events.txt
hpilo: add poll f_op
hpilo: add interrupt handler
hpilo: staging for interrupt handling
driver core: platform_device_add_data(): use kmemdup()
Driver core: Add support for compatibility classes
uio: add generic driver for PCI 2.3 devices
driver-core: move dma-coherent.c from kernel to driver/base
mem_class: fix bug
mem_class: use minor as index instead of searching the array
driver model: constify attribute groups
UIO: remove 'default n' from Kconfig
Driver core: Add accessor for device platform data
Driver core: move dev_get/set_drvdata to drivers/base/dd.c
Driver core: add new device to bus's list before probing

Linus Torvalds
2009-09-16 23:27:10 +0800
6b7b352f2 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: fix linkage problem with blk_iopoll and !CONFIG_BLOCK

Linus Torvalds
2009-09-16 22:46:34 +0800
5a9b86f64 sched: Rename flags to wake_flags

For consistency's sake, rename the argument (again).

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-16 22:44:33 +0800
5158f4e44 sched: Clean up the load_idx selection in select_task_rq_fair

Clean up the code a little.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-16 22:44:32 +0800
3b6408942 sched: Optimize cgroup vs wakeup a bit

We don't need to call update_shares() for each domain we iterate,
just for the largest one.

However, we should call it before wake_affine() as well, so that
that can use up-to-date values too.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-16 22:44:32 +0800
b36461da2 tracing: Fix minor bugs for __unregister_ftrace_function_probe

Fix the strcmp() condition for "*".
Also fix a NULL pointer dereference when glob is NULL.
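A minimal sketch of the corrected check, assuming (as the message implies) that "*" and an empty pattern both mean "match every function". normalize_probe_glob() is a hypothetical helper, not the function patched here. The two fixes map onto the two parts of the condition: strcmp() returns 0 on a match, so the test must compare against 0, and the NULL check must come before any string call.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns NULL when @glob means "match every
 * function" (NULL, "" or "*"), otherwise returns @glob unchanged. */
const char *normalize_probe_glob(const char *glob)
{
	/* Check for NULL first; strcmp() == 0 means "equal to *". */
	if (glob == NULL || strcmp(glob, "*") == 0 || glob[0] == '\0')
		return NULL;
	return glob;
}
```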

Signed-off-by: Atsushi Tsuji
LKML-Reference:
Signed-off-by: Steven Rostedt

Atsushi Tsuji
2009-09-16 21:08:54 +0800
51e0304ce sched: Implement a gentler fair-sleepers feature

Add back FAIR_SLEEPERS and GENTLE_FAIR_SLEEPERS.

FAIR_SLEEPERS is the old logic: credit sleepers with their sleep time.

GENTLE_FAIR_SLEEPERS dampens this a bit: 50% of their sleep time gets
credited.

The hope here is to still give the benefits of fair-sleepers logic
(quick wakeups, etc.) while not allowing them to have 100% of their
sleep time as if they were running.
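A simplified model of the two modes, assuming, as in CFS's place_entity(), that sleeper credit is applied by placing the waking task's vruntime below the queue's min_vruntime, bounded by a latency threshold. GENTLE_FAIR_SLEEPERS is modelled here as halving that bound, matching the 50% in the message; the function and parameter names are hypothetical.

```c
typedef unsigned long long u64;

/* Place a waking task. @min_vruntime: the queue's minimum vruntime;
 * @se_vruntime: the task's own vruntime; @latency: the credit bound. */
u64 place_sleeper(u64 min_vruntime, u64 se_vruntime, u64 latency,
		  int fair_sleepers, int gentle)
{
	u64 vruntime = min_vruntime;

	if (fair_sleepers) {
		u64 thresh = latency;

		if (gentle)
			thresh >>= 1;	/* credit only half */
		vruntime = vruntime > thresh ? vruntime - thresh : 0;
	}
	/* Never let a task gain time by being placed backwards. */
	return se_vruntime > vruntime ? se_vruntime : vruntime;
}
```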

Cc: Peter Zijlstra
Cc: Mike Galbraith
LKML-Reference:
Signed-off-by: Ingo Molnar

Ingo Molnar
2009-09-16 15:05:20 +0800
59abf0264 sched: Add SD_PREFER_LOCAL

And turn it on for NUMA and MC domains. This improves
locality in balancing decisions by keeping up to a
capacity's worth of tasks local before looking for idle
CPUs (twice the capacity if SD_POWERSAVINGS_BALANCE
is set).
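The locality rule reduces to a capacity comparison. In this hypothetical sketch, nr_running is the number of tasks already in the local group and capacity is how many it can hold; both names are assumptions, and the real code works on sched_domain statistics.

```c
#include <stdbool.h>

/* Hypothetical: keep the task local while the local group is below
 * its capacity, or below twice that when powersavings balancing is
 * enabled for the domain. */
bool prefer_local(unsigned int nr_running, unsigned int capacity,
		  bool powersavings)
{
	unsigned int limit = powersavings ? 2 * capacity : capacity;

	return nr_running < limit;
}
```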

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-16 14:42:40 +0800
cb684b5bc block: fix linkage problem with blk_iopoll and !CONFIG_BLOCK

kernel/built-in.o:(.data+0x17b0): undefined reference to `blk_iopoll_enabled'

The extern declaration makes the compile succeed, but the actual
symbol is missing when block/blk-iopoll.o isn't linked in.
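A hypothetical sketch of the failure mode and the usual cure, assuming the stray reference lives in a statically initialized table (the .data section in the error suggests initialized data). An extern declaration satisfies the compiler, but an entry that takes &blk_iopoll_enabled still needs the definition at link time, so the entry has to be compiled out together with the block layer. The table, its entry type and example_knob are illustrative.

```c
#include <stddef.h>

/* Illustrative table entry, loosely shaped like a sysctl entry. */
struct ctl_entry {
	const char *name;
	int *data;
};

#ifdef CONFIG_BLOCK
extern int blk_iopoll_enabled;	/* defined in block/blk-iopoll.c */
#endif

static int example_knob = 1;

const struct ctl_entry kern_table[] = {
	{ "example_knob", &example_knob },
#ifdef CONFIG_BLOCK
	/* Only reference the symbol when its object file is linked in. */
	{ "blk_iopoll", &blk_iopoll_enabled },
#endif
	{ NULL, NULL }			/* terminator */
};

size_t kern_table_len(void)
{
	size_t n = 0;

	while (kern_table[n].name != NULL)
		n++;
	return n;
}
```

Built without -DCONFIG_BLOCK, the table holds one entry and links cleanly; with it, the program also needs a definition of blk_iopoll_enabled.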

Signed-off-by: Jens Axboe

Jens Axboe
2009-09-16 03:53:11 +0800
e69b0f1b4 sched: Add a few SYNC hint knobs to play with

Currently we use overlap to weaken the SYNC hint, but allow it to
set the hint as well.

echo NO_SYNC_WAKEUP > /debug/sched_features
echo SYNC_MORE > /debug/sched_features

preserves pipe-test behaviour without using the WF_SYNC hint.

Worth playing with on more workloads...

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-16 01:47:23 +0800
63859d4fe sched: Fix sync wakeups again

The sync argument rename to introduce WF_* broke stuff by missing a
local alias for an argument in __wake_up_common, fix it by using
the more descriptive wake_flags name.

This restores WF_SYNC propagation, which fixes wake_affine()
behaviour, which fixes pipe-test.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-16 01:47:22 +0800
723e9db7a Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (134 commits)
powerpc/nvram: Enable use Generic NVRAM driver for different size chips
powerpc/iseries: Fix oops reading from /proc/iSeries/mf/*/cmdline
powerpc/ps3: Workaround for flash memory I/O error
powerpc/booke: Don't set DABR on 64-bit BookE, use DAC1 instead
powerpc/perf_counters: Reduce stack usage of power_check_constraints
powerpc: Fix bug where perf_counters breaks oprofile
powerpc/85xx: Fix SMP compile error and allow NULL for smp_ops
powerpc/irq: Improve nanodoc
powerpc: Fix some late PowerMac G5 with PCIe ATI graphics
powerpc/fsl-booke: Use HW PTE format if CONFIG_PTE_64BIT
powerpc/book3e: Add missing page sizes
powerpc/pseries: Fix to handle slb resize across migration
powerpc/powermac: Thermal control turns system off too eagerly
powerpc/pci: Merge ppc32 and ppc64 versions of phb_scan()
powerpc/405ex: support cuImage via included dtb
powerpc/405ex: provide necessary fixup function to support cuImage
powerpc/40x: Add support for the ESTeem 195E (PPC405EP) SBC
powerpc/44x: Add Eiger AMCC (AppliedMicro) PPC460SX evaluation board support.
powerpc/44x: Update Arches defconfig
powerpc/44x: Update Arches dts
...

Fix up conflicts in drivers/char/agp/uninorth-agp.c

Linus Torvalds
2009-09-16 00:51:09 +0800
a56af8764 driver-core: move dma-coherent.c from kernel to driver/base

Placing dma-coherent.c in drivers/base is better than in kernel/,
since it contains code for per-device coherent DMA memory
handling.

Signed-off-by: Ming Lei
Signed-off-by: Greg Kroah-Hartman

Ming Lei
2009-09-16 00:50:47 +0800
ada3fa150 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits)
powerpc64: convert to dynamic percpu allocator
sparc64: use embedding percpu first chunk allocator
percpu: kill lpage first chunk allocator
x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA
percpu: update embedding first chunk allocator to handle sparse units
percpu: use group information to allocate vmap areas sparsely
vmalloc: implement pcpu_get_vm_areas()
vmalloc: separate out insert_vmalloc_vm()
percpu: add chunk->base_addr
percpu: add pcpu_unit_offsets[]
percpu: introduce pcpu_alloc_info and pcpu_group_info
percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward
percpu: add @align to pcpu_fc_alloc_fn_t
percpu: make @dyn_size mandatory for pcpu_setup_first_chunk()
percpu: drop @static_size from first chunk allocators
percpu: generalize first chunk allocator selection
percpu: build first chunk allocators selectively
percpu: rename 4k first chunk allocator to page
percpu: improve boot messages
percpu: fix pcpu_reclaim() locking
...

Fix trivial conflict in kernel/sched.c as per Tejun Heo.

Linus Torvalds
2009-09-16 00:39:44 +0800
f199fd990 Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf_counter: Fix buffer overflow in perf_copy_attr()

Linus Torvalds
2009-09-16 00:34:27 +0800
6ca6cca31 tracing: optimize global_trace_clock cachelines

The prev_trace_clock_time is only read or written while
trace_clock_lock is held. For better performance, the two
should share the same cache line.
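A user-space sketch of the layout idea, assuming 64-byte cache lines and using a C11 atomic_flag spinlock in place of the kernel's lock; the struct and function names are illustrative. Because the lock and prev_time are always touched together, placing them in one cache-line-aligned struct means one line transfer serves both. The clamping step is a simplified model of a global clock that never goes backwards.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64	/* assumed line size */

/* Lock and protected value share one aligned cache line. */
struct trace_clock_state {
	alignas(CACHE_LINE_SIZE) atomic_flag lock;	/* protects prev_time */
	uint64_t prev_time;
};

static struct trace_clock_state trace_clock_struct = {
	.lock = ATOMIC_FLAG_INIT,
};

/* Return @now, bumped if needed so results never go backwards. */
uint64_t global_clock_monotonic(uint64_t now)
{
	uint64_t ret = now;

	while (atomic_flag_test_and_set_explicit(&trace_clock_struct.lock,
						 memory_order_acquire))
		;	/* spin */
	if ((int64_t)(ret - trace_clock_struct.prev_time) < 0)
		ret = trace_clock_struct.prev_time + 1;
	trace_clock_struct.prev_time = ret;
	atomic_flag_clear_explicit(&trace_clock_struct.lock,
				   memory_order_release);
	return ret;
}
```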

Reported-by: Peter Zijlstra
Signed-off-by: Steven Rostedt

Steven Rostedt
2009-09-16 00:24:22 +0800
227423904 Merge branch 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, pat: Fix cacheflush address in change_page_attr_set_clr()
mm: remove !NUMA condition from PAGEFLAGS_EXTENDED condition set
x86: Fix earlyprintk=dbgp for machines without NX
x86, pat: Sanity check remap_pfn_range for RAM region
x86, pat: Lookup the protection from memtype list on vm_insert_pfn()
x86, pat: Add lookup_memtype to get the current memtype of a paddr
x86, pat: Use page flags to track memtypes of RAM pages
x86, pat: Generalize the use of page flag PG_uncached
x86, pat: Add rbtree to do quick lookup in memtype tracking
x86, pat: Add PAT reserve free to io_mapping* APIs
x86, pat: New i/f for driver to request memtype for IO regions
x86, pat: ioremap to follow same PAT restrictions as other PAT users
x86, pat: Keep identity maps consistent with mmaps even when pat_disabled
x86, mtrr: make mtrr_aps_delayed_init static bool
x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init
generic-ipi: Allow cpus not yet online to call smp_call_function with irqs disabled
x86: Fix an incorrect argument of reserve_bootmem()
x86: Fix system crash when loading with "reservetop" parameter

Linus Torvalds
2009-09-16 00:19:38 +0800
1aaf2e591 Merge branch 'x86-txt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-txt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, intel_txt: clean up the impact on generic code, unbreak non-x86
x86, intel_txt: Handle ACPI_SLEEP without X86_TRAMPOLINE
x86, intel_txt: Fix typos in Kconfig help
x86, intel_txt: Factor out the code for S3 setup
x86, intel_txt: tboot.c needs
intel_txt: Force IOMMU on for Intel TXT launch
x86, intel_txt: Intel TXT Sx shutdown support
x86, intel_txt: Intel TXT reboot/halt shutdown support
x86, intel_txt: Intel TXT boot support

Linus Torvalds
2009-09-16 00:19:20 +0800

15 Sep, 2009

12 commits

a7558e010 sched: Add WF_FORK

Prevent the cache buddies from biasing the time distribution away
from fork()ers. Normally the next buddy will be the preferred
scheduling target, but this makes fork()s prefer to run the new
child, whereas we prefer to run the parent, since that will
generate more work.
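A hypothetical sketch of the WF_FORK rule: a wakeup that would normally nominate the wakee as next buddy skips that step when the wakeup comes from fork(), so the parent keeps running and can generate more work. The flag values, struct rq_model and the helper name are assumptions of this sketch.

```c
/* Wakeup flag bits, one per property (values assumed). */
#define WF_SYNC	0x01	/* waker goes to sleep after the wakeup */
#define WF_FORK	0x02	/* child wakeup after fork() */

struct rq_model {
	int next_buddy;		/* pid of the next buddy, 0 = none */
};

/* Nominate the wakee as next buddy, except for fork wakeups. */
void maybe_set_next_buddy(struct rq_model *rq, int wakee_pid,
			  int wake_flags)
{
	if (!(wake_flags & WF_FORK))
		rq->next_buddy = wakee_pid;
}
```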

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-15 22:51:31 +0800
7d4787214 sched: Rename sync arguments

In order to extend the functions to have more than 1 flag (sync),
rename the argument to flags, and explicitly define a WF_ space for
individual flags.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-15 22:51:30 +0800
0763a660a sched: Rename select_task_rq() argument

In order to be able to rename the sync argument, we need to rename
the current flag argument.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-15 22:51:29 +0800
8e6598af3 sched: Feature to disable APERF/MPERF cpu_power

I suspect a feedback loop between cpuidle and the aperf/mperf
cpu_power bits: idle C-states lower the ratio, which leads to
lower cpu_power and hence less load, which generates more idle
time, and so on.

Put in a knob to disable it.
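A sketch of the scaling and the knob, assuming cpu_power is scaled by the APERF/MPERF ratio in 1/1024 fixed point and that the knob simply bypasses that scaling. The helper names and the bool parameter standing in for the feature bit are hypothetical. If, as the message suspects, idle C-states pull the ratio down, the scaled power drops with them, which is the loop the knob is meant to break.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCHED_LOAD_SHIFT	10
#define SCHED_LOAD_SCALE	(1UL << SCHED_LOAD_SHIFT)

/* Ratio of actual (APERF) to reference (MPERF) cycle deltas, in
 * 1/1024 units; no reference cycles means "assume nominal". */
unsigned long aperfmperf_scale(uint64_t aperf_delta, uint64_t mperf_delta)
{
	if (mperf_delta == 0)
		return SCHED_LOAD_SCALE;
	return (unsigned long)((aperf_delta << SCHED_LOAD_SHIFT) / mperf_delta);
}

/* Apply the scaling only when the (hypothetical) knob is enabled. */
unsigned long cpu_power_with_knob(unsigned long power, uint64_t aperf_delta,
				  uint64_t mperf_delta, bool arch_power_enabled)
{
	if (!arch_power_enabled)
		return power;
	return (power * aperfmperf_scale(aperf_delta, mperf_delta))
		>> SCHED_LOAD_SHIFT;
}
```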

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-15 22:51:28 +0800
d6a59aa3a sched: Provide arch_scale_freq_power

Provide an arch-specific hook for cpufreq-based scaling of
cpu_power.
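One common way to provide such a hook is a weak default that an architecture overrides with a strong definition at link time. This sketch assumes GCC/Clang weak symbols on an ELF target and drops every argument of the real hook except the CPU number; treating SCHED_LOAD_SCALE = 1024 as the nominal power of one CPU is also an assumption here.

```c
#define SCHED_LOAD_SCALE 1024UL	/* nominal cpu_power of one CPU */

/* Generic fallback: no frequency information, report nominal power. */
unsigned long default_scale_freq_power(int cpu)
{
	(void)cpu;
	return SCHED_LOAD_SCALE;
}

/* Weak definition: an architecture may supply a strong
 * arch_scale_freq_power() that replaces this one at link time. */
__attribute__((weak)) unsigned long arch_scale_freq_power(int cpu)
{
	return default_scale_freq_power(cpu);
}

/* Scale a CPU's power by the arch factor (fixed point, /1024). */
unsigned long scaled_cpu_power(unsigned long power, int cpu)
{
	return power * arch_scale_freq_power(cpu) / SCHED_LOAD_SCALE;
}
```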

Signed-off-by: Peter Zijlstra
[ego@in.ibm.com: spotting bugs]
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-15 22:51:24 +0800
0ec9fab3d sched: Improve latencies and throughput

Make the idle balancer more aggressive, to improve an
x264 encoding workload provided by Jason Garrett-Glaser:

NEXT_BUDDY NO_LB_BIAS
encoded 600 frames, 252.82 fps, 22096.60 kb/s
encoded 600 frames, 250.69 fps, 22096.60 kb/s
encoded 600 frames, 245.76 fps, 22096.60 kb/s

NO_NEXT_BUDDY LB_BIAS
encoded 600 frames, 344.44 fps, 22096.60 kb/s
encoded 600 frames, 346.66 fps, 22096.60 kb/s
encoded 600 frames, 352.59 fps, 22096.60 kb/s

NO_NEXT_BUDDY NO_LB_BIAS
encoded 600 frames, 425.75 fps, 22096.60 kb/s
encoded 600 frames, 425.45 fps, 22096.60 kb/s
encoded 600 frames, 422.49 fps, 22096.60 kb/s

Peter pointed out that this is better done via newidle_idx,
not via LB_BIAS: newidle balancing should look for where
there is load _now_, not where there was load 2 ticks ago.

Worst-case latencies improve as well, since no buddies
means less vruntime spread (as per prior lkml discussions).

This change improves kbuild-peak parallelism as well.

Reported-by: Jason Garrett-Glaser
Signed-off-by: Mike Galbraith
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Mike Galbraith
2009-09-15 22:51:16 +0800
78e7ed53c sched: Tweak wake_idx

When merging select_task_rq_fair() and sched_balance_self() we lost
the use of wake_idx; restore it and set the values to 0 to make wake
balancing more aggressive.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-15 22:01:07 +0800
d7c33c493 sched: Fix task affinity for select_task_rq_fair

While merging select_task_rq_fair() and sched_balance_self() I made
a mistake that leads to testing the wrong task affinity.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-15 22:01:07 +0800
83f54960c sched: for_each_domain() vs RCU

for_each_domain() uses RCU to serialize the sched_domains, except
it doesn't actually use rcu_read_lock() and instead relies on
disabling preemption -> FAIL.

XXX: audit other sched_domain code.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-15 22:01:06 +0800
ae154be1f sched: Weaken SD_POWERSAVINGS_BALANCE

One of the problems of power-saving balancing is that under certain
scenarios it is too slow and allows tons of real work to pile up.

Avoid this by ignoring the powersave stuff when there's real work
to be done.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-15 22:01:06 +0800
c88d59108 sched: Merge select_task_rq_fair() and sched_balance_self()

The problem with wake_idle() is that it doesn't respect things like
cpu_power, which means it doesn't deal well with SMT nor the recent
RT interaction.

To cure this, it needs to do what sched_balance_self() does, which
leads to the possibility of merging select_task_rq_fair() and
sched_balance_self().

Modify sched_balance_self() to:

- update_shares() when walking up the domain tree,
(it only called it for the top domain, but it should
have done this anyway), which allows us to remove
this ugly bit from try_to_wake_up().

- do wake_affine() on the smallest domain that contains
both this (the waking) and the prev (the wakee) cpu for
WAKE invocations.

Then use the top-down balance steps it had to replace wake_idle().

This leads to the disappearance of SD_WAKE_BALANCE and
SD_WAKE_IDLE_FAR, with SD_WAKE_IDLE replaced with SD_BALANCE_WAKE.

SD_WAKE_AFFINE needs SD_BALANCE_WAKE to be effective.

Touch all topology bits to replace the old with new SD flags --
platforms might need re-tuning, enabling SD_BALANCE_WAKE
conditionally on a NUMA distance seems like a good additional
feature, magny-core and small nehalem systems would want this
enabled, systems with slow interconnects would not.
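The "smallest domain that contains both this and the prev cpu" step can be sketched with CPU bitmasks. The domain struct, the flag value and affine_domain() are hypothetical; domains are assumed ordered from the smallest (e.g. SMT siblings) to the largest, mirroring a bottom-up walk of the domain tree.

```c
#include <stddef.h>

#define SD_WAKE_AFFINE	0x01	/* flag value assumed for this sketch */

/* Hypothetical domain: @span is a bitmask of the CPUs it covers. */
struct domain {
	unsigned long span;
	int flags;
};

/* Index of the smallest affine-capable domain spanning both the
 * waking CPU and the wakee's previous CPU, or -1 if there is none. */
int affine_domain(const struct domain *sd, size_t nr,
		  int this_cpu, int prev_cpu)
{
	unsigned long both = (1UL << this_cpu) | (1UL << prev_cpu);

	for (size_t i = 0; i < nr; i++) {
		if (!(sd[i].flags & SD_WAKE_AFFINE))
			continue;
		if ((sd[i].span & both) == both)
			return (int)i;
	}
	return -1;
}
```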

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-15 22:01:05 +0800
e9c843118 sched: Add TASK_WAKING

We're going to want to drop rq->lock in try_to_wake_up() for a
longer period of time, however we also want to deal with concurrent
waking of the same task, which is currently handled by holding
rq->lock.

So introduce a new TASK state, namely TASK_WAKING, which indicates
someone is already waking the task (other wakers will fail p->state
& state).

We also keep preemption disabled over the whole ttwu().
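A user-space model of the claim step. The commit still serializes wakers with rq->lock; this sketch swaps in a C11 compare-and-swap so that the "first waker wins, other wakers fail p->state & state" rule can be shown on its own. The TASK_WAKING value, struct task and try_claim_wakeup() are assumptions of the model.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define TASK_RUNNING		0x000
#define TASK_INTERRUPTIBLE	0x001
#define TASK_UNINTERRUPTIBLE	0x002
#define TASK_WAKING		0x100	/* value hypothetical */

struct task {
	_Atomic unsigned int state;
};

/* Returns true only for the waker that moves @p from a sleeping state
 * in @mask to TASK_WAKING. Later wakers see TASK_WAKING, fail the
 * (state & mask) test and back off. */
bool try_claim_wakeup(struct task *p, unsigned int mask)
{
	unsigned int old = atomic_load(&p->state);

	do {
		if (!(old & mask))
			return false;
	} while (!atomic_compare_exchange_weak(&p->state, &old, TASK_WAKING));
	return true;
}
```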

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-15 22:01:05 +0800