15 Jan, 2012

1 commit

  • * 'for-linus' of git://selinuxproject.org/~jmorris/linux-security:
    capabilities: remove __cap_full_set definition
    security: remove the security_netlink_recv hook as it is equivalent to capable()
    ptrace: do not audit capability check when outputting /proc/pid/stat
    capabilities: remove task_ns_* functions
    capabilities: ns_capable can use the cap helpers rather than lsm call
    capabilities: style only - move capable below ns_capable
    capabilities: introduce new has_ns_capabilities_noaudit
    capabilities: call has_ns_capability from has_capability
    capabilities: remove all _real_ interfaces
    capabilities: introduce security_capable_noaudit
    capabilities: reverse arguments to security_capable
    capabilities: remove the task from capable LSM hook entirely
    selinux: sparse fix: fix several warnings in the security server code
    selinux: sparse fix: fix warnings in netlink code
    selinux: sparse fix: eliminate warnings for selinuxfs
    selinux: sparse fix: declare selinux_disable() in security.h
    selinux: sparse fix: move selinux_complete_init
    selinux: sparse fix: make selinux_secmark_refcount static
    SELinux: Fix RCU deref check warning in sel_netport_insert()

    Manually fix up a semantic mis-merge wrt security_netlink_recv():

    - the interface was removed in commit fd7784615248 ("security: remove
    the security_netlink_recv hook as it is equivalent to capable()")

    - a new user of it appeared in commit a38f7907b926 ("crypto: Add
    userspace configuration API")

    causing no automatic merge conflict, but Eric Paris pointed out the
    issue.

    Linus Torvalds
     

12 Jan, 2012

2 commits

  • * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched: Fix lockup by limiting load-balance retries on lock-break
    sched: Fix CONFIG_CGROUP_SCHED dependency
    sched: Remove empty #ifdefs

    Linus Torvalds
     
    Eric and David reported dead machines and traced it to commit
    a195f004 ("sched: Fix load-balance lock-breaking"); it turns out
    there's still a scenario where we can end up re-trying forever.

    Since there is no strict forward-progress guarantee in the
    load-balance iteration, we can get stuck retrying the same
    task-set over and over.

    Creating a forward-progress guarantee with the existing
    structure is somewhat non-trivial; for now, simply terminate the
    retry loop after a few tries.
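    A minimal sketch of the shape of such a bound (the helper and the
    retry limit are illustrative, not the actual kernel code):

        /* Give up after a fixed number of passes when the loop has no
         * forward-progress guarantee. */
        #define LB_MAX_RETRIES 3

        static int balance_pass_incomplete(void) { return 1; /* stub */ }

        static void load_balance_retry(void)
        {
                int tries = 0;

                while (balance_pass_incomplete() && ++tries < LB_MAX_RETRIES)
                        ;       /* retry the load-balance pass */
        }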

    Reported-by: Eric Dumazet
    Tested-by: Eric Dumazet
    Reported-by: David Ahern
    [ logic cleanup as suggested by Eric ]
    Signed-off-by: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Frederic Weisbecker
    Cc: Suresh Siddha
    Link: http://lkml.kernel.org/r/1326297936.2442.157.camel@twins
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

10 Jan, 2012

2 commits

  • Signed-off-by: Hiroshi Shimamoto
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/4F0B8525.8070901@ct.jp.nec.com
    Signed-off-by: Ingo Molnar

    Hiroshi Shimamoto
     
  • * 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
    cgroup: fix to allow mounting a hierarchy by name
    cgroup: move assignment out of condition in cgroup_attach_proc()
    cgroup: Remove task_lock() from cgroup_post_fork()
    cgroup: add sparse annotation to cgroup_iter_start() and cgroup_iter_end()
    cgroup: mark cgroup_rmdir_waitq and cgroup_attach_proc() as static
    cgroup: only need to check oldcgrp==newgrp once
    cgroup: remove redundant get/put of task struct
    cgroup: remove redundant get/put of old css_set from migrate
    cgroup: Remove unnecessary task_lock before fetching css_set on migration
    cgroup: Drop task_lock(parent) on cgroup_fork()
    cgroups: remove redundant get/put of css_set from css_set_check_fetched()
    resource cgroups: remove bogus cast
    cgroup: kill subsys->can_attach_task(), pre_attach() and attach_task()
    cgroup, cpuset: don't use ss->pre_attach()
    cgroup: don't use subsys->can_attach_task() or ->attach_task()
    cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
    cgroup: improve old cgroup handling in cgroup_attach_proc()
    cgroup: always lock threadgroup during migration
    threadgroup: extend threadgroup_lock() to cover exit and exec
    threadgroup: rename signal->threadgroup_fork_lock to ->group_rwsem
    ...

    Fix up conflict in kernel/cgroup.c due to commit e0197aae59e5: "cgroups:
    fix a css_set not found bug in cgroup_attach_proc" that already
    mentioned that the bug is fixed (differently) in Tejun's cgroup
    patchset. This one, in other words.

    Linus Torvalds
     

09 Jan, 2012

1 commit

  • * 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (165 commits)
    reiserfs: Properly display mount options in /proc/mounts
    vfs: prevent remount read-only if pending removes
    vfs: count unlinked inodes
    vfs: protect remounting superblock read-only
    vfs: keep list of mounts for each superblock
    vfs: switch ->show_options() to struct dentry *
    vfs: switch ->show_path() to struct dentry *
    vfs: switch ->show_devname() to struct dentry *
    vfs: switch ->show_stats to struct dentry *
    switch security_path_chmod() to struct path *
    vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb
    vfs: trim includes a bit
    switch mnt_namespace ->root to struct mount
    vfs: take /proc/*/mounts and friends to fs/proc_namespace.c
    vfs: opencode mntget() mnt_set_mountpoint()
    vfs: spread struct mount - remaining argument of next_mnt()
    vfs: move fsnotify junk to struct mount
    vfs: move mnt_devname
    vfs: move mnt_list to struct mount
    vfs: switch pnode.h macros to struct mount *
    ...

    Linus Torvalds
     

08 Jan, 2012

1 commit

  • * 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (73 commits)
    arm: fix up some samsung merge sysdev conversion problems
    firmware: Fix an oops on reading fw_priv->fw in sysfs loading file
    Drivers:hv: Fix a bug in vmbus_driver_unregister()
    driver core: remove __must_check from device_create_file
    debugfs: add missing #ifdef HAS_IOMEM
    arm: time.h: remove device.h #include
    driver-core: remove sysdev.h usage.
    clockevents: remove sysdev.h
    arm: convert sysdev_class to a regular subsystem
    arm: leds: convert sysdev_class to a regular subsystem
    kobject: remove kset_find_obj_hinted()
    m68k: gpio - convert sysdev_class to a regular subsystem
    mips: txx9_sram - convert sysdev_class to a regular subsystem
    mips: 7segled - convert sysdev_class to a regular subsystem
    sh: dma - convert sysdev_class to a regular subsystem
    sh: intc - convert sysdev_class to a regular subsystem
    power: suspend - convert sysdev_class to a regular subsystem
    power: qe_ic - convert sysdev_class to a regular subsystem
    power: cmm - convert sysdev_class to a regular subsystem
    s390: time - convert sysdev_class to a regular subsystem
    ...

    Fix up conflicts with 'struct sysdev' removal from various platform
    drivers that got changed:
    - arch/arm/mach-exynos/cpu.c
    - arch/arm/mach-exynos/irq-eint.c
    - arch/arm/mach-s3c64xx/common.c
    - arch/arm/mach-s3c64xx/cpu.c
    - arch/arm/mach-s5p64x0/cpu.c
    - arch/arm/mach-s5pv210/common.c
    - arch/arm/plat-samsung/include/plat/cpu.h
    - arch/powerpc/kernel/sysfs.c
    and fix up cpu_is_hotpluggable() as per Greg in include/linux/cpu.h

    Linus Torvalds
     

24 Dec, 2011

1 commit

    If CONFIG_SCHEDSTATS is defined, the kernel maintains
    information about how long the task was sleeping or, in the
    case of iowait, blocking in the kernel before getting woken
    up.

    This will be useful for sleep time profiling.

    Note: this information is only provided for sched_fair.
    Other scheduling classes may choose to provide this in
    the future.

    Note: the delay includes the time spent on the runqueue
    as well.
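    Conceptually the bookkeeping looks like the sketch below (field and
    helper names are illustrative, not the kernel's):

        /* Record a timestamp when the task blocks; fold the delta into
         * a running total at wakeup. */
        struct sleep_stats {
                unsigned long long sleep_start; /* 0 when not sleeping */
                unsigned long long sum_sleep;   /* accumulated sleep time */
        };

        static unsigned long long clock_ns(void) { return 0; /* stub clock */ }

        static void account_wakeup(struct sleep_stats *st)
        {
                if (st->sleep_start) {
                        st->sum_sleep += clock_ns() - st->sleep_start;
                        st->sleep_start = 0;
                }
        }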

    Signed-off-by: Arun Sharma
    Acked-by: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Mathieu Desnoyers
    Cc: Arnaldo Carvalho de Melo
    Cc: Andrew Vagin
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/1324512940-32060-2-git-send-email-asharma@fb.com
    Signed-off-by: Ingo Molnar

    Arun Sharma
     

23 Dec, 2011

1 commit

  • The panic-on-framebuffer code seems to cause a schedule
    to occur during an oops. This causes a bunch of extra
    spew as can be seen in:

    https://bugzilla.redhat.com/attachment.cgi?id=549230

    Don't do scheduler debug checks when we are oopsing already.
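    The shape of the fix, roughly (oops_in_progress is a real kernel
    global; the surrounding function here is illustrative):

        extern int oops_in_progress;

        static void sched_debug_checks(void)
        {
                if (oops_in_progress)
                        return; /* don't pile debug warnings onto an oops */

                /* ...the usual scheduler debug checks run here... */
        }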

    Signed-off-by: Dave Jones
    Link: http://lkml.kernel.org/r/20111222213929.GA4722@redhat.com
    Signed-off-by: Ingo Molnar

    Dave Jones
     

21 Dec, 2011

7 commits

  • There is a small race between try_to_wake_up() and sched_move_task(),
    which is trying to move the process being woken up.

    try_to_wake_up() on CPU0         sched_move_task() on CPU1
    --------------------------------+---------------------------------
    raw_spin_lock_irqsave(p->pi_lock)
    task_waking_fair()
      -> p.se.vruntime -= cfs_rq->min_vruntime
    ttwu_queue()
      -> send reschedule IPI to CPU1
    raw_spin_unlock_irqsave(p->pi_lock)
                                     task_rq_lock()
                                       -> trying to acquire both p->pi_lock
                                          and rq->lock with IRQs disabled
                                     task_move_group_fair()
                                       -> p.se.vruntime
                                            -= (old)cfs_rq->min_vruntime
                                            += (new)cfs_rq->min_vruntime
                                     task_rq_unlock()
    (via IPI)
    sched_ttwu_pending()
      raw_spin_lock(rq->lock)
      ttwu_do_activate()
        ...
        enqueue_entity()
          child.se->vruntime += cfs_rq->min_vruntime
      raw_spin_unlock(rq->lock)

    As a result, vruntime of the process becomes far bigger than min_vruntime,
    if (new)cfs_rq->min_vruntime >> (old)cfs_rq->min_vruntime.

    This patch fixes the problem by simply ignoring such a process in
    task_move_group_fair(), because its vruntime has already been normalized in
    task_waking_fair().
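    A minimal sketch of the idea (types, names and the state value are
    illustrative):

        #define TASK_WAKING 0x100       /* illustrative state value */

        struct task_sketch {
                int state;
                unsigned long long vruntime;
        };

        static void move_group_vruntime(struct task_sketch *p, int queued,
                                        unsigned long long old_min,
                                        unsigned long long new_min)
        {
                /* A waking task was already normalized in task_waking_fair();
                 * adjusting it again would apply the delta twice. */
                if (!queued && p->state == TASK_WAKING)
                        return;

                if (!queued) {
                        p->vruntime -= old_min;
                        p->vruntime += new_min;
                }
        }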

    Signed-off-by: Daisuke Nishimura
    Signed-off-by: Peter Zijlstra
    Cc: Tejun Heo
    Link: http://lkml.kernel.org/r/20111215143741.df82dd50.nishimura@mxp.nes.nec.co.jp
    Signed-off-by: Ingo Molnar

    Daisuke Nishimura
     
  • There is a small race between do_fork() and sched_move_task(), which is
    trying to move the child.

    do_fork()                        sched_move_task()
    --------------------------------+---------------------------------
    copy_process()
      sched_fork()
        task_fork_fair()
          -> vruntime of the child is initialized
             based on that of the parent.
    -> we can see the child in "tasks" file now.
                                     task_rq_lock()
                                     task_move_group_fair()
                                       -> child.se.vruntime
                                            -= (old)cfs_rq->min_vruntime
                                            += (new)cfs_rq->min_vruntime
                                     task_rq_unlock()
    wake_up_new_task()
      ...
      enqueue_entity()
        child.se.vruntime += cfs_rq->min_vruntime

    As a result, vruntime of the child becomes far bigger than min_vruntime,
    if (new)cfs_rq->min_vruntime >> (old)cfs_rq->min_vruntime.

    This patch fixes the problem by simply ignoring such a process in
    task_move_group_fair(), because its vruntime has already been normalized in
    task_fork_fair().

    Signed-off-by: Daisuke Nishimura
    Signed-off-by: Peter Zijlstra
    Cc: Tejun Heo
    Link: http://lkml.kernel.org/r/20111215143607.2ee12c5d.nishimura@mxp.nes.nec.co.jp
    Signed-off-by: Ingo Molnar

    Daisuke Nishimura
     
  • There is a small race between task_fork_fair() and sched_move_task(),
    which is trying to move the parent.

    task_fork_fair()                 sched_move_task()
    --------------------------------+---------------------------------
    cfs_rq = task_cfs_rq(current)
      -> cfs_rq is the "old" one.
    curr = cfs_rq->curr
      -> curr is set to the parent.
                                     task_rq_lock()
                                     dequeue_task()
                                       -> parent.se.vruntime -=
                                            (old)cfs_rq->min_vruntime
                                     enqueue_task()
                                       -> parent.se.vruntime +=
                                            (new)cfs_rq->min_vruntime
                                     task_rq_unlock()
    raw_spin_lock_irqsave(rq->lock)
    se->vruntime = curr->vruntime
      -> vruntime of the child is set to that of the parent,
         which has already been updated by sched_move_task().
    se->vruntime -= (old)cfs_rq->min_vruntime
    raw_spin_unlock_irqrestore(rq->lock)

    As a result, vruntime of the child becomes far bigger than expected,
    if (new)cfs_rq->min_vruntime >> (old)cfs_rq->min_vruntime.

    This patch fixes this problem by setting "cfs_rq" and "curr" after
    holding the rq->lock.
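    The ordering fix, sketched (lock and field names are illustrative):

        struct rq_sketch {
                int lock;               /* stand-in for rq->lock */
                void *curr;             /* currently running entity */
        };

        static void lock_rq(struct rq_sketch *rq)   { rq->lock = 1; }
        static void unlock_rq(struct rq_sketch *rq) { rq->lock = 0; }

        static void task_fork_sketch(struct rq_sketch *rq)
        {
                void *curr;

                lock_rq(rq);
                curr = rq->curr;        /* sample only while the lock is
                                         * held, so a concurrent cgroup
                                         * move cannot change it under us */
                (void)curr;
                unlock_rq(rq);
        }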

    Signed-off-by: Daisuke Nishimura
    Acked-by: Paul Turner
    Signed-off-by: Peter Zijlstra
    Cc: Tejun Heo
    Link: http://lkml.kernel.org/r/20111215143655.662676b0.nishimura@mxp.nes.nec.co.jp
    Signed-off-by: Ingo Molnar

    Daisuke Nishimura
     
    Remove the cfs bandwidth period check from tg_set_cfs_period().
    The valid bandwidth period's lower/upper limits are denoted by
    min_cfs_quota_period/max_cfs_quota_period respectively, and the period
    is already checked against them in tg_set_cfs_bandwidth().

    As pjt pointed out, negative input will result in very large unsigned
    numbers and will be caught by the max allowed period test.
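    The negative-input point in miniature (the bound here is illustrative,
    not the kernel's actual limit):

        #include <stdio.h>

        int main(void)
        {
                unsigned long long max_period = 1000000ULL;
                long long user_input = -1;      /* negative input... */
                unsigned long long period = (unsigned long long)user_input;

                if (period > max_period)        /* ...becomes huge and is caught */
                        printf("rejected: %llu > max\n", period);
                return 0;
        }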

    Signed-off-by: Kamalesh Babulal
    Acked-by: Paul Turner
    [ammended changelog to mention negative values]
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20111210135925.GA14593@linux.vnet.ibm.com
    --
    kernel/sched/core.c | 3 ---
    1 file changed, 3 deletions(-)

    Signed-off-by: Ingo Molnar

    Kamalesh Babulal
     
    The current lock break relies on contention on the rq locks, contention
    which might never come because we've got IRQs disabled. Or, conversely,
    which will very likely come, because on anything with more than 2 cpus
    a synchronized load-balance pass will very likely cause contention on
    the rq locks.

    Also, the sched_nr_migrate thing fails when it gets trapped in the loops
    of either the cgroup muck in load_balance_fair() or the move_tasks()
    load condition.

    Instead, use the new lb_flags field to propagate break/abort
    conditions through all these loops and create a new loop outside the
    IRQ-disabled region for when a break is required.
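    A minimal sketch of the flag-propagation shape (flag values and names
    are illustrative):

        #define LBF_NEED_BREAK  0x01
        #define LBF_ABORT       0x02

        struct lb_env_sketch { unsigned int lb_flags; };

        static void move_tasks_sketch(struct lb_env_sketch *env)
        {
                /* inner loops set a flag instead of spinning forever: */
                env->lb_flags |= LBF_NEED_BREAK;
        }

        static void load_balance_sketch(struct lb_env_sketch *env)
        {
                do {
                        env->lb_flags &= ~LBF_NEED_BREAK;
                        /* IRQs disabled for the pass */
                        move_tasks_sketch(env);
                        /* IRQs re-enabled here, between passes */
                } while ((env->lb_flags & LBF_NEED_BREAK) &&
                         !(env->lb_flags & LBF_ABORT));
        }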

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-tsceb6w61q0gakmsccix6xxi@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Replace the all_pinned argument with a flags field so that we can add
    some extra controls throughout that entire call chain.

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-33kevm71m924ok1gpxd720v3@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Mike reported a 13% drop in netperf TCP_RR performance due to the
    new remote wakeup code. Suresh too noticed some performance issues
    with it.

    Reducing the IPIs to only cross cache domains solves the observed
    performance issues.
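    The check, roughly, as a kernel-style fragment (not standalone; the
    cache-sharing helper's spelling varies across kernel versions):

        if (sched_feat(TTWU_QUEUE) &&
            !ttwu_share_cache(smp_processor_id(), cpu)) {
                ttwu_queue_remote(p, cpu);  /* IPI only across cache domains */
                return;
        }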

    Reported-by: Suresh Siddha
    Reported-by: Mike Galbraith
    Acked-by: Suresh Siddha
    Acked-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    Cc: Chris Mason
    Cc: Dave Kleikamp
    Link: http://lkml.kernel.org/r/1323338531.17673.7.camel@twins
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

08 Dec, 2011

1 commit

  • Yong Zhang reported:

    > [ INFO: suspicious RCU usage. ]
    > kernel/sched/fair.c:5091 suspicious rcu_dereference_check() usage!

    This is due to the sched_domain stuff being RCU protected and
    commit 0b005cf5 ("sched, nohz: Implement sched group, domain
    aware nohz idle load balancing") overlooking this fact.

    The sd variable only lives inside the for_each_domain() block,
    so we only need to wrap that.
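    The fix, roughly (kernel-style fragment, not standalone):

        rcu_read_lock();
        for_each_domain(cpu, sd) {
                /* 'sd' is only valid inside this RCU read-side section */
        }
        rcu_read_unlock();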

    Reported-by: Yong Zhang
    Tested-by: Yong Zhang
    Signed-off-by: Peter Zijlstra
    Cc: Suresh Siddha
    Link: http://lkml.kernel.org/r/1323264728.32012.107.camel@twins
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

07 Dec, 2011

7 commits

    The intention is to set the NOHZ_BALANCE_KICK flag for the 'ilb_cpu',
    not for 'cpu', which is the local cpu. Fix the typo.

    Reported-by: Srivatsa Vaddagiri
    Signed-off-by: Suresh Siddha
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1323199594.1984.18.camel@sbsiddha-desk.sc.intel.com
    Signed-off-by: Ingo Molnar

    Suresh Siddha
     
    The cpu bit in nohz.idle_cpu_mask is reset in the first busy tick after
    exiting idle. So during nohz_idle_balance(), the intention is to double
    check whether a cpu that is part of idle_cpu_mask is indeed idle before
    going ahead and performing idle balance for that cpu.

    Fix the cpu typo in the idle_cpu() check during nohz_idle_balance().

    Reported-by: Srivatsa Vaddagiri
    Signed-off-by: Suresh Siddha
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1323199177.1984.12.camel@sbsiddha-desk.sc.intel.com
    Signed-off-by: Ingo Molnar

    Suresh Siddha
     
  • Now that we initialize jump_labels before sched_init() we can use them
    for the debug features without having to worry about a window where
    they have the wrong setting.
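    Roughly, as a kernel-style fragment (this is the 3.3-era spelling;
    the jump-label API has been renamed since, so treat the names as
    illustrative):

        /* A disabled debug feature compiles down to a patched no-op
         * branch instead of a load-and-compare on every check. */
        #define sched_feat(x) \
                (static_branch(&sched_feat_keys[__SCHED_FEAT_##x]))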

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-vpreo4hal9e0kzqmg5y0io2k@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • The order of parameters is inverted. The index parameter
    should come first.

    Signed-off-by: Glauber Costa
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1322863119-14225-3-git-send-email-glommer@parallels.com
    Signed-off-by: Ingo Molnar

    Glauber Costa
     
    Now that we're pointing cpuacct's root cgroup to cpustat and accounting
    through task_group_account_field(), we should not access cpustat
    directly. Since the accessor function already does so, we would end up
    accounting it twice, which is wrong.

    Signed-off-by: Glauber Costa
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1322863119-14225-2-git-send-email-glommer@parallels.com
    Signed-off-by: Ingo Molnar

    Glauber Costa
     
  • Right now, after we collect tick statistics for user and system and store them
    in a well known location, we keep the same statistics again for cpuacct.
    Since cpuacct is hierarchical, the numbers for the root cgroup should be
    absolutely equal to the system-wide numbers.

    So it would be better to just use that: this patch changes cpuacct
    accounting so that the cpustat statistics are kept in a percpu struct
    kernel_cpustat array. In the root cgroup case, we just point it to the
    main array. The rest of the hierarchy walk can be disabled entirely
    later with a static branch - but I am not doing it here.
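    A sketch of the arrangement (types and sizes are illustrative; in the
    kernel the array is percpu):

        #define NR_SKETCH_CPUS 8

        struct kernel_cpustat_sketch { unsigned long long stat[10]; };

        static struct kernel_cpustat_sketch global_stat[NR_SKETCH_CPUS];

        struct cpuacct_sketch {
                struct kernel_cpustat_sketch *cpustat;
        };

        static void init_root_cgroup(struct cpuacct_sketch *root)
        {
                /* the root cgroup aliases the system-wide array, so its
                 * numbers are the system-wide numbers by construction */
                root->cpustat = global_stat;
        }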

    Signed-off-by: Glauber Costa
    Signed-off-by: Peter Zijlstra
    Cc: Paul Turner
    Link: http://lkml.kernel.org/r/1322498719-2255-4-git-send-email-glommer@parallels.com
    Signed-off-by: Ingo Molnar

    Glauber Costa
     
  • hrtick_start_fair() shows up in profiles even when disabled.

    v3.0.6

    taskset -c 3 pipe-test

    PerfTop: 997 irqs/sec kernel:89.5% exact: 0.0% [1000Hz cycles], (all, CPU: 3)
    ------------------------------------------------------------------------------------------------

                     Virgin                                       Patched
      samples  pcnt function                     samples  pcnt function
      _______ _____ ___________________________  _______ _____ ___________________________

      2880.00 10.2% __schedule                   3136.00 11.3% __schedule
      1634.00  5.8% pipe_read                    1615.00  5.8% pipe_read
      1458.00  5.2% system_call                  1534.00  5.5% system_call
      1382.00  4.9% _raw_spin_lock_irqsave       1412.00  5.1% _raw_spin_lock_irqsave
      1202.00  4.3% pipe_write                   1255.00  4.5% copy_user_generic_string
      1164.00  4.1% copy_user_generic_string     1241.00  4.5% __switch_to
      1097.00  3.9% __switch_to                   929.00  3.3% mutex_lock
       872.00  3.1% mutex_lock                    846.00  3.0% mutex_unlock
       687.00  2.4% mutex_unlock                  804.00  2.9% pipe_write
       682.00  2.4% native_sched_clock            713.00  2.6% native_sched_clock
       643.00  2.3% system_call_after_swapgs      653.00  2.3% _raw_spin_unlock_irqrestore
       617.00  2.2% sched_clock_local             633.00  2.3% fsnotify
       612.00  2.2% fsnotify                      605.00  2.2% sched_clock_local
       596.00  2.1% _raw_spin_unlock_irqrestore   593.00  2.1% system_call_after_swapgs
       542.00  1.9% sysret_check                  559.00  2.0% sysret_check
       467.00  1.7% fget_light                    472.00  1.7% fget_light
       462.00  1.6% finish_task_switch            461.00  1.7% finish_task_switch
       437.00  1.5% vfs_write                     442.00  1.6% vfs_write
       431.00  1.5% do_sync_write                 428.00  1.5% do_sync_write
       413.00  1.5% select_task_rq_fair           404.00  1.5% _raw_spin_lock_irq
       386.00  1.4% update_curr                   402.00  1.4% update_curr
       385.00  1.4% rw_verify_area                389.00  1.4% do_sync_read
       377.00  1.3% _raw_spin_lock_irq            378.00  1.4% vfs_read
       369.00  1.3% do_sync_read                  340.00  1.2% pipe_iov_copy_from_user
       360.00  1.3% vfs_read                      316.00  1.1% __wake_up_sync_key
    *  342.00  1.2% hrtick_start_fair             313.00  1.1% __wake_up_common

    Signed-off-by: Mike Galbraith
    [ fixed !CONFIG_SCHED_HRTICK borkage ]
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1321971607.6855.17.camel@marge.simson.net
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     

06 Dec, 2011

12 commits

  • We already have a pointer to the cgroup parent (whose data is more likely
    to be in the cache than this, anyway), so there is no need to have this one
    in cpuacct.

    This patch makes the underlying cgroup be used instead.

    Signed-off-by: Glauber Costa
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Paul Turner
    Cc: Li Zefan
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1322498719-2255-3-git-send-email-glommer@parallels.com
    Signed-off-by: Ingo Molnar

    Glauber Costa
     
    This patch changes the fields in cpustat from a structure to a u64
    array. Math gets easier, and the code is more flexible.
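    Illustrative before/after of the change (the enum names mirror the
    real ones, but the sketch is not the kernel's definition):

        enum cpu_usage_stat_sketch {
                CPUTIME_USER,
                CPUTIME_SYSTEM,
                CPUTIME_IDLE,
                NR_STATS,
        };

        struct kernel_cpustat_sketch {
                unsigned long long cpustat[NR_STATS];   /* was named fields */
        };

        static unsigned long long total_time(const struct kernel_cpustat_sketch *k)
        {
                unsigned long long sum = 0;
                int i;

                for (i = 0; i < NR_STATS; i++)
                        sum += k->cpustat[i];   /* uniform math over an index */
                return sum;
        }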

    Signed-off-by: Glauber Costa
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Paul Turner
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1322498719-2255-2-git-send-email-glommer@parallels.com
    Signed-off-by: Ingo Molnar

    Glauber Costa
     
    nr_busy_cpus in sched_group_power indicates whether the group is
    semi-idle or not. This helps remove is_semi_idle_group() and simplify
    find_new_ilb() in the context of finding an optimal cpu that can do
    idle load balancing.

    Signed-off-by: Suresh Siddha
    Signed-off-by: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20111202010832.656983582@sbsiddha-desk.sc.intel.com
    Signed-off-by: Ingo Molnar

    Suresh Siddha
     
    When there are many logical cpus that enter and exit idle often,
    members of the global nohz data structure are modified very
    frequently, causing a lot of cache-line contention.

    Make nohz idle load balancing more scalable by using the sched domain
    topology and the 'nr_busy_cpus' count in struct sched_group_power.

    Idle load balance is kicked on one of the idle cpus when there is at
    least one idle cpu and:

    - a busy rq has more than one task, or

    - a busy rq's scheduler group shares package resources (like HT/MC
    siblings) and more than one member of that group is busy, or

    - for the SD_ASYM_PACKING domain, the lower-numbered cpus in that
    domain are idle compared to the busy ones.

    This helps kick the idle load balancing request only when there is a
    potential imbalance. And once things are mostly balanced, these kicks
    will be minimized.

    These changes helped improve a context-switch-intensive workload
    running between a number of task pairs by 2x on an 8-socket NHM-EX
    based system.

    Reported-by: Tim Chen
    Signed-off-by: Suresh Siddha
    Signed-off-by: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20111202010832.602203411@sbsiddha-desk.sc.intel.com
    Signed-off-by: Ingo Molnar

    Suresh Siddha
     
    Introduce nr_busy_cpus in struct sched_group_power [not in sched_group,
    because sched groups are duplicated for the SD_OVERLAP scheduler
    domain]. For each cpu that enters or exits idle, this counter is
    updated in each scheduler group of the scheduler domain that the cpu
    belongs to.

    To avoid the frequent update of this state as the cpu enters
    and exits idle, the update of the stat during idle exit is
    delayed to the first timer tick that happens after the cpu becomes busy.
    This is done using NOHZ_IDLE flag in the struct rq's nohz_flags.
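    A sketch of the update discipline (C11 atomics stand in for the
    kernel's atomic_t; names are illustrative):

        #include <stdatomic.h>

        struct sched_group_power_sketch { atomic_int nr_busy_cpus; };

        static void on_idle_entry(struct sched_group_power_sketch *sgp)
        {
                atomic_fetch_sub(&sgp->nr_busy_cpus, 1);
        }

        static void on_first_busy_tick(struct sched_group_power_sketch *sgp)
        {
                /* deferred from idle exit to the first busy tick */
                atomic_fetch_add(&sgp->nr_busy_cpus, 1);
        }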

    Signed-off-by: Suresh Siddha
    Signed-off-by: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20111202010832.555984323@sbsiddha-desk.sc.intel.com
    Signed-off-by: Ingo Molnar

    Suresh Siddha
     
  • Introduce nohz_flags in the struct rq, which will track these two flags
    for now.

    NOHZ_TICK_STOPPED records that the tick has been stopped. It is used
    to update the nohz idle load balancer data structures during the first
    busy tick after the tick is restarted; at that first busy tick after
    tickless idle, the NOHZ_TICK_STOPPED flag is reset. This minimizes the
    nohz idle load balancer status updates that currently happen on every
    tickless exit, making it more scalable when many logical cpus enter
    and exit idle often.

    NOHZ_BALANCE_KICK will track the need for nohz idle load balance
    on this rq. This will replace the nohz_balance_kick in the rq, which was
    not being updated atomically.
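    Sketched with C11 atomics (the kernel uses atomic bit operations on
    the rq's nohz_flags word; names and bit values are illustrative):

        #include <stdatomic.h>

        enum { NOHZ_TICK_STOPPED = 1u << 0, NOHZ_BALANCE_KICK = 1u << 1 };

        static _Atomic unsigned int nohz_flags;

        /* returns the previous state, so concurrent kickers race safely */
        static int test_and_set_balance_kick(void)
        {
                unsigned int old = atomic_fetch_or(&nohz_flags, NOHZ_BALANCE_KICK);
                return !!(old & NOHZ_BALANCE_KICK);
        }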

    Signed-off-by: Suresh Siddha
    Signed-off-by: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20111202010832.499438999@sbsiddha-desk.sc.intel.com
    Signed-off-by: Ingo Molnar

    Suresh Siddha
     
  • The second call to sched_rt_period() is redundant, because the value of the
    rt_runtime was already read and it was protected by the ->rt_runtime_lock.

    Signed-off-by: Shan Hai
    Reviewed-by: Kamalesh Babulal
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1322535836-13590-2-git-send-email-haishan.bai@gmail.com
    Signed-off-by: Ingo Molnar

    Shan Hai
     
    For the SD_OVERLAP domain, the sched_groups for each CPU's sched_domain
    are privately allocated and not shared with any other cpu. So the sched
    group allocation should come from the node of the cpu for which the
    SD_OVERLAP sched domain is being set up.
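    The allocation change, roughly (kernel-style fragment, not standalone;
    kzalloc_node() and cpu_to_node() are the relevant primitives):

        sg = kzalloc_node(sizeof(struct sched_group) + cpumask_size(),
                          GFP_KERNEL, cpu_to_node(cpu));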

    Signed-off-by: Suresh Siddha
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20111118230554.164910950@sbsiddha-desk.sc.intel.com
    Signed-off-by: Ingo Molnar

    Suresh Siddha
     
    This is another case where we are on our way to schedule(), so we can
    save a useless clock update and the resulting microscopic vruntime
    update.
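    Roughly, as a kernel-style fragment (rq->skip_clock_update is the
    existing mechanism being reused here):

        rq->skip_clock_update = 1;      /* schedule() will update the clock */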

    Signed-off-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1321971686.6855.18.camel@marge.simson.net
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     
    rt.nr_cpus_allowed is always available; use it to bail out of
    select_task_rq() when only one cpu can be used, saving some cycles
    for pinned tasks (a sketch of the bail-out follows the profile).

    See the line marked with '*' below:

    # taskset -c 3 pipe-test

    PerfTop: 997 irqs/sec kernel:89.5% exact: 0.0% [1000Hz cycles], (all, CPU: 3)
    ------------------------------------------------------------------------------------------------

                     Virgin                                       Patched
      samples  pcnt function                     samples  pcnt function
      _______ _____ ___________________________  _______ _____ ___________________________

      2880.00 10.2% __schedule                   3136.00 11.3% __schedule
      1634.00  5.8% pipe_read                    1615.00  5.8% pipe_read
      1458.00  5.2% system_call                  1534.00  5.5% system_call
      1382.00  4.9% _raw_spin_lock_irqsave       1412.00  5.1% _raw_spin_lock_irqsave
      1202.00  4.3% pipe_write                   1255.00  4.5% copy_user_generic_string
      1164.00  4.1% copy_user_generic_string     1241.00  4.5% __switch_to
      1097.00  3.9% __switch_to                   929.00  3.3% mutex_lock
       872.00  3.1% mutex_lock                    846.00  3.0% mutex_unlock
       687.00  2.4% mutex_unlock                  804.00  2.9% pipe_write
       682.00  2.4% native_sched_clock            713.00  2.6% native_sched_clock
       643.00  2.3% system_call_after_swapgs      653.00  2.3% _raw_spin_unlock_irqrestore
       617.00  2.2% sched_clock_local             633.00  2.3% fsnotify
       612.00  2.2% fsnotify                      605.00  2.2% sched_clock_local
       596.00  2.1% _raw_spin_unlock_irqrestore   593.00  2.1% system_call_after_swapgs
       542.00  1.9% sysret_check                  559.00  2.0% sysret_check
       467.00  1.7% fget_light                    472.00  1.7% fget_light
       462.00  1.6% finish_task_switch            461.00  1.7% finish_task_switch
       437.00  1.5% vfs_write                     442.00  1.6% vfs_write
       431.00  1.5% do_sync_write                 428.00  1.5% do_sync_write
    *  413.00  1.5% select_task_rq_fair           404.00  1.5% _raw_spin_lock_irq
       386.00  1.4% update_curr                   402.00  1.4% update_curr
       385.00  1.4% rw_verify_area                389.00  1.4% do_sync_read
       377.00  1.3% _raw_spin_lock_irq            378.00  1.4% vfs_read
       369.00  1.3% do_sync_read                  340.00  1.2% pipe_iov_copy_from_user
       360.00  1.3% vfs_read                      316.00  1.1% __wake_up_sync_key
       342.00  1.2% hrtick_start_fair             313.00  1.1% __wake_up_common
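    The bail-out, roughly, as a kernel-style fragment (not standalone):

        if (p->rt.nr_cpus_allowed == 1)
                return task_cpu(p);     /* pinned: skip the domain walk */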

    Signed-off-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1321971504.6855.15.camel@marge.simson.net
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     
    Instead of going through the scheduler domain hierarchy multiple times
    (to give priority to an idle core over an idle SMT sibling in a busy
    core), start with the highest scheduler domain that has the
    SD_SHARE_PKG_RESOURCES flag and traverse the domain hierarchy down
    until we find an idle group.

    This cleanup also addresses an issue reported by Mike where the recent
    changes returned the busy thread even in the presence of an idle SMT
    sibling in single socket platforms.
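    The traversal shape, roughly (kernel-style fragment; treat the helper
    name as an assumption about this cleanup's internals):

        for (sd = highest_flag_domain(cpu, SD_SHARE_PKG_RESOURCES);
             sd; sd = sd->child) {
                /* scan sd's groups; stop at the first fully idle group */
        }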

    Signed-off-by: Suresh Siddha
    Tested-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1321556904.15339.25.camel@sbsiddha-desk.sc.intel.com
    Signed-off-by: Ingo Molnar

    Suresh Siddha
     
    This tracepoint shows how long a task is sleeping in uninterruptible
    state.

    For example, it may show how long, and where, a mutex was waited for.
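    Usage, roughly, as a kernel-style fragment (the tracepoint fires at
    wakeup with the accumulated uninterruptible-sleep delay):

        trace_sched_stat_blocked(tsk, delta);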

    Signed-off-by: Andrew Vagin
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1322471015-107825-8-git-send-email-avagin@openvz.org
    Signed-off-by: Ingo Molnar

    Andrew Vagin
     

17 Nov, 2011

1 commit