27 Feb, 2013

1 commit

  • Pull scheduler fixes from Ingo Molnar.

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    cputime: Use local_clock() for full dynticks cputime accounting
    cputime: Constify timeval_to_cputime(timeval) argument
    sched: Move RR_TIMESLICE from sysctl.h to rt.h
    sched: Fix /proc/sched_debug failure on very very large systems
    sched: Fix /proc/sched_stat failure on very very large systems
    sched/core: Remove the obsolete and unused nr_uninterruptible() function

    Linus Torvalds
     

22 Feb, 2013

1 commit

  • On systems with 4096 cores, attempting to read /proc/sched_debug
    fails because we are trying to push all the data into a single
    kmalloc buffer.

    The issue is that on these very large machines all the data will not
    fit in 4 MB.

    A better solution is to not use the single_open mechanism but to
    provide our own seq_operations and treat each cpu as an
    individual record.

    The output should be identical to the previous version.
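
    A minimal sketch of that seq_operations pattern (illustrative only, not
    the exact patch; the function names and the per-cpu printer below are
    placeholders):

        #include <linux/seq_file.h>
        #include <linux/cpumask.h>

        /* Each record handed to ->show() is one possible CPU; the cookie is
         * cpu + 1 so CPU 0 is not confused with the NULL end-of-sequence. */
        static void *sched_debug_start(struct seq_file *m, loff_t *pos)
        {
                return *pos < nr_cpu_ids ?
                        (void *)(unsigned long)(*pos + 1) : NULL;
        }

        static void *sched_debug_next(struct seq_file *m, void *v, loff_t *pos)
        {
                (*pos)++;
                return sched_debug_start(m, pos);
        }

        static void sched_debug_stop(struct seq_file *m, void *v)
        {
        }

        static int sched_debug_show(struct seq_file *m, void *v)
        {
                int cpu = (unsigned long)v - 1;

                print_cpu_debug(m, cpu);    /* placeholder per-cpu printer */
                return 0;
        }

        static const struct seq_operations sched_debug_sops = {
                .start = sched_debug_start,
                .next  = sched_debug_next,
                .stop  = sched_debug_stop,
                .show  = sched_debug_show,
        };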

    Reported-by: Dave Jones
    Signed-off-by: Nathan Zimmer
    Cc: Peter Zijlstra
    [ Whitespace fixlet]
    [ Fix spello in comment]
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar

    Nathan Zimmer
     

21 Feb, 2013

1 commit

  • Pull cgroup changes from Tejun Heo:
    "Nothing too drastic.

    - Removal of synchronize_rcu() from userland visible paths.

    - Various fixes and cleanups from Li.

    - cgroup_rightmost_descendant() added which will be used by cpuset
    changes (it will be a separate pull request)."

    * 'for-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup: fail if monitored file and event_control are in different cgroup
    cgroup: fix cgroup_rmdir() vs close(eventfd) race
    cpuset: fix cpuset_print_task_mems_allowed() vs rename() race
    cgroup: fix exit() vs rmdir() race
    cgroup: remove bogus comments in cgroup_diput()
    cgroup: remove synchronize_rcu() from cgroup_diput()
    cgroup: remove duplicate RCU free on struct cgroup
    sched: remove redundant NULL cgroup check in task_group_path()
    sched: split out css_online/css_offline from tg creation/destruction
    cgroup: initialize cgrp->dentry before css_alloc()
    cgroup: remove a NULL check in cgroup_exit()
    cgroup: fix bogus kernel warnings when cgroup_create() failed
    cgroup: remove synchronize_rcu() from rebind_subsystems()
    cgroup: remove synchronize_rcu() from cgroup_attach_{task|proc}()
    cgroup: use new hashtable implementation
    cgroups: fix cgroup_event_listener error handling
    cgroups: move cgroup_event_listener.c to tools/cgroup
    cgroup: implement cgroup_rightmost_descendant()
    cgroup: remove unused dummy cgroup_fork_callbacks()

    Linus Torvalds
     

25 Jan, 2013

2 commits

  • The type returned by the atomic64_t accessors can be either unsigned
    long or unsigned long long, depending on the architecture.
    Casting to unsigned long long lets us use the same
    format string for all architectures.

    Without this patch, building with scheduler debugging
    enabled results in:

    kernel/sched/debug.c: In function 'print_cfs_rq':
    kernel/sched/debug.c:225:2: warning: format '%ld' expects argument of type 'long int', but argument 4 has type 'long long int' [-Wformat]
    kernel/sched/debug.c:225:2: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'long long int' [-Wformat]
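
    A small userspace illustration of the idea (the counter_t typedef is a
    stand-in for the architecture-dependent accessor return type, not a
    real kernel type):

        #include <stdio.h>

        /* Stand-in: long on some configurations, long long on others. */
        #ifdef __LP64__
        typedef long counter_t;
        #else
        typedef long long counter_t;
        #endif

        int main(void)
        {
                counter_t load_avg = 123456;

                /* One cast to unsigned long long means a single format
                 * string works everywhere, avoiding the -Wformat warning. */
                printf("load_avg: %llu\n", (unsigned long long)load_avg);
                return 0;
        }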

    Signed-off-by: Arnd Bergmann
    Cc: Peter Zijlstra
    Cc: Paul Turner
    Cc: linux-arm-kernel@lists.infradead.org
    Link: http://lkml.kernel.org/r/1359123276-15833-7-git-send-email-arnd@arndb.de
    Signed-off-by: Ingo Molnar

    Arnd Bergmann
     
  • A task_group won't be online (thus no one can see it) until
    cpu_cgroup_css_online(), and at that time tg->css.cgroup has
    been initialized, so this NULL check is redundant.

    Signed-off-by: Li Zefan
    Signed-off-by: Tejun Heo

    Li Zefan
     

24 Oct, 2012

7 commits

  • Now that the machinery is in place to compute contributed load in a
    bottom-up fashion, replace the shares distribution code within
    update_shares() accordingly.

    Signed-off-by: Paul Turner
    Reviewed-by: Ben Segall
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120823141507.061208672@google.com
    Signed-off-by: Ingo Molnar

    Paul Turner
     
  • Entities of equal weight should receive equitable distribution of cpu time.
    This is challenging in the case of a task_group's shares as execution may be
    occurring on multiple cpus simultaneously.

    To handle this we divide up the shares into weights proportionate with
    the load on each cfs_rq. This does not, however, account for the fact
    that the sum of the parts may be less than one cpu, so we need to
    normalize:

    load(tg) = min(runnable_avg(tg), 1) * tg->shares

    where runnable_avg is the aggregate time in which the task_group had
    runnable children.
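
    A rough illustration of that normalization (plain floating point for
    clarity; the kernel uses fixed-point arithmetic, and these function and
    parameter names are made up for the example):

        /* Effective group weight: a group runnable for only half a cpu
         * should not receive its full tg->shares. */
        static double group_load(double runnable_avg, double shares)
        {
                double usage = runnable_avg < 1.0 ? runnable_avg : 1.0;

                return usage * shares;
        }

        /* Per-cpu slice: distribute the (normalized) group load in
         * proportion to the load carried by each cfs_rq. */
        static double cfs_rq_share(double cfs_rq_load, double total_load,
                                   double runnable_avg, double shares)
        {
                if (total_load == 0.0)
                        return 0.0;

                return group_load(runnable_avg, shares) *
                       (cfs_rq_load / total_load);
        }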

    Signed-off-by: Paul Turner
    Reviewed-by: Ben Segall
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120823141506.930124292@google.com
    Signed-off-by: Ingo Molnar

    Paul Turner
     
  • Maintain a global running sum of the average load seen on each cfs_rq belonging
    to each task group so that it may be used in calculating an appropriate
    shares:weight distribution.

    Signed-off-by: Paul Turner
    Reviewed-by: Ben Segall
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120823141506.792901086@google.com
    Signed-off-by: Ingo Molnar

    Paul Turner
     
  • We are currently maintaining:

    runnable_load(cfs_rq) = \Sum task_load(t)

    For all running children t of cfs_rq. While this can be naturally
    updated for tasks in a runnable state (as they are scheduled), it does
    not account for the load contributed by blocked task entities.

    This can be solved by introducing a separate accounting for blocked load:

    blocked_load(cfs_rq) = \Sum runnable(b) * weight(b)

    Obviously we do not want to iterate over all blocked entities to account for
    their decay, we instead observe that:

    runnable_load(t) = \Sum p_i*y^i

    and that to account for an additional idle period we only need to compute:

    y*runnable_load(t).

    This means that we can compute all blocked entities at once by evaluating:

    blocked_load(cfs_rq)' = y * blocked_load(cfs_rq)

    Finally we maintain a decay counter so that when a sleeping entity re-awakens
    we can determine how much of its load should be removed from the blocked sum.
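
    A simplified sketch of that bookkeeping (floating point and invented
    names purely for illustration; the kernel uses fixed-point arithmetic
    with y^32 = 1/2):

        #include <math.h>

        #define DECAY_Y 0.9785720620877001      /* ~= 2^(-1/32) */

        struct blocked_sum {
                double load;            /* decayed \Sum runnable(b) * weight(b) */
                unsigned long decays;   /* periods of decay applied so far */
        };

        /* One idle period: decay every blocked entity at once. */
        static void decay_blocked(struct blocked_sum *b)
        {
                b->load *= DECAY_Y;
                b->decays++;
        }

        /* On wakeup, decay the entity's remembered contribution by the same
         * number of periods before removing it from the blocked sum. */
        static void unblock_entity(struct blocked_sum *b, double contrib,
                                   unsigned long contrib_decays)
        {
                unsigned long missed = b->decays - contrib_decays;

                b->load -= contrib * pow(DECAY_Y, missed);
                if (b->load < 0.0)
                        b->load = 0.0;
        }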

    Signed-off-by: Paul Turner
    Reviewed-by: Ben Segall
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120823141506.585389902@google.com
    Signed-off-by: Ingo Molnar

    Paul Turner
     
  • For a given task t, we can compute its contribution to load as:

    task_load(t) = runnable_avg(t) * weight(t)

    On a parenting cfs_rq we can then aggregate:

    runnable_load(cfs_rq) = \Sum task_load(t), for all runnable children t

    Maintain this bottom up, with task entities adding their contributed load to
    the parenting cfs_rq sum. When a task entity's load changes we add the same
    delta to the maintained sum.
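
    A minimal sketch of that delta-based maintenance (structure and field
    names are illustrative, not the kernel's):

        struct task_avg {
                double runnable_avg;    /* fraction of time runnable, in [0, 1] */
                double weight;
                double load_contrib;    /* value last added to the parent sum */
        };

        /* Update the parent cfs_rq's runnable_load by the delta in this
         * task's contribution rather than recomputing the whole sum. */
        static void update_task_load(struct task_avg *t,
                                     double *cfs_rq_runnable_load)
        {
                double contrib = t->runnable_avg * t->weight;

                *cfs_rq_runnable_load += contrib - t->load_contrib;
                t->load_contrib = contrib;
        }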

    Signed-off-by: Paul Turner
    Reviewed-by: Ben Segall
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120823141506.514678907@google.com
    Signed-off-by: Ingo Molnar

    Paul Turner
     
  • Since runqueues do not have a corresponding sched_entity we instead embed a
    sched_avg structure directly.

    Signed-off-by: Ben Segall
    Reviewed-by: Paul Turner
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120823141506.442637130@google.com
    Signed-off-by: Ingo Molnar

    Ben Segall
     
  • Instead of tracking the average load parented by a cfs_rq, we can track
    entity load directly, with the load for a given cfs_rq then being the
    sum of its children's loads.

    To do this we represent the historical contribution to runnable average
    within each trailing 1024us of execution as the coefficients of a
    geometric series.

    We can express this for a given task t as:

    runnable_sum(t) = \Sum u_i * y^i, runnable_avg_period(t) = \Sum 1024 * y^i
    load(t) = weight_t * runnable_sum(t) / runnable_avg_period(t)

    where u_i is the usage in the i-th most recent 1024us period
    (approximately 1ms) and y is chosen such that y^k = 1/2. We currently
    choose k to be 32, which roughly translates to about one scheduling
    period.
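
    A floating-point sketch of that accumulation (the kernel uses 32-bit
    fixed-point arithmetic and per-entity state; the names here are
    illustrative):

        #define Y_32 0.9785720620877001         /* chosen so y^32 = 1/2 */

        struct entity_avg {
                double runnable_sum;    /* \Sum u_i * y^i */
                double period_sum;      /* \Sum 1024 * y^i */
        };

        /* Fold one elapsed 1024us segment into the running sums; usage_us
         * is how much of that segment the entity spent runnable. */
        static void accumulate_segment(struct entity_avg *a, double usage_us)
        {
                a->runnable_sum = a->runnable_sum * Y_32 + usage_us;
                a->period_sum   = a->period_sum   * Y_32 + 1024.0;
        }

        static double entity_load(const struct entity_avg *a, double weight)
        {
                if (a->period_sum == 0.0)
                        return 0.0;

                return weight * a->runnable_sum / a->period_sum;
        }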

    Signed-off-by: Paul Turner
    Reviewed-by: Ben Segall
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120823141506.372695337@google.com
    Signed-off-by: Ingo Molnar

    Paul Turner
     

14 May, 2012

1 commit

  • Some numbers like nr_running and nr_uninterruptible are fundamentally
    unsigned, since it's impossible to have a negative number of tasks, yet
    we still print them as signed to easily recognise the underflow
    condition.

    rq->nr_uninterruptible has 'special' accounting and can in fact very
    easily become negative on a per-cpu basis.

    It was noted that since the P() macro assumes things are long long, and
    the promotion of unsigned 'int/long' to long long on 32bit doesn't
    sign extend, we print silly large numbers instead of the easier-to-read
    signed numbers.

    Therefore extend the P() macro to not require the sign extension.
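
    A small userspace demonstration of the promotion behaviour being worked
    around (not the kernel macro itself):

        #include <stdio.h>

        int main(void)
        {
                /* Per-cpu accounting can legitimately go "negative", which an
                 * unsigned counter stores as a huge wrapped-around value. */
                unsigned int nr_uninterruptible = (unsigned int)-3;

                /* Promotion to long long preserves the value rather than the
                 * sign: this prints 4294967293. */
                printf("%lld\n", (long long)nr_uninterruptible);

                /* Going through the signed type of the same width first
                 * recovers the readable number: this prints -3. */
                printf("%lld\n", (long long)(int)nr_uninterruptible);

                return 0;
        }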

    Reported-by: Diwakar Tundlam
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-gk5tm8t2n4ix2vkpns42uqqp@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

09 May, 2012

1 commit

  • Since there's a PID space limit of 30 bits (see
    futex.h:FUTEX_TID_MASK) and allocating that many tasks (assuming a
    lower bound of 2 pages per task) would still take 8T of memory, it
    seems reasonable to say that unsigned int is sufficient for
    rq->nr_running.
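
    The back-of-the-envelope arithmetic behind that figure, assuming the
    usual 4 KiB pages: 2^30 tasks x 2 pages x 4 KiB = 2^30 x 8 KiB = 8 TiB.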

    When we do get anywhere near that number of tasks I suspect other
    things would go funny; load-balancer load computations would really
    need to be hoisted to 128 bits, etc.

    So save a few bytes and convert rq->nr_running and friends to
    unsigned int.

    Suggested-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-y3tvyszjdmbibade5bw8zl81@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

27 Jan, 2012

1 commit

  • We no longer use the sched_switch field.

    But simply removing the sched_switch field from the middle of the
    sched_stat output would break tools.

    So, to stay compatible, we hardcode it to zero and remove the
    field from the scheduler data structures.

    Update the schedstat documentation accordingly.

    Signed-off-by: Rakib Mullick
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1327422836.27181.5.camel@localhost.localdomain
    Signed-off-by: Ingo Molnar

    Rakib Mullick
     

17 Nov, 2011

1 commit