20 Apr, 2014

1 commit


17 Apr, 2014

1 commit


11 Apr, 2014

1 commit

  • Sasha reported that lockdep claims that the following commit
    made numa_group.lock interrupt unsafe:

    156654f491dd ("sched/numa: Move task_numa_free() to __put_task_struct()")

    While I don't see how that could be, given the commit in question moved
    task_numa_free() from one irq enabled region to another, the below does
    make both gripes and lockups upon gripe with numa=fake=4 go away.
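
    The shape of the fix is roughly the following (a sketch, not the
    verbatim patch; the point is taking numa_group.lock with the
    irq-saving lock variant so lockdep sees an irq-safe acquisition):

        /* in task_numa_free(), sketch */
        unsigned long flags;

        spin_lock_irqsave(&grp->lock, flags);  /* was: spin_lock(&grp->lock) */
        /* fold the task's NUMA fault stats out of the group */
        grp->nr_tasks--;
        spin_unlock_irqrestore(&grp->lock, flags);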

    Reported-by: Sasha Levin
    Fixes: 156654f491dd ("sched/numa: Move task_numa_free() to __put_task_struct()")
    Signed-off-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    Cc: torvalds@linux-foundation.org
    Cc: mgorman@suse.com
    Cc: akpm@linux-foundation.org
    Cc: Dave Jones
    Link: http://lkml.kernel.org/r/1396860915.5170.5.camel@marge.simpson.net
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     

08 Apr, 2014

3 commits

  • Merge second patch-bomb from Andrew Morton:
    - the rest of MM
    - zram updates
    - zswap updates
    - exit
    - procfs
    - exec
    - wait
    - crash dump
    - lib/idr
    - rapidio
    - adfs, affs, bfs, ufs
    - cris
    - Kconfig things
    - initramfs
    - small amount of IPC material
    - percpu enhancements
    - early ioremap support
    - various other misc things

    * emailed patches from Andrew Morton : (156 commits)
    MAINTAINERS: update Intel C600 SAS driver maintainers
    fs/ufs: remove unused ufs_super_block_third pointer
    fs/ufs: remove unused ufs_super_block_second pointer
    fs/ufs: remove unused ufs_super_block_first pointer
    fs/ufs/super.c: add __init to init_inodecache()
    doc/kernel-parameters.txt: add early_ioremap_debug
    arm64: add early_ioremap support
    arm64: initialize pgprot info earlier in boot
    x86: use generic early_ioremap
    mm: create generic early_ioremap() support
    x86/mm: sparse warning fix for early_memremap
    lglock: map to spinlock when !CONFIG_SMP
    percpu: add preemption checks to __this_cpu ops
    vmstat: use raw_cpu_ops to avoid false positives on preemption checks
    slub: use raw_cpu_inc for incrementing statistics
    net: replace __this_cpu_inc in route.c with raw_cpu_inc
    modules: use raw_cpu_write for initialization of per cpu refcount.
    mm: use raw_cpu ops for determining current NUMA node
    percpu: add raw_cpu_ops
    slub: fix leak of 'name' in sysfs_slab_add
    ...

    Linus Torvalds
     
  • To increase compiler portability there is <linux/compiler.h>, which
    provides convenience macros for various gcc constructs, e.g. __weak for
    __attribute__((weak)). I've replaced all instances of gcc attributes
    with the right macro in the kernel subsystem.
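
    For illustration, a minimal sketch of the substitution (hypothetical
    function name):

        #include <linux/compiler.h>

        /* before: void __attribute__((weak)) arch_foo_init(void); */
        void __weak arch_foo_init(void)
        {
            /* default no-op; an architecture may override this weak symbol */
        }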

    Signed-off-by: Gideon Israel Dsouza
    Cc: "Rafael J. Wysocki"
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gideon Israel Dsouza
     
  • This is the final piece in the puzzle, as all patches to remove the
    last users of \(interruptible_\|\)sleep_on\(_timeout\|\) have made it
    into the 3.15 merge window. The work was long overdue, and this
    interface in particular should not have survived the BKL removal
    that was done a couple of years ago.

    Citing Jon Corbet from http://lwn.net/2001/0201/kernel.php3:

    "[...] it was suggested that the janitors look for and fix all code
    that calls sleep_on() [...] since (1) almost all such code is
    incorrect, and (2) Linus has agreed that those functions should
    be removed in the 2.5 development series".

    We haven't quite made it for 2.5, but maybe we can merge this for 3.15.

    Signed-off-by: Arnd Bergmann
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     

04 Apr, 2014

3 commits

  • Merge first patch-bomb from Andrew Morton:
    - Various misc bits
    - kmemleak fixes
    - small befs, codafs, cifs, efs, freevxfs, hfsplus, minixfs, reiserfs things
    - fanotify
    - I appear to have become SuperH maintainer
    - ocfs2 updates
    - direct-io tweaks
    - a bit of the MM queue
    - printk updates
    - MAINTAINERS maintenance
    - some backlight things
    - lib/ updates
    - checkpatch updates
    - the rtc queue
    - nilfs2 updates
    - Small Documentation/ updates

    * emailed patches from Andrew Morton : (237 commits)
    Documentation/SubmittingPatches: remove references to patch-scripts
    Documentation/SubmittingPatches: update some dead URLs
    Documentation/filesystems/ntfs.txt: remove changelog reference
    Documentation/kmemleak.txt: updates
    fs/reiserfs/super.c: add __init to init_inodecache
    fs/reiserfs: move prototype declaration to header file
    fs/hfsplus/attributes.c: add __init to hfsplus_create_attr_tree_cache()
    fs/hfsplus/extents.c: fix concurrent acess of alloc_blocks
    fs/hfsplus/extents.c: remove unused variable in hfsplus_get_block
    nilfs2: update project's web site in nilfs2.txt
    nilfs2: update MAINTAINERS file entries fix
    nilfs2: verify metadata sizes read from disk
    nilfs2: add FITRIM ioctl support for nilfs2
    nilfs2: add nilfs_sufile_trim_fs to trim clean segs
    nilfs2: implementation of NILFS_IOCTL_SET_SUINFO ioctl
    nilfs2: add nilfs_sufile_set_suinfo to update segment usage
    nilfs2: add struct nilfs_suinfo_update and flags
    nilfs2: update MAINTAINERS file entries
    fs/coda/inode.c: add __init to init_inodecache()
    BEFS: logging cleanup
    ...

    Linus Torvalds
     
  • Code that is obj-y (always built-in) or dependent on a bool Kconfig
    (built-in or absent) can never be modular. So using module_init as an
    alias for __initcall can be somewhat misleading.

    Fix these up now, so that we can relocate module_init from init.h into
    module.h in the future. If we don't do this, we'd have to add module.h
    to obviously non-modular code, and that would be a worse thing.

    The audit targets the following module_init users for change:
    kernel/user.c obj-y
    kernel/kexec.c bool KEXEC (one instance per arch)
    kernel/profile.c bool PROFILING
    kernel/hung_task.c bool DETECT_HUNG_TASK
    kernel/sched/stats.c bool SCHEDSTATS
    kernel/user_namespace.c bool USER_NS

    Note that direct use of __initcall is discouraged, vs. one of the
    priority categorized subgroups. As __initcall gets mapped onto
    device_initcall, our use of subsys_initcall (which makes sense for these
    files) will thus change this registration from level 6-device to level
    4-subsys (i.e. slightly earlier). However no observable impact of that
    difference has been observed during testing.

    Also, two instances of missing ";" at EOL are fixed in kexec.
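
    As a sketch of the mechanical change in each audited file (hypothetical
    init function):

        #include <linux/init.h>

        static int __init foo_init(void)
        {
            return 0;
        }
        /* before: module_init(foo_init); -- misleading for obj-y code */
        subsys_initcall(foo_init);  /* level 4-subsys, slightly earlier */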

    Signed-off-by: Paul Gortmaker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Eric Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Gortmaker
     
  • Pull cgroup updates from Tejun Heo:
    "A lot updates for cgroup:

    - The biggest one is cgroup's conversion to kernfs. cgroup took
    after the long abandoned vfs-entangled sysfs implementation and
    made it even more convoluted over time. cgroup's internal objects
    were fused with vfs objects which also brought in vfs locking and
    object lifetime rules. Naturally, there are places where vfs rules
    don't fit and nasty hacks, such as credential switching or lock
    dance interleaving inode mutex and cgroup_mutex with object serial
    number comparison thrown in to decide whether the operation is
    actually necessary, needed to be employed.

    After conversion to kernfs, internal object lifetime and locking
    rules are mostly isolated from vfs interactions allowing shedding
    of several nasty hacks and overall simplification. This will also
    allow implementation of operations which may affect multiple cgroups
    which weren't possible before as it would have required nesting
    i_mutexes.

    - Various simplifications including dropping of module support,
    easier cgroup name/path handling, simplified cgroup file type
    handling and task_cg_lists optimization.

    - Preparatory changes for the planned unified hierarchy, which is still
    a patchset away from being actually operational. The dummy
    hierarchy is updated to serve as the default unified hierarchy.
    Controllers which aren't claimed by other hierarchies are
    associated with it, which BTW was what the dummy hierarchy was for
    anyway.

    - Various fixes from Li and others. This pull request includes some
    patches to add missing slab.h includes to various subsystems. This was
    triggered by the xattr.h include removal from cgroup.h; cgroup.h is
    indirectly included by a lot of files, which brought in xattr.h,
    which brought in slab.h.

    There are several merge commits - one to pull in kernfs updates
    necessary for converting cgroup (already in upstream through
    driver-core), others for interfering changes in the fixes branch"

    * 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (74 commits)
    cgroup: remove useless argument from cgroup_exit()
    cgroup: fix spurious lockdep warning in cgroup_exit()
    cgroup: Use RCU_INIT_POINTER(x, NULL) in cgroup.c
    cgroup: break kernfs active_ref protection in cgroup directory operations
    cgroup: fix cgroup_taskset walking order
    cgroup: implement CFTYPE_ONLY_ON_DFL
    cgroup: make cgrp_dfl_root mountable
    cgroup: drop const from @buffer of cftype->write_string()
    cgroup: rename cgroup_dummy_root and related names
    cgroup: move ->subsys_mask from cgroupfs_root to cgroup
    cgroup: treat cgroup_dummy_root as an equivalent hierarchy during rebinding
    cgroup: remove NULL checks from [pr_cont_]cgroup_{name|path}()
    cgroup: use cgroup_setup_root() to initialize cgroup_dummy_root
    cgroup: reorganize cgroup bootstrapping
    cgroup: relocate setting of CGRP_DEAD
    cpuset: use rcu_read_lock() to protect task_cs()
    cgroup_freezer: document freezer_fork() subtleties
    cgroup: update cgroup_transfer_tasks() to either succeed or fail
    cgroup: drop task_lock() protection around task->cgroups
    cgroup: update how a newly forked task gets associated with css_set
    ...

    Linus Torvalds
     

03 Apr, 2014

1 commit

  • Pull sched/idle changes from Ingo Molnar:
    "More idle code reorganization, to prepare for more integration.

    (Sent separately because it depended on pending timer work, which is
    now upstream)"

    * 'sched-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/idle: Add more comments to the code
    sched/idle: Move idle conditions in cpuidle_idle main function
    sched/idle: Reorganize the idle loop
    cpuidle/idle: Move the cpuidle_idle_call function to idle.c
    idle/cpuidle: Split cpuidle_idle_call main function into smaller functions

    Linus Torvalds
     

02 Apr, 2014

3 commits

  • Pull core block layer updates from Jens Axboe:
    "This is the pull request for the core block IO bits for the 3.15
    kernel. It's a smaller round this time, it contains:

    - Various little blk-mq fixes and additions from Christoph and
    myself.

    - Cleanup of the IPI usage from the block layer, and associated
    helper code. From Frederic Weisbecker and Jan Kara.

    - Duplicate code cleanup in bio-integrity from Gu Zheng. This will
    give you a merge conflict, but that should be easy to resolve.

    - blk-mq notify spinlock fix for RT from Mike Galbraith.

    - A blktrace partial accounting bug fix from Roman Pen.

    - Missing REQ_SYNC detection fix for blk-mq from Shaohua Li"

    * 'for-3.15/core' of git://git.kernel.dk/linux-block: (25 commits)
    blk-mq: add REQ_SYNC early
    rt,blk,mq: Make blk_mq_cpu_notify_lock a raw spinlock
    blk-mq: support partial I/O completions
    blk-mq: merge blk_mq_insert_request and blk_mq_run_request
    blk-mq: remove blk_mq_alloc_rq
    blk-mq: don't dump CPU -> hw queue map on driver load
    blk-mq: fix wrong usage of hctx->state vs hctx->flags
    blk-mq: allow blk_mq_init_commands() to return failure
    block: remove old blk_iopoll_enabled variable
    blktrace: fix accounting of partially completed requests
    smp: Rename __smp_call_function_single() to smp_call_function_single_async()
    smp: Remove wait argument from __smp_call_function_single()
    watchdog: Simplify a little the IPI call
    smp: Move __smp_call_function_single() below its safe version
    smp: Consolidate the various smp_call_function_single() declensions
    smp: Teach __smp_call_function_single() to check for offline cpus
    smp: Remove unused list_head from csd
    smp: Iterate functions through llist_for_each_entry_safe()
    block: Stop abusing rq->csd.list in blk-softirq
    block: Remove useless IPI struct initialization
    ...

    Linus Torvalds
     
  • Pull timer changes from Thomas Gleixner:
    "This assorted collection provides:

    - A new timer based timer broadcast feature for systems which do not
    provide a global accessible timer device. That allows those
    systems to put CPUs into deep idle states where the per cpu timer
    device stops.

    - A few NOHZ_FULL related improvements to the timer wheel

    - The usual updates to timer devices found in ARM SoCs

    - Small improvements and updates all over the place"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
    tick: Remove code duplication in tick_handle_periodic()
    tick: Fix spelling mistake in tick_handle_periodic()
    x86: hpet: Use proper destructor for delayed work
    workqueue: Provide destroy_delayed_work_on_stack()
    clocksource: CMT, MTU2, TMU and STI should depend on GENERIC_CLOCKEVENTS
    timer: Remove code redundancy while calling get_nohz_timer_target()
    hrtimer: Rearrange comments in the order struct members are declared
    timer: Use variable head instead of &work_list in __run_timers()
    clocksource: exynos_mct: silence a static checker warning
    arm: zynq: Add support for cpufreq
    arm: zynq: Don't use arm_global_timer with cpufreq
    clocksource/cadence_ttc: Overhaul clocksource frequency adjustment
    clocksource/cadence_ttc: Call clockevents_update_freq() with IRQs enabled
    clocksource: Add Kconfig entries for CMT, MTU2, TMU and STI
    sh: Remove Kconfig entries for TMU, CMT and MTU2
    ARM: shmobile: Remove CMT, TMU and STI Kconfig entries
    clocksource: armada-370-xp: Use atomic access for shared registers
    clocksource: orion: Use atomic access for shared registers
    clocksource: timer-keystone: Delete unnecessary variable
    clocksource: timer-keystone: introduce clocksource driver for Keystone
    ...

    Linus Torvalds
     
  • Pull timer updates from Ingo Molnar:
    "The main purpose is to fix a full dynticks bug related to
    virtualization, where steal time accounting appears to be zero in
    /proc/stat even after a few seconds of competing guests running busy
    loops in a same host CPU. It's not a regression though as it was
    there since the beginning.

    The other commits are preparatory work to fix the bug and various
    cleanups"

    * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    arch: Remove stub cputime.h headers
    sched: Remove needless round trip nsecs tick conversion of steal time
    cputime: Fix jiffies based cputime assumption on steal accounting
    cputime: Bring cputime -> nsecs conversion
    cputime: Default implementation of nsecs -> cputime conversion
    cputime: Fix nsecs_to_cputime() return type cast

    Linus Torvalds
     

01 Apr, 2014

1 commit

  • Pull s390 updates from Martin Schwidefsky:
    "There are two memory management related changes, the CMMA support for
    KVM to avoid swap-in of freed pages and the split page table lock for
    the PMD level. These two come with common code changes in mm/.

    A fix for the long standing theoretical TLB flush problem, this one
    comes with a common code change in kernel/sched/.

    Another set of changes is Heiko's uaccess work, included is the initial
    set of patches with more to come.

    And fixes and cleanups as usual"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (36 commits)
    s390/con3270: optionally disable auto update
    s390/mm: remove unecessary parameter from pgste_ipte_notify
    s390/mm: remove unnecessary parameter from gmap_do_ipte_notify
    s390/mm: fixing comment so that parameter name match
    s390/smp: limit number of cpus in possible cpu mask
    hypfs: Add clarification for "weight_min" attribute
    s390: update defconfigs
    s390/ptrace: add support for PTRACE_SINGLEBLOCK
    s390/perf: make print_debug_cf() static
    s390/topology: Remove call to update_cpu_masks()
    s390/compat: remove compat exec domain
    s390: select CONFIG_TTY for use of tty in unconditional keyboard driver
    s390/appldata_os: fix cpu array size calculation
    s390/checksum: remove memset() within csum_partial_copy_from_user()
    s390/uaccess: remove copy_from_user_real()
    s390/sclp_early: Return correct HSA block count also for zero
    s390: add some drivers/subsystems to the MAINTAINERS file
    s390: improve debug feature usage
    s390/airq: add support for irq ranges
    s390/mm: enable split page table lock for PMD level
    ...

    Linus Torvalds
     

20 Mar, 2014

1 commit

  • There are only two users of get_nohz_timer_target(): timer and hrtimer. Both
    call it under the same circumstances, i.e.

    #ifdef CONFIG_NO_HZ_COMMON
        if (!pinned && get_sysctl_timer_migration() && idle_cpu(this_cpu))
            return get_nohz_timer_target();
    #endif

    So it makes more sense to fold all of this into get_nohz_timer_target()
    instead of duplicating the code in two places. For this, another
    parameter, 'pinned', is required to be passed to this routine.
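
    A sketch of the folded helper (illustrative, not the exact upstream
    body; the busy-CPU search is abbreviated to a hypothetical helper):

        static int get_nohz_timer_target(int pinned)
        {
            int cpu = smp_processor_id();

            /* pinned timers, migration disabled, or a busy CPU: stay put */
            if (pinned || !get_sysctl_timer_migration() || !idle_cpu(cpu))
                return cpu;

            return find_busy_cpu_nearby(cpu);  /* hypothetical name */
        }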

    Signed-off-by: Viresh Kumar
    Cc: linaro-kernel@lists.linaro.org
    Cc: fweisbec@gmail.com
    Cc: peterz@infradead.org
    Link: http://lkml.kernel.org/r/1e1b53537217d58d48c2d7a222a9c3ac47d5b64c.1395140107.git.viresh.kumar@linaro.org
    Signed-off-by: Thomas Gleixner

    Viresh Kumar
     

13 Mar, 2014

2 commits

  • When update_rq_clock_task() accounts the pending steal time for a task,
    it converts the steal delta from nsecs to ticks and then from ticks
    back to nsecs.

    There is no apparent good reason for doing that, because both
    the task clock and the prev steal delta are u64 and store values
    in nsecs.

    So let's remove the needless conversion.
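
    A standalone illustration of what the round trip did (plain C, HZ
    assumed to be 100; both sides of the real code are u64 nsecs, so the
    conversion buys nothing and merely truncates to tick granularity):

        #include <stdio.h>
        #include <stdint.h>

        #define HZ 100
        #define NSEC_PER_SEC 1000000000ULL
        #define TICK_NSEC (NSEC_PER_SEC / HZ)

        int main(void)
        {
            uint64_t steal = 5000000;               /* 5 ms pending steal, in nsecs */
            uint64_t ticks = steal / TICK_NSEC;     /* nsecs -> ticks: 0 */
            uint64_t back  = ticks * TICK_NSEC;     /* ticks -> nsecs: 0 */

            /* everything below one tick (10 ms here) is deferred to a
             * later update instead of being accounted right away */
            printf("steal=%llu ns, after round trip=%llu ns\n",
                   (unsigned long long)steal, (unsigned long long)back);
            return 0;
        }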

    Cc: Ingo Molnar
    Cc: Marcelo Tosatti
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Acked-by: Rik van Riel
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     
  • The steal guest time accounting code assumes that cputime_t is based on
    jiffies. So when CONFIG_NO_HZ_FULL=y, which implies that cputime_t
    is based on nsecs, steal_account_process_tick() passes the delta in
    jiffies to account_steal_time() which then accounts it as if it's a
    value in nsecs.

    As a result, accounting 1 second of steal time (with HZ=100 that would
    be 100 jiffies) is spuriously accounted as 100 nsecs.

    As such /proc/stat may report 0 values of steal time even when two
    guests have run concurrently for a few seconds on the same host and
    same CPU.

    In order to fix this, let's convert the nsecs based steal delta to
    cputime instead of jiffies by using the right conversion API.

    Given that the steal time is stored in cputime_t and this type can have
    a smaller granularity than nsecs, we only account the rounded converted
    value and leave the remaining nsecs for the next deltas.
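
    A sketch of the fixed logic (illustrative, not the verbatim patch;
    nsecs_to_cputime()/cputime_to_nsecs() are the conversion APIs the text
    refers to):

        /* steal_account_process_tick(), sketch */
        u64 steal;
        cputime_t steal_ct;

        steal = paravirt_steal_clock(smp_processor_id());
        steal -= this_rq()->prev_steal_time;

        /* account only the rounded cputime value; the sub-granularity
         * remainder stays in the delta for the next pass */
        steal_ct = nsecs_to_cputime(steal);
        this_rq()->prev_steal_time += cputime_to_nsecs(steal_ct);

        account_steal_time(steal_ct);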

    Reported-by: Huiqingding
    Reported-by: Marcelo Tosatti
    Cc: Ingo Molnar
    Cc: Marcelo Tosatti
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Acked-by: Rik van Riel
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     

12 Mar, 2014

3 commits

  • task_hot() doesn't need the 'sched_domain' parameter, so remove it.

    Signed-off-by: Alex Shi
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1394607111-1904-1-git-send-email-alex.shi@linaro.org
    Signed-off-by: Ingo Molnar

    Alex Shi
     
  • The tmp value has already been calculated in:

        scaled_busy_load_per_task =
            (busiest->load_per_task * SCHED_POWER_SCALE) /
            busiest->group_power;

    Signed-off-by: Vincent Guittot
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1394555166-22894-1-git-send-email-vincent.guittot@linaro.org
    Signed-off-by: Ingo Molnar

    Vincent Guittot
     
  • I decided to run my tests on linux-next, and my wakeup_rt tracer was
    broken. After running a bisect, I found that the problem commit was:

    linux-next commit c365c292d059
    "sched: Consider pi boosting in setscheduler()"

    And the reason the wakeup_rt tracer test was failing was because it had
    no RT task to trace. I first noticed this when running with the
    sched_switch event and saw that my RT task still had the normal
    SCHED_OTHER priority. Looking at the problem commit, I found:

    - p->normal_prio = normal_prio(p);
    - p->prio = rt_mutex_getprio(p);

    With no

    + p->normal_prio = normal_prio(p);
    + p->prio = rt_mutex_getprio(p);

    Reading what the commit is supposed to do, I realize that the p->prio
    can't be set if the task is boosted with a higher prio, but the
    p->normal_prio still needs to be set regardless; otherwise, when the
    task is deboosted, it won't get the new priority.

    The p->prio has to be set before check_class_changed() is called,
    otherwise the class won't be changed.

    Also added a fix to newprio to include a missing check for the deadline
    policy. This change was suggested by Juri Lelli.
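
    The resulting order, roughly (a sketch; the real patch funnels the
    common part through __setscheduler_params()):

        static void __setscheduler(struct rq *rq, struct task_struct *p,
                       const struct sched_attr *attr)
        {
            __setscheduler_params(p, attr); /* always sets p->normal_prio */

            /*
             * No PI waiters are boosting the task here, so the normal
             * prio can be used; it must be set before
             * check_class_changed() runs, or the class switch is missed.
             */
            p->prio = normal_prio(p);
        }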

    Signed-off-by: Steven Rostedt
    Cc: Sebastian Andrzej Siewior
    Cc: Juri Lelli
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20140306120438.638bfe94@gandalf.local.home
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     

11 Mar, 2014

12 commits

  • Bad idea on -rt:

    [ 908.026136] [] rt_spin_lock_slowlock+0xaa/0x2c0
    [ 908.026145] [] task_numa_free+0x31/0x130
    [ 908.026151] [] finish_task_switch+0xce/0x100
    [ 908.026156] [] thread_return+0x48/0x4ae
    [ 908.026160] [] schedule+0x25/0xa0
    [ 908.026163] [] rt_spin_lock_slowlock+0xd5/0x2c0
    [ 908.026170] [] get_signal_to_deliver+0xaf/0x680
    [ 908.026175] [] do_signal+0x3d/0x5b0
    [ 908.026179] [] do_notify_resume+0x90/0xe0
    [ 908.026186] [] int_signal+0x12/0x17
    [ 908.026193] [] 0x7ff2a388b1cf

    and since upstream does not mind where we do this, be a bit nicer ...
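
    The move, in sketch form (other teardown in __put_task_struct()
    elided):

        void __put_task_struct(struct task_struct *tsk)
        {
            /*
             * Free NUMA state when the last reference is dropped rather
             * than in finish_task_switch(), where -rt may not sleep.
             */
            task_numa_free(tsk);

            security_task_free(tsk);
            /* ... */
            free_task(tsk);
        }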

    Signed-off-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    Cc: Mel Gorman
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1393568591.6018.27.camel@marge.simpson.net
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     
  • Check the number of fair tasks to decide that we've pulled a task;
    rq's nr_running may contain throttled RT tasks.
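
    I.e., roughly (a sketch of the changed check in idle_balance()):

        /* count only runnable fair tasks; rq->nr_running may still
         * include throttled RT tasks that cannot actually run */
        if (this_rq->cfs.h_nr_running && !pulled_task)
            pulled_task = 1;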

    Signed-off-by: Kirill Tkhai
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1394118975.19290.104.camel@tkhai
    Signed-off-by: Ingo Molnar

    Kirill Tkhai
     
  • 1) Single cpu machine case.

    When the rq has only RT tasks, but none of them can be picked
    because of throttling, we enter an endless loop.

    pick_next_task_{dl,rt} return NULL.

    In pick_next_task_fair() we permanently go to retry:

        if (rq->nr_running != rq->cfs.h_nr_running)
            return RETRY_TASK;

    (rq->nr_running is not decremented when rt_rq becomes
    throttled).

    There is no chance to unthrottle any rt_rq or to wake a fair task
    here, because the rq is locked permanently and interrupts are
    disabled.

    2) In case of SMP this can cause a hang too. Although we unlock
    the rq in idle_balance(), interrupts are still disabled.

    The solution is to check for available tasks in the DL and RT
    classes instead of checking the sum.
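
    In sketch form (per-class counters instead of comparing sums; treat
    the exact placement as illustrative):

        /* retry only when a higher class actually has runnable tasks;
         * throttled RT tasks inflate rq->nr_running but cannot run */
        if (rq->dl.dl_nr_running || rq->rt.rt_nr_running)
            return RETRY_TASK;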

    Signed-off-by: Kirill Tkhai
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1394098321.19290.11.camel@tkhai
    Signed-off-by: Ingo Molnar

    Kirill Tkhai
     
  • Close the idle_exit_fair() bracket in case we've pulled something or
    we've received a task of a higher priority class.

    Signed-off-by: Kirill Tkhai
    Signed-off-by: Peter Zijlstra
    Cc: Vincent Guittot
    Link: http://lkml.kernel.org/r/1394098315.19290.10.camel@tkhai
    Signed-off-by: Ingo Molnar

    Kirill Tkhai
     
  • The problems:

    1) We check rt_nr_running before the call of put_prev_task().
    If the previous task is RT, its rt_rq may become throttled
    and dequeued after this call.

    In case p is from rt->rq this just causes picking a task
    from the throttled queue, but in case its rt_rq is a child
    we are guaranteed to catch a BUG_ON.

    2) The same with the deadline class. The only difference is we
    operate only on the dl_rq.

    This patch fixes all the above problems and adds a small skip in the
    DL update like we've already done for the RT class:

        if (unlikely((s64)delta_exec <= 0))
            return;

    Signed-off-by: Kirill Tkhai
    Signed-off-by: Peter Zijlstra
    Cc: Juri Lelli
    Link: http://lkml.kernel.org/r/1393946746.3643.3.camel@tkhai
    Signed-off-by: Ingo Molnar

    Kirill Tkhai
     
  • The idle main function is a complex and critical function. Added more
    comments to the code.

    Signed-off-by: Daniel Lezcano
    Acked-by: Nicolas Pitre
    Signed-off-by: Peter Zijlstra
    Cc: tglx@linutronix.de
    Cc: rjw@rjwysocki.net
    Cc: preeti@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/1393832934-11625-5-git-send-email-daniel.lezcano@linaro.org
    Signed-off-by: Ingo Molnar

    Daniel Lezcano
     
  • This patch moves the conditions checked before entering idle into the
    cpuidle main function located in idle.c. That simplifies the idle main
    loop functions and increases the readability of the conditions for
    truly entering idle.

    This patch is code reorganization and does not change the behavior of the
    function.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: Peter Zijlstra
    Cc: tglx@linutronix.de
    Cc: rjw@rjwysocki.net
    Cc: nicolas.pitre@linaro.org
    Cc: preeti@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/1393832934-11625-4-git-send-email-daniel.lezcano@linaro.org
    Signed-off-by: Ingo Molnar

    Daniel Lezcano
     
  • Now that we have the main cpuidle function in idle.c, move some code from
    the idle main loop to this function for the sake of clarity.

    That removes if-then-else indentation that was difficult to follow when
    looking at the code. This patch does not change the current behavior.

    Signed-off-by: Daniel Lezcano
    Acked-by: Nicolas Pitre
    Signed-off-by: Peter Zijlstra
    Cc: tglx@linutronix.de
    Cc: rjw@rjwysocki.net
    Cc: preeti@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/1393832934-11625-3-git-send-email-daniel.lezcano@linaro.org
    Signed-off-by: Ingo Molnar

    Daniel Lezcano
     
  • cpuidle_idle_call() does nothing more than call the three individual
    functions and is no longer used by any arch specific code, only by the
    cpuidle framework code.

    We can move this function into the idle task code to ensure better
    proximity to the scheduler code.

    Signed-off-by: Daniel Lezcano
    Acked-by: Nicolas Pitre
    Signed-off-by: Peter Zijlstra
    Cc: rjw@rjwysocki.net
    Cc: preeti@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/1393832934-11625-2-git-send-email-daniel.lezcano@linaro.org
    Signed-off-by: Ingo Molnar

    Daniel Lezcano
     
  • Pick up fixes before queueing up new changes.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Prevent tracing of preempt_disable/enable() in sched_clock_cpu().
    When CONFIG_DEBUG_PREEMPT is enabled, preempt_disable/enable() are
    traced and this causes trace_clock() users (and probably others) to
    go into an infinite recursion. Systems with a stable sched_clock()
    are not affected.

    This problem is similar to that fixed by upstream commit 95ef1e52922
    ("KVM guest: prevent tracing recursion with kvmclock").

    Signed-off-by: Fernando Luis Vazquez Cao
    Signed-off-by: Peter Zijlstra
    Acked-by: Steven Rostedt
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1394083528.4524.3.camel@nexus
    Signed-off-by: Ingo Molnar

    Fernando Luis Vazquez Cao
     
  • Deny the use of SCHED_DEADLINE policy to unprivileged users.
    Even if root users can set the policy for normal users, we
    don't want the latter to be able to change their parameters
    (safest behavior).
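
    The check, in sketch form (it sits among the other unprivileged-user
    tests in __sched_setscheduler()):

        if (user && !capable(CAP_SYS_NICE)) {
            /* can't set/change SCHED_DEADLINE policy at all for now */
            if (dl_policy(policy))
                return -EPERM;
            /* other permission checks elided */
        }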

    Signed-off-by: Juri Lelli
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1393844961-18097-1-git-send-email-juri.lelli@gmail.com
    Signed-off-by: Ingo Molnar

    Juri Lelli
     

27 Feb, 2014

7 commits

  • Michael spotted that the idle_balance() push down created a task
    priority problem.

    Previously, when we called idle_balance() before pick_next_task() it
    wasn't a problem when -- because of the rq->lock droppage -- an rt/dl
    task slipped in.

    Similarly for pre_schedule(), rt pre-schedule could have a dl task
    slip in.

    But by pulling it into the pick_next_task() loop, we'll not try a
    higher task priority again.

    Cure this by creating a re-start condition in pick_next_task(); and
    triggering this from pick_next_task_{rt,fair}().

    It also fixes a live-lock where we get stuck in pick_next_task_fair()
    due to idle_balance() seeing !0 nr_running but there not actually
    being any fair tasks about.
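
    The re-start condition amounts to a retry loop in pick_next_task();
    roughly (RETRY_TASK is a sentinel pointer the pick functions return
    after dropping rq->lock):

        again:
            for_each_class(class) {
                p = class->pick_next_task(rq, prev);
                if (p) {
                    if (unlikely(p == RETRY_TASK))
                        goto again; /* rq->lock was dropped; a higher
                                     * class task may have appeared */
                    return p;
                }
            }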

    Reported-by: Michael Wang
    Fixes: 38033c37faab ("sched: Push down pre_schedule() and idle_balance()")
    Tested-by: Sasha Levin
    Signed-off-by: Peter Zijlstra
    Cc: Juri Lelli
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20140224121218.GR15586@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Commit cf37b6b48428d ("sched/idle: Move cpu/idle.c to sched/idle.c")
    said to simply move a file; somehow it got mangled and created an old
    version of the file and forgot to remove the old file.

    Fix this fail; add the lost change and remove the now identical old
    file.

    Signed-off-by: Peter Zijlstra
    Cc: rjw@rjwysocki.net
    Cc: nicolas.pitre@linaro.org
    Cc: preeti@linux.vnet.ibm.com
    Cc: Daniel Lezcano
    Link: http://lkml.kernel.org/r/20140224172207.GC9987@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • The struct sched_avg of struct rq is only used in case group
    scheduling is enabled, inside __update_tg_runnable_avg(), to update the
    per-cpu representation of a task group. I.e. there is no need to
    maintain the runnable avg of a rq in the !CONFIG_FAIR_GROUP_SCHED case.

    This patch guards struct sched_avg of struct rq and
    update_rq_runnable_avg() with CONFIG_FAIR_GROUP_SCHED.

    There is an extra empty definition for update_rq_runnable_avg()
    necessary for the !CONFIG_FAIR_GROUP_SCHED && CONFIG_SMP case.

    The function print_cfs_group_stats() which prints out struct sched_avg
    of struct rq is already guarded with CONFIG_FAIR_GROUP_SCHED.
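
    In sketch form (illustrative placement):

        /* struct rq, kernel/sched/sched.h */
        #ifdef CONFIG_FAIR_GROUP_SCHED
            struct sched_avg avg;
        #endif

        /* kernel/sched/fair.c: empty stub for the remaining SMP case */
        #if !defined(CONFIG_FAIR_GROUP_SCHED) && defined(CONFIG_SMP)
        static inline void update_rq_runnable_avg(struct rq *rq, int runnable) {}
        #endif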

    Reviewed-by: Ben Segall
    Signed-off-by: Dietmar Eggemann
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/530DCDC5.1060406@arm.com
    Signed-off-by: Ingo Molnar

    Dietmar Eggemann
     
  • Kirill Tkhai noted:

    Since deadline tasks share rt bandwidth, we must care about
    bandwidth timer set. Otherwise rt_time may grow up to infinity
    in update_curr_dl(), if there are no other available RT tasks
    on top level bandwidth.

    RT tasks were in fact throttled right after they got enqueued,
    and never executed again (rt_time never again went below rt_runtime).

    Peter then proposed to accrue DL execution on rt_time only when
    the rt timer is active, and proposed a patch (this patch is a slight
    modification of that) to implement that behavior. While this
    solves Kirill's problem, it has a drawback.

    Indeed, Kirill noted again:

    It looks like we may get into a situation where all CPU time is shared
    between RT and DL tasks:

    rt_runtime = n
    rt_period = 2n

    | RT working, DL sleeping | DL working, RT sleeping |
    -----------------------------------------------------------
    | (1) duration = n | (2) duration = n | (repeat)
    |--------------------------|------------------------------|
    | (rt_bw timer is running) | (rt_bw timer is not running) |

    No time for fair tasks at all.

    While this can happen during the first period, if rq is always backlogged,
    RT tasks won't have the opportunity to execute anymore: rt_time reached
    rt_runtime during (1), suppose after (2) RT is enqueued back, it gets
    throttled since rt timer didn't fire, replenishment is from now on eaten up
    by DL tasks that accrue their execution on rt_time (while rt timer is
    active - we have an RT task waiting for replenishment). FAIR tasks are
    not touched after this first period. Ok, this is not ideal, and the situation
    is even worse!

    The above (the nice case) practically never happens in reality, where
    your rt timer is not aligned to task periods, tasks are in general not
    periodic, etc. Long story short, you always risk overloading your system.

    This patch is based on Peter's idea, but exploits an additional fact:
    if you don't have RT tasks enqueued, it makes little sense to continue
    incrementing rt_time once you reached the upper limit (DL tasks have their
    own mechanism for throttling).

    This cures both problems:

    - no matter how many DL instances in the past, you'll have an rt_time
    slightly above rt_runtime when an RT task is enqueued, and from that
    point on (after the first replenishment), the task will normally execute;

    - you can still eat up all bandwidth during the first period, but not
    anymore after that, remember that DL execution will increment rt_time
    till the upper limit is reached.

    The situation is still not perfect! But, we have a simple solution for now,
    that limits how much you can jeopardize your system, as we keep working
    towards the right answer: RT groups scheduled using deadline servers.
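
    The resulting guard in update_curr_dl() looks roughly like this
    (sched_rt_bandwidth_account() being the helper that encodes "timer
    active or RT tasks queued"):

        if (rt_bandwidth_enabled()) {
            struct rt_rq *rt_rq = &rq->rt;

            raw_spin_lock(&rt_rq->rt_runtime_lock);
            /* don't let DL execution grow rt_time to infinity when
             * nothing will ever replenish it */
            if (sched_rt_bandwidth_account(rt_rq))
                rt_rq->rt_time += delta_exec;
            raw_spin_unlock(&rt_rq->rt_runtime_lock);
        }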

    Reported-by: Kirill Tkhai
    Signed-off-by: Juri Lelli
    Signed-off-by: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20140225151515.617714e2f2cd6c558531ba61@gmail.com
    Signed-off-by: Ingo Molnar

    Juri Lelli
     
  • Commit 82b9580 ("sched/deadline: Test for CPU's presence explicitly")
    changed how we check whether a CPU returned by the cpudeadline machinery
    is valid. But we don't want to call cpu_present() if best_cpu is
    equal to -1. So, switch the order of the tests inside WARN_ON().
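
    I.e., the short-circuit order now protects the cpu_present() call:

        /* before: WARN_ON(!cpu_present(best_cpu) && best_cpu != -1); */
        WARN_ON(best_cpu != -1 && !cpu_present(best_cpu));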

    Signed-off-by: Juri Lelli
    Signed-off-by: Peter Zijlstra
    Cc: boris.ostrovsky@oracle.com
    Cc: konrad.wilk@oracle.com
    Cc: rostedt@goodmis.org
    Link: http://lkml.kernel.org/r/1393238832-9100-1-git-send-email-juri.lelli@gmail.com
    Signed-off-by: Ingo Molnar

    Juri Lelli
     
  • In the deadline class we do not have group scheduling.

    So, let's remove the unnecessary

        X = X;

    assignments.

    Signed-off-by: Kirill Tkhai
    Signed-off-by: Peter Zijlstra
    Cc: Juri Lelli
    Link: http://lkml.kernel.org/r/1393343543.4089.5.camel@tkhai
    Signed-off-by: Ingo Molnar

    Kirill Tkhai
     
  • dequeue_entity() is called when p->on_rq is set and it sets
    se->on_rq = 0, which appears to guarantee that the !se->on_rq condition
    is met. If the task has done set_current_state(TASK_INTERRUPTIBLE)
    without schedule(), the second condition will be met and vruntime will
    be incorrectly adjusted twice.

    In certain cases this can result in the task's vruntime never increasing
    past the vruntime of other tasks on the CFS' run queue, starving them of
    CPU time.

    This patch changes switched_from_fair() to use !p->on_rq instead of
    !se->on_rq.

    I'm able to cause a task with a priority of 120 to starve all other
    tasks with the same priority on an ARM platform running 3.2.51-rt72
    PREEMPT RT by writing one character at time to a serial tty (16550 UART)
    in a tight loop. I'm also able to verify making this change corrects the
    problem on that platform and kernel version.
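
    The change itself, in sketch form (body abbreviated):

        static void switched_from_fair(struct rq *rq, struct task_struct *p)
        {
            struct sched_entity *se = &p->se;
            struct cfs_rq *cfs_rq = cfs_rq_of(se);

            /*
             * Use !p->on_rq: dequeue_entity() has already cleared
             * se->on_rq, so the old !se->on_rq test also matched tasks
             * still on the rq and adjusted vruntime twice.
             */
            if (!p->on_rq && p->state != TASK_RUNNING) {
                place_entity(cfs_rq, se, 0);
                se->vruntime -= cfs_rq->min_vruntime;
            }
        }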

    Signed-off-by: George McCollister
    Signed-off-by: Peter Zijlstra
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/1392767811-28916-1-git-send-email-george.mccollister@gmail.com
    Signed-off-by: Ingo Molnar

    George McCollister
     

25 Feb, 2014

1 commit

  • The name __smp_call_function_single() doesn't tell much about the
    properties of this function, especially when compared to
    smp_call_function_single().

    The comments above the implementation are also misleading. The main
    point of this function is actually not to be able to embed the csd
    in an object. This is actually a requirement that results from the
    purpose of this function, which is to raise an IPI asynchronously.

    As such it can be called with interrupts disabled. And this feature
    comes at the cost of the caller, who then needs to serialize the
    IPIs on this csd.

    Let's rename the function and enhance the comments so that they reflect
    these properties.
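
    A usage sketch of the renamed API (hypothetical callback; note that
    the caller owns serializing reuse of the csd):

        #include <linux/smp.h>

        static void my_ipi_handler(void *info)
        {
            /* runs on the target CPU in hardirq context */
        }

        static struct call_single_data my_csd = {
            .func = my_ipi_handler,
        };

        static void kick_cpu(int cpu)
        {
            /* asynchronous: returns immediately and may be called with
             * interrupts disabled; my_csd must not be reused until the
             * previous IPI has run */
            smp_call_function_single_async(cpu, &my_csd);
        }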

    Suggested-by: Christoph Hellwig
    Cc: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Ingo Molnar
    Cc: Jan Kara
    Cc: Jens Axboe
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Jens Axboe

    Frederic Weisbecker