10 Feb, 2015

5 commits

  • Pull timer updates from Ingo Molnar:
    "The main changes in this cycle were:

    - rework hrtimer expiry calculation in hrtimer_interrupt(): the
    previous code had a subtle bug where expiry caching would miss an
    expiry, resulting in occasional bogus (late) expiry of hrtimers.

    - continuing Y2038 fixes

    - ktime division optimization

    - misc smaller fixes and cleanups"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    hrtimer: Make __hrtimer_get_next_event() static
    rtc: Convert rtc_set_ntp_time() to use timespec64
    rtc: Remove redundant rtc_valid_tm() from rtc_hctosys()
    rtc: Modify rtc_hctosys() to address y2038 issues
    rtc: Update rtc-dev to use y2038-safe time interfaces
    rtc: Update interface.c to use y2038-safe time interfaces
    time: Expose get_monotonic_boottime64 for in-kernel use
    time: Expose getboottime64 for in-kernel uses
    ktime: Optimize ktime_divns for constant divisors
    hrtimer: Prevent stale expiry time in hrtimer_interrupt()
    ktime.h: Introduce ktime_ms_delta

    Linus Torvalds
     
  • Pull scheduler updates from Ingo Molnar:
    "The main scheduler changes in this cycle were:

    - various sched/deadline fixes and enhancements

    - rescheduling latency fixes/cleanups

    - rework the rq->clock code to be more consistent and more robust.

    - minor micro-optimizations

    - ->avg.decay_count fixes

    - add a stack overflow check to might_sleep()

    - idle-poll handler fix, possibly resulting in power savings

    - misc smaller updates and fixes"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/Documentation: Remove unneeded word
    sched/wait: Introduce wait_on_bit_timeout()
    sched: Pull resched loop to __schedule() callers
    sched/deadline: Remove cpu_active_mask from cpudl_find()
    sched: Fix hrtick_start() on UP
    sched/deadline: Avoid pointless __setscheduler()
    sched/deadline: Fix stale yield state
    sched/deadline: Fix hrtick for a non-leftmost task
    sched/deadline: Modify cpudl::free_cpus to reflect rd->online
    sched/idle: Add missing checks to the exit condition of cpu_idle_poll()
    sched: Fix missing preemption opportunity
    sched/rt: Reduce rq lock contention by eliminating locking of non-feasible target
    sched/debug: Print rq->clock_task
    sched/core: Rework rq->clock update skips
    sched/core: Validate rq_clock*() serialization
    sched/core: Remove check of p->sched_class
    sched/fair: Fix sched_entity::avg::decay_count initialization
    sched/debug: Fix potential call to __ffs(0) in sched_show_task()
    sched/debug: Check for stack overflow in ___might_sleep()
    sched/fair: Fix the dealing with decay_count in __synchronize_entity_decay()

    Linus Torvalds
     
  • Pull perf updates from Ingo Molnar:
    "Kernel side changes:

    - AMD range breakpoints support:

    Extend breakpoint tools and core to support address range through
    perf event with initial backend support for AMD extended
    breakpoints.

    The syntax is:

    perf record -e mem:addr/len:type

    For example, to set a write breakpoint from 0x1000 to 0x1200 (0x1000 + 512):

    perf record -e mem:0x1000/512:w

    - event throttling/rotating fixes

    - various event group handling fixes, cleanups and general paranoia
    code to be more robust against bugs in the future.

    - kernel stack overhead fixes

    User-visible tooling side changes:

    - Show the precise number of samples at the end of a 'record' session
    if processing build ids, since we will then traverse the whole
    perf.data file and see all the PERF_RECORD_SAMPLE records;
    otherwise stop showing the previous off-base, heuristically counted
    number of "samples" (Namhyung Kim).

    - Support to read compressed module from build-id cache (Namhyung
    Kim)

    - Enable sampling loads and stores simultaneously in 'perf mem'
    (Stephane Eranian)

    - 'perf diff' output improvements (Namhyung Kim)

    - Fix error reporting for evsel pgfault constructor (Arnaldo Carvalho
    de Melo)

    Tooling side infrastructure changes:

    - Cache eh/debug frame offset for dwarf unwind (Namhyung Kim)

    - Support parsing parameterized events (Cody P Schafer)

    - Add support for IP address formats in libtraceevent (David Ahern)

    Plus other misc fixes"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
    perf: Decouple unthrottling and rotating
    perf: Drop module reference on event init failure
    perf: Use POLLIN instead of POLL_IN for perf poll data in flag
    perf: Fix put_event() ctx lock
    perf: Fix move_group() order
    perf: Fix event->ctx locking
    perf: Add a bit of paranoia
    perf symbols: Convert lseek + read to pread
    perf tools: Use perf_data_file__fd() consistently
    perf symbols: Support to read compressed module from build-id cache
    perf evsel: Set attr.task bit for a tracking event
    perf header: Set header version correctly
    perf record: Show precise number of samples
    perf tools: Do not use __perf_session__process_events() directly
    perf callchain: Cache eh/debug frame offset for dwarf unwind
    perf tools: Provide stub for missing pthread_attr_setaffinity_np
    perf evsel: Don't rely on malloc working for sz 0
    tools lib traceevent: Add support for IP address formats
    perf ui/tui: Show fatal error message only if exists
    perf tests: Fix typo in sample-parsing.c
    ...

    Linus Torvalds
     
  • Pull core locking updates from Ingo Molnar:
    "The main changes are:

    - mutex, completions and rtmutex micro-optimizations
    - lock debugging fix
    - various cleanups in the MCS and the futex code"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    locking/rtmutex: Optimize setting task running after being blocked
    locking/rwsem: Use task->state helpers
    sched/completion: Add lock-free checking of the blocking case
    sched/completion: Remove unnecessary ->wait.lock serialization when reading completion state
    locking/mutex: Explicitly mark task as running after wakeup
    futex: Fix argument handling in futex_lock_pi() calls
    doc: Fix misnamed FUTEX_CMP_REQUEUE_PI op constants
    locking/Documentation: Update code path
    softirq/preempt: Add missing current->preempt_disable_ip update
    locking/osq: No need for load/acquire when acquire-polling
    locking/mcs: Better differentiate between MCS variants
    locking/mutex: Introduce ww_mutex_set_context_slowpath()
    locking/mutex: Move MCS related comments to proper location
    locking/mutex: Checking the stamp is WW only

    Linus Torvalds
     
  • Pull RCU updates from Ingo Molnar:
    "The main RCU changes in this cycle are:

    - Documentation updates.

    - Miscellaneous fixes.

    - Preemptible-RCU fixes, including fixing an old bug in the
    interaction of RCU priority boosting and CPU hotplug.

    - SRCU updates.

    - RCU CPU stall-warning updates.

    - RCU torture-test updates"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
    rcu: Initialize tiny RCU stall-warning timeouts at boot
    rcu: Fix RCU CPU stall detection in tiny implementation
    rcu: Add GP-kthread-starvation checks to CPU stall warnings
    rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors
    rcu: Optionally run grace-period kthreads at real-time priority
    ksoftirqd: Use new cond_resched_rcu_qs() function
    ksoftirqd: Enable IRQs and call cond_resched() before poking RCU
    rcutorture: Add more diagnostics in rcu_barrier() test failure case
    torture: Flag console.log file to prevent holdovers from earlier runs
    torture: Add "-enable-kvm -soundhw pcspk" to qemu command line
    rcutorture: Handle different mpstat versions
    rcutorture: Check from beginning to end of grace period
    rcu: Remove redundant rcu_batches_completed() declaration
    rcutorture: Drop rcu_torture_completed() and friends
    rcu: Provide rcu_batches_completed_sched() for TINY_RCU
    rcutorture: Use unsigned for Reader Batch computations
    rcutorture: Make build-output parsing correctly flag RCU's warnings
    rcu: Make _batches_completed() functions return unsigned long
    rcutorture: Issue warnings on close calls due to Reader Batch blows
    documentation: Fix smp typo in memory-barriers.txt
    ...

    Linus Torvalds
     

07 Feb, 2015

3 commits


05 Feb, 2015

1 commit

  • I noticed some CLOCK_TAI timer test failures on one of my
    less-frequently used configurations. And after digging in I
    found that in 76f4108892d9 (Cleanup hrtimer accessors to the
    timekeeping state), the hrtimer_get_softirq_time() tai offset
    calculation was incorrectly rewritten, as the tai offset we
    return should be from CLOCK_MONOTONIC, and not CLOCK_REALTIME.

    This results in CLOCK_TAI timers expiring early on non-highres
    capable machines.

    This patch fixes the issue, calculating the tai time properly
    from the monotonic base.
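
    A minimal sketch of the corrected calculation (a hypothetical reconstruction
    of the hrtimer_get_softirq_time() body, assuming the timekeeping core hands
    back offsets relative to CLOCK_MONOTONIC):

        /* Hypothetical sketch: derive the TAI softirq time from the monotonic
         * base plus the TAI offset, rather than from CLOCK_REALTIME. */
        mono = ktime_get_update_offsets_tick(&off_real, &off_boot, &off_tai);
        boot = ktime_add(mono, off_boot);
        xtim = ktime_add(mono, off_real);
        tai  = ktime_add(mono, off_tai);    /* previously computed from xtim */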

    Signed-off-by: John Stultz
    Cc: Thomas Gleixner
    Cc: stable # 3.17+
    Link: http://lkml.kernel.org/r/1423097126-10236-1-git-send-email-john.stultz@linaro.org
    Signed-off-by: Ingo Molnar

    John Stultz
     

04 Feb, 2015

23 commits

  • Currently the adjustments made as part of perf_event_task_tick() use the
    percpu rotation lists to iterate over any active PMU contexts, but these
    are not used by the context rotation code, having been replaced by
    separate (per-context) hrtimer callbacks. However, some manipulation of
    the rotation lists (i.e. removal of contexts) has remained in
    perf_rotate_context(). This leads to the following issues:

    * Contexts are not always removed from the rotation lists. Removal of
    PMUs which have been placed in rotation lists, but have not been
    removed by a hrtimer callback can result in corruption of the rotation
    lists (when memory backing the context is freed).

    This has been observed to result in hangs when PMU drivers built as
    modules are inserted and removed around the creation of events for
    said PMUs.

    * Contexts which do not require rotation may be removed from the
    rotation lists as a result of a hrtimer, and will not be considered by
    the unthrottling code in perf_event_task_tick.

    This patch fixes the issue by updating the rotation list when events are
    scheduled in/out, ensuring that each rotation list stays in sync with
    the HW state. As each event holds a refcount on the module of its PMU,
    this ensures that when a PMU module is unloaded none of its CPU contexts
    can be in a rotation list. By maintaining a list of perf_event_contexts
    rather than perf_event_cpu_contexts, we don't need separate paths to
    handle the cpu and task contexts, which also makes the code a little
    simpler.

    As the rotation_list variables are not used for rotation, these are
    renamed to active_ctx_list, which better matches their current function.
    perf_pmu_rotate_{start,stop} are renamed to
    perf_pmu_ctx_{activate,deactivate}.
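
    A minimal sketch of the activate side, using the names from the description
    above (hypothetical; the exact identifiers in the final code may differ):

        static void perf_pmu_ctx_activate(struct perf_event_context *ctx)
        {
                struct list_head *head = this_cpu_ptr(&active_ctx_list);

                WARN_ON(!irqs_disabled());
                /* A context must only ever sit on its CPU's active list once. */
                WARN_ON(!list_empty(&ctx->active_ctx_list));

                list_add(&ctx->active_ctx_list, head);
        }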

    Reported-by: Johannes Jensen
    Signed-off-by: Mark Rutland
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Will Deacon
    Cc: Arnaldo Carvalho de Melo
    Cc: Fengguang Wu
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/20150129134511.GR17721@leverpostej
    Signed-off-by: Ingo Molnar

    Mark Rutland
     
  • When initialising an event, perf_init_event will call try_module_get() to
    ensure that the PMU's module cannot be removed for the lifetime of the
    event, with __free_event() dropping the reference when the event is
    finally destroyed. If something fails after the event has been
    initialised, but before the event is installed, perf_event_alloc will
    drop the reference on the module.

    However, if we fail to initialise an event for some reason (e.g. we ask
    an uncore PMU to perform sampling, and it refuses to initialise the
    event), we do not drop the refcount. If we try to open such a bogus
    event without a precise IDR type, we will loop over each PMU in the pmus
    list, incrementing each of their refcounts without decrementing them.

    This patch adds a module_put when pmu->event_init(event) fails, ensuring
    that the refcounts are balanced in failure cases. As the innards of the
    precise and search based initialisation look very similar, this logic is
    hoisted out into a new helper function. While the early return for the
    failed try_module_get is removed from the search case, this is handled
    by the remaining return when ret is not -ENOENT.
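
    A minimal sketch of the hoisted helper described above (a hypothetical
    reconstruction; the helper name is illustrative):

        static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
        {
                int ret;

                if (!try_module_get(pmu->module))
                        return -ENODEV;

                event->pmu = pmu;
                ret = pmu->event_init(event);
                if (ret)
                        module_put(pmu->module);   /* balance the ref on init failure */

                return ret;
        }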

    Signed-off-by: Mark Rutland
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Will Deacon
    Cc: Arnaldo Carvalho de Melo
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1420642611-22667-1-git-send-email-mark.rutland@arm.com
    Signed-off-by: Ingo Molnar

    Mark Rutland
     
  • Currently we flag available data (via the poll syscall) on a perf fd with
    the POLL_IN macro, which is normally used for the SIGIO interface.

    We've been lucky, because POLLIN (0x1) is a subset of POLL_IN (0x20001)
    and sys_poll (the do_pollfd function) cuts the extra bit (0x20000) out.

    Signed-off-by: Jiri Olsa
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Frederic Weisbecker
    Cc: Namhyung Kim
    Cc: Stephane Eranian
    Cc: Arnaldo Carvalho de Melo
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1422467678-22341-1-git-send-email-jolsa@kernel.org
    Signed-off-by: Ingo Molnar

    Jiri Olsa
     
  • So what I suspect (I'm in zombie mode today, it seems) is that while
    I initially thought that it was impossible for ctx to change when the
    refcount dropped to 0, I now suspect it's possible.

    Note that until perf_remove_from_context() the event is still active and
    visible on the lists. So a concurrent sys_perf_event_open() from another
    task into this task can race.

    Reported-by: Vince Weaver
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Stephane Eranian
    Cc: mark.rutland@arm.com
    Cc: Jiri Olsa
    Cc: Arnaldo Carvalho de Melo
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/20150129134434.GB26304@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Jiri reported triggering the new WARN_ON_ONCE in event_sched_out over
    the weekend:

    event_sched_out.isra.79+0x2b9/0x2d0
    group_sched_out+0x69/0xc0
    ctx_sched_out+0x106/0x130
    task_ctx_sched_out+0x37/0x70
    __perf_install_in_context+0x70/0x1a0
    remote_function+0x48/0x60
    generic_exec_single+0x15b/0x1d0
    smp_call_function_single+0x67/0xa0
    task_function_call+0x53/0x80
    perf_install_in_context+0x8b/0x110

    I think the below should cure this; if we install a group leader it
    will iterate the (still intact) group list and find its siblings and
    try and install those too -- even though those still have the old
    event->ctx -- in the new ctx.

    Upon installing the first group sibling we'd try and schedule out the
    group and trigger the above warn.

    Fix this by installing the group leader last: installing the siblings
    first has no effect, since they're not reachable through the group lists
    and therefore we don't schedule them.

    Also delay resetting the state until we're absolutely sure the events
    are quiescent.

    Reported-by: Jiri Olsa
    Reported-by: vincent.weaver@maine.edu
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Arnaldo Carvalho de Melo
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/20150126162639.GA21418@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra (Intel)
     
  • There have been a few reported issues wrt. the lack of locking around
    changing event->ctx. This patch tries to address those.

    It avoids the whole rwsem thing; and while it appears to work, please
    give it some thought in review.

    What I did fail at is sensible runtime checks on the use of
    event->ctx; the RCU use makes it very hard.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Paul E. McKenney
    Cc: Jiri Olsa
    Cc: Arnaldo Carvalho de Melo
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/20150123125834.209535886@infradead.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Add a few WARN()s to catch things that should never happen.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Arnaldo Carvalho de Melo
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/20150123125834.150481799@infradead.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • We explicitly mark the task running after returning from
    a __rt_mutex_slowlock() call, which does the actual sleeping
    via wait-wake-trylocking. As such, this patch does two things:

    (1) refactors the code so that setting current to TASK_RUNNING
    is done by __rt_mutex_slowlock(), and not by the callers. The
    downside to this is that it becomes a bit unclear at what
    point we block. As such I've added a comment that the task
    blocks when calling __rt_mutex_slowlock() so readers can figure
    out when it is running again.

    (2) relaxes setting current's state through __set_current_state(),
    instead of its more expensive barrier alternative. There was no
    need for the implied barrier as we're obviously not planning on
    blocking.
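
    For reference, the difference between the two primitives (a minimal
    illustration, not taken from the patch):

        __set_current_state(TASK_RUNNING);   /* plain store, no memory barrier */
        set_current_state(TASK_RUNNING);     /* store plus barrier, needed before
                                                re-checking a blocking condition */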

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1422857784.18096.1.camel@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
  • Call __set_task_state() instead of assigning the new state
    directly. These interfaces also aid CONFIG_DEBUG_ATOMIC_SLEEP
    environments, keeping track of who last changed the state.
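
    A minimal before/after illustration of the conversion described (a
    hypothetical snippet, not the full patch):

        /* before: a direct assignment, invisible to the sleep-debugging code */
        tsk->state = TASK_UNINTERRUPTIBLE;

        /* after: also records the caller under CONFIG_DEBUG_ATOMIC_SLEEP */
        __set_task_state(tsk, TASK_UNINTERRUPTIBLE);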

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: "Paul E. McKenney"
    Cc: Jason Low
    Cc: Michel Lespinasse
    Cc: Tim Chen
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1422257769-14083-2-git-send-email-dave@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
  • The "thread would block" case can be checked without grabbing ->wait.lock.

    [ If the check does not return early then grab the lock and recheck.
    A memory barrier is not needed as complete() and complete_all() imply
    a barrier.

    The ACCESS_ONCE() is needed for calls in a loop that, if inlined, could
    optimize out the re-fetching of x->done. ]
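
    A minimal sketch of the resulting fast path (a hypothetical reconstruction
    of try_wait_for_completion() following the description above):

        bool try_wait_for_completion(struct completion *x)
        {
                unsigned long flags;
                int ret = 1;

                /* Lock-free check of the blocking case: bail out early. */
                if (!ACCESS_ONCE(x->done))
                        return 0;

                spin_lock_irqsave(&x->wait.lock, flags);
                if (!x->done)                   /* recheck under the lock */
                        ret = 0;
                else
                        x->done--;
                spin_unlock_irqrestore(&x->wait.lock, flags);
                return ret;
        }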

    Signed-off-by: Nicholas Mc Guire
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1422013307-13200-1-git-send-email-der.herr@hofr.at
    Signed-off-by: Ingo Molnar

    Nicholas Mc Guire
     
  • Signed-off-by: Nicholas Mc Guire
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1421467534-22834-1-git-send-email-der.herr@hofr.at
    Signed-off-by: Ingo Molnar

    Nicholas Mc Guire
     
  • By the time we wake up and get the lock after being asleep
    in the slowpath, we better be running. As good practice,
    be explicit about this and avoid any mischief.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: "Paul E. McKenney"
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1421717961.4903.11.camel@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • The second 'mutex' shouldn't be there; it can't be about the mutex,
    as the mutex can't be freed, only unlocked. The memory where the
    mutex resides, however, can be freed.

    Signed-off-by: Sharon Dvir
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1422827252-31363-1-git-send-email-sharon.dvir1@mail.huji.ac.il
    Signed-off-by: Ingo Molnar

    Sharon Dvir
     
  • __schedule() disables preemption during its job and re-enables it
    afterward without doing a preemption check to avoid recursion.

    But if an event happens after the context switch which requires
    rescheduling, we need to check again if a task of a higher priority
    needs the CPU. A preempt irq can raise such a situation. To handle that,
    __schedule() loops on need_resched().

    But preempt_schedule_*() functions, which call __schedule(), also loop
    on need_resched() to handle missed preempt irqs. Hence we end up with
    the same loop happening twice.

    Let's simplify that by attributing the need_resched() loop responsibility
    to all __schedule() callers.

    There is a risk that the outer loop now handles reschedules that used
    to be handled by the inner loop, with the added overhead of caller details
    (inc/dec of PREEMPT_ACTIVE, irq save/restore), but assuming those inner
    rescheduling loops weren't too frequent, this shouldn't matter. Especially
    since the whole preemption path is now losing one loop in any case.
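
    Conceptually, each caller ends up with a loop of this shape (a minimal
    sketch):

        /* each __schedule() caller now owns the loop */
        do {
                __schedule();
        } while (need_resched());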

    Suggested-by: Linus Torvalds
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/1422404652-29067-2-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • cpu_active_mask is rarely changed (only on hotplug), so remove this
    operation to gain a little performance.

    If there is a change in cpu_active_mask, rq_online_dl() and
    rq_offline_dl() should take care of it normally, so cpudl::free_cpus
    carries enough information for us.

    For the rare case when a task is put onto a dying cpu (which
    rq_offline_dl() can't handle in a timely fashion), it will be
    handled through _cpu_down()->...->multi_cpu_stop()->migration_call()
    ->migrate_tasks(), preventing the task from hanging on the
    dead cpu.

    Cc: Juri Lelli
    Signed-off-by: Xunlei Pang
    [peterz: changelog]
    Signed-off-by: Peter Zijlstra (Intel)
    Link: http://lkml.kernel.org/r/1421642980-10045-2-git-send-email-pang.xunlei@linaro.org
    Cc: Linus Torvalds
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Xunlei Pang
     
  • Commit 177ef2a6315e ("sched/deadline: Fix a precision problem in
    the microseconds range") forgot to change the UP version of
    hrtick_start(); do so now.
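
    A minimal sketch of the UP variant with the same clamp applied
    (hypothetical; the 10 usec lower bound is an assumption carried over from
    the SMP fix referenced above):

        void hrtick_start(struct rq *rq, u64 delay)
        {
                /* Don't schedule slices shorter than 10000ns; it makes no sense. */
                delay = max_t(u64, delay, 10000LL);
                hrtimer_start(&rq->hrtick_timer, ns_to_ktime(delay),
                              HRTIMER_MODE_REL_PINNED);
        }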

    Signed-off-by: Wanpeng Li
    Fixes: 177ef2a6315e ("sched/deadline: Fix a precision problem in the microseconds range")
    [ Fixed the changelog. ]
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Juri Lelli
    Cc: Kirill Tkhai
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1416962647-76792-7-git-send-email-wanpeng.li@linux.intel.com
    Signed-off-by: Ingo Molnar

    Wanpeng Li
     
  • There is no need to dequeue/enqueue and push/pull if there are
    no scheduling parameters changed for the DL class.

    Both fair and RT classes already check if parameters changed for
    them to avoid unnecessary overhead. This patch adds the parameters-changed
    test for the DL class in order to reduce overhead.
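
    A minimal sketch of the kind of parameters-changed test described (the
    helper name and field list are illustrative assumptions):

        static bool dl_param_changed(struct task_struct *p,
                                     const struct sched_attr *attr)
        {
                struct sched_dl_entity *dl_se = &p->dl;

                return dl_se->dl_runtime  != attr->sched_runtime  ||
                       dl_se->dl_deadline != attr->sched_deadline ||
                       dl_se->dl_period   != attr->sched_period   ||
                       dl_se->flags       != attr->sched_flags;
        }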

    Signed-off-by: Wanpeng Li
    [ Fixed up the changelog. ]
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Juri Lelli
    Cc: Kirill Tkhai
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1416962647-76792-5-git-send-email-wanpeng.li@linux.intel.com
    Signed-off-by: Ingo Molnar

    Wanpeng Li
     
  • When we fail to start the deadline timer in update_curr_dl(), we
    forget to clear ->dl_yielded, resulting in wrecked time keeping.

    The natural place to clear both ->dl_yielded and ->dl_throttled
    is replenish_dl_entity(); both are, after all, waiting for that event.
    Make it so.

    Luckily, since 67dfa1b756f2 ("sched/deadline: Implement
    cancel_dl_timer() to use in switched_from_dl()") the
    task_on_rq_queued() condition in dl_task_timer() must be true, and we
    can therefore call enqueue_task_dl() unconditionally.

    Reported-by: Wanpeng Li
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Kirill Tkhai
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1416962647-76792-4-git-send-email-wanpeng.li@linux.intel.com
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • After update_curr_dl() the current task might not be the leftmost task
    anymore. In that case do not start a new hrtick for it.

    In this case NEED_RESCHED will be set and the next schedule will start
    the hrtick for the new task if and when appropriate.

    Signed-off-by: Wanpeng Li
    Acked-by: Juri Lelli
    [ Rewrote the changelog and comment. ]
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Kirill Tkhai
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1416962647-76792-2-git-send-email-wanpeng.li@linux.intel.com
    Signed-off-by: Ingo Molnar

    Wanpeng Li
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Commit 67dfa1b756f2 ("sched/deadline: Implement cancel_dl_timer() to
    use in switched_from_dl()") removed the hrtimer_try_cancel() function
    call out from init_dl_task_timer(), which gets called from
    __setparam_dl().

    The result is that we can now re-init the timer while it's active --
    this is bad and corrupts timer state.

    Furthermore, changing the parameters of an active deadline task is
    tricky in that you want to maintain guarantees, while an immediately
    effective change would allow one to circumvent the CBS guarantees --
    this too is bad, as one (bad) task should not be able to affect the
    others.

    Rework things to avoid both problems. We only need to initialize the
    timer once, so move that to __sched_fork() for new tasks.

    Then make sure __setparam_dl() doesn't affect the current running
    state but only updates the parameters used to calculate the next
    scheduling period -- this guarantees the CBS functions as expected
    (albeit slightly pessimistic).

    This however means we need to make sure __dl_clear_params() resets the
    active state, otherwise new tasks (and tasks flipping between
    classes) will not properly (re)compute their first instance.

    Todo: close class flipping CBS hole.
    Todo: implement delayed BW release.

    Reported-by: Luca Abeni
    Acked-by: Juri Lelli
    Tested-by: Luca Abeni
    Fixes: 67dfa1b756f2 ("sched/deadline: Implement cancel_dl_timer() to use in switched_from_dl()")
    Signed-off-by: Peter Zijlstra (Intel)
    Cc:
    Cc: Kirill Tkhai
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/20150128140803.GF23038@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

02 Feb, 2015

1 commit

  • Commit 8eb23b9f35aa ("sched: Debug nested sleeps") added code to report
    on nested sleep conditions, which we generally want to avoid because the
    inner sleeping operation can re-set the thread state to TASK_RUNNING,
    but that will then cause the outer sleep loop to not actually sleep when
    it calls schedule().

    However, that's actually valid traditional behavior, with the inner
    sleep being some fairly rare case (like taking a sleeping lock that
    normally doesn't actually need to sleep).

    And the debug code would actually change the state of the task to
    TASK_RUNNING internally, which makes that kind of traditional and
    working code not work at all, because now the nested sleep doesn't just
    sometimes cause the outer one to not block, but will cause it to happen
    every time.

    In particular, it will cause the cardbus kernel daemon (pccardd) to
    basically busy-loop doing scheduling, converting a laptop into a heater,
    as reported by Bruno Prémont. But there may be other legacy uses of
    that nested sleep model in other drivers that are also likely to never
    get converted to the new model.

    This fixes both cases:

    - don't set TASK_RUNNING when the nested condition happens (note: even
    if WARN_ONCE() only _warns_ once, the return value isn't whether the
    warning happened, but whether the condition for the warning was true.
    So despite the warning only happening once, the "if (WARN_ON(..))"
    would trigger for every nested sleep.)

    - in the cases where we knowingly disable the warning by using
    "sched_annotate_sleep()", don't change the task state (that is used
    for all core scheduling decisions); instead use '->task_state_change',
    which is used for the debugging decision itself.

    (Credit for the second part of the fix goes to Oleg Nesterov: "Can't we
    avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds?" with the
    suggested change to use 'task_state_change' as part of the test)
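
    A minimal sketch of what the second part amounts to (a hypothetical form
    of the annotation helper after the change):

        /* Clear only the debug bookkeeping, not the live task state. */
        #define sched_annotate_sleep()  (current->task_state_change = 0)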

    Reported-and-bisected-by: Bruno Prémont
    Tested-by: Rafael J Wysocki
    Acked-by: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Ilya Dryomov
    Cc: Mike Galbraith
    Cc: Ingo Molnar
    Cc: Peter Hurley
    Cc: Davidlohr Bueso
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

31 Jan, 2015

6 commits

  • Pull perf fixes from Ingo Molnar:
    "Mostly tooling fixes, but also an event groups fix, two PMU driver
    fixes and a CPU model variant addition"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf: Tighten (and fix) the grouping condition
    perf/x86/intel: Add model number for Airmont
    perf/rapl: Fix crash in rapl_scale()
    perf/x86/intel/uncore: Move uncore_box_init() out of driver initialization
    perf probe: Fix probing kretprobes
    perf symbols: Introduce 'for' method to iterate over the symbols with a given name
    perf probe: Do not rely on map__load() filter to find symbols
    perf symbols: Introduce method to iterate symbols ordered by name
    perf symbols: Return the first entry with a given name in find_by_name method
    perf annotate: Fix memory leaks in LOCK handling
    perf annotate: Handle ins parsing failures
    perf scripting perl: Force to use stdbool
    perf evlist: Remove extraneous 'was' on error message

    Linus Torvalds
     
  • Currently, cpudl::free_cpus contains all CPUs during init, see
    cpudl_init(). When calling cpudl_find(), we have to add rd->span
    to avoid selecting a cpu outside the current root domain, because
    cpus_allowed cannot be depended on when performing clustered
    scheduling using the cpuset, see find_later_rq().

    This patch adds cpudl_set_freecpu() and cpudl_clear_freecpu() for
    changing cpudl::free_cpus when doing rq_online_dl()/rq_offline_dl(),
    so we can avoid the rd->span operation when calling cpudl_find()
    in find_later_rq().
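
    A minimal sketch of the two helpers described above (hypothetical bodies,
    assuming cpudl::free_cpus is a cpumask):

        void cpudl_set_freecpu(struct cpudl *cp, int cpu)
        {
                cpumask_set_cpu(cpu, cp->free_cpus);
        }

        void cpudl_clear_freecpu(struct cpudl *cp, int cpu)
        {
                cpumask_clear_cpu(cpu, cp->free_cpus);
        }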

    Signed-off-by: Xunlei Pang
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1421642980-10045-1-git-send-email-pang.xunlei@linaro.org
    Signed-off-by: Ingo Molnar

    Xunlei Pang
     
  • cpu_idle_poll() is entered when either cpu_idle_force_poll is set or
    tick_check_broadcast_expired() returns true. The exit condition from
    cpu_idle_poll() is tif_need_resched().

    However this does not take into account scenarios where cpu_idle_force_poll
    changes or tick_check_broadcast_expired() returns false, without setting
    the resched flag. So a cpu will be caught in cpu_idle_poll() needlessly,
    thereby wasting power. Add an explicit check on cpu_idle_force_poll and
    tick_check_broadcast_expired() to the exit condition of cpu_idle_poll()
    to avoid this.
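
    A minimal sketch of the tightened polling loop (a hypothetical
    reconstruction following the exit conditions listed above):

        /* Poll only while one of the entry conditions still holds. */
        while (!tif_need_resched() &&
               (cpu_idle_force_poll || tick_check_broadcast_expired()))
                cpu_relax();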

    Signed-off-by: Preeti U Murthy
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/20150121105655.15279.59626.stgit@preeti.in.ibm.com
    Signed-off-by: Ingo Molnar

    Preeti U Murthy
     
  • If an interrupt fires in cond_resched(), between the call to __schedule()
    and the PREEMPT_ACTIVE count decrement, and that interrupt sets
    TIF_NEED_RESCHED, the call to preempt_schedule_irq() will be ignored
    due to the PREEMPT_ACTIVE count. This kind of scenario, with irq preemption
    being delayed because it's interrupting a preempt-disabled area, is
    usually fixed up after preemption is re-enabled, with an explicit
    call to preempt_schedule().

    This is what preempt_enable() does, but a raw preempt count decrement as
    performed by __preempt_count_sub(PREEMPT_ACTIVE) doesn't handle the delayed
    preemption check. Therefore when such a race happens, the rescheduling
    is going to be delayed until the next scheduler or preemption entry point.
    This can be a problem for scheduler-latency-sensitive workloads.

    Let's fix that by consolidating cond_resched() with preempt_schedule()
    internals.
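
    A minimal sketch of a consolidated helper both paths could share (the name
    and exact shape are illustrative assumptions):

        static void __sched notrace preempt_schedule_common(void)
        {
                do {
                        __preempt_count_add(PREEMPT_ACTIVE);
                        __schedule();
                        __preempt_count_sub(PREEMPT_ACTIVE);

                        /* Check again in case we missed a preemption
                         * opportunity between schedule and now. */
                        barrier();
                } while (need_resched());
        }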

    Reported-by: Linus Torvalds
    Reported-by: Ingo Molnar
    Original-patch-by: Ingo Molnar
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Peter Zijlstra (Intel)
    Link: http://lkml.kernel.org/r/1421946484-9298-1-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • This patch adds checks that prevent futile attempts to move rt tasks
    to a CPU with active tasks of equal or higher priority.

    This reduces run queue lock contention and improves the performance of
    a well known OLTP benchmark by 0.7%.
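
    A minimal sketch of the added check (hypothetical placement, before taking
    the target rq lock in the push path):

        if (lowest_rq->rt.highest_prio.curr <= task->prio) {
                /* The target already runs something of equal or higher
                 * priority; retrying won't release any lock and is
                 * unlikely to yield a different result. */
                lowest_rq = NULL;
                break;
        }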

    Signed-off-by: Tim Chen
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Shawn Bohrer
    Cc: Suruchi Kadu
    Cc: Doug Nelson
    Cc: Steven Rostedt
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1421430374.2399.27.camel@schen9-desk2.jf.intel.com
    Signed-off-by: Ingo Molnar

    Tim Chen
     
  • Merge all pending fixes and refresh the tree, before applying new changes.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

28 Jan, 2015

1 commit