Eric Lee / smarc-fsl-linux-kernel

12 Nov, 2016

3 commits

b9f659b81 Merge tag 'pm-4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm ... Browse Code »

Pull power management fixes from Rafael Wysocki:
"These fix two bugs in error code paths in the PM core (system-wide
suspend of devices), a device reference leak in the boot-time suspend
test code and a cpupower utility regression from the 4.7 cycle.

Specifics:

- Prevent the PM core from attempting to suspend parent devices if
any of their children, whose suspend callbacks were invoked
asynchronously, have failed to suspend during the "late" and
"noirq" phases of system-wide suspend of devices (Brian Norris).

- Prevent the boot-time system suspend test code from leaking a
reference to the RTC device used by it (Johan Hovold).

- Fix cpupower to use the return value of one of its library
functions correctly and restore the correct behavior of it when
used for setting cpufreq tunables broken during the 4.7 development
cycle (Laura Abbott)"

* tag 'pm-4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / sleep: don't suspend parent when async child suspend_{noirq, late} fails
PM / sleep: fix device reference leak in test_suspend
cpupower: Correct return type of cpu_power_is_cpu_online() in cpufreq-set

Linus Torvalds
2016-11-12 08:54:23 +0800
cd16f3dcd Merge branches 'pm-tools-fixes' and 'pm-sleep-fixes' ... Browse Code »

* pm-tools-fixes:
cpupower: Correct return type of cpu_power_is_cpu_online() in cpufreq-set

* pm-sleep-fixes:
PM / sleep: don't suspend parent when async child suspend_{noirq, late} fails
PM / sleep: fix device reference leak in test_suspend

Rafael J. Wysocki
2016-11-12 06:24:58 +0800
c6c7d83b9 Revert "console: don't prefer first registered if DT specifies stdout-path" ... Browse Code »

This reverts commit 05fd007e4629 ("console: don't prefer first
registered if DT specifies stdout-path").

The reverted commit changes existing behavior on which many ARM boards
rely. Many ARM small-board-computers, like e.g. the Raspberry Pi have
both a video output and a serial console. Depending on whether the user
is using the device as a more regular computer; or as a headless device
we need to have the console on either one or the other.

Many users rely on the kernel behavior of the console being present on
both outputs, before the reverted commit the console setup with no
console= kernel arguments on an ARM board which sets stdout-path in dt
would look like this:

[root@localhost ~]# cat /proc/consoles
ttyS0 -W- (EC p a) 4:64
tty0 -WU (E p ) 4:1

Where as after the reverted commit, it looks like this:

[root@localhost ~]# cat /proc/consoles
ttyS0 -W- (EC p a) 4:64

This commit reverts commit 05fd007e4629 ("console: don't prefer first
registered if DT specifies stdout-path") restoring the original
behavior.

Fixes: 05fd007e4629 ("console: don't prefer first registered if DT specifies stdout-path")
Link: http://lkml.kernel.org/r/20161104121135.4780-2-hdegoede@redhat.com
Signed-off-by: Hans de Goede
Cc: Paul Burton
Cc: Rob Herring
Cc: Frank Rowand
Cc: Thorsten Leemhuis
Cc: Greg Kroah-Hartman
Cc: Tejun Heo
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hans de Goede
2016-11-12 00:12:37 +0800

06 Nov, 2016

1 commit

ffbcbfca8 Merge branches 'sched-urgent-for-linus' and 'core-urgent-for-linus' of git://git… ... Browse Code »

….kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull stack vmap fixups from Thomas Gleixner:
"Two small patches related to sched_show_task():

- make sure to hold a reference on the task stack while accessing it

- remove the thread_saved_pc printout

.. and add a sanity check into release_task_stack() to catch problems
with task stack references"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Remove pointless printout in sched_show_task()
sched/core: Fix oops in sched_show_task()

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
fork: Add task stack refcounting sanity check and prevent premature task stack freeing

Linus Torvalds
2016-11-06 02:46:02 +0800

03 Nov, 2016

2 commits

8243d5597 sched/core: Remove pointless printout in sched_show_task() ... Browse Code »

In sched_show_task() we print out a useless hex number, not even a
symbol, and there's a big question mark whether this even makes sense
anyway, I suspect we should just remove it all.

Signed-off-by: Linus Torvalds
Acked-by: Andy Lutomirski
Cc: Peter Zijlstra
Cc: Tetsuo Handa
Cc: Thomas Gleixner
Cc: bp@alien8.de
Cc: brgerst@gmail.com
Cc: jann@thejh.net
Cc: keescook@chromium.org
Cc: linux-api@vger.kernel.org
Cc: tycho.andersen@canonical.com
Link: http://lkml.kernel.org/r/CA+55aFzphURPFzAvU4z6Moy7ZmimcwPuUdYU8bj9z0J+S8X1rw@mail.gmail.com
Signed-off-by: Ingo Molnar

Linus Torvalds
2016-11-03 14:31:34 +0800
382005027 sched/core: Fix oops in sched_show_task() ... Browse Code »

When CONFIG_THREAD_INFO_IN_TASK=y, it is possible that an exited thread
remains in the task list after its stack pointer was already set to NULL.

Therefore, thread_saved_pc() and stack_not_used() in sched_show_task()
will trigger NULL pointer dereference if an attempt to dump such thread's
traces (e.g. SysRq-t, khungtaskd) is made.

Since show_stack() in sched_show_task() calls try_get_task_stack() and
sched_show_task() is called from interrupt context, calling
try_get_task_stack() from sched_show_task() will be safe as well.

Signed-off-by: Tetsuo Handa
Acked-by: Andy Lutomirski
Acked-by: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: bp@alien8.de
Cc: brgerst@gmail.com
Cc: jann@thejh.net
Cc: keescook@chromium.org
Cc: linux-api@vger.kernel.org
Cc: tycho.andersen@canonical.com
Link: http://lkml.kernel.org/r/201611021950.FEJ34368.HFFJOOMLtQOVSF@I-love.SAKURA.ne.jp
Signed-off-by: Ingo Molnar

Tetsuo Handa
2016-11-03 14:27:34 +0800

02 Nov, 2016

1 commit

ceb75787b PM / sleep: fix device reference leak in test_suspend ... Browse Code »

Make sure to drop the reference taken by class_find_device() after
opening the RTC device.

Fixes: 77437fd4e61f (pm: boot time suspend selftest)
Signed-off-by: Johan Hovold
Signed-off-by: Rafael J. Wysocki

Johan Hovold
2016-11-02 12:10:04 +0800

01 Nov, 2016

1 commit

405c07597 fork: Add task stack refcounting sanity check and prevent premature task stack freeing ... Browse Code »

If something goes wrong with task stack refcounting and a stack
refcount hits zero too early, warn and leak it rather than
potentially freeing it early (and silently).

Signed-off-by: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/f29119c783a9680a4b4656e751b6123917ace94b.1477926663.git.luto@kernel.org
Signed-off-by: Ingo Molnar

Andy Lutomirski
2016-11-01 14:39:17 +0800

29 Oct, 2016

4 commits

b546e0c28 Merge tag 'pm-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm ... Browse Code »

Pull power management fixes from Rafael Wysocki:
"These fix two intel_pstate issues related to the way it works when the
scaling_governor sysfs attribute is set to "performance" and fix up
messages in the system suspend core code.

Specifics:

- Fix a missing KERN_CONT in a system suspend message by converting
the affected code to using pr_info() and pr_cont() instead of the
"raw" printk() (Jon Hunter).

- Make intel_pstate set the CPU P-state from its .set_policy()
callback when the scaling_governor sysfs attribute is set to
"performance" so that it interacts with NOHZ_FULL more predictably
which was the case before 4.7 (Rafael Wysocki).

- Make intel_pstate always request the maximum allowed P-state when
the scaling_governor sysfs attribute is set to "performance" to
prevent it from effectively ingoring that setting is some
situations (Rafael Wysocki)"

* tag 'pm-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Always set max P-state in performance mode
PM / suspend: Fix missing KERN_CONT for suspend message
cpufreq: intel_pstate: Set P-state upfront in performance mode

Linus Torvalds
2016-10-29 09:29:13 +0800
b49c3170b Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull perf fixes from Ingo Molnar:
"Misc kernel fixes: a virtualization environment related fix, an uncore
PMU driver removal handling fix, a PowerPC fix and new events for
Knights Landing"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Honour the CPUID for number of fixed counters in hypervisors
perf/powerpc: Don't call perf_event_disable() from atomic context
perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CONFIG_DEBUG_TEST_DRIVER_REMOVE=y kernel panic
perf/x86/intel/cstate: Add C-state residency events for Knights Landing

Linus Torvalds
2016-10-29 07:27:16 +0800
a8006bd91 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull timer fixes from Ingo Molnar:
"Fix four timer locking races: two were noticed by Linus while
reviewing the code while chasing for a corruption bug, and two
from fixing spurious USB timeouts"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers: Prevent base clock corruption when forwarding
timers: Prevent base clock rewind when forwarding clock
timers: Lock base for same bucket optimization
timers: Plug locking race vs. timer migration

Linus Torvalds
2016-10-29 02:26:01 +0800
965c4b7e1 Merge branches 'core-urgent-for-linus', 'irq-urgent-for-linus' and 'sched-urgent… ... Browse Code »

…-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool, irq and scheduler fixes from Ingo Molnar:
"One more objtool fixlet for GCC6 code generation patterns, an irq
DocBook fix and an unused variable warning fix in the scheduler"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix rare switch jump table pattern detection

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
doc: Add missing parameter for msi_setup

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Remove unused but set variable 'rq'

Linus Torvalds
2016-10-29 01:12:27 +0800

28 Oct, 2016

5 commits

5aab90ce1 perf/powerpc: Don't call perf_event_disable() from atomic context ... Browse Code »

The trinity syscall fuzzer triggered following WARN() on powerpc:

WARNING: CPU: 9 PID: 2998 at arch/powerpc/kernel/hw_breakpoint.c:278
...
NIP [c00000000093aedc] .hw_breakpoint_handler+0x28c/0x2b0
LR [c00000000093aed8] .hw_breakpoint_handler+0x288/0x2b0
Call Trace:
[c0000002f7933580] [c00000000093aed8] .hw_breakpoint_handler+0x288/0x2b0 (unreliable)
[c0000002f7933630] [c0000000000f671c] .notifier_call_chain+0x7c/0xf0
[c0000002f79336d0] [c0000000000f6abc] .__atomic_notifier_call_chain+0xbc/0x1c0
[c0000002f7933780] [c0000000000f6c40] .notify_die+0x70/0xd0
[c0000002f7933820] [c00000000001a74c] .do_break+0x4c/0x100
[c0000002f7933920] [c0000000000089fc] handle_dabr_fault+0x14/0x48

Followed by a lockdep warning:

===============================
[ INFO: suspicious RCU usage. ]
4.8.0-rc5+ #7 Tainted: G W
-------------------------------
./include/linux/rcupdate.h:556 Illegal context switch in RCU read-side critical section!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
2 locks held by ls/2998:
#0: (rcu_read_lock){......}, at: [] .__atomic_notifier_call_chain+0x0/0x1c0
#1: (rcu_read_lock){......}, at: [] .hw_breakpoint_handler+0x0/0x2b0

stack backtrace:
CPU: 9 PID: 2998 Comm: ls Tainted: G W 4.8.0-rc5+ #7
Call Trace:
[c0000002f7933150] [c00000000094b1f8] .dump_stack+0xe0/0x14c (unreliable)
[c0000002f79331e0] [c00000000013c468] .lockdep_rcu_suspicious+0x138/0x180
[c0000002f7933270] [c0000000001005d8] .___might_sleep+0x278/0x2e0
[c0000002f7933300] [c000000000935584] .mutex_lock_nested+0x64/0x5a0
[c0000002f7933410] [c00000000023084c] .perf_event_ctx_lock_nested+0x16c/0x380
[c0000002f7933500] [c000000000230a80] .perf_event_disable+0x20/0x60
[c0000002f7933580] [c00000000093aeec] .hw_breakpoint_handler+0x29c/0x2b0
[c0000002f7933630] [c0000000000f671c] .notifier_call_chain+0x7c/0xf0
[c0000002f79336d0] [c0000000000f6abc] .__atomic_notifier_call_chain+0xbc/0x1c0
[c0000002f7933780] [c0000000000f6c40] .notify_die+0x70/0xd0
[c0000002f7933820] [c00000000001a74c] .do_break+0x4c/0x100
[c0000002f7933920] [c0000000000089fc] handle_dabr_fault+0x14/0x48

While it looks like the first WARN() is probably valid, the other one is
triggered by disabling event via perf_event_disable() from atomic context.

The event is disabled here in case we were not able to emulate
the instruction that hit the breakpoint. By disabling the event
we unschedule the event and make sure it's not scheduled back.

But we can't call perf_event_disable() from atomic context, instead
we need to use the event's pending_disable irq_work method to disable it.

Reported-by: Jan Stancek
Signed-off-by: Jiri Olsa
Signed-off-by: Peter Zijlstra (Intel)
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Huang Ying
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Michael Neuling
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/20161026094824.GA21397@krava
Signed-off-by: Ingo Molnar

Jiri Olsa
2016-10-28 17:06:25 +0800
0933840ac perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CON… ... Browse Code »

…FIG_DEBUG_TEST_DRIVER_REMOVE=y kernel panic

CAI Qian reported a crash in the PMU uncore device removal code,
enabled by the CONFIG_DEBUG_TEST_DRIVER_REMOVE=y option:

https://marc.info/?l=linux-kernel&m=147688837328451

The reason for the crash is that perf_pmu_unregister() tries to remove
a PMU device which is not added at this point. We add PMU devices
only after pmu_bus is registered, which happens in the
perf_event_sysfs_init() call and sets the 'pmu_bus_running' flag.

The fix is to get the 'pmu_bus_running' flag state at the point
the PMU is taken out of the PMU list and remove the device
later only if it's set.

Reported-by: CAI Qian <caiqian@redhat.com>
Tested-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161020111011.GA13361@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Jiri Olsa
2016-10-28 17:06:25 +0800
b274c0bb3 kcov: properly check if we are in an interrupt ... Browse Code »

in_interrupt() returns a nonzero value when we are either in an
interrupt or have bh disabled via local_bh_disable(). Since we are
interested in only ignoring coverage from actual interrupts, do a proper
check instead of just calling in_interrupt().

As a result of this change, kcov will start to collect coverage from
within local_bh_disable()/local_bh_enable() sections.

Link: http://lkml.kernel.org/r/1476115803-20712-1-git-send-email-andreyknvl@google.com
Signed-off-by: Andrey Konovalov
Acked-by: Dmitry Vyukov
Cc: Nicolai Stange
Cc: Andrey Ryabinin
Cc: Kees Cook
Cc: James Morse
Cc: Vegard Nossum
Cc: Quentin Casasnovas
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrey Konovalov
2016-10-28 09:43:42 +0800
9c953d639 Merge branch 'for-linus' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block fixes from Jens Axboe:
"A set of fixes for this series, most notably the fix for the blk-mq
software queue regression in from this merge window.

Apart from that, a fix for an unlikely hang if a queue is flooded with
FUA requests from Ming, and a few small fixes for nbd and badblocks.
Lastly, a rename update for the proc softirq output, since the block
polling code was made generic"

* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: update hardware and software queues for sleeping alloc
block: flush: fix IO hang in case of flood fua req
nbd: fix incorrect unlock of nbd->sock_lock in sock_shutdown
badblocks: badblocks_set/clear update unacked_exist
softirq: Display IRQ_POLL for irq-poll statistics

Linus Torvalds
2016-10-28 01:05:31 +0800
9dcb8b685 mm: remove per-zone hashtable of bitlock waitqueues ... Browse Code »

The per-zone waitqueues exist because of a scalability issue with the
page waitqueues on some NUMA machines, but it turns out that they hurt
normal loads, and now with the vmalloced stacks they also end up
breaking gfs2 that uses a bit_wait on a stack object:

wait_on_bit(&gh->gh_iflags, HIF_WAIT, TASK_UNINTERRUPTIBLE)

where 'gh' can be a reference to the local variable 'mount_gh' on the
stack of fill_super().

The reason the per-zone hash table breaks for this case is that there is
no "zone" for virtual allocations, and trying to look up the physical
page to get at it will fail (with a BUG_ON()).

It turns out that I actually complained to the mm people about the
per-zone hash table for another reason just a month ago: the zone lookup
also hurts the regular use of "unlock_page()" a lot, because the zone
lookup ends up forcing several unnecessary cache misses and generates
horrible code.

As part of that earlier discussion, we had a much better solution for
the NUMA scalability issue - by just making the page lock have a
separate contention bit, the waitqueue doesn't even have to be looked at
for the normal case.

Peter Zijlstra already has a patch for that, but let's see if anybody
even notices. In the meantime, let's fix the actual gfs2 breakage by
simplifying the bitlock waitqueues and removing the per-zone issue.

Reported-by: Andreas Gruenbacher
Tested-by: Bob Peterson
Acked-by: Mel Gorman
Cc: Peter Zijlstra
Cc: Andy Lutomirski
Cc: Steven Whitehouse
Signed-off-by: Linus Torvalds

Linus Torvalds
2016-10-28 00:27:57 +0800

27 Oct, 2016

1 commit

f5d6d2da0 sched/fair: Remove unused but set variable 'rq' ... Browse Code »

Since commit:

8663e24d56dc ("sched/fair: Reorder cgroup creation code")

... the variable 'rq' in alloc_fair_sched_group() is set but no longer used.
Remove it to fix the following GCC warning when building with 'W=1':

kernel/sched/fair.c:8842:13: warning: variable ‘rq’ set but not used [-Wunused-but-set-variable]

Signed-off-by: Tobias Klauser
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/20161026113704.8981-1-tklauser@distanz.ch
Signed-off-by: Ingo Molnar

Tobias Klauser
2016-10-27 14:33:52 +0800

25 Oct, 2016

4 commits

6bad6bccf timers: Prevent base clock corruption when forwarding ... Browse Code »

When a timer is enqueued we try to forward the timer base clock. This
mechanism has two issues:

1) Forwarding a remote base unlocked

The forwarding function is called from get_target_base() with the current
timer base lock held. But if the new target base is a different base than
the current base (can happen with NOHZ, sigh!) then the forwarding is done
on an unlocked base. This can lead to corruption of base->clk.

Solution is simple: Invoke the forwarding after the target base is locked.

2) Possible corruption due to jiffies advancing

This is similar to the issue in get_net_timer_interrupt() which was fixed
in the previous patch. jiffies can advance between check and assignement
and therefore advancing base->clk beyond the next expiry value.

So we need to read jiffies into a local variable once and do the checks and
assignment with the local copy.

Fixes: a683f390b93f("timers: Forward the wheel clock whenever possible")
Reported-by: Ashton Holmes
Reported-by: Michael Thayer
Signed-off-by: Thomas Gleixner
Cc: Michal Necasek
Cc: Peter Zijlstra
Cc: knut.osmundsen@oracle.com
Cc: stable@vger.kernel.org
Cc: stern@rowland.harvard.edu
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20161022110552.253640125@linutronix.de
Signed-off-by: Thomas Gleixner

Thomas Gleixner
2016-10-25 22:32:50 +0800
041ad7bc7 timers: Prevent base clock rewind when forwarding clock ... Browse Code »

Ashton and Michael reported, that kernel versions 4.8 and later suffer from
USB timeouts which are caused by the timer wheel rework.

This is caused by a bug in the base clock forwarding mechanism, which leads
to timers expiring early. The scenario which leads to this is:

run_timers()
while (jiffies >= base->clk) {
collect_expired_timers();
base->clk++;
expire_timers();
}

So base->clk = jiffies + 1. Now the cpu goes idle:

idle()
get_next_timer_interrupt()
nextevt = __next_time_interrupt();
if (time_after(nextevt, base->clk))
base->clk = jiffies;

jiffies has not advanced since run_timers(), so this assignment effectively
decrements base->clk by one.

base->clk is the index into the timer wheel arrays. So let's assume the
following state after the base->clk increment in run_timers():

jiffies = 0
base->clk = 1

A timer gets enqueued with an expiry delta of 63 ticks (which is the case
with the USB timeout and HZ=250) so the resulting bucket index is:

base->clk + delta = 1 + 63 = 64

The timer goes into the first wheel level. The array size is 64 so it ends
up in bucket 0, which is correct as it takes 63 ticks to advance base->clk
to index into bucket 0 again.

If the cpu goes idle before jiffies advance, then the bug in the forwarding
mechanism sets base->clk back to 0, so the next invocation of run_timers()
at the next tick will index into bucket 0 and therefore expire the timer 62
ticks too early.

Instead of blindly setting base->clk to jiffies we must make the forwarding
conditional on jiffies > base->clk, but we cannot use jiffies for this as
we might run into the following issue:

if (time_after(jiffies, base->clk) {
if (time_after(nextevt, base->clk))
base->clk = jiffies;

jiffies can increment between the check and the assigment far enough to
advance beyond nextevt. So we need to use a stable value for checking.

get_next_timer_interrupt() has the basej argument which is the jiffies
value snapshot taken in the calling code. So we can just that.

Thanks to Ashton for bisecting and providing trace data!

Fixes: a683f390b93f ("timers: Forward the wheel clock whenever possible")
Reported-by: Ashton Holmes
Reported-by: Michael Thayer
Signed-off-by: Thomas Gleixner
Cc: Michal Necasek
Cc: Peter Zijlstra
Cc: knut.osmundsen@oracle.com
Cc: stable@vger.kernel.org
Cc: stern@rowland.harvard.edu
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20161022110552.175308322@linutronix.de
Signed-off-by: Thomas Gleixner

Thomas Gleixner
2016-10-25 22:32:50 +0800
4da9152a4 timers: Lock base for same bucket optimization ... Browse Code »

Linus stumbled over the unlocked modification of the timer expiry value in
mod_timer() which is an optimization for timers which stay in the same
bucket - due to the bucket granularity - despite their expiry time getting
updated.

The optimization itself still makes sense even if we take the lock, because
in case that the bucket stays the same, we avoid the pointless
queue/enqueue dance.

Make the check and the modification of timer->expires protected by the base
lock and shuffle the remaining code around so we can keep the lock held
when we actually have to requeue the timer to a different bucket.

Fixes: f00c0afdfa62 ("timers: Implement optimization for same expiry time in mod_timer()")
Reported-by: Linus Torvalds
Signed-off-by: Thomas Gleixner
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610241711220.4983@nanos
Cc: stable@vger.kernel.org
Cc: Andrew Morton
Cc: Peter Zijlstra

Thomas Gleixner
2016-10-25 22:27:39 +0800
b831275a3 timers: Plug locking race vs. timer migration ... Browse Code »

Linus noticed that lock_timer_base() lacks a READ_ONCE() for accessing the
timer flags. As a consequence the compiler is allowed to reload the flags
between the initial check for TIMER_MIGRATION and the following timer base
computation and the spin lock of the base.

While this has not been observed (yet), we need to make sure that it never
happens.

Fixes: 0eeda71bc30d ("timer: Replace timer base by a cpu index")
Reported-by: Linus Torvalds
Signed-off-by: Thomas Gleixner
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610241711220.4983@nanos
Cc: stable@vger.kernel.org
Cc: Andrew Morton
Cc: Peter Zijlstra

Thomas Gleixner
2016-10-25 22:27:39 +0800

24 Oct, 2016

1 commit

1adb469b9 PM / suspend: Fix missing KERN_CONT for suspend message ... Browse Code »

Commit 4bcc595ccd80 (printk: reinstate KERN_CONT for printing
continuation lines) exposed a missing KERN_CONT from one of the
messages shown on entering suspend. With v4.9-rc1, the 'done.' shown
after syncing the filesystems no longer appears as a continuation but
a new message with its own timestamp.

[ 9.259566] PM: Syncing filesystems ... [ 9.264119] done.

Fix this by adding the KERN_CONT log level for the 'done.' part of the
message seen after syncing filesystems. While we are at it, convert
these suspend printks to pr_info and pr_cont, respectively.

Signed-off-by: Jon Hunter
Signed-off-by: Rafael J. Wysocki

Jon Hunter
2016-10-24 20:38:02 +0800

23 Oct, 2016

1 commit

bfb7bfef6 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull irq fixes from Ingo Molnar:
"Mostly irqchip driver fixes, plus a symbol export"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kernel/irq: Export irq_set_parent()
irqchip/gic: Add missing \n to CPU IF adjustment message
irqchip/jcore: Don't show Kconfig menu item for driver
irqchip/eznps: Drop pointless static qualifier in nps400_of_init()
irqchip/gic-v3-its: Fix entry size mask for GITS_BASER
irqchip/gic-v3-its: Fix 64bit GIC{R,ITS}_TYPER accesses

Linus Torvalds
2016-10-23 00:33:51 +0800

22 Oct, 2016

1 commit

f660f6066 softirq: Display IRQ_POLL for irq-poll statistics ... Browse Code »

This library was moved to the generic area and was
renamed to irq-poll. Hence, update proc/softirqs output accordingly.

Signed-off-by: Sagi Grimberg
Reviewed-by: Johannes Thumshirn
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Sagi Grimberg
2016-10-22 05:45:47 +0800

21 Oct, 2016

1 commit

3118dac50 kernel/irq: Export irq_set_parent() ... Browse Code »

The TPS65217 driver grew interrupt support which uses
irq_set_parent(). While it's not yet clear why this is used in the first
place, building the driver as a module fails with:

ERROR: ".irq_set_parent" [drivers/mfd/tps65217.ko] undefined!

The correctness of the driver change is still investigated, but for now
it's less trouble to export irq_set_parent() than dealing with the build
wreckage.

[ tglx: Rewrote changelog and made the export GPL ]

Fixes: 6556bdacf646 ("mfd: tps65217: Add support for IRQs")
Signed-off-by: Sudip Mukherjee
Cc: Sudip Mukherjee
Cc: Marcin Niestroj
Cc: Grygorii Strashko
Cc: Tony Lindgren
Cc: Lee Jones
Link: http://lkml.kernel.org/r/1475775403-27207-1-git-send-email-sudipm.mukherjee@gmail.com
Signed-off-by: Thomas Gleixner

Sudip Mukherjee
2016-10-21 16:21:38 +0800

20 Oct, 2016

2 commits

893e2c5c9 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull scheduler fix from Ingo Molnar:
"This fixes a group scheduling related performance/interactivity
regression introduced in v4.8, which affects certain hardware
environments where cpu_possible_mask != cpu_present_mask"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix incorrect task group ->load_avg

Linus Torvalds
2016-10-20 01:03:55 +0800
8835ca59d printk: suppress empty continuation lines ... Browse Code »

We have a fairly common pattern where you print several things as
continuations on one single line in a loop, and then at the end you do

printk(KERN_CONT "\n");

to flush the buffered output.

But if the output was flushed by something else (concurrent printk
activity, or just system logging), we don't want that final flushing to
just print an empty line.

So just suppress empty continuation lines when they couldn't be merged
into the line they are a continuation of.

Signed-off-by: Linus Torvalds

Linus Torvalds
2016-10-20 00:11:24 +0800

19 Oct, 2016

6 commits

63ae602ce Merge branch 'gup_flag-cleanups' ... Browse Code »

Merge the gup_flags cleanups from Lorenzo Stoakes:
"This patch series adjusts functions in the get_user_pages* family such
that desired FOLL_* flags are passed as an argument rather than
implied by flags.

The purpose of this change is to make the use of FOLL_FORCE explicit
so it is easier to grep for and clearer to callers that this flag is
being used. The use of FOLL_FORCE is an issue as it overrides missing
VM_READ/VM_WRITE flags for the VMA whose pages we are reading
from/writing to, which can result in surprising behaviour.

The patch series came out of the discussion around commit 38e088546522
("mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing"),
which addressed a BUG_ON() being triggered when a page was faulted in
with PROT_NONE set but having been overridden by FOLL_FORCE.
do_numa_page() was run on the assumption the page _must_ be one marked
for NUMA node migration as an actual PROT_NONE page would have been
dealt with prior to this code path, however FOLL_FORCE introduced a
situation where this assumption did not hold.

See

https://marc.info/?l=linux-mm&m=147585445805166

for the patch proposal"

Additionally, there's a fix for an ancient bug related to FOLL_FORCE and
FOLL_WRITE by me.

[ This branch was rebased recently to add a few more acked-by's and
reviewed-by's ]

* gup_flag-cleanups:
mm: replace access_process_vm() write parameter with gup_flags
mm: replace access_remote_vm() write parameter with gup_flags
mm: replace __access_remote_vm() write parameter with gup_flags
mm: replace get_user_pages_remote() write/force parameters with gup_flags
mm: replace get_user_pages() write/force parameters with gup_flags
mm: replace get_vaddr_frames() write/force parameters with gup_flags
mm: replace get_user_pages_locked() write/force parameters with gup_flags
mm: replace get_user_pages_unlocked() write/force parameters with gup_flags
mm: remove write/force parameters from __get_user_pages_unlocked()
mm: remove write/force parameters from __get_user_pages_locked()
mm: remove gup_flags FOLL_WRITE games from __get_user_pages()

Linus Torvalds
2016-10-19 23:39:47 +0800
f307ab6dc mm: replace access_process_vm() write parameter with gup_flags ... Browse Code »

This removes the 'write' argument from access_process_vm() and replaces
it with 'gup_flags' as use of this function previously silently implied
FOLL_FORCE, whereas after this patch callers explicitly pass this flag.

We make this explicit as use of FOLL_FORCE can result in surprising
behaviour (and hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes
Acked-by: Jesper Nilsson
Acked-by: Michal Hocko
Acked-by: Michael Ellerman
Signed-off-by: Linus Torvalds

Lorenzo Stoakes
2016-10-19 23:31:25 +0800
9beae1ea8 mm: replace get_user_pages_remote() write/force parameters with gup_flags ... Browse Code »

This removes the 'write' and 'force' from get_user_pages_remote() and
replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in
callers as use of this flag can result in surprising behaviour (and
hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes
Acked-by: Michal Hocko
Reviewed-by: Jan Kara
Signed-off-by: Linus Torvalds

Lorenzo Stoakes
2016-10-19 23:12:02 +0800
b5a9b3407 sched/fair: Fix incorrect task group ->load_avg ... Browse Code »

A scheduler performance regression has been reported by Joseph Salisbury,
which he bisected back to:

3d30544f0212 ("sched/fair: Apply more PELT fixes)

The regression triggers when several levels of task groups are involved
(read: SystemD) and cpu_possible_mask != cpu_present_mask.

The root cause is that group entity's load (tg_child->se[i]->avg.load_avg)
is initialized to scale_load_down(se->load.weight). During the creation of
a child task group, its group entities on possible CPUs are attached to
parent's cfs_rq (tg_parent) and their loads are added to the parent's load
(tg_parent->load_avg) with update_tg_load_avg().

But only the load on online CPUs will then be updated to reflect real load,
whereas load on other CPUs will stay at the initial value.

The result is a tg_parent->load_avg that is higher than the real load, the
weight of group entities (tg_parent->se[i]->load.weight) on online CPUs is
smaller than it should be, and the task group gets a less running time than
what it could expect.

( This situation can be detected with /proc/sched_debug. The ".tg_load_avg"
of the task group will be much higher than sum of ".tg_load_avg_contrib"
of online cfs_rqs of the task group. )

The load of group entities don't have to be intialized to something else
than 0 because their load will increase when an entity is attached.

Reported-by: Joseph Salisbury
Tested-by: Dietmar Eggemann
Signed-off-by: Vincent Guittot
Acked-by: Peter Zijlstra
Cc: # 4.8.x
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: joonwoop@codeaurora.org
Fixes: 3d30544f0212 ("sched/fair: Apply more PELT fixes)
Link: http://lkml.kernel.org/r/1476881123-10159-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar

Vincent Guittot
2016-10-19 21:04:47 +0800
9d71bdfbc Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull timer fixlet from Ingo Molnar:
"Remove an unused variable"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
alarmtimer: Remove unused but set variable

Linus Torvalds
2016-10-19 00:57:12 +0800
2c11fc87c Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull scheduler fix from Ingo Molnar:
"Fix a crash that can trigger when racing with CPU hotplug: we didn't
use sched-domains data structures carefully enough in select_idle_cpu()"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix sched domains NULL dereference in select_idle_sibling()

Linus Torvalds
2016-10-19 00:53:59 +0800

18 Oct, 2016

1 commit

351267d94 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull misc fixes from Ingo Molnar:
"A CPU hotplug debuggability fix and three objtool false positive
warnings fixes for new GCC6 code generation patterns"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu/hotplug: Use distinct name for cpu_hotplug.dep_map
objtool: Skip all "unreachable instruction" warnings for gcov kernels
objtool: Improve rare switch jump table pattern detection
objtool: Support '-mtune=atom' stack frame setup instruction

Linus Torvalds
2016-10-18 23:35:07 +0800

17 Oct, 2016

1 commit

54e23845e alarmtimer: Remove unused but set variable ... Browse Code »

Remove the set but unused variable base in alarm_clock_get to fix the
following warning when building with 'W=1':

kernel/time/alarmtimer.c: In function ‘alarm_timer_create’:
kernel/time/alarmtimer.c:545:21: warning: variable ‘base’ set but not used [-Wunused-but-set-variable]

Signed-off-by: Tobias Klauser
Cc: John Stultz
Link: http://lkml.kernel.org/r/20161017094702.10873-1-tklauser@distanz.ch
Signed-off-by: Thomas Gleixner

Tobias Klauser
2016-10-17 17:59:37 +0800

16 Oct, 2016

2 commits

a705e07b9 cpu/hotplug: Use distinct name for cpu_hotplug.dep_map ... Browse Code »

Use distinctive name for cpu_hotplug.dep_map to avoid the actual
cpu_hotplug.lock appearing as cpu_hotplug.lock#2 in lockdep splats.

Signed-off-by: Joonas Lahtinen
Reviewed-by: Chris Wilson
Acked-by: Gautham R. Shenoy
Cc: Andrew Morton
Cc: Daniel Vetter
Cc: Gautham R . Shenoy
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: intel-gfx@lists.freedesktop.org
Cc: trivial@kernel.org
Signed-off-by: Ingo Molnar

Joonas Lahtinen
2016-10-16 17:09:32 +0800
9ffc66941 Merge tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux ... Browse Code »

Pull gcc plugins update from Kees Cook:
"This adds a new gcc plugin named "latent_entropy". It is designed to
extract as much possible uncertainty from a running system at boot
time as possible, hoping to capitalize on any possible variation in
CPU operation (due to runtime data differences, hardware differences,
SMP ordering, thermal timing variation, cache behavior, etc).

At the very least, this plugin is a much more comprehensive example
for how to manipulate kernel code using the gcc plugin internals"

* tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
latent_entropy: Mark functions with __latent_entropy
gcc-plugins: Add latent_entropy plugin

Linus Torvalds
2016-10-16 01:03:15 +0800

15 Oct, 2016

1 commit

f34d3606f Merge branch 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup ... Browse Code »

Pull cgroup updates from Tejun Heo:

- tracepoints for basic cgroup management operations added

- kernfs and cgroup path formatting functions updated to behave in the
style of strlcpy()

- non-critical bug fixes

* 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
blkcg: Unlock blkcg_pol_mutex only once when cpd == NULL
cgroup: fix error handling regressions in proc_cgroup_show() and cgroup_release_agent()
cpuset: fix error handling regression in proc_cpuset_show()
cgroup: add tracepoints for basic operations
cgroup: make cgroup_path() and friends behave in the style of strlcpy()
kernfs: remove kernfs_path_len()
kernfs: make kernfs_path*() behave in the style of strlcpy()
kernfs: add dummy implementation of kernfs_path_from_node()

Linus Torvalds
2016-10-15 03:18:50 +0800

13 Oct, 2016

1 commit

ef6000b4c Disable the __builtin_return_address() warning globally after all ... Browse Code »

This affectively reverts commit 377ccbb48373 ("Makefile: Mute warning
for __builtin_return_address(>0) for tracing only") because it turns out
that it really isn't tracing only - it's all over the tree.

We already also had the warning disabled separately for mm/usercopy.c
(which this commit also removes), and it turns out that we will also
want to disable it for get_lock_parent_ip(), that is used for at least
TRACE_IRQFLAGS. Which (when enabled) ends up being all over the tree.

Steven Rostedt had a patch that tried to limit it to just the config
options that actually triggered this, but quite frankly, the extra
complexity and abstraction just isn't worth it. We have never actually
had a case where the warning is actually useful, so let's just disable
it globally and not worry about it.

Acked-by: Steven Rostedt
Cc: Thomas Gleixner
Cc: Andrew Morton
Cc: Ingo Molnar
Cc: Peter Anvin
Signed-off-by: Linus Torvalds

Linus Torvalds
2016-10-13 01:23:41 +0800