Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

09 Jun, 2014

2 commits

1860e3798 Linux 3.15

Linus Torvalds
2014-06-09 02:19:54 +0800
bb077d600 Revert "x86/smpboot: Initialize secondary CPU only if master CPU will wait for it"

This reverts commit 3e1a878b7ccdb31da6d9d2b855c72ad87afeba3f.

It came in very late, and already has one reported failure: Sitsofe
reports that the current tree fails to boot on his EeePC, and bisected
it down to this. Rather than waste time trying to figure out what's
wrong, just revert it.

Reported-by: Sitsofe Wheeler
Cc: Igor Mammedov
Cc: Toshi Kani
Cc: Thomas Gleixner
Acked-by: Ingo Molnar
Signed-off-by: Linus Torvalds

Linus Torvalds
2014-06-09 01:09:49 +0800

08 Jun, 2014

4 commits

c593e8978 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

Pull btrfs fix from Chris Mason:
"I had this in my 3.16 merge window queue, but it is small and obvious
enough for 3.15. I cherry-picked and retested against current rc8"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: send, fix corrupted path strings for long paths

Linus Torvalds
2014-06-08 06:12:18 +0800
052e5c7e2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending

Pull SCSI target fixes from Nicholas Bellinger:
"Here are the remaining fixes for v3.15.

This series includes:

- iser-target fix for ImmediateData exception reference count bug
(Sagi + nab)
- iscsi-target fix for MC/S login + potential iser-target MRDSL
buffer overrun (Santosh + Roland)
- iser-target fix for v3.15-rc multi network portal shutdown
regression (nab)
- target fix for allowing READ_CAPACITY during ALUA Standby access
state (Chris + nab)
- target fix for NULL pointer dereference of alua_access_state for
un-configured devices (Chris + nab)"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
target: Fix alua_access_state attribute OOPs for un-configured devices
target: Allow READ_CAPACITY opcode in ALUA Standby access state
iser-target: Fix multi network portal shutdown regression
iscsi-target: Fix wrong buffer / buffer overrun in iscsi_change_param_value()
iser-target: Add missing target_put_sess_cmd for ImmedateData failure

Linus Torvalds
2014-06-08 06:01:39 +0800
813895f8d Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Peter Anvin:
"A significantly larger than I'd like set of patches for just below the
wire. All of these, however, fix real problems.

The one thing that is genuinely scary in here is the change of SMP
initialization, but that *does* fix a confirmed hang when booting
virtual machines.

There is also a patch to actually do the right thing about not
offlining a CPU when there are not enough interrupt vectors available
in the system; the accounting was done incorrectly. The worst case
for that patch is that we fail to offline CPUs when we should (the new
code is strictly more conservative than the old), so it is not
particularly risky.

Most of the rest is minor stuff; the EFI patches are all about
exporting correct information to boot loaders and kexec"

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: EFI_MIXED should not prohibit loading above 4G
x86/smpboot: Initialize secondary CPU only if master CPU will wait for it
x86/smpboot: Log error on secondary CPU wakeup failure at ERR level
x86: Fix list/memory corruption on CPU hotplug
x86: irq: Get correct available vectors for cpu disable
x86/efi: Do not export efi runtime map in case old map
x86/efi: earlyprintk=efi,keep fix

Linus Torvalds
2014-06-08 05:50:38 +0800
745c51673 x86/boot: EFI_MIXED should not prohibit loading above 4G

commit 7d453eee36ae ("x86/efi: Wire up CONFIG_EFI_MIXED") introduced a
regression for the functionality to load kernels above 4G. The relevant
(incorrect) reasoning behind this change can be seen in the commit
message,

"The xloadflags field in the bzImage header is also updated to reflect
that the kernel supports both entry points by setting both of
XLF_EFI_HANDOVER_32 and XLF_EFI_HANDOVER_64 when CONFIG_EFI_MIXED=y.
XLF_CAN_BE_LOADED_ABOVE_4G is disabled so that the kernel text is
guaranteed to be addressable with 32-bits."

This is obviously bogus since 32-bit EFI loaders will never place the
kernel above the 4G mark. So this restriction is entirely unnecessary.

But things are worse than that - since we want to encourage people to
always compile with CONFIG_EFI_MIXED=y so that their kernels work out of
the box for both 32-bit and 64-bit firmware, commit 7d453eee36ae
effectively disables XLF_CAN_BE_LOADED_ABOVE_4G completely.

Remove the overzealous and superfluous restriction and restore the
XLF_CAN_BE_LOADED_ABOVE_4G functionality.

Cc: "H. Peter Anvin"
Cc: Dave Young
Cc: Vivek Goyal
Signed-off-by: Matt Fleming
Link: http://lkml.kernel.org/r/1402140380-15377-1-git-send-email-matt@console-pimps.org
Signed-off-by: H. Peter Anvin

Matt Fleming
2014-06-08 00:31:00 +0800
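The point of the EFI_MIXED fix is that the above-4G flag and the EFI handover flags are independent: a 32-bit EFI loader never places the kernel above 4G, so advertising XLF_CAN_BE_LOADED_ABOVE_4G costs it nothing. A minimal loader-side sketch, assuming the xloadflags bit values from arch/x86/include/uapi/asm/bootparam.h as we recall them (verify against your tree); `may_load_above_4g()` is a hypothetical helper, not kernel or bootloader code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* xloadflags bits; values as we recall them from
 * arch/x86/include/uapi/asm/bootparam.h (verify against your tree). */
#define XLF_KERNEL_64              (1 << 0)
#define XLF_CAN_BE_LOADED_ABOVE_4G (1 << 1)
#define XLF_EFI_HANDOVER_32        (1 << 2)
#define XLF_EFI_HANDOVER_64        (1 << 3)

/* Hypothetical loader-side decision. Per the commit message, a 32-bit
 * EFI loader never places the kernel above 4G, so only a 64-bit loader
 * consults XLF_CAN_BE_LOADED_ABOVE_4G. The EFI handover bits play no
 * part in this decision. */
static bool may_load_above_4g(uint16_t xloadflags, bool loader_is_64bit)
{
    if (!loader_is_64bit)
        return false;
    return (xloadflags & XLF_CAN_BE_LOADED_ABOVE_4G) != 0;
}
```

With the flag kept set in a CONFIG_EFI_MIXED=y image, a 64-bit loader may use high memory while a 32-bit loader simply never looks at the flag.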

07 Jun, 2014

3 commits

d4c54919e mm: add !pte_present() check on existing hugetlb_entry callbacks

The page table walker doesn't check non-present hugetlb entries in the
common path, so hugetlb_entry() callbacks must check them. The reason for
this behavior is that some callers want to handle them in their own way.

[ I think that reason is bogus, btw - it should just do what the regular
code does, which is to call the "pte_hole()" function for such hugetlb
entries - Linus]

However, some callers don't check it now, which causes unpredictable
results, for example when we have a race between migrating a hugepage and
reading /proc/pid/numa_maps. This patch fixes it by adding !pte_present
checks to the buggy callbacks.

This bug has existed for years and became visible with the introduction
of hugepage migration.

ChangeLog v2:
- fix if condition (check !pte_present() instead of pte_present())

Reported-by: Sasha Levin
Signed-off-by: Naoya Horiguchi
Cc: Rik van Riel
Cc: [3.12+]
Signed-off-by: Andrew Morton
[ Backported to 3.15. Signed-off-by: Josh Boyer ]
Signed-off-by: Linus Torvalds

Naoya Horiguchi
2014-06-07 04:21:16 +0800
01a9a8a9e Btrfs: send, fix corrupted path strings for long paths

If a path has more than 230 characters, we allocate a new buffer to
use for the path, but we were forgetting to copy the contents of the
previous buffer into the new one, which has random content from the
kmalloc call.

Test:

mkfs.btrfs -f /dev/sdd
mount /dev/sdd /mnt

TEST_PATH="/mnt/fdmanana/.config/google-chrome-mysetup/Default/Pepper_Data/Shockwave_Flash/WritableRoot/#SharedObjects/JSHJ4ZKN/s.wsj.net/[[IMPORT]]/players.edgesuite.net/flash/plugins/osmf/advanced-streaming-plugin/v2.7/osmf1.6/Ak#"
mkdir -p $TEST_PATH
echo "hello world" > $TEST_PATH/amaiAdvancedStreamingPlugin.txt

btrfs subvolume snapshot -r /mnt /mnt/mysnap1
btrfs send /mnt/mysnap1 -f /tmp/1.snap

A test for xfstests follows.

Signed-off-by: Filipe David Borba Manana
Cc: Marc Merlin
Tested-by: Marc MERLIN
Signed-off-by: Chris Mason

Filipe Manana
2014-06-07 03:00:46 +0800
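A minimal sketch of the buffer-growth step the Btrfs send fix is about, assuming a simplified path buffer with small inline storage that spills to the heap. `struct path_buf` and the `path_*()` helpers are hypothetical and do not reproduce btrfs's fs_path code (including its 230-character threshold); the essential line is the memcpy() of the old contents into the new allocation:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified path buffer, not btrfs's struct fs_path. */
struct path_buf {
    char   inline_buf[16];  /* small preallocated storage */
    char  *buf;             /* points to inline_buf or to heap memory */
    size_t len;             /* bytes used, excluding the terminating NUL */
    size_t cap;             /* bytes available in buf */
};

static void path_init(struct path_buf *p)
{
    p->buf = p->inline_buf;
    p->buf[0] = '\0';
    p->len = 0;
    p->cap = sizeof(p->inline_buf);
}

/* Grow buf to hold at least `need` bytes, preserving the current contents.
 * Leaving out the memcpy() reproduces the bug: the new buffer would hold
 * whatever malloc() returned. */
static int path_ensure(struct path_buf *p, size_t need)
{
    if (need <= p->cap)
        return 0;
    char *nb = malloc(need);
    if (!nb)
        return -1;
    memcpy(nb, p->buf, p->len + 1);   /* copy old contents, including NUL */
    if (p->buf != p->inline_buf)
        free(p->buf);
    p->buf = nb;
    p->cap = need;
    return 0;
}

static int path_append(struct path_buf *p, const char *s)
{
    size_t n = strlen(s);
    if (path_ensure(p, p->len + n + 1))
        return -1;
    memcpy(p->buf + p->len, s, n + 1);
    p->len += n;
    return 0;
}

static void path_release(struct path_buf *p)
{
    if (p->buf != p->inline_buf)
        free(p->buf);
    path_init(p);
}
```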
d54d14bfb Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
"Four misc fixes: each was deemed serious enough to warrant v3.15
inclusion"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix tg_set_cfs_bandwidth() deadlock on rq->lock
sched/dl: Fix race in dl_task_timer()
sched: Fix sched_policy < 0 comparison
sched/numa: Fix use of spin_{un}lock_irq() when interrupts are disabled

Linus Torvalds
2014-06-07 00:53:32 +0800

06 Jun, 2014

10 commits

624483f3e mm: rmap: fix use-after-free in __put_anon_vma

While working on the address sanitizer for the kernel, I discovered a
use-after-free bug in __put_anon_vma.

For the last anon_vma, anon_vma->root is freed before the child anon_vma.
Later, in anon_vma_free(anon_vma), we reference the already freed
anon_vma->root to check its rwsem.

This fixes it by freeing the child anon_vma before freeing
anon_vma->root.

Signed-off-by: Andrey Ryabinin
Acked-by: Peter Zijlstra
Cc: # v3.0+
Signed-off-by: Linus Torvalds

Andrey Ryabinin
2014-06-06 23:53:41 +0800
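The ordering fix can be modeled outside the kernel. In this sketch, hypothetical `struct fake_anon_vma` objects are marked freed instead of being released, and the free routine asserts that the root is still alive, as anon_vma_free() needs it to be; freeing the child before dropping the child's reference on the root satisfies that:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct anon_vma. A flag marks "freed" so a
 * use-after-free shows up as an assertion failure instead of silent
 * memory corruption. */
struct fake_anon_vma {
    struct fake_anon_vma *root;   /* a root points to itself */
    int refcount;
    bool freed;
};

static void fake_anon_vma_free(struct fake_anon_vma *av)
{
    /* The real anon_vma_free() inspects the root's rwsem, so the root
     * must still be alive when a child is freed. */
    assert(!av->root->freed);
    av->freed = true;
}

/* Drop the last reference on an anon_vma: free it first, then drop the
 * reference it held on its root, freeing the root if that was the last. */
static void put_anon_vma_fixed(struct fake_anon_vma *av)
{
    struct fake_anon_vma *root = av->root;

    if (--av->refcount == 0) {
        fake_anon_vma_free(av);              /* child first: root still valid */
        if (root != av && --root->refcount == 0)
            fake_anon_vma_free(root);        /* root->root == root, not freed yet */
    }
}
```

Swapping the two fake_anon_vma_free() calls makes the assertion fire, which is the use-after-free the commit describes.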
f14537735 target: Fix alua_access_state attribute OOPs for un-configured devices

This patch fixes an OOPs where an attempt to write to the per-device
alua_access_state configfs attribute at:

/sys/kernel/config/target/core/$HBA/$DEV/alua/$TG_PT_GP/alua_access_state

results in a NULL pointer dereference when the backend device has not
yet been configured.

This patch adds an explicit check for DF_CONFIGURED, and fails with
-ENODEV to avoid this case.

Reported-by: Chris Boot
Reported-by: Philip Gaw
Cc: Chris Boot
Cc: Philip Gaw
Cc: stable@vger.kernel.org # 3.8+
Signed-off-by: Nicholas Bellinger

Nicholas Bellinger
2014-06-06 16:22:41 +0800
e7810c2d2 target: Allow READ_CAPACITY opcode in ALUA Standby access state

This patch allows READ_CAPACITY + SAI_READ_CAPACITY_16 opcode
processing to occur while the associated ALUA group is in Standby
access state.

This is required to avoid host side LUN probe failures during the
initial scan if an ALUA group has already implicitly changed into
Standby access state.

This addresses a bug reported by Chris + Philip using dm-multipath
+ ESX hosts configured with ALUA multipath.

Reported-by: Chris Boot
Reported-by: Philip Gaw
Cc: Chris Boot
Cc: Philip Gaw
Cc: Hannes Reinecke
Cc: stable@vger.kernel.org
Signed-off-by: Nicholas Bellinger

Nicholas Bellinger
2014-06-06 16:21:12 +0800
177875423 Merge tag 'efi-urgent' into x86/urgent

* Fix earlyprintk=efi,keep support by switching to an ioremap() mapping
of the framebuffer when early_ioremap() is no longer available and
dropping __init from functions that may be invoked after
free_initmem() - Dave Young

* We shouldn't be exporting the EFI runtime map in sysfs if not using
the new 1:1 EFI mapping code since in that case the mappings are not
static across a kexec reboot - Dave Young

Signed-off-by: H. Peter Anvin

H. Peter Anvin
2014-06-06 04:09:44 +0800
951e27306 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
"Two last minute tooling fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf probe: Fix perf probe to find correct variable DIE
perf probe: Fix a segfault if asked for variable it doesn't find

Linus Torvalds
2014-06-06 03:51:05 +0800
1c5aefb5b Merge branch 'futex-fixes' (futex fixes from Thomas Gleixner)

Merge futex fixes from Thomas Gleixner:
"So with more awake and less futex wreckaged brain, I went through my
list of points again and came up with the following 4 patches.

1) Prevent pi requeueing on the same futex

I kept Kees' check for uaddr1 == uaddr2 as an early check for private
futexes and added a key comparison to both futex_requeue and
futex_wait_requeue_pi.

Sebastian, sorry for the confusion yesterday night. I really
misunderstood your question.

You are right the check is pointless for shared futexes where the
same physical address is mapped to two different virtual addresses.

2) Sanity check atomic acquisition in futex_lock_pi_atomic

That's basically what Darren suggested.

I just simplified it to use futex_top_waiter() to find kernel
internal state. If state is found return -EINVAL and do not bother
to fix up the user space variable. It's corrupted already.

3) Ensure state consistency in futex_unlock_pi

The code is silly versus the owner died bit. There is no point to
preserve it on unlock when the user space thread owns the futex.

What's worse is that it does not update the user space value when
the owner died bit is set. So the kernel itself creates observable
inconsistency.

Another "optimization" is to retry an atomic unlock. That's
pointless as in a sane environment user space would not call into
that code if it could have unlocked it atomically. So we always
check whether there is kernel state around and only if there is
none, we do the unlock by setting the user space value to 0.

4) Sanitize lookup_pi_state

lookup_pi_state is ambiguous about TID == 0 in the user space value.

This can be a valid state even if there is kernel state on this
uaddr, but we miss a few corner case checks.

I tried to come up with a smaller solution hacking the checks into
the current cruft, but it turned out to be ugly as hell and I got
more confused than I was before. So I rewrote the sanity checks
along the state documentation with awful lots of commentary"

* emailed patches from Thomas Gleixner:
futex: Make lookup_pi_state more robust
futex: Always cleanup owner tid in unlock_pi
futex: Validate atomic acquisition in futex_lock_pi_atomic()
futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 in futex_requeue(..., requeue_pi=1)

Linus Torvalds
2014-06-06 03:31:32 +0800
54a217887 futex: Make lookup_pi_state more robust

The current implementation of lookup_pi_state has ambiguous handling of
the TID value 0 in the user space futex. We can get into the kernel
even if the TID value is 0, because either there is a stale waiters bit
or the owner died bit is set or we are called from the requeue_pi path
or from user space just for fun.

The current code avoids an explicit sanity check for pid = 0 in case
kernel internal state (waiters) is found for the user space address.
This can lead to state leakage, and worse, under some circumstances.

Handle the cases explicitly:

     Waiter | pi_state | pi->owner | uTID      | uODIED | ?

[1]  NULL   | ---      | ---       | 0         | 0/1    | Valid
[2]  NULL   | ---      | ---       | >0        | 0/1    | Valid

[3]  Found  | NULL     | --        | Any       | 0/1    | Invalid

[4]  Found  | Found    | NULL      | 0         | 1      | Valid
[5]  Found  | Found    | NULL      | >0        | 1      | Invalid

[6]  Found  | Found    | task      | 0         | 1      | Valid

[7]  Found  | Found    | NULL      | Any       | 0      | Invalid

[8]  Found  | Found    | task      | ==taskTID | 0/1    | Valid
[9]  Found  | Found    | task      | 0         | 0      | Invalid
[10] Found  | Found    | task      | !=taskTID | 0/1    | Invalid

[1] Indicates that the kernel can acquire the futex atomically. We
came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit.

[2] Valid, if TID does not belong to a kernel thread. If no matching
thread is found then it indicates that the owner TID has died.

[3] Invalid. The waiter is queued on a non-PI futex.

[4] Valid state after exit_robust_list(), which sets the user space
value to FUTEX_WAITERS | FUTEX_OWNER_DIED.

[5] The user space value got manipulated between exit_robust_list()
and exit_pi_state_list()

[6] Valid state after exit_pi_state_list() which sets the new owner in
the pi_state but cannot access the user space value.

[7] pi_state->owner can only be NULL when the OWNER_DIED bit is set.

[8] Owner and user space value match

[9] There is no transient state which sets the user space TID to 0
except exit_robust_list(), but this is indicated by the
FUTEX_OWNER_DIED bit. See [4]

[10] There is no transient state which leaves owner and user space
TID out of sync.

Signed-off-by: Thomas Gleixner
Cc: Kees Cook
Cc: Will Drewry
Cc: Darren Hart
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds

Thomas Gleixner
2014-06-06 03:31:07 +0800
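The validity table above can be read as a decision procedure. A sketch of that reading, assuming a hypothetical `pi_state_consistent()` that receives the already looked-up facts (with owner_tid == 0 standing for pi_state->owner == NULL); it is not the kernel's lookup_pi_state(), and it does not model the kernel-thread caveat of case [2]:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding of the table's ten cases; not kernel code.
 * owner_tid == 0 means pi_state->owner == NULL. */
static bool pi_state_consistent(bool waiter_found, bool pi_state_found,
                                uint32_t owner_tid, uint32_t utid,
                                bool owner_died)
{
    if (!waiter_found)
        return true;                 /* [1], [2]: no kernel state to contradict */
    if (!pi_state_found)
        return false;                /* [3]: waiter queued on a non-PI futex */
    if (owner_tid == 0) {            /* pi_state->owner == NULL */
        if (!owner_died)
            return false;            /* [7] */
        return utid == 0;            /* [4] valid, [5] invalid */
    }
    if (utid == 0)
        return owner_died;           /* [6] valid, [9] invalid */
    return utid == owner_tid;        /* [8] valid, [10] invalid */
}
```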
13fbca4c6 futex: Always cleanup owner tid in unlock_pi

If the owner died bit is set at futex_unlock_pi, we currently do not
clean up the user space futex. So the owner TID of the current owner
(the unlocker) persists. That's observable inconsistent state,
especially when the ownership of the pi state got transferred.

Clean it up unconditionally.

Signed-off-by: Thomas Gleixner
Cc: Kees Cook
Cc: Will Drewry
Cc: Darren Hart
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds

Thomas Gleixner
2014-06-06 03:31:07 +0800
b3eaa9fc5 futex: Validate atomic acquisition in futex_lock_pi_atomic()

We need to protect the atomic acquisition in the kernel against rogue
user space which sets the user space futex to 0, so the kernel side
acquisition succeeds while there is existing state in the kernel
associated with the real owner.

Verify whether the futex has waiters associated with kernel state. If
it has, return -EINVAL. The state is corrupted already, so no point in
cleaning it up. Subsequent calls will fail as well. Not our problem.

[ tglx: Use futex_top_waiter() and explain why we do not need to try
restoring the already corrupted user space state. ]

Signed-off-by: Darren Hart
Cc: Kees Cook
Cc: Will Drewry
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner
Signed-off-by: Linus Torvalds

Thomas Gleixner
2014-06-06 03:31:07 +0800
e9c243a5a futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 in futex_requeue(..., requeue_pi=1)

If uaddr == uaddr2, then we have broken the rule of only requeueing from
a non-pi futex to a pi futex with this call. If we attempt this, then
dangling pointers may be left for rt_waiter resulting in an exploitable
condition.

This change brings futex_requeue() in line with futex_wait_requeue_pi()
which performs the same check as per commit 6f7b0a2a5c0f ("futex: Forbid
uaddr == uaddr2 in futex_wait_requeue_pi()")

[ tglx: Compare the resulting keys as well, as uaddrs might be
different depending on the mapping ]

Fixes CVE-2014-3153.

Reported-by: Pinkie Pie
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Thomas Gleixner
2014-06-06 03:31:07 +0800
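Why compare keys and not only addresses: two different user addresses can map the same shared futex. A sketch under stated assumptions, with a hypothetical, much simplified `struct fake_futex_key` standing in for the kernel's futex key and a hypothetical `check_requeue_pi()` mirroring only the shape of the two checks:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified key: for a private futex think (mm, address),
 * for a shared one (backing object, offset). */
struct fake_futex_key {
    uintptr_t object;
    uint64_t  offset;
};

static bool keys_match(const struct fake_futex_key *a,
                       const struct fake_futex_key *b)
{
    return a->object == b->object && a->offset == b->offset;
}

/* Returns 0 if requeue_pi may proceed, -EINVAL if source and target
 * are the same futex. */
static int check_requeue_pi(uintptr_t uaddr1, uintptr_t uaddr2,
                            const struct fake_futex_key *k1,
                            const struct fake_futex_key *k2)
{
    if (uaddr1 == uaddr2)
        return -EINVAL;          /* cheap early check on the addresses */
    if (keys_match(k1, k2))
        return -EINVAL;          /* same futex reached via two mappings */
    return 0;
}
```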

05 Jun, 2014

10 commits

3e1a878b7 x86/smpboot: Initialize secondary CPU only if master CPU will wait for it

Hangs are observed on virtual machines during CPU hotplug,
especially in big guests with many CPUs. (It is reproducible
more often if the host is over-committed.)

It happens because the master CPU gives up waiting on the
secondary CPU and allows it to run wild. As a result, the
AP locks up or crashes the system. For example,
as described here:

https://lkml.org/lkml/2014/3/6/257

If the master CPU has sent the STARTUP IPI successfully,
and the AP has signalled to the master CPU that it's ready
to start initialization, make the master CPU wait
indefinitely until the AP is onlined.
To ensure that the AP won't ever run wild, make it
wait at early startup until the master CPU confirms its
intention to wait for the AP. If the AP doesn't respond in 10
seconds, the master CPU will time out and cancel
AP onlining.

Signed-off-by: Igor Mammedov
Acked-by: Toshi Kani
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1401975765-22328-4-git-send-email-imammedo@redhat.com
Signed-off-by: Ingo Molnar

Igor Mammedov
2014-06-05 22:33:08 +0800
feef1e8ec x86/smpboot: Log error on secondary CPU wakeup failure at ERR level

If the system is running without debug-level logging,
it will not log an error if do_boot_cpu() fails to
wake up the AP. This may lead to silent AP bring-up
failures at boot time.
Change the message level to KERN_ERR to make the error
visible to the user, as is done on other architectures.

Signed-off-by: Igor Mammedov
Acked-by: Toshi Kani
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1401975765-22328-3-git-send-email-imammedo@redhat.com
Signed-off-by: Ingo Molnar

Igor Mammedov
2014-06-05 22:33:07 +0800
89f898c1e x86: Fix list/memory corruption on CPU hotplug

Currently, if AP wakeup fails, the master CPU marks the AP as not
present in do_boot_cpu() by calling set_cpu_present(cpu, false).
That leads to the following list corruption on the next physical CPU
hotplug:

[ 418.107336] WARNING: CPU: 1 PID: 45 at lib/list_debug.c:33 __list_add+0xbe/0xd0()
[ 418.115268] list_add corruption. prev->next should be next (ffff88003dc57600), but was ffff88003e20c3a0. (prev=ffff88003e20c3a0).
[ 418.123693] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT ipt_REJECT cfg80211 xt_conntrack rfkill ee
[ 418.138979] CPU: 1 PID: 45 Comm: kworker/u10:1 Not tainted 3.14.0-rc6+ #387
[ 418.149989] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[ 418.165750] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[ 418.166433] 0000000000000021 ffff880038ca7988 ffffffff8159b22d 0000000000000021
[ 418.176460] ffff880038ca79d8 ffff880038ca79c8 ffffffff8106942c ffff880038ca79e8
[ 418.177453] ffff88003e20c3a0 ffff88003dc57600 ffff88003e20c3a0 00000000ffffffea
[ 418.178445] Call Trace:
[ 418.185811] [] dump_stack+0x49/0x5c
[ 418.186440] [] warn_slowpath_common+0x8c/0xc0
[ 418.187192] [] warn_slowpath_fmt+0x46/0x50
[ 418.191231] [] ? acpi_ns_get_node+0xb7/0xc7
[ 418.193889] [] __list_add+0xbe/0xd0
[ 418.196649] [] kobject_add_internal+0x79/0x200
[ 418.208610] [] kobject_add_varg+0x38/0x60
[ 418.213831] [] kobject_add+0x44/0x70
[ 418.229961] [] device_add+0xd0/0x550
[ 418.234991] [] ? pm_runtime_init+0xe5/0xf0
[ 418.250226] [] device_register+0x1e/0x30
[ 418.255296] [] register_cpu+0xe3/0x130
[ 418.266539] [] arch_register_cpu+0x65/0x150
[ 418.285845] [] acpi_processor_hotadd_init+0x5a/0x9b
...
This is caused by the fact that generic_processor_info() allocates
a logical CPU id by calling:

cpu = cpumask_next_zero(-1, cpu_present_mask);

which returns the id of the CPU that previously failed to wake up,
since its bit was cleared by do_boot_cpu(). As a result,
register_cpu() tries to register another CPU with the same id as
the CPU that is already present but failed to be onlined.

Taking into account that the AP will not do anything if the master CPU
failed to wake it up, there is no reason to mark that AP as not
present and break subsequent CPU hotplug attempts. As a side effect of
not marking the AP as not present, the user would be allowed to online
it again later.

Also fix memory corruption in acpi_unmap_lsapic().

If, during CPU hotplug, the master CPU failed to wake up the AP,
it set the percpu x86_cpu_to_apicid to BAD_APICID=0xFFFF for the AP.

However, a subsequent attempt to unplug that CPU will lead to an
out-of-bounds write to __apicid_to_node[], which is
32768 items long on an x86_64 kernel.

So, together with the above fix of cpu_present_mask, make sure that a
present CPU has a valid APIC ID by not setting x86_cpu_to_apicid
to BAD_APICID in do_boot_cpu() on failure, and allow
acpi_processor_remove()->acpi_unmap_lsapic() to cleanly remove the CPU.

Signed-off-by: Igor Mammedov
Acked-by: Toshi Kani
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1401975765-22328-2-git-send-email-imammedo@redhat.com
Signed-off-by: Ingo Molnar

Igor Mammedov
2014-06-05 22:33:07 +0800
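The id-reuse problem can be reproduced with a toy model. This sketch assumes hypothetical arrays `present[]` and `registered[]` in place of cpu_present_mask and the registered CPU devices, and a hypothetical `first_free_id()` that behaves like cpumask_next_zero(-1, cpu_present_mask); clearing the present bit of a failed CPU lets the next hot-add pick the same, still registered, id:

```c
#include <assert.h>
#include <stdbool.h>

#define NR_FAKE_CPUS 8

/* Hypothetical model state, not kernel data structures. */
static bool present[NR_FAKE_CPUS];
static bool registered[NR_FAKE_CPUS];

static void reset_model(void)
{
    for (int cpu = 0; cpu < NR_FAKE_CPUS; cpu++)
        present[cpu] = registered[cpu] = false;
}

/* Like cpumask_next_zero(-1, cpu_present_mask): first id not present. */
static int first_free_id(void)
{
    for (int cpu = 0; cpu < NR_FAKE_CPUS; cpu++)
        if (!present[cpu])
            return cpu;
    return -1;
}

/* Hot-add a CPU: allocate an id, mark it present, register it, then try
 * to wake it. Returns the id, or -1 when the allocated id is already
 * registered (the double registration behind the list corruption). */
static int hotadd_cpu(bool wakeup_ok, bool clear_present_on_failure)
{
    int cpu = first_free_id();
    if (cpu < 0 || registered[cpu])
        return -1;
    present[cpu] = true;
    registered[cpu] = true;
    if (!wakeup_ok && clear_present_on_failure)
        present[cpu] = false;   /* old do_boot_cpu() behaviour */
    return cpu;
}
```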
09dc4ab03 sched/fair: Fix tg_set_cfs_bandwidth() deadlock on rq->lock

tg_set_cfs_bandwidth() sets cfs_b->timer_active to 0 to
force the period timer restart. This is not safe, because it
can lead to the deadlock described in commit 927b54fccbf0:
"__start_cfs_bandwidth calls hrtimer_cancel while holding rq->lock,
waiting for the hrtimer to finish. However, if sched_cfs_period_timer
runs for another loop iteration, the hrtimer can attempt to take
rq->lock, resulting in deadlock."

Three CPUs must be involved:

CPU0                     CPU1                         CPU2

take rq->lock            period timer fired
...                      take cfs_b lock
...                      ...                          tg_set_cfs_bandwidth()
throttle_cfs_rq()        release cfs_b lock           take cfs_b lock
...                      distribute_cfs_runtime()     timer_active = 0
take cfs_b->lock         wait for rq->lock            ...
__start_cfs_bandwidth()
{wait for timer callback
 break if timer_active == 1}

So, CPU0 and CPU1 are deadlocked.

Instead of resetting cfs_b->timer_active, tg_set_cfs_bandwidth can
wait for period timer callbacks (ignoring cfs_b->timer_active) and
restart the timer explicitly.

Signed-off-by: Roman Gushchin
Reviewed-by: Ben Segall
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/87wqdi9g8e.wl\%klamm@yandex-team.ru
Cc: pjt@google.com
Cc: chris.j.arges@canonical.com
Cc: gregkh@linuxfoundation.org
Cc: Linus Torvalds
Signed-off-by: Ingo Molnar

Roman Gushchin
2014-06-05 17:51:34 +0800
0f397f2c9 sched/dl: Fix race in dl_task_timer()

A throttled task is still on the rq, and it may be moved to another cpu
if the user is playing with sched_setaffinity(). Therefore, unlocked
task_rq() access is racy.

Juri Lelli reports he got this race when dl_bandwidth_enabled()
was not set.

Another issue, pointed out by Peter Zijlstra:

"Now I suppose the problem can still actually happen when
you change the root domain and trigger a effective affinity
change that way".

To fix that, we do the same as __task_rq_lock() does. We do not
use __task_rq_lock() itself, because it has a useful lockdep check,
which is not correct in the case of dl_task_timer(). We do not need
pi_lock held here. This case is an exception (PeterZ):

"The only reason we don't strictly need ->pi_lock now is because
we're guaranteed to have p->state == TASK_RUNNING here and are
thus free of ttwu races".

Signed-off-by: Kirill Tkhai
Signed-off-by: Peter Zijlstra
Cc: # v3.14+
Cc: Linus Torvalds
Link: http://lkml.kernel.org/r/3056991400578422@web14g.yandex.ru
Signed-off-by: Ingo Molnar

Kirill Tkhai
2014-06-05 17:51:12 +0800
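The __task_rq_lock() pattern the fix borrows is: read the task's runqueue, lock it, then re-check under the lock that the task is still on that runqueue, retrying if it moved. A userspace sketch with pthreads, assuming hypothetical `struct fake_rq`/`struct fake_task` types; like the kernel pattern, it is only correct if whoever migrates the task updates `rq` while holding the old runqueue's lock:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical, simplified runqueue and task. */
struct fake_rq {
    pthread_mutex_t lock;
    int cpu;
};

struct fake_task {
    /* May change concurrently when the task migrates, e.g. after
     * sched_setaffinity(); assumed to be updated under the old rq lock. */
    _Atomic(struct fake_rq *) rq;
};

/* Lock the runqueue the task is currently on and return it, locked.
 * Reading t->rq and then locking it is not enough on its own: the task
 * may migrate in between, so re-check under the lock and retry. */
static struct fake_rq *lock_task_rq(struct fake_task *t)
{
    for (;;) {
        struct fake_rq *rq = atomic_load(&t->rq);
        pthread_mutex_lock(&rq->lock);
        if (rq == atomic_load(&t->rq))
            return rq;                    /* still there: lock is the right one */
        pthread_mutex_unlock(&rq->lock);  /* migrated meanwhile: retry */
    }
}
```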
b14ed2c27 sched: Fix sched_policy < 0 comparison

attr.sched_policy is u32, therefore a comparison against < 0 is never true.
Fix this by casting sched_policy to int.

This issue was reported by Coverity, CID 1219934.

Fixes: dbdb22754fde ("sched: Disallow sched_attr::sched_policy < 0")
Signed-off-by: Richard Weinberger
Signed-off-by: Peter Zijlstra
Cc: Michael Kerrisk
Cc: Linus Torvalds
Link: http://lkml.kernel.org/r/1401741514-7045-1-git-send-email-richard@nod.at
Signed-off-by: Ingo Molnar

Richard Weinberger
2014-06-05 17:07:43 +0800
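The sched_policy bug in miniature. A sketch assuming a hypothetical one-field `struct fake_sched_attr`; converting a value above INT_MAX to int is implementation-defined in C, but GCC and Clang yield the two's-complement negative value, which is what the cast relies on:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sched_attr, reduced to one field. */
struct fake_sched_attr {
    uint32_t sched_policy;
};

/* Buggy: an unsigned value is never < 0, so this always returns 0
 * (compilers typically warn about it with -Wextra). */
static int policy_is_negative_buggy(const struct fake_sched_attr *attr)
{
    return attr->sched_policy < 0;
}

/* Fixed as the commit describes: cast to int before comparing. */
static int policy_is_negative_fixed(const struct fake_sched_attr *attr)
{
    return (int)attr->sched_policy < 0;
}
```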
e9dd685ce sched/numa: Fix use of spin_{un}lock_irq() when interrupts are disabled

As Peter Zijlstra told me, we have the following path:

do_exit()
  exit_itimers()
    itimer_delete()
      spin_lock_irqsave(&timer->it_lock, &flags);
      timer_delete_hook(timer);
        kc->timer_del(timer) := posix_cpu_timer_del()
          put_task_struct()
            __put_task_struct()
              task_numa_free()
                spin_lock(&grp->lock);

Which means that task_numa_free() can be called with interrupts
disabled, which means that we should not be using spin_lock_irq() but
spin_lock_irqsave() instead. Otherwise we are enabling interrupts while
holding an interrupt unsafe lock!

Signed-off-by: Steven Rostedt
Signed-off-by: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Mike Galbraith
Cc: Eric Dumazet
Cc: Linus Torvalds
Link: http://lkml.kernel.org/r/20140527182541.GH11096@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar

Steven Rostedt
2014-06-05 17:07:41 +0800
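Why spin_lock_irq() is wrong in a callee that may run with interrupts disabled: its unlock unconditionally re-enables interrupts, while spin_unlock_irqrestore() restores whatever state the caller had. A toy model, assuming a single global flag for the local CPU's interrupt state; the `fake_*` helpers and callees are hypothetical and model only that flag, not the locking itself:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the local CPU's interrupt-enable state. */
static bool irqs_enabled = true;

/* Mimic spin_lock_irq()/spin_unlock_irq(): unlock forces interrupts on. */
static void fake_lock_irq(void)   { irqs_enabled = false; }
static void fake_unlock_irq(void) { irqs_enabled = true; }

/* Mimic spin_lock_irqsave()/spin_unlock_irqrestore(): save and restore. */
static unsigned long fake_lock_irqsave(void)
{
    unsigned long flags = irqs_enabled;  /* remember the caller's state */
    irqs_enabled = false;
    return flags;
}

static void fake_unlock_irqrestore(unsigned long flags)
{
    irqs_enabled = flags != 0;           /* restore, do not force on */
}

/* A callee like task_numa_free(), which may be reached with interrupts
 * already disabled (under itimer_delete()'s irqsave lock). */
static void callee_with_lock_irq(void)
{
    fake_lock_irq();
    /* ... critical section ... */
    fake_unlock_irq();
}

static void callee_with_lock_irqsave(void)
{
    unsigned long flags = fake_lock_irqsave();
    /* ... critical section ... */
    fake_unlock_irqrestore(flags);
}
```

Called with interrupts disabled, the first callee returns with them enabled while the caller still holds its lock; the second leaves them disabled.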
22c91aa23 Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf into perf/urgent

Pull perf/urgent fixes from Jiri Olsa:

* Fix perf probe to find correct variable DIE (Masami Hiramatsu)

* Fix a segfault in perf probe if asked for variable it doesn't find (Masami Hiramatsu)

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Ingo Molnar
2014-06-05 15:54:01 +0800
54539cd21 Merge branch 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu

Pull percpu fix from Tejun Heo:
"It is very late but this is an important percpu-refcount fix from
Sebastian Ott.

The problem is that percpu_ref_*() used __this_cpu_*() instead of
this_cpu_*(). The difference between the two is that the latter is
atomic on the local cpu while the former is not. this_cpu_inc() is
guaranteed to increment the percpu counter on the cpu that the
operation is executed on without any synchronization; however,
__this_cpu_inc() doesn't and if the local cpu invokes the function
from different contexts (e.g. process and irq) of the same CPU, it's
not guaranteed to actually increment as it may be implemented as rmw.

This bug existed from the get-go but it hasn't been noticed earlier
probably because on x86 __this_cpu_inc() is equivalent to
this_cpu_inc() as both get translated into single instruction;
however, s390 uses the generic rmw implementation and gets affected by
the bug. Kudos to Sebastian and Heiko for diagnosing it.

The change is very low risk and fixes a critical issue on the affected
architectures, so I think it's a good candidate for inclusion although
it's very late in the devel cycle. On the other hand, this has been
broken since v3.11, so backporting it through -stable post -rc1 won't
be the end of the world.

I'll ping Christoph whether __this_cpu_*() ops can be better annotated
so that it can trigger lockdep warning when used from multiple
contexts"

* 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu-refcount: fix usage of this_cpu_ops

Linus Torvalds
2014-06-05 00:56:03 +0800
0c36b390a percpu-refcount: fix usage of this_cpu_ops

The percpu-refcount infrastructure uses the underscore variants of
this_cpu_ops in order to modify percpu reference counters.
(e.g. __this_cpu_inc()).

However the underscore variants do not atomically update the percpu
variable, instead they may be implemented using read-modify-write
semantics (more than one instruction). Therefore it is only safe to
use the underscore variant if the context is always the same (process,
softirq, or hardirq). Otherwise it is possible to lose updates.

This problem is something that Sebastian has seen within the aio
subsystem, which uses percpu refcounters both in process and softirq
context, leading to reference counts that never dropped to zero even
though the number of "get" and "put" calls matched.

Fix this by using the non-underscore this_cpu_ops variant which
provides correct per cpu atomic semantics and fixes the corrupted
reference counts.

Cc: Kent Overstreet
Cc: # v3.11+
Reported-by: Sebastian Ott
Signed-off-by: Heiko Carstens
Signed-off-by: Tejun Heo
References: http://lkml.kernel.org/g/alpine.LFD.2.11.1406041540520.21183@denkbrett

Sebastian Ott
2014-06-05 00:12:29 +0800
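The lost update can be shown deterministically. This sketch assumes a single global counter and models the interrupt as a function call injected between the load and the store of a read-modify-write increment; `rmw_inc_interrupted()` and `irqsafe_inc()` are hypothetical stand-ins for the behaviour of __this_cpu_inc() and this_cpu_inc(), not their implementations:

```c
#include <assert.h>

/* Single-threaded model: one "per-cpu" counter, one CPU. */
static long counter;

/* An interrupt (e.g. softirq) handler that also takes a reference. */
static void irq_handler(void) { counter++; }

/* Models a generic __this_cpu_inc(): load, add, store as separate steps,
 * with an interrupt allowed to arrive between the load and the store. */
static void rmw_inc_interrupted(void (*irq)(void))
{
    long tmp = counter;      /* load */
    if (irq)
        irq();               /* interrupt arrives mid-sequence */
    counter = tmp + 1;       /* store overwrites the handler's update */
}

/* Models this_cpu_inc(): indivisible with respect to interrupts on this
 * CPU, so the handler runs entirely before or after the increment. */
static void irqsafe_inc(void (*irq)(void))
{
    counter++;
    if (irq)
        irq();
}
```

Two increments are requested in each case; the interrupted read-modify-write keeps only one of them. As we understand it, where no single-instruction form exists, the generic this_cpu_*() fallback gets the second behaviour by disabling local interrupts around the update.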

04 Jun, 2014

10 commits

c717d1561 Merge tag 'pm-3.15-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull intel pstate fixes from Rafael Wysocki:
"Final power management fixes for 3.15

- Taking non-idle time into account when calculating core busy time
was a mistake and led to a performance regression. Since the
problem it was supposed to address is now taken care of in a
different way, we don't need to do it any more, so drop the
non-idle time tracking from intel_pstate. Dirk Brandewie.

- Changing to fixed point math throughout the busy calculation
introduced rounding errors that adversely affect the accuracy of
intel_pstate's computations. Fix from Dirk Brandewie.

- The PID controller algorithm used by intel_pstate assumes that the
time interval between two adjacent samples will always be the same
which is not the case for deferrable timers (used by intel_pstate)
when the system is idle. This leads to inaccurate predictions and
artificially increases convergence times for the minimum P-state.
Fix from Dirk Brandewie.

- intel_pstate carries out computations using 32-bit variables that
may overflow for large enough values of APERF/MPERF. Switch to
using 64-bit variables for computations, from Doug Smythies"

* tag 'pm-3.15-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
intel_pstate: Improve initial busy calculation
intel_pstate: add sample time scaling
intel_pstate: Correct rounding in busy calculation
intel_pstate: Remove C0 tracking

Linus Torvalds
2014-06-04 22:48:54 +0800
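To illustrate the 32-bit overflow mentioned in the last intel_pstate item: scaling a counter delta before dividing can exceed 2^32 long before the delta itself does. A sketch with a hypothetical `busy_pct_*()` formula and made-up counter values, not intel_pstate's actual fixed-point code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical busy percentage from APERF/MPERF deltas, 32-bit version:
 * the scaled product wraps modulo 2^32 before the division. */
static uint32_t busy_pct_32(uint32_t aperf, uint32_t mperf)
{
    return (uint32_t)(aperf * 100u) / mperf;
}

/* Same computation with 64-bit intermediates. */
static uint64_t busy_pct_64(uint64_t aperf, uint64_t mperf)
{
    return (aperf * 100u) / mperf;
}
```

With both deltas at 50,000,000, the product 5,000,000,000 wraps to 705,032,704 in 32 bits, giving 14 instead of 100.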
9e9a928ee Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"All fairly small: radeon stability and a panic path fix.

Mostly radeon fixes: a suspend/resume fix and stability fixes for the
CIK chipsets, along with a patch that skips locking checks in the
panic path to fix a regression"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/radeon: use the CP DMA on CIK
drm/radeon: sync page table updates
drm/radeon: fix vm buffer size estimation
drm/crtc-helper: skip locking checks in panicking path
drm/radeon/dpm: resume fixes for some systems

Linus Torvalds
2014-06-04 22:48:01 +0800
082f96a93 perf probe: Fix perf probe to find correct variable DIE ...

Fix perf probe to find the correct variable DIE, i.e. one which has a
location or refers to an external instance, by descending into the
lexical blocks.

Currently die_find_variable() expects that all variable DIEs tagged
DW_TAG_variable have a location. However, recent DWARF information
may contain declaration-only variable DIEs at the entry of a function
(subprogram), and die_find_variable() returns such a DIE.

To solve this problem, it must walk down the DIE tree to find a DIE
which has an actual location or a reference to an external instance.

e.g. finding a DIE by its abstract origin:

 : Abbrev Number: 95 (DW_TAG_subprogram)
    DW_AT_abstract_origin:
    DW_AT_low_pc      : 0x1850
 [...]
 : Abbrev Number: 119 (DW_TAG_variable)
    DW_AT_abstract_origin:
 : Abbrev Number: 119 (DW_TAG_variable)
 [...]
 : Abbrev Number: 105 (DW_TAG_lexical_block)
    DW_AT_ranges      : 0xaa0
 : Abbrev Number: 96 (DW_TAG_variable)
    DW_AT_abstract_origin:
    DW_AT_location    : 0x486c (location list)

Signed-off-by: Masami Hiramatsu
Tested-by: Arnaldo Carvalho de Melo
Acked-by: Arnaldo Carvalho de Melo
Cc: Arnaldo Carvalho de Melo
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc: Ingo Molnar
Cc: Namhyung Kim
Link: http://lkml.kernel.org/r/20140529121930.30879.87092.stgit@ltc230.yrl.intra.hitachi.co.jp
Signed-off-by: Jiri Olsa

Masami Hiramatsu
2014-06-04 20:49:20 +0800
0c188a07b perf probe: Fix a segfault if asked for variable it doesn't find ...

Fix a segfault that occurs when asking for a variable that perf probe
cannot find. Since convert_variable() didn't handle the error code
returned from convert_variable_location(), it passed on an incomplete
variable field, and a segfault occurred when formatting that field.

This fixes the bug by handling the return code correctly in
convert_variable(). Other callers of convert_variable_location()
already check the return code correctly.

This bug was introduced by the following commit, but another hidden
error-handling mistake (the -ENOMEM case) existed before it.

commit 3d918a12a1b3088ac16ff37fa52760639d6e2403

Signed-off-by: Masami Hiramatsu
Reported-by: Arnaldo Carvalho de Melo
Tested-by: Arnaldo Carvalho de Melo
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc: Ingo Molnar
Cc: Namhyung Kim
Link: http://lkml.kernel.org/r/20140529105232.28251.30447.stgit@ltc230.yrl.intra.hitachi.co.jp
Signed-off-by: Jiri Olsa

Masami Hiramatsu
2014-06-04 20:48:03 +0800
ac2a55395 x86: irq: Get correct available vectors for cpu disable ...

check_irq_vectors_for_cpu_disable() can overestimate the number of
available interrupt vectors, so the check for cpu down succeeds, but
the actual cpu removal fails.

It iterates from FIRST_EXTERNAL_VECTOR to NR_VECTORS, which is wrong
because the system vectors are not taken into account.

Limit the search to first_system_vector instead of NR_VECTORS.

The second indicator for vector availability, the used_vectors bitmap,
is not taken into account at all. So system vectors,
e.g. IA32_SYSCALL_VECTOR (0x80) and IRQ_MOVE_CLEANUP_VECTOR (0x20),
are counted as available.

Add a check for the used_vectors bitmap and do not account vectors
which are marked there.

[ tglx: Simplified code. Rewrote changelog and code comments. ]

Signed-off-by: Yinghai Lu
Acked-by: Prarit Bhargava
Cc: Seiji Aguchi
Cc: Andi Kleen
Cc: K. Y. Srinivasan
Cc: Steven Rostedt (Red Hat)
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: "Elliott, Robert (Server Storage)"
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1400160305-17774-2-git-send-email-prarit@redhat.com
Signed-off-by: Thomas Gleixner

Yinghai Lu
2014-06-04 20:18:34 +0800
0a4ae727d Merge branch 'drm-fixes-3.15' of git://people.freedesktop.org/~deathsimple/linux into drm-fixes ...

The first one is a one-liner fixing a stupid typo in the VM handling code, and it is only relevant if you play with one of the VM defines.

The other two switch CIK to use the CP DMA instead of the SDMA for buffer moves, as it turned out the SDMA is still sometimes not 100% reliable.

* 'drm-fixes-3.15' of git://people.freedesktop.org/~deathsimple/linux:
drm/radeon: use the CP DMA on CIK
drm/radeon: sync page table updates
drm/radeon: fix vm buffer size estimation

Dave Airlie
2014-06-04 11:29:13 +0800
2363d1966 iser-target: Fix multi network portal shutdown regression ...

This patch fixes an iser-target-specific regression introduced in
v3.15-rc6 with:

commit 14f4b54fe38f3a8f8392a50b951c8aa43b63687a
Author: Sagi Grimberg
Date: Tue Apr 29 13:13:47 2014 +0300

Target/iscsi,iser: Avoid accepting transport connections during stop stage

where the change to set iscsi_np->enabled = false within
iscsit_clear_tpg_np_login_thread() meant that an iscsi_np with
two iscsi_tpg_np exports would have its parent iscsi_np set
to a disabled state, even if other iscsi_tpg_np exports still
existed.

This patch changes iscsit_clear_tpg_np_login_thread() to only
set iscsi_np->enabled = false when shutdown = true, and also
changes iscsit_del_np() to set iscsi_np->enabled = true when
iscsi_np->np_exports is non-zero.

Cc: Sagi Grimberg
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger

Nicholas Bellinger
2014-06-04 10:17:32 +0800
79d59d080 iscsi-target: Fix wrong buffer / buffer overrun in iscsi_change_param_value() ...

In non-leading connection login, iscsi_login_non_zero_tsih_s1() calls
iscsi_change_param_value() with the buffer it uses to hold the login
PDU, not a temporary buffer. This leads to the login header getting
corrupted and login failing for non-leading connections in MC/S.

Fix this by adding a wrapper iscsi_change_param_sprintf() that handles
the temporary buffer itself to avoid confusion. Also handle sending a
reject in case of failure in the wrapper, which lets the calling code
get quite a bit smaller and easier to read.

Finally, bump the size of the temporary buffer from 32 to 64 bytes to be
safe, since "MaxRecvDataSegmentLength=" by itself is 25 bytes; with a
trailing NUL, that leaves room for at most six digits in 32 bytes, so a
value >= 1M will lead to a buffer overrun. (This isn't the default, but
we don't need to run right at the ragged edge here.)

Reported-by: Santosh Kulkarni
Signed-off-by: Roland Dreier
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger

Roland Dreier
2014-06-04 10:17:31 +0800
6cc44a6fb iser-target: Add missing target_put_sess_cmd for ImmedateData failure ...

This patch addresses a bug where an early exception for SCSI WRITE
with ImmediateData=Yes was missing the target_put_sess_cmd() call
to drop the extra se_cmd->cmd_kref reference obtained during the
normal iscsit_setup_scsi_cmd() codepath execution.

This bug was manifesting itself during session shutdown within
isert_cq_rx_comp_err(), where target_wait_for_sess_cmds() would
end up waiting indefinitely for the last se_cmd->cmd_kref put to
occur for the failed SCSI WRITE + ImmediateData descriptors.

This fix follows what traditional iscsi-target code already does
for the same failure case within iscsit_get_immediate_data().

Reported-by: Sagi Grimberg
Cc: Sagi Grimberg
Cc: Or Gerlitz
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger

Nicholas Bellinger
2014-06-04 10:17:31 +0800
d2cfd3105 Merge tag 'sound-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound ...

Pull sound fixes from Takashi Iwai:
"A few additions of HD-audio fixups for the ALC260 and AD1986A codecs,
all marked as stable fixes.

The fixes are pretty local and target old machines, so they are quite
safe to apply"

* tag 'sound-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek - Fix COEF widget NID for ALC260 replacer fixup
ALSA: hda/realtek - Correction of fixup codes for PB V7900 laptop
ALSA: hda/analog - Fix silent output on ASUS A8JN

Linus Torvalds
2014-06-04 03:07:30 +0800

03 Jun, 2014

1 commit

c9482a5bd kernfs: move the last knowledge of sysfs out from kernfs ...

There is still one residue of sysfs remaining: the sb_magic
SYSFS_MAGIC. However, this should be specific to each kernfs user,
so this patch moves it out. Kernfs users should specify their
magic number while mounting.

Signed-off-by: Jianyu Zhan
Acked-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman
Signed-off-by: Linus Torvalds

Jianyu Zhan
2014-06-03 23:11:18 +0800