17 Jun, 2014

13 commits

  • Greg Kroah-Hartman
     
  • commit 754a292fe6b08196cb135c03b404444e17de520a upstream.

    Add support for Marvell Technology Group Ltd. 88SE91A0 SATA 6Gb/s
    Controller by adding its PCI ID.

    Signed-off-by: Andreas Schrägle
    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Andreas Schrägle
     
  • commit d251836508fb26cd1a22b41381739835ee23728d upstream.

    This device normally comes with a proprietary driver, using a web GUI
    to configure RAID:
    http://www.highpoint-tech.com/USA_new/series_rr600-download.htm
    But thankfully it also works out of the box with the AHCI driver,
    being just a Marvell 88SE9235.

    Devices 640L, 644L, 644LS should also be supported but were not tested here.

    Signed-off-by: Jérôme Carretero
    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Jérôme Carretero
     
  • commit 74a86272f05c3dae40f2d7b17ff09a0608cf3304 upstream.

    Added support for Sveon STV27 device (rtl2832u + FC0013 tuner)

    Signed-off-by: Alessandro Miceli
    Signed-off-by: Antti Palosaari
    Signed-off-by: Mauro Carvalho Chehab
    Signed-off-by: Greg Kroah-Hartman

    Alessandro Miceli
     
  • commit f27f5b0ee4967babfb8b03511f5e76b79d781014 upstream.

    Added Sveon STV20 device based on Realtek RTL2832U and FC0012 tuner

    Signed-off-by: Alessandro Miceli
    Signed-off-by: Antti Palosaari
    Signed-off-by: Mauro Carvalho Chehab
    Signed-off-by: Greg Kroah-Hartman

    Alessandro Miceli
     
  • commit 9ca24ae4083665bda38da45f4b5dc9bbaf936bc0 upstream.

    Add USB ID for Peak DVB-T USB.

    [crope@iki.fi: fix Brian email address and indentation]
    Signed-off-by: Brian Healy
    Signed-off-by: Antti Palosaari
    Signed-off-by: Greg Kroah-Hartman

    Signed-off-by: Mauro Carvalho Chehab

    Brian Healy
     
  • commit c40765d919d25d2d44d99c4ce39e48808f137e1e upstream.

    According to the spec, the host should read H_CSR again
    after asserting reset H_RST to ensure that the reset was
    read by the firmware.

    Signed-off-by: Tomas Winkler
    Signed-off-by: Alexander Usyskin
    Signed-off-by: Greg Kroah-Hartman

    Tomas Winkler
     
  • commit 07cd7be3d92eeeae1f92a017f2cfe4fdd9256526 upstream.

    It may take time until ME_RDY is cleared after the reset,
    so we cannot check the bit before we get the interrupt.

    Signed-off-by: Tomas Winkler
    Signed-off-by: Alexander Usyskin
    Signed-off-by: Greg Kroah-Hartman

    Tomas Winkler
     
  • commit b04ada92ffaabb868497a1fce8e4f6bf74e5488f upstream.

    We cleared H_RST in H_CSR on the spurious interrupt generated while
    ME_RDY was cleared, not while ME_RDY was set. The spurious interrupt
    is not delivered on all platforms; in that case the
    driver may fail to initialize.

    Signed-off-by: Tomas Winkler
    Signed-off-by: Alexander Usyskin
    Signed-off-by: Greg Kroah-Hartman

    Tomas Winkler
     
  • commit b701c0b1fe819a2083fc6ec5332e0e4492b9516d upstream.

    free_msi_irqs() is leaking memory, since list_for_each_entry(entry,
    &dev->msi_list, list) {...} is never executed, because dev->msi_list is
    made empty by the loop just above this one.

    Fix it by relying on zero termination of attribute array like
    populate_msi_sysfs() does.

    Fixes: 1c51b50c2995 ("PCI/MSI: Export MSI mode using attributes, not kobjects")
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Bjorn Helgaas
    Acked-by: Neil Horman
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: Greg Kroah-Hartman

    Alexei Starovoitov
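The zero-termination approach mentioned in the MSI fix above can be illustrated with a minimal user-space sketch. This is not the kernel code: `struct demo_attr`, `demo_make_attrs()` and `demo_free_attrs()` are hypothetical names, and the only point shown is that the cleanup loop walks the array until its NULL terminator instead of depending on a list that may already have been emptied.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a sysfs attribute; not the kernel's struct. */
struct demo_attr {
    char *name;
};

/*
 * Free every entry of a NULL-terminated array of attribute pointers,
 * then the array itself. Returns the number of entries released.
 */
size_t demo_free_attrs(struct demo_attr **attrs)
{
    size_t count = 0;

    if (!attrs)
        return 0;
    for (size_t i = 0; attrs[i] != NULL; i++) {
        free(attrs[i]->name);   /* free(NULL) is a no-op */
        free(attrs[i]);
        count++;
    }
    free(attrs);
    return count;
}

/* Build a NULL-terminated array of n attributes for the demonstration. */
struct demo_attr **demo_make_attrs(size_t n)
{
    /* calloc() zeroes the array, so slot n is already the terminator. */
    struct demo_attr **attrs = calloc(n + 1, sizeof(*attrs));

    if (!attrs)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        attrs[i] = malloc(sizeof(**attrs));
        if (!attrs[i]) {
            demo_free_attrs(attrs); /* frees the entries built so far */
            return NULL;
        }
        attrs[i]->name = malloc(4);
        if (attrs[i]->name)
            memcpy(attrs[i]->name, "irq", 4);
    }
    return attrs;
}
```

Because the terminator is part of the array, the cleanup needs no separate element count and no second data structure.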
     
  • commit a3c54931199565930d6d84f4c3456f6440aefd41 upstream.

    Fixes an easy DoS and possible information disclosure.

    This does nothing about the broken state of x32 auditing.

    eparis: If the admin has enabled auditd and has specifically loaded
    audit rules. This bug has been around since before git. Wow...

    Signed-off-by: Andy Lutomirski
    Signed-off-by: Eric Paris
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Andy Lutomirski
     
  • commit c2338f2dc7c1e9f6202f370c64ffd7f44f3d4b51 upstream.

    Dentry that had been through (or into) __dentry_kill() might be seen
    by shrink_dentry_list(); that's normal, it'll be taken off the shrink
    list and freed if __dentry_kill() has already finished. The problem
    is, its ->d_parent might be pointing to already freed dentry, so
    lock_parent() needs to be careful.

    We need to check that dentry hasn't already gone into __dentry_kill()
    *and* grab rcu_read_lock() before dropping ->d_lock - the latter makes
    sure that whatever we see in ->d_parent after dropping ->d_lock it
    won't be freed until we drop rcu_read_lock().

    Signed-off-by: Al Viro
    Cc: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Al Viro
     
  • commit 23adbe12ef7d3d4195e80800ab36b37bee28cd03 upstream.

    The kernel has no concept of capabilities with respect to inodes; inodes
    exist independently of namespaces. For example, inode_capable(inode,
    CAP_LINUX_IMMUTABLE) would be nonsense.

    This patch changes inode_capable to check for uid and gid mappings and
    renames it to capable_wrt_inode_uidgid, which should make it more
    obvious what it does.

    Fixes CVE-2014-4014.

    Cc: Theodore Ts'o
    Cc: Serge Hallyn
    Cc: "Eric W. Biederman"
    Cc: Dave Chinner
    Signed-off-by: Andy Lutomirski
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Andy Lutomirski
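A rough user-space model of the idea behind the renamed check in the commit above: a capability only makes sense with respect to an inode if the inode's uid and gid are mapped in the caller's namespace. Everything here is an assumption made for illustration, including the single contiguous id range, the `has_cap` flag and all `demo_` names; the kernel's real data structures differ.

```c
#include <stdbool.h>

/* Hypothetical user namespace that maps one contiguous range of ids. */
struct demo_ns {
    unsigned int first_id;  /* first id mapped into the namespace */
    unsigned int count;     /* number of mapped ids */
    bool has_cap;           /* whether the caller holds the capability here */
};

/* Hypothetical inode carrying only its owner ids. */
struct demo_inode {
    unsigned int uid;
    unsigned int gid;
};

/* True if id falls inside the namespace's mapped range. */
static bool demo_id_mapped(const struct demo_ns *ns, unsigned int id)
{
    return id >= ns->first_id && id - ns->first_id < ns->count;
}

/*
 * The capability applies to the inode only if the caller holds it in the
 * namespace AND both the inode's uid and gid are mapped there.
 */
bool demo_capable_wrt_inode_uidgid(const struct demo_ns *ns,
                                   const struct demo_inode *inode)
{
    return ns->has_cap &&
           demo_id_mapped(ns, inode->uid) &&
           demo_id_mapped(ns, inode->gid);
}
```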
     

09 Jun, 2014

2 commits


08 Jun, 2014

4 commits

  • Pull btrfs fix from Chris Mason:
    "I had this in my 3.16 merge window queue, but it is small and obvious
    enough for 3.15. I cherry-picked and retested against current rc8"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: send, fix corrupted path strings for long paths

    Linus Torvalds
     
  • Pull SCSI target fixes from Nicholas Bellinger:
    "Here are the remaining fixes for v3.15.

    This series includes:

    - iser-target fix for ImmediateData exception reference count bug
    (Sagi + nab)
    - iscsi-target fix for MC/S login + potential iser-target MRDSL
    buffer overrun (Santosh + Roland)
    - iser-target fix for v3.15-rc multi network portal shutdown
    regression (nab)
    - target fix for allowing READ_CAPACITY during ALUA Standby access
    state (Chris + nab)
    - target fix for NULL pointer dereference of alua_access_state for
    un-configured devices (Chris + nab)"

    * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
    target: Fix alua_access_state attribute OOPs for un-configured devices
    target: Allow READ_CAPACITY opcode in ALUA Standby access state
    iser-target: Fix multi network portal shutdown regression
    iscsi-target: Fix wrong buffer / buffer overrun in iscsi_change_param_value()
    iser-target: Add missing target_put_sess_cmd for ImmedateData failure

    Linus Torvalds
     
  • Pull x86 fixes from Peter Anvin:
    "A significantly larger than I'd like set of patches for just below the
    wire. All of these, however, fix real problems.

    The one thing that is genuinely scary in here is the change of SMP
    initialization, but that *does* fix a confirmed hang when booting
    virtual machines.

    There is also a patch to actually do the right thing about not
    offlining a CPU when there are not enough interrupt vectors available
    in the system; the accounting was done incorrectly. The worst case
    for that patch is that we fail to offline CPUs when we should (the new
    code is strictly more conservative than the old), so is not
    particularly risky.

    Most of the rest is minor stuff; the EFI patches are all about
    exporting correct information to boot loaders and kexec"

    * 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/boot: EFI_MIXED should not prohibit loading above 4G
    x86/smpboot: Initialize secondary CPU only if master CPU will wait for it
    x86/smpboot: Log error on secondary CPU wakeup failure at ERR level
    x86: Fix list/memory corruption on CPU hotplug
    x86: irq: Get correct available vectors for cpu disable
    x86/efi: Do not export efi runtime map in case old map
    x86/efi: earlyprintk=efi,keep fix

    Linus Torvalds
     
  • commit 7d453eee36ae ("x86/efi: Wire up CONFIG_EFI_MIXED") introduced a
    regression for the functionality to load kernels above 4G. The relevant
    (incorrect) reasoning behind this change can be seen in the commit
    message,

    "The xloadflags field in the bzImage header is also updated to reflect
    that the kernel supports both entry points by setting both of
    XLF_EFI_HANDOVER_32 and XLF_EFI_HANDOVER_64 when CONFIG_EFI_MIXED=y.
    XLF_CAN_BE_LOADED_ABOVE_4G is disabled so that the kernel text is
    guaranteed to be addressable with 32-bits."

    This is obviously bogus since 32-bit EFI loaders will never place the
    kernel above the 4G mark. So this restriction is entirely unnecessary.

    But things are worse than that - since we want to encourage people to
    always compile with CONFIG_EFI_MIXED=y so that their kernels work out of
    the box for both 32-bit and 64-bit firmware, commit 7d453eee36ae
    effectively disables XLF_CAN_BE_LOADED_ABOVE_4G completely.

    Remove the overzealous and superfluous restriction and restore the
    XLF_CAN_BE_LOADED_ABOVE_4G functionality.

    Cc: "H. Peter Anvin"
    Cc: Dave Young
    Cc: Vivek Goyal
    Signed-off-by: Matt Fleming
    Link: http://lkml.kernel.org/r/1402140380-15377-1-git-send-email-matt@console-pimps.org
    Signed-off-by: H. Peter Anvin

    Matt Fleming
     

07 Jun, 2014

3 commits

  • The page table walker doesn't check for non-present hugetlb entries in
    the common path, so hugetlb_entry() callbacks must check for them. The
    reason for this behavior is that some callers want to handle them in
    their own way.

    [ I think that reason is bogus, btw - it should just do what the regular
    code does, which is to call the "pte_hole()" function for such hugetlb
    entries - Linus]

    However, some callers don't check it now, which causes unpredictable
    results, for example when there is a race between migrating a hugepage
    and reading /proc/pid/numa_maps. This patch fixes it by adding
    !pte_present checks to the buggy callbacks.

    This bug has existed for years and became visible with the introduction
    of hugepage migration.

    ChangeLog v2:
    - fix if condition (check !pte_present() instead of pte_present())

    Reported-by: Sasha Levin
    Signed-off-by: Naoya Horiguchi
    Cc: Rik van Riel
    Cc: [3.12+]
    Signed-off-by: Andrew Morton
    [ Backported to 3.15. Signed-off-by: Josh Boyer ]
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • If a path has more than 230 characters, we allocate a new buffer to
    use for the path, but we were forgotting to copy the contents of the
    previous buffer into the new one, which has random content from the
    kmalloc call.

    Test:

    mkfs.btrfs -f /dev/sdd
    mount /dev/sdd /mnt

    TEST_PATH="/mnt/fdmanana/.config/google-chrome-mysetup/Default/Pepper_Data/Shockwave_Flash/WritableRoot/#SharedObjects/JSHJ4ZKN/s.wsj.net/[[IMPORT]]/players.edgesuite.net/flash/plugins/osmf/advanced-streaming-plugin/v2.7/osmf1.6/Ak#"
    mkdir -p $TEST_PATH
    echo "hello world" > $TEST_PATH/amaiAdvancedStreamingPlugin.txt

    btrfs subvolume snapshot -r /mnt /mnt/mysnap1
    btrfs send /mnt/mysnap1 -f /tmp/1.snap

    A test for xfstests follows.

    Signed-off-by: Filipe David Borba Manana
    Cc: Marc Merlin
    Tested-by: Marc MERLIN
    Signed-off-by: Chris Mason

    Filipe Manana
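The class of bug described in the btrfs send fix above (switching to a bigger buffer without carrying the old contents over) can be sketched in user space as follows. This is only an illustration under assumptions: `demo_grow_path()` is a hypothetical helper, and `malloc()` stands in for `kmalloc()`.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Move a NUL-terminated path into a larger heap buffer.
 * Returns the new buffer (caller frees it), or NULL on failure.
 */
char *demo_grow_path(const char *old, size_t new_cap)
{
    size_t len = strlen(old);
    char *buf;

    if (new_cap <= len)
        return NULL;            /* must hold the path plus its NUL */

    buf = malloc(new_cap);      /* contents are indeterminate here */
    if (!buf)
        return NULL;

    /* The step the bug left out: copy the existing path across. */
    memcpy(buf, old, len + 1);
    return buf;
}
```

Without the `memcpy()`, the caller would keep appending to whatever bytes the allocator happened to return.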
     
  • Pull scheduler fixes from Ingo Molnar:
    "Four misc fixes: each was deemed serious enough to warrant v3.15
    inclusion"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/fair: Fix tg_set_cfs_bandwidth() deadlock on rq->lock
    sched/dl: Fix race in dl_task_timer()
    sched: Fix sched_policy < 0 comparison
    sched/numa: Fix use of spin_{un}lock_irq() when interrupts are disabled

    Linus Torvalds
     

06 Jun, 2014

10 commits

  • While working on an address sanitizer for the kernel, I discovered a
    use-after-free bug in __put_anon_vma.

    For the last anon_vma, anon_vma->root is freed before the child
    anon_vma. Later, in anon_vma_free(anon_vma), we reference the already
    freed anon_vma->root to check its rwsem.

    Fix this by freeing the child anon_vma before freeing
    anon_vma->root.

    Signed-off-by: Andrey Ryabinin
    Acked-by: Peter Zijlstra
    Cc: # v3.0+
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
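The ordering fix described above can be modelled with a small user-space sketch. The struct and function names are hypothetical, and a `freed` flag replaces real deallocation so that the "root must still be alive while the child is freed" rule can be checked with an assertion.

```c
#include <assert.h>

/* Hypothetical model of an anon_vma; `freed` stands in for real freeing. */
struct demo_anon_vma {
    struct demo_anon_vma *root; /* the root points to itself */
    int refcount;
    int freed;
};

/* Freeing inspects the root, so the root must not be freed yet. */
static void demo_anon_vma_free(struct demo_anon_vma *av)
{
    assert(!av->root->freed);
    av->freed = 1;
}

/* Drop the last reference: free the child first, then maybe the root. */
void demo_put_anon_vma(struct demo_anon_vma *av)
{
    struct demo_anon_vma *root = av->root;

    demo_anon_vma_free(av);             /* root is still valid here */
    if (root != av && --root->refcount == 0)
        demo_anon_vma_free(root);
}
```

Swapping the two frees in `demo_put_anon_vma()` would trip the assertion, which mirrors the use-after-free the commit describes.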
     
  • This patch fixes an OOPs where an attempt to write to the per-device
    alua_access_state configfs attribute at:

    /sys/kernel/config/target/core/$HBA/$DEV/alua/$TG_PT_GP/alua_access_state

    results in a NULL pointer dereference when the backend device has not
    yet been configured.

    This patch adds an explicit check for DF_CONFIGURED, and fails with
    -ENODEV to avoid this case.

    Reported-by: Chris Boot
    Reported-by: Philip Gaw
    Cc: Chris Boot
    Cc: Philip Gaw
    Cc: stable@vger.kernel.org # 3.8+
    Signed-off-by: Nicholas Bellinger

    Nicholas Bellinger
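A minimal sketch of the guard described above: refuse the attribute write with -ENODEV until the device is configured. The flag value, struct layout and function name are assumptions for illustration and are not taken from the target core code.

```c
#include <errno.h>

#define DEMO_DF_CONFIGURED 0x1u /* hypothetical flag value */

/* Hypothetical device with a flags word and an ALUA access state. */
struct demo_dev {
    unsigned int flags;
    int alua_access_state;
};

/* Store a new access state, failing early on an unconfigured device. */
int demo_store_alua_access_state(struct demo_dev *dev, int new_state)
{
    if (!(dev->flags & DEMO_DF_CONFIGURED))
        return -ENODEV;     /* nothing behind the device to update yet */

    dev->alua_access_state = new_state;
    return 0;
}
```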
     
  • This patch allows READ_CAPACITY + SAI_READ_CAPACITY_16 opcode
    processing to occur while the associated ALUA group is in Standby
    access state.

    This is required to avoid host side LUN probe failures during the
    initial scan if an ALUA group has already implicitly changed into
    Standby access state.

    This addresses a bug reported by Chris + Philip using dm-multipath
    + ESX hosts configured with ALUA multipath.

    Reported-by: Chris Boot
    Reported-by: Philip Gaw
    Cc: Chris Boot
    Cc: Philip Gaw
    Cc: Hannes Reinecke
    Cc: stable@vger.kernel.org
    Signed-off-by: Nicholas Bellinger

    Nicholas Bellinger
     
  • * Fix earlyprintk=efi,keep support by switching to an ioremap() mapping
    of the framebuffer when early_ioremap() is no longer available and
    dropping __init from functions that may be invoked after
    free_initmem() - Dave Young

    * We shouldn't be exporting the EFI runtime map in sysfs if not using
    the new 1:1 EFI mapping code since in that case the mappings are not
    static across a kexec reboot - Dave Young

    Signed-off-by: H. Peter Anvin

    H. Peter Anvin
     
  • Pull perf fixes from Ingo Molnar:
    "Two last minute tooling fixes"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf probe: Fix perf probe to find correct variable DIE
    perf probe: Fix a segfault if asked for variable it doesn't find

    Linus Torvalds
     
  • Merge futex fixes from Thomas Gleixner:
    "So with more awake and less futex wreckaged brain, I went through my
    list of points again and came up with the following 4 patches.

    1) Prevent pi requeueing on the same futex

    I kept Kees check for uaddr1 == uaddr2 as a early check for private
    futexes and added a key comparison to both futex_requeue and
    futex_wait_requeue_pi.

    Sebastian, sorry for the confusion yesterday night. I really
    misunderstood your question.

    You are right the check is pointless for shared futexes where the
    same physical address is mapped to two different virtual addresses.

    2) Sanity check atomic acquisition in futex_lock_pi_atomic

    That's basically what Darren suggested.

    I just simplified it to use futex_top_waiter() to find kernel
    internal state. If state is found return -EINVAL and do not bother
    to fix up the user space variable. It's corrupted already.

    3) Ensure state consistency in futex_unlock_pi

    The code is silly versus the owner died bit. There is no point to
    preserve it on unlock when the user space thread owns the futex.

    What's worse is that it does not update the user space value when
    the owner died bit is set. So the kernel itself creates observable
    inconsistency.

    Another "optimization" is to retry an atomic unlock. That's
    pointless as in a sane environment user space would not call into
    that code if it could have unlocked it atomically. So we always
    check whether there is kernel state around and only if there is
    none, we do the unlock by setting the user space value to 0.

    4) Sanitize lookup_pi_state

    lookup_pi_state is ambiguous about TID == 0 in the user space value.

    This can be a valid state even if there is kernel state on this
    uaddr, but we miss a few corner case checks.

    I tried to come up with a smaller solution hacking the checks into
    the current cruft, but it turned out to be ugly as hell and I got
    more confused than I was before. So I rewrote the sanity checks
    along the state documentation with awful lots of commentary"

    * emailed patches from Thomas Gleixner :
    futex: Make lookup_pi_state more robust
    futex: Always cleanup owner tid in unlock_pi
    futex: Validate atomic acquisition in futex_lock_pi_atomic()
    futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 in futex_requeue(..., requeue_pi=1)

    Linus Torvalds
     
  • The current implementation of lookup_pi_state has ambiguous handling of
    the TID value 0 in the user space futex. We can get into the kernel
    even if the TID value is 0, because either there is a stale waiters bit
    or the owner died bit is set or we are called from the requeue_pi path
    or from user space just for fun.

    The current code avoids an explicit sanity check for pid = 0 in case
    that kernel internal state (waiters) are found for the user space
    address. This can lead to state leakage and worse under some
    circumstances.

    Handle the cases explicitly:

         Waiter | pi_state | pi->owner | uTID       | uODIED | ?

    [1]  NULL   | ---      | ---       | 0          | 0/1    | Valid
    [2]  NULL   | ---      | ---       | >0         | 0/1    | Valid

    [3]  Found  | NULL     | --        | Any        | 0/1    | Invalid

    [4]  Found  | Found    | NULL      | 0          | 1      | Valid
    [5]  Found  | Found    | NULL      | >0         | 1      | Invalid

    [6]  Found  | Found    | task      | 0          | 1      | Valid

    [7]  Found  | Found    | NULL      | Any        | 0      | Invalid

    [8]  Found  | Found    | task      | ==taskTID  | 0/1    | Valid
    [9]  Found  | Found    | task      | 0          | 0      | Invalid
    [10] Found  | Found    | task      | !=taskTID  | 0/1    | Invalid

    [1] Indicates that the kernel can acquire the futex atomically. We
    came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit.

    [2] Valid, if TID does not belong to a kernel thread. If no matching
    thread is found then it indicates that the owner TID has died.

    [3] Invalid. The waiter is queued on a non PI futex

    [4] Valid state after exit_robust_list(), which sets the user space
    value to FUTEX_WAITERS | FUTEX_OWNER_DIED.

    [5] The user space value got manipulated between exit_robust_list()
    and exit_pi_state_list()

    [6] Valid state after exit_pi_state_list() which sets the new owner in
    the pi_state but cannot access the user space value.

    [7] pi_state->owner can only be NULL when the OWNER_DIED bit is set.

    [8] Owner and user space value match

    [9] There is no transient state which sets the user space TID to 0
    except exit_robust_list(), but this is indicated by the
    FUTEX_OWNER_DIED bit. See [4]

    [10] There is no transient state which leaves owner and user space
    TID out of sync.

    Signed-off-by: Thomas Gleixner
    Cc: Kees Cook
    Cc: Will Drewry
    Cc: Darren Hart
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
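The ten-row state table in the commit message above can be expressed as a single decision function. The following is a user-space sketch of that table only, not the kernel's lookup_pi_state(): all names are hypothetical, `owner_tid` is assumed to be greater than zero whenever `has_owner` is true, and the extra condition on row [2] (the TID must not belong to a kernel thread) is not modelled.

```c
#include <stdbool.h>

/*
 * Decide whether a combination of kernel and user space futex state is
 * valid, following rows [1]..[10] of the table.
 *
 *   waiter    - a waiter was found for the user space address
 *   pi_state  - that waiter has a pi_state attached
 *   has_owner - pi_state->owner is not NULL
 *   owner_tid - TID of pi_state->owner (only meaningful if has_owner)
 *   utid      - TID stored in the user space futex value
 *   uodied    - FUTEX_OWNER_DIED bit in the user space value
 */
bool demo_pi_state_valid(bool waiter, bool pi_state, bool has_owner,
                         unsigned int owner_tid, unsigned int utid,
                         bool uodied)
{
    if (!waiter)
        return true;                /* [1], [2] */
    if (!pi_state)
        return false;               /* [3] waiter on a non-PI futex */
    if (!has_owner)
        return uodied && utid == 0; /* [4] valid; [5], [7] invalid */
    if (utid == 0)
        return uodied;              /* [6] valid; [9] invalid */
    return utid == owner_tid;       /* [8] valid; [10] invalid */
}
```

Writing the table this way makes it easy to see that every row is covered exactly once and that no combination falls through unhandled.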
     
  • If the owner died bit is set at futex_unlock_pi, we currently do not
    clean up the user space futex. So the owner TID of the current owner
    (the unlocker) persists. That's observable inconsistent state,
    especially when the ownership of the pi state got transferred.

    Clean it up unconditionally.

    Signed-off-by: Thomas Gleixner
    Cc: Kees Cook
    Cc: Will Drewry
    Cc: Darren Hart
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • We need to protect the atomic acquisition in the kernel against rogue
    user space which sets the user space futex to 0, so the kernel side
    acquisition succeeds while there is existing state in the kernel
    associated to the real owner.

    Verify whether the futex has waiters associated with kernel state. If
    it has, return -EINVAL. The state is corrupted already, so no point in
    cleaning it up. Subsequent calls will fail as well. Not our problem.

    [ tglx: Use futex_top_waiter() and explain why we do not need to try
    restoring the already corrupted user space state. ]

    Signed-off-by: Darren Hart
    Cc: Kees Cook
    Cc: Will Drewry
    Cc: stable@vger.kernel.org
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • …tex_requeue(..., requeue_pi=1)

    If uaddr == uaddr2, then we have broken the rule of only requeueing from
    a non-pi futex to a pi futex with this call. If we attempt this, then
    dangling pointers may be left for rt_waiter resulting in an exploitable
    condition.

    This change brings futex_requeue() in line with futex_wait_requeue_pi()
    which performs the same check as per commit 6f7b0a2a5c0f ("futex: Forbid
    uaddr == uaddr2 in futex_wait_requeue_pi()")

    [ tglx: Compare the resulting keys as well, as uaddrs might be
    different depending on the mapping ]

    Fixes CVE-2014-3153.

    Reported-by: Pinkie Pie
    Signed-off-by: Will Drewry <wad@chromium.org>
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Cc: stable@vger.kernel.org
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Darren Hart <dvhart@linux.intel.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Thomas Gleixner
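The two-level check described above (a cheap address comparison first, then a comparison of the resolved keys, because one futex can be mapped at two different user addresses) might look like this in a user-space model. `struct demo_futex_key` and both functions are hypothetical and far simpler than the kernel's futex key.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical key that names the underlying futex regardless of mapping. */
struct demo_futex_key {
    uint64_t object;    /* e.g. identity of the backing object */
    uint32_t offset;    /* offset of the futex word within it */
};

static bool demo_key_equal(const struct demo_futex_key *a,
                           const struct demo_futex_key *b)
{
    return a->object == b->object && a->offset == b->offset;
}

/* Reject a requeue-PI request whose source and target are the same futex. */
int demo_requeue_pi_check(const void *uaddr1, const void *uaddr2,
                          const struct demo_futex_key *key1,
                          const struct demo_futex_key *key2)
{
    if (uaddr1 == uaddr2)
        return -EINVAL;     /* early check on the raw addresses */
    if (demo_key_equal(key1, key2))
        return -EINVAL;     /* same futex seen through two mappings */
    return 0;
}
```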
     

05 Jun, 2014

8 commits

  • A hang is observed on virtual machines during CPU hotplug,
    especially in big guests with many CPUs. (It is reproducible
    more often if the host is over-committed.)

    It happens because the master CPU gives up waiting on the
    secondary CPU and allows it to run wild. As a result, the
    AP causes the system to lock up or crash. For example,
    as described here:

    https://lkml.org/lkml/2014/3/6/257

    If the master CPU has sent the STARTUP IPI successfully
    and the AP has signalled to the master CPU that it is ready
    to start initialization, make the master CPU wait
    indefinitely until the AP is onlined.
    To ensure that the AP won't ever run wild, make it
    wait at early startup until the master CPU confirms its
    intention to wait for the AP. If the AP doesn't respond within
    10 seconds, the master CPU will time out and cancel
    AP onlining.

    Signed-off-by: Igor Mammedov
    Acked-by: Toshi Kani
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1401975765-22328-4-git-send-email-imammedo@redhat.com
    Signed-off-by: Ingo Molnar

    Igor Mammedov
     
  • If the system is running without debug-level logging,
    it will not log an error if do_boot_cpu() fails to
    wake up the AP. This may lead to silent AP bringup
    failures at boot time.
    Change the message level to KERN_ERR to make the error
    visible to the user, as is done on other architectures.

    Signed-off-by: Igor Mammedov
    Acked-by: Toshi Kani
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1401975765-22328-3-git-send-email-imammedo@redhat.com
    Signed-off-by: Ingo Molnar

    Igor Mammedov
     
  • Currently, if AP wake-up fails, the master CPU marks the AP as not
    present in do_boot_cpu() by calling set_cpu_present(cpu, false).
    That leads to the following list corruption on the next physical CPU
    hotplug:

    [ 418.107336] WARNING: CPU: 1 PID: 45 at lib/list_debug.c:33 __list_add+0xbe/0xd0()
    [ 418.115268] list_add corruption. prev->next should be next (ffff88003dc57600), but was ffff88003e20c3a0. (prev=ffff88003e20c3a0).
    [ 418.123693] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT ipt_REJECT cfg80211 xt_conntrack rfkill ee
    [ 418.138979] CPU: 1 PID: 45 Comm: kworker/u10:1 Not tainted 3.14.0-rc6+ #387
    [ 418.149989] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
    [ 418.165750] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
    [ 418.166433] 0000000000000021 ffff880038ca7988 ffffffff8159b22d 0000000000000021
    [ 418.176460] ffff880038ca79d8 ffff880038ca79c8 ffffffff8106942c ffff880038ca79e8
    [ 418.177453] ffff88003e20c3a0 ffff88003dc57600 ffff88003e20c3a0 00000000ffffffea
    [ 418.178445] Call Trace:
    [ 418.185811] [] dump_stack+0x49/0x5c
    [ 418.186440] [] warn_slowpath_common+0x8c/0xc0
    [ 418.187192] [] warn_slowpath_fmt+0x46/0x50
    [ 418.191231] [] ? acpi_ns_get_node+0xb7/0xc7
    [ 418.193889] [] __list_add+0xbe/0xd0
    [ 418.196649] [] kobject_add_internal+0x79/0x200
    [ 418.208610] [] kobject_add_varg+0x38/0x60
    [ 418.213831] [] kobject_add+0x44/0x70
    [ 418.229961] [] device_add+0xd0/0x550
    [ 418.234991] [] ? pm_runtime_init+0xe5/0xf0
    [ 418.250226] [] device_register+0x1e/0x30
    [ 418.255296] [] register_cpu+0xe3/0x130
    [ 418.266539] [] arch_register_cpu+0x65/0x150
    [ 418.285845] [] acpi_processor_hotadd_init+0x5a/0x9b
    ...
    This is caused by the fact that generic_processor_info() allocates
    the logical CPU id by calling:

    cpu = cpumask_next_zero(-1, cpu_present_mask);

    which returns the id of the CPU that previously failed to wake up,
    since its bit was cleared by do_boot_cpu(). As a result, register_cpu()
    tries to register another CPU with the same id as the CPU that is
    already present but failed to come online.

    Taking into account that the AP will not do anything if the master CPU
    failed to wake it up, there is no reason to mark that AP as not
    present and break subsequent CPU hotplug attempts. As a side effect of
    not marking the AP as not present, the user is allowed to online
    it again later.

    Also fix memory corruption in acpi_unmap_lsapic().

    If the master CPU fails to wake up the AP during CPU hotplug,
    it sets the percpu x86_cpu_to_apicid to BAD_APICID=0xFFFF for the AP.

    However, a following attempt to unplug that CPU leads to an
    out-of-bounds write to __apicid_to_node[], which is
    32768 items long on an x86_64 kernel.

    So, with the above fix of cpu_present_mask, make sure that a present
    CPU has a valid APIC ID by not setting x86_cpu_to_apicid
    to BAD_APICID in do_boot_cpu() on failure, and allow
    acpi_processor_remove()->acpi_unmap_lsapic() to cleanly remove the CPU.

    Signed-off-by: Igor Mammedov
    Acked-by: Toshi Kani
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1401975765-22328-2-git-send-email-imammedo@redhat.com
    Signed-off-by: Ingo Molnar

    Igor Mammedov
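The id collision described above follows directly from "take the first zero bit of the present mask". A small user-space model, where `demo_first_zero()` is a hypothetical stand-in for `cpumask_next_zero(-1, mask)` restricted to a 32-bit mask:

```c
#include <stdint.h>

/*
 * Return the index of the first clear bit among the low nbits of mask,
 * or nbits if every bit is set. nbits must not exceed 32.
 */
int demo_first_zero(uint32_t mask, int nbits)
{
    for (int i = 0; i < nbits; i++) {
        if (!(mask & (UINT32_C(1) << i)))
            return i;
    }
    return nbits;
}
```

With CPUs 0 to 2 present (mask 0x7) the next id handed out is 3. If CPU 3 is registered and its present bit is then cleared after a failed wake-up, the mask is 0x7 again and the next hotplug also receives id 3, colliding with the CPU that is already registered. Leaving the bit set (mask 0xF) makes the next allocation return 4 instead.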
     
  • tg_set_cfs_bandwidth() sets cfs_b->timer_active to 0 to
    force the period timer restart. This is not safe, because
    it can lead to the deadlock described in commit 927b54fccbf0:
    "__start_cfs_bandwidth calls hrtimer_cancel while holding rq->lock,
    waiting for the hrtimer to finish. However, if sched_cfs_period_timer
    runs for another loop iteration, the hrtimer can attempt to take
    rq->lock, resulting in deadlock."

    Three CPUs must be involved:

    CPU0                        CPU1                        CPU2
    take rq->lock               period timer fired
    ...                         take cfs_b lock
    ...                         ...                         tg_set_cfs_bandwidth()
    throttle_cfs_rq()           release cfs_b lock          take cfs_b lock
    ...                         distribute_cfs_runtime()    timer_active = 0
    take cfs_b->lock            wait for rq->lock           ...
    __start_cfs_bandwidth()
    {wait for timer callback
     break if timer_active == 1}

    So, CPU0 and CPU1 are deadlocked.

    Instead of resetting cfs_b->timer_active, tg_set_cfs_bandwidth can
    wait for period timer callbacks (ignoring cfs_b->timer_active) and
    restart the timer explicitly.

    Signed-off-by: Roman Gushchin
    Reviewed-by: Ben Segall
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/87wqdi9g8e.wl\%klamm@yandex-team.ru
    Cc: pjt@google.com
    Cc: chris.j.arges@canonical.com
    Cc: gregkh@linuxfoundation.org
    Cc: Linus Torvalds
    Signed-off-by: Ingo Molnar

    Roman Gushchin
     
  • A throttled task is still on the rq, and it may be moved to another CPU
    if the user is playing with sched_setaffinity(). Therefore, unlocked
    task_rq() access creates a race.

    Juri Lelli reports he hit this race when dl_bandwidth_enabled()
    was not set.

    Another point, raised by Peter Zijlstra:

    "Now I suppose the problem can still actually happen when
    you change the root domain and trigger a effective affinity
    change that way".

    To fix that, we do the same as is done in __task_rq_lock(). We do not
    use __task_rq_lock() itself, because it has a useful lockdep check
    that is not correct in the case of dl_task_timer(). We do not need
    pi_lock held here. This case is an exception (PeterZ):

    "The only reason we don't strictly need ->pi_lock now is because
    we're guaranteed to have p->state == TASK_RUNNING here and are
    thus free of ttwu races".

    Signed-off-by: Kirill Tkhai
    Signed-off-by: Peter Zijlstra
    Cc: # v3.14+
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/3056991400578422@web14g.yandex.ru
    Signed-off-by: Ingo Molnar

    Kirill Tkhai
     
  • attr.sched_policy is u32, therefore a comparison against < 0 is never true.
    Fix this by casting sched_policy to int.

    This issue was reported by coverity CID 1219934.

    Fixes: dbdb22754fde ("sched: Disallow sched_attr::sched_policy < 0")
    Signed-off-by: Richard Weinberger
    Signed-off-by: Peter Zijlstra
    Cc: Michael Kerrisk
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1401741514-7045-1-git-send-email-richard@nod.at
    Signed-off-by: Ingo Molnar

    Richard Weinberger
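The effect of the cast described above can be shown in a few lines of user-space C. On a `uint32_t`, the test `x < 0` can never be true; after conversion to a signed 32-bit type, values with the top bit set compare as negative. The function name is hypothetical, and the conversion of out-of-range values to `int32_t` is implementation-defined before C23 (GCC and Clang wrap modulo 2^32, which this sketch assumes).

```c
#include <stdint.h>

/*
 * Return 1 if the unsigned policy value would be negative when read as a
 * signed 32-bit integer, 0 otherwise. Comparing the uint32_t itself
 * against 0 with `<` would always yield 0.
 */
int demo_policy_is_negative(uint32_t sched_policy)
{
    return (int32_t)sched_policy < 0;
}
```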
     
  • As Peter Zijlstra told me, we have the following path:

    do_exit()
      exit_itimers()
        itimer_delete()
          spin_lock_irqsave(&timer->it_lock, &flags);
          timer_delete_hook(timer);
            kc->timer_del(timer) := posix_cpu_timer_del()
              put_task_struct()
                __put_task_struct()
                  task_numa_free()
                    spin_lock(&grp->lock);

    Which means that task_numa_free() can be called with interrupts
    disabled, which means that we should not be using spin_lock_irq() but
    spin_lock_irqsave() instead. Otherwise we are enabling interrupts while
    holding an interrupt unsafe lock!

    Signed-off-by: Steven Rostedt
    Signed-off-by: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Mike Galbraith
    Cc: Eric Dumazet
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/20140527182541.GH11096@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Steven Rostedt
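Why the `_irq` lock variants are wrong when the caller may already have interrupts disabled can be seen in a toy model that tracks only the interrupt-enable flag. All `demo_` names are hypothetical; no real locking or interrupt control is performed.

```c
#include <stdbool.h>

/* Models the CPU's interrupt-enable state. */
static bool demo_irqs_enabled = true;

void demo_set_irqs(bool on) { demo_irqs_enabled = on; }
bool demo_irqs_on(void)     { return demo_irqs_enabled; }

/* _irq variants: disable on lock, unconditionally enable on unlock. */
void demo_lock_irq(void)    { demo_irqs_enabled = false; }
void demo_unlock_irq(void)  { demo_irqs_enabled = true; }

/* _irqsave variants: remember the previous state and restore it. */
bool demo_lock_irqsave(void)
{
    bool flags = demo_irqs_enabled;

    demo_irqs_enabled = false;
    return flags;
}

void demo_unlock_irqrestore(bool flags)
{
    demo_irqs_enabled = flags;
}
```

If the caller enters with interrupts disabled, the `demo_lock_irq()`/`demo_unlock_irq()` pair leaves them enabled on return, which is the situation the commit warns about, whereas the save/restore pair returns with the flag exactly as it found it.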
     
  • …it/jolsa/perf into perf/urgent

    Pull perf/urgent fixes from Jiri Olsa:

    * Fix perf probe to find correct variable DIE (Masami Hiramatsu)

    * Fix a segfault in perf probe if asked for variable it doesn't find (Masami Hiramatsu)

    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Acked-by: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar