07 Mar, 2015

40 commits

  • commit 428d53be5e7468769d4e7899cca06ed5f783a6e1 upstream.

    We have to delete the allocated interrupt info if __inject_vm() fails.

    Otherwise user space can keep flooding kvm with floating interrupts and
    provoke more and more memory leaks.

    Reported-by: Dominik Dingel
    Reviewed-by: Dominik Dingel
    Signed-off-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Greg Kroah-Hartman

    David Hildenbrand
     
  • commit 8e2207cdd087ebb031e9118d1fd0902c6533a5e5 upstream.

    If a vm with no VCPUs is created, the injection of a floating irq
    leads to an endless loop in the kernel.

    Let's skip the search for a destination VCPU for a floating irq if no
    VCPUs were created.

    Reviewed-by: Dominik Dingel
    Reviewed-by: Cornelia Huck
    Signed-off-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Greg Kroah-Hartman

    David Hildenbrand
     
  • commit 0ac96caf0f9381088c673a16d910b1d329670edf upstream.

    The hrtimer that handles the wait with enabled timer interrupts
    should not be disturbed by changes of the host time.

    This patch changes our hrtimer to be based on a monotonic clock.

    Signed-off-by: David Hildenbrand
    Acked-by: Cornelia Huck
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Greg Kroah-Hartman

    David Hildenbrand
     
  • commit 2d00f759427bb3ed963b60f570830e9eca7e1c69 upstream.

    Patch 0759d0681cae ("KVM: s390: cleanup handle_wait by reusing
    kvm_vcpu_block") changed the way pending guest clock comparator
    interrupts are detected. It was assumed that as soon as the hrtimer
    wakes up, the condition for the guest ckc is satisfied.

    This is however only true as long as adjclock() doesn't speed
    up the monotonic clock. Reason is that the hrtimer is based on
    CLOCK_MONOTONIC, the guest clock comparator detection is based
    on the raw TOD clock. If CLOCK_MONOTONIC runs faster than the
    TOD clock, the hrtimer wakes the target VCPU up too early and
    the target VCPU will not detect any pending interrupts, therefore
    going back to sleep. It will never be woken up again because the
    hrtimer has finished. The VCPU is stuck.

    As a quick fix, we have to forward the hrtimer until the guest
    clock comparator is really due, to guarantee properly timed wake
    ups.

    As the hrtimer callback might be triggered on another cpu, we
    have to make sure that the timer is really stopped and not currently
    executing the callback on another cpu. This can happen if the vcpu
    thread is scheduled onto another physical cpu, but the timer base
    is not migrated. So let's use hrtimer_cancel() instead of hrtimer_try_to_cancel().

    A proper fix might be to introduce a RAW based hrtimer.
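
The forwarding decision described in this commit can be modelled in plain C outside the kernel. This is only a sketch: `ckc_timer_expired()` and its tick-based arguments are hypothetical names, and the real code works on TOD clock values through the hrtimer API.

```c
#include <stdint.h>

/*
 * Hypothetical model of the timer callback. Returns 0 when the guest clock
 * comparator (ckc) is due and the VCPU should be woken up; otherwise returns
 * the number of ticks by which the timer has to be forwarded.
 */
uint64_t ckc_timer_expired(uint64_t tod_now, uint64_t ckc)
{
    if (tod_now >= ckc)
        return 0;            /* comparator is really due: wake the VCPU */
    return ckc - tod_now;    /* fired too early: re-arm for the remainder */
}
```

A caller would re-arm the timer whenever the result is non-zero, so an early wakeup caused by a fast CLOCK_MONOTONIC can no longer leave the VCPU asleep with no timer pending.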

    Reported-by: Christian Borntraeger
    Signed-off-by: David Hildenbrand
    Acked-by: Cornelia Huck
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Greg Kroah-Hartman

    David Hildenbrand
     
  • commit 7f187922ddf6b67f2999a76dcb71663097b75497 upstream.

    When the guest writes to the TSC, the masterclock TSC copy must be
    updated as well along with the TSC_OFFSET update, otherwise a negative
    tsc_timestamp is calculated at kvm_guest_time_update.

    Once "if (!vcpus_matched && ka->use_master_clock)" is simplified to
    "if (ka->use_master_clock)", the corresponding "if (!ka->use_master_clock)"
    becomes redundant, so remove the do_request boolean and collapse
    everything into a single condition.

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Marcelo Tosatti
     
  • commit 23b133bdc452aa441fcb9b82cbf6dd05cfd342d0 upstream.

    Check length of extended attributes and allocation descriptors when
    loading inodes from disk. Otherwise corrupted filesystems could confuse
    the code and make the kernel oops.

    Reported-by: Carl Henrik Lunde
    Signed-off-by: Jan Kara
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     
  • commit 79144954278d4bb5989f8b903adcac7a20ff2a5a upstream.

    Store blocksize in a local variable in udf_fill_inode() since it is used
    many times.

    Signed-off-by: Jan Kara
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     
  • commit ed4cbc81addbc076b016c5b979fd1a02f0897f0a upstream.

    activate_mm() and switch_mm() call get_new_mmu_context() which in turn
    can enable the HTW before the entryhi is changed with the new ASID.
    Since the latter will enable the HTW in local_flush_tlb_all(),
    then there is a small timing window where the HTW is running with the
    new ASID but with an old pgd since the TLBMISS_HANDLER_SETUP_PGD
    hasn't assigned a new one yet. In order to prevent that, we introduce a
    simple htw counter to avoid starting HTW accidentally due to nested
    htw_{start,stop}() sequences. Moreover, since various IPI calls can
    enforce TLB flushing operations on a different core, such an operation
    may interrupt another htw_{stop,start} in progress, leading to inconsistent
    updates of the htw_seq variable. In order to avoid that, we disable the
    interrupts whenever we update that variable.
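
The nesting counter can be illustrated with a small user-space model. This is a sketch under assumptions: `struct htw_state` and its `hw_enabled` field are hypothetical stand-ins for the per-CPU state and the hardware enable bit, and the interrupt disabling mentioned above is only noted in a comment.

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-CPU state; hw_enabled models the HTW enable bit. */
struct htw_state {
    unsigned int seq;    /* number of outstanding htw_stop() calls */
    bool hw_enabled;
};

void htw_stop(struct htw_state *s)
{
    /* In the kernel, interrupts would be disabled around this update. */
    if (s->seq++ == 0)
        s->hw_enabled = false;   /* only the outermost stop touches hardware */
}

void htw_start(struct htw_state *s)
{
    /* Only the call that balances the outermost htw_stop() re-enables the walker. */
    if (--s->seq == 0)
        s->hw_enabled = true;
}
```

With this scheme an inner htw_start() inside a nested sequence leaves the walker disabled, which is the property the commit relies on.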

    Signed-off-by: Markos Chandras
    Cc: linux-mips@linux-mips.org
    Patchwork: https://patchwork.linux-mips.org/patch/9118/
    Signed-off-by: Ralf Baechle
    Signed-off-by: Greg Kroah-Hartman

    Markos Chandras
     
  • commit 06f34e1c28f3608b0ce5b310e41102d3fe7b65a1 upstream.

    We used to calculate page address differently in 2 cases:

    1. In virt_to_page(x) we do
    --->8---
    mem_map + (x - CONFIG_LINUX_LINK_BASE) >> PAGE_SHIFT
    --->8---

    2. In pte_page(x) we do
    --->8---
    mem_map + (pte_val(x) - PAGE_OFFSET) >> PAGE_SHIFT
    --->8---

    That leads to problems in case PAGE_OFFSET != CONFIG_LINUX_LINK_BASE -
    different pages will be selected depending on where and how we calculate
    page address.

    In particular, in STAR 9000853582, when gdb attempted to read the memory
    of another process, it got the wrong page from get_user_pages(), because this
    is exactly one of the places where we search for a page by pte_page().

    The fix is trivial - we need to calculate page address similarly in both
    cases.
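
To see why the two formulas disagree, here is a small sketch of the index arithmetic. The page size and both base addresses are illustrative assumptions, not values taken from any particular ARC configuration, and `page_index()` is a hypothetical helper.

```c
#include <stdint.h>

#define PAGE_SHIFT      13u          /* assumed: 8 KiB pages */
#define BASE_A          0x80000000u  /* illustrative value for one base */
#define BASE_B          0x90000000u  /* illustrative value for the other base */

/* Index into mem_map for an address, relative to the given base. */
uint32_t page_index(uint32_t addr, uint32_t base)
{
    return (addr - base) >> PAGE_SHIFT;
}
```

For the same address 0x90004000, the two bases give indices 0x8002 and 2, so code using one formula and code using the other would pick different struct page entries.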

    Signed-off-by: Alexey Brodkin
    Signed-off-by: Vineet Gupta
    Signed-off-by: Greg Kroah-Hartman

    Alexey Brodkin
     
  • commit 5f1437f61a0b351d25b528c159360da3d5e8c77b upstream.

    When the UART is in DMA receive mode (RDMAS set) and one character
    just arrived while another interrupt is handled (e.g. TX), the RDRF
    (receiver data register full flag) is set due to the water level of
    1. But since the DMA will take care of this character, there is no
    need to handle it by calling lpuart_prepare_rx. Handling it leads to
    adding the RX timeout timer twice:

    [ 74.336698] Kernel BUG at 80053070 [verbose debug info unavailable]
    [ 74.342999] Internal error: Oops - BUG: 0 [#1] ARM
    [ 74.347817] Modules linked in:
    [ 74.350926] CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00001-g39d78e2 #1788
    [ 74.358617] Hardware name: Freescale Vybrid VF610 (Device Tree)
    [ 74.364563] task: 807a7678 ti: 8079c000 task.ti: 8079c000
    [ 74.370002] PC is at add_timer+0x24/0x28
    [ 74.373960] LR is at lpuart_int+0x15c/0x3d8
    [ 74.378171] pc : [] lr : [] psr: a0010193
    [ 74.378171] sp : 8079de10 ip : 8079de20 fp : 8079de1c
    [ 74.389694] r10: 807d44c0 r9 : 8688c300 r8 : 00000013
    [ 74.394943] r7 : 20010193 r6 : 00000000 r5 : 000000a0 r4 : 86997210
    [ 74.401498] r3 : ffffa7da r2 : 80817868 r1 : 86997210 r0 : 86997344
    [ 74.408052] Flags: NzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel
    [ 74.415489] Control: 10c5387d Table: 8611c059 DAC: 00000015
    [ 74.421265] Process swapper (pid: 0, stack limit = 0x8079c230)
    ...

    Solve this by only executing the receiver path (lpuart_prepare_rx) if
    the DMA receive mode (RDMAS) is not set. Also, make sure the flag is
    cleared on initialization, in case it has been left set.

    This can be best reproduced using UART as a serial console, then
    running top while dd'ing data into the terminal.

    Signed-off-by: Stefan Agner
    Signed-off-by: Greg Kroah-Hartman

    Stefan Agner
     
  • commit 4a8588a1cf867333187d9ff071e6fbdab587d194 upstream.

    If the serial port gets closed while a RX transfer is in progress,
    the timer might fire after the serial port shutdown finished. This
    leads to a NULL pointer dereference:

    [ 7.508324] Unable to handle kernel NULL pointer dereference at virtual address 00000000
    [ 7.516590] pgd = 86348000
    [ 7.519445] [00000000] *pgd=86179831, *pte=00000000, *ppte=00000000
    [ 7.526145] Internal error: Oops: 17 [#1] ARM
    [ 7.530611] Modules linked in:
    [ 7.533876] CPU: 0 PID: 123 Comm: systemd Not tainted 3.19.0-rc3-00004-g5b11ea7 #1778
    [ 7.541827] Hardware name: Freescale Vybrid VF610 (Device Tree)
    [ 7.547862] task: 861c3400 ti: 86ac8000 task.ti: 86ac8000
    [ 7.553392] PC is at lpuart_timer_func+0x24/0xf8
    [ 7.558127] LR is at lpuart_timer_func+0x20/0xf8
    [ 7.562857] pc : [] lr : [] psr: 600b0113
    [ 7.562857] sp : 86ac9b90 ip : 86ac9b90 fp : 86ac9bbc
    [ 7.574467] r10: 80817180 r9 : 80817b98 r8 : 80817998
    [ 7.579803] r7 : 807acee0 r6 : 86989000 r5 : 00000100 r4 : 86997210
    [ 7.586444] r3 : 86ac8000 r2 : 86ac9bc0 r1 : 86997210 r0 : 00000000
    [ 7.593085] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
    [ 7.600341] Control: 10c5387d Table: 86348059 DAC: 00000015
    [ 7.606203] Process systemd (pid: 123, stack limit = 0x86ac8230)

    Set up the timer on UART startup, which allows deleting the timer
    unconditionally on shutdown. This also saves the initialization
    on each transfer.

    Signed-off-by: Stefan Agner
    Signed-off-by: Greg Kroah-Hartman

    Stefan Agner
     
  • commit 29183a70b0b828500816bd794b3fe192fce89f73 upstream.

    Additional validation of adjtimex freq values to avoid
    potential multiplication overflows was added in commit
    5e5aeb4367b (time: adjtimex: Validate the ADJ_FREQUENCY values)

    Unfortunately the patch used LONG_MAX/MIN instead of
    LLONG_MAX/MIN, which was fine on 64-bit systems, but being
    much smaller on 32-bit systems caused false positives,
    making most direct frequency adjustments fail with
    EINVAL.

    ntpd only does direct frequency adjustments at startup, so
    the issue was not as easily observed there, but other time
    sync applications like ptpd and chrony were more affected by
    the bug.

    See bugs:

    https://bugzilla.kernel.org/show_bug.cgi?id=92481
    https://bugzilla.redhat.com/show_bug.cgi?id=1188074

    This patch changes the checks to use LLONG_MAX for
    clarity, and additionally the checks are disabled
    on 32-bit systems since LLONG_MAX/PPM_SCALE is always
    larger than the 32-bit long freq value, so multiplication
    overflows aren't possible there.
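
The false positives are easy to reproduce in user space. The sketch below models a 32-bit long with int32_t; the PPM_SCALE value (1000 << 16) is an assumption about the kernel constant, and both helper names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of the kernel's PPM_SCALE (1000 << 16); illustrative only. */
#define PPM_SCALE 65536000LL

/* The range check with LONG_MAX/MIN as seen on a system with a 32-bit long. */
bool freq_ok_long32(int32_t freq)
{
    return freq >= INT32_MIN / PPM_SCALE && freq <= INT32_MAX / PPM_SCALE;
}

/* The same check using LLONG_MAX/MIN. */
bool freq_ok_llong(int32_t freq)
{
    return freq >= LLONG_MIN / PPM_SCALE && freq <= LLONG_MAX / PPM_SCALE;
}
```

Under these assumptions the 32-bit limit works out to 32, so a frequency value of 65536 (1 ppm in the timex encoding) is already rejected, while the LLONG_MAX-based limit accepts every value a 32-bit long can hold.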

    Reported-by: Josh Boyer
    Reported-by: George Joseph
    Tested-by: George Joseph
    Signed-off-by: John Stultz
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Sasha Levin
    Link: http://lkml.kernel.org/r/1423553436-29747-1-git-send-email-john.stultz@linaro.org
    [ Prettified the changelog and the comments a bit. ]
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    John Stultz
     
  • commit 146755923262037fc4c54abc28c04b1103f3cc51 upstream.

    The output of the KDB 'summary' command should report MemTotal, MemFree
    and Buffers in kB. The current code reports them in units of pages.

    A K(x) macro is already defined in the code, but not used.

    This patch applies that macro to convert the values to kB.
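
For illustration, a pages-to-kB conversion of this kind usually looks like the sketch below. The exact form of K(x) is an assumption based on the common kernel idiom, PAGE_SHIFT is assumed to be 12 (4 KiB pages), and `pages_to_kb()` is a hypothetical wrapper.

```c
#define PAGE_SHIFT 12   /* assumed: 4 KiB pages */

/* Common kernel idiom for pages -> kB; assumed to match the macro in question. */
#define K(x) ((x) << (PAGE_SHIFT - 10))

unsigned long pages_to_kb(unsigned long pages)
{
    return K(pages);
}
```

With 4 KiB pages each page contributes 4 kB, so 256 pages are reported as 1024 kB rather than 256.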

    Signed-off-by: Jay Lan
    Signed-off-by: Jason Wessel
    Signed-off-by: Greg Kroah-Hartman

    Jay Lan
     
  • commit 165235180ff61f0012ea68a299e46daec43dcaa7 upstream.

    mvebu_armada375_smp_wa_init is only used on armada 375 but is defined
    for all mvebu machines. As it calls a function that is only provided
    sometimes, this can result in a link error:

    arch/arm/mach-mvebu/built-in.o: In function `mvebu_armada375_smp_wa_init':
    :(.text+0x228): undefined reference to `mvebu_setup_boot_addr_wa'

    To solve this, we can just change the existing #ifdef around the
    function to also check for Armada375 SMP platforms.

    Signed-off-by: Arnd Bergmann
    Fixes: 305969fb6292 ("ARM: mvebu: use the common function for Armada 375 SMP workaround")
    Cc: Andrew Lunn
    Cc: Jason Cooper
    Cc: Gregory Clement
    Signed-off-by: Greg Kroah-Hartman

    Arnd Bergmann
     
  • commit 95fcedb027a27f32bf2434f9271635c380e57fb5 upstream.

    The vexpress tc2 power management code calls mcpm_loopback, which
    is only available if ARM_CPU_SUSPEND is enabled, otherwise we
    get a link error:

    arch/arm/mach-vexpress/built-in.o: In function `tc2_pm_init':
    arch/arm/mach-vexpress/tc2_pm.c:389: undefined reference to `mcpm_loopback'

    This explicitly selects ARM_CPU_SUSPEND like other platforms that
    need it.

    Signed-off-by: Arnd Bergmann
    Fixes: 3592d7e002438 ("ARM: 8082/1: TC2: test the MCPM loopback during boot")
    Acked-by: Nicolas Pitre
    Acked-by: Liviu Dudau
    Cc: Kevin Hilman
    Cc: Sudeep Holla
    Cc: Lorenzo Pieralisi
    Signed-off-by: Greg Kroah-Hartman

    Arnd Bergmann
     
  • commit 9bc78f32c2e430aebf6def965b316aa95e37a20c upstream.

    Add regulator_has_full_constraints() call to poodle board file to let
    regulator core know that we do not have any additional regulators left.
    This lets it substitute unprovided regulators with dummy ones.

    This fixes the following warnings that can be seen on poodle if
    regulators are enabled:

    ads7846 spi1.0: unable to get regulator: -517
    spi spi1.0: Driver ads7846 requests probe deferral
    wm8731 0-001b: Failed to get supply 'AVDD': -517
    wm8731 0-001b: Failed to request supplies: -517
    wm8731 0-001b: ASoC: failed to probe component -517

    Signed-off-by: Dmitry Eremin-Solenikov
    Acked-by: Mark Brown
    Signed-off-by: Robert Jarzmik
    Signed-off-by: Greg Kroah-Hartman

    Dmitry Eremin-Solenikov
     
  • commit 271e80176aae4e5b481f4bb92df9768c6075bbca upstream.

    Add regulator_has_full_constraints() call to corgi board file to let
    regulator core know that we do not have any additional regulators left.
    This lets it substitute unprovided regulators with dummy ones.

    This fixes the following warnings that can be seen on corgi if
    regulators are enabled:

    ads7846 spi1.0: unable to get regulator: -517
    spi spi1.0: Driver ads7846 requests probe deferral
    wm8731 0-001b: Failed to get supply 'AVDD': -517
    wm8731 0-001b: Failed to request supplies: -517
    wm8731 0-001b: ASoC: failed to probe component -517
    corgi-audio corgi-audio: ASoC: failed to instantiate card -517

    Signed-off-by: Dmitry Eremin-Solenikov
    Acked-by: Mark Brown
    Signed-off-by: Robert Jarzmik
    Signed-off-by: Greg Kroah-Hartman

    Dmitry Eremin-Solenikov
     
  • commit 19e3ae6b4f07a87822c1c9e7ed99d31860e701af upstream.

    The vcs device's poll/fasync support relies on the vt notifier to signal
    changes to the screen content. Notifier invocations were missing for
    changes that come through the selection interface though. Fix that.

    Tested with BRLTTY 5.2.

    Signed-off-by: Nicolas Pitre
    Cc: Dave Mielke
    Signed-off-by: Greg Kroah-Hartman

    Nicolas Pitre
     
  • commit 074f9dd55f9cab1b82690ed7e44bcf38b9616ce0 upstream.

    Currently the USB stack assumes that all host controller drivers are
    capable of receiving wakeup requests from downstream devices.
    However, this isn't true for the isp1760-hcd driver, which means that
    it isn't safe to do a runtime suspend of any device attached to a
    root-hub port if the device requires wakeup.

    This patch adds a "cant_recv_wakeups" flag to the usb_hcd structure
    and sets the flag in isp1760-hcd. The core is modified to prevent a
    direct child of the root hub from being put into runtime suspend with
    wakeup enabled if the flag is set.

    Signed-off-by: Alan Stern
    Tested-by: Nicolas Pitre
    Signed-off-by: Greg Kroah-Hartman

    Alan Stern
     
  • commit 524134d422316a59d5464ccbc12036bbe90c5563 upstream.

    The USB stack provides a mechanism for drivers to request an
    asynchronous device reset (usb_queue_reset_device()). The mechanism
    uses a work item (reset_ws) embedded in the usb_interface structure
    used by the driver, and the reset is carried out by a work queue
    routine.

    The asynchronous reset can race with driver unbinding. When this
    happens, we try to cancel the queued reset before unbinding the
    driver, on the theory that the driver won't care about any resets once
    it is unbound.

    However, thanks to the fact that lockdep now tracks work queue
    accesses, this can provoke a lockdep warning in situations where the
    device reset causes another interface's driver to be unbound; see

    http://marc.info/?l=linux-usb&m=141893165203776&w=2

    for an example. The reason is that the work routine for reset_ws in
    one interface calls cancel_queued_work() for the reset_ws in another
    interface. Lockdep thinks this might lead to a work routine trying to
    cancel itself. The simplest solution is not to cancel queued resets
    when unbinding drivers.

    This means we now need to acquire a reference to the usb_interface
    when queuing a reset_ws work item and to drop the reference when the
    work routine finishes. We also need to make sure that the
    usb_interface structure doesn't outlive its parent usb_device; this
    means acquiring and dropping a reference when the interface is created
    and destroyed.

    In addition, cancelling a queued reset can fail (if the device is in
    the middle of an earlier reset), and this can cause usb_reset_device()
    to try to rebind an interface that has been deallocated (see
    http://marc.info/?l=linux-usb&m=142175717016628&w=2 for details).
    Acquiring the extra references prevents this failure.

    Signed-off-by: Alan Stern
    Reported-by: Russell King - ARM Linux
    Reported-by: Olivier Sobrie
    Tested-by: Olivier Sobrie
    Signed-off-by: Greg Kroah-Hartman

    Alan Stern
     
  • commit 5efd2ea8c9f4f12916ffc8ba636792ce052f6911 upstream.

    The following error pops up during "testusb -a -t 10":
    | musb-hdrc musb-hdrc.1.auto: dma_pool_free buffer-128, f134e000/be842000 (bad dma)
    hcd_buffer_create() creates a few buffers, the smallest of which is 32 bytes
    in size. ARCH_KMALLOC_MINALIGN is set to 64 bytes. This combination results in
    hcd_buffer_alloc() returning memory which is 32-byte aligned and which
    might be identified by buffer_offset() as another buffer. This means the
    buffer on a 32-byte boundary will not get freed; instead the code
    tries to free another buffer, which produces the error message.

    This patch fixes the issue by creating the smallest DMA buffer with the
    size of ARCH_KMALLOC_MINALIGN (or 32 in case ARCH_KMALLOC_MINALIGN is
    smaller). This might be 32, 64 or even 128 bytes. The next three pools
    will have the size 128, 512 and 2048.
    In case the smallest pool is 128 bytes then we have only three pools
    instead of four (and zero the first entry in the array).
    The last pool size is always 2048 bytes which is the assumed PAGE_SIZE /
    2 of 4096. I doubt it makes sense to continue using PAGE_SIZE / 2 where
    we would end up with 8KiB buffer in case we have 16KiB pages.
    Instead I think it makes sense to have a common size(s) and extend them
    if there is need to.
    There is a BUILD_BUG_ON() now in case someone has a minalign of more than
    128 bytes.
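
The resulting pool layout can be sketched as a small function. This is a user-space model of the sizing rule described above, not the patch itself; `hcd_pool_sizes()` is a hypothetical helper, and the build-time check is modelled as a run-time error return.

```c
#include <stddef.h>

#define HCD_BUFFER_POOLS 4

/*
 * Fill sizes[] for a given minimum kmalloc alignment. Returns 0 on success,
 * or -1 if minalign exceeds 128 (the kernel rejects that case at build time).
 */
int hcd_pool_sizes(size_t minalign, size_t sizes[HCD_BUFFER_POOLS])
{
    if (minalign > 128)
        return -1;

    sizes[0] = minalign > 32 ? minalign : 32;   /* at least 32 bytes */
    sizes[1] = 128;
    sizes[2] = 512;
    sizes[3] = 2048;

    if (sizes[0] >= 128)
        sizes[0] = 0;   /* would duplicate the 128-byte pool: leave it unused */

    return 0;
}
```

Because every pool size is now a multiple of the minimum alignment, an allocation can no longer land on a boundary that belongs to a smaller pool.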

    Signed-off-by: Sebastian Andrzej Siewior
    Acked-by: Alan Stern
    Signed-off-by: Greg Kroah-Hartman

    Sebastian Andrzej Siewior
     
  • commit c99197902da284b4b723451c1471c45b18537cde upstream.

    The usb_hcd_unlink_urb() routine in hcd.c contains two possible
    use-after-free errors. The dev_dbg() statement at the end of the
    routine dereferences urb and urb->dev even though both structures may
    have been deallocated.

    This patch fixes the problem by storing urb->dev in a local variable
    (avoiding the dereference of urb) and moving the dev_dbg() up before
    the usb_put_dev() call.

    Signed-off-by: Alan Stern
    Reported-by: Joe Lawrence
    Tested-by: Joe Lawrence
    Signed-off-by: Greg Kroah-Hartman

    Alan Stern
     
  • commit a6f0331236fa75afba14bbcf6668d42cebb55c43 upstream.

    Added the USB serial console device ID for Siemens Ruggedcom devices
    which have a USB port for their serial console.

    Signed-off-by: Len Sorensen
    Signed-off-by: Johan Hovold
    Signed-off-by: Greg Kroah-Hartman

    Lennart Sorensen
     
  • commit 663b7ee9517eec6deea9a48c7a1392a9a34f7809 upstream.

    We might enter the interrupt handler with hw_ready already set,
    but before we have actually started the reset flow.
    To solve this, we move the reset release from the interrupt handler
    to the HW start wait function, which is part of the reset sequence.

    Signed-off-by: Alexander Usyskin
    Signed-off-by: Tomas Winkler
    Signed-off-by: Greg Kroah-Hartman

    Alexander Usyskin
     
  • commit 1ab1e79b9fd4b01331490bbe2e630a0fc0b25449 upstream.

    We should mask interrupt set bit when writing back
    hcsr value in reset bit clean-up.

    This is refinement for
    mei: clean reset bit before reset
    commit b13a65ef190e488e2761d65bdd2e1fe8a3a125f5

    Signed-off-by: Alexander Usyskin
    Signed-off-by: Tomas Winkler
    Signed-off-by: Greg Kroah-Hartman

    Alexander Usyskin
     
  • commit 6fbb9bdf0f3fbe23aeff806489791aa876adaffb upstream.

    -EPROBE_DEFER was not handled properly by atmel_serial_probe().
    For example, when atmel_serial_probe() is called for the first time, we pass
    the test_and_set_bit() test that checks whether the port has already been
    initialized. Then we call atmel_init_port(), which may return -EPROBE_DEFER,
    possibly propagated from clk_get(). atmel_serial_probe() used to return
    this error code WITHOUT clearing the port bit in the "atmel_ports_in_use" mask.
    When atmel_serial_probe() was called a second time, it failed the
    test_and_set_bit() test and returned -EBUSY.

    When atmel_serial_probe() fails, this patch makes it clear the port bit in the
    "atmel_ports_in_use" mask, if needed, before returning the error code.
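
The failure and its fix can be modelled in a few lines of user-space C. This is a sketch with assumptions: `probe_port()` and `test_and_set()` are hypothetical, non-atomic stand-ins for the driver's probe routine and the kernel's test_and_set_bit(), and the deferral error is defined locally as EPROBE_DEFER (517), since it is not part of the user-space errno.h.

```c
#include <errno.h>

#define EPROBE_DEFER 517             /* kernel-internal errno, defined here by hand */

static unsigned long ports_in_use;   /* stand-in for atmel_ports_in_use */

/* Non-atomic stand-in for the kernel's test_and_set_bit(). */
static int test_and_set(unsigned int bit)
{
    unsigned long mask = 1UL << bit;
    int was_set = (ports_in_use & mask) != 0;

    ports_in_use |= mask;
    return was_set;
}

/* init_result models what the port initialization would return. */
int probe_port(unsigned int port, int init_result)
{
    if (test_and_set(port))
        return -EBUSY;

    if (init_result < 0) {
        ports_in_use &= ~(1UL << port);   /* the fix: release the port on failure */
        return init_result;
    }
    return 0;
}
```

Without the line that clears the bit, a deferred first probe would leave the port marked as in use and the retry would fail with -EBUSY.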

    Signed-off-by: Cyrille Pitchen
    Acked-by: Nicolas Ferre
    Signed-off-by: Greg Kroah-Hartman

    Cyrille Pitchen
     
  • commit 37480a05685ed5b8e1b9bf5e5c53b5810258b149 upstream.

    Commit 26df6d13406d1a5 ("tty: Add EXTPROC support for LINEMODE")
    allows a process which has opened a pty master to send _any_ signal
    to the process group of the pty slave. Although potentially
    exploitable by a malicious program running a setuid program on
    a pty slave, it's unknown if this exploit currently exists.

    Limit to signals actually used.

    Cc: Theodore Ts'o
    Cc: Howard Chu
    Cc: One Thousand Gnomes
    Cc: Jiri Slaby
    Signed-off-by: Peter Hurley
    Signed-off-by: Greg Kroah-Hartman

    Peter Hurley
     
  • commit 91117a20245b59f70b563523edbf998a62fc6383 upstream.

    The 'pfn' returned by axonram was completely bogus, and has been since
    2008.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Jan Kara
    Reviewed-by: Mathieu Desnoyers
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Matthew Wilcox
     
  • commit 6d1cff2a885850b78b40c34777b46cf5da5d1050 upstream.

    We hit a use-after-free when dereferencing the pointer to the task_smack
    struct in smk_of_task(), called from smack_task_to_inode().

    The task_security() macro uses task_cred_xxx() to get a pointer to the task_smack.
    task_cred_xxx() may be used only for non-pointer members of a task's
    credentials. It cannot be used for pointer members, since what they point
    to may disappear after dropping the RCU read lock.

    task_security() was mainly used this way:
    smk_of_task(task_security(p))

    Instead of this, introduce the function smk_of_task_struct(), which
    takes a task_struct as argument and returns a pointer to the smk_known struct,
    doing so under the RCU read lock.
    The bogus task_security() macro is not used anymore, so remove it.

    KASan's report for this:

    AddressSanitizer: use after free in smack_task_to_inode+0x50/0x70 at addr c4635600
    =============================================================================
    BUG kmalloc-64 (Tainted: PO): kasan error
    -----------------------------------------------------------------------------

    Disabling lock debugging due to kernel taint
    INFO: Allocated in new_task_smack+0x44/0xd8 age=39 cpu=0 pid=1866
    kmem_cache_alloc_trace+0x88/0x1bc
    new_task_smack+0x44/0xd8
    smack_cred_prepare+0x48/0x21c
    security_prepare_creds+0x44/0x4c
    prepare_creds+0xdc/0x110
    smack_setprocattr+0x104/0x150
    security_setprocattr+0x4c/0x54
    proc_pid_attr_write+0x12c/0x194
    vfs_write+0x1b0/0x370
    SyS_write+0x5c/0x94
    ret_fast_syscall+0x0/0x48
    INFO: Freed in smack_cred_free+0xc4/0xd0 age=27 cpu=0 pid=1564
    kfree+0x270/0x290
    smack_cred_free+0xc4/0xd0
    security_cred_free+0x34/0x3c
    put_cred_rcu+0x58/0xcc
    rcu_process_callbacks+0x738/0x998
    __do_softirq+0x264/0x4cc
    do_softirq+0x94/0xf4
    irq_exit+0xbc/0x120
    handle_IRQ+0x104/0x134
    gic_handle_irq+0x70/0xac
    __irq_svc+0x44/0x78
    _raw_spin_unlock+0x18/0x48
    sync_inodes_sb+0x17c/0x1d8
    sync_filesystem+0xac/0xfc
    vdfs_file_fsync+0x90/0xc0
    vfs_fsync_range+0x74/0x7c
    INFO: Slab 0xd3b23f50 objects=32 used=31 fp=0xc4635600 flags=0x4080
    INFO: Object 0xc4635600 @offset=5632 fp=0x (null)

    Bytes b4 c46355f0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
    Object c4635600: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object c4635610: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object c4635620: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object c4635630: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
    Redzone c4635640: bb bb bb bb ....
    Padding c46356e8: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
    Padding c46356f8: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
    CPU: 5 PID: 834 Comm: launchpad_prelo Tainted: PBO 3.10.30 #1
    Backtrace:
    [] (dump_backtrace+0x0/0x158) from [] (show_stack+0x20/0x24)
    r7:c4634010 r6:d3b23f50 r5:c4635600 r4:d1002140
    [] (show_stack+0x0/0x24) from [] (dump_stack+0x20/0x28)
    [] (dump_stack+0x0/0x28) from [] (print_trailer+0x124/0x144)
    [] (print_trailer+0x0/0x144) from [] (object_err+0x3c/0x44)
    r7:c4635600 r6:d1002140 r5:d3b23f50 r4:c4635600
    [] (object_err+0x0/0x44) from [] (kasan_report_error+0x2b8/0x538)
    r6:d1002140 r5:d3b23f50 r4:c6429cf8 r3:c09e1aa7
    [] (kasan_report_error+0x0/0x538) from [] (__asan_load4+0xd4/0xf8)
    [] (__asan_load4+0x0/0xf8) from [] (smack_task_to_inode+0x50/0x70)
    r5:c4635600 r4:ca9da000
    [] (smack_task_to_inode+0x0/0x70) from [] (security_task_to_inode+0x3c/0x44)
    r5:cca25e80 r4:c0ba9780
    [] (security_task_to_inode+0x0/0x44) from [] (pid_revalidate+0x124/0x178)
    r6:00000000 r5:cca25e80 r4:cbabe3c0 r3:00008124
    [] (pid_revalidate+0x0/0x178) from [] (lookup_fast+0x35c/0x434)
    r9:c6429efc r8:00000101 r7:c079d940 r6:c6429e90 r5:c6429ed8 r4:c83c4148
    [] (lookup_fast+0x0/0x434) from [] (do_last.isra.24+0x1c0/0x1108)
    [] (do_last.isra.24+0x0/0x1108) from [] (path_openat.isra.25+0xf4/0x648)
    [] (path_openat.isra.25+0x0/0x648) from [] (do_filp_open+0x3c/0x88)
    [] (do_filp_open+0x0/0x88) from [] (do_sys_open+0xf0/0x198)
    r7:00000001 r6:c0ea2180 r5:0000000b r4:00000000
    [] (do_sys_open+0x0/0x198) from [] (SyS_open+0x30/0x34)
    [] (SyS_open+0x0/0x34) from [] (ret_fast_syscall+0x0/0x48)
    Read of size 4 by thread T834:
    Memory state around the buggy address:
    c4635380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    c4635400: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
    c4635480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    c4635500: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
    c4635580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    >c4635600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ^
    c4635680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    c4635700: 00 00 00 00 04 fc fc fc fc fc fc fc fc fc fc fc
    c4635780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    c4635800: 00 00 00 00 00 00 04 fc fc fc fc fc fc fc fc fc
    c4635880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    ==================================================================

    Signed-off-by: Andrey Ryabinin
    Signed-off-by: Greg Kroah-Hartman

    Andrey Ryabinin
     
  • commit 1e0d6714aceb770b04161fbedd7765d0e1fc27bd upstream.

    When an application connects to the ring buffer via splice, it can only
    read full pages. Splice does not work with partial pages. If there is
    not enough data to fill a page, the splice command will either block
    or return -EAGAIN (if set to nonblock).

    Code was added to just sleep again if the page is not full.
    The problem is, it will get woken up again on the next event. That
    is, when something is written into the ring buffer, if there is a waiter
    it will wake it up. The waiter would then check the buffer, see that
    it still does not have enough data to fill a page and go back to sleep.
    To make matters worse, when the waiter goes back to sleep, it could
    cause another event, which would wake it back up again to see it
    doesn't have enough data and sleep again. This produces a tremendous
    overhead and fills the ring buffer with noise.

    For example, recording sched_switch on an idle system for 10 seconds
    produces 25,350,475 events!!!

    Create another wait queue for those waiters wanting full pages.
    When an event is written, it only wakes up waiters if there's a full
    page of data. It does not wake up the waiter if the page is not yet
    full.
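
The wakeup rule for the two kinds of waiters can be written as a single predicate. This is a minimal sketch: `should_wake()` and its parameters are hypothetical names, and the real ring buffer code tracks this state per CPU buffer.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Decide whether a write should wake a waiter. Waiters that want a full page
 * (such as splice readers) are only woken once a whole page is available.
 */
bool should_wake(bool wants_full_page, size_t bytes_in_page, size_t page_size)
{
    if (!wants_full_page)
        return true;                    /* ordinary waiters wake on any event */
    return bytes_in_page >= page_size;  /* full-page waiters wait for a page */
}
```

Keeping the full-page waiters on their own queue means a write into a partly filled page no longer causes a wakeup that in turn generates more events.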

    After this change, recording sched_switch on an idle system for 10
    seconds produces only 800 events. Getting rid of 25,349,675 useless
    events (99.9969% of events!!), is something to take seriously.

    Cc: Rabin Vincent
    Fixes: e30f53aad220 "tracing: Do not busy wait in buffer splice"
    Signed-off-by: Steven Rostedt
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (Red Hat)
     
  • commit 04f81f0154e4bf002be6f4d85668ce1257efa4d9 upstream.

    Using the IPCB() macro to get the IPv4 options is convenient, but
    unfortunately NetLabel often needs to examine the CIPSO option outside
    of the scope of the IP layer in the stack. While historically IPCB()
    worked above the IP layer, due to the inclusion of the inet_skb_param
    struct at the head of the {tcp,udp}_skb_cb structs, recent commit
    971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
    reordered the tcp_skb_cb struct and invalidated this IPCB() trick.

    This patch fixes the problem by creating a new function,
    cipso_v4_optptr(), which locates the CIPSO option inside the IP header
    without calling IPCB(). Unfortunately, this isn't as fast as a simple
    lookup so some additional tweaks were made to limit the use of this
    new function.
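
    For illustration, locating an option by walking the IPv4 header
    directly might look like the user-space sketch below. It assumes a
    raw header buffer rather than an skb, and find_cipso_opt() and the
    OPT_* constants are hypothetical names; this is not the code of
    cipso_v4_optptr() itself.

```c
#include <stddef.h>
#include <stdint.h>

#define OPT_END   0    /* End of Option List, single byte */
#define OPT_NOOP  1    /* No Operation, single byte */
#define OPT_CIPSO 134  /* CIPSO option type */

/*
 * Return a pointer to the CIPSO option inside a raw IPv4 header, or NULL.
 * hdr points at the first header byte; len is the number of valid bytes.
 */
static const uint8_t *find_cipso_opt(const uint8_t *hdr, size_t len)
{
	size_t hlen, off = 20;          /* options start after the fixed header */

	if (len < 20)
		return NULL;
	hlen = (size_t)(hdr[0] & 0x0f) * 4;   /* IHL counts 32-bit words */
	if (hlen < 20 || hlen > len)
		return NULL;

	while (off < hlen) {
		uint8_t type = hdr[off];

		if (type == OPT_END)
			break;
		if (type == OPT_NOOP) {
			off++;
			continue;
		}
		/* All other options carry a length byte covering the whole option. */
		if (off + 1 >= hlen || hdr[off + 1] < 2 || off + hdr[off + 1] > hlen)
			return NULL;
		if (type == OPT_CIPSO)
			return hdr + off;
		off += hdr[off + 1];
	}
	return NULL;
}
```

    The walk is linear in the number of options, which is why a scan
    like this is slower than a cached offset lookup.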

    Reported-by: Casey Schaufler
    Signed-off-by: Paul Moore
    Tested-by: Casey Schaufler
    Signed-off-by: Greg Kroah-Hartman

    Paul Moore
     
  • commit c6ce194325cef342313e3d27620411ce90a89c50 upstream.

    Hi,

    If you can manage to submit an async write as the first async I/O from
    the context of a process with realtime scheduling priority, then a
    cfq_queue is allocated, but filed into the wrong async_cfqq bucket. It
    ends up in the best effort array, but actually has realtime I/O
    scheduling priority set in cfqq->ioprio.

    The reason is that cfq_get_queue assumes the default scheduling class and
    priority when there is no information present (i.e. when the async cfqq
    is created):

    static struct cfq_queue *
    cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
                  struct bio *bio, gfp_t gfp_mask)
    {
            const int ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
            const int ioprio = IOPRIO_PRIO_DATA(cic->ioprio);

    cic->ioprio starts out as 0, which is "invalid". So, class of 0
    (IOPRIO_CLASS_NONE) is passed to cfq_async_queue_prio like so:

    async_cfqq = cfq_async_queue_prio(cfqd, ioprio_class, ioprio);

    static struct cfq_queue **
    cfq_async_queue_prio(struct cfq_data *cfqd, int ioprio_class, int ioprio)
    {
            switch (ioprio_class) {
            case IOPRIO_CLASS_RT:
                    return &cfqd->async_cfqq[0][ioprio];
            case IOPRIO_CLASS_NONE:
                    ioprio = IOPRIO_NORM;
                    /* fall through */
            case IOPRIO_CLASS_BE:
                    return &cfqd->async_cfqq[1][ioprio];
            case IOPRIO_CLASS_IDLE:
                    return &cfqd->async_idle_cfqq;
            default:
                    BUG();
            }
    }

    Here, instead of returning a class mapped from the process' scheduling
    priority, we get back the bucket associated with IOPRIO_CLASS_BE.

    Now, there is no queue allocated there yet, so we create it:

    cfqq = cfq_find_alloc_queue(cfqd, is_sync, cic, bio, gfp_mask);

    That function ends up doing this:

    cfq_init_cfqq(cfqd, cfqq, current->pid, is_sync);
    cfq_init_prio_data(cfqq, cic);

    cfq_init_cfqq marks the priority as having changed. Then,
    cfq_init_prio_data does this:

    ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
    switch (ioprio_class) {
    default:
            printk(KERN_ERR "cfq: bad prio %x\n", ioprio_class);
    case IOPRIO_CLASS_NONE:
            /*
             * no prio set, inherit CPU scheduling settings
             */
            cfqq->ioprio = task_nice_ioprio(tsk);
            cfqq->ioprio_class = task_nice_ioclass(tsk);
            break;

    So we basically have two code paths that treat IOPRIO_CLASS_NONE
    differently, which results in an RT async cfqq filed into a best effort
    bucket.
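
    The mismatch can be reproduced in a small stand-alone model. The
    enum and function names below are hypothetical simplifications of
    the two paths quoted above. Resolving the effective class before
    picking the bucket is shown as one possible way to make them agree;
    it is an assumption for illustration, not a description of the
    attached patch.

```c
/* Simplified stand-ins for the I/O priority classes and async buckets. */
enum io_class { IO_NONE = 0, IO_RT = 1, IO_BE = 2, IO_IDLE = 3 };
enum bucket { BUCKET_RT, BUCKET_BE, BUCKET_IDLE };

/* Path 1 (bucket selection): an unset class falls back to best effort. */
static enum bucket bucket_for_class(enum io_class c)
{
	switch (c) {
	case IO_RT:
		return BUCKET_RT;
	case IO_IDLE:
		return BUCKET_IDLE;
	case IO_NONE:
	case IO_BE:
	default:
		return BUCKET_BE;
	}
}

/* Path 2 (queue init): an unset class is inherited from the task. */
static enum io_class effective_class(enum io_class requested,
				     enum io_class task_class)
{
	return requested == IO_NONE ? task_class : requested;
}
```

    For a realtime task with no I/O priority set, bucket_for_class(IO_NONE)
    yields BUCKET_BE while effective_class(IO_NONE, IO_RT) yields IO_RT.
    Passing the effective class to bucket_for_class() yields BUCKET_RT.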

    Attached is a patch which fixes the problem. I'm not sure how to make
    it cleaner. Suggestions would be welcome.

    Signed-off-by: Jeff Moyer
    Tested-by: Hidehiro Kawai
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Jeff Moyer
     
  • commit 69abaffec7d47a083739b79e3066cb3730eba72e upstream.

    cfq_lookup_create_cfqg() allocates struct blkcg_gq using GFP_ATOMIC, but
    cfq_find_alloc_queue() does not handle a possible allocation failure.
    As a result, the kernel oopses on a NULL pointer dereference when
    cfq_link_cfqq_cfqg() calls cfqg_get() on the NULL pointer.

    The bug was introduced in v3.5 by commit cd1604fab4f9 ("blkcg: factor
    out blkio_group creation"). Prior to that commit, the cfq group lookup
    returned a pointer to the root group as a fallback.

    This patch handles the error by using the existing fallback, oom_cfqq.

    Signed-off-by: Konstantin Khlebnikov
    Acked-by: Tejun Heo
    Acked-by: Vivek Goyal
    Fixes: cd1604fab4f9 ("blkcg: factor out blkio_group creation")
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Konstantin Khlebnikov
     
  • commit 3fd7b60f2c7418239d586e359e0c6d8503e10646 upstream.

    This patch drops legacy active_ts_list usage within iscsi_target_tq.c
    code. It was originally used to track the active thread sets during
    iscsi-target shutdown, and is no longer used by modern upstream code.

    Two people have reported list corruption using traditional iscsi-target
    and iser-target with the following backtrace, which appears to be related
    to iscsi_thread_set->ts_list being used across both active_ts_list and
    inactive_ts_list.

    [ 60.782534] ------------[ cut here ]------------
    [ 60.782543] WARNING: CPU: 0 PID: 9430 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
    [ 60.782545] list_del corruption, ffff88045b00d180->next is LIST_POISON1 (dead000000100100)
    [ 60.782546] Modules linked in: ib_srpt tcm_qla2xxx qla2xxx tcm_loop tcm_fc libfc scsi_transport_fc scsi_tgt ib_isert rdma_cm iw_cm ib_addr iscsi_target_mod target_core_pscsi target_core_file target_core_iblock target_core_mod configfs ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc autofs4 sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ib_ipoib ib_cm ib_uverbs ib_umad mlx4_en mlx4_ib ib_sa ib_mad ib_core mlx4_core dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan vhost tun kvm_intel kvm uinput iTCO_wdt iTCO_vendor_support microcode serio_raw pcspkr sb_edac edac_core sg i2c_i801 lpc_ich mfd_core mtip32xx igb i2c_algo_bit i2c_core ptp pps_core ioatdma dca wmi ext3(F) jbd(F) mbcache(F) sd_mod(F) crc_t10dif(F) crct10dif_common(F) ahci(F) libahci(F) isci(F) libsas(F) scsi_transport_sas(F) [last unloaded: speedstep_lib]
    [ 60.782597] CPU: 0 PID: 9430 Comm: iscsi_ttx Tainted: GF 3.12.19+ #2
    [ 60.782598] Hardware name: Supermicro X9DRX+-F/X9DRX+-F, BIOS 3.00 07/09/2013
    [ 60.782599] 0000000000000035 ffff88044de31d08 ffffffff81553ae7 0000000000000035
    [ 60.782602] ffff88044de31d58 ffff88044de31d48 ffffffff8104d1cc 0000000000000002
    [ 60.782605] ffff88045b00d180 ffff88045b00d0c0 ffff88045b00d0c0 ffff88044de31e58
    [ 60.782607] Call Trace:
    [ 60.782611] [] dump_stack+0x49/0x62
    [ 60.782615] [] warn_slowpath_common+0x8c/0xc0
    [ 60.782618] [] warn_slowpath_fmt+0x46/0x50
    [ 60.782620] [] __list_del_entry+0x63/0xd0
    [ 60.782622] [] list_del+0x11/0x40
    [ 60.782630] [] iscsi_del_ts_from_active_list+0x29/0x50 [iscsi_target_mod]
    [ 60.782635] [] iscsi_tx_thread_pre_handler+0xa1/0x180 [iscsi_target_mod]
    [ 60.782642] [] iscsi_target_tx_thread+0x4e/0x220 [iscsi_target_mod]
    [ 60.782647] [] ? iscsit_handle_snack+0x190/0x190 [iscsi_target_mod]
    [ 60.782652] [] ? iscsit_handle_snack+0x190/0x190 [iscsi_target_mod]
    [ 60.782655] [] kthread+0xce/0xe0
    [ 60.782657] [] ? kthread_freezable_should_stop+0x70/0x70
    [ 60.782660] [] ret_from_fork+0x7c/0xb0
    [ 60.782662] [] ? kthread_freezable_should_stop+0x70/0x70
    [ 60.782663] ---[ end trace 9662f4a661d33965 ]---

    Since this code is no longer used, go ahead and drop the problematic usage
    altogether.

    Reported-by: Gavin Guo
    Reported-by: Moussa Ba
    Signed-off-by: Nicholas Bellinger
    Signed-off-by: Greg Kroah-Hartman

    Nicholas Bellinger
     
  • commit 7772855a996ec6e16944b120ab5ce21050279821 upstream.

    With scsi-mq enabled, userspace programs can get unexpected EWOULDBLOCK
    (a.k.a. EAGAIN) errors when submitting commands to the SCSI generic
    driver. Fix by calling blk_get_request() with GFP_KERNEL instead of
    GFP_ATOMIC.

    Note: to avoid introducing a potential deadlock, this patch should be
    applied after the patch titled "sg: fix unkillable I/O wait deadlock
    with scsi-mq".

    Signed-off-by: Tony Battersby
    Acked-by: Douglas Gilbert
    Tested-by: Douglas Gilbert
    Signed-off-by: James Bottomley
    Signed-off-by: Greg Kroah-Hartman

    Tony Battersby
     
  • commit 7568615c1054907ea8c7701ab86dad51aa099888 upstream.

    When using the write()/read() interface for submitting commands, the
    SCSI generic driver does not call blk_put_request() on a completed SCSI
    command until userspace calls read() to get the command completion.
    Since scsi-mq uses a fixed number of preallocated requests, this makes
    it possible for userspace to exhaust the entire preallocated supply of
    requests. For places in the kernel that call blk_get_request() with
    GFP_KERNEL, this can cause the calling process to deadlock in a
    permanent unkillable I/O wait in blk_get_request() -> ... -> bt_get().
    For places in the kernel that call blk_get_request() with GFP_ATOMIC,
    this can cause blk_get_request() always to return -EWOULDBLOCK. Note
    that these problems happen only if scsi-mq is enabled. Prevent the
    problems by calling blk_put_request() as soon as the SCSI command
    completes instead of waiting for userspace to call read().
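
    The exhaustion scenario can be sketched with a toy fixed-size pool.
    POOL_SIZE, struct req_pool and the helpers below are hypothetical
    and only model the accounting, not the block layer or the sg driver.

```c
#include <stdbool.h>

#define POOL_SIZE 4   /* hypothetical fixed supply of preallocated requests */

struct req_pool {
	int in_use;
};

/* Non-blocking get: fails once every preallocated request is taken. */
static bool pool_get(struct req_pool *p)
{
	if (p->in_use == POOL_SIZE)
		return false;
	p->in_use++;
	return true;
}

static void pool_put(struct req_pool *p)
{
	p->in_use--;
}

/*
 * Submit n commands that all complete while userspace has not yet called
 * read(). Returns how many submissions obtained a request.
 */
static int submit_all(struct req_pool *p, int n, bool put_on_completion)
{
	int ok = 0;

	for (int i = 0; i < n; i++) {
		if (!pool_get(p))
			continue;       /* pool exhausted, submission fails */
		ok++;
		if (put_on_completion)
			pool_put(p);    /* released as soon as the command completes */
	}
	return ok;
}
```

    When requests are held until read(), only POOL_SIZE submissions
    succeed. When they are released on completion, all of them succeed.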

    Signed-off-by: Tony Battersby
    Acked-by: Douglas Gilbert
    Tested-by: Douglas Gilbert
    Signed-off-by: James Bottomley
    Signed-off-by: Greg Kroah-Hartman

    Tony Battersby
     
  • commit d8ba1f971497c19cf80da1ea5391a46a5f9fbd41 upstream.

    If the call to decode_rc_list() fails due to a memory allocation error,
    then we need to truncate the array size to ensure that we only call
    kfree() on those pointers that were allocated.
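
    The cleanup pattern can be sketched in user space as follows. The
    struct and function names are hypothetical, and malloc()/free()
    stand in for the kernel allocator; the point is only that the
    count handed to the cleanup code covers the entries that were
    actually allocated.

```c
#include <stdlib.h>

struct rc_list {
	int *calls;   /* per-entry allocation made by the decoder */
};

/* Hypothetical decoder: allocates one entry, or fails at index fail_at. */
static int decode_one(struct rc_list *rc, unsigned idx, unsigned fail_at)
{
	if (idx == fail_at)
		return -1;
	rc->calls = malloc(sizeof(*rc->calls));
	return rc->calls ? 0 : -1;
}

/*
 * Decode up to n entries. On failure, *decoded holds the number of entries
 * whose pointers were allocated, so cleanup never touches the rest.
 */
static int decode_all(struct rc_list *arr, unsigned n, unsigned fail_at,
		      unsigned *decoded)
{
	for (*decoded = 0; *decoded < n; (*decoded)++)
		if (decode_one(&arr[*decoded], *decoded, fail_at) != 0)
			return -1;
	return 0;
}

/* Free only the first count entries. */
static void free_all(struct rc_list *arr, unsigned count)
{
	for (unsigned i = 0; i < count; i++)
		free(arr[i].calls);
}
```

    If decoding fails at index 2 of 4, the caller frees two entries and
    leaves the remaining, never-initialised slots alone.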

    Reported-by: David Ramos
    Fixes: 4aece6a19cf7f ("nfs41: cb_sequence xdr implementation")
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     
  • commit ea7c38fef0b774a5dc16fb0ca5935f0ae8568176 upstream.

    If we have to do a return-on-close in the delegreturn code, then
    we must ensure that the inode and super block remain referenced.

    Cc: Peng Tao
    Signed-off-by: Trond Myklebust
    Reviewed-by: Peng Tao
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     
  • commit 03a9a42a1a7e5b3e7919ddfacc1d1cc81882a955 upstream.

    Fix an Oopsable condition when nsm_mon_unmon is called as part of the
    namespace cleanup, which now apparently happens after the utsname
    has been freed.

    Link: http://lkml.kernel.org/r/20150125220604.090121ae@neptune.home
    Reported-by: Bruno Prémont
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     
  • commit cb5d04bc39e914124e811ea55f3034d2379a5f6c upstream.

    With the pgio refactoring in v3.15, .init_read and .init_write can be
    called with a valid pgio->pg_lseg. The file layout was fixed at that time
    by commit c6194271f (pnfs: filelayout: support non page aligned
    layouts). But the generic helper still needs to be fixed.

    Signed-off-by: Peng Tao
    Signed-off-by: Greg Kroah-Hartman

    Peng Tao