07 Mar, 2015
40 commits
-
commit 428d53be5e7468769d4e7899cca06ed5f783a6e1 upstream.
We have to delete the allocated interrupt info if __inject_vm() fails.
Otherwise user space can keep flooding kvm with floating interrupts and
provoke more and more memory leaks.Reported-by: Dominik Dingel
Reviewed-by: Dominik Dingel
Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger
Signed-off-by: Greg Kroah-Hartman -
commit 8e2207cdd087ebb031e9118d1fd0902c6533a5e5 upstream.
If a vm with no VCPUs is created, the injection of a floating irq
leads to an endless loop in the kernel.Let's skip the search for a destination VCPU for a floating irq if no
VCPUs were created.Reviewed-by: Dominik Dingel
Reviewed-by: Cornelia Huck
Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger
Signed-off-by: Greg Kroah-Hartman -
commit 0ac96caf0f9381088c673a16d910b1d329670edf upstream.
The hrtimer that handles the wait with enabled timer interrupts
should not be disturbed by changes of the host time.This patch changes our hrtimer to be based on a monotonic clock.
Signed-off-by: David Hildenbrand
Acked-by: Cornelia Huck
Signed-off-by: Christian Borntraeger
Signed-off-by: Greg Kroah-Hartman -
commit 2d00f759427bb3ed963b60f570830e9eca7e1c69 upstream.
Patch 0759d0681cae ("KVM: s390: cleanup handle_wait by reusing
kvm_vcpu_block") changed the way pending guest clock comparator
interrupts are detected. It was assumed that as soon as the hrtimer
wakes up, the condition for the guest ckc is satisfied.This is however only true as long as adjclock() doesn't speed
up the monotonic clock. Reason is that the hrtimer is based on
CLOCK_MONOTONIC, the guest clock comparator detection is based
on the raw TOD clock. If CLOCK_MONOTONIC runs faster than the
TOD clock, the hrtimer wakes the target VCPU up too early and
the target VCPU will not detect any pending interrupts, therefore
going back to sleep. It will never be woken up again because the
hrtimer has finished. The VCPU is stuck.As a quick fix, we have to forward the hrtimer until the guest
clock comparator is really due, to guarantee properly timed wake
ups.As the hrtimer callback might be triggered on another cpu, we
have to make sure that the timer is really stopped and not currently
executing the callback on another cpu. This can happen if the vcpu
thread is scheduled onto another physical cpu, but the timer base
is not migrated. So lets use hrtimer_cancel instead of try_to_cancel.A proper fix might be to introduce a RAW based hrtimer.
Reported-by: Christian Borntraeger
Signed-off-by: David Hildenbrand
Acked-by: Cornelia Huck
Signed-off-by: Christian Borntraeger
Signed-off-by: Greg Kroah-Hartman -
commit 7f187922ddf6b67f2999a76dcb71663097b75497 upstream.
When the guest writes to the TSC, the masterclock TSC copy must be
updated as well along with the TSC_OFFSET update, otherwise a negative
tsc_timestamp is calculated at kvm_guest_time_update.Once "if (!vcpus_matched && ka->use_master_clock)" is simplified to
"if (ka->use_master_clock)", the corresponding "if (!ka->use_master_clock)"
becomes redundant, so remove the do_request boolean and collapse
everything into a single condition.Signed-off-by: Marcelo Tosatti
Signed-off-by: Paolo Bonzini
Signed-off-by: Greg Kroah-Hartman -
commit 23b133bdc452aa441fcb9b82cbf6dd05cfd342d0 upstream.
Check length of extended attributes and allocation descriptors when
loading inodes from disk. Otherwise corrupted filesystems could confuse
the code and make the kernel oops.Reported-by: Carl Henrik Lunde
Signed-off-by: Jan Kara
Signed-off-by: Greg Kroah-Hartman -
commit 79144954278d4bb5989f8b903adcac7a20ff2a5a upstream.
Store blocksize in a local variable in udf_fill_inode() since it is used
a lot of times.Signed-off-by: Jan Kara
Signed-off-by: Greg Kroah-Hartman -
commit ed4cbc81addbc076b016c5b979fd1a02f0897f0a upstream.
activate_mm() and switch_mm() call get_new_mmu_context() which in turn
can enable the HTW before the entryhi is changed with the new ASID.
Since the latter will enable the HTW in local_flush_tlb_all(),
then there is a small timing window where the HTW is running with the
new ASID but with an old pgd since the TLBMISS_HANDLER_SETUP_PGD
hasn't assigned a new one yet. In order to prevent that, we introduce a
simple htw counter to avoid starting HTW accidentally due to nested
htw_{start,stop}() sequences. Moreover, since various IPI calls can
enforce TLB flushing operations on a different core, such an operation
may interrupt another htw_{stop,start} in progress leading inconsistent
updates of the htw_seq variable. In order to avoid that, we disable the
interrupts whenever we update that variable.Signed-off-by: Markos Chandras
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9118/
Signed-off-by: Ralf Baechle
Signed-off-by: Greg Kroah-Hartman -
commit 06f34e1c28f3608b0ce5b310e41102d3fe7b65a1 upstream.
We used to calculate page address differently in 2 cases:
1. In virt_to_page(x) we do
--->8---
mem_map + (x - CONFIG_LINUX_LINK_BASE) >> PAGE_SHIFT
--->8---2. In in pte_page(x) we do
--->8---
mem_map + (pte_val(x) - PAGE_OFFSET) >> PAGE_SHIFT
--->8---That leads to problems in case PAGE_OFFSET != CONFIG_LINUX_LINK_BASE -
different pages will be selected depending on where and how we calculate
page address.In particular in the STAR 9000853582 when gdb attempted to read memory
of another process it got improper page in get_user_pages() because this
is exactly one of the places where we search for a page by pte_page().The fix is trivial - we need to calculate page address similarly in both
cases.Signed-off-by: Alexey Brodkin
Signed-off-by: Vineet Gupta
Signed-off-by: Greg Kroah-Hartman -
commit 5f1437f61a0b351d25b528c159360da3d5e8c77b upstream.
When the UART is in DMA receive mode (RDMAS set) and one character
just arrived while another interrupt is handled (e.g. TX), the RDRF
(receiver data register full flag) is set due to the water level of
1. But since the DMA will take care of this character, there is no
need to handle it by calling lpuart_prepare_rx. Handling it leads to
adding the RX timeout timer twice:[ 74.336698] Kernel BUG at 80053070 [verbose debug info unavailable]
[ 74.342999] Internal error: Oops - BUG: 0 [#1] ARM0:00.00 khungtaskd
[ 74.347817] Modules linked in: 0 S 0.0 0.0 0:00.00 writeback
[ 74.350926] CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00001-g39d78e2 #1788
[ 74.358617] Hardware name: Freescale Vybrid VF610 (Device Tree)t
[ 74.364563] task: 807a7678 ti: 8079c000 task.ti: 8079c000 kblockd
[ 74.370002] PC is at add_timer+0x24/0x28.0 0.0 0:00.09 kworker/u2:1
[ 74.373960] LR is at lpuart_int+0x15c/0x3d8
[ 74.378171] pc : [] lr : [] psr: a0010193
[ 74.378171] sp : 8079de10 ip : 8079de20 fp : 8079de1c
[ 74.389694] r10: 807d44c0 r9 : 8688c300 r8 : 00000013
[ 74.394943] r7 : 20010193 r6 : 00000000 r5 : 000000a0 r4 : 86997210
[ 74.401498] r3 : ffffa7da r2 : 80817868 r1 : 86997210 r0 : 86997344
[ 74.408052] Flags: NzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel
[ 74.415489] Control: 10c5387d Table: 8611c059 DAC: 00000015
[ 74.421265] Process swapper (pid: 0, stack limit = 0x8079c230)
...Solve this by only execute the receiver path (lpuart_prepare_rx) if
the DMA receive mode (RDMAS) is not set. Also, make sure the flag is
cleared on initialization, in case it has been left set.This can be best reproduced using UART as a serial console, then
running top while dd'ing data into the terminal.Signed-off-by: Stefan Agner
Signed-off-by: Greg Kroah-Hartman -
commit 4a8588a1cf867333187d9ff071e6fbdab587d194 upstream.
If the serial port gets closed while a RX transfer is in progress,
the timer might fire after the serial port shutdown finished. This
leads in a NULL pointer dereference:[ 7.508324] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 7.516590] pgd = 86348000
[ 7.519445] [00000000] *pgd=86179831, *pte=00000000, *ppte=00000000
[ 7.526145] Internal error: Oops: 17 [#1] ARM
[ 7.530611] Modules linked in:
[ 7.533876] CPU: 0 PID: 123 Comm: systemd Not tainted 3.19.0-rc3-00004-g5b11ea7 #1778
[ 7.541827] Hardware name: Freescale Vybrid VF610 (Device Tree)
[ 7.547862] task: 861c3400 ti: 86ac8000 task.ti: 86ac8000
[ 7.553392] PC is at lpuart_timer_func+0x24/0xf8
[ 7.558127] LR is at lpuart_timer_func+0x20/0xf8
[ 7.562857] pc : [] lr : [] psr: 600b0113
[ 7.562857] sp : 86ac9b90 ip : 86ac9b90 fp : 86ac9bbc
[ 7.574467] r10: 80817180 r9 : 80817b98 r8 : 80817998
[ 7.579803] r7 : 807acee0 r6 : 86989000 r5 : 00000100 r4 : 86997210
[ 7.586444] r3 : 86ac8000 r2 : 86ac9bc0 r1 : 86997210 r0 : 00000000
[ 7.593085] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 7.600341] Control: 10c5387d Table: 86348059 DAC: 00000015
[ 7.606203] Process systemd (pid: 123, stack limit = 0x86ac8230)Setup the timer on UART startup which allows to delete the timer
unconditionally on shutdown. This also saves the initialization
on each transfer.Signed-off-by: Stefan Agner
Signed-off-by: Greg Kroah-Hartman -
commit 29183a70b0b828500816bd794b3fe192fce89f73 upstream.
Additional validation of adjtimex freq values to avoid
potential multiplication overflows were added in commit
5e5aeb4367b (time: adjtimex: Validate the ADJ_FREQUENCY values)Unfortunately the patch used LONG_MAX/MIN instead of
LLONG_MAX/MIN, which was fine on 64-bit systems, but being
much smaller on 32-bit systems caused false positives
resulting in most direct frequency adjustments to fail w/
EINVAL.ntpd only does direct frequency adjustments at startup, so
the issue was not as easily observed there, but other time
sync applications like ptpd and chrony were more effected by
the bug.See bugs:
https://bugzilla.kernel.org/show_bug.cgi?id=92481
https://bugzilla.redhat.com/show_bug.cgi?id=1188074This patch changes the checks to use LLONG_MAX for
clarity, and additionally the checks are disabled
on 32-bit systems since LLONG_MAX/PPM_SCALE is always
larger then the 32-bit long freq value, so multiplication
overflows aren't possible there.Reported-by: Josh Boyer
Reported-by: George Joseph
Tested-by: George Joseph
Signed-off-by: John Stultz
Signed-off-by: Peter Zijlstra (Intel)
Cc: Linus Torvalds
Cc: Sasha Levin
Link: http://lkml.kernel.org/r/1423553436-29747-1-git-send-email-john.stultz@linaro.org
[ Prettified the changelog and the comments a bit. ]
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 146755923262037fc4c54abc28c04b1103f3cc51 upstream.
The output of KDB 'summary' command should report MemTotal, MemFree
and Buffers output in kB. Current codes report in unit of pages.A define of K(x) as
is defined in the code, but not used.This patch would apply the define to convert the values to kB.
Please include me on Cc on replies. I do not subscribe to linux-kernel.Signed-off-by: Jay Lan
Signed-off-by: Jason Wessel
Signed-off-by: Greg Kroah-Hartman -
commit 165235180ff61f0012ea68a299e46daec43dcaa7 upstream.
mvebu_armada375_smp_wa_init is only used on armada 375 but is defined
for all mvebu machines. As it calls a function that is only provided
sometimes, this can result in a link error:arch/arm/mach-mvebu/built-in.o: In function `mvebu_armada375_smp_wa_init':
:(.text+0x228): undefined reference to `mvebu_setup_boot_addr_wa'To solve this, we can just change the existing #ifdef around the
function to also check for Armada375 SMP platforms.Signed-off-by: Arnd Bergmann
Fixes: 305969fb6292 ("ARM: mvebu: use the common function for Armada 375 SMP workaround")
Cc: Andrew Lunn
Cc: Jason Cooper
Cc: Gregory Clement
Signed-off-by: Greg Kroah-Hartman -
commit 95fcedb027a27f32bf2434f9271635c380e57fb5 upstream.
The vexpress tc2 power management code calls mcpm_loopback, which
is only available if ARM_CPU_SUSPEND is enabled, otherwise we
get a link error:arch/arm/mach-vexpress/built-in.o: In function `tc2_pm_init':
arch/arm/mach-vexpress/tc2_pm.c:389: undefined reference to `mcpm_loopback'This explicitly selects ARM_CPU_SUSPEND like other platforms that
need it.Signed-off-by: Arnd Bergmann
Fixes: 3592d7e002438 ("ARM: 8082/1: TC2: test the MCPM loopback during boot")
Acked-by: Nicolas Pitre
Acked-by: Liviu Dudau
Cc: Kevin Hilman
Cc: Sudeep Holla
Cc: Lorenzo Pieralisi
Signed-off-by: Greg Kroah-Hartman -
commit 9bc78f32c2e430aebf6def965b316aa95e37a20c upstream.
Add regulator_has_full_constraints() call to poodle board file to let
regulator core know that we do not have any additional regulators left.
This lets it substitute unprovided regulators with dummy ones.This fixes the following warnings that can be seen on poodle if
regulators are enabled:ads7846 spi1.0: unable to get regulator: -517
spi spi1.0: Driver ads7846 requests probe deferral
wm8731 0-001b: Failed to get supply 'AVDD': -517
wm8731 0-001b: Failed to request supplies: -517
wm8731 0-001b: ASoC: failed to probe component -517Signed-off-by: Dmitry Eremin-Solenikov
Acked-by: Mark Brown
Signed-off-by: Robert Jarzmik
Signed-off-by: Greg Kroah-Hartman -
commit 271e80176aae4e5b481f4bb92df9768c6075bbca upstream.
Add regulator_has_full_constraints() call to corgi board file to let
regulator core know that we do not have any additional regulators left.
This lets it substitute unprovided regulators with dummy ones.This fixes the following warnings that can be seen on corgi if
regulators are enabled:ads7846 spi1.0: unable to get regulator: -517
spi spi1.0: Driver ads7846 requests probe deferral
wm8731 0-001b: Failed to get supply 'AVDD': -517
wm8731 0-001b: Failed to request supplies: -517
wm8731 0-001b: ASoC: failed to probe component -517
corgi-audio corgi-audio: ASoC: failed to instantiate card -517Signed-off-by: Dmitry Eremin-Solenikov
Acked-by: Mark Brown
Signed-off-by: Robert Jarzmik
Signed-off-by: Greg Kroah-Hartman -
commit 19e3ae6b4f07a87822c1c9e7ed99d31860e701af upstream.
The vcs device's poll/fasync support relies on the vt notifier to signal
changes to the screen content. Notifier invocations were missing for
changes that comes through the selection interface though. Fix that.Tested with BRLTTY 5.2.
Signed-off-by: Nicolas Pitre
Cc: Dave Mielke
Signed-off-by: Greg Kroah-Hartman -
commit 074f9dd55f9cab1b82690ed7e44bcf38b9616ce0 upstream.
Currently the USB stack assumes that all host controller drivers are
capable of receiving wakeup requests from downstream devices.
However, this isn't true for the isp1760-hcd driver, which means that
it isn't safe to do a runtime suspend of any device attached to a
root-hub port if the device requires wakeup.This patch adds a "cant_recv_wakeups" flag to the usb_hcd structure
and sets the flag in isp1760-hcd. The core is modified to prevent a
direct child of the root hub from being put into runtime suspend with
wakeup enabled if the flag is set.Signed-off-by: Alan Stern
Tested-by: Nicolas Pitre
Signed-off-by: Greg Kroah-Hartman
Signed-off-by: Greg Kroah-Hartman -
commit 524134d422316a59d5464ccbc12036bbe90c5563 upstream.
The USB stack provides a mechanism for drivers to request an
asynchronous device reset (usb_queue_reset_device()). The mechanism
uses a work item (reset_ws) embedded in the usb_interface structure
used by the driver, and the reset is carried out by a work queue
routine.The asynchronous reset can race with driver unbinding. When this
happens, we try to cancel the queued reset before unbinding the
driver, on the theory that the driver won't care about any resets once
it is unbound.However, thanks to the fact that lockdep now tracks work queue
accesses, this can provoke a lockdep warning in situations where the
device reset causes another interface's driver to be unbound; seehttp://marc.info/?l=linux-usb&m=141893165203776&w=2
for an example. The reason is that the work routine for reset_ws in
one interface calls cancel_queued_work() for the reset_ws in another
interface. Lockdep thinks this might lead to a work routine trying to
cancel itself. The simplest solution is not to cancel queued resets
when unbinding drivers.This means we now need to acquire a reference to the usb_interface
when queuing a reset_ws work item and to drop the reference when the
work routine finishes. We also need to make sure that the
usb_interface structure doesn't outlive its parent usb_device; this
means acquiring and dropping a reference when the interface is created
and destroyed.In addition, cancelling a queued reset can fail (if the device is in
the middle of an earlier reset), and this can cause usb_reset_device()
to try to rebind an interface that has been deallocated (see
http://marc.info/?l=linux-usb&m=142175717016628&w=2 for details).
Acquiring the extra references prevents this failure.Signed-off-by: Alan Stern
Reported-by: Russell King - ARM Linux
Reported-by: Olivier Sobrie
Tested-by: Olivier Sobrie
Signed-off-by: Greg Kroah-Hartman -
commit 5efd2ea8c9f4f12916ffc8ba636792ce052f6911 upstream.
the following error pops up during "testusb -a -t 10"
| musb-hdrc musb-hdrc.1.auto: dma_pool_free buffer-128, f134e000/be842000 (bad dma)
hcd_buffer_create() creates a few buffers, the smallest has 32 bytes of
size. ARCH_KMALLOC_MINALIGN is set to 64 bytes. This combo results in
hcd_buffer_alloc() returning memory which is 32 bytes aligned and it
might by identified by buffer_offset() as another buffer. This means the
buffer which is on a 32 byte boundary will not get freed, instead it
tries to free another buffer with the error message.This patch fixes the issue by creating the smallest DMA buffer with the
size of ARCH_KMALLOC_MINALIGN (or 32 in case ARCH_KMALLOC_MINALIGN is
smaller). This might be 32, 64 or even 128 bytes. The next three pools
will have the size 128, 512 and 2048.
In case the smallest pool is 128 bytes then we have only three pools
instead of four (and zero the first entry in the array).
The last pool size is always 2048 bytes which is the assumed PAGE_SIZE /
2 of 4096. I doubt it makes sense to continue using PAGE_SIZE / 2 where
we would end up with 8KiB buffer in case we have 16KiB pages.
Instead I think it makes sense to have a common size(s) and extend them
if there is need to.
There is a BUILD_BUG_ON() now in case someone has a minalign of more than
128 bytes.Signed-off-by: Sebastian Andrzej Siewior
Acked-by: Alan Stern
Signed-off-by: Greg Kroah-Hartman -
commit c99197902da284b4b723451c1471c45b18537cde upstream.
The usb_hcd_unlink_urb() routine in hcd.c contains two possible
use-after-free errors. The dev_dbg() statement at the end of the
routine dereferences urb and urb->dev even though both structures may
have been deallocated.This patch fixes the problem by storing urb->dev in a local variable
(avoiding the dereference of urb) and moving the dev_dbg() up before
the usb_put_dev() call.Signed-off-by: Alan Stern
Reported-by: Joe Lawrence
Tested-by: Joe Lawrence
Signed-off-by: Greg Kroah-Hartman
Signed-off-by: Greg Kroah-Hartman -
commit a6f0331236fa75afba14bbcf6668d42cebb55c43 upstream.
Added the USB serial console device ID for Siemens Ruggedcom devices
which have a USB port for their serial console.Signed-off-by: Len Sorensen
Signed-off-by: Johan Hovold
Signed-off-by: Greg Kroah-Hartman -
commit 663b7ee9517eec6deea9a48c7a1392a9a34f7809 upstream.
We might enter the interrupt handler with hw_ready already set,
but prior we actually started the reset flow.
To soleve this we move the reset release from the interrupt handler
to the HW start wait function which is part of the reset sequence.Signed-off-by: Alexander Usyskin
Signed-off-by: Tomas Winkler
Signed-off-by: Greg Kroah-Hartman -
commit 1ab1e79b9fd4b01331490bbe2e630a0fc0b25449 upstream.
We should mask interrupt set bit when writing back
hcsr value in reset bit clean-up.This is refinement for
mei: clean reset bit before reset
commit b13a65ef190e488e2761d65bdd2e1fe8a3a125f5Signed-off-by: Alexander Usyskin
Signed-off-by: Tomas Winkler
Signed-off-by: Greg Kroah-Hartman -
commit 6fbb9bdf0f3fbe23aeff806489791aa876adaffb upstream.
-EDEFER error wasn't handle properly by atmel_serial_probe().
As an example, when atmel_serial_probe() is called for the first time, we pass
the test_and_set_bit() test to check whether the port has already been
initalized. Then we call atmel_init_port(), which may return -EDEFER, possibly
returned before by clk_get(). Consequently atmel_serial_probe() used to return
this error code WITHOUT clearing the port bit in the "atmel_ports_in_use" mask.
When atmel_serial_probe() was called for the second time, it used to fail on
the test_and_set_bit() function then returning -EBUSY.When atmel_serial_probe() fails, this patch make it clear the port bit in the
"atmel_ports_in_use" mask, if needed, before returning the error code.Signed-off-by: Cyrille Pitchen
Acked-by: Nicolas Ferre
Signed-off-by: Greg Kroah-Hartman -
commit 37480a05685ed5b8e1b9bf5e5c53b5810258b149 upstream.
Commit 26df6d13406d1a5 ("tty: Add EXTPROC support for LINEMODE")
allows a process which has opened a pty master to send _any_ signal
to the process group of the pty slave. Although potentially
exploitable by a malicious program running a setuid program on
a pty slave, it's unknown if this exploit currently exists.Limit to signals actually used.
Cc: Theodore Ts'o
Cc: Howard Chu
Cc: One Thousand Gnomes
Cc: Jiri Slaby
Signed-off-by: Peter Hurley
Signed-off-by: Greg Kroah-Hartman
Signed-off-by: Greg Kroah-Hartman -
commit 91117a20245b59f70b563523edbf998a62fc6383 upstream.
The 'pfn' returned by axonram was completely bogus, and has been since
2008.Signed-off-by: Matthew Wilcox
Reviewed-by: Jan Kara
Reviewed-by: Mathieu Desnoyers
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman -
commit 6d1cff2a885850b78b40c34777b46cf5da5d1050 upstream.
We hit use after free on dereferncing pointer to task_smack struct in
smk_of_task() called from smack_task_to_inode().task_security() macro uses task_cred_xxx() to get pointer to the task_smack.
task_cred_xxx() could be used only for non-pointer members of task's
credentials. It cannot be used for pointer members since what they point
to may disapper after dropping RCU read lock.Mainly task_security() used this way:
smk_of_task(task_security(p))Intead of this introduce function smk_of_task_struct() which
takes task_struct as argument and returns pointer to smk_known struct
and do this under RCU read lock.
Bogus task_security() macro is not used anymore, so remove it.KASan's report for this:
AddressSanitizer: use after free in smack_task_to_inode+0x50/0x70 at addr c4635600
=============================================================================
BUG kmalloc-64 (Tainted: PO): kasan error
-----------------------------------------------------------------------------Disabling lock debugging due to kernel taint
INFO: Allocated in new_task_smack+0x44/0xd8 age=39 cpu=0 pid=1866
kmem_cache_alloc_trace+0x88/0x1bc
new_task_smack+0x44/0xd8
smack_cred_prepare+0x48/0x21c
security_prepare_creds+0x44/0x4c
prepare_creds+0xdc/0x110
smack_setprocattr+0x104/0x150
security_setprocattr+0x4c/0x54
proc_pid_attr_write+0x12c/0x194
vfs_write+0x1b0/0x370
SyS_write+0x5c/0x94
ret_fast_syscall+0x0/0x48
INFO: Freed in smack_cred_free+0xc4/0xd0 age=27 cpu=0 pid=1564
kfree+0x270/0x290
smack_cred_free+0xc4/0xd0
security_cred_free+0x34/0x3c
put_cred_rcu+0x58/0xcc
rcu_process_callbacks+0x738/0x998
__do_softirq+0x264/0x4cc
do_softirq+0x94/0xf4
irq_exit+0xbc/0x120
handle_IRQ+0x104/0x134
gic_handle_irq+0x70/0xac
__irq_svc+0x44/0x78
_raw_spin_unlock+0x18/0x48
sync_inodes_sb+0x17c/0x1d8
sync_filesystem+0xac/0xfc
vdfs_file_fsync+0x90/0xc0
vfs_fsync_range+0x74/0x7c
INFO: Slab 0xd3b23f50 objects=32 used=31 fp=0xc4635600 flags=0x4080
INFO: Object 0xc4635600 @offset=5632 fp=0x (null)Bytes b4 c46355f0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Object c4635600: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c4635610: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c4635620: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c4635630: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
Redzone c4635640: bb bb bb bb ....
Padding c46356e8: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Padding c46356f8: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
CPU: 5 PID: 834 Comm: launchpad_prelo Tainted: PBO 3.10.30 #1
Backtrace:
[] (dump_backtrace+0x0/0x158) from [] (show_stack+0x20/0x24)
r7:c4634010 r6:d3b23f50 r5:c4635600 r4:d1002140
[] (show_stack+0x0/0x24) from [] (dump_stack+0x20/0x28)
[] (dump_stack+0x0/0x28) from [] (print_trailer+0x124/0x144)
[] (print_trailer+0x0/0x144) from [] (object_err+0x3c/0x44)
r7:c4635600 r6:d1002140 r5:d3b23f50 r4:c4635600
[] (object_err+0x0/0x44) from [] (kasan_report_error+0x2b8/0x538)
r6:d1002140 r5:d3b23f50 r4:c6429cf8 r3:c09e1aa7
[] (kasan_report_error+0x0/0x538) from [] (__asan_load4+0xd4/0xf8)
[] (__asan_load4+0x0/0xf8) from [] (smack_task_to_inode+0x50/0x70)
r5:c4635600 r4:ca9da000
[] (smack_task_to_inode+0x0/0x70) from [] (security_task_to_inode+0x3c/0x44)
r5:cca25e80 r4:c0ba9780
[] (security_task_to_inode+0x0/0x44) from [] (pid_revalidate+0x124/0x178)
r6:00000000 r5:cca25e80 r4:cbabe3c0 r3:00008124
[] (pid_revalidate+0x0/0x178) from [] (lookup_fast+0x35c/0x43y4)
r9:c6429efc r8:00000101 r7:c079d940 r6:c6429e90 r5:c6429ed8 r4:c83c4148
[] (lookup_fast+0x0/0x434) from [] (do_last.isra.24+0x1c0/0x1108)
[] (do_last.isra.24+0x0/0x1108) from [] (path_openat.isra.25+0xf4/0x648)
[] (path_openat.isra.25+0x0/0x648) from [] (do_filp_open+0x3c/0x88)
[] (do_filp_open+0x0/0x88) from [] (do_sys_open+0xf0/0x198)
r7:00000001 r6:c0ea2180 r5:0000000b r4:00000000
[] (do_sys_open+0x0/0x198) from [] (SyS_open+0x30/0x34)
[] (SyS_open+0x0/0x34) from [] (ret_fast_syscall+0x0/0x48)
Read of size 4 by thread T834:
Memory state around the buggy address:
c4635380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
c4635400: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
c4635480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
c4635500: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
c4635580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>c4635600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
c4635680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
c4635700: 00 00 00 00 04 fc fc fc fc fc fc fc fc fc fc fc
c4635780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
c4635800: 00 00 00 00 00 00 04 fc fc fc fc fc fc fc fc fc
c4635880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================Signed-off-by: Andrey Ryabinin
Signed-off-by: Greg Kroah-Hartman -
commit 1e0d6714aceb770b04161fbedd7765d0e1fc27bd upstream.
When an application connects to the ring buffer via splice, it can only
read full pages. Splice does not work with partial pages. If there is
not enough data to fill a page, the splice command will either block
or return -EAGAIN (if set to nonblock).Code was added where if the page is not full, to just sleep again.
The problem is, it will get woken up again on the next event. That
is, when something is written into the ring buffer, if there is a waiter
it will wake it up. The waiter would then check the buffer, see that
it still does not have enough data to fill a page and go back to sleep.
To make matters worse, when the waiter goes back to sleep, it could
cause another event, which would wake it back up again to see it
doesn't have enough data and sleep again. This produces a tremendous
overhead and fills the ring buffer with noise.For example, recording sched_switch on an idle system for 10 seconds
produces 25,350,475 events!!!Create another wait queue for those waiters wanting full pages.
When an event is written, it only wakes up waiters if there's a full
page of data. It does not wake up the waiter if the page is not yet
full.After this change, recording sched_switch on an idle system for 10
seconds produces only 800 events. Getting rid of 25,349,675 useless
events (99.9969% of events!!), is something to take seriously.Cc: Rabin Vincent
Fixes: e30f53aad220 "tracing: Do not busy wait in buffer splice"
Signed-off-by: Steven Rostedt
Signed-off-by: Greg Kroah-Hartman -
commit 04f81f0154e4bf002be6f4d85668ce1257efa4d9 upstream.
Using the IPCB() macro to get the IPv4 options is convenient, but
unfortunately NetLabel often needs to examine the CIPSO option outside
of the scope of the IP layer in the stack. While historically IPCB()
worked above the IP layer, due to the inclusion of the inet_skb_param
struct at the head of the {tcp,udp}_skb_cb structs, recent commit
971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
reordered the tcp_skb_cb struct and invalidated this IPCB() trick.This patch fixes the problem by creating a new function,
cipso_v4_optptr(), which locates the CIPSO option inside the IP header
without calling IPCB(). Unfortunately, this isn't as fast as a simple
lookup so some additional tweaks were made to limit the use of this
new function.Reported-by: Casey Schaufler
Signed-off-by: Paul Moore
Tested-by: Casey Schaufler
Signed-off-by: Greg Kroah-Hartman -
commit c6ce194325cef342313e3d27620411ce90a89c50 upstream.
Hi,
If you can manage to submit an async write as the first async I/O from
the context of a process with realtime scheduling priority, then a
cfq_queue is allocated, but filed into the wrong async_cfqq bucket. It
ends up in the best effort array, but actually has realtime I/O
scheduling priority set in cfqq->ioprio.The reason is that cfq_get_queue assumes the default scheduling class and
priority when there is no information present (i.e. when the async cfqq
is created):static struct cfq_queue *
cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
struct bio *bio, gfp_t gfp_mask)
{
const int ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
const int ioprio = IOPRIO_PRIO_DATA(cic->ioprio);cic->ioprio starts out as 0, which is "invalid". So, class of 0
(IOPRIO_CLASS_NONE) is passed to cfq_async_queue_prio like so:async_cfqq = cfq_async_queue_prio(cfqd, ioprio_class, ioprio);
static struct cfq_queue **
cfq_async_queue_prio(struct cfq_data *cfqd, int ioprio_class, int ioprio)
{
switch (ioprio_class) {
case IOPRIO_CLASS_RT:
return &cfqd->async_cfqq[0][ioprio];
case IOPRIO_CLASS_NONE:
ioprio = IOPRIO_NORM;
/* fall through */
case IOPRIO_CLASS_BE:
return &cfqd->async_cfqq[1][ioprio];
case IOPRIO_CLASS_IDLE:
return &cfqd->async_idle_cfqq;
default:
BUG();
}
}Here, instead of returning a class mapped from the process' scheduling
priority, we get back the bucket associated with IOPRIO_CLASS_BE.Now, there is no queue allocated there yet, so we create it:
cfqq = cfq_find_alloc_queue(cfqd, is_sync, cic, bio, gfp_mask);
That function ends up doing this:
cfq_init_cfqq(cfqd, cfqq, current->pid, is_sync);
cfq_init_prio_data(cfqq, cic);cfq_init_cfqq marks the priority as having changed. Then, cfq_init_prio
data does this:ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
switch (ioprio_class) {
default:
printk(KERN_ERR "cfq: bad prio %x\n", ioprio_class);
case IOPRIO_CLASS_NONE:
/*
* no prio set, inherit CPU scheduling settings
*/
cfqq->ioprio = task_nice_ioprio(tsk);
cfqq->ioprio_class = task_nice_ioclass(tsk);
break;So we basically have two code paths that treat IOPRIO_CLASS_NONE
differently, which results in an RT async cfqq filed into a best effort
bucket.Attached is a patch which fixes the problem. I'm not sure how to make
it cleaner. Suggestions would be welcome.Signed-off-by: Jeff Moyer
Tested-by: Hidehiro Kawai
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman -
commit 69abaffec7d47a083739b79e3066cb3730eba72e upstream.
Cfq_lookup_create_cfqg() allocates struct blkcg_gq using GFP_ATOMIC.
In cfq_find_alloc_queue() possible allocation failure is not handled.
As a result kernel oopses on NULL pointer dereference when
cfq_link_cfqq_cfqg() calls cfqg_get() for NULL pointer.Bug was introduced in v3.5 in commit cd1604fab4f9 ("blkcg: factor
out blkio_group creation"). Prior to that commit cfq group lookup
had returned pointer to root group as fallback.This patch handles this error using existing fallback oom_cfqq.
Signed-off-by: Konstantin Khlebnikov
Acked-by: Tejun Heo
Acked-by: Vivek Goyal
Fixes: cd1604fab4f9 ("blkcg: factor out blkio_group creation")
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman -
commit 3fd7b60f2c7418239d586e359e0c6d8503e10646 upstream.
This patch drops legacy active_ts_list usage within iscsi_target_tq.c
code. It was originally used to track the active thread sets during
iscsi-target shutdown, and is no longer used by modern upstream code.Two people have reported list corruption using traditional iscsi-target
and iser-target with the following backtrace, that appears to be related
to iscsi_thread_set->ts_list being used across both active_ts_list and
inactive_ts_list.[ 60.782534] ------------[ cut here ]------------
[ 60.782543] WARNING: CPU: 0 PID: 9430 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
[ 60.782545] list_del corruption, ffff88045b00d180->next is LIST_POISON1 (dead000000100100)
[ 60.782546] Modules linked in: ib_srpt tcm_qla2xxx qla2xxx tcm_loop tcm_fc libfc scsi_transport_fc scsi_tgt ib_isert rdma_cm iw_cm ib_addr iscsi_target_mod target_core_pscsi target_core_file target_core_iblock target_core_mod configfs ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc autofs4 sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ib_ipoib ib_cm ib_uverbs ib_umad mlx4_en mlx4_ib ib_sa ib_mad ib_core mlx4_core dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan vhost tun kvm_intel kvm uinput iTCO_wdt iTCO_vendor_support microcode serio_raw pcspkr sb_edac edac_core sg i2c_i801 lpc_ich mfd_core mtip32xx igb i2c_algo_bit i2c_core ptp pps_core ioatdma dca wmi ext3(F) jbd(F) mbcache(F) sd_mod(F) crc_t10dif(F) crct10dif_common(F) ahci(F) libahci(F) isci(F) libsas(F) scsi_transport_sas(F) [last unloaded: speedstep_lib]
[ 60.782597] CPU: 0 PID: 9430 Comm: iscsi_ttx Tainted: GF 3.12.19+ #2
[ 60.782598] Hardware name: Supermicro X9DRX+-F/X9DRX+-F, BIOS 3.00 07/09/2013
[ 60.782599] 0000000000000035 ffff88044de31d08 ffffffff81553ae7 0000000000000035
[ 60.782602] ffff88044de31d58 ffff88044de31d48 ffffffff8104d1cc 0000000000000002
[ 60.782605] ffff88045b00d180 ffff88045b00d0c0 ffff88045b00d0c0 ffff88044de31e58
[ 60.782607] Call Trace:
[ 60.782611] [] dump_stack+0x49/0x62
[ 60.782615] [] warn_slowpath_common+0x8c/0xc0
[ 60.782618] [] warn_slowpath_fmt+0x46/0x50
[ 60.782620] [] __list_del_entry+0x63/0xd0
[ 60.782622] [] list_del+0x11/0x40
[ 60.782630] [] iscsi_del_ts_from_active_list+0x29/0x50 [iscsi_target_mod]
[ 60.782635] [] iscsi_tx_thread_pre_handler+0xa1/0x180 [iscsi_target_mod]
[ 60.782642] [] iscsi_target_tx_thread+0x4e/0x220 [iscsi_target_mod]
[ 60.782647] [] ? iscsit_handle_snack+0x190/0x190 [iscsi_target_mod]
[ 60.782652] [] ? iscsit_handle_snack+0x190/0x190 [iscsi_target_mod]
[ 60.782655] [] kthread+0xce/0xe0
[ 60.782657] [] ? kthread_freezable_should_stop+0x70/0x70
[ 60.782660] [] ret_from_fork+0x7c/0xb0
[ 60.782662] [] ? kthread_freezable_should_stop+0x70/0x70
[ 60.782663] ---[ end trace 9662f4a661d33965 ]---Since this code is no longer used, go ahead and drop the problematic usage
all-together.Reported-by: Gavin Guo
Reported-by: Moussa Ba
Signed-off-by: Nicholas Bellinger
Signed-off-by: Greg Kroah-Hartman -
commit 7772855a996ec6e16944b120ab5ce21050279821 upstream.
With scsi-mq enabled, userspace programs can get unexpected EWOULDBLOCK
(a.k.a. EAGAIN) errors when submitting commands to the SCSI generic
driver. Fix by calling blk_get_request() with GFP_KERNEL instead of
GFP_ATOMIC.Note: to avoid introducing a potential deadlock, this patch should be
applied after the patch titled "sg: fix unkillable I/O wait deadlock
with scsi-mq".Signed-off-by: Tony Battersby
Acked-by: Douglas Gilbert
Tested-by: Douglas Gilbert
Signed-off-by: James Bottomley
Signed-off-by: Greg Kroah-Hartman -
commit 7568615c1054907ea8c7701ab86dad51aa099888 upstream.
When using the write()/read() interface for submitting commands, the
SCSI generic driver does not call blk_put_request() on a completed SCSI
command until userspace calls read() to get the command completion.
Since scsi-mq uses a fixed number of preallocated requests, this makes
it possible for userspace to exhaust the entire preallocated supply of
requests. For places in the kernel that call blk_get_request() with
GFP_KERNEL, this can cause the calling process to deadlock in a
permanent unkillable I/O wait in blk_get_request() -> ... -> bt_get().
For places in the kernel that call blk_get_request() with GFP_ATOMIC,
this can cause blk_get_request() always to return -EWOULDBLOCK. Note
that these problems happen only if scsi-mq is enabled. Prevent the
problems by calling blk_put_request() as soon as the SCSI command
completes instead of waiting for userspace to call read().Signed-off-by: Tony Battersby
Acked-by: Douglas Gilbert
Tested-by: Douglas Gilbert
Signed-off-by: James Bottomley
Signed-off-by: Greg Kroah-Hartman -
commit d8ba1f971497c19cf80da1ea5391a46a5f9fbd41 upstream.
If the call to decode_rc_list() fails due to a memory allocation error,
then we need to truncate the array size to ensure that we only call
kfree() on those pointer that were allocated.Reported-by: David Ramos
Fixes: 4aece6a19cf7f ("nfs41: cb_sequence xdr implementation")
Signed-off-by: Trond Myklebust
Signed-off-by: Greg Kroah-Hartman -
commit ea7c38fef0b774a5dc16fb0ca5935f0ae8568176 upstream.
If we have to do a return-on-close in the delegreturn code, then
we must ensure that the inode and super block remain referenced.Cc: Peng Tao
Signed-off-by: Trond Myklebust
Reviewed-by: Peng Tao
Signed-off-by: Greg Kroah-Hartman -
commit 03a9a42a1a7e5b3e7919ddfacc1d1cc81882a955 upstream.
Fix an Oopsable condition when nsm_mon_unmon is called as part of the
namespace cleanup, which now apparently happens after the utsname
has been freed.Link: http://lkml.kernel.org/r/20150125220604.090121ae@neptune.home
Reported-by: Bruno Prémont
Signed-off-by: Trond Myklebust
Signed-off-by: Greg Kroah-Hartman -
commit cb5d04bc39e914124e811ea55f3034d2379a5f6c upstream.
With pgio refactoring in v3.15, .init_read and .init_write can be
called with valid pgio->pg_lseg. file layout was fixed at that time
by commit c6194271f (pnfs: filelayout: support non page aligned
layouts). But the generic helper still needs to be fixed.Signed-off-by: Peng Tao
Signed-off-by: Greg Kroah-Hartman