06 Jan, 2017
33 commits
-
commit bfedb589252c01fa505ac9f6f2a3d5d68d707ef4 upstream.
During exec dumpable is cleared if the file that is being executed is
not readable by the user executing the file. A bug in
ptrace_may_access allows reading the file if the executable happens to
enter into a subordinate user namespace (aka clone(CLONE_NEWUSER),
unshare(CLONE_NEWUSER), or setns(fd, CLONE_NEWUSER)).

This problem is fixed with only necessary userspace breakage by adding
a user namespace owner to mm_struct, captured at the time of exec, so
it is clear in which user namespace CAP_SYS_PTRACE must be present in
to be able to safely give read permission to the executable.

The function ptrace_may_access is modified to verify that the ptracer
has CAP_SYS_ADMIN in task->mm->user_ns instead of task->cred->user_ns.
This ensures that if the task changes its cred into a subordinate
user namespace it does not become ptraceable.

The function ptrace_attach is modified to only set PT_PTRACE_CAP when
CAP_SYS_PTRACE is held over task->mm->user_ns. The intent of
PT_PTRACE_CAP is to be a flag to note that whatever permission changes
the task might go through the tracer has sufficient permissions for
it not to be an issue. task->cred->user_ns is always the same
as or a descendant of mm->user_ns, which guarantees that having
CAP_SYS_PTRACE over mm->user_ns is the worst case for the task's
credentials.

To prevent regressions mm->dumpable and mm->user_ns are not considered
when a task has no mm, as simply failing ptrace_may_attach causes
regressions in privileged applications attempting to read things
such as /proc//stat

Acked-by: Kees Cook
Tested-by: Cyrill Gorcunov
Fixes: 8409cca70561 ("userns: allow ptrace from non-init user namespaces")
Signed-off-by: "Eric W. Biederman"
Signed-off-by: Greg Kroah-Hartman -
commit bcc7f5b4bee8e327689a4d994022765855c807ff upstream.
bdev->bd_contains is not stable before calling __blkdev_get().
When __blkdev_get() is called on a partition with ->bd_openers == 0
it sets
bdev->bd_contains = bdev;
which is not correct for a partition.
After a call to __blkdev_get() succeeds, ->bd_openers will be > 0
and then ->bd_contains is stable.

When FMODE_EXCL is used, blkdev_get() calls
bd_start_claiming() -> bd_prepare_to_claim() -> bd_may_claim()

This call happens before __blkdev_get() is called, so ->bd_contains
is not stable. So bd_may_claim() cannot safely use ->bd_contains.
It currently tries to use it, and this can lead to a BUG_ON().

This happens when a whole device is already open with a bd_holder (in
use by dm in my particular example) and two threads race to open a
partition of that device for the first time, one opening with O_EXCL and
one without.

The thread that doesn't use O_EXCL gets through blkdev_get() to
__blkdev_get(), gains the ->bd_mutex, and sets bdev->bd_contains = bdev;

Immediately thereafter the other thread, using FMODE_EXCL, calls
bd_start_claiming() from blkdev_get(). This should fail because the
whole device has a holder, but because bdev->bd_contains == bdev
bd_may_claim() incorrectly reports success.
This thread continues and blocks on bd_mutex.

The first thread then sets bdev->bd_contains correctly and drops the mutex.
The thread using FMODE_EXCL then continues and when it calls bd_may_claim()
again in:
  BUG_ON(!bd_may_claim(bdev, whole, holder));
The BUG_ON fires.

Fix this by removing the dependency on ->bd_contains in
bd_may_claim(). As bd_may_claim() has direct access to the whole
device, it can simply test if the target bdev is the whole device.

Fixes: 6b4517a7913a ("block: implement bd_claiming and claiming block")
Signed-off-by: NeilBrown
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman -
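The bd_may_claim() change described in the entry above (commit bcc7f5b4bee8) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: struct model_bdev and both functions are hypothetical stand-ins, not kernel code, and the kernel's extra case for a device that is currently being partitioned is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct block_device. */
struct model_bdev {
    void *holder;                /* current exclusive holder, or NULL */
    struct model_bdev *contains; /* whole device; not stable before first open */
};

/* Old behaviour: decides "this is a whole device" from ->contains. */
bool may_claim_old(const struct model_bdev *bdev,
                   const struct model_bdev *whole, const void *holder)
{
    assert(bdev != NULL && whole != NULL);
    if (bdev->holder == holder)
        return true;              /* already the holder */
    if (bdev->holder != NULL)
        return false;             /* held by someone else */
    if (bdev->contains == bdev)
        return true;              /* looks like an unheld whole device */
    return whole->holder == NULL; /* partition: depends on the whole device */
}

/* Fixed behaviour: compares against the whole device the caller passed in. */
bool may_claim_new(const struct model_bdev *bdev,
                   const struct model_bdev *whole, const void *holder)
{
    assert(bdev != NULL && whole != NULL);
    if (bdev->holder == holder)
        return true;
    if (bdev->holder != NULL)
        return false;
    if (whole == bdev)
        return true;              /* really is the whole device */
    return whole->holder == NULL;
}
```

With a partition whose contains field temporarily points at itself while the whole device has a holder, the old check reports success and the new one refuses the claim, which matches the race described in the commit message.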
commit 52bce91165e5f2db422b2b972e83d389e5e4725c upstream.
Commit 8924feff66f3 ("splice: lift pipe_lock out of splice_to_pipe()")
caused a regression when there were no more readers left on a pipe that
was being spliced into: rather than the expected SIGPIPE and -EPIPE
return value, the writer would end up waiting forever for space to free
up (which obviously was not going to happen with no readers around).

Fixes: 8924feff66f3 ("splice: lift pipe_lock out of splice_to_pipe()")
Reported-and-tested-by: Andreas Schwab
Debugged-by: Al Viro
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman -
commit 613cc2b6f272c1a8ad33aefa21cad77af23139f7 upstream.
If you have a process that has set itself to be non-dumpable, and it
then undergoes exec(2), any CLOEXEC file descriptors it has open are
"exposed" during a race window between the dumpable flags of the process
being reset for exec(2) and CLOEXEC being applied to the file
descriptors. This can be exploited by a process by attempting to access
/proc//fd/... during this window, without requiring CAP_SYS_PTRACE.

The race in question is after set_dumpable has been called (for get_link,
though the trace is basically the same for readlink):

[vfs]
-> proc_pid_link_inode_operations.get_link
   -> proc_pid_get_link
      -> proc_fd_access_allowed
         -> ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);

This will return 0 during the race window, and CLOEXEC file descriptors
will still be open during this window because do_close_on_exec has not
been called yet. As a result, the ordering of these calls should be
reversed to avoid this race window.

This is of particular concern to container runtimes, where joining a
PID namespace with file descriptors referring to the host filesystem
can result in security issues (since PR_SET_DUMPABLE doesn't protect
against access of CLOEXEC file descriptors -- file descriptors which may
reference filesystem objects the container shouldn't have access to).

Cc: dev@opencontainers.org
Reported-by: Michael Crosby
Signed-off-by: Aleksa Sarai
Signed-off-by: Al Viro
Signed-off-by: Greg Kroah-Hartman -
commit f84df2a6f268de584a201e8911384a2d244876e3 upstream.
When the user namespace support was merged the need to prevent
ptrace from revealing the contents of an unreadable executable
was overlooked.

Correct this oversight by ensuring that the executed file
or files are in mm->user_ns, by adjusting mm->user_ns.

Use the new function privileged_wrt_inode_uidgid to see if
the executable is a member of the user namespace, and as such
if having CAP_SYS_PTRACE in the user namespace should allow
tracing the executable. If not, update mm->user_ns to
the parent user namespace until an appropriate parent is found.

Reported-by: Jann Horn
Fixes: 9e4a36ece652 ("userns: Fail exec for suid and sgid binaries with ids outside our user namespace.")
Signed-off-by: "Eric W. Biederman"
Signed-off-by: Greg Kroah-Hartman -
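The parent walk described in the entry above (commit f84df2a6f268) can be sketched in user space as follows. Everything here is a simplified assumption for illustration: struct model_userns, model_owner_mapped() and model_pick_mm_userns() are hypothetical names, each namespace maps one contiguous uid range, and only the file's uid is checked, whereas the message says privileged_wrt_inode_uidgid considers both uid and gid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a user namespace with a single uid mapping. */
struct model_userns {
    struct model_userns *parent; /* NULL for the initial namespace */
    unsigned int first_uid;      /* first kernel uid mapped into this ns */
    unsigned int count;          /* number of mapped uids */
};

/* Stand-in for the membership test: is the file owner mapped in ns? */
bool model_owner_mapped(const struct model_userns *ns, unsigned int uid)
{
    return uid >= ns->first_uid && uid - ns->first_uid < ns->count;
}

/*
 * Start at the namespace of the exec'ing task and walk towards the root
 * until the executable's owner is mapped. The initial namespace always
 * terminates the walk.
 */
struct model_userns *model_pick_mm_userns(struct model_userns *ns,
                                          unsigned int file_uid)
{
    assert(ns != NULL);
    while (ns->parent != NULL && !model_owner_mapped(ns, file_uid))
        ns = ns->parent;
    return ns;
}
```

The result is the namespace in which holding CAP_SYS_PTRACE would be enough to trace the process in this model. A file owned by a uid outside every nested mapping ends up associated with the initial namespace.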
commit 035cd485a47dda64f25ccf8a90b11a07d0b7aa7a upstream.
The OMAP36xx DPLL5, driving EHCI USB, can be subject to a long-term
frequency drift. The frequency drift magnitude depends on the VCO update
rate, which is inversely proportional to the PLL divider. The kernel
DPLL configuration code results in a high value for the divider, leading
to a long term drift high enough to cause USB transmission errors. In
the worst case the USB PHY's ULPI interface can stop responding,
breaking USB operation completely. This manifests itself on the
Beagleboard xM by the LAN9514 reporting 'Cannot enable port 2. Maybe the
cable is bad?' in the kernel log.

Errata sprz319 advisory 2.1 documents PLL values that minimize the
drift. Use them automatically when DPLL5 is used for USB operation,
which we detect based on the requested clock rate. The clock framework
will still compute the PLL parameters and resulting rate as usual, but
the PLL M and N values will then be overridden. This can result in the
effective clock rate being slightly different than the rate cached by
the clock framework, but won't cause any adverse effect to USB
operation.

Signed-off-by: Richard Watts
[Upported from v3.2 to v4.9]
Signed-off-by: Laurent Pinchart
Tested-by: Ladislav Michl
Signed-off-by: Stephen Boyd
Cc: Adam Ford
Signed-off-by: Greg Kroah-Hartman -
commit 5e0ad0d8747f3e4803a9c3d96d64dd7332506d3c upstream.
Commit [64047d7f4912 ALSA: hda - ignore the assoc and seq when comparing
pin configurations] intended to ignore both seq and assoc when comparing
pins, but it only ignored seq. So that commit may still fail to
match pins on some machines.
Change the bitmask to also ignore assoc.

v2: Use macro to do bit masking.
Thanks to Hui Wang for the analysis.
Fixes: 64047d7f4912 ("ALSA: hda - ignore the assoc and seq when comparing...")
Signed-off-by: Kai-Heng Feng
Signed-off-by: Takashi Iwai
Signed-off-by: Greg Kroah-Hartman -
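The masking change in the entry above (commit 5e0ad0d8747f) can be sketched as below. The macro and function names are hypothetical, and the field layout (sequence in bits 3:0, default association in bits 7:4 of the pin default configuration word) is stated here as an assumption taken from the HD Audio pin configuration format rather than from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed field layout of the pin default configuration word. */
#define MODEL_DEFCFG_SEQUENCE  (0xfu << 0) /* bits 3:0 */
#define MODEL_DEFCFG_DEF_ASSOC (0xfu << 4) /* bits 7:4 */
#define MODEL_IGNORE_SEQ_ASSOC \
    (~(MODEL_DEFCFG_SEQUENCE | MODEL_DEFCFG_DEF_ASSOC))

/* Clears only the sequence nibble, so assoc is still compared. */
bool model_pins_match_ignore_seq(uint32_t a, uint32_t b)
{
    return (a & ~MODEL_DEFCFG_SEQUENCE) == (b & ~MODEL_DEFCFG_SEQUENCE);
}

/* Clears both nibbles, so neither seq nor assoc affects the match. */
bool model_pins_match_ignore_seq_assoc(uint32_t a, uint32_t b)
{
    assert((MODEL_IGNORE_SEQ_ASSOC & 0xffu) == 0);
    return (a & MODEL_IGNORE_SEQ_ASSOC) == (b & MODEL_IGNORE_SEQ_ASSOC);
}
```

Two configurations that differ only in the association nibble fail the first comparison and pass the second, which is the situation the commit message describes for machines that still failed to match.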
commit f73cd43ac3b41c0f09a126387f302bbc0d9c726d upstream.
HP Z1 Gen3 AiO with Conexant codec doesn't give an unsolicited event
to the headset mic pin when the jack is plugged in; it reports only to the
headphone pin. This results in missing mic switching. Let's fix it up
by simply gating the jack event.

Signed-off-by: Takashi Iwai
Signed-off-by: Greg Kroah-Hartman -
commit 989dbe4a30728c047316ab87e5fa8b609951ce7c upstream.
This group of new pins is not in the pin quirk table yet; add
them to the pin quirk table to fix the headset-mic problem.

Signed-off-by: Hui Wang
Signed-off-by: Takashi Iwai
Signed-off-by: Greg Kroah-Hartman -
commit 64047d7f4912de1769d1bf0d34c6322494b13779 upstream.
More and more pin configurations have been added to the pin quirk
table. Many of them differ only in assoc and seq, but they
all apply to the same QUIRK_FIXUP. If we don't compare assoc and seq
when matching pin configurations, the pin quirk table size is greatly
reduced.

We have tested this change on a couple of Dell laptops; it worked
well.

Signed-off-by: Hui Wang
Signed-off-by: Takashi Iwai
Signed-off-by: Greg Kroah-Hartman -
commit b5337cfe067e96b8a98699da90c7dcd2bec21133 upstream.
I'm using an Alienware 15 R2 and had to use the alienware quirks to
get my headphone output working.

I fixed it by adding SND_PCI_QUIRK(0x1028, 0x0708, "Alienware 15 R2
2016", QUIRK_ALIENWARE) to the patch.

Signed-off-by: Sven Hahne
Signed-off-by: Takashi Iwai
Signed-off-by: Greg Kroah-Hartman -
commit 995c6a7fd9b9212abdf01160f6ce3193176be503 upstream.
Sampling rate changes after the first one is set are not reflected to the
hardware, while the driver and ALSA think the rate has been changed.

Fix the problem by properly stopping the interface at the beginning of
the prepare call, allowing the new rate to be set to the hardware. This keeps
the hardware in sync with the driver.

Signed-off-by: Jussi Laako
Signed-off-by: Takashi Iwai
Signed-off-by: Greg Kroah-Hartman -
commit 82ffb6fc637150b279f49e174166d2aa3853eaf4 upstream.
The Logitech QuickCam Communicate Deluxe/S7500 microphone fails with the
following warning.

[ 6.778995] usb 2-1.2.2.2: Warning! Unlikely big volume range (=3072), cval->res is probably wrong.
[ 6.778996] usb 2-1.2.2.2: [5] FU [Mic Capture Volume] ch = 1, val = 4608/7680/1

Adding it to the list of devices in volume_control_quirks makes it work
properly. This also fixes a related typo.

Signed-off-by: Con Kolivas
Signed-off-by: Takashi Iwai
Signed-off-by: Greg Kroah-Hartman -
commit 3e448e13a662fb20145916636127995cbf37eb83 upstream.
ep_list inside gadget structure doesn't contain ep0.
It is stored separately in ep0 field.

This causes an urb hang if gadget driver decides to
delay setup handling. On host side this is visible as
timeout error when setting configuration.

This bug can be reproduced using for example any gadget
with mass storage function.

Fixes: abdb29574322 ("usbip: vudc: Add vudc_transfer")
Signed-off-by: Krzysztof Opasiak
Acked-by: Shuah Khan
Signed-off-by: Greg Kroah-Hartman -
commit ccdb6be9ec6580ef69f68949ebe26e0fb58a6fb0 upstream.
The UHCI controllers in Intel chipsets rely on a platform-specific non-PME
mechanism for wakeup signalling. They can generate wakeup signals even
though they don't support PME.

We need to let the USB core know this so that it will enable runtime
suspend for UHCI controllers.

Signed-off-by: Alan Stern
Signed-off-by: Bjorn Helgaas
Acked-by: Greg Kroah-Hartman
Signed-off-by: Greg Kroah-Hartman -
commit e8f29bb719b47a234f33b0af62974d7a9521a52c upstream.
usb_endpoint_maxp() returns wMaxPacketSize in its
raw form, without taking into consideration that it
also contains other bits reserved for isochronous
endpoints.

This patch fixes one occasion where this is a
problem by making sure that we initialize
ep->maxpacket only with lower 10 bits of the value
returned by usb_endpoint_maxp(). Note that separate
patches will be necessary to audit all call sites of
usb_endpoint_maxp() and make sure that
usb_endpoint_maxp() only returns lower 10 bits of
wMaxPacketSize.

Signed-off-by: Felipe Balbi
Signed-off-by: Greg Kroah-Hartman -
commit f1d3861d63a5d79b8968a02eea1dcb01bb684e62 upstream.
The current error handling flow uses an incorrect goto label; fix it.

Fixes: d12a8727171c ("usb: gadget: function: Remove redundant usb_free_all_descriptors")
Signed-off-by: Peter Chen
Signed-off-by: Felipe Balbi
Signed-off-by: Greg Kroah-Hartman -
commit 89778ba335e302a450932ce5b703c1ee6216e949 upstream.
Calling brightness_set manually isn't safe as some LED drivers don't
implement this callback. The best idea is to just use a proper helper
which will fall back to the brightness_set_blocking callback if needed.

This fixes:
[ 1461.761528] Unable to handle kernel NULL pointer dereference at virtual address 00000000
(...)
[ 1462.117049] Backtrace:
[ 1462.119521] [] (usbport_trig_port_store [ledtrig_usbport]) from [] (dev_attr_store+0x20/0x2c)
[ 1462.129826] r7:dcabc7c0 r6:dee0ff80 r5:00000002 r4:bf228164
[ 1462.135511] [] (dev_attr_store) from [] (sysfs_kf_write+0x48/0x4c)
[ 1462.143459] r5:00000002 r4:c023f738
[ 1462.147049] [] (sysfs_kf_write) from [] (kernfs_fop_write+0xf8/0x1f8)
[ 1462.155258] r5:00000002 r4:df4a1000
[ 1462.158850] [] (kernfs_fop_write) from [] (__vfs_write+0x34/0x120)
[ 1462.166800] r10:00000000 r9:dee0e000 r8:c000fc24 r7:00000002 r6:dee0ff80 r5:c01689c0
[ 1462.174660] r4:df727a80
[ 1462.177204] [] (__vfs_write) from [] (vfs_write+0xac/0x170)
[ 1462.184543] r9:dee0e000 r8:c000fc24 r7:dee0ff80 r6:b6f092d0 r5:df727a80 r4:00000002
[ 1462.192319] [] (vfs_write) from [] (SyS_write+0x4c/0xa8)
[ 1462.199396] r9:dee0e000 r8:c000fc24 r7:00000002 r6:b6f092d0 r5:df727a80 r4:df727a80
[ 1462.207174] [] (SyS_write) from [] (ret_fast_syscall+0x0/0x3c)
[ 1462.214774] r7:00000004 r6:ffffffff r5:00000000 r4:00000000
[ 1462.220456] Code: bad PC value
[ 1462.223560] ---[ end trace 676638a3a12c7a56 ]---

Reported-by: Ralph Sennhauser
Signed-off-by: Rafał Miłecki
Fixes: 0f247626cbb ("usb: core: Introduce a USB port LED trigger")
Signed-off-by: Greg Kroah-Hartman -
commit 37be66767e3cae4fd16e064d8bb7f9f72bf5c045 upstream.
USB-3 does not have any link state that will avoid negotiating a connection
with a plugged-in cable but will signal the host when the cable is
unplugged.

For USB-3 we used to first set the link to Disabled, then to RxDetect to
be able to detect cable connects or disconnects. But in RxDetect the
connected device is detected again and eventually enabled.

Instead set the link into U3 and disable remote wakeups for the device.
This is what Windows does, and what Alan Stern suggested.

Cc: Alan Stern
Acked-by: Alan Stern
Signed-off-by: Mathias Nyman
Signed-off-by: Greg Kroah-Hartman -
commit 6b9018d4c1e5c958625be94a160a5984351d4632 upstream.
In case of High-Speed, High-Bandwidth endpoints, we
need to tell DWC3 that we have more than one packet
per interval. We do that by setting PCM1 field of
Isochronous-First TRB.

Signed-off-by: Felipe Balbi
Signed-off-by: Greg Kroah-Hartman -
commit 301216044e4c27d5a7323c1fa766266fad00db5e upstream.
Add device-id entry for GW Instek AFG-125, which has a byte swapped
bInterfaceSubClass (0x20).

Signed-off-by: Nathaniel Quillin
Acked-by: Oliver Neukum
Signed-off-by: Greg Kroah-Hartman -
commit 6774d5f53271d5f60464f824748995b71da401ab upstream.
Kill urbs and disable read before returning from open on failure to
retrieve the line state.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold
Signed-off-by: Greg Kroah-Hartman -
commit d8a12b7117b42fd708f1e908498350232bdbd5ff upstream.
Adding registration for 3G modem DWM-158 in usb-serial-option
Signed-off-by: Giuseppe Lippolis
Signed-off-by: Johan Hovold
Signed-off-by: Greg Kroah-Hartman -
commit 5b09eff0c379002527ad72ea5ea38f25da8a8650 upstream.
This patch adds support for PIDs 0x1040, 0x1041 of Telit LE922A.
Since the interface positions are the same as the ones used
for other Telit compositions, previously defined blacklists are used.

Signed-off-by: Daniele Palmas
Signed-off-by: Johan Hovold
Signed-off-by: Greg Kroah-Hartman -
commit 8d9eddad19467b008e0c881bc3133d7da94b7ec1 upstream.
We were setting the qgroup_rescan_running flag to true only after the
rescan worker started (which is a task run by a queue). So if a user
space task starts a rescan and immediately after asks to wait for the
rescan worker to finish, this second call might happen before the rescan
worker task starts running, in which case the rescan wait ioctl returns
immediately, not waiting for the rescan worker to finish.

This was making the fstest btrfs/022 fail very often.

Fixes: d2c609b834d6 (btrfs: properly track when rescan worker is running)
Signed-off-by: Filipe Manana
Reviewed-by: David Sterba
Signed-off-by: Greg Kroah-Hartman -
commit f177d73949bf758542ca15a1c1945bd2e802cc65 upstream.
We can not simply use the owner field from an extent buffer's header to
get the id of the respective tree when the extent buffer is from a
relocation tree. When we create the root for a relocation tree we leave
(on purpose) the owner field with the same value as the subvolume's tree
root (we do this at ctree.c:btrfs_copy_root()). So we must ignore extent
buffers from relocation trees, which have the BTRFS_HEADER_FLAG_RELOC
flag set, because otherwise we will always consider the extent buffer
as not being the root of the tree (the root of original subvolume tree
is always different from the root of the respective relocation tree).

This led to assertion failures when running with the integrity checker
enabled (CONFIG_BTRFS_FS_CHECK_INTEGRITY=y) such as the following:

[ 643.393409] BTRFS critical (device sdg): corrupt leaf, non-root leaf's nritems is 0: block=38506496, root=260, slot=0
[ 643.397609] BTRFS info (device sdg): leaf 38506496 total ptrs 0 free space 3995
[ 643.407075] assertion failed: 0, file: fs/btrfs/disk-io.c, line: 4078
[ 643.408425] ------------[ cut here ]------------
[ 643.409112] kernel BUG at fs/btrfs/ctree.h:3419!
[ 643.409773] invalid opcode: 0000 [#1] PREEMPT SMP
[ 643.410447] Modules linked in: dm_flakey dm_mod crc32c_generic btrfs xor raid6_pq ppdev psmouse acpi_cpufreq parport_pc evdev parport tpm_tis tpm_tis_core pcspkr serio_raw i2c_piix4 sg tpm i2c_core button processor loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring scsi_mod virtio e1000 floppy
[ 643.414356] CPU: 11 PID: 32726 Comm: btrfs Not tainted 4.8.0-rc8-btrfs-next-35+ #1
[ 643.414356] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
[ 643.414356] task: ffff880145e95b00 task.stack: ffff88014826c000
[ 643.414356] RIP: 0010:[] [] assfail.constprop.41+0x1c/0x1e [btrfs]
[ 643.414356] RSP: 0018:ffff88014826fa28 EFLAGS: 00010292
[ 643.414356] RAX: 0000000000000039 RBX: ffff88014e2d7c38 RCX: 0000000000000001
[ 643.414356] RDX: ffff88023f4d2f58 RSI: ffffffff81806c63 RDI: 00000000ffffffff
[ 643.414356] RBP: ffff88014826fa28 R08: 0000000000000001 R09: 0000000000000000
[ 643.414356] R10: ffff88014826f918 R11: ffffffff82f3c5ed R12: ffff880172910000
[ 643.414356] R13: ffff880233992230 R14: ffff8801a68a3310 R15: fffffffffffffff8
[ 643.414356] FS: 00007f9ca305e8c0(0000) GS:ffff88023f4c0000(0000) knlGS:0000000000000000
[ 643.414356] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 643.414356] CR2: 00007f9ca3071000 CR3: 000000015d01b000 CR4: 00000000000006e0
[ 643.414356] Stack:
[ 643.414356] ffff88014826fa50 ffffffffa02d655a 000000000000000a ffff88014e2d7c38
[ 643.414356] 0000000000000000 ffff88014826faa8 ffffffffa02b72f3 ffff88014826fab8
[ 643.414356] 00ffffffa03228e4 0000000000000000 0000000000000000 ffff8801bbd4e000
[ 643.414356] Call Trace:
[ 643.414356] [] btrfs_mark_buffer_dirty+0xdf/0xe5 [btrfs]
[ 643.414356] [] btrfs_copy_root+0x18a/0x1d1 [btrfs]
[ 643.414356] [] create_reloc_root+0x72/0x1ba [btrfs]
[ 643.414356] [] btrfs_init_reloc_root+0x7b/0xa7 [btrfs]
[ 643.414356] [] record_root_in_trans+0xdf/0xed [btrfs]
[ 643.414356] [] btrfs_record_root_in_trans+0x50/0x6a [btrfs]
[ 643.414356] [] create_subvol+0x472/0x773 [btrfs]
[ 643.414356] [] btrfs_mksubvol+0x3da/0x463 [btrfs]
[ 643.414356] [] ? btrfs_mksubvol+0x3da/0x463 [btrfs]
[ 643.414356] [] ? preempt_count_add+0x65/0x68
[ 643.414356] [] ? __mnt_want_write+0x62/0x77
[ 643.414356] [] btrfs_ioctl_snap_create_transid+0xce/0x187 [btrfs]
[ 643.414356] [] btrfs_ioctl_snap_create+0x67/0x81 [btrfs]
[ 643.414356] [] btrfs_ioctl+0x508/0x20dd [btrfs]
[ 643.414356] [] ? __this_cpu_preempt_check+0x13/0x15
[ 643.414356] [] ? handle_mm_fault+0x976/0x9ab
[ 643.414356] [] ? arch_local_irq_save+0x9/0xc
[ 643.414356] [] vfs_ioctl+0x18/0x34
[ 643.414356] [] do_vfs_ioctl+0x581/0x600
[ 643.414356] [] ? entry_SYSCALL_64_fastpath+0x5/0xa8
[ 643.414356] [] ? trace_hardirqs_on_caller+0x17b/0x197
[ 643.414356] [] SyS_ioctl+0x57/0x79
[ 643.414356] [] entry_SYSCALL_64_fastpath+0x18/0xa8
[ 643.414356] [] ? trace_hardirqs_off_caller+0x3f/0xaa
[ 643.414356] Code: 89 83 88 00 00 00 31 c0 5b 41 5c 41 5d 5d c3 55 89 f1 48 c7 c2 98 bc 35 a0 48 89 fe 48 c7 c7 05 be 35 a0 48 89 e5 e8 13 46 dd e0 0b 55 89 f1 48 c7 c2 9f d3 35 a0 48 89 fe 48 c7 c7 7a d5 35
[ 643.414356] RIP [] assfail.constprop.41+0x1c/0x1e [btrfs]
[ 643.414356] RSP
[ 643.468267] ---[ end trace 6a1b3fb1a9d7d6e3 ]---

This can be easily reproduced by running xfstests with the integrity
checker enabled.

Fixes: 1ba98d086fe3 (Btrfs: detect corruption when non-root leaf has zero item)
Signed-off-by: Filipe Manana
Reviewed-by: Liu Bo
Signed-off-by: Greg Kroah-Hartman -
commit ed0df618b1b06d7431ee4d985317fc5419a5d559 upstream.
The balance status item contains currently known filter values, but the
stripes filter was unintentionally not among them. This means that an
interrupted and automatically restarted balance does not apply the
stripe filters.

Fixes: dee32d0ac3719ef8d640efaf0884111df444730f
Signed-off-by: David Sterba
Signed-off-by: Greg Kroah-Hartman -
commit 054570a1dc94de20e7a612cddcc5a97db9c37b6f upstream.
During relocation of a data block group we create a relocation tree
for each fs/subvol tree by making a snapshot of each tree using
btrfs_copy_root() and the tree's commit root, and then setting the last
snapshot field for the fs/subvol tree's root to the value of the current
transaction id minus 1. However this can lead to relocation later
dropping references that it did not create if we have qgroups enabled,
leaving the filesystem in an inconsistent state that keeps aborting
transactions.

Let's consider the following example to explain the problem, which requires
qgroups to be enabled.

We are relocating data block group Y, we have a subvolume with id 258 that
has a root at level 1, that subvolume is used to store directory entries
for snapshots and we are currently at transaction 3404.

When committing transaction 3404, we have a pending snapshot and therefore
we call btrfs_run_delayed_items() at transaction.c:create_pending_snapshot()
in order to create its dentry at subvolume 258. This results in COWing
leaf A from root 258 in order to add the dentry. Note that leaf A
also contains file extent items referring to extents from some other
block group X (we are currently relocating block group Y). Later on, still
at create_pending_snapshot() we call qgroup_account_snapshot(), which
switches the commit root for root 258 when it calls switch_commit_roots(),
so now the COWed version of leaf A, let's call it leaf A', is accessible
from the commit root of tree 258. At the end of qgroup_account_snapshot(),
we call record_root_in_trans() with 258 as its argument, which results
in btrfs_init_reloc_root() being called, which in turn calls
relocation.c:create_reloc_root() in order to create a relocation tree
associated to root 258, which results in assigning the value of 3403
(which is the current transaction id minus 1 = 3404 - 1) to the
last_snapshot field of root 258. When creating the relocation tree root
at ctree.c:btrfs_copy_root() we add a shared reference for leaf A',
corresponding to the relocation tree's root, when we call btrfs_inc_ref()
against the COWed root (a copy of the commit root from tree 258), which
is at level 1. So at this point leaf A' has 2 references, one normal
reference corresponding to root 258 and one shared reference corresponding
to the root of the relocation tree.

Transaction 3404 finishes its commit and transaction 3405 is started by
relocation when calling merge_reloc_root() for the relocation tree
associated to root 258. In the meanwhile leaf A' is COWed again, in
response to some filesystem operation, when we are still at transaction
3405. However when we COW leaf A', at ctree.c:update_ref_for_cow(), we
call btrfs_block_can_be_shared() in order to figure out if other trees
refer to the leaf and if any such trees exists, add a full back reference
to leaf A' - but btrfs_block_can_be_shared() incorrectly returns false
because the following condition is false:

  btrfs_header_generation(buf) <= btrfs_root_last_snapshot(&root->root_item)

which evaluates to 3404 <= 3403. [...] refs[0] is 1, it does call
btrfs_dec_ref() against leaf A', which results in removing the single
references that the extents from block group X have which are associated
to root 258 - the expectation was to have each of these extents with 2
references - one reference for root 258 and one shared reference related
to the root of the relocation tree, and so we would drop only the shared
reference (because leaf A' was supposed to have the flag
BTRFS_BLOCK_FLAG_FULL_BACKREF set).

This leaves the filesystem in an inconsistent state as we now have file
extent items in a subvolume tree that point to extents from block group X
without references in the extent tree. So later on when we try to decrement
the references for these extents, for example due to a file unlink operation,
truncate operation or overwriting ranges of a file, we fail because the
expected references do not exist in the extent tree.

This leads to warnings and transaction aborts like the following:

[ 588.965795] ------------[ cut here ]------------
[ 588.965815] WARNING: CPU: 2 PID: 2479 at fs/btrfs/extent-tree.c:1625 lookup_inline_extent_backref+0x432/0x5b0 [btrfs]
[ 588.965816] Modules linked in: af_packet iscsi_ibft iscsi_boot_sysfs xfs libcrc32c ppdev acpi_cpufreq button tpm_tis e1000 i2c_piix4 pcspkr parport_pc
parport tpm qemu_fw_cfg joydev btrfs xor raid6_pq sr_mod cdrom ata_generic virtio_scsi ata_piix virtio_pci bochs_drm virtio_ring drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops virtio ttm serio_raw drm floppy sg
[ 588.965831] CPU: 2 PID: 2479 Comm: kworker/u8:7 Not tainted 4.7.3-3-default-fdm+ #1
[ 588.965832] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
[ 588.965844] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
[ 588.965845] 0000000000000000 ffff8802263bfa28 ffffffff813af542 0000000000000000
[ 588.965847] 0000000000000000 ffff8802263bfa68 ffffffff81081e8b 0000065900000000
[ 588.965848] ffff8801db2af000 000000012bbe2000 0000000000000000 ffff880215703b48
[ 588.965849] Call Trace:
[ 588.965852] [] dump_stack+0x63/0x81
[ 588.965854] [] __warn+0xcb/0xf0
[ 588.965855] [] warn_slowpath_null+0x1d/0x20
[ 588.965863] [] lookup_inline_extent_backref+0x432/0x5b0 [btrfs]
[ 588.965865] [] ? trace_clock_local+0x10/0x30
[ 588.965867] [] ? rb_reserve_next_event+0x6f/0x460
[ 588.965875] [] insert_inline_extent_backref+0x55/0xd0 [btrfs]
[ 588.965882] [] __btrfs_inc_extent_ref.isra.55+0x8f/0x240 [btrfs]
[ 588.965890] [] __btrfs_run_delayed_refs+0x74a/0x1260 [btrfs]
[ 588.965892] [] ? cpuacct_charge+0x86/0xa0
[ 588.965900] [] btrfs_run_delayed_refs+0x9f/0x2c0 [btrfs]
[ 588.965908] [] delayed_ref_async_start+0x94/0xb0 [btrfs]
[ 588.965918] [] btrfs_scrubparity_helper+0xca/0x350 [btrfs]
[ 588.965928] [] btrfs_extent_refs_helper+0xe/0x10 [btrfs]
[ 588.965930] [] process_one_work+0x1f3/0x4e0
[ 588.965931] [] worker_thread+0x48/0x4e0
[ 588.965932] [] ? process_one_work+0x4e0/0x4e0
[ 588.965934] [] kthread+0xc9/0xe0
[ 588.965936] [] ret_from_fork+0x1f/0x40
[ 588.965937] [] ? kthread_worker_fn+0x170/0x170
[ 588.965938] ---[ end trace 34e5232c933a1749 ]---
[ 588.966187] ------------[ cut here ]------------
[ 588.966196] WARNING: CPU: 2 PID: 2479 at fs/btrfs/extent-tree.c:2966 btrfs_run_delayed_refs+0x28c/0x2c0 [btrfs]
[ 588.966196] BTRFS: Transaction aborted (error -5)
[ 588.966197] Modules linked in: af_packet iscsi_ibft iscsi_boot_sysfs xfs libcrc32c ppdev acpi_cpufreq button tpm_tis e1000 i2c_piix4 pcspkr parport_pc
parport tpm qemu_fw_cfg joydev btrfs xor raid6_pq sr_mod cdrom ata_generic virtio_scsi ata_piix virtio_pci bochs_drm virtio_ring drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops virtio ttm serio_raw drm floppy sg
[ 588.966206] CPU: 2 PID: 2479 Comm: kworker/u8:7 Tainted: G W 4.7.3-3-default-fdm+ #1
[ 588.966207] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
[ 588.966217] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
[ 588.966217] 0000000000000000 ffff8802263bfc98 ffffffff813af542 ffff8802263bfce8
[ 588.966219] 0000000000000000 ffff8802263bfcd8 ffffffff81081e8b 00000b96345ee000
[ 588.966220] ffffffffa021ae1c ffff880215703b48 00000000000005fe ffff8802345ee000
[ 588.966221] Call Trace:
[ 588.966223] [] dump_stack+0x63/0x81
[ 588.966224] [] __warn+0xcb/0xf0
[ 588.966225] [] warn_slowpath_fmt+0x4f/0x60
[ 588.966233] [] btrfs_run_delayed_refs+0x28c/0x2c0 [btrfs]
[ 588.966241] [] delayed_ref_async_start+0x94/0xb0 [btrfs]
[ 588.966250] [] btrfs_scrubparity_helper+0xca/0x350 [btrfs]
[ 588.966259] [] btrfs_extent_refs_helper+0xe/0x10 [btrfs]
[ 588.966260] [] process_one_work+0x1f3/0x4e0
[ 588.966261] [] worker_thread+0x48/0x4e0
[ 588.966263] [] ? process_one_work+0x4e0/0x4e0
[ 588.966264] [] kthread+0xc9/0xe0
[ 588.966265] [] ret_from_fork+0x1f/0x40
[ 588.966267] [] ? kthread_worker_fn+0x170/0x170
[ 588.966268] ---[ end trace 34e5232c933a174a ]---
[ 588.966269] BTRFS: error (device sda2) in btrfs_run_delayed_refs:2966: errno=-5 IO failure
[ 588.966270] BTRFS info (device sda2): forced readonly

This was happening often on openSUSE and SLE systems using btrfs as the
root filesystem (with its default layout where multiple subvolumes are
used) where balance happens in the background triggered by a cron job and
snapshots are automatically created before/after package installations,
upgrades and removals. The issue could be triggered simply by running the
following loop on the first system boot post installation:

  while true; do
    zypper -n in nfs-kernel-server
    zypper -n rm nfs-kernel-server
  done

(If we were fast enough and made that loop before the cron job triggered
a balance operation and the balance finished)

So fix by setting the last_snapshot field of the root to the value of the
generation of its commit root. This way btrfs_block_can_be_shared()
behaves correctly for the case where the relocation root is created during
a transaction commit and for the case where it's created before a
transaction commit.

Fixes: 6426c7ad697d (btrfs: qgroup: Fix qgroup accounting when creating snapshot)
Signed-off-by: Filipe Manana
Reviewed-by: Josef Bacik
Signed-off-by: Greg Kroah-Hartman -
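The arithmetic in the walkthrough of the entry above (commit 054570a1dc94) can be checked with a tiny model. It isolates only the generation comparison that btrfs_block_can_be_shared() makes against the root's last_snapshot; the function names are hypothetical, the other conditions of the real function are omitted, and the commit root's generation being 3404 in the example is an assumption drawn from the description of the commit root being switched during transaction 3404.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The generation part of the "can this block be shared?" decision. */
bool model_generation_allows_sharing(uint64_t block_generation,
                                     uint64_t last_snapshot)
{
    return block_generation <= last_snapshot;
}

/* Old behaviour: last_snapshot = current transaction id minus 1. */
uint64_t model_last_snapshot_old(uint64_t current_transid)
{
    assert(current_transid > 0);
    return current_transid - 1;
}

/* Fixed behaviour: last_snapshot = generation of the commit root. */
uint64_t model_last_snapshot_new(uint64_t commit_root_generation)
{
    return commit_root_generation;
}
```

For leaf A' with generation 3404, the old value of last_snapshot (3403) makes the comparison false, so the leaf is treated as unshared; with the commit root's generation (3404 under the assumption above) the comparison is true.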
commit 2a7bf53f577e49c43de4ffa7776056de26db65d9 upstream.
If a log tree has a layout like the following:

  leaf N:
        ...
        item 240 key (282 DIR_LOG_ITEM 0) itemoff 8189 itemsize 8
                dir log end 1275809046
  leaf N + 1:
        item 0 key (282 DIR_LOG_ITEM 3936149215) itemoff 16275 itemsize 8
                dir log end 18446744073709551615
        ...

When we pass the value 1275809046 + 1 as the parameter start_ret to the
function tree-log.c:find_dir_range() (done by replay_dir_deletes()), we
end up with path->slots[0] having the value 239 (points to the last item
of leaf N, item 240). Because the dir log item in that position has an
offset value smaller than *start_ret (1275809046 + 1) we need to move on
to the next leaf. However, the logic for that is wrong: it compares the
current slot to the number of items in the leaf, and since the slot is
smaller we don't look up the next leaf. Instead we set the slot to point
to an item that does not exist, at slot 240, and later operate on that
slot, which has unexpected content or, in the worst case, results in an
invalid memory access (accessing beyond the last page of leaf N's extent
buffer).

So fix the logic that checks when we need to look up the next leaf by
first incrementing the slot and only then checking whether that slot is
beyond the last item of the current leaf.

Signed-off-by: Robbie Ko
Reviewed-by: Filipe Manana
Fixes: e02119d5a7b4 (Btrfs: Add a write ahead tree log to optimize synchronous operations)
Signed-off-by: Filipe Manana
[Modified changelog for clarity and correctness]
Signed-off-by: Greg Kroah-Hartman -
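The slot check fixed by commit 2a7bf53f577e above can be illustrated with a
minimal standalone C sketch. This is not the kernel code: struct leaf_sketch
and both helper functions are hypothetical names introduced only for
illustration, and the leaf is reduced to its item count. It assumes a leaf
with nritems items whose valid slots are 0 to nritems - 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a leaf: only the item count matters here. */
struct leaf_sketch {
	int nritems;	/* valid slots are 0 .. nritems - 1 */
};

/*
 * Old ordering: compare the current slot against nritems first, then
 * advance. If *slot is the last valid slot, the comparison is false and
 * the slot is bumped to nritems, one past the end of the leaf.
 */
static bool needs_next_leaf_buggy(const struct leaf_sketch *leaf, int *slot)
{
	if (*slot >= leaf->nritems)
		return true;
	(*slot)++;
	return false;
}

/*
 * Fixed ordering: advance first, then check whether the new slot is
 * beyond the last item of the leaf.
 */
static bool needs_next_leaf_fixed(const struct leaf_sketch *leaf, int *slot)
{
	assert(*slot >= 0);
	(*slot)++;
	return *slot >= leaf->nritems;
}
```

With nritems = 240 and a starting slot of 239, the first helper reports that
no leaf change is needed and leaves the slot at 240, while the second one
reports that the next leaf must be looked up.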
commit ec125cfb7ae2157af3dd45dd8abe823e3e233eec upstream.
While logging new directory entries, at tree-log.c:log_new_dir_dentries(),
after we call btrfs_search_forward() we get a leaf with a read lock on it,
and without unlocking that leaf we can end up calling btrfs_iget() to get
an inode pointer. The latter (btrfs_iget()) can end up doing a read-only
search on the same tree again, if the inode is not in memory already, which
ends up causing a deadlock if some other task in the meanwhile started a
write search on the tree and is attempting to write lock the same leaf
that btrfs_search_forward() locked while holding write locks on upper
levels of the tree blocking the read search from btrfs_iget(). In this
scenario we get a deadlock.

So fix this by releasing the search path before calling btrfs_iget() at
tree-log.c:log_new_dir_dentries().

Example trace of such deadlock:
[ 4077.478852] kworker/u24:10 D ffff88107fc90640 0 14431 2 0x00000000
[ 4077.486752] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[ 4077.494346] ffff880ffa56bad0 0000000000000046 0000000000009000 ffff880ffa56bfd8
[ 4077.502629] ffff880ffa56bfd8 ffff881016ce21c0 ffffffffa06ecb26 ffff88101a5d6138
[ 4077.510915] ffff880ebb5173b0 ffff880ffa56baf8 ffff880ebb517410 ffff881016ce21c0
[ 4077.519202] Call Trace:
[ 4077.528752] [] ? btrfs_tree_lock+0xdd/0x2f0 [btrfs]
[ 4077.536049] [] ? wake_up_atomic_t+0x30/0x30
[ 4077.542574] [] ? btrfs_search_slot+0x79f/0xb10 [btrfs]
[ 4077.550171] [] ? btrfs_lookup_file_extent+0x33/0x40 [btrfs]
[ 4077.558252] [] ? __btrfs_drop_extents+0x13b/0xdf0 [btrfs]
[ 4077.566140] [] ? add_delayed_data_ref+0xe2/0x150 [btrfs]
[ 4077.573928] [] ? btrfs_add_delayed_data_ref+0x149/0x1d0 [btrfs]
[ 4077.582399] [] ? __set_extent_bit+0x4c0/0x5c0 [btrfs]
[ 4077.589896] [] ? insert_reserved_file_extent.constprop.75+0xa4/0x320 [btrfs]
[ 4077.599632] [] ? start_transaction+0x8d/0x470 [btrfs]
[ 4077.607134] [] ? btrfs_finish_ordered_io+0x2e7/0x600 [btrfs]
[ 4077.615329] [] ? process_one_work+0x142/0x3d0
[ 4077.622043] [] ? worker_thread+0x109/0x3b0
[ 4077.628459] [] ? manage_workers.isra.26+0x270/0x270
[ 4077.635759] [] ? kthread+0xaf/0xc0
[ 4077.641404] [] ? kthread_create_on_node+0x110/0x110
[ 4077.648696] [] ? ret_from_fork+0x58/0x90
[ 4077.654926] [] ? kthread_create_on_node+0x110/0x110

[ 4078.358087] kworker/u24:15 D ffff88107fcd0640 0 14436 2 0x00000000
[ 4078.365981] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[ 4078.373574] ffff880ffa57fad0 0000000000000046 0000000000009000 ffff880ffa57ffd8
[ 4078.381864] ffff880ffa57ffd8 ffff88103004d0a0 ffffffffa06ecb26 ffff88101a5d6138
[ 4078.390163] ffff880fbeffc298 ffff880ffa57faf8 ffff880fbeffc2f8 ffff88103004d0a0
[ 4078.398466] Call Trace:
[ 4078.408019] [] ? btrfs_tree_lock+0xdd/0x2f0 [btrfs]
[ 4078.415322] [] ? wake_up_atomic_t+0x30/0x30
[ 4078.421844] [] ? btrfs_search_slot+0x79f/0xb10 [btrfs]
[ 4078.429438] [] ? btrfs_lookup_file_extent+0x33/0x40 [btrfs]
[ 4078.437518] [] ? __btrfs_drop_extents+0x13b/0xdf0 [btrfs]
[ 4078.445404] [] ? add_delayed_data_ref+0xe2/0x150 [btrfs]
[ 4078.453194] [] ? btrfs_add_delayed_data_ref+0x149/0x1d0 [btrfs]
[ 4078.461663] [] ? __set_extent_bit+0x4c0/0x5c0 [btrfs]
[ 4078.469161] [] ? insert_reserved_file_extent.constprop.75+0xa4/0x320 [btrfs]
[ 4078.478893] [] ? start_transaction+0x8d/0x470 [btrfs]
[ 4078.486388] [] ? btrfs_finish_ordered_io+0x2e7/0x600 [btrfs]
[ 4078.494561] [] ? process_one_work+0x142/0x3d0
[ 4078.501278] [] ? pwq_activate_delayed_work+0x27/0x40
[ 4078.508673] [] ? worker_thread+0x109/0x3b0
[ 4078.515098] [] ? manage_workers.isra.26+0x270/0x270
[ 4078.522396] [] ? kthread+0xaf/0xc0
[ 4078.528032] [] ? kthread_create_on_node+0x110/0x110
[ 4078.535325] [] ? ret_from_fork+0x58/0x90
[ 4078.541552] [] ? kthread_create_on_node+0x110/0x110

[ 4079.355824] user-space-program D ffff88107fd30640 0 32020 1 0x00000000
[ 4079.363716] ffff880eae8eba10 0000000000000086 0000000000009000 ffff880eae8ebfd8
[ 4079.372003] ffff880eae8ebfd8 ffff881016c162c0 ffffffffa06ecb26 ffff88101a5d6138
[ 4079.380294] ffff880fbed4b4c8 ffff880eae8eba38 ffff880fbed4b528 ffff881016c162c0
[ 4079.388586] Call Trace:
[ 4079.398134] [] ? btrfs_tree_lock+0x85/0x2f0 [btrfs]
[ 4079.405431] [] ? wake_up_atomic_t+0x30/0x30
[ 4079.411955] [] ? btrfs_lock_root_node+0x2b/0x40 [btrfs]
[ 4079.419644] [] ? btrfs_search_slot+0xa03/0xb10 [btrfs]
[ 4079.427237] [] ? btrfs_buffer_uptodate+0x52/0x70 [btrfs]
[ 4079.435041] [] ? generic_bin_search.constprop.38+0x80/0x190 [btrfs]
[ 4079.443897] [] ? btrfs_insert_empty_items+0x74/0xd0 [btrfs]
[ 4079.451975] [] ? copy_items+0x128/0x850 [btrfs]
[ 4079.458890] [] ? btrfs_log_inode+0x629/0xbf3 [btrfs]
[ 4079.466292] [] ? btrfs_log_inode_parent+0xc61/0xf30 [btrfs]
[ 4079.474373] [] ? btrfs_log_dentry_safe+0x59/0x80 [btrfs]
[ 4079.482161] [] ? btrfs_sync_file+0x20d/0x330 [btrfs]
[ 4079.489558] [] ? do_fsync+0x4c/0x80
[ 4079.495300] [] ? SyS_fdatasync+0xa/0x10
[ 4079.501422] [] ? system_call_fastpath+0x16/0x1b

[ 4079.508334] user-space-program D ffff88107fc30640 0 32021 1 0x00000004
[ 4079.516226] ffff880eae8efbf8 0000000000000086 0000000000009000 ffff880eae8effd8
[ 4079.524513] ffff880eae8effd8 ffff881030279610 ffffffffa06ecb26 ffff88101a5d6138
[ 4079.532802] ffff880ebb671d88 ffff880eae8efc20 ffff880ebb671de8 ffff881030279610
[ 4079.541092] Call Trace:
[ 4079.550642] [] ? btrfs_tree_lock+0x85/0x2f0 [btrfs]
[ 4079.557941] [] ? wake_up_atomic_t+0x30/0x30
[ 4079.564463] [] ? btrfs_search_slot+0x79f/0xb10 [btrfs]
[ 4079.572058] [] ? btrfs_truncate_inode_items+0x168/0xb90 [btrfs]
[ 4079.580526] [] ? join_transaction.isra.15+0x1e/0x3a0 [btrfs]
[ 4079.588701] [] ? start_transaction+0x8d/0x470 [btrfs]
[ 4079.596196] [] ? block_rsv_add_bytes+0x16/0x50 [btrfs]
[ 4079.603789] [] ? btrfs_truncate+0xe9/0x2e0 [btrfs]
[ 4079.610994] [] ? btrfs_setattr+0x30b/0x410 [btrfs]
[ 4079.618197] [] ? notify_change+0x1dc/0x680
[ 4079.624625] [] ? aa_path_perm+0xd4/0x160
[ 4079.630854] [] ? do_truncate+0x5b/0x90
[ 4079.636889] [] ? do_sys_ftruncate.constprop.15+0x10a/0x160
[ 4079.644869] [] ? SyS_fcntl+0x5b/0x570
[ 4079.650805] [] ? system_call_fastpath+0x16/0x1b

[ 4080.410607] user-space-program D ffff88107fc70640 0 32028 12639 0x00000004
[ 4080.418489] ffff880eaeccbbe0 0000000000000086 0000000000009000 ffff880eaeccbfd8
[ 4080.426778] ffff880eaeccbfd8 ffff880f317ef1e0 ffffffffa06ecb26 ffff88101a5d6138
[ 4080.435067] ffff880ef7e93928 ffff880f317ef1e0 ffff880eaeccbc08 ffff880f317ef1e0
[ 4080.443353] Call Trace:
[ 4080.452920] [] ? btrfs_tree_read_lock+0xdd/0x190 [btrfs]
[ 4080.460703] [] ? wake_up_atomic_t+0x30/0x30
[ 4080.467225] [] ? btrfs_read_lock_root_node+0x2b/0x40 [btrfs]
[ 4080.475400] [] ? btrfs_search_slot+0x801/0xb10 [btrfs]
[ 4080.482994] [] ? btrfs_clean_one_deleted_snapshot+0xe0/0xe0 [btrfs]
[ 4080.491857] [] ? btrfs_lookup_inode+0x26/0x90 [btrfs]
[ 4080.499353] [] ? kmem_cache_alloc+0xaf/0xc0
[ 4080.505879] [] ? btrfs_iget+0xd5/0x5d0 [btrfs]
[ 4080.512696] [] ? btrfs_get_token_64+0x104/0x120 [btrfs]
[ 4080.520387] [] ? btrfs_log_inode_parent+0xbdf/0xf30 [btrfs]
[ 4080.528469] [] ? btrfs_log_dentry_safe+0x59/0x80 [btrfs]
[ 4080.536258] [] ? btrfs_sync_file+0x20d/0x330 [btrfs]
[ 4080.543657] [] ? do_fsync+0x4c/0x80
[ 4080.549399] [] ? SyS_fdatasync+0xa/0x10
[ 4080.555534] [] ? system_call_fastpath+0x16/0x1b

Signed-off-by: Robbie Ko
Reviewed-by: Filipe Manana
Fixes: 2f2ff0ee5e43 (Btrfs: fix metadata inconsistencies after directory fsync)
Signed-off-by: Filipe Manana
[Modified changelog for clarity and correctness]
Signed-off-by: Greg Kroah-Hartman -
commit ef85b25e982b5bba1530b936e283ef129f02ab9d upstream.
This can only happen with CONFIG_BTRFS_FS_CHECK_INTEGRITY=y.
Commit 1ba98d0 ("Btrfs: detect corruption when non-root leaf has zero item")
assumes that a leaf is its root when leaf->bytenr == btrfs_root_bytenr(root),
however, we should not use btrfs_root_bytenr(root) since it is mainly
updated while committing a transaction. So the check can fail when doing
COW on this leaf while it is a root.

This changes to use "if (leaf == btrfs_root_node(root))" instead, just like
how we check whether leaf is a root in __btrfs_cow_block().

Fixes: 1ba98d086fe3 (Btrfs: detect corruption when non-root leaf has zero item)
Reported-by: Jeff Mahoney
Signed-off-by: Liu Bo
Reviewed-by: Filipe Manana
Signed-off-by: Greg Kroah-Hartman -
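The difference between the two root checks discussed in commit ef85b25e982b
above can be sketched in standalone C. This is only an illustration under
stated assumptions: struct buf_sketch, struct root_sketch and both helpers
are hypothetical and do not mirror the btrfs structures. The sketch assumes
that the bytenr recorded for the root is refreshed only at commit time, while
the in-memory root node pointer is replaced as soon as the root is COWed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a tree block. */
struct buf_sketch {
	uint64_t bytenr;	/* logical address of the block */
	int nritems;
};

/* Hypothetical stand-in for a tree root. */
struct root_sketch {
	struct buf_sketch *node;	/* current in-memory root node */
	uint64_t item_bytenr;		/* recorded bytenr, refreshed at commit */
};

/* Check by recorded bytenr: stale between a COW of the root and the commit. */
static bool is_root_by_bytenr(const struct root_sketch *root,
			      const struct buf_sketch *leaf)
{
	return leaf->bytenr == root->item_bytenr;
}

/* Check by node identity: follows the root node as soon as it is replaced. */
static bool is_root_by_node(const struct root_sketch *root,
			    const struct buf_sketch *leaf)
{
	return leaf == root->node;
}
```

After the root node has been swapped for its COWed copy but before the
recorded bytenr is refreshed, only the identity check still recognizes the
new block as the root.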
commit 2939e1a86f758b55cdba73e29397dd3d94df13bc upstream.
Problem statement: an unprivileged user who has read-write access to more than
one btrfs subvolume may easily consume all kernel memory (eventually
triggering oom-killer).

Reproducer (./mkrmdir below essentially loops over mkdir/rmdir):
[root@kteam1 ~]# cat prep.sh
DEV=/dev/sdb
mkfs.btrfs -f $DEV
mount $DEV /mnt
for i in `seq 1 16`
do
mkdir /mnt/$i
btrfs subvolume create /mnt/SV_$i
ID=`btrfs subvolume list /mnt |grep "SV_$i$" |cut -d ' ' -f 2`
mount -t btrfs -o subvolid=$ID $DEV /mnt/$i
chmod a+rwx /mnt/$i
done

[root@kteam1 ~]# sh prep.sh
[maxim@kteam1 ~]$ for i in `seq 1 16`; do ./mkrmdir /mnt/$i 2000 2000 & done
[root@kteam1 ~]# for i in `seq 1 4`; do grep "kmalloc-128" /proc/slabinfo | grep -v dma; sleep 60; done
kmalloc-128 10144 10144 128 32 1 : tunables 0 0 0 : slabdata 317 317 0
kmalloc-128 9992352 9992352 128 32 1 : tunables 0 0 0 : slabdata 312261 312261 0
kmalloc-128 24226752 24226752 128 32 1 : tunables 0 0 0 : slabdata 757086 757086 0
kmalloc-128 42754240 42754240 128 32 1 : tunables 0 0 0 : slabdata 1336070 1336070 0

The huge numbers above come from an insane number of async_work-s allocated
and queued by btrfs_wq_run_delayed_node.

The problem is caused by btrfs_wq_run_delayed_node() queuing more and more
works if the number of delayed items is above BTRFS_DELAYED_BACKGROUND. The
worker func (btrfs_async_run_delayed_root) processes at least
BTRFS_DELAYED_BATCH items (if they are present in the list). So, the machinery
works as expected while the list is almost empty. As soon as the list gets
bigger, the worker func starts to process more than one item at a time, it
takes longer, and the chance of having more async_works queued than needed
gets higher.

The problem above is worsened by another flaw of delayed-inode implementation:
if async_work was queued in a throttling branch (number of items >=
BTRFS_DELAYED_WRITEBACK), corresponding worker func won't quit until
the number of items < BTRFS_DELAYED_BACKGROUND / 2. So, it is possible that
the func occupies the CPU for a long time (up to 30sec in my experiments):
while the func is trying to drain the list, the user activity may add more
and more items to the list.

The patch fixes both problems in a straightforward way: refuse queuing too
many works in btrfs_wq_run_delayed_node and bail out of worker func if
at least BTRFS_DELAYED_WRITEBACK items are processed.

Changed in v2: remove support of thresh == NO_THRESHOLD.
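
The two limits described above can be sketched in standalone C as follows.
This is a simplified model, not the btrfs implementation: the struct, the
function names, MAX_QUEUED_WORKS and all numeric values are assumptions made
for illustration; only the idea of refusing to queue past a cap and of
bailing out of the worker after a bounded number of items comes from the
description above.

```c
#include <stdbool.h>

/* Illustrative thresholds; names echo the text, values are assumptions. */
enum {
	DELAYED_BACKGROUND = 128,
	DELAYED_WRITEBACK  = 512,
	MAX_QUEUED_WORKS   = 16	/* hypothetical cap on pending works */
};

struct delayed_root_sketch {
	int items;		/* delayed items waiting to be processed */
	int queued_works;	/* async works currently queued */
};

/* Queue a work only if there is a backlog and the queue is not saturated. */
static bool try_queue_work(struct delayed_root_sketch *root)
{
	if (root->items < DELAYED_BACKGROUND)
		return false;
	if (root->queued_works >= MAX_QUEUED_WORKS)
		return false;
	root->queued_works++;
	return true;
}

/*
 * Worker: drain items until the backlog is low, but never process more
 * than DELAYED_WRITEBACK items in one run. Returns the number processed.
 */
static int run_worker(struct delayed_root_sketch *root)
{
	int done = 0;

	while (root->items >= DELAYED_BACKGROUND / 2 &&
	       done < DELAYED_WRITEBACK) {
		root->items--;
		done++;
	}
	if (root->queued_works > 0)
		root->queued_works--;
	return done;
}
```

Bounding both sides means a steady stream of new items can neither grow the
number of queued works without limit nor keep a single worker running for an
unbounded time.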
Signed-off-by: Maxim Patlasov
Signed-off-by: Chris Mason
Signed-off-by: Greg Kroah-Hartman -
commit 777c6e0daebb3fcefbbd6f620410a946b07ef6d0 upstream.
Yu Zhao has noticed that __unregister_cpu_notifier only unregisters its
notifiers when HOTPLUG_CPU=y while the registration might succeed even
when HOTPLUG_CPU=n if MODULE is enabled. This means that e.g. zswap
might keep a stale notifier on the list on the manual clean up during
the pool tear down and thus corrupt the list, resulting in the following:

[ 144.964346] BUG: unable to handle kernel paging request at ffff880658a2be78
[ 144.971337] IP: [] raw_notifier_chain_register+0x1b/0x40

[ 145.122628] Call Trace:
[ 145.125086] [] __register_cpu_notifier+0x18/0x20
[ 145.131350] [] zswap_pool_create+0x273/0x400
[ 145.137268] [] __zswap_param_set+0x1fc/0x300
[ 145.143188] [] ? trace_hardirqs_on+0xd/0x10
[ 145.149018] [] ? kernel_param_lock+0x28/0x30
[ 145.154940] [] ? __might_fault+0x4f/0xa0
[ 145.160511] [] zswap_compressor_param_set+0x17/0x20
[ 145.167035] [] param_attr_store+0x5c/0xb0
[ 145.172694] [] module_attr_store+0x1d/0x30
[ 145.178443] [] sysfs_kf_write+0x4f/0x70
[ 145.183925] [] kernfs_fop_write+0x149/0x180
[ 145.189761] [] __vfs_write+0x18/0x40
[ 145.194982] [] vfs_write+0xb2/0x1a0
[ 145.200122] [] SyS_write+0x52/0xa0
[ 145.205177] [] entry_SYSCALL_64_fastpath+0x12/0x17

This can even be triggered manually by changing
/sys/module/zswap/parameters/compressor multiple times.

Fix this issue by making unregister APIs symmetric to the register so
there are no surprises.

Fixes: 47e627bc8c9a ("[PATCH] hotplug: Allow modules to use the cpu hotplug notifiers even if !CONFIG_HOTPLUG_CPU")
Reported-and-tested-by: Yu Zhao
Signed-off-by: Michal Hocko
Cc: linux-mm@kvack.org
Cc: Andrew Morton
Cc: Dan Streetman
Link: http://lkml.kernel.org/r/20161207135438.4310-1-mhocko@kernel.org
Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman
12 Dec, 2016
2 commits
-
Pull MIPS fixes from Ralf Baechle:
"Two more MIPS fixes for 4.9:

- RTC: Return -ENODEV so an external RTC will be tried
- Fix mask of GPE frequency

These two have been tested on Imagination's automated test system and
also both received positive reviews on the linux-mips mailing list"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: Lantiq: Fix mask of GPE frequency
MIPS: Return -ENODEV from weak implementation of rtc_mips_set_time
11 Dec, 2016
4 commits
-
The hardware documentation says bits 11:10 are used for the GPE
frequency selection. Fix the mask in the define to match these bits.

Signed-off-by: Hauke Mehrtens
Reported-by: Dan Carpenter
Reviewed-by: Thomas Langer
Cc: linux-mips@linux-mips.org
Cc: john@phrozen.org
Patchwork: https://patchwork.linux-mips.org/patch/14648/
Signed-off-by: Ralf Baechle -
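For the GPE frequency mask fix above, a mask covering bits 11:10 can be
derived from the field width and shift, as in the following standalone C
sketch. The macro and function names are hypothetical and are not taken from
the Lantiq code; only the bit positions come from the commit message.

```c
#include <stdint.h>

/* Hypothetical names; only the bit positions (11:10) come from the text. */
#define GPEFREQ_SHIFT	10u
#define GPEFREQ_MASK	(UINT32_C(0x3) << GPEFREQ_SHIFT)	/* 0x00000C00 */

/* Extract the two-bit frequency selection field from a register value. */
static inline uint32_t gpe_freq_field(uint32_t reg)
{
	return (reg & GPEFREQ_MASK) >> GPEFREQ_SHIFT;
}
```

Building the mask from the shift keeps the two definitions from drifting
apart, which is the kind of mismatch the fix addresses.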
The sync_cmos_clock function in kernel/time/ntp.c first tries to update
the internal clock of the cpu by calling the "update_persistent_clock64"
architecture specific function. If this returns -ENODEV, it then tries
to update an external RTC using "rtc_set_ntp_time".

On the mips architecture, the weak implementation of the underlying
function would return 0 if it wasn't overridden. This meant that the
sync_cmos_clock function would never try to update an external RTC
(if both CONFIG_GENERIC_CMOS_UPDATE and CONFIG_RTC_SYSTOHC are
configured).

Returning -ENODEV instead means that an external RTC will be tried.
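
The fallback described above can be modelled with a minimal standalone C
sketch. It is not the code in kernel/time/ntp.c: the function pointer type,
sync_clock_sketch() and the three stub callbacks are hypothetical stand-ins
for the persistent clock update and the external RTC update.

```c
#include <errno.h>

typedef int (*clock_update_fn)(void);

/*
 * Try the architecture clock first; fall back to the external RTC only
 * when the first step reports -ENODEV.
 */
static int sync_clock_sketch(clock_update_fn update_persistent,
			     clock_update_fn update_external_rtc)
{
	int ret = update_persistent();

	if (ret == -ENODEV)
		ret = update_external_rtc();
	return ret;
}

/* Counts how often the external RTC stub was called. */
static int external_rtc_calls;

/* Old weak default: claims success, so the fallback is never reached. */
static int weak_returns_zero(void)
{
	return 0;
}

/* New weak default: reports that there is no device to update. */
static int weak_returns_enodev(void)
{
	return -ENODEV;
}

/* Stub for the external RTC update. */
static int external_rtc_ok(void)
{
	external_rtc_calls++;
	return 0;
}
```

Passing weak_returns_zero never reaches the external RTC stub, while passing
weak_returns_enodev does.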
Signed-off-by: Luuk Paulussen
Reviewed-by: Richard Laing
Reviewed-by: Scott Parlane
Reviewed-by: Chris Packham
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14649/
Signed-off-by: Ralf Baechle -
Pull crypto fixes from Herbert Xu:
"This fixes the following issues:

- Fix pointer size when caam is used with AArch64 boot loader on
  AArch32 kernel.
- Fix ahash state corruption in marvell driver.
- Fix buggy algif_aead tag handling.
- Prevent mcryptd from being used with incompatible algorithms which
  can cause crashes"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: algif_aead - fix uninitialized variable warning
crypto: mcryptd - Check mcryptd algorithm compatibility
crypto: algif_aead - fix AEAD tag memory handling
crypto: caam - fix pointer size for AArch64 boot loader, AArch32 kernel
crypto: marvell - Don't corrupt state of an STD req for re-stepped ahash
crypto: marvell - Don't copy hash operation twice into the SRAM -
Pull networking fixes from David Miller:
1) Limit the number of can filters to avoid > MAX_ORDER allocations.
   Fix from Marc Kleine-Budde.

2) Limit GSO max size in netvsc driver to avoid problems with NVGRE
   configurations. From Stephen Hemminger.

3) Return proper error when memory allocation fails in
   ser_gigaset_init(), from Dan Carpenter.

4) Missing linkage undo in error paths of ipvlan_link_new(), from Gao
   Feng.

5) Missing necessary SET_NETDEV_DEV in lantiq and cpmac drivers, from
   Florian Fainelli.

6) Handle probe deferral properly in smsc911x driver.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
net: mlx5: Fix Kconfig help text
net: smsc911x: back out silently on probe deferrals
ibmveth: set correct gso_size and gso_type
net: ethernet: cpmac: Call SET_NETDEV_DEV()
net: ethernet: lantiq_etop: Call SET_NETDEV_DEV()
vhost-vsock: fix orphan connection reset
cxgb4/cxgb4vf: Assign netdev->dev_port with port ID
driver: ipvlan: Unlink the upper dev when ipvlan_link_new failed
ser_gigaset: return -ENOMEM on error instead of success
NET: usb: cdc_mbim: add quirk for supporting Telit LE922A
can: peak: fix bad memory access and free sequence
phy: Don't increment MDIO bus refcount unless it's a different owner
netvsc: reduce maximum GSO size
drivers: net: cpsw-phy-sel: Clear RGMII_IDMODE on "rgmii" links
can: raw: raw_setsockopt: limit number of can_filter that can be set
10 Dec, 2016
1 commit
-
Since the following commit, Infiniband and Ethernet have not been
mutually exclusive.

Fixes: 4aa17b28 mlx5: Enable mutual support for IB and Ethernet
Signed-off-by: Christopher Covington
Signed-off-by: David S. Miller