01 Nov, 2015

5 commits

  • ffs counts bits starting with 1 (for the least significant bit), __ffs
    counts bits starting with 0. This patch changes various occurrences of ffs
    to __ffs and removes subtraction of 1 from the result.

    Note that __ffs (unlike ffs) is not defined when called with a zero
    argument, but it is not called with a zero argument in any of these cases.
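
    The indexing difference can be sketched in userspace C. This is an
    illustration only, assuming a GCC or Clang compiler for the __builtin_ctz
    family; demo_ffs and demo_ffs0 are hypothetical stand-ins, not the kernel's
    ffs and __ffs.

```c
#include <assert.h>

/* 1-based, like ffs(): returns 0 when no bit is set. */
static int demo_ffs(unsigned int x)
{
    return x ? __builtin_ctz(x) + 1 : 0;
}

/* 0-based, like __ffs(): the result is undefined for 0, so the
 * caller must guarantee a non-zero argument. */
static unsigned long demo_ffs0(unsigned long word)
{
    assert(word != 0);
    return (unsigned long)__builtin_ctzl(word);
}
```

    For any non-zero x, demo_ffs(x) - 1 and demo_ffs0(x) agree, which is why
    the subtraction can be dropped once the zero case is ruled out.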

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer

    Mikulas Patocka
     
  • Remove DM's unneeded NULL tests before calling these destroy functions,
    now that they check for NULL, thanks to these v4.3 commits:
    3942d2991 ("mm/slab_common: allow NULL cache pointer in kmem_cache_destroy()")
    4e3ca3e03 ("mm/mempool: allow NULL `pool' pointer in mempool_destroy()")

    The semantic patch that makes this change is as follows:
    (http://coccinelle.lip6.fr/)

    //
    @@ expression x; @@
    -if (x != NULL)
    \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
    //
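
    The pattern behind this cleanup can be sketched in userspace C. The
    demo_pool type and demo_pool_destroy function are hypothetical and only
    mirror the idea that a destroy function tolerating NULL makes the caller's
    test redundant; they are not the kernel's mempool or kmem_cache code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pool type, for illustration only. */
struct demo_pool {
    void *elements;
};

static int demo_destroy_calls; /* counts real teardowns */

/* Accepts NULL, like free(): callers need no "if (pool)" guard. */
static void demo_pool_destroy(struct demo_pool *pool)
{
    if (!pool)
        return;
    free(pool->elements);
    free(pool);
    demo_destroy_calls++;
}
```

    With the check inside the callee, an error path can call
    demo_pool_destroy() unconditionally on every pointer it may have allocated.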

    Signed-off-by: Julia Lawall
    Signed-off-by: Mike Snitzer

    Julia Lawall
     
  • This adds support to pass through persistent reservation requests
    similar to the existing ioctl handling, and with the same limitations,
    e.g. devices may only have a single target attached.

    This is mostly intended for multipathing.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Mike Snitzer

    Christoph Hellwig
     
  • This moves the call to blkdev_ioctl and the argument checking to DM core
    code, and only leaves a callout to find the block device to operate on
    in the targets. This simplifies the code and allows us to pass through
    ioctl-like commands using other methods in the next patch.

    Also split out a helper around calling the prepare_ioctl method that
    will be reused for persistent reservation handling.
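
    The split described above can be sketched in userspace C. Only the
    prepare_ioctl name comes from the commit message; every demo_ type and
    function is a hypothetical stand-in, and the real DM signatures differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a block device. */
struct demo_bdev {
    unsigned int last_cmd;
};

/* Hypothetical target: its only job is to pick the device. */
struct demo_target {
    int (*prepare_ioctl)(struct demo_target *t, struct demo_bdev **bdev);
    struct demo_bdev *path;
};

/* Stand-in for the generic ioctl call, now made by the core. */
static int demo_issue_ioctl(struct demo_bdev *bdev, unsigned int cmd)
{
    bdev->last_cmd = cmd;
    return 0;
}

/* Core helper: ask the target for a device, then issue the command. */
static int demo_core_ioctl(struct demo_target *t, unsigned int cmd)
{
    struct demo_bdev *bdev = NULL;
    int r;

    assert(t && t->prepare_ioctl);
    r = t->prepare_ioctl(t, &bdev);
    if (r < 0)
        return r;
    return demo_issue_ioctl(bdev, cmd);
}

/* Example callout for a target with a single underlying path. */
static int demo_single_path_prepare(struct demo_target *t,
                                    struct demo_bdev **bdev)
{
    if (!t->path)
        return -EIO;
    *bdev = t->path;
    return 0;
}
```

    Because the target no longer issues the command itself, another entry
    point can reuse the same callout to find the device.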

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Mike Snitzer

    Christoph Hellwig
     
  • This reverts commit a1989b330093578ea5470bea0a00f940c444c466.

    That commit introduced a regression at least for the case of the SG_IO ioctl()
    running without CAP_SYS_RAWIO capability (e.g., unprivileged users) when there
    are no active paths: the ioctl() fails with the ENOTTY errno immediately rather
    than blocking due to queue_if_no_path until a path becomes active, for example.

    That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices
    (qemu "-device scsi-block" [1], libvirt "" [2])
    from multipath devices; which leads to SCSI/filesystem errors in such a guest.

    More general scenarios can hit that regression too. The following demonstration
    employs a SG_IO ioctl() with a standard SCSI INQUIRY command for this objective
    (some output & user changes omitted for brevity and comments added for clarity).

    Reverting that commit restores normal operation (queueing) in failing scenarios;
    tested on linux-next (next-20151022).

    1) Test-case is based on sg_simple0 [3] (just SG_IO; remove SG_GET_VERSION_NUM)

    $ cat sg_simple0.c
    ... see [3] ...
    $ sed '/SG_GET_VERSION_NUM/,/}/d' sg_simple0.c > sgio_inquiry.c
    $ gcc sgio_inquiry.c -o sgio_inquiry

    2) The ioctl() works fine with active paths present.

    # multipath -l 85ag56
    85ag56 (...) dm-19 IBM ,2145
    size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
    |-+- policy='service-time 0' prio=0 status=active
    | |- 8:0:11:0 sdz 65:144 active undef running
    | `- 9:0:9:0 sdbf 67:144 active undef running
    `-+- policy='service-time 0' prio=0 status=enabled
    |- 8:0:12:0 sdae 65:224 active undef running
    `- 9:0:12:0 sdbo 68:32 active undef running

    $ ./sgio_inquiry /dev/mapper/85ag56
    Some of the INQUIRY command's response:
    IBM 2145 0000
    INQUIRY duration=0 millisecs, resid=0

    3) The ioctl() fails with ENOTTY errno with _no_ active paths present,
    for unprivileged users (rather than blocking due to queue_if_no_path).

    # for path in $(multipath -l 85ag56 | grep -o 'sd[a-z]\+'); \
    do multipathd -k"fail path $path"; done

    # multipath -l 85ag56
    85ag56 (...) dm-19 IBM ,2145
    size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
    |-+- policy='service-time 0' prio=0 status=enabled
    | |- 8:0:11:0 sdz 65:144 failed undef running
    | `- 9:0:9:0 sdbf 67:144 failed undef running
    `-+- policy='service-time 0' prio=0 status=enabled
    |- 8:0:12:0 sdae 65:224 failed undef running
    `- 9:0:12:0 sdbo 68:32 failed undef running

    $ ./sgio_inquiry /dev/mapper/85ag56
    sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device

    4) dmesg shows that scsi_verify_blk_ioctl() failed for SG_IO (0x2285);
    it returns -ENOIOCTLCMD, later replaced with -ENOTTY in vfs_ioctl().

    $ dmesg

    [] device-mapper: multipath: Failing path 65:144.
    [] device-mapper: multipath: Failing path 67:144.
    [] device-mapper: multipath: Failing path 65:224.
    [] device-mapper: multipath: Failing path 68:32.
    [] sgio_inquiry: sending ioctl 2285 to a partition!

    5) The ioctl() only works if the CAP_SYS_RAWIO capability is present
    (then queueing happens -- in this example, queue_if_no_path is set);
    this is due to a conditional check in scsi_verify_blk_ioctl().

    # capsh --drop=cap_sys_rawio -- -c './sgio_inquiry /dev/mapper/85ag56'
    sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device

    # ./sgio_inquiry /dev/mapper/85ag56 &
    [1] 72830

    # cat /proc/72830/stack
    [] 0xc00000171c0df700
    [] __switch_to+0x204/0x350
    [] msleep+0x5c/0x80
    [] dm_blk_ioctl+0x70/0x170
    [] blkdev_ioctl+0x2b0/0x9b0
    [] block_ioctl+0x64/0xd0
    [] do_vfs_ioctl+0x490/0x780
    [] SyS_ioctl+0xd4/0xf0
    [] system_call+0x38/0xd0

    6) This is the function call chain exercised in this analysis:

    SYSCALL_DEFINE3(ioctl, ) @ fs/ioctl.c
    -> do_vfs_ioctl()
    -> vfs_ioctl()
    ...
    error = filp->f_op->unlocked_ioctl(filp, cmd, arg);
    ...
    -> dm_blk_ioctl() @ drivers/md/dm.c
    -> multipath_ioctl() @ drivers/md/dm-mpath.c
    ...
    (bdev = NULL, due to no active paths)
    ...
    if (!bdev || ) {
    int err = scsi_verify_blk_ioctl(NULL, cmd);
    if (err)
    r = err;
    }
    ...
    -> scsi_verify_blk_ioctl() @ block/scsi_ioctl.c
    ...
    if (bd && bd == bd->bd_contains) // not taken (bd = NULL)
    return 0;
    ...
    if (capable(CAP_SYS_RAWIO)) // not taken (unprivileged user)
    return 0;
    ...
    printk_ratelimited(KERN_WARNING
    "%s: sending ioctl %x to a partition!\n" );

    return -ENOIOCTLCMD;
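
    The decision path in step 6 can be modelled in userspace C. This is a
    simplified sketch: the demo_ names are hypothetical, the elided parts of
    scsi_verify_blk_ioctl() are left out, and DEMO_ENOIOCTLCMD is defined
    locally because ENOIOCTLCMD is a kernel-internal code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_ENOIOCTLCMD 515 /* local copy of the kernel-internal value */

/* Hypothetical stand-in for struct block_device. */
struct demo_bdev {
    struct demo_bdev *bd_contains;
};

/* Models only the two checks shown in the call chain. */
static int demo_verify_blk_ioctl(const struct demo_bdev *bd, bool has_rawio)
{
    if (bd && bd == bd->bd_contains) /* whole device: allowed */
        return 0;
    if (has_rawio)                   /* privileged caller: allowed */
        return 0;
    return -DEMO_ENOIOCTLCMD;
}

/* Models the errno translation done on the way back to userspace. */
static int demo_vfs_ioctl(const struct demo_bdev *bd, bool has_rawio)
{
    int error = demo_verify_blk_ioctl(bd, has_rawio);

    if (error == -DEMO_ENOIOCTLCMD)
        error = -ENOTTY;
    return error;
}
```

    With bd set to NULL (no active paths) and no capability, the model returns
    -ENOTTY, matching the failure seen in step 3.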

    [3] http://tldp.org/HOWTO/SCSI-Generic-HOWTO/pexample.html (Revision 1.2, 2002-05-03)

    Signed-off-by: Mauricio Faria de Oliveira
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org

    Mauricio Faria de Oliveira
     

30 Oct, 2015

1 commit

  • Commit bfebd1cdb497a57757c83f5fbf1a29931591e2a4 ("dm: add full blk-mq
    support to request-based DM") moves the initialization of the fields
    backing_dev_info.congested_fn, backing_dev_info.congested_data and
    queuedata from the function dm_init_md_queue (that is called when the
    device is created) to dm_init_old_md_queue (that is called after the
    device type is determined).

    There is no locking when accessing these variables, thus it is possible
    for other parts of the kernel to briefly see this data in a transient
    state (e.g. queue->backing_dev_info.congested_fn initialized and
    md->queue->backing_dev_info.congested_data uninitialized, resulting in
    passing an incorrect parameter to the function dm_any_congested).

    This queue data is left initialized for blk-mq devices even though they
    don't use it.

    Fixes: bfebd1cdb497 ("dm: add full blk-mq support to request-based DM")
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org # v4.1+

    Mikulas Patocka
     

22 Oct, 2015

5 commits


15 Oct, 2015

2 commits


13 Oct, 2015

2 commits

  • Compiling the nvme driver on 32-bit warns about a cast from a __u64
    variable to a pointer:

    drivers/block/nvme-core.c: In function 'nvme_submit_io':
    drivers/block/nvme-core.c:1847:4: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    (void __user *)io.addr, length, NULL, 0);

    The cast here is intentional and safe, so we can shut up the
    gcc warning by adding an intermediate cast to 'uintptr_t'.

    I had previously submitted a patch to fix this problem in the
    nvme driver, but it was accepted on the same day that two new
    warnings got added.

    For clarification, I also change the third instance of this cast
    to use uintptr_t instead of unsigned long now.
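
    The cast pattern looks roughly like this in standalone C. The helper name
    demo_u64_to_ptr is hypothetical, and the void __user annotation from the
    driver is omitted because it has no meaning outside the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a 64-bit value carrying a user address into a pointer.
 * Going through uintptr_t makes the narrowing explicit on 32-bit
 * builds, so the compiler no longer warns about a size mismatch. */
static void *demo_u64_to_ptr(uint64_t addr)
{
    return (void *)(uintptr_t)addr;
}
```

    On 64-bit builds the intermediate cast changes nothing; on 32-bit builds
    it discards the upper half, which is only safe when the value is known to
    hold a valid pointer.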

    Signed-off-by: Arnd Bergmann
    Fixes: d29ec8241c10e ("nvme: submit internal commands through the block layer")
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Arnd Bergmann
     
  • The nvme driver was moved from drivers/block, losing our implicit
    dependency on CONFIG_BLOCK. This makes it an explicit driver dependency.

    Reported-by: Jim Davis
    Signed-off-by: Keith Busch
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Keith Busch
     

10 Oct, 2015

15 commits


04 Oct, 2015

6 commits

  • Linus Torvalds
     
  • Pull strscpy string copy function implementation from Chris Metcalf.

    Chris sent this during the merge window, but I waffled back and forth on
    the pull request, which is why it's going in only now.

    The new "strscpy()" function is definitely easier to use and more secure
    than either strncpy() or strlcpy(), both of which are horrible nasty
    interfaces that have serious and irredeemable problems.

    strncpy() has a useless return value, and doesn't NUL-terminate an
    overlong result. To make matters worse, it pads a short result with
    zeroes, which is a performance disaster if you have big buffers.

    strlcpy(), by contrast, is a mis-designed "fix" for strncpy(), lacking
    the insane NUL padding, but having a differently broken return value
    which returns the original length of the source string. Which means
    that it will read characters past the count from the source buffer, and
    you have to trust the source to be properly terminated. It also makes
    error handling fragile, since the test for overflow is unnecessarily
    subtle.

    strscpy() avoids both these problems, guaranteeing the NUL termination
    (but not excessive padding) if the destination size wasn't zero, and
    making the overflow condition very obvious by returning -E2BIG. It also
    doesn't read past the size of the source, and can thus be used for
    untrusted source data too.
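
    The semantics described above can be sketched byte by byte in userspace C.
    demo_strscpy is a hypothetical illustration of the contract, not the
    kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Copy src into dest, reading at most count bytes of src.
 * Returns the number of characters copied (without the NUL),
 * or -E2BIG if count is 0 or the source was truncated. */
static ssize_t demo_strscpy(char *dest, const char *src, size_t count)
{
    size_t i;

    assert(dest && src);
    if (count == 0)
        return -E2BIG;

    for (i = 0; i < count - 1; i++) {
        dest[i] = src[i];
        if (src[i] == '\0')
            return (ssize_t)i;
    }

    /* i == count - 1: always terminate, never pad. */
    dest[i] = '\0';
    return src[i] == '\0' ? (ssize_t)i : -E2BIG;
}
```

    A caller can then treat any negative return as truncation, for example
    "if (demo_strscpy(buf, name, sizeof(buf)) < 0)", with no length comparison
    needed.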

    So why did I waffle about this for so long?

    Every time we introduce a new-and-improved interface, people start doing
    these interminable series of trivial conversion patches.

    And every time that happens, somebody does some silly mistake, and the
    conversion patch to the improved interface actually makes things worse.
    Because the patch is mindnumbing and trivial, nobody has the attention
    span to look at it carefully, and it's usually done over large swaths
    of source code which means that not every conversion gets tested.

    So I'm pulling the strscpy() support because it *is* a better interface.
    But I will refuse to pull mindless conversion patches. Use this in
    places where it makes sense, but don't do trivial patches to fix things
    that aren't actually known to be broken.

    * 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
    tile: use global strscpy() rather than private copy
    string: provide strscpy()
    Make asm/word-at-a-time.h available on all architectures

    Linus Torvalds
     
  • Pull md fixes from Neil Brown:
    "Assorted fixes for md in 4.3-rc.

    Two tagged for -stable, and one is really a cleanup to match and
    improve kmemcache interface."

    * tag 'md/4.3-fixes' of git://neil.brown.name/md:
    md/bitmap: don't pass -1 to bitmap_storage_alloc.
    md/raid1: Avoid raid1 resync getting stuck
    md: drop null test before destroy functions
    md: clear CHANGE_PENDING in readonly array
    md/raid0: apply base queue limits *before* disk_stack_limits
    md/raid5: don't index beyond end of array in need_this_block().
    raid5: update analysis state for failed stripe
    md: wait for pending superblock updates before switching to read-only

    Linus Torvalds
     
  • Pull MIPS updates from Ralf Baechle:
    "This week's round of MIPS fixes:
    - Fix JZ4740 build
    - Fix fallback to GFP_DMA
    - Fix seccomp in case of ENOSYS
    - Fix bootmem panic
    - A number of FP and CPS fixes
    - Wire up new syscalls
    - Make sure BPF assembler objects can properly be disassembled
    - Fix BPF assembler code for MIPS I"

    * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
    MIPS: scall: Always run the seccomp syscall filters
    MIPS: Octeon: Fix kernel panic on startup from memory corruption
    MIPS: Fix R2300 FP context switch handling
    MIPS: Fix octeon FP context switch handling
    MIPS: BPF: Fix load delay slots.
    MIPS: BPF: Do all exports of symbols with FEXPORT().
    MIPS: Fix the build on jz4740 after removing the custom gpio.h
    MIPS: CPS: #ifdef on CONFIG_MIPS_MT_SMP rather than CONFIG_MIPS_MT
    MIPS: CPS: Don't include MT code in non-MT kernels.
    MIPS: CPS: Stop dangling delay slot from has_mt.
    MIPS: dma-default: Fix 32-bit fall back to GFP_DMA
    MIPS: Wire up userfaultfd and membarrier syscalls.

    Linus Torvalds
     
  • Pull irq fixes from Thomas Gleixner:
    "This update contains:

    - Fix for a long standing race affecting /proc/irq/NNN

    - One line fix for ARM GICV3-ITS counting the wrong data

    - Warning silencing in ARM GICV3-ITS. Another GCC trying to be
    overly clever issue"

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    irqchip/gic-v3-its: Count additional LPIs for the aliased devices
    irqchip/gic-v3-its: Silence warning when its_lpi_alloc_chunks gets inlined
    genirq: Fix race in register_irq_proc()

    Linus Torvalds
     
  • The MIPS syscall handler code used to return -ENOSYS on invalid
    syscalls. Whilst this is expected, it caused problems for seccomp
    filters because the said filters never had the chance to run since
    the code returned -ENOSYS before triggering them. This caused
    problems on the chromium testsuite for filters looking for invalid
    syscalls. This has now changed and the seccomp filters are always
    run even if the syscall is invalid. We return -ENOSYS once we
    return from the seccomp filters. Moreover, similar codepaths have
    been merged in the process which simplifies somewhat the overall
    syscall code.
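
    The reordering can be illustrated with a small C model. All demo_ names
    and the syscall count are hypothetical; the real change lives in the MIPS
    assembly entry code, which this sketch does not reproduce.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_NR_SYSCALLS 4 /* hypothetical table size */

static int demo_filter_runs; /* how often the filter hook ran */

/* Hypothetical filter hook: here it only records that it ran. */
static void demo_run_filters(long nr)
{
    (void)nr;
    demo_filter_runs++;
}

/* New ordering: filters first, range check second. */
static long demo_syscall_entry(long nr)
{
    demo_run_filters(nr);
    if (nr < 0 || nr >= DEMO_NR_SYSCALLS)
        return -ENOSYS;
    return 0; /* stand-in for dispatching the real syscall */
}
```

    In this model an out-of-range number still yields -ENOSYS, but only after
    the filter hook has seen it.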

    Signed-off-by: Markos Chandras
    Cc: linux-mips@linux-mips.org
    Patchwork: https://patchwork.linux-mips.org/patch/11236/
    Signed-off-by: Ralf Baechle

    Markos Chandras
     

03 Oct, 2015

4 commits

  • Pull x86 fixes from Ingo Molnar:
    "Fixes all around the map: W+X kernel mapping fix, WCHAN fixes, two
    build failure fixes for corner case configs, x32 header fix and a
    speling fix"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/headers/uapi: Fix __BITS_PER_LONG value for x32 builds
    x86/mm: Set NX on gap between __ex_table and rodata
    x86/kexec: Fix kexec crash in syscall kexec_file_load()
    x86/process: Unify 32bit and 64bit implementations of get_wchan()
    x86/process: Add proper bound checks in 64bit get_wchan()
    x86, efi, kasan: Fix build failure on !KASAN && KMEMCHECK=y kernels
    x86/hyperv: Fix the build in the !CONFIG_KEXEC_CORE case
    x86/cpufeatures: Correct spelling of the HWP_NOTIFY flag

    Linus Torvalds
     
  • Pull timer fixes from Ingo Molnar:
    "An abs64() fix in the watchdog driver, and two clocksource driver
    NO_IRQ assumption fixes"

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    clocksource: Fix abs() usage w/ 64bit values
    clocksource/drivers/keystone: Fix bad NO_IRQ usage
    clocksource/drivers/rockchip: Fix bad NO_IRQ usage

    Linus Torvalds
     
  • Pull EFI fixes from Ingo Molnar:
    "Two EFI fixes: one for x86, one for ARM, fixing a boot crash bug that
    can trigger under newer EFI firmware"

    * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    arm64/efi: Fix boot crash by not padding between EFI_MEMORY_RUNTIME regions
    x86/efi: Fix boot crash by mapping EFI memmap entries bottom-up at runtime, instead of top-down

    Linus Torvalds
     
  • Pull drm fixes from Dave Airlie:
    "Bunch of fixes all over the place, all pretty small: amdgpu, i915,
    exynos, one qxl and one vmwgfx.

    There is also a bunch of mst fixes, I left some cleanups in the series
    as I didn't think it was worth splitting up the tested series"

    * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (37 commits)
    drm/dp/mst: add some defines for logical/physical ports
    drm/dp/mst: drop cancel work sync in the mstb destroy path (v2)
    drm/dp/mst: split connector registration into two parts (v2)
    drm/dp/mst: update the link_address_sent before sending the link address (v3)
    drm/dp/mst: fixup handling hotplug on port removal.
    drm/dp/mst: don't pass port into the path builder function
    drm/radeon: drop radeon_fb_helper_set_par
    drm: handle cursor_set2 in restore_fbdev_mode
    drm/exynos: Staticize local function in exynos_drm_gem.c
    drm/exynos: fimd: actually disable dp clock
    drm/exynos: dp: remove suspend/resume functions
    drm/qxl: recreate the primary surface when the bo is not primary
    drm/amdgpu: only print meaningful VM faults
    drm/amdgpu/cgs: remove import_gpu_mem
    drm/i915: Call non-locking version of drm_kms_helper_poll_enable(), v2
    drm: Add a non-locking version of drm_kms_helper_poll_enable(), v2
    drm/vmwgfx: Fix a command submission hang regression
    drm/exynos: remove unused mode_fixup() code
    drm/exynos: remove decon_mode_fixup()
    drm/exynos: remove fimd_mode_fixup()
    ...

    Linus Torvalds