08 Aug, 2016

2 commits

  • Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower
    portion and the op code in the higher portions. This means that
    old code that relies on manually setting bi_rw is most likely
    going to be broken. Instead of letting that brokenness linger,
    rename the member, to force old and out-of-tree code to break
    at compile time instead of at runtime.
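
A minimal userspace sketch of the layout described above, with the op code in the top bits of a word and the flags below it. All names (`SKETCH_*`, `sketch_pack()` and friends) and the 3-bit op width are assumptions made for illustration; they are not the kernel's definitions.

```c
#include <limits.h>

/* Hypothetical widths and names, for illustration only. */
#define SKETCH_OP_BITS   3u
#define SKETCH_OP_SHIFT  ((unsigned int)(CHAR_BIT * sizeof(unsigned int)) - SKETCH_OP_BITS)
#define SKETCH_FLAG_MASK ((1u << SKETCH_OP_SHIFT) - 1u)

enum sketch_op { SKETCH_OP_READ = 0, SKETCH_OP_WRITE = 1 };

#define SKETCH_FLAG_SYNC (1u << 4)   /* an arbitrary example flag */

/* Store the op in the high bits and the flags in the low bits. */
static unsigned int sketch_pack(unsigned int op, unsigned int flags)
{
    return (op << SKETCH_OP_SHIFT) | (flags & SKETCH_FLAG_MASK);
}

/* Recover the op code from the high bits. */
static unsigned int sketch_op(unsigned int packed)
{
    return packed >> SKETCH_OP_SHIFT;
}

/* Recover the flags from the low bits. */
static unsigned int sketch_flags(unsigned int packed)
{
    return packed & SKETCH_FLAG_MASK;
}
```

With this layout, a legacy-style assignment that only sets a low "write" bit (for example the raw value 1) decodes as `SKETCH_OP_READ`. That is the kind of silent runtime breakage the rename turns into a compile error.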

    No intended functional changes in this commit.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Commit abf545484d31 changed it from an 'rw' flags type to the
    newer ops based interface, but now we're effectively leaking
    some bdev internals to the rest of the kernel. Since we only
    care about whether it's a read or a write at that level, just
    pass in a bool 'is_write' parameter instead.

    Then we can also move op_is_write() and friends back under
    CONFIG_BLOCK protection.

    Reviewed-by: Mike Christie
    Signed-off-by: Jens Axboe

    Jens Axboe
     

05 Aug, 2016

5 commits

  • The rw_page users were not converted to use bio/req ops. As a result
    bdev_write_page is not passing down REQ_OP_WRITE and the IOs will
    be sent down as reads.

    Signed-off-by: Mike Christie
    Fixes: 4e1b2d52a80d ("block, fs, drivers: remove REQ_OP compat defs and related code")

    Modified by me to:

    1) Drop op_flags passing into ->rw_page(), as we don't use it.
    2) Make op_is_write() and friends safe to use for !CONFIG_BLOCK

    Signed-off-by: Jens Axboe

    Mike Christie
     
  • Use a switch statement to iterate over the possible operations and
    error out if it's an incorrect one.
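
The pattern can be sketched roughly as follows. The enum values, the function name `sketch_submit()`, the choice of which operations count as supported, and the `-EINVAL` return are all hypothetical; the commit message does not say which driver or which operations are involved.

```c
#include <errno.h>

/* Hypothetical operation codes, for illustration only. */
enum sketch_req_op {
    SKETCH_REQ_OP_READ,
    SKETCH_REQ_OP_WRITE,
    SKETCH_REQ_OP_DISCARD,
    SKETCH_REQ_OP_FLUSH
};

/*
 * Handle each supported operation explicitly and reject anything else,
 * instead of treating every non-read as a write.
 */
static int sketch_submit(enum sketch_req_op op)
{
    switch (op) {
    case SKETCH_REQ_OP_READ:
    case SKETCH_REQ_OP_WRITE:
        return 0;           /* supported in this sketch */
    default:
        return -EINVAL;     /* unknown or unsupported operation */
    }
}
```

The `default` branch is what makes an unexpected operation fail loudly rather than fall through to a read or write path.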

    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Fix a fat-fingered conversion to the req_op accessors, and also
    use a switch statement to make it more obvious what is being checked.

    Signed-off-by: Christoph Hellwig
    Reported-by: Dave Chinner
    Fixes: c2df40 ("drivers: use req op accessor");
    Reviewed-by: Ming Lei
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Quentin ran into this bug:

    WARNING: CPU: 64 PID: 10085 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x65/0x80
    sysfs: cannot create duplicate filename '/devices/virtual/block/nbd3/pid'
    Modules linked in: nbd
    CPU: 64 PID: 10085 Comm: qemu-nbd Tainted: G D 4.6.0+ #7
    0000000000000000 ffff8820330bba68 ffffffff814b8791 ffff8820330bbac8
    0000000000000000 ffff8820330bbab8 ffffffff810d04ab ffff8820330bbaa8
    0000001f00000296 0000000000017681 ffff8810380bf000 ffffffffa0001790
    Call Trace:
    [] dump_stack+0x4d/0x6c
    [] __warn+0xdb/0x100
    [] warn_slowpath_fmt+0x44/0x50
    [] sysfs_warn_dup+0x65/0x80
    [] sysfs_add_file_mode_ns+0x172/0x180
    [] sysfs_create_file_ns+0x25/0x30
    [] device_create_file+0x36/0x90
    [] __nbd_ioctl+0x32d/0x9b0 [nbd]
    [] ? find_next_bit+0x18/0x20
    [] ? select_idle_sibling+0xe9/0x120
    [] ? __enqueue_entity+0x67/0x70
    [] ? enqueue_task_fair+0x630/0xe20
    [] ? resched_curr+0x36/0x70
    [] ? check_preempt_curr+0x78/0x90
    [] ? ttwu_do_wakeup+0x12/0x80
    [] ? ttwu_do_activate.constprop.86+0x61/0x70
    [] ? try_to_wake_up+0x185/0x2d0
    [] ? default_wake_function+0xd/0x10
    [] ? autoremove_wake_function+0x11/0x40
    [] nbd_ioctl+0x67/0x94 [nbd]
    [] blkdev_ioctl+0x14d/0x940
    [] ? put_pipe_info+0x22/0x60
    [] block_ioctl+0x3c/0x40
    [] do_vfs_ioctl+0x8d/0x5e0
    [] ? ____fput+0x9/0x10
    [] ? task_work_run+0x72/0x90
    [] SyS_ioctl+0x47/0x80
    [] entry_SYSCALL_64_fastpath+0x17/0x93
    ---[ end trace 7899b295e4f850c8 ]---

    It seems fairly obvious that device_create_file() is not being protected
    from being run concurrently on the same nbd.

    Quentin found the following relevant commits:

    1a2ad21 nbd: add locking to nbd_ioctl
    90b8f28 [PATCH] end of methods switch: remove the old ones
    d4430d6 [PATCH] beginning of methods conversion
    08f8585 [PATCH] move block_device_operations to blkdev.h

    It would seem that the race was introduced in the process of moving nbd
    from BKL to unlocked ioctls.

    By setting nbd->task_recv while the mutex is held, we can prevent other
    processes from running concurrently (since nbd->task_recv is also checked
    while the mutex is held).
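
The idea of checking and setting the owner under the same lock can be sketched in userspace C with a pthread mutex. The struct, the function names and the `-EBUSY` return value are assumptions for illustration; this is not the nbd driver code.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-device state. */
struct sketch_nbd {
    pthread_mutex_t lock;
    const void *task_recv;   /* non-NULL while a receiver owns the device */
};

/*
 * Check and set task_recv inside one critical section, so two callers
 * can never both see it as unset.
 */
static int sketch_claim(struct sketch_nbd *nbd, const void *task)
{
    int ret = 0;

    pthread_mutex_lock(&nbd->lock);
    if (nbd->task_recv)
        ret = -EBUSY;            /* someone else already claimed it */
    else
        nbd->task_recv = task;   /* claim while still holding the lock */
    pthread_mutex_unlock(&nbd->lock);

    return ret;
}

/* Drop ownership, again under the lock. */
static void sketch_release(struct sketch_nbd *nbd)
{
    pthread_mutex_lock(&nbd->lock);
    nbd->task_recv = NULL;
    pthread_mutex_unlock(&nbd->lock);
}
```

Only the caller that gets 0 back from `sketch_claim()` would go on to do the one-time setup (the analogue of creating the sysfs file), so the duplicate-creation path is closed.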

    Reported-and-tested-by: Quentin Casasnovas
    Cc: Markus Pargmann
    Cc: Paul Clements
    Cc: Pavel Machek
    Cc: Jens Axboe
    Cc: Al Viro
    Signed-off-by: Vegard Nossum
    Signed-off-by: Jens Axboe

    Vegard Nossum
     
  • Commit 09954bad4 ("floppy: refactor open() flags handling"), as a
    side-effect, causes open(/dev/fdX, O_ACCMODE) to fail. It turns out that
    this is being used by setfdprm userspace for ioctl-only open().

    Reintroduce the original behavior wrt !(FMODE_READ|FMODE_WRITE)
    modes, while still keeping the original O_NDELAY bug fixed.
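
To see why open() with O_ACCMODE ends up with neither read nor write mode, here is a small sketch. It assumes the usual Linux values (O_RDONLY 0, O_WRONLY 1, O_RDWR 2, O_ACCMODE 3) and an "add one and mask" mapping from open flags to mode bits; the `SKETCH_*` names and helper functions are hypothetical.

```c
#include <fcntl.h>

/* Hypothetical mode bits, standing in for FMODE_READ / FMODE_WRITE. */
#define SKETCH_FMODE_READ  0x1u
#define SKETCH_FMODE_WRITE 0x2u

/*
 * Map the access-mode part of the open flags to mode bits:
 * O_RDONLY -> READ, O_WRONLY -> WRITE, O_RDWR -> READ|WRITE,
 * and O_ACCMODE (both low bits set) wraps around to no bits at all.
 */
static unsigned int sketch_open_fmode(int flags)
{
    return (unsigned int)(flags + 1) & (unsigned int)O_ACCMODE;
}

/* An open that grants neither read nor write is usable only for ioctl(). */
static int sketch_is_ioctl_only(int flags)
{
    return (sketch_open_fmode(flags) &
            (SKETCH_FMODE_READ | SKETCH_FMODE_WRITE)) == 0;
}
```

Under these assumptions `sketch_is_ioctl_only(O_ACCMODE)` is true, which is the case the fix has to keep working.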

    Cc: stable@vger.kernel.org # v4.5+
    Reported-by: Wim Osterholt
    Tested-by: Wim Osterholt
    Signed-off-by: Jiri Kosina
    Signed-off-by: Jens Axboe

    Jiri Kosina
     

03 Aug, 2016

3 commits

  • Merge yet more updates from Andrew Morton:

    - the rest of ocfs2

    - various hotfixes, mainly MM

    - quite a bit of misc stuff - drivers, fork, exec, signals, etc.

    - printk updates

    - firmware

    - checkpatch

    - nilfs2

    - more kexec stuff than usual

    - rapidio updates

    - w1 things

    * emailed patches from Andrew Morton: (111 commits)
    ipc: delete "nr_ipc_ns"
    kcov: allow more fine-grained coverage instrumentation
    init/Kconfig: add clarification for out-of-tree modules
    config: add android config fragments
    init/Kconfig: ban CONFIG_LOCALVERSION_AUTO with allmodconfig
    relay: add global mode support for buffer-only channels
    init: allow blacklisting of module_init functions
    w1:omap_hdq: fix regression
    w1: add helper macro module_w1_family
    w1: remove need for ida and use PLATFORM_DEVID_AUTO
    rapidio/switches: add driver for IDT gen3 switches
    powerpc/fsl_rio: apply changes for RIO spec rev 3
    rapidio: modify for rev.3 specification changes
    rapidio: change inbound window size type to u64
    rapidio/idt_gen2: fix locking warning
    rapidio: fix error handling in mbox request/release functions
    rapidio/tsi721_dma: advance queue processing from transfer submit call
    rapidio/tsi721: add messaging mbox selector parameter
    rapidio/tsi721: add PCIe MRRS override parameter
    rapidio/tsi721_dma: add channel mask and queue size parameters
    ...

    Linus Torvalds
     
  • Pull Ceph updates from Ilya Dryomov:
    "The highlights are:

    - RADOS namespace support in libceph and CephFS (Zheng Yan and
    myself). The stopgaps added in 4.5 to deny access to inodes in
    namespaces are removed and CEPH_FEATURE_FS_FILE_LAYOUT_V2 feature
    bit is now fully supported

    - A large rework of the MDS cap flushing code (Zheng Yan)

    - Handle some of ->d_revalidate() in RCU mode (Jeff Layton). We were
    overly pessimistic before, bailing at the first sight of LOOKUP_RCU

    On top of that we've got a few CephFS bug fixes, a couple of cleanups
    and Arnd's workaround for a weird genksyms issue"

    * tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-client: (34 commits)
    ceph: fix symbol versioning for ceph_monc_do_statfs
    ceph: Correctly return NXIO errors from ceph_llseek
    ceph: Mark the file cache as unreclaimable
    ceph: optimize cap flush waiting
    ceph: cleanup ceph_flush_snaps()
    ceph: kick cap flushes before sending other cap message
    ceph: introduce an inode flag to indicates if snapflush is needed
    ceph: avoid sending duplicated cap flush message
    ceph: unify cap flush and snapcap flush
    ceph: use list instead of rbtree to track cap flushes
    ceph: update types of some local varibles
    ceph: include 'follows' of pending snapflush in cap reconnect message
    ceph: update cap reconnect message to version 3
    ceph: mount non-default filesystem by name
    libceph: fsmap.user subscription support
    ceph: handle LOOKUP_RCU in ceph_d_revalidate
    ceph: allow dentry_lease_is_valid to work under RCU walk
    ceph: clear d_fsinfo pointer under d_lock
    ceph: remove ceph_mdsc_lease_release
    ceph: don't use ->d_time
    ...

    Linus Torvalds
     
  • The kernel.h header doesn't directly use dynamic debug, so include
    dynamic_debug.h in module.c instead (which was getting it via kernel.h).
    printk.h only uses it if CONFIG_DYNAMIC_DEBUG is on, so change the
    inclusion to happen only in that case.

    Link: http://lkml.kernel.org/r/1468429793-16917-1-git-send-email-luisbg@osg.samsung.com
    [luisbg@osg.samsung.com: include dynamic_debug.h in drb_int.h]
    Link: http://lkml.kernel.org/r/1468447828-18558-2-git-send-email-luisbg@osg.samsung.com
    Signed-off-by: Luis de Bethencourt
    Cc: Rusty Russell
    Cc: Hidehiro Kawai
    Cc: Borislav Petkov
    Cc: Michal Nazarewicz
    Cc: Rasmus Villemoes
    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luis de Bethencourt
     

29 Jul, 2016

2 commits

  • Pull libnvdimm updates from Dan Williams:

    - Replace pcommit with ADR / directed-flushing.

    The pcommit instruction, which has not shipped on any product, is
    deprecated. Instead, the requirement is that platforms implement
    either ADR, or provide one or more flush addresses per nvdimm.

    ADR (Asynchronous DRAM Refresh) flushes data in posted write buffers
    to the memory controller on a power-fail event.

    Flush addresses are defined in ACPI 6.x as an NVDIMM Firmware
    Interface Table (NFIT) sub-structure: "Flush Hint Address Structure".
    A flush hint is an mmio address that when written and fenced assures
    that all previous posted writes targeting a given dimm have been
    flushed to media.

    - On-demand ARS (address range scrub).

    Linux uses the results of the ACPI ARS commands to track bad blocks
    in pmem devices. When latent errors are detected we re-scrub the
    media to refresh the bad block list; userspace can also request a
    re-scrub at any time.

    - Support for the Microsoft DSM (device specific method) command
    format.

    - Support for EDK2/OVMF virtual disk device memory ranges.

    - Various fixes and cleanups across the subsystem.

    * tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (41 commits)
    libnvdimm-btt: Delete an unnecessary check before the function call "__nd_device_register"
    nfit: do an ARS scrub on hitting a latent media error
    nfit: move to nfit/ sub-directory
    nfit, libnvdimm: allow an ARS scrub to be triggered on demand
    libnvdimm: register nvdimm_bus devices with an nd_bus driver
    pmem: clarify a debug print in pmem_clear_poison
    x86/insn: remove pcommit
    Revert "KVM: x86: add pcommit support"
    nfit, tools/testing/nvdimm/: unify shutdown paths
    libnvdimm: move ->module to struct nvdimm_bus_descriptor
    nfit: cleanup acpi_nfit_init calling convention
    nfit: fix _FIT evaluation memory leak + use after free
    tools/testing/nvdimm: add manufacturing_{date|location} dimm properties
    tools/testing/nvdimm: add virtual ramdisk range
    acpi, nfit: treat virtual ramdisk SPA as pmem region
    pmem: kill __pmem address space
    pmem: kill wmb_pmem()
    libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes
    fs/dax: remove wmb_pmem()
    libnvdimm, pmem: flush posted-write queues on shutdown
    ...

    Linus Torvalds
     
  • Pull vfs updates from Al Viro:
    "Assorted cleanups and fixes.

    Probably the most interesting part long-term is ->d_init() - that will
    have a bunch of followups in (at least) ceph and lustre, but we'll
    need to sort the barrier-related rules before it can get used for
    really non-trivial stuff.

    Another fun thing is the merge of ->d_iput() callers (dentry_iput()
    and dentry_unlink_inode()) and a bunch of ->d_compare() ones (all
    except the one in __d_lookup_lru())"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
    fs/dcache.c: avoid soft-lockup in dput()
    vfs: new d_init method
    vfs: Update lookup_dcache() comment
    bdev: get rid of ->bd_inodes
    Remove last traces of ->sync_page
    new helper: d_same_name()
    dentry_cmp(): use lockless_dereference() instead of smp_read_barrier_depends()
    vfs: clean up documentation
    vfs: document ->d_real()
    vfs: merge .d_select_inode() into .d_real()
    unify dentry_iput() and dentry_unlink_inode()
    binfmt_misc: ->s_root is not going anywhere
    drop redundant ->owner initializations
    ufs: get rid of redundant checks
    orangefs: constify inode_operations
    missed comment updates from ->direct_IO() prototype change
    file_inode(f)->i_mapping is f->f_mapping
    trim fsnotify hooks a bit
    9p: new helper - v9fs_parent_fid()
    debugfs: ->d_parent is never NULL or negative
    ...

    Linus Torvalds
     

28 Jul, 2016

3 commits

  • Add pool namespace pointer to struct ceph_file_layout and struct
    ceph_object_locator. Pool namespace is used when mapping an object
    to a PG; it's also used when composing an OSD request.

    The namespace pointer in struct ceph_file_layout is RCU protected.
    So libceph can read namespace without taking lock.

    Signed-off-by: Yan, Zheng
    [idryomov@gmail.com: ceph_oloc_destroy(), misc minor changes]
    Signed-off-by: Ilya Dryomov

    Yan, Zheng
     
  • Define new ceph_file_layout structure and rename old ceph_file_layout
    to ceph_file_layout_legacy. This is preparation for adding namespace
    to ceph_file_layout structure.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • Pull xen updates from David Vrabel:
    "Features and fixes for 4.8-rc0:

    - ACPI support for guests on ARM platforms.
    - Generic steal time support for arm and x86.
    - Support cases where kernel cpu is not Xen VCPU number (e.g., if
    in-guest kexec is used).
    - Use the system workqueue instead of a custom workqueue in various
    places"

    * tag 'for-linus-4.8-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (47 commits)
    xen: add static initialization of steal_clock op to xen_time_ops
    xen/pvhvm: run xen_vcpu_setup() for the boot CPU
    xen/evtchn: use xen_vcpu_id mapping
    xen/events: fifo: use xen_vcpu_id mapping
    xen/events: use xen_vcpu_id mapping in events_base
    x86/xen: use xen_vcpu_id mapping when pointing vcpu_info to shared_info
    x86/xen: use xen_vcpu_id mapping for HYPERVISOR_vcpu_op
    xen: introduce xen_vcpu_id mapping
    x86/acpi: store ACPI ids from MADT for future usage
    x86/xen: update cpuid.h from Xen-4.7
    xen/evtchn: add IOCTL_EVTCHN_RESTRICT
    xen-blkback: really don't leak mode property
    xen-blkback: constify instance of "struct attribute_group"
    xen-blkfront: prefer xenbus_scanf() over xenbus_gather()
    xen-blkback: prefer xenbus_scanf() over xenbus_gather()
    xen: support runqueue steal time on xen
    arm/xen: add support for vm_assist hypercall
    xen: update xen headers
    xen-pciback: drop superfluous variables
    xen-pciback: short-circuit read path used for merging write values
    ...

    Linus Torvalds
     

27 Jul, 2016

10 commits

  • Merge updates from Andrew Morton:

    - a few misc bits

    - ocfs2

    - most(?) of MM

    * emailed patches from Andrew Morton: (125 commits)
    thp: fix comments of __pmd_trans_huge_lock()
    cgroup: remove unnecessary 0 check from css_from_id()
    cgroup: fix idr leak for the first cgroup root
    mm: memcontrol: fix documentation for compound parameter
    mm: memcontrol: remove BUG_ON in uncharge_list
    mm: fix build warnings in
    mm, thp: convert from optimistic swapin collapsing to conservative
    mm, thp: fix comment inconsistency for swapin readahead functions
    thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
    shmem: split huge pages beyond i_size under memory pressure
    thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
    khugepaged: add support of collapse for tmpfs/shmem pages
    shmem: make shmem_inode_info::lock irq-safe
    khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
    thp: extract khugepaged from mm/huge_memory.c
    shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
    shmem: add huge pages support
    shmem: get_unmapped_area align huge page
    shmem: prepare huge= mount option and sysfs knob
    mm, rmap: account shmem thp pages
    ...

    Linus Torvalds
     
  • Zsmalloc is ready for page migration so zram can use __GFP_MOVABLE from
    now on.

    I did test to see how it helps to make higher order pages. Test
    scenario is as follows.

    KVM guest, 1G memory, ext4 formatted zram block device,

    for i in `seq 1 8`;
    do
        dd if=/dev/vda1 of=mnt/test$i.txt bs=128M count=1 &
    done

    wait `pidof dd`

    for i in `seq 1 2 8`;
    do
        rm -rf mnt/test$i.txt
    done
    fstrim -v mnt

    echo "init"
    cat /proc/buddyinfo

    echo "compaction"
    echo 1 > /proc/sys/vm/compact_memory
    cat /proc/buddyinfo

    old:

    init
    Node 0, zone      DMA    208    120     51     41     11      0      0      0      0      0      0
    Node 0, zone    DMA32  16380  13777   9184   3805    789     54      3      0      0      0      0
    compaction
    Node 0, zone      DMA    132     82     40     39     16      2      1      0      0      0      0
    Node 0, zone    DMA32   5219   5526   4969   3455   1831    677    139     15      0      0      0

    new:

    init
    Node 0, zone      DMA    379    115     97     19      2      0      0      0      0      0      0
    Node 0, zone    DMA32  18891  16774  10862   3947    637     21      0      0      0      0      0
    compaction
    Node 0, zone      DMA    214     66     87     29     10      3      0      0      0      0      0
    Node 0, zone    DMA32   1612   3139   3154   2469   1745    990    384     94      7      0      0

    As you can see, compaction made so many high-order pages. Yay!

    Link: http://lkml.kernel.org/r/1464736881-24886-13-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • We now allocate streams from the CPU_UP hot-plug path, so there are no
    context-dependent stream allocations anymore and we can schedule from
    zcomp_strm_alloc(). Use GFP_KERNEL directly and drop the gfp_t parameter.

    Link: http://lkml.kernel.org/r/20160531122017.2878-9-sergey.senozhatsky@gmail.com
    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • Add "deflate", "lz4hc", "842" algorithms to the list of known
    compression backends. The real availability of those algorithms,
    however, depends on the corresponding CONFIG_CRYPTO_FOO config options.

    [sergey.senozhatsky@gmail.com: zram-add-more-compression-algorithms-v3]
    Link: http://lkml.kernel.org/r/20160604024902.11778-7-sergey.senozhatsky@gmail.com
    Link: http://lkml.kernel.org/r/20160531122017.2878-8-sergey.senozhatsky@gmail.com
    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • Remove lzo/lz4 backends, we use crypto API now.

    [sergey.senozhatsky@gmail.com: zram-delete-custom-lzo-lz4-v3]
    Link: http://lkml.kernel.org/r/20160604024902.11778-6-sergey.senozhatsky@gmail.com
    Link: http://lkml.kernel.org/r/20160531122017.2878-7-sergey.senozhatsky@gmail.com
    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • There is no way to get a string with all the crypto comp algorithms
    supported by the crypto comp engine, so we need to maintain our own
    backends list. At the same time we additionally need to use
    crypto_has_comp() to make sure that the user has requested a compression
    algorithm that is recognized by the crypto comp engine. Relying on
    /proc/crypto is not an option here, because it does not show
    not-yet-inserted compression modules.

    Example:

    modprobe zram
    cat /proc/crypto | grep -i lz4
    modprobe lz4
    cat /proc/crypto | grep -i lz4
    name : lz4
    driver : lz4-generic
    module : lz4

    So the user can't tell exactly if the lz4 is really supported from
    /proc/crypto output, unless someone or something has loaded it.

    This patch also adds crypto_has_comp() to zcomp_available_show(). We
    store all the compression algorithm names in zcomp's `backends' array,
    regardless of the CONFIG_CRYPTO_FOO configuration, but show only those
    that are also supported by the crypto engine. This helps the user know
    the exact list of compression algorithms that can be used.

    Example:
    module lz4 is not loaded yet, but is supported by the crypto
    engine. /proc/crypto has no information on this module, while
    zram's `comp_algorithm' lists it:

    cat /proc/crypto | grep -i lz4

    cat /sys/block/zram0/comp_algorithm
    [lzo] lz4 deflate lz4hc 842

    We still use the `backends' array to determine if the requested
    compression backend is known to crypto api. This array, however, may not
    contain some entries, therefore as the last step we call crypto_has_comp()
    function which attempts to insmod the requested compression algorithm to
    determine if crypto api supports it. The advantage of this method is that
    now we permit the usage of out-of-tree crypto compression modules
    (implementing S/W or H/W compression).
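
The two steps described in this message (filter the list shown to the user, and fall back to a crypto probe for names missing from the array) can be sketched in userspace C. Everything here is an assumption for illustration: `sketch_has_comp()` is a stub standing in for crypto_has_comp(), "zfoo" is a made-up out-of-tree algorithm, and the stub pretends that "842" is not available.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Names the driver knows about, regardless of configuration. */
static const char *const sketch_backends[] = {
    "lzo", "lz4", "deflate", "lz4hc", "842", NULL
};

/* Stub for the crypto probe; the set of "available" names is made up. */
static bool sketch_has_comp(const char *name)
{
    static const char *const avail[] = {
        "lzo", "lz4", "deflate", "lz4hc", "zfoo", NULL
    };

    for (size_t i = 0; avail[i]; i++)
        if (strcmp(avail[i], name) == 0)
            return true;
    return false;
}

/* Known if listed in the array; otherwise ask the crypto probe last. */
static bool sketch_known(const char *name)
{
    for (size_t i = 0; sketch_backends[i]; i++)
        if (strcmp(sketch_backends[i], name) == 0)
            return true;
    return sketch_has_comp(name);
}

/* List only backends the probe accepts; bracket the selected one. */
static size_t sketch_show(const char *selected, char *buf, size_t size)
{
    size_t len = 0;

    if (size == 0)
        return 0;
    buf[0] = '\0';
    for (size_t i = 0; sketch_backends[i]; i++) {
        const char *name = sketch_backends[i];
        int n;

        if (!sketch_has_comp(name))
            continue;
        if (strcmp(name, selected) == 0)
            n = snprintf(buf + len, size - len, "[%s] ", name);
        else
            n = snprintf(buf + len, size - len, "%s ", name);
        if (n < 0 || (size_t)n >= size - len)
            break;               /* stop if the buffer is full */
        len += (size_t)n;
    }
    return len;
}
```

In this sketch a name such as "zfoo" is accepted by `sketch_known()` even though it is not in the array, which mirrors the point about permitting out-of-tree compression modules.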

    [sergey.senozhatsky@gmail.com: zram-use-crypto-api-to-check-alg-availability-v3]
    Link: http://lkml.kernel.org/r/20160604024902.11778-4-sergey.senozhatsky@gmail.com
    Link: http://lkml.kernel.org/r/20160531122017.2878-5-sergey.senozhatsky@gmail.com
    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Cc: Joonsoo Kim
    Signed-off-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • We don't have an idle zstreams list anymore and our write path now works
    absolutely differently, preventing preemption during compression. This
    removes possibilities of read paths preempting writes at wrong places
    (which could badly affect the performance of both paths) and at the same
    time opens the door for a move from custom LZO/LZ4 compression backends
    implementation to a more generic one, using crypto compress API.

    Joonsoo Kim [1] attempted to do this a while ago, but was faced with the
    need to introduce a new crypto API interface. The root cause was the
    fact that crypto API compression algorithms require a compression stream
    structure (in zram terminology) for both compression and decompression
    ops, while in reality only some compression algorithms really need
    it. This resulted in a concept of context-less crypto API compression
    backends [2]. Both write and read paths, though, would have been
    executed with preemption enabled, which could have resulted in
    decreased worst-case performance, e.g. consider the
    following case:

    CPU0

    zram_write()
      spin_lock()
        take the last idle stream
      spin_unlock()

    << preempted >>

        zram_read()
          spin_lock()
            no idle streams
          spin_unlock()
          schedule()

    resuming zram_write compression()

    but it took me some time to realize that, and it took even longer to
    evolve zram and to make it ready for crypto API. The key turned out to be
    -- drop the idle streams list entirely. Without the idle streams list we
    are free to use compression algorithms that require compression stream for
    decompression (read), because streams are now placed in per-cpu data and
    each write path has to disable preemption for compression op, almost
    completely eliminating the aforementioned case (technically, we still have
    a small chance, because write path has a fast and a slow paths and the
    slow path is executed with the preemption enabled; but the frequency of
    failed fast path is too low).

    TEST
    ====

    - 4 CPUs, x86_64 system
    - 3G zram, lzo
    - fio tests: read, randread, write, randwrite, rw, randrw

    test script [3] command:
    ZRAM_SIZE=3G LOG_SUFFIX=XXXX FIO_LOOPS=5 ./zram-fio-test.sh

    BASE PATCHED
    jobs1
    READ: 2527.2MB/s 2482.7MB/s
    READ: 2102.7MB/s 2045.0MB/s
    WRITE: 1284.3MB/s 1324.3MB/s
    WRITE: 1080.7MB/s 1101.9MB/s
    READ: 430125KB/s 437498KB/s
    WRITE: 430538KB/s 437919KB/s
    READ: 399593KB/s 403987KB/s
    WRITE: 399910KB/s 404308KB/s
    jobs2
    READ: 8133.5MB/s 7854.8MB/s
    READ: 7086.6MB/s 6912.8MB/s
    WRITE: 3177.2MB/s 3298.3MB/s
    WRITE: 2810.2MB/s 2871.4MB/s
    READ: 1017.6MB/s 1023.4MB/s
    WRITE: 1018.2MB/s 1023.1MB/s
    READ: 977836KB/s 984205KB/s
    WRITE: 979435KB/s 985814KB/s
    jobs3
    READ: 13557MB/s 13391MB/s
    READ: 11876MB/s 11752MB/s
    WRITE: 4641.5MB/s 4682.1MB/s
    WRITE: 4164.9MB/s 4179.3MB/s
    READ: 1453.8MB/s 1455.1MB/s
    WRITE: 1455.1MB/s 1458.2MB/s
    READ: 1387.7MB/s 1395.7MB/s
    WRITE: 1386.1MB/s 1394.9MB/s
    jobs4
    READ: 20271MB/s 20078MB/s
    READ: 18033MB/s 17928MB/s
    WRITE: 6176.8MB/s 6180.5MB/s
    WRITE: 5686.3MB/s 5705.3MB/s
    READ: 2009.4MB/s 2006.7MB/s
    WRITE: 2007.5MB/s 2004.9MB/s
    READ: 1929.7MB/s 1935.6MB/s
    WRITE: 1926.8MB/s 1932.6MB/s
    jobs5
    READ: 18823MB/s 19024MB/s
    READ: 18968MB/s 19071MB/s
    WRITE: 6191.6MB/s 6372.1MB/s
    WRITE: 5818.7MB/s 5787.1MB/s
    READ: 2011.7MB/s 1981.3MB/s
    WRITE: 2011.4MB/s 1980.1MB/s
    READ: 1949.3MB/s 1935.7MB/s
    WRITE: 1940.4MB/s 1926.1MB/s
    jobs6
    READ: 21870MB/s 21715MB/s
    READ: 19957MB/s 19879MB/s
    WRITE: 6528.4MB/s 6537.6MB/s
    WRITE: 6098.9MB/s 6073.6MB/s
    READ: 2048.6MB/s 2049.9MB/s
    WRITE: 2041.7MB/s 2042.9MB/s
    READ: 2013.4MB/s 1990.4MB/s
    WRITE: 2009.4MB/s 1986.5MB/s
    jobs7
    READ: 21359MB/s 21124MB/s
    READ: 19746MB/s 19293MB/s
    WRITE: 6660.4MB/s 6518.8MB/s
    WRITE: 6211.6MB/s 6193.1MB/s
    READ: 2089.7MB/s 2080.6MB/s
    WRITE: 2085.8MB/s 2076.5MB/s
    READ: 2041.2MB/s 2052.5MB/s
    WRITE: 2037.5MB/s 2048.8MB/s
    jobs8
    READ: 20477MB/s 19974MB/s
    READ: 18922MB/s 18576MB/s
    WRITE: 6851.9MB/s 6788.3MB/s
    WRITE: 6407.7MB/s 6347.5MB/s
    READ: 2134.8MB/s 2136.1MB/s
    WRITE: 2132.8MB/s 2134.4MB/s
    READ: 2074.2MB/s 2069.6MB/s
    WRITE: 2087.3MB/s 2082.4MB/s
    jobs9
    READ: 19797MB/s 19994MB/s
    READ: 18806MB/s 18581MB/s
    WRITE: 6878.7MB/s 6822.7MB/s
    WRITE: 6456.8MB/s 6447.2MB/s
    READ: 2141.1MB/s 2154.7MB/s
    WRITE: 2144.4MB/s 2157.3MB/s
    READ: 2084.1MB/s 2085.1MB/s
    WRITE: 2091.5MB/s 2092.5MB/s
    jobs10
    READ: 19794MB/s 19784MB/s
    READ: 18794MB/s 18745MB/s
    WRITE: 6984.4MB/s 6676.3MB/s
    WRITE: 6532.3MB/s 6342.7MB/s
    READ: 2150.6MB/s 2155.4MB/s
    WRITE: 2156.8MB/s 2161.5MB/s
    READ: 2106.4MB/s 2095.6MB/s
    WRITE: 2109.7MB/s 2098.4MB/s

    BASE PATCHED
    jobs1 perfstat
    stalled-cycles-frontend 102,480,595,419 ( 41.53%) 114,508,864,804 ( 46.92%)
    stalled-cycles-backend 51,941,417,832 ( 21.05%) 46,836,112,388 ( 19.19%)
    instructions 283,612,054,215 ( 1.15) 283,918,134,959 ( 1.16)
    branches 56,372,560,385 ( 724.923) 56,449,814,753 ( 733.766)
    branch-misses 374,826,000 ( 0.66%) 326,935,859 ( 0.58%)
    jobs2 perfstat
    stalled-cycles-frontend 155,142,745,777 ( 40.99%) 164,170,979,198 ( 43.82%)
    stalled-cycles-backend 70,813,866,387 ( 18.71%) 66,456,858,165 ( 17.74%)
    instructions 463,436,648,173 ( 1.22) 464,221,890,191 ( 1.24)
    branches 91,088,733,902 ( 760.088) 91,278,144,546 ( 769.133)
    branch-misses 504,460,363 ( 0.55%) 394,033,842 ( 0.43%)
    jobs3 perfstat
    stalled-cycles-frontend 201,300,397,212 ( 39.84%) 223,969,902,257 ( 44.44%)
    stalled-cycles-backend 87,712,593,974 ( 17.36%) 81,618,888,712 ( 16.19%)
    instructions 642,869,545,023 ( 1.27) 644,677,354,132 ( 1.28)
    branches 125,724,560,594 ( 690.682) 126,133,159,521 ( 694.542)
    branch-misses 527,941,798 ( 0.42%) 444,782,220 ( 0.35%)
    jobs4 perfstat
    stalled-cycles-frontend 246,701,197,429 ( 38.12%) 280,076,030,886 ( 43.29%)
    stalled-cycles-backend 119,050,341,112 ( 18.40%) 110,955,641,671 ( 17.15%)
    instructions 822,716,962,127 ( 1.27) 825,536,969,320 ( 1.28)
    branches 160,590,028,545 ( 688.614) 161,152,996,915 ( 691.068)
    branch-misses 650,295,287 ( 0.40%) 550,229,113 ( 0.34%)
    jobs5 perfstat
    stalled-cycles-frontend 298,958,462,516 ( 38.30%) 344,852,200,358 ( 44.16%)
    stalled-cycles-backend 137,558,742,122 ( 17.62%) 129,465,067,102 ( 16.58%)
    instructions 1,005,714,688,752 ( 1.29) 1,007,657,999,432 ( 1.29)
    branches 195,988,773,962 ( 697.730) 196,446,873,984 ( 700.319)
    branch-misses 695,818,940 ( 0.36%) 624,823,263 ( 0.32%)
    jobs6 perfstat
    stalled-cycles-frontend 334,497,602,856 ( 36.71%) 387,590,419,779 ( 42.38%)
    stalled-cycles-backend 163,539,365,335 ( 17.95%) 152,640,193,639 ( 16.69%)
    instructions 1,184,738,177,851 ( 1.30) 1,187,396,281,677 ( 1.30)
    branches 230,592,915,640 ( 702.902) 231,253,802,882 ( 702.356)
    branch-misses 747,934,786 ( 0.32%) 643,902,424 ( 0.28%)
    jobs7 perfstat
    stalled-cycles-frontend 396,724,684,187 ( 37.71%) 460,705,858,952 ( 43.84%)
    stalled-cycles-backend 188,096,616,496 ( 17.88%) 175,785,787,036 ( 16.73%)
    instructions 1,364,041,136,608 ( 1.30) 1,366,689,075,112 ( 1.30)
    branches 265,253,096,936 ( 700.078) 265,890,524,883 ( 702.839)
    branch-misses 784,991,589 ( 0.30%) 729,196,689 ( 0.27%)
    jobs8 perfstat
    stalled-cycles-frontend 440,248,299,870 ( 36.92%) 509,554,793,816 ( 42.46%)
    stalled-cycles-backend 222,575,930,616 ( 18.67%) 213,401,248,432 ( 17.78%)
    instructions 1,542,262,045,114 ( 1.29) 1,545,233,932,257 ( 1.29)
    branches 299,775,178,439 ( 697.666) 300,528,458,505 ( 694.769)
    branch-misses 847,496,084 ( 0.28%) 748,794,308 ( 0.25%)
    jobs9 perfstat
    stalled-cycles-frontend 506,269,882,480 ( 37.86%) 592,798,032,820 ( 44.43%)
    stalled-cycles-backend 253,192,498,861 ( 18.93%) 233,727,666,185 ( 17.52%)
    instructions 1,721,985,080,913 ( 1.29) 1,724,666,236,005 ( 1.29)
    branches 334,517,360,255 ( 694.134) 335,199,758,164 ( 697.131)
    branch-misses 873,496,730 ( 0.26%) 815,379,236 ( 0.24%)
    jobs10 perfstat
    stalled-cycles-frontend 549,063,363,749 ( 37.18%) 651,302,376,662 ( 43.61%)
    stalled-cycles-backend 281,680,986,810 ( 19.07%) 277,005,235,582 ( 18.55%)
    instructions 1,901,859,271,180 ( 1.29) 1,906,311,064,230 ( 1.28)
    branches 369,398,536,153 ( 694.004) 370,527,696,358 ( 688.409)
    branch-misses 967,929,335 ( 0.26%) 890,125,056 ( 0.24%)

    BASE PATCHED
    seconds elapsed 79.421641008 78.735285546
    seconds elapsed 61.471246133 60.869085949
    seconds elapsed 62.317058173 62.224188495
    seconds elapsed 60.030739363 60.081102518
    seconds elapsed 74.070398362 74.317582865
    seconds elapsed 84.985953007 85.414364176
    seconds elapsed 97.724553255 98.173311344
    seconds elapsed 109.488066758 110.268399318
    seconds elapsed 122.768189405 122.967164498
    seconds elapsed 135.130035105 136.934770801

    On my other system (8 x86_64 CPUs, short version of test results):

    BASE PATCHED
    seconds elapsed 19.518065994 19.806320662
    seconds elapsed 15.172772749 15.594718291
    seconds elapsed 13.820925970 13.821708564
    seconds elapsed 13.293097816 14.585206405
    seconds elapsed 16.207284118 16.064431606
    seconds elapsed 17.958376158 17.771825767
    seconds elapsed 19.478009164 19.602961508
    seconds elapsed 21.347152811 21.352318709
    seconds elapsed 24.478121126 24.171088735
    seconds elapsed 26.865057442 26.767327618

    So performance-wise the numbers are quite similar.

    Also update zcomp interface to be more aligned with the crypto API.

    [1] http://marc.info/?l=linux-kernel&m=144480832108927&w=2
    [2] http://marc.info/?l=linux-kernel&m=145379613507518&w=2
    [3] https://github.com/sergey-senozhatsky/zram-perf-test

    Link: http://lkml.kernel.org/r/20160531122017.2878-3-sergey.senozhatsky@gmail.com
    Signed-off-by: Sergey Senozhatsky
    Suggested-by: Minchan Kim
    Suggested-by: Joonsoo Kim
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • This started as 'add zlib support' work, but after some thought I
    saw no blockers for a bigger change: a switch to the crypto API.

    We don't have an idle zstreams list anymore, and our write path now works
    very differently: preemption is disabled during compression. This
    removes the possibility of read paths preempting writes at the wrong
    places and opens the door for a move from the custom LZO/LZ4 compression
    backends to a more generic implementation based on the crypto compress API.

    This patch set also eliminates the need for a new context-less crypto API
    interface, which was quite hard to sell, so we can move along faster.

    benchmarks:

    (x86_64, 4GB, zram-perf script)

    Run time of fio as reported by perf (max jobs=3). I ran the fio test
    with an increasing number of parallel jobs (up to 3) on a 3G zram device,
    using `static' data and the following crypto compression algorithms:

    842, deflate, lz4, lz4hc, lzo

    the output was:

    - test running time (which tells us which algorithms perform faster)

    and

    - zram mm_stat (which tells the compressed memory size, max used memory, etc).

    This is for information only. For example, LZ4HC has two to three times
    the running time of LZO, but its compressed data size is 23592960 bytes
    vs 34603008 bytes for LZO.

    test-fio-zram-842
    197.907655282 seconds time elapsed
    201.623142884 seconds time elapsed
    226.854291345 seconds time elapsed
    test-fio-zram-DEFLATE
    253.259516155 seconds time elapsed
    258.148563401 seconds time elapsed
    290.251909365 seconds time elapsed
    test-fio-zram-LZ4
    27.022598717 seconds time elapsed
    29.580522717 seconds time elapsed
    33.293463430 seconds time elapsed
    test-fio-zram-LZ4HC
    56.393954615 seconds time elapsed
    74.904659747 seconds time elapsed
    101.940998564 seconds time elapsed
    test-fio-zram-LZO
    28.155948075 seconds time elapsed
    30.390036330 seconds time elapsed
    34.455773159 seconds time elapsed

    zram mm_stat-s (max fio jobs=3)

    test-fio-zram-842
    mm_stat (jobs1): 3221225472 673185792 690266112 0 690266112 0 0
    mm_stat (jobs2): 3221225472 673185792 690266112 0 690266112 0 0
    mm_stat (jobs3): 3221225472 673185792 690266112 0 690266112 0 0
    test-fio-zram-DEFLATE
    mm_stat (jobs1): 3221225472 24379392 37761024 0 37761024 0 0
    mm_stat (jobs2): 3221225472 24379392 37761024 0 37761024 0 0
    mm_stat (jobs3): 3221225472 24379392 37761024 0 37761024 0 0
    test-fio-zram-LZ4
    mm_stat (jobs1): 3221225472 23592960 37761024 0 37761024 0 0
    mm_stat (jobs2): 3221225472 23592960 37761024 0 37761024 0 0
    mm_stat (jobs3): 3221225472 23592960 37761024 0 37761024 0 0
    test-fio-zram-LZ4HC
    mm_stat (jobs1): 3221225472 23592960 37761024 0 37761024 0 0
    mm_stat (jobs2): 3221225472 23592960 37761024 0 37761024 0 0
    mm_stat (jobs3): 3221225472 23592960 37761024 0 37761024 0 0
    test-fio-zram-LZO
    mm_stat (jobs1): 3221225472 34603008 50335744 0 50335744 0 0
    mm_stat (jobs2): 3221225472 34603008 50335744 0 50335744 0 0
    mm_stat (jobs3): 3221225472 34603008 50335744 0 50339840 0 0
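
    To compare algorithms by compression ratio, each mm_stat line can be parsed and the first column divided by the second. The sketch below assumes the field order documented for zram at the time (orig_data_size, compr_data_size, mem_used_total, mem_limit, mem_used_max, zero_pages, num_migrated); the struct and function names are hypothetical.

```c
#include <inttypes.h>
#include <stdio.h>

/* One line of zram mm_stat; the field order is an assumption, see above. */
struct mm_stat {
    uint64_t orig_data_size;
    uint64_t compr_data_size;
    uint64_t mem_used_total;
    uint64_t mem_limit;
    uint64_t mem_used_max;
    uint64_t zero_pages;
    uint64_t num_migrated;
};

/* Parse seven whitespace-separated counters; 0 on success, -1 otherwise. */
int mm_stat_parse(const char *line, struct mm_stat *st)
{
    int n = sscanf(line,
                   "%" SCNu64 " %" SCNu64 " %" SCNu64 " %" SCNu64
                   " %" SCNu64 " %" SCNu64 " %" SCNu64,
                   &st->orig_data_size, &st->compr_data_size,
                   &st->mem_used_total, &st->mem_limit,
                   &st->mem_used_max, &st->zero_pages,
                   &st->num_migrated);
    return n == 7 ? 0 : -1;
}

/* Original size divided by compressed size; 0.0 if nothing is stored. */
double mm_stat_ratio(const struct mm_stat *st)
{
    if (st->compr_data_size == 0)
        return 0.0;
    return (double)st->orig_data_size / (double)st->compr_data_size;
}
```

    Feeding it the LZ4HC and LZO lines above gives the ratio behind the "23592960 vs 34603008" comparison. The very high ratios reflect the `static' test data and say little about real workloads.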

    This patch (of 8):

    We don't perform any zstream idle list lookup anymore, so
    zcomp_strm_find()/zcomp_strm_release() names are not representative.

    Rename to zcomp_stream_get()/zcomp_stream_put().

    Link: http://lkml.kernel.org/r/20160531122017.2878-2-sergey.senozhatsky@gmail.com
    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • Pull block driver updates from Jens Axboe:
    "This branch also contains core changes. I've come to the conclusion
    that from 4.9 and forward, I'll be doing just a single branch. We
    often have dependencies between core and drivers, and it's hard to
    always split them up appropriately without pulling core into drivers
    when that happens.

    That said, this contains:

    - separate secure erase type for the core block layer, from
    Christoph.

    - set of discard fixes, from Christoph.

    - bio shrinking fixes from Christoph, as a follow-up to the
    op/flags change in the core branch.

    - map and append request fixes from Christoph.

    - NVMeF (NVMe over Fabrics) code from Christoph. This is pretty
    exciting!

    - nvme-loop fixes from Arnd.

    - removal of ->driverfs_dev from Dan, after providing a
    device_add_disk() helper.

    - bcache fixes from Bhaktipriya and Yijing.

    - cdrom subchannel read fix from Vchannaiah.

    - set of lightnvm updates from Wenwei, Matias, Johannes, and Javier.

    - set of drbd updates and fixes from Fabian, Lars, and Philipp.

    - mg_disk error path fix from Bart.

    - user notification for failed device add for loop, from Minfei.

    - NVMe in general:
    + NVMe delay quirk from Guilherme.
    + SR-IOV support and command retry limits from Keith.
    + fix for memory-less NUMA node from Masayoshi.
    + use UINT_MAX for discard sectors, from Minfei.
    + cancel IO fixes from Ming.
    + don't allocate unused major, from Neil.
    + error code fixup from Dan.
    + use constants for PSDT/FUSE from James.
    + variable init fix from Jay.
    + fabrics fixes from Ming, Sagi, and Wei.
    + various fixes"

    * 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits)
    nvme/pci: Provide SR-IOV support
    nvme: initialize variable before logical OR'ing it
    block: unexport various bio mapping helpers
    scsi/osd: open code blk_make_request
    target: stop using blk_make_request
    block: simplify and export blk_rq_append_bio
    block: ensure bios return from blk_get_request are properly initialized
    virtio_blk: use blk_rq_map_kern
    memstick: don't allow REQ_TYPE_BLOCK_PC requests
    block: shrink bio size again
    block: simplify and cleanup bvec pool handling
    block: get rid of bio_rw and READA
    block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
    block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
    NVMe: don't allocate unused nvme_major
    nvme: avoid crashes when node 0 is memoryless node.
    nvme: Limit command retries
    loop: Make user notify for adding loop device failed
    nvme-loop: fix nvme-loop Kconfig dependencies
    nvmet: fix return value check in nvmet_subsys_alloc()
    ...

    Linus Torvalds
     
  • Pull core block updates from Jens Axboe:

    - the big change is the cleanup from Mike Christie, cleaning up our
    uses of command types and modified flags. This is what will throw
    some merge conflicts

    - regression fix for the above for btrfs, from Vincent

    - following up to the above, better packing of struct request from
    Christoph

    - a 2038 fix for blktrace from Arnd

    - a few trivial/spelling fixes from Bart Van Assche

    - a front merge check fix from Damien, which could cause issues on
    SMR drives

    - Atari partition fix from Gabriel

    - convert cfq to highres timers, since jiffies isn't granular enough
    for some devices these days. From Jan and Jeff

    - CFQ priority boost fix for idle classes, from me

    - cleanup series from Ming, improving our bio/bvec iteration

    - a direct issue fix for blk-mq from Omar

    - fix for plug merging not involving the IO scheduler, like we do for
    other types of merges. From Tahsin

    - expose DAX type internally and through sysfs. From Toshi and Yigal

    * 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits)
    block: Fix front merge check
    block: do not merge requests without consulting with io scheduler
    block: Fix spelling in a source code comment
    block: expose QUEUE_FLAG_DAX in sysfs
    block: add QUEUE_FLAG_DAX for devices to advertise their DAX support
    Btrfs: fix comparison in __btrfs_map_block()
    block: atari: Return early for unsupported sector size
    Doc: block: Fix a typo in queue-sysfs.txt
    cfq-iosched: Charge at least 1 jiffie instead of 1 ns
    cfq-iosched: Fix regression in bonnie++ rewrite performance
    cfq-iosched: Convert slice_resid from u64 to s64
    block: Convert fifo_time from ulong to u64
    blktrace: avoid using timespec
    block/blk-cgroup.c: Declare local symbols static
    block/bio-integrity.c: Add #include "blk.h"
    block/partition-generic.c: Remove a set-but-not-used variable
    block: bio: kill BIO_MAX_SIZE
    cfq-iosched: temporarily boost queue priority for idle classes
    block: drbd: avoid to use BIO_MAX_SIZE
    block: bio: remove BIO_MAX_SECTORS
    ...

    Linus Torvalds
     

24 Jul, 2016

1 commit


22 Jul, 2016

4 commits


21 Jul, 2016

4 commits

  • Currently, the presence of direct_access() in block_device_operations
    indicates support for DAX on its block device. Because
    block_device_operations is instantiated with 'const', this DAX
    capability cannot be enabled conditionally.

    In preparation for supporting DAX on device-mapper devices, add
    QUEUE_FLAG_DAX to the request_queue flags so that a device can
    advertise its DAX support. This allows the DAX capability to be set
    based on how a mapped device is composed.
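
    The idea can be modelled in user space as a per-queue flag bit that is set or cleared at run time, unlike a callback in a const ops table. This is a sketch only: every model_* name is hypothetical, the bit number is arbitrary, and the rule "a stacked device supports DAX only if all of its underlying devices do" is an assumption for illustration, not something this commit states.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_QUEUE_FLAG_DAX 0  /* bit number, chosen arbitrarily here */

/* Stand-in for request_queue; only the flags word matters here. */
struct model_queue {
    unsigned long queue_flags;
};

/* Set or clear the DAX bit at run time. */
void model_queue_set_dax(struct model_queue *q, bool enable)
{
    if (enable)
        q->queue_flags |= 1UL << MODEL_QUEUE_FLAG_DAX;
    else
        q->queue_flags &= ~(1UL << MODEL_QUEUE_FLAG_DAX);
}

/* Test the DAX bit. */
bool model_queue_dax(const struct model_queue *q)
{
    return (q->queue_flags >> MODEL_QUEUE_FLAG_DAX) & 1UL;
}

/*
 * Derive the flag of a stacked device from the devices below it.
 * Assumed rule: DAX only if there is at least one lower device and
 * every lower device advertises DAX.
 */
void model_stack_queue(struct model_queue *top,
                       const struct model_queue *lower, size_t n)
{
    bool all = n > 0;

    for (size_t i = 0; i < n; i++)
        if (!model_queue_dax(&lower[i]))
            all = false;
    model_queue_set_dax(top, all);
}
```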

    Signed-off-by: Toshi Kani
    Acked-by: Dan Williams
    Signed-off-by: Mike Snitzer
    Cc: Jens Axboe
    Cc: Ross Zwisler
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Jens Axboe

    Toshi Kani
     
  • blk_get_request is used for BLOCK_PC and similar passthrough requests.
    Currently we always need to call blk_rq_set_block_pc or an open coded
    version of it to allow appending bios using the request mapping helpers
    later on, which is a somewhat awkward API. Instead move the
    initialization part of blk_rq_set_block_pc into blk_get_request, so that
    we always have a safe to use request.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Similar to how SCSI and NVMe prepare passthrough requests. This avoids
    poking into request internals too much.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • These two are confusing leftovers of the old world order, combining
    values of the REQ_OP_ and REQ_ namespaces. For callers that don't
    special case we mostly just replace bi_rw with bio_data_dir or
    op_is_write, except for the few cases where a switch over the REQ_OP_
    values makes more sense. Any check for READA is replaced with an
    explicit check for REQ_RAHEAD. Also remove the READA alias for
    REQ_RAHEAD.
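
    A user-space model of the resulting checks, assuming the layout described in the newer commits (op code in the upper bits, flags in the lower bits). All model_* names, the 3-bit op width and the flag bit are assumptions for illustration and are not the kernel definitions.

```c
#include <stdbool.h>

enum model_op { MODEL_OP_READ = 0, MODEL_OP_WRITE = 1, MODEL_OP_DISCARD = 2 };

#define MODEL_OP_BITS   3
#define MODEL_OP_SHIFT  (8 * sizeof(unsigned int) - MODEL_OP_BITS)
#define MODEL_FLAG_MASK ((1U << MODEL_OP_SHIFT) - 1)
#define MODEL_RAHEAD    (1U << 0)  /* stand-in for REQ_RAHEAD */

struct model_bio {
    unsigned int opf;  /* op in the top bits, flags below */
};

/* Pack an op and its modifier flags into one word. */
void model_set_op_attrs(struct model_bio *bio, enum model_op op,
                        unsigned int flags)
{
    bio->opf = ((unsigned int)op << MODEL_OP_SHIFT) |
               (flags & MODEL_FLAG_MASK);
}

/* Extract the op; callers switch over this instead of masking bits. */
enum model_op model_bio_op(const struct model_bio *bio)
{
    return (enum model_op)(bio->opf >> MODEL_OP_SHIFT);
}

/* In this model, everything that is not a read counts as a write. */
bool model_op_is_write(enum model_op op)
{
    return op != MODEL_OP_READ;
}

/* Explicit flag test, replacing the old READA comparison. */
bool model_bio_is_readahead(const struct model_bio *bio)
{
    return (bio->opf & MODEL_RAHEAD) != 0;
}
```

    With this layout a write has bit 0 clear, so old code that tested the low bit of the field to detect writes would silently misclassify it.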

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Mike Christie
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

13 Jul, 2016

2 commits

  • The __pmem address space was meant to annotate codepaths that touch
    persistent memory and need to coordinate a call to wmb_pmem(). Now that
    wmb_pmem() is gone, there is little need to keep this annotation.

    Cc: Christoph Hellwig
    Cc: Ross Zwisler
    Signed-off-by: Dan Williams

    Dan Williams
     
  • No error number is returned if the loop driver fails to allocate a
    disk in alloc_disk() while adding a new loop device. Return a proper
    error number so that the user is notified in this case.
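
    The pattern can be sketched in user space as follows. This is not the driver code: the model_* names are hypothetical, and the use of -ENOMEM for the allocation failure is an assumption.

```c
#include <errno.h>
#include <stdlib.h>

struct model_disk {
    int minor;
};

/* Stand-in for alloc_disk(); returns NULL when asked to fail. */
struct model_disk *model_alloc_disk(int simulate_failure)
{
    if (simulate_failure)
        return NULL;
    return calloc(1, sizeof(struct model_disk));
}

/*
 * Add a device. The error code is set before the allocation so that
 * the failure path returns a negative errno instead of a stale 0.
 */
int model_loop_add(struct model_disk **out, int simulate_failure)
{
    struct model_disk *disk;
    int err;

    err = -ENOMEM;
    disk = model_alloc_disk(simulate_failure);
    if (!disk)
        goto out;

    *out = disk;
    return 0;
out:
    return err;
}
```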

    Signed-off-by: Minfei Huang
    Reviewed-by: Ming Lei
    Reviewed-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Minfei Huang
     

09 Jul, 2016

1 commit

  • …dimm/nvdimm into for-4.8/drivers

    Dan writes:

    "The removal of ->driverfs_dev in favor of just passing the parent
    device in as a parameter to add_disk(). See below, it has received a
    "Reviewed-by" from Christoph, Bart, and Johannes.

    It is also a prerequisite for Fam Zheng's work to clean up gendisk
    uevents vs attribute visibility [1]. We would extend device_add_disk()
    to take an attribute_group list.

    This is based off a branch of block.git/for-4.8/drivers and has
    received a positive build success notification from the kbuild robot
    across several configs.

    [1]: "gendisk: Generate uevent after attribute available"
    http://marc.info/?l=linux-virtualization&m=146725201522201&w=2"

    Jens Axboe
     

08 Jul, 2016

1 commit

  • Pull block IO fixes from Jens Axboe:
    "Three small fixes that have been queued up and tested for this series:

    - A bug fix for xen-blkfront from Bob Liu, fixing an issue with
    incomplete requests during migration.

    - A fix for an ancient issue in retrieving the IO priority of a
    different PID than self, preventing that task from going away while
    we access it. From Omar.

    - A writeback fix from Tahsin, fixing a case where we'd call ihold()
    with a zero ref count inode"

    * 'for-linus' of git://git.kernel.dk/linux-block:
    block: fix use-after-free in sys_ioprio_get()
    writeback: inode cgroup wb switch should not call ihold()
    xen-blkfront: save uncompleted reqs in blkfront_resume()

    Linus Torvalds
     

01 Jul, 2016

1 commit


30 Jun, 2016

1 commit