28 May, 2016

1 commit

  • Pull block fixes from Jens Axboe:
    "A set of fixes that wasn't included in the first merge window pull
    request. This pull request contains:

    - A set of NVMe fixes from Keith, and one from Nic for the integrity
    side of it.

    - Fix from Ming, clearing ->mq_ops if we don't successfully set up a
    queue for multiqueue.

    - A set of stability fixes for bcache from Jiri, and also marking
    bcache as orphaned as it's no longer actively maintained (in
    mainline, at least)"

    * 'for-linus' of git://git.kernel.dk/linux-block:
    blk-mq: clear q->mq_ops if init fail
    MAINTAINERS: mark bcache as orphan
    bcache: bch_gc_thread() is not freezable
    bcache: bch_allocator_thread() is not freezable
    bcache: bch_writeback_thread() is not freezable
    nvme/host: Add missing blk_integrity tag_size + flags assignments
    NVMe: Add device ID's with stripe quirk
    NVMe: Short-cut removal on surprise hot-unplug
    NVMe: Allow user initiated rescan
    NVMe: Reduce driver log spamming
    NVMe: Unbind driver on failure
    NVMe: Delete only created queues
    NVMe: Allocate queues only for online cpus

    Linus Torvalds
     

27 May, 2016

1 commit

  • Pull misc DAX updates from Vishal Verma:
    "DAX error handling for 4.7

    - Until now, dax has been disabled if media errors were found on any
    device. This enables the use of DAX in the presence of these
    errors by making all sector-aligned zeroing go through the driver.

    - The driver (already) has the ability to clear errors on writes that
    are sent through the block layer using 'DSMs' defined in ACPI 6.1.

    Other misc changes:

    - When mounting DAX filesystems, check to make sure the partition is
    page aligned. This is a requirement for DAX, and previously, we
    allowed such unaligned mounts to succeed, but subsequent
    reads/writes would fail.

    - Misc/cleanup fixes from Jan that remove unused code from DAX
    related to zeroing, writeback, and some size checks"

    * tag 'dax-misc-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    dax: fix a comment in dax_zero_page_range and dax_truncate_page
    dax: for truncate/hole-punch, do zeroing through the driver if possible
    dax: export a low-level __dax_zero_page_range helper
    dax: use sb_issue_zerout instead of calling dax_clear_sectors
    dax: enable dax in the presence of known media errors (badblocks)
    dax: fallback from pmd to pte on error
    block: Update blkdev_dax_capable() for consistency
    xfs: Add alignment check for DAX mount
    ext2: Add alignment check for DAX mount
    ext4: Add alignment check for DAX mount
    block: Add bdev_dax_supported() for dax mount checks
    block: Add vfs_msg() interface
    dax: Remove redundant inode size checks
    dax: Remove pointless writeback from dax_do_io()
    dax: Remove zeroing from dax_io()
    dax: Remove dead zeroing code from fault handlers
    ext2: Avoid DAX zeroing to corrupt data
    ext2: Fix block zeroing in ext2_get_blocks() for DAX
    dax: Remove complete_unwritten argument
    DAX: move RADIX_DAX_ definitions to dax.c

    Linus Torvalds
     

26 May, 2016

1 commit

  • blk_mq_init_queue() calls blk_mq_init_allocated_queue(), but q->mq_ops
    was not cleared when blk_mq_init_allocated_queue() fails.
    Then blk_cleanup_queue() calls blk_mq_free_queue() which will crash because:
    - q->all_q_node is not added to all_q_list yet
    - q->tag_set is NULL
    - hctx was not set up yet or already freed

    Fixed it by clearing q->mq_ops on the error path.

    Signed-off-by: Ming Lin
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Ming Lin
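
A minimal sketch of the failure mode described above, using hypothetical stand-in types rather than the kernel's real structures. It assumes the cleanup path decides whether to run multiqueue teardown purely from whether the ops pointer is set, which is why the error path has to clear it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real definitions. */
struct mq_ops { int unused; };

struct queue {
    const struct mq_ops *mq_ops; /* non-NULL marks the queue as multiqueue */
    int mq_freed;                /* counts entries into the mq teardown path */
};

/* Models an init routine that sets mq_ops early and may fail afterwards. */
int init_allocated_queue(struct queue *q, const struct mq_ops *ops, int fail)
{
    q->mq_ops = ops;
    if (fail) {
        /* The fix: a half-initialised queue must not stay marked as mq. */
        q->mq_ops = NULL;
        return -1;
    }
    return 0;
}

/* Models a cleanup routine that only tears down mq state if mq_ops is set. */
void cleanup_queue(struct queue *q)
{
    if (q->mq_ops)
        q->mq_freed++;
}
```

With the pointer cleared on failure, calling `cleanup_queue()` on a queue whose init failed skips the multiqueue teardown entirely.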
     

24 May, 2016

1 commit

  • Pull libnvdimm updates from Dan Williams:
    "The bulk of this update was stabilized before the merge window and
    appeared in -next. The "device dax" implementation was revised this
    week in response to review feedback, and to address failures detected
    by the recently expanded ndctl unit test suite.

    Not included in this pull request are two dax topic branches (dax
    error handling, and dax radix-tree locking). These topics were
    deferred to get a few more days of -next integration testing, and to
    coordinate a branch baseline with Ted and the ext4 tree. Vishal and
    Ross will send the error handling and locking topics respectively in
    the next few days.

    This branch has received a positive build result from the kbuild robot
    across 226 configs.

    Summary:

    - Device DAX for persistent memory: Device DAX is the device-centric
    analogue of Filesystem DAX (CONFIG_FS_DAX). It allows memory
    ranges to be allocated and mapped without need of an intervening
    file system. Device DAX is strict, precise and predictable.
    Specifically this interface:

    a) Guarantees fault granularity with respect to a given page size
    (pte, pmd, or pud) set at configuration time.

    b) Enforces deterministic behavior by being strict about what
    fault scenarios are supported.

    Persistent memory is the first target, but the mechanism is also
    targeted for exclusive allocations of performance/feature
    differentiated memory ranges.

    - Support for the HPE DSM (device specific method) command formats.
    This enables management of these first generation devices until a
    unified DSM specification materializes.

    - Further ACPI 6.1 compliance with support for the common dimm
    identifier format.

    - Various fixes and cleanups across the subsystem"

    * tag 'libnvdimm-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (40 commits)
    libnvdimm, dax: fix deletion
    libnvdimm, dax: fix alignment validation
    libnvdimm, dax: autodetect support
    libnvdimm: release ida resources
    Revert "block: enable dax for raw block devices"
    /dev/dax, core: file operations and dax-mmap
    /dev/dax, pmem: direct access to persistent memory
    libnvdimm: stop requiring a driver ->remove() method
    libnvdimm, dax: record the specified alignment of a dax-device instance
    libnvdimm, dax: reserve space to store labels for device-dax
    libnvdimm, dax: introduce device-dax infrastructure
    nfit: add sysfs dimm 'family' and 'dsm_mask' attributes
    tools/testing/nvdimm: ND_CMD_CALL support
    nfit: disable vendor specific commands
    nfit: export subsystem ids as attributes
    nfit: fix format interface code byte order per ACPI6.1
    nfit, libnvdimm: limited/whitelisted dimm command marshaling mechanism
    nfit, libnvdimm: clarify "commands" vs "_DSMs"
    libnvdimm: increase max envelope size for ioctl
    acpi/nfit: Add sysfs "id" for NVDIMM ID
    ...

    Linus Torvalds
     

21 May, 2016

2 commits


18 May, 2016

3 commits

  • Pull trivial tree updates from Jiri Kosina.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (21 commits)
    gitignore: fix wording
    mfd: ab8500-debugfs: fix "between" in printk
    memstick: trivial fix of spelling mistake on management
    cpupowerutils: bench: fix "average"
    treewide: Fix typos in printk
    IB/mlx4: printk fix
    pinctrl: sirf/atlas7: fix printk spelling
    serial: mctrl_gpio: Grammar s/lines GPIOs/line GPIOs/, /sets/set/
    w1: comment spelling s/minmum/minimum/
    Blackfin: comment spelling s/divsor/divisor/
    metag: Fix misspellings in comments.
    ia64: Fix misspellings in comments.
    hexagon: Fix misspellings in comments.
    tools/perf: Fix misspellings in comments.
    cris: Fix misspellings in comments.
    c6x: Fix misspellings in comments.
    blackfin: Fix misspelling of 'register' in comment.
    avr32: Fix misspelling of 'definitions' in comment.
    treewide: Fix typos in printk
    Doc: treewide : Fix typos in DocBook/filesystem.xml
    ...

    Linus Torvalds
     
  • Pull block driver updates from Jens Axboe:
    "On top of the core pull request, this is the drivers pull request for
    this merge window. This contains:

    - Switch drivers to the new write back cache API, and kill off the
    flush flags. From me.

    - Kill the discard support for the STEC pci-e flash driver. It's
    trivially broken, and apparently unmaintained, so it's safer to
    just remove it. From Jeff Moyer.

    - A set of lightnvm updates from the usual suspects (Matias/Javier,
    and Simon), and fixes from Arnd, Jeff Mahoney, Sagi, and Wenwei
    Tao.

    - A set of updates for NVMe:

    - Turn the controller state management into a proper state
    machine. From Christoph.

    - Shuffling of code in preparation for NVMe-over-fabrics, also
    from Christoph.

    - Cleanup of the command prep part from Ming Lin.

    - Rewrite of the discard support from Ming Lin.

    - Deadlock fix for namespace removal from Ming Lin.

    - Use the now exported blk-mq tag helper for IO termination.
    From Sagi.

    - Various little fixes from Christoph, Guilherme, Keith, Ming
    Lin, Wang Sheng-Hui.

    - Convert mtip32xx to use the now exported blk-mq tag iter function,
    from Keith"

    * 'for-4.7/drivers' of git://git.kernel.dk/linux-block: (74 commits)
    lightnvm: reserved space calculation incorrect
    lightnvm: rename nr_pages to nr_ppas on nvm_rq
    lightnvm: add is_cached entry to struct ppa_addr
    lightnvm: expose gennvm_mark_blk to targets
    lightnvm: remove mgt targets on mgt removal
    lightnvm: pass dma address to hardware rather than pointer
    lightnvm: do not assume sequential lun alloc.
    nvme/lightnvm: Log using the ctrl named device
    lightnvm: rename dma helper functions
    lightnvm: enable metadata to be sent to device
    lightnvm: do not free unused metadata on rrpc
    lightnvm: fix out of bound ppa lun id on bb tbl
    lightnvm: refactor set_bb_tbl for accepting ppa list
    lightnvm: move responsibility for bad blk mgmt to target
    lightnvm: make nvm_set_rqd_ppalist() aware of vblks
    lightnvm: remove struct factory_blks
    lightnvm: refactor device ops->get_bb_tbl()
    lightnvm: introduce nvm_for_each_lun_ppa() macro
    lightnvm: refactor dev->online_target to global nvm_targets
    lightnvm: rename nvm_targets to nvm_tgt_type
    ...

    Linus Torvalds
     
  • Pull core block layer updates from Jens Axboe:
    "This is the core block IO changes for this merge window. Nothing
    earth shattering in here, it's mostly just fixes. In detail:

    - Fix for a long standing issue where wrong ordering in blk-mq caused
    order_to_size() to spew a warning. From Bart.

    - Async discard support from Christoph. Basically just splitting our
    sync interface into a submit + wait part.

    - Add a cleaner interface for flagging whether a device has a write
    back cache or not. We've previously overloaded blk_queue_flush()
    with this, but let's make it more explicit. Drivers cleaned up and
    updated in the drivers pull request. From me.

    - Fix for a double check for whether IO accounting is enabled or not.
    From Michael Callahan.

    - Fix for the async discard from Mike Snitzer, reinstating the early
    EOPNOTSUPP return if the device doesn't support discards.

    - Also from Mike, export bio_inc_remaining() so dm can drop its
    private copy of it.

    - From Ming Lin, add support for passing in an offset for request
    payloads.

    - Tag function export from Sagi, which will be used in NVMe in the
    drivers pull.

    - Two blktrace related fixes from Shaohua.

    - Propagate NOMERGE flag when making a request from a bio, also from
    Shaohua.

    - An optimization to not parse cgroup paths in blk-throttle, if we
    don't need to. From Shaohua"

    * 'for-4.7/core' of git://git.kernel.dk/linux-block:
    blk-mq: fix undefined behaviour in order_to_size()
    blk-throttle: don't parse cgroup path if trace isn't enabled
    blktrace: add missed mask name
    blktrace: delete garbage for message trace
    block: make bio_inc_remaining() interface accessible again
    block: reinstate early return of -EOPNOTSUPP from blkdev_issue_discard
    block: Minor blk_account_io_start usage cleanup
    block: add __blkdev_issue_discard
    block: remove struct bio_batch
    block: copy NOMERGE flag from bio to request
    block: add ability to flag write back caching on a device
    blk-mq: Export tagset iter function
    block: add offset in blk_add_request_payload()
    writeback: Fix performance regression in wb_over_bg_thresh()

    Linus Torvalds
     

17 May, 2016

1 commit

  • blkdev_dax_capable() is similar to bdev_dax_supported(), but needs
    to remain as a separate interface for checking dax capability of
    a raw block device.

    Rename and relocate blkdev_dax_capable() to keep them maintained
    consistently, and call bdev_direct_access() for the dax capability
    check.

    There is no change in the behavior.

    Link: https://lkml.org/lkml/2016/5/9/950
    Signed-off-by: Toshi Kani
    Reviewed-by: Jan Kara
    Cc: Alexander Viro
    Cc: Jens Axboe
    Cc: Andreas Dilger
    Cc: Jan Kara
    Cc: Dave Chinner
    Cc: Dan Williams
    Cc: Ross Zwisler
    Cc: Christoph Hellwig
    Cc: Boaz Harrosh
    Signed-off-by: Vishal Verma

    Toshi Kani
     

16 May, 2016

1 commit

  • When the this_order variable in blk_mq_init_rq_map() becomes zero,
    the code incorrectly decrements the variable and passes the result
    to the order_to_size() helper, causing undefined behaviour:

    UBSAN: Undefined behaviour in block/blk-mq.c:1459:27
    shift exponent 4294967295 is too large for 32-bit type 'unsigned int'
    CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.6.0-rc6-00072-g33656a1 #22

    Fix the code by first checking that the this_order variable is not
    zero.

    Reported-by: Meelis Roos
    Fixes: 320ae51feed5 ("blk-mq: new multi-queue block IO queueing mechanism")
    Signed-off-by: Bartlomiej Zolnierkiewicz
    Signed-off-by: Jens Axboe

    Bartlomiej Zolnierkiewicz
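
The ordering issue can be illustrated with a small standalone sketch. The function names and the 4096-byte page size are assumptions for illustration; this is not the code from block/blk-mq.c. The point is only that the zero test must come before the `order - 1` expression is evaluated.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u /* assumed page size for the sketch */

/* Size in bytes of an allocation of the given page order. */
size_t order_to_size(unsigned int order)
{
    return (size_t)PAGE_SIZE_BYTES << order;
}

/*
 * Shrink the order while the next smaller order still covers `left` bytes.
 * Because `order` is tested first, `order - 1` is never evaluated when
 * order is 0, so the shift exponent cannot wrap around to UINT_MAX.
 */
unsigned int shrink_order(unsigned int order, size_t left)
{
    while (order && left < order_to_size(order - 1))
        order--;
    return order;
}
```

Swapping the two operands of `&&` would evaluate `order_to_size(0u - 1)` once the order reaches zero, which is the kind of oversized shift the UBSAN report points at.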
     

11 May, 2016

1 commit


10 May, 2016

1 commit


06 May, 2016

2 commits

  • Commit 326e1dbb57 ("block: remove management of bi_remaining when
    restoring original bi_end_io") made bio_inc_remaining() private to bio.c
    because the only use-case that made sense was confined to the
    bio_chain() interface.

    Since that time DM thinp went on to use bio_chain() in its relatively
    complex implementation of async discard support. That implementation,
    even when converted over to use the new async __blkdev_issue_discard()
    interface, depends on deferred completion of the original discard bio --
    which is most appropriately implemented using bio_inc_remaining().

    DM thinp foolishly duplicated bio_inc_remaining(), local to dm-thin.c as
    __bio_inc_remaining(), so re-exporting bio_inc_remaining() allows us to
    put an end to that foolishness.

    All said, bio_inc_remaining() should really only be used in conjunction
    with bio_chain(). It isn't intended for generic bio reference counting.

    Signed-off-by: Mike Snitzer
    Acked-by: Joe Thornber
    Signed-off-by: Jens Axboe

    Mike Snitzer
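
As a rough model of the deferred completion that bio_inc_remaining() enables, the sketch below uses a plain counter on a hypothetical `fake_bio` type. The kernel uses an atomic counter and real bio structures; everything named here is an illustrative assumption.

```c
#include <assert.h>

/* Hypothetical model of a bio with a "remaining completions" counter. */
struct fake_bio {
    int remaining; /* completions still owed before the bio is finished */
    int completed; /* set once the final completion has arrived */
};

/* A fresh bio owes exactly one completion. */
void fake_bio_init(struct fake_bio *b)
{
    b->remaining = 1;
    b->completed = 0;
}

/* Defer completion: one more endio call is now required. */
void fake_inc_remaining(struct fake_bio *b)
{
    b->remaining++;
}

/* Each endio call drops one count; only the last one completes the bio. */
void fake_endio(struct fake_bio *b)
{
    if (--b->remaining == 0)
        b->completed = 1;
}
```

After one `fake_inc_remaining()` call, the first `fake_endio()` leaves the bio pending and the second one completes it, which mirrors how the original discard bio is held back until the chained work finishes.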
     
  • Commit 38f25255330 ("block: add __blkdev_issue_discard") incorrectly
    disallowed the early return of -EOPNOTSUPP if the device doesn't support
    discard (or secure discard). This early return of -EOPNOTSUPP has
    always been part of blkdev_issue_discard() interface so there isn't a
    good reason to break that behaviour -- especially when it can be easily
    reinstated.

    The nuance of allowing early return of -EOPNOTSUPP vs disallowing late
    return of -EOPNOTSUPP is: if the overall device never advertised support
    for discards and one is issued to the device it is beneficial to inform
    the caller that discards are not supported via -EOPNOTSUPP. But if a
    device advertises discard support it means that at least a subset of the
    device does have discard support -- but it could be that discards issued
    to some regions of a stacked device will not be supported. In that case
    the late return of -EOPNOTSUPP must be disallowed.

    Fixes: 38f25255330 ("block: add __blkdev_issue_discard")
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Mike Snitzer
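
The early-versus-late distinction can be sketched as follows. The `fake_dev` type and its fields are hypothetical; the sketch only models the policy described in the message, not the block layer's actual discard path.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device model for the sketch. */
struct fake_dev {
    int advertises_discard; /* non-zero if the device claims discard support */
    int completion_error;   /* 0 or a negative errno reported at completion */
};

/* Returns 0 on success or a negative errno. */
int issue_discard(const struct fake_dev *d)
{
    int ret;

    /* Early return: the device never advertised discard support. */
    if (!d->advertises_discard)
        return -EOPNOTSUPP;

    /* Stand-in for submitting the discard and waiting for it. */
    ret = d->completion_error;

    /*
     * Late -EOPNOTSUPP: part of a stacked device lacked support even
     * though the device as a whole advertised it; do not report it.
     */
    if (ret == -EOPNOTSUPP)
        ret = 0;
    return ret;
}
```

Other completion errors, such as `-EIO`, still propagate to the caller unchanged.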
     

03 May, 2016

1 commit


02 May, 2016

2 commits

  • This is a version of blkdev_issue_discard which doesn't wait for
    the I/O to complete, but instead allows the caller to submit
    the final bio and/or chain it to others.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Ming Lin
    Signed-off-by: Sagi Grimberg
    Reviewed-by: Ming Lei
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • It can be replaced with a combination of bio_chain and submit_bio_wait.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Ming Lin
    Signed-off-by: Sagi Grimberg
    Reviewed-by: Ming Lei
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

18 Apr, 2016

1 commit


16 Apr, 2016

1 commit

  • Pull block fixes from Jens Axboe:
    "A few fixes for the current series. This contains:

    - Two fixes for NVMe:

    One fixes a reset race that can be triggered by repeated
    insert/removal of the module.

    The other fixes an issue on some platforms, where we get probe
    timeouts since legacy interrupts aren't working. This used not to
    be a problem since we had the worker thread poll for completions,
    but since that was killed off, it means those poor souls can't
    successfully probe their NVMe device. Use a proper IRQ check and
    probe (msi-x -> msi -> legacy), like most other drivers to work
    around this. Both from Keith.

    - A loop corruption issue with offset in iters, from Ming Lei.

    - A fix for not having the partition stat per cpu ref count
    initialized before sending out the KOBJ_ADD, which could cause user
    space to access the counter prior to initialization. Also from
    Ming Lei.

    - A fix for using the wrong congestion state, from Kaixu Xia"

    * 'for-linus' of git://git.kernel.dk/linux-block:
    block: loop: fix filesystem corruption in case of aio/dio
    NVMe: Always use MSI/MSI-x interrupts
    NVMe: Fix reset/remove race
    writeback: fix the wrong congested state variable definition
    block: partition: initialize percpuref before sending out KOBJ_ADD

    Linus Torvalds
     

14 Apr, 2016

1 commit


13 Apr, 2016

6 commits


09 Apr, 2016

1 commit


05 Apr, 2016

2 commits

  • Mostly direct substitution with occasional adjustment or removing
    outdated comments.

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
    ago with the promise that one day it would be possible to implement the
    page cache with bigger chunks than PAGE_SIZE.

    This promise never materialized, and it is unlikely that it ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE. And it's a constant source of confusion on whether
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    the script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is a revert of the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
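
Why the shift substitutions are safe can be checked with a tiny sketch. It assumes a page shift of 12 purely for illustration and defines its own copies of the two macros; with both shifts equal, the difference is zero and shifting by it leaves the value unchanged.

```c
#include <assert.h>

#define PAGE_SHIFT       12         /* assumed value for the sketch */
#define PAGE_CACHE_SHIFT PAGE_SHIFT /* the two were always equal in practice */

/* Old style: convert a page-cache index into a page index. */
unsigned long index_old(unsigned long index)
{
    return index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
}

/* New style after the substitution: the shift by zero is simply dropped. */
unsigned long index_new(unsigned long index)
{
    return index;
}
```

Both functions return the same value for every input, which is the property the semantic patch relies on when it rewrites the shifted expression to the bare expression.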
     

30 Mar, 2016

1 commit

  • The partition's percpu_ref should have been initialized before
    sending out the KOBJ_ADD uevent, because that uevent may cause userspace
    to read the partition table. Otherwise the uninitialized percpu_ref may
    be accessed in the data path.

    This patch fixes this issue reported by Naveen.

    Reported-by: Naveen Kaje
    Tested-by: Naveen Kaje
    Fixes: 6c71013ecb7e2(block: partition: convert percpu ref)
    Cc: # v4.3+
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

25 Mar, 2016

1 commit

  • Pull block fixes from Jens Axboe:
    "Final round of fixes for this merge window - some of this has come up
    after the initial pull request, and some of it was put in a post-merge
    branch before the merge window.

    This contains:

    - Fix for a bad check for an error on dma mapping in the mtip32xx
    driver, from Alexey Khoroshilov.

    - A set of fixes for lightnvm, from Javier, Matias, and Wenwei.

    - An NVMe completion record corruption fix from Marta, ensuring that
    we read things in the right order.

    - Two writeback fixes from Tejun, marked for stable@ as well.

    - A blk-mq sw queue iterator fix from Thomas, fixing an oops for
    sparse CPU maps. They hit this in the hot plug/unplug rework"

    * 'for-linus' of git://git.kernel.dk/linux-block:
    nvme: avoid cqe corruption when update at the same time as read
    writeback, cgroup: fix use of the wrong bdi_writeback which mismatches the inode
    writeback, cgroup: fix premature wb_put() in locked_inode_to_wb_and_lock_list()
    blk-mq: Use proper cpumask iterator
    mtip32xx: fix checks for dma mapping errors
    lightnvm: do not load L2P table if not supported
    lightnvm: do not reserve lun on l2p loading
    nvme: lightnvm: return ppa completion status
    lightnvm: add a bitmap of luns
    lightnvm: specify target's logical address area
    null_blk: add lightnvm null_blk device to the nullb_list

    Linus Torvalds
     

20 Mar, 2016

1 commit

  • queue_for_each_ctx() iterates over per_cpu variables under the assumption that
    the possible cpu mask cannot have holes. That's wrong as all cpumasks can have
    holes. In case there are holes the iteration ends up accessing uninitialized
    memory and crashing as a result.

    Replace the macro by a proper for_each_possible_cpu() loop and drop the unused
    macro blk_ctx_sum() which references queue_for_each_ctx().

    Reported-by: Xiong Zhou
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Jens Axboe

    Thomas Gleixner
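
The difference between walking by index and walking by mask can be sketched with a plain bitmask. This is a simplified model with made-up function names, not the kernel's cpumask API or the removed macro: bit `n` set in `possible` stands for "CPU n is possible".

```c
#include <assert.h>
#include <stdint.h>

/*
 * Flawed walk: visits every index below nr_ids without consulting the mask.
 * Returns how many visited ids are not in the mask; each of those would be
 * an access to an uninitialised per-CPU slot.
 */
int walk_by_index(uint32_t possible, int nr_ids)
{
    int invalid = 0;

    for (int i = 0; i < nr_ids; i++)
        if (!(possible & (1u << i)))
            invalid++;
    return invalid;
}

/*
 * Mask-driven walk: skips ids that are missing from the mask.
 * Stores the visited ids in out[] and returns how many were visited.
 */
int walk_by_mask(uint32_t possible, int nr_ids, int *out)
{
    int n = 0;

    for (int cpu = 0; cpu < nr_ids; cpu++) {
        if (!(possible & (1u << cpu)))
            continue;
        out[n++] = cpu;
    }
    return n;
}
```

For a mask with CPUs 0, 2 and 3 set, the index walk touches one invalid id (CPU 1), while the mask walk visits exactly the three possible CPUs.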
     

19 Mar, 2016

2 commits

  • Pull libata updates from Tejun Heo:

    - ahci grew runtime power management support so that the controller can
    be turned off if no devices are attached.

    - sata_via isn't dead yet. It got hotplug support and more refined
    workaround for certain WD drives.

    - Misc cleanups. There's a merge from for-4.5-fixes to avoid confusing
    conflicts in ahci PCI ID table.

    * 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
    ata: ahci_xgene: dereferencing uninitialized pointer in probe
    AHCI: Remove obsolete Intel Lewisburg SATA RAID device IDs
    ata: sata_rcar: Use ARCH_RENESAS
    sata_via: Implement hotplug for VT6421
    sata_via: Apply WD workaround only when needed on VT6421
    ahci: Add runtime PM support for the host controller
    ahci: Add functions to manage runtime PM of AHCI ports
    ahci: Convert driver to use modern PM hooks
    ahci: Cache host controller version
    scsi: Drop runtime PM usage count after host is added
    scsi: Set request queue runtime PM status back to active on resume
    block: Add blk_set_runtime_active()
    ata: ahci_mvebu: add support for Armada 3700 variant
    libata: fix unbalanced spin_lock_irqsave/spin_unlock_irq() in ata_scsi_park_show()
    libata: support AHCI on OCTEON platform

    Linus Torvalds
     
  • Pull core block updates from Jens Axboe:
    "Here are the core block changes for this merge window. Not a lot of
    exciting stuff going on in this round, most of the changes have been
    on the driver side of things. That pull request is coming next. This
    pull request contains:

    - A set of fixes for chained bio handling from Christoph.

    - A tag bounds check for blk-mq from Hannes, ensuring that we don't
    do something stupid if a device reports an invalid tag value.

    - A set of fixes/updates for the CFQ IO scheduler from Jan Kara.

    - A set of blk-mq fixes from Keith, adding support for dynamic
    hardware queues, and fixing init of max_dev_sectors for stacking
    devices.

    - A fix for the dynamic hw context from Ming.

    - Enabling of cgroup writeback support on a block device, from
    Shaohua"

    * 'for-4.6/core' of git://git.kernel.dk/linux-block:
    blk-mq: add bounds check on tag-to-rq conversion
    block: bio_remaining_done() isn't unlikely
    block: cleanup bio_endio
    block: factor out chained bio completion
    block: don't unecessarily clobber bi_error for chained bios
    block-dev: enable writeback cgroup support
    blk-mq: Fix NULL pointer updating nr_requests
    blk-mq: mark request queue as mq asap
    block: Initialize max_dev_sectors to 0
    blk-mq: dynamic h/w context count
    cfq-iosched: Allow parent cgroup to preempt its child
    cfq-iosched: Allow sync noidle workloads to preempt each other
    cfq-iosched: Reorder checks in cfq_should_preempt()
    cfq-iosched: Don't group_idle if cfqq has big thinktime

    Linus Torvalds
     

17 Mar, 2016

1 commit

  • Pull device mapper updates from Mike Snitzer:

    - Most attention this cycle went to optimizing blk-mq request-based DM
    (dm-mq) that is used exclusively by DM multipath:

    - A stable fix for dm-mq that eliminates excessive context
    switching offers the biggest performance improvement (for both
    IOPs and throughput).

    - But more work is needed, during the next cycle, to reduce
    spinlock contention in DM multipath on large NUMA systems.

    - A stable fix for a NULL pointer seen when DM stats is enabled on a DM
    multipath device that must requeue an IO due to path failure.

    - A stable fix for DM snapshot to disallow the COW and origin devices
    from being identical. This amounts to graceful failure in the face
    of userspace error because these devices shouldn't ever be identical.

    - Stable fixes for DM cache and DM thin provisioning to address crashes
    seen if/when their respective metadata device experiences failures
    that cause the transition to 'fail_io' mode.

    - The DM cache 'mq' policy is now an alias for the 'smq' policy. The
    'smq' policy proved to be consistently better than 'mq'. As such
    'mq', with all its complex user-facing tunables, has been eliminated.

    - Improve DM thin provisioning to consistently return -ENOSPC once the
    thin-pool's data volume is out of space.

    - Improve DM core to properly handle error propagation if
    bio_integrity_clone() fails in clone_bio().

    - Other small cleanups and improvements to DM core.

    * tag 'dm-4.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (41 commits)
    dm: fix rq_end_stats() NULL pointer in dm_requeue_original_request()
    dm thin: consistently return -ENOSPC if pool has run out of data space
    dm cache: bump the target version
    dm cache: make sure every metadata function checks fail_io
    dm: add missing newline between DM_DEBUG_BLOCK_STACK_TRACING and DM_BUFIO
    dm cache policy smq: clarify that mq registration failure was for 'mq'
    dm: return error if bio_integrity_clone() fails in clone_bio()
    dm thin metadata: don't issue prefetches if a transaction abort has failed
    dm snapshot: disallow the COW and origin devices from being identical
    dm cache: make the 'mq' policy an alias for 'smq'
    dm: drop unnecessary assignment of md->queue
    dm: reorder 'struct mapped_device' members to fix alignment and holes
    dm: remove dummy definition of 'struct dm_table'
    dm: add 'dm_numa_node' module parameter
    dm thin metadata: remove needless newline from subtree_dec() DMERR message
    dm mpath: cleanup reinstate_path() et al based on code review
    dm mpath: remove __pgpath_busy forward declaration, rename to pgpath_busy
    dm mpath: switch from 'unsigned' to 'bool' for flags where appropriate
    dm round robin: use percpu 'repeat_count' and 'current_path'
    dm path selector: remove 'repeat_count' return from .select_path hook
    ...

    Linus Torvalds
     

16 Mar, 2016

2 commits

  • This patch has been carried in the Android tree for quite some time and
    is one of the few patches required to get a mainline kernel up and
    running with an existing Android userspace. So I wanted to submit it
    for review and consideration if it should be merged.

    For partitions, add new uevent parameters 'PARTN', which specifies the
    partition's index in the table, and 'PARTNAME', which specifies the
    partition name of a partition device.

    Android's userspace uses this for creating device node links from the
    partition name and number, ie:

    /dev/block/platform/soc/by-name/system
    or
    /dev/block/platform/soc/by-num/p1

    One can see its usage here:
    https://android.googlesource.com/platform/system/core/+/master/init/devices.cpp#355
    and
    https://android.googlesource.com/platform/system/core/+/master/init/devices.cpp#494

    [john.stultz@linaro.org: dropped NPARTS and reworded commit message for context]
    Signed-off-by: Dima Zavin
    Signed-off-by: John Stultz
    Cc: Jens Axboe
    Cc: Rom Lemarchand
    Cc: Android Kernel Team
    Cc: Jeff Moyer
    Cc: Kees Cook
    Cc: Kay Sievers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    San Mehat
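
To make the two parameters concrete, here is a hedged sketch that formats them as `KEY=value` lines. The helper name, the newline-separated layout and the choice to omit PARTNAME when no name is available are assumptions for illustration, not the kernel's uevent code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Writes "PARTN=<n>\n" and, if a name is given, "PARTNAME=<name>\n" into buf.
 * Returns the number of characters written, or -1 if buf is too small.
 */
int format_part_uevent(char *buf, size_t len, unsigned int partno,
                       const char *name)
{
    int n;

    if (name && name[0])
        n = snprintf(buf, len, "PARTN=%u\nPARTNAME=%s\n", partno, name);
    else
        n = snprintf(buf, len, "PARTN=%u\n", partno);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

A userspace consumer could then map a `PARTNAME` of `system` to a link such as the by-name path quoted in the message, and `PARTN` to the by-num path.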
     
  • We need to check for a valid index before accessing the array
    element to avoid accessing invalid memory regions.

    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jeff Moyer

    Modified by Jens to drop the unlikely(), and to make the fall-through
    path the one with a valid tag.

    Signed-off-by: Jens Axboe

    Hannes Reinecke
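
A minimal sketch of the bounds check, with hypothetical stand-in types. It assumes that returning NULL for an out-of-range tag is acceptable to callers, which then need to handle a missing request.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the request and tag-map structures. */
struct request { int id; };

struct tag_set {
    unsigned int nr_tags;  /* number of valid slots in rqs[] */
    struct request **rqs;  /* tag-indexed array of requests */
};

/* Returns the request for `tag`, or NULL if the tag is out of range. */
struct request *tag_to_rq(const struct tag_set *tags, unsigned int tag)
{
    if (tag < tags->nr_tags)
        return tags->rqs[tag]; /* valid tag: the straight-line path */
    return NULL;               /* invalid tag reported by the device */
}
```

Because the tag is unsigned, a single `<` comparison also rejects values that would have been negative in a signed type.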
     

14 Mar, 2016

2 commits

  • We use bio chaining during most I/Os these days due to the delayed
    bio splitting. Additionally XFS will start using it, and there is
    a pending direct I/O rewrite also making heavy use of it. Don't
    pretend it's always unlikely, and let the branch predictor do its
    job instead.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Replace the while loop that unnecessarily checks for a NULL bio in the fast
    path with a simple goto loop.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
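
The shape of such a goto loop can be sketched on a hypothetical chain of nodes. This is not the kernel's bio_endio(); it only shows how the walk up to a parent can restart at a label instead of re-testing a pointer for NULL at the top of a while loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chained object: completes once `remaining` drops to zero. */
struct node {
    struct node *parent; /* next object up the chain, or NULL */
    int remaining;       /* completions still owed */
    int done;            /* set when this node has completed */
};

/* Complete n, then keep walking up while each parent's last count drops. */
void complete_chain(struct node *n)
{
again:
    if (--n->remaining > 0)
        return;          /* other completions are still pending */
    n->done = 1;
    if (n->parent) {
        n = n->parent;
        goto again;      /* n is known to be non-NULL here */
    }
}
```

The common case, a node without a parent, runs straight through without any loop-condition test on the pointer.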