10 Jul, 2019

1 commit

  • Pull block updates from Jens Axboe:
    "This is the main block updates for 5.3. Nothing earth shattering or
    major in here, just fixes, additions, and improvements all over the
    map. This contains:

    - Series of documentation fixes (Bart)

    - Optimization of the blk-mq ctx get/put (Bart)

    - null_blk removal race condition fix (Bob)

    - req/bio_op() cleanups (Chaitanya)

    - Series cleaning up the segment accounting, and request/bio mapping
    (Christoph)

    - Series cleaning up the page getting/putting for bios (Christoph)

    - block cgroup cleanups and moving it to where it is used (Christoph)

    - block cgroup fixes (Tejun)

    - Series of fixes and improvements to bcache, most notably a write
    deadlock fix (Coly)

    - blk-iolatency STS_AGAIN and accounting fixes (Dennis)

    - Series of improvements and fixes to BFQ (Douglas, Paolo)

    - debugfs_create() return value check removal for drbd (Greg)

    - Use struct_size(), where appropriate (Gustavo)

    - Two lightnvm fixes (Heiner, Geert)

    - MD fixes, including a read balance and corruption fix (Guoqing,
    Marcos, Xiao, Yufen)

    - block opal shadow mbr additions (Jonas, Revanth)

    - sbitmap compare-and-exchange improvements (Pavel)

    - Fix for potential bio->bi_size overflow (Ming)

    - NVMe pull requests:
      - improved PCIe suspend support (Keith Busch)
      - error injection support for the admin queue (Akinobu Mita)
      - Fibre Channel discovery improvements (James Smart)
      - tracing improvements including nvmetc tracing support (Minwoo Im)
      - misc fixes and cleanups (Anton Eidelman, Minwoo Im, Chaitanya
      Kulkarni)

    - Various little fixes and improvements to drivers and core"

    * tag 'for-5.3/block-20190708' of git://git.kernel.dk/linux-block: (153 commits)
    blk-iolatency: fix STS_AGAIN handling
    block: nr_phys_segments needs to be zero for REQ_OP_WRITE_ZEROES
    blk-mq: simplify blk_mq_make_request()
    blk-mq: remove blk_mq_put_ctx()
    sbitmap: Replace cmpxchg with xchg
    block: fix .bi_size overflow
    block: sed-opal: check size of shadow mbr
    block: sed-opal: ioctl for writing to shadow mbr
    block: sed-opal: add ioctl for done-mark of shadow mbr
    block: never take page references for ITER_BVEC
    direct-io: use bio_release_pages in dio_bio_complete
    block_dev: use bio_release_pages in bio_unmap_user
    block_dev: use bio_release_pages in blkdev_bio_end_io
    iomap: use bio_release_pages in iomap_dio_bio_end_io
    block: use bio_release_pages in bio_map_user_iov
    block: use bio_release_pages in bio_unmap_user
    block: optionally mark pages dirty in bio_release_pages
    block: move the BIO_NO_PAGE_REF check into bio_release_pages
    block: skd_main.c: Remove call to memset after dma_alloc_coherent
    block: mtip32xx: Remove call to memset after dma_alloc_coherent
    ...

    Linus Torvalds
     

07 Jul, 2019

1 commit

  • When the blk-mq debugfs file creation logic was "cleaned up" it was
    cleaned up too much, causing the queue file to not be created in the
    correct location. Turns out the check for the directory being present
    is needed as if that has not happened yet, the files should not be
    created, and the function will be called later on in the initialization
    code so that the files can be created in the correct location.

    Fixes: 6cfc0081b046 ("blk-mq: no need to check return value of debugfs_create functions")
    Reported-by: Stephen Rothwell
    Cc: linux-block@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Jens Axboe

    Greg Kroah-Hartman
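
The directory check restored by this fix can be sketched in userspace C. This is only an illustration under stated assumptions: `struct queue_dbg` and `register_queue_files()` are hypothetical stand-ins, not kernel names. The point is that file creation is skipped while the parent directory is missing and succeeds when the function is called again later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a queue's debugfs state. */
struct queue_dbg {
	const char *dir;     /* NULL until the directory has been created */
	int files_created;   /* number of attribute files registered so far */
};

/*
 * Create the attribute files only when the parent directory exists.
 * Returns true if files were created, false if the call was deferred.
 */
static bool register_queue_files(struct queue_dbg *q, int nr_files)
{
	assert(q != NULL && nr_files >= 0);

	if (!q->dir)
		return false;   /* too early: a later init call will retry */

	q->files_created += nr_files;
	return true;
}
```

Without the early return, the files would be registered against a missing parent, which is how they ended up in the wrong location.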
     

21 Jun, 2019

1 commit

  • Now that we have the helper function blk_op_str() to convert
    REQ_OP_XXX to the string XXX, adjust the code to use it. Remove the
    duplicate op_name array, which now lives in blk-core.c under the
    name "blk_op_name", along with the open-coded lookup in
    blk-mq-debugfs.c.

    Reviewed-by: Bart Van Assche
    Signed-off-by: Chaitanya Kulkarni
    Signed-off-by: Jens Axboe

    Chaitanya Kulkarni
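
A single name table with one lookup helper, which is the shape this commit consolidates on, might look like the following userspace sketch. The `demo_` names and the subset of operations are assumptions for illustration; they are not the kernel's definitions.

```c
#include <stddef.h>

/* Illustrative subset of operations; the values are assumptions. */
enum demo_op {
	DEMO_OP_READ,
	DEMO_OP_WRITE,
	DEMO_OP_FLUSH,
	DEMO_OP_DISCARD,
	DEMO_OP_LAST
};

/* One table mapping each operation to its name without the prefix. */
static const char *const demo_op_name[DEMO_OP_LAST] = {
	[DEMO_OP_READ]    = "READ",
	[DEMO_OP_WRITE]   = "WRITE",
	[DEMO_OP_FLUSH]   = "FLUSH",
	[DEMO_OP_DISCARD] = "DISCARD",
};

/* Return the name for op, or "UNKNOWN" when op is outside the table. */
static const char *demo_op_str(unsigned int op)
{
	const char *name = "UNKNOWN";

	if (op < DEMO_OP_LAST && demo_op_name[op])
		name = demo_op_name[op];
	return name;
}
```

Callers that previously indexed their own copy of the array would call the helper instead, so there is one place to update when an operation is added.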
     

20 Jun, 2019

3 commits

  • This is a pure code cleanup patch and doesn't change any functionality.
    Having multiple coding styles in the code creates confusion when
    someone tries to add new code.

    Make queue_poll_stat_show() consistent with the rest of the code by
    adding spaces around binary operators.

    Signed-off-by: Chaitanya Kulkarni
    Signed-off-by: Jens Axboe

    Chaitanya Kulkarni
     
  • In function __blk_mq_debugfs_rq_show variable op has unsigned int type.
    Since op can never be negative use %u format specifier to match the
    variable type.

    Signed-off-by: Chaitanya Kulkarni
    Signed-off-by: Jens Axboe

    Chaitanya Kulkarni
     
  • This is a pure code cleanup patch and doesn't change any functionality.
    This removes the redundant else, which is not needed since we are
    returning from the function anyway.

    Signed-off-by: Chaitanya Kulkarni
    Signed-off-by: Jens Axboe

    Chaitanya Kulkarni
     

17 Jun, 2019

1 commit

  • This is a pure code cleanup patch and doesn't change any functionality.
    The block layer uses the req_op() macro to identify the request
    operation, so replace the open-coded equivalent in blk-mq-debugfs.c
    with req_op().

    Reviewed-by: Martin K. Petersen
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Chaitanya Kulkarni
    Signed-off-by: Jens Axboe

    Chaitanya Kulkarni
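
To show what replacing open coding with an accessor macro means, here is a small userspace sketch. It assumes the operation is stored in the low bits of a flags word; the 8-bit width and all `demo_`/`DEMO_` names are assumptions made for this example.

```c
#include <stdint.h>

/* Assumed layout: the low DEMO_OP_BITS bits hold the operation. */
#define DEMO_OP_BITS 8
#define DEMO_OP_MASK ((1u << DEMO_OP_BITS) - 1)

struct demo_request {
	uint32_t cmd_flags;   /* operation in the low bits, flags above */
};

/* Accessor that hides the masking, in the spirit of req_op(). */
#define demo_req_op(rq) ((rq)->cmd_flags & DEMO_OP_MASK)

/* Open-coded form that such a cleanup replaces with the accessor. */
static inline uint32_t demo_open_coded_op(const struct demo_request *rq)
{
	return rq->cmd_flags & DEMO_OP_MASK;
}
```

Both forms yield the same value; using the accessor keeps the layout knowledge in one definition.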
     

15 Jun, 2019

1 commit


13 Jun, 2019

1 commit

  • When calling debugfs functions, there is no need to ever check the
    return value. The function can work or not, but the code logic should
    never do something different based on this.

    When all of these checks are cleaned up, lots of the functions used in
    the blk-mq-debugfs code can now return void, as there is no need to
    check their return values either.

    Overall, this ends up cleaning up the code and making it smaller, always
    a nice win.

    Cc: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Jens Axboe

    Greg Kroah-Hartman
     

01 May, 2019

1 commit


10 Mar, 2019

1 commit

  • Pull SCSI updates from James Bottomley:
    "This is mostly update of the usual drivers: arcmsr, qla2xxx, lpfc,
    hisi_sas, target/iscsi and target/core.

    Additionally Christoph refactored gdth as part of the dma changes. The
    major mid-layer change this time is the removal of bidi commands and
    with them the whole of the osd/exofs driver and filesystem. This is a
    major simplification for block and mq in particular"

    * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (240 commits)
    scsi: cxgb4i: validate tcp sequence number only if chip version pf
    scsi: core: replace GFP_ATOMIC with GFP_KERNEL in scsi_scan.c
    scsi: mpt3sas: Add missing breaks in switch statements
    scsi: aacraid: Fix missing break in switch statement
    scsi: kill command serial number
    scsi: csiostor: drop serial_number usage
    scsi: mvumi: use request tag instead of serial_number
    scsi: dpt_i2o: remove serial number usage
    scsi: st: osst: Remove negative constant left-shifts
    scsi: ufs-bsg: Allow reading descriptors
    scsi: ufs: Allow reading descriptor via raw upiu
    scsi: ufs-bsg: Change the calling convention for write descriptor
    scsi: ufs: Remove unused device quirks
    Revert "scsi: ufs: disable vccq if it's not needed by UFS device"
    scsi: megaraid_sas: Remove a bunch of set but not used variables
    scsi: clean obsolete return values of eh_timed_out
    scsi: sd: Optimal I/O size should be a multiple of physical block size
    scsi: MAINTAINERS: SCSI initiator and target tweaks
    scsi: fcoe: make use of fip_mode enum complete
    ...

    Linus Torvalds
     

15 Feb, 2019

3 commits

  • Pull in 5.0-rc6 to avoid a dumb merge conflict with fs/iomap.c.
    This is needed since io_uring is now based on the block branch,
    to avoid a conflict between the multi-page bvecs and the bits
    of io_uring that touch the core block parts.

    * tag 'v5.0-rc6': (525 commits)
    Linux 5.0-rc6
    x86/mm: Make set_pmd_at() paravirt aware
    MAINTAINERS: Update the ocores i2c bus driver maintainer, etc
    blk-mq: remove duplicated definition of blk_mq_freeze_queue
    Blk-iolatency: warn on negative inflight IO counter
    blk-iolatency: fix IO hang due to negative inflight counter
    MAINTAINERS: unify reference to xen-devel list
    x86/mm/cpa: Fix set_mce_nospec()
    futex: Handle early deadlock return correctly
    futex: Fix barrier comment
    net: dsa: b53: Fix for failure when irq is not defined in dt
    blktrace: Show requests without sector
    mips: cm: reprime error cause
    mips: loongson64: remove unreachable(), fix loongson_poweroff().
    sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()
    geneve: should not call rt6_lookup() when ipv6 was disabled
    KVM: nVMX: unconditionally cancel preemption timer in free_nested (CVE-2019-7221)
    KVM: x86: work around leak of uninitialized stack contents (CVE-2019-7222)
    kvm: fix kvm_ioctl_create_device() reference counting (CVE-2019-6974)
    signal: Better detection of synchronous signals
    ...

    Jens Axboe
     
  • QUEUE_FLAG_NO_SG_MERGE has been killed, so kill BLK_MQ_F_SG_MERGE too.

    Reviewed-by: Christoph Hellwig
    Reviewed-by: Omar Sandoval
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • Since bdced438acd83ad83a6c ("block: setup bi_phys_segments after splitting"),
    physical segment number is mainly figured out in blk_queue_split() for
    fast path, and the flag of BIO_SEG_VALID is set there too.

    Now only blk_recount_segments() and blk_recalc_rq_segments() use this
    flag.

    Basically blk_recount_segments() is bypassed in fast path given BIO_SEG_VALID
    is set in blk_queue_split().

    For another user of blk_recalc_rq_segments():

    - run in partial completion branch of blk_update_request, which is an unusual case

    - run in blk_cloned_rq_check_limits(), still not a big problem if the flag is killed
    since dm-rq is the only user.

    Now that multi-page bvec is enabled, not doing S/G merging is rather pointless with the
    current setup of the I/O path, as it isn't going to save you a significant amount
    of cycles.

    Reviewed-by: Christoph Hellwig
    Reviewed-by: Omar Sandoval
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

10 Feb, 2019

1 commit

  • We have various helpers for setting/clearing this flag, and also
    a helper to check if the queue supports queueable flushes or not.
    But nobody uses them anymore, kill it with fire.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

09 Feb, 2019

1 commit

  • Pull driver core fixes from Greg KH:
    "Here are some driver core fixes for 5.0-rc6.

    Well, not so much "driver core" as "debugfs". There's a lot of
    outstanding debugfs cleanup patches coming in through different
    subsystem trees, and in that process it was found that the debugfs
    core really should return errors when something bad happens, to prevent
    random files from showing up in the root of debugfs afterward. So
    debugfs was fixed up to handle this properly, and then two fixes for
    the relay and blk-mq code were needed as they were making invalid
    assumptions about debugfs return values.

    There's also a cacheinfo fix in here that resolves a tiny issue.

    All of these have been in linux-next for over a week with no reported
    problems"

    * tag 'driver-core-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
    blk-mq: protect debugfs_create_files() from failures
    relay: check return of create_buf_file() properly
    debugfs: debugfs_lookup() should return NULL if not found
    debugfs: return error values, not NULL
    debugfs: fix debugfs_rename parameter checking
    cacheinfo: Keep the old value if of_property_read_u32 fails

    Linus Torvalds
     

06 Feb, 2019

1 commit


31 Jan, 2019

1 commit

  • If debugfs were to return a non-NULL error for a debugfs call, using
    that pointer later in debugfs_create_files() would crash.

    Fix that by properly checking the pointer before referencing it.

    Reported-by: Michal Hocko
    Reported-and-tested-by: syzbot+b382ba6a802a3d242790@syzkaller.appspotmail.com
    Reported-by: Tetsuo Handa
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
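
The pointer check described here can be illustrated in userspace C. The sketch below emulates the error-pointer convention (small negative errno values encoded in the top of the address space); the `demo_` helpers and the 4095 limit are assumptions written for this example, not kernel code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True for NULL and for any pointer that encodes an errno value. */
static inline bool demo_is_err_or_null(const void *p)
{
	return p == NULL || (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_dir {
	int nr_files;
};

/* Create n files under parent, refusing to touch an invalid parent. */
static int demo_create_files(struct demo_dir *parent, int n)
{
	if (demo_is_err_or_null(parent))
		return 0;   /* nothing created, nothing dereferenced */

	parent->nr_files += n;
	return n;
}
```

A plain NULL test would let an encoded error through, and the later dereference is what would crash.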
     

24 Jan, 2019

1 commit


18 Dec, 2018

1 commit

  • When a request is added to the rq list of a sw queue (ctx), the rq may
    be from a different type of hctx, especially after multi queue mapping
    is introduced.

    So when dispatching requests from the sw queue via
    blk_mq_flush_busy_ctxs() or blk_mq_dequeue_from_ctx(), a request
    belonging to another hctx queue type can be dispatched to the current
    hctx if the read queue or poll queue is enabled.

    This patch fixes this issue by introducing per-queue-type list.

    Cc: Christoph Hellwig
    Signed-off-by: Ming Lei

    Changed by me to not use separately cacheline aligned lists, just
    place them all in the same cacheline where we had just the one list
    and lock before.

    Signed-off-by: Jens Axboe

    Ming Lei
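
A per-queue-type list can be sketched as an array of list heads indexed by hctx type. The following userspace C is a simplified illustration: the `demo_` types, the three queue types, and the singly linked LIFO list are assumptions chosen for brevity, and locking is omitted.

```c
#include <stddef.h>

enum demo_hctx_type {
	DEMO_TYPE_DEFAULT,
	DEMO_TYPE_READ,
	DEMO_TYPE_POLL,
	DEMO_MAX_TYPES
};

struct demo_rq {
	enum demo_hctx_type type;   /* queue type this request maps to */
	struct demo_rq *next;
};

/* Software queue with one pending list per hctx type. */
struct demo_ctx {
	struct demo_rq *rq_lists[DEMO_MAX_TYPES];
};

/* Queue a request on the list matching its own type (LIFO for brevity). */
static void demo_ctx_add(struct demo_ctx *ctx, struct demo_rq *rq)
{
	rq->next = ctx->rq_lists[rq->type];
	ctx->rq_lists[rq->type] = rq;
}

/* Detach and return every request queued for one hctx type only. */
static struct demo_rq *demo_ctx_flush(struct demo_ctx *ctx,
				      enum demo_hctx_type type)
{
	struct demo_rq *head = ctx->rq_lists[type];

	ctx->rq_lists[type] = NULL;
	return head;
}
```

Because a flush only touches the list for its own type, a request mapped to the poll queue can no longer be picked up by a default-type hctx.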
     

17 Dec, 2018

2 commits

  • Currently hctx->type is exported only via sysfs, and the hctx entry
    under debugfs carries no such info. We often use debugfs alone to
    diagnose queue mapping issues, so add the support in debugfs.

    Queue mapping has become a bit more complicated now that multiple
    queue mappings are supported; we may write a blktest to verify that
    queue mapping is valid based on blk-mq-debugfs.

    Since there is no need to export hctx->type twice, remove the export
    from sysfs.

    Cc: Jeff Moyer
    Cc: Mike Snitzer
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • blk-mq-debugfs has proved very helpful for debugging some tough
    issues, such as IO hangs.

    We have seen blk-wbt related IO hangs several times, and there is
    even such a report in Red Hat BZ that is not solved yet, so this
    patch adds debugfs support for rq_qos.

    Cc: Bart Van Assche
    Cc: Omar Sandoval
    Cc: Christoph Hellwig
    Cc: Josef Bacik
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

16 Nov, 2018

1 commit


09 Nov, 2018

1 commit


08 Nov, 2018

3 commits


26 Oct, 2018

1 commit

  • Dispatching a report zones command through the request queue is a major
    pain due to the command reply payload rewriting necessary. Given that
    blkdev_report_zones() is executing everything synchronously, implement
    report zones as a block device file operation instead, allowing major
    simplification of the code in many places.

    sd, null-blk, dm-linear and dm-flakey being the only block device
    drivers supporting exposing zoned block devices, these drivers are
    modified to provide the device side implementation of the
    report_zones() block device file operation.

    For device mappers, a new report_zones() target type operation is
    defined so that the upper block layer calls blkdev_report_zones() can
    be propagated down to the underlying devices of the dm targets.
    Implementation for this new operation is added to the dm-linear and
    dm-flakey targets.

    Reviewed-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig
    [Damien]
    * Changed method block_device argument to gendisk
    * Various bug fixes and improvements
    * Added support for null_blk, dm-linear and dm-flakey.
    Reviewed-by: Martin K. Petersen
    Reviewed-by: Mike Snitzer
    Signed-off-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

05 Oct, 2018

1 commit


27 Sep, 2018

1 commit

  • The RQF_PREEMPT flag is used for three purposes:
    - In the SCSI core, for making sure that power management requests
    are executed even if a device is in the "quiesced" state.
    - For domain validation by SCSI drivers that use the parallel port.
    - In the IDE driver, for IDE preempt requests.
    Rename "preempt-only" into "pm-only" because the primary purpose of
    this mode is power management. Since the power management core may
    but does not have to resume a runtime suspended device before
    performing system-wide suspend and since a later patch will set
    "pm-only" mode as long as a block device is runtime suspended, make
    it possible to set "pm-only" mode from more than one context. Since
    with this change scsi_device_quiesce() is no longer idempotent, make
    that function return early if it is called for a quiesced queue.

    Signed-off-by: Bart Van Assche
    Acked-by: Martin K. Petersen
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Ming Lei
    Cc: Jianchao Wang
    Cc: Johannes Thumshirn
    Cc: Alan Stern
    Signed-off-by: Jens Axboe

    Bart Van Assche
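
One way to let several contexts request "pm-only" mode independently is to replace a boolean flag with a nesting count. The sketch below is an assumption about the general technique, not a copy of the kernel change: the `demo_` names are hypothetical and the counter is a plain int with no atomicity or locking.

```c
#include <stdbool.h>

/* Queue state: pm_only counts how many contexts requested the mode. */
struct demo_queue {
	int pm_only;
};

/* Enter pm-only mode; may be called from more than one context. */
static void demo_set_pm_only(struct demo_queue *q)
{
	q->pm_only++;
}

/* Leave pm-only mode for one context; ignores unbalanced calls. */
static void demo_clear_pm_only(struct demo_queue *q)
{
	if (q->pm_only > 0)
		q->pm_only--;
}

/* The queue stays pm-only until every requester has cleared it. */
static bool demo_queue_pm_only(const struct demo_queue *q)
{
	return q->pm_only > 0;
}
```

With a count, setting the mode twice requires clearing it twice, which is why a caller such as a quiesce function has to return early when the queue is already quiesced instead of setting the mode again.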
     

09 Jul, 2018

3 commits

  • Dequeuing requests one by one from the sw queue is inefficient, but we
    have to do that when the queue is busy for better merge performance.

    This patch uses an Exponential Weighted Moving Average (EWMA) to
    figure out whether the queue is busy, and dequeues requests one by
    one from the sw queue only when it is.

    Fixes: b347689ffbca ("blk-mq-sched: improve dispatching from sw queue")
    Cc: Kashyap Desai
    Cc: Laurence Oberman
    Cc: Omar Sandoval
    Cc: Christoph Hellwig
    Cc: Bart Van Assche
    Cc: Hannes Reinecke
    Reported-by: Kashyap Desai
    Tested-by: Kashyap Desai
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • Exclude zoned block device members from struct request_queue for
    CONFIG_BLK_DEV_ZONED == n. Avoid breaking the build by only building
    the code that uses these struct request_queue members if
    CONFIG_BLK_DEV_ZONED != n.

    Signed-off-by: Bart Van Assche
    Reviewed-by: Damien Le Moal
    Cc: Matias Bjorling
    Cc: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • Since the implementation of blk_queue_nr_zones() is trivial and since
    it only has a single caller, inline this function.

    Signed-off-by: Bart Van Assche
    Reviewed-by: Damien Le Moal
    Cc: Matias Bjorling
    Cc: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Bart Van Assche
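
For the EWMA-based busy detection in the first commit of this group (Ming Lei), an integer moving average can be sketched as follows. The weight of 8, the scaling factor of 4 bits, and the `demo_` names are assumptions chosen for illustration; the commit message does not state the constants.

```c
#include <stdbool.h>

#define DEMO_EWMA_WEIGHT 8   /* assumed: new sample contributes 1/8 */
#define DEMO_EWMA_FACTOR 4   /* assumed: a busy sample is scaled to 16 */

struct demo_hctx {
	unsigned int dispatch_busy;   /* scaled moving average, 0 = idle */
};

/* Fold one dispatch outcome into the moving average. */
static void demo_update_busy(struct demo_hctx *h, bool busy)
{
	unsigned int ewma = h->dispatch_busy;

	if (!ewma && !busy)
		return;   /* already idle, nothing to decay */

	ewma *= DEMO_EWMA_WEIGHT - 1;
	if (busy)
		ewma += 1u << DEMO_EWMA_FACTOR;
	ewma /= DEMO_EWMA_WEIGHT;

	h->dispatch_busy = ewma;
}

/* Dequeue one request at a time only while the average is non-zero. */
static bool demo_dequeue_one_by_one(const struct demo_hctx *h)
{
	return h->dispatch_busy != 0;
}
```

A single busy dispatch raises the average immediately, while several consecutive non-busy dispatches are needed before it decays back to zero, so the queue does not flip between the two dequeue strategies on every sample.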
     

21 Jun, 2018

1 commit


29 May, 2018

1 commit

  • This patch simplifies the timeout handling by relying on the request
    reference counting to ensure the iterator is operating on an inflight
    and truly timed out request. Since the reference counting prevents the
    tag from being reallocated, the block layer no longer needs to prevent
    drivers from completing their requests while the timeout handler is
    operating on it: a driver completing a request is allowed to proceed to
    the next state without additional synchronization with the block layer.

    This also removes any need for generation sequence numbers since the
    request lifetime is prevented from being reallocated as a new sequence
    while timeout handling is operating on it.

    To enable this, a refcount is added to struct request so that request
    users can be sure they're operating on the same request without it
    changing while they're processing it. The request's tag won't be
    released for reuse until both the timeout handler and the completion
    are done with it.

    Signed-off-by: Keith Busch
    [hch: slight cleanups, added back submission side hctx lock, use cmpxchg
    for completions]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Keith Busch
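
The reference counting idea can be sketched with C11 atomics. This is a simplified userspace illustration, not the kernel implementation: the `demo_` names are hypothetical, and "releasing the tag" is reduced to setting a flag when the last reference is dropped.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct demo_request {
	atomic_int ref;      /* 0 means the request is not in flight */
	bool tag_released;   /* set once the last reference is dropped */
};

/* Start a request: the submission path holds the initial reference. */
static void demo_rq_start(struct demo_request *rq)
{
	atomic_init(&rq->ref, 1);
	rq->tag_released = false;
}

/* Timeout side: take a reference unless the count already hit zero. */
static bool demo_rq_get_unless_zero(struct demo_request *rq)
{
	int old = atomic_load(&rq->ref);

	while (old != 0) {
		/* On failure, old is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&rq->ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; whoever drops the last one releases the tag. */
static void demo_rq_put(struct demo_request *rq)
{
	if (atomic_fetch_sub(&rq->ref, 1) == 1)
		rq->tag_released = true;
}
```

If the timeout handler holds a reference while the driver completes the request, the completion's put does not release the tag; it is released only by the handler's own put, so the handler never sees the tag reused underneath it.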
     

10 Apr, 2018

1 commit

  • No driver uses this interface any more, so remove it.

    Cc: Stefan Haberland
    Tested-by: Christian Borntraeger
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

18 Mar, 2018

1 commit

  • Since commit 634f9e4631a8 ("blk-mq: remove REQ_ATOM_COMPLETE usages
    from blk-mq") blk_rq_is_complete() only reports whether or not a
    request has completed for legacy queues. Hence modify the
    blk-mq-debugfs code such that it shows the blk-mq request state
    again.

    Fixes: 634f9e4631a8 ("blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq")
    Signed-off-by: Bart Van Assche
    Cc: Tejun Heo
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

01 Mar, 2018

2 commits

  • When debugging the ZBC code in the mq-deadline scheduler it is very
    important to know which zones are locked and which zones are not
    locked. Hence this patch that exports the zone locking information
    through debugfs.

    Cc: Omar Sandoval
    Cc: Ming Lei
    Cc: Hannes Reinecke
    Cc: Johannes Thumshirn
    Reviewed-by: Damien Le Moal
    Tested-by: Damien Le Moal
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • Make sure that the queue show and store methods are contiguous and
    also that these appear in alphabetical order.

    Signed-off-by: Bart Van Assche
    Cc: Omar Sandoval
    Cc: Damien Le Moal
    Cc: Ming Lei
    Cc: Hannes Reinecke
    Cc: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

25 Jan, 2018

1 commit

  • Attributes that only implement .seq_ops are read-only, so any write to
    them should be rejected. But currently the kernel crashes when
    writing to such debugfs entries, e.g.

    chmod +w /sys/kernel/debug/block//requeue_list
    echo 0 > /sys/kernel/debug/block//requeue_list
    chmod -w /sys/kernel/debug/block//requeue_list

    Fix it by returning -EPERM in blk_mq_debugfs_write() when writing to
    such attributes.

    Cc: Ming Lei
    Signed-off-by: Eryu Guan
    Signed-off-by: Jens Axboe

    Eryu Guan
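
The fix amounts to checking for a write handler before calling it. Here is a userspace sketch under stated assumptions: `struct demo_attr`, `demo_debugfs_write()` and `demo_accept_all()` are hypothetical names, and the attribute is reduced to a name plus an optional write callback.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

struct demo_attr {
	const char *name;
	/* NULL for read-only attributes that only provide a show side. */
	ssize_t (*write)(const char *buf, size_t count);
};

/* Example write handler that accepts and consumes all input. */
static ssize_t demo_accept_all(const char *buf, size_t count)
{
	(void)buf;
	return (ssize_t)count;
}

/* Reject writes to attributes without a handler instead of calling NULL. */
static ssize_t demo_debugfs_write(const struct demo_attr *attr,
				  const char *buf, size_t count)
{
	if (!attr->write)
		return -EPERM;

	return attr->write(buf, count);
}
```

Without the check, the call through a NULL function pointer is the crash that the chmod and echo sequence above triggers.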