22 Sep, 2018

1 commit

  • Klaus Kusche reported that the I/O busy time in /proc/diskstats was not
    updating properly on 4.18. This is because we started using ktime to
    track elapsed time, and we convert nanoseconds to jiffies when we update
    the partition counter. However, this gets rounded down, so any I/Os that
    take less than a jiffy are not accounted for. Previously in this case,
    the value of jiffies would sometimes increment while we were doing I/O,
    so at least some I/Os were accounted for.

    Let's convert the stats to use nanoseconds internally. We still report
    milliseconds as before, now more accurately than ever. The value is
    still truncated to 32 bits for backwards compatibility.
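
    The rounding problem described above can be modelled in user space.
    This is only a sketch, not the kernel code: the tick rate (HZ = 250)
    and both function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_SEC   1000000000ULL
#define NSEC_PER_MSEC  1000000ULL
#define HZ             250ULL                 /* assumed tick rate */
#define NSEC_PER_TICK  (NSEC_PER_SEC / HZ)    /* 4 ms per jiffy at HZ=250 */

/* Old behaviour (modelled): round each I/O down to whole jiffies, then add. */
static uint64_t busy_ticks_rounded_per_io(const uint64_t *io_nsecs, size_t n)
{
    uint64_t ticks = 0;

    assert(io_nsecs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        ticks += io_nsecs[i] / NSEC_PER_TICK;  /* a sub-jiffy I/O adds 0 */
    return ticks;
}

/* New behaviour (modelled): accumulate nanoseconds, convert when reporting. */
static uint32_t busy_msecs_reported(const uint64_t *io_nsecs, size_t n)
{
    uint64_t nsecs = 0;

    assert(io_nsecs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        nsecs += io_nsecs[i];
    return (uint32_t)(nsecs / NSEC_PER_MSEC);  /* truncated to 32 bits */
}
```

    With a thousand I/Os of 100 microseconds each, the first function
    reports no busy time at all, while the second reports 100 ms.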

    Fixes: 522a777566f5 ("block: consolidate struct request timestamp fields")
    Cc: stable@vger.kernel.org
    Reported-by: Klaus Kusche
    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     

18 Jul, 2018

2 commits

  • Add tracking of REQ_OP_DISCARD ios to the partition statistics and
    append them to the various stat files in /sys as well as
    /proc/diskstats. These are tracked with the same four stats as reads
    and writes:

    Number of discard ios completed.
    Number of discard ios merged
    Number of discard sectors completed
    Milliseconds spent on discard requests

    This is done via adding a new STAT_DISCARD define to genhd.h and then
    using it to index that stat field for discard requests.
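
    A rough illustration of indexing the counters by stat group follows.
    The struct layout, NR_STAT_GROUPS, and account_completion() are
    hypothetical, and the value 2 for STAT_DISCARD is an assumption; only
    the value 1 for STAT_WRITE is stated in this history.

```c
#include <assert.h>
#include <stdint.h>

#define STAT_READ      0
#define STAT_WRITE     1
#define STAT_DISCARD   2   /* assumed value */
#define NR_STAT_GROUPS 3   /* hypothetical name for the array size */

/* Hypothetical, simplified stand-in for the per-partition counters. */
struct part_stats {
    uint64_t ios[NR_STAT_GROUPS];      /* requests completed */
    uint64_t merges[NR_STAT_GROUPS];   /* requests merged */
    uint64_t sectors[NR_STAT_GROUPS];  /* sectors completed */
    uint64_t msecs[NR_STAT_GROUPS];    /* milliseconds spent */
};

/* Account one completed request in the group it belongs to. */
static void account_completion(struct part_stats *s, int group,
                               uint64_t sectors, uint64_t msecs)
{
    assert(group >= 0 && group < NR_STAT_GROUPS);
    s->ios[group]++;
    s->sectors[group] += sectors;
    s->msecs[group] += msecs;
}
```

    Adding a further request type then only means adding one more index,
    rather than another set of named fields.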

    tj: Refreshed on top of v4.17 and other previous updates.

    Signed-off-by: Michael Callahan
    Signed-off-by: Tejun Heo
    Cc: Andy Newell
    Signed-off-by: Jens Axboe

    Michael Callahan
     
  • Add defines for STAT_READ and STAT_WRITE for indexing the partition
    stat entries. This clarifies some fs/ code which has hardcoded 1 for
    STAT_WRITE and will make it easier to extend the stats with additional
    fields.

    tj: Refreshed on top of v4.17.

    Signed-off-by: Michael Callahan
    Signed-off-by: Tejun Heo
    Cc: "Theodore Ts'o"
    Cc: Jaegeuk Kim
    Signed-off-by: Jens Axboe

    Michael Callahan
     

05 Jun, 2018

1 commit

  • Pull procfs updates from Al Viro:
    "Christoph's proc_create_... cleanups series"

    * 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (44 commits)
    xfs, proc: hide unused xfs procfs helpers
    isdn/gigaset: add back gigaset_procinfo assignment
    proc: update SIZEOF_PDE_INLINE_NAME for the new pde fields
    tty: replace ->proc_fops with ->proc_show
    ide: replace ->proc_fops with ->proc_show
    ide: remove ide_driver_proc_write
    isdn: replace ->proc_fops with ->proc_show
    atm: switch to proc_create_seq_private
    atm: simplify procfs code
    bluetooth: switch to proc_create_seq_data
    netfilter/x_tables: switch to proc_create_seq_private
    netfilter/xt_hashlimit: switch to proc_create_{seq,single}_data
    neigh: switch to proc_create_seq_data
    hostap: switch to proc_create_{seq,single}_data
    bonding: switch to proc_create_seq_data
    rtc/proc: switch to proc_create_single_data
    drbd: switch to proc_create_single
    resource: switch to proc_create_seq_data
    staging/rtl8192u: simplify procfs code
    jfs: simplify procfs code
    ...

    Linus Torvalds
     

25 May, 2018

1 commit

  • Convert the S_ symbolic permissions to their octal equivalents as
    using octal and not symbolic permissions is preferred by many as more
    readable.

    see: https://lkml.org/lkml/2016/8/2/1945

    Done with automated conversion via:
    $ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace

    Miscellanea:

    o Wrap modified multi-line calls to a single line where appropriate
    o Realign modified multi-line calls to open parenthesis

    Signed-off-by: Joe Perches
    Signed-off-by: Jens Axboe

    Joe Perches
     

16 May, 2018

1 commit


26 Apr, 2018

1 commit

  • When the blk-mq inflight implementation was added, /proc/diskstats was
    converted to use it, but /sys/block/$dev/inflight was not. Fix it by
    adding another helper to count in-flight requests by data direction.
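
    Counting in-flight requests by data direction could look roughly like
    the sketch below. The request_model struct, the DIR_* constants, and
    the function name are hypothetical stand-ins, not the blk-mq helper
    itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { DIR_READ = 0, DIR_WRITE = 1 };   /* hypothetical direction indices */

/* Hypothetical stand-in for a request slot. */
struct request_model {
    bool busy;   /* request is currently in flight */
    int  dir;    /* DIR_READ or DIR_WRITE */
};

/* Fill inflight[DIR_READ] and inflight[DIR_WRITE] from the busy requests. */
static void count_inflight_by_dir(const struct request_model *rqs, size_t n,
                                  unsigned int inflight[2])
{
    inflight[DIR_READ] = 0;
    inflight[DIR_WRITE] = 0;
    for (size_t i = 0; i < n; i++) {
        if (!rqs[i].busy)
            continue;
        assert(rqs[i].dir == DIR_READ || rqs[i].dir == DIR_WRITE);
        inflight[rqs[i].dir]++;
    }
}
```

    The two resulting numbers correspond to the read and write counts
    that /sys/block/$dev/inflight is expected to show.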

    Fixes: f299b7c7a9de ("blk-mq: provide internal in-flight variant")
    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     

16 Mar, 2018

1 commit

  • register_blkdev() and __register_chrdev_region() treat the major
    number as an unsigned int. So print it the same way to avoid
    absurd error statements such as:
    "... major requested (-1) is greater than the maximum (511) ..."
    (and also fix off-by-one bugs in the error prints).
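
    The effect of the format change can be sketched in user space. The
    helper name is hypothetical and the message text is abbreviated; the
    limit of 512 (so a maximum major of 511) is taken from the entry for
    17 Jul, 2017 in this history.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BLKDEV_MAJOR_MAX 512u   /* majors 0..511 are valid in this sketch */

/* Format the error with %u so a huge major is never shown as negative. */
static int format_major_error(char *buf, size_t len, unsigned int major)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len,
                    "major requested (%u) is greater than the maximum (%u)",
                    major, BLKDEV_MAJOR_MAX - 1);
}
```

    Passing (unsigned int)-1 now yields a large positive number in the
    message instead of "-1".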

    While at it, also update the comment describing register_blkdev().

    Signed-off-by: Srivatsa S. Bhat
    Reviewed-by: Logan Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Srivatsa S. Bhat
     

27 Feb, 2018

4 commits

  • When two blkdev_open() calls for a partition race with device removal
    and recreation, we can hit BUG_ON(!bd_may_claim(bdev, whole, holder)) in
    blkdev_open(). The race can happen as follows:

    CPU0                            CPU1                            CPU2
                                                                    del_gendisk()
                                                                      bdev_unhash_inode(part1);

    blkdev_open(part1, O_EXCL)      blkdev_open(part1, O_EXCL)
      bdev = bd_acquire()             bdev = bd_acquire()
      blkdev_get(bdev)
        bd_start_claiming(bdev)
          - finds old inode 'whole'
          bd_prepare_to_claim() -> 0
                                                                      bdev_unhash_inode(whole);
                                                                    (device removed and recreated)

                                      blkdev_get(bdev);
                                        bd_start_claiming(bdev)
                                          - finds new inode 'whole'
                                          bd_prepare_to_claim()
                                            - this also succeeds as we have
                                              different 'whole' here...
                                            - bad things happen now as we
                                              have two exclusive openers of
                                              the same bdev

    The problem here is that block device opens can see various intermediate
    states while gendisk is shutting down and then being recreated.

    We fix the problem by introducing new lookup_sem in gendisk that
    synchronizes gendisk deletion with get_gendisk() and furthermore by
    making sure that get_gendisk() does not return gendisk that is being (or
    has been) deleted. This makes sure that once we ever manage to look up
    newly created bdev inode, we are also guaranteed that following
    get_gendisk() will either return failure (and we fail open) or it
    returns gendisk for the new device and following bdget_disk() will
    return new bdev inode (i.e., blkdev_open() follows the path as if it is
    completely run after new device is created).

    Reported-and-analyzed-by: Hou Tao
    Tested-by: Hou Tao
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     
  • Add a proper counterpart to get_disk_and_module() -
    put_disk_and_module(). Currently it is opencoded in several places.
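
    The pairing can be modelled with two plain counters. Everything
    below is a hypothetical user-space model (the *_model names are made
    up); it only illustrates why put_disk() alone does not undo
    get_disk_and_module().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct module_model { int refs; bool live; };
struct disk_model   { int kobj_refs; struct module_model *owner; };

/* Take a module reference and a disk reference, or neither. */
static struct disk_model *get_disk_and_module_model(struct disk_model *disk)
{
    assert(disk != NULL && disk->owner != NULL);
    if (!disk->owner->live)
        return NULL;
    disk->owner->refs++;
    disk->kobj_refs++;
    return disk;
}

/* Drops only the disk reference. */
static void put_disk_model(struct disk_model *disk)
{
    assert(disk->kobj_refs > 0);
    disk->kobj_refs--;
}

/* Proper counterpart: drops the disk reference and the module reference. */
static void put_disk_and_module_model(struct disk_model *disk)
{
    /* Read the owner first: the disk may go away with its last reference. */
    struct module_model *owner = disk->owner;

    put_disk_model(disk);
    assert(owner->refs > 0);
    owner->refs--;
}
```

    In this model, calling put_disk_model() after a successful get
    leaves the module count raised, which is the kind of leak the
    open-coded call sites had to avoid by hand.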

    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     
  • Rename get_disk() to get_disk_and_module() to make clear what the
    function does. It's not a great name but at least it is now clear that
    put_disk() is not its counterpart.

    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     
  • Commit 8ddcd653257c "block: introduce GENHD_FL_HIDDEN" added handling of
    hidden devices to get_gendisk() but forgot to drop module reference
    which is also acquired by get_disk(). Drop the reference as necessary.

    Arguably the function naming here is misleading as put_disk() is *not*
    the counterpart of get_disk() but let's fix that in the follow up
    commit since that will be more intrusive.

    Fixes: 8ddcd653257c18a669fcb75ee42c37054908e0d6
    CC: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     

15 Jan, 2018

2 commits

  • Since I can remember DM has forced the block layer to allow the
    allocation and initialization of the request_queue to be distinct
    operations. Reason for this is block/genhd.c:add_disk() requires
    that the request_queue (and associated bdi) be tied to the gendisk
    before add_disk() is called -- because add_disk() also deals with
    exposing the request_queue via blk_register_queue().

    DM's dynamic creation of arbitrary device types (and associated
    request_queue types) requires the DM device's gendisk be available so
    that DM table loads can establish a master/slave relationship with
    subordinate devices that are referenced by loaded DM tables -- using
    bd_link_disk_holder(). But until these DM tables, and their associated
    subordinate devices, are known DM cannot know what type of request_queue
    it needs -- nor what its queue_limits should be.

    This chicken and egg scenario has created all manner of problems for DM
    and, at times, the block layer.

    Summary of changes:

    - Add device_add_disk_no_queue_reg() and add_disk_no_queue_reg() variant
    that drivers may use to add a disk without also calling
    blk_register_queue(). Driver must call blk_register_queue() once its
    request_queue is fully initialized.

    - Return early from blk_unregister_queue() if QUEUE_FLAG_REGISTERED
    is not set. It won't be set if driver used add_disk_no_queue_reg()
    but driver encounters an error and must del_gendisk() before calling
    blk_register_queue().

    - Export blk_register_queue().

    These changes allow DM to use add_disk_no_queue_reg() to anchor its
    gendisk as the "master" for master/slave relationships DM must establish
    with subordinate devices referenced in DM tables that get loaded. Once
    all "slave" devices for a DM device are known its request_queue can be
    properly initialized and then advertised via sysfs -- important
    improvement being that no request_queue resource initialization
    performed by blk_register_queue() is missed for DM devices anymore.
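
    The early return on a queue that was never registered can be
    sketched as follows. The struct, the bit value, and the *_model
    function names are hypothetical; only the flag name comes from the
    summary above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_FLAG_REGISTERED_BIT (1u << 0)   /* hypothetical bit value */

/* Hypothetical stand-in for a request_queue. */
struct queue_model {
    unsigned int flags;
    int sysfs_entries;    /* what registration creates */
};

static void register_queue_model(struct queue_model *q)
{
    assert(q != NULL);
    q->sysfs_entries++;
    q->flags |= QUEUE_FLAG_REGISTERED_BIT;
}

/* Returns false and does nothing if the queue was never registered. */
static bool unregister_queue_model(struct queue_model *q)
{
    assert(q != NULL);
    if (!(q->flags & QUEUE_FLAG_REGISTERED_BIT))
        return false;
    q->flags &= ~QUEUE_FLAG_REGISTERED_BIT;
    q->sysfs_entries--;
    return true;
}
```

    A driver that adds its disk without queue registration and then
    hits an error can therefore tear the disk down without unbalancing
    anything.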

    Signed-off-by: Mike Snitzer
    Reviewed-by: Ming Lei
    Signed-off-by: Jens Axboe

    Mike Snitzer
     
  • device_add_disk() will only call bdi_register_owner() if
    !GENHD_FL_HIDDEN, so it follows that del_gendisk() should only call
    bdi_unregister() if !GENHD_FL_HIDDEN.

    Found with code inspection. bdi_unregister() won't do any harm if
    bdi_register_owner() wasn't used but best to avoid the unnecessary
    call to bdi_unregister().

    Fixes: 8ddcd65325 ("block: introduce GENHD_FL_HIDDEN")
    Signed-off-by: Mike Snitzer
    Reviewed-by: Ming Lei
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Snitzer
     

20 Nov, 2017

2 commits


15 Nov, 2017

1 commit

  • Pull core block layer updates from Jens Axboe:
    "This is the main pull request for block storage for 4.15-rc1.

    Nothing out of the ordinary in here, and no API changes or anything
    like that. Just various new features for drivers, core changes, etc.
    In particular, this pull request contains:

    - A patch series from Bart, closing the hole on blk/scsi-mq queue
    quiescing.

    - A series from Christoph, building towards hidden gendisks (for
    multipath) and ability to move bio chains around.

    - NVMe
    - Support for native multipath for NVMe (Christoph).
    - Userspace notifications for AENs (Keith).
    - Command side-effects support (Keith).
    - SGL support (Chaitanya Kulkarni)
    - FC fixes and improvements (James Smart)
    - Lots of fixes and tweaks (Various)

    - bcache
    - New maintainer (Michael Lyle)
    - Writeback control improvements (Michael)
    - Various fixes (Coly, Elena, Eric, Liang, et al)

    - lightnvm updates, mostly centered around the pblk interface
    (Javier, Hans, and Rakesh).

    - Removal of unused bio/bvec kmap atomic interfaces (me, Christoph)

    - Writeback series that fix the much discussed hundreds of millions
    of sync-all units. This goes all the way, as discussed previously
    (me).

    - Fix for missing wakeup on writeback timer adjustments (Yafang
    Shao).

    - Fix laptop mode on blk-mq (me).

    - {mq,name} tuple lookup for IO schedulers, allowing us to have
    alias names. This means you can use 'deadline' on both !mq and on
    mq (where it's called mq-deadline). (me).

    - blktrace race fix, oopsing on sg load (me).

    - blk-mq optimizations (me).

    - Obscure waitqueue race fix for kyber (Omar).

    - NBD fixes (Josef).

    - Disable writeback throttling by default on bfq, like we do on cfq
    (Luca Miccio).

    - Series from Ming that enable us to treat flush requests on blk-mq
    like any other request. This is a really nice cleanup.

    - Series from Ming that improves merging on blk-mq with schedulers,
    getting us closer to flipping the switch on scsi-mq again.

    - BFQ updates (Paolo).

    - blk-mq atomic flags memory ordering fixes (Peter Z).

    - Loop cgroup support (Shaohua).

    - Lots of minor fixes from lots of different folks, both for core and
    driver code"

    * 'for-4.15/block' of git://git.kernel.dk/linux-block: (294 commits)
    nvme: fix visibility of "uuid" ns attribute
    blk-mq: fixup some comment typos and lengths
    ide: ide-atapi: fix compile error with defining macro DEBUG
    blk-mq: improve tag waiting setup for non-shared tags
    brd: remove unused brd_mutex
    blk-mq: only run the hardware queue if IO is pending
    block: avoid null pointer dereference on null disk
    fs: guard_bio_eod() needs to consider partitions
    xtensa/simdisk: fix compile error
    nvme: expose subsys attribute to sysfs
    nvme: create 'slaves' and 'holders' entries for hidden controllers
    block: create 'slaves' and 'holders' entries for hidden gendisks
    nvme: also expose the namespace identification sysfs files for mpath nodes
    nvme: implement multipath access to nvme subsystems
    nvme: track shared namespaces
    nvme: introduce a nvme_ns_ids structure
    nvme: track subsystems
    block, nvme: Introduce blk_mq_req_flags_t
    block, scsi: Make SCSI quiesce and resume work reliably
    block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
    ...

    Linus Torvalds
     

11 Nov, 2017

2 commits

  • It is possible that the pointer disk can be null and hence
    we can get a null pointer dereference when accessing disk->flags.
    Add a null pointer check to avoid the dereference.

    Detected by CoverityScan, CID#1461133 ("Explicit null dereferenced")

    Fixes: 8ddcd653257c ("block: introduce GENHD_FL_HIDDEN")
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Colin Ian King
    Signed-off-by: Jens Axboe

    Colin Ian King
     
  • When creating nvme multipath devices we should populate the 'slaves' and
    'holders' directories properly to aid userspace topology detection.

    Signed-off-by: Hannes Reinecke
    [hch: split from a larger patch]
    Reviewed-by: Keith Busch
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Hannes Reinecke
     

04 Nov, 2017

2 commits

  • With this flag a driver can create a gendisk that can be used for I/O
    submission inside the kernel, but which is not registered as user
    facing block device. This will be useful for the NVMe multipath
    implementation.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • The hidden gendisks introduced in the next patch need to keep the dev
    field in their struct device empty so that udev won't try to create
    block device nodes for them. To support that rewrite disk_devt to
    look at the major and first_minor fields in the gendisk itself instead
    of looking into the struct device.
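
    Deriving the device number from the gendisk's own fields can be
    sketched like this. The gendisk_model struct and disk_devt_model()
    are hypothetical; the 20-bit minor split mirrors the kernel's
    internal dev_t encoding.

```c
#include <assert.h>
#include <stdint.h>

#define MINORBITS 20
#define MINORMASK ((1u << MINORBITS) - 1)
#define MKDEV(ma, mi) (((uint32_t)(ma) << MINORBITS) | (uint32_t)(mi))
#define MAJOR(dev)    ((unsigned int)((dev) >> MINORBITS))
#define MINOR(dev)    ((unsigned int)((dev) & MINORMASK))

/* Hypothetical stand-in holding only the two fields that matter here. */
struct gendisk_model {
    int major;
    int first_minor;
};

/* Build the device number from the disk itself, not from a struct device. */
static uint32_t disk_devt_model(const struct gendisk_model *disk)
{
    assert(disk->major >= 0 && disk->first_minor >= 0);
    return MKDEV(disk->major, disk->first_minor);
}
```

    Because nothing is read from the struct device, its dev field can
    stay empty for hidden disks.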

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

26 Oct, 2017

1 commit

  • Darrick posted the following warning and Dave Chinner analyzed it:

    > ======================================================
    > WARNING: possible circular locking dependency detected
    > 4.14.0-rc1-fixes #1 Tainted: G W
    > ------------------------------------------------------
    > loop0/31693 is trying to acquire lock:
    > (&(&ip->i_mmaplock)->mr_lock){++++}, at: [] xfs_ilock+0x23c/0x330 [xfs]
    >
    > but now in release context of a crosslock acquired at the following:
    > ((complete)&ret.event){+.+.}, at: [] submit_bio_wait+0x7f/0xb0
    >
    > which lock already depends on the new lock.
    >
    > the existing dependency chain (in reverse order) is:
    >
    > -> #2 ((complete)&ret.event){+.+.}:
    > lock_acquire+0xab/0x200
    > wait_for_completion_io+0x4e/0x1a0
    > submit_bio_wait+0x7f/0xb0
    > blkdev_issue_zeroout+0x71/0xa0
    > xfs_bmapi_convert_unwritten+0x11f/0x1d0 [xfs]
    > xfs_bmapi_write+0x374/0x11f0 [xfs]
    > xfs_iomap_write_direct+0x2ac/0x430 [xfs]
    > xfs_file_iomap_begin+0x20d/0xd50 [xfs]
    > iomap_apply+0x43/0xe0
    > dax_iomap_rw+0x89/0xf0
    > xfs_file_dax_write+0xcc/0x220 [xfs]
    > xfs_file_write_iter+0xf0/0x130 [xfs]
    > __vfs_write+0xd9/0x150
    > vfs_write+0xc8/0x1c0
    > SyS_write+0x45/0xa0
    > entry_SYSCALL_64_fastpath+0x1f/0xbe
    >
    > -> #1 (&xfs_nondir_ilock_class){++++}:
    > lock_acquire+0xab/0x200
    > down_write_nested+0x4a/0xb0
    > xfs_ilock+0x263/0x330 [xfs]
    > xfs_setattr_size+0x152/0x370 [xfs]
    > xfs_vn_setattr+0x6b/0x90 [xfs]
    > notify_change+0x27d/0x3f0
    > do_truncate+0x5b/0x90
    > path_openat+0x237/0xa90
    > do_filp_open+0x8a/0xf0
    > do_sys_open+0x11c/0x1f0
    > entry_SYSCALL_64_fastpath+0x1f/0xbe
    >
    > -> #0 (&(&ip->i_mmaplock)->mr_lock){++++}:
    > up_write+0x1c/0x40
    > xfs_iunlock+0x1d0/0x310 [xfs]
    > xfs_file_fallocate+0x8a/0x310 [xfs]
    > loop_queue_work+0xb7/0x8d0
    > kthread_worker_fn+0xb9/0x1f0
    >
    > Chain exists of:
    > &(&ip->i_mmaplock)->mr_lock --> &xfs_nondir_ilock_class --> (complete)&ret.event
    >
    > Possible unsafe locking scenario by crosslock:
    >
    > CPU0 CPU1
    > ---- ----
    > lock(&xfs_nondir_ilock_class);
    > lock((complete)&ret.event);
    > lock(&(&ip->i_mmaplock)->mr_lock);
    > unlock((complete)&ret.event);
    >
    > *** DEADLOCK ***

    The warning is a false positive, caused by the fact that all
    wait_for_completion()s in submit_bio_wait() are waiting with the same
    lock class.

    However, some bios have nothing to do with others, for example in the case
    of loop devices, there's no direct connection between the bios of an upper
    device and the bios of a lower device(=loop device).

    The safest way to assign different lock classes to different devices is
    to do it for each gendisk. In other words, this patch assigns a
    lockdep_map per gendisk and uses it when initializing completion in
    submit_bio_wait().

    Analyzed-by: Dave Chinner
    Reported-by: Darrick J. Wong
    Signed-off-by: Byungchul Park
    Reviewed-by: Jens Axboe
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: amir73il@gmail.com
    Cc: axboe@kernel.dk
    Cc: david@fromorbit.com
    Cc: hch@infradead.org
    Cc: idryomov@gmail.com
    Cc: johan@kernel.org
    Cc: johannes.berg@intel.com
    Cc: kernel-team@lge.com
    Cc: linux-block@vger.kernel.org
    Cc: linux-fsdevel@vger.kernel.org
    Cc: linux-mm@kvack.org
    Cc: linux-xfs@vger.kernel.org
    Cc: oleg@redhat.com
    Cc: tj@kernel.org
    Link: http://lkml.kernel.org/r/1508921765-15396-10-git-send-email-byungchul.park@lge.com
    Signed-off-by: Ingo Molnar

    Byungchul Park
     

08 Sep, 2017

1 commit

  • Pull block layer updates from Jens Axboe:
    "This is the first pull request for 4.14, containing most of the code
    changes. It's a quiet series this round, which I think we needed after
    the churn of the last few series. This contains:

    - Fix for a registration race in loop, from Anton Volkov.

    - Overflow complaint fix from Arnd for DAC960.

    - Series of drbd changes from the usual suspects.

    - Conversion of the stec/skd driver to blk-mq. From Bart.

    - A few BFQ improvements/fixes from Paolo.

    - CFQ improvement from Ritesh, allowing idling for group idle.

    - A few fixes found by Dan's smatch, courtesy of Dan.

    - A warning fixup for a race between changing the IO scheduler and
    device removal. From David Jeffery.

    - A few nbd fixes from Josef.

    - Support for cgroup info in blktrace, from Shaohua.

    - Also from Shaohua, new features in the null_blk driver to allow it
    to actually hold data, among other things.

    - Various corner cases and error handling fixes from Weiping Zhang.

    - Improvements to the IO stats tracking for blk-mq from me. Can
    drastically improve performance for fast devices and/or big
    machines.

    - Series from Christoph removing bi_bdev as being needed for IO
    submission, in preparation for nvme multipathing code.

    - Series from Bart, including various cleanups and fixes for switch
    fall through case complaints"

    * 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
    kernfs: checking for IS_ERR() instead of NULL
    drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
    drbd: Fix allyesconfig build, fix recent commit
    drbd: switch from kmalloc() to kmalloc_array()
    drbd: abort drbd_start_resync if there is no connection
    drbd: move global variables to drbd namespace and make some static
    drbd: rename "usermode_helper" to "drbd_usermode_helper"
    drbd: fix race between handshake and admin disconnect/down
    drbd: fix potential deadlock when trying to detach during handshake
    drbd: A single dot should be put into a sequence.
    drbd: fix rmmod cleanup, remove _all_ debugfs entries
    drbd: Use setup_timer() instead of init_timer() to simplify the code.
    drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
    drbd: new disk-option disable-write-same
    drbd: Fix resource role for newly created resources in events2
    drbd: mark symbols static where possible
    drbd: Send P_NEG_ACK upon write error in protocol != C
    drbd: add explicit plugging when submitting batches
    drbd: change list_for_each_safe to while(list_first_entry_or_null)
    drbd: introduce drbd_recv_header_maybe_unplug
    ...

    Linus Torvalds
     

24 Aug, 2017

2 commits


18 Aug, 2017

1 commit

  • Annotate gendisk.part_tbl and disk_part_tbl.part dereferences with
    rcu_dereference_protected(). This patch does not change the behavior
    of the modified code but ensures that sparse does not complain about
    disk->part_tbl manipulations nor about part_tbl->part accesses.
    Additionally, improve documentation of the locking requirements of
    the modified functions.

    Signed-off-by: Bart Van Assche
    Reviewed-by: Hannes Reinecke
    Cc: Tejun Heo
    Cc: Jan Kara
    Cc: Dan Williams
    Cc: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

10 Aug, 2017

3 commits

  • We don't have to inc/dec some counter, since we can just
    iterate the tags. That makes inc/dec a noop, but means we
    have to iterate busy tags to get an in-flight count.

    Reviewed-by: Bart Van Assche
    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Instead of returning the count that matches the partition, pass
    in an array of two ints. Index 0 will be filled with the inflight
    count for the partition in question, and index 1 will be filled
    with the root inflight count, if the partition passed in is not the
    root.

    This is in preparation for being able to calculate both in one
    go.
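
    The two-element result can be illustrated with a small model that
    walks a set of request slots. The rq_model struct and the function
    name are hypothetical, and partition number 0 is assumed to stand
    for the root (whole) device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request slot. */
struct rq_model {
    bool busy;     /* request is currently in flight */
    int  partno;   /* partition it was issued to; 0 = whole device */
};

/*
 * inflight[0]: busy requests issued to partition 'partno'.
 * inflight[1]: all busy requests, filled only when 'partno' is not the root.
 */
static void count_in_flight(const struct rq_model *rqs, size_t n, int partno,
                            unsigned int inflight[2])
{
    assert(partno >= 0);
    inflight[0] = 0;
    inflight[1] = 0;
    for (size_t i = 0; i < n; i++) {
        if (!rqs[i].busy)
            continue;
        if (rqs[i].partno == partno)
            inflight[0]++;
        if (partno != 0)
            inflight[1]++;
    }
}
```

    A single pass over the requests fills both counts, which is the
    point of passing the array in.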

    Reviewed-by: Bart Van Assche
    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • No functional change in this patch, just in preparation for
    basing the inflight mechanism on the queue in question.

    Reviewed-by: Bart Van Assche
    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jens Axboe
     

17 Jul, 2017

1 commit

  • Presently, the order of the block devices listed in /proc/devices is not
    entirely sequential. If a block device has a major number greater than
    BLKDEV_MAJOR_HASH_SIZE (255), it will be ordered as if its major were
    modulo 255. For example, 511 appears after 1.

    This patch cleans that up and prints each major number in the correct
    order, regardless of where they are stored in the hash table.

    In order to do this, we introduce BLKDEV_MAJOR_MAX as an artificial
    limit (chosen to be 512). It will then print all devices in major
    order number from 0 to the maximum.
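
    The difference between the two listing orders can be reproduced
    with a small hash table. This is a sketch: the node layout and the
    register_major(), list_by_bucket(), and list_by_major() helpers are
    hypothetical, while the table size of 255 and the limit of 512 come
    from the description above.

```c
#include <assert.h>
#include <stddef.h>

#define BLKDEV_MAJOR_HASH_SIZE 255
#define BLKDEV_MAJOR_MAX       512

struct blk_major_name {
    struct blk_major_name *next;
    unsigned int major;
};

static struct blk_major_name *major_names[BLKDEV_MAJOR_HASH_SIZE];

static unsigned int major_to_index(unsigned int major)
{
    return major % BLKDEV_MAJOR_HASH_SIZE;
}

/* Append a caller-provided node to the chain for its bucket. */
static void register_major(struct blk_major_name *node, unsigned int major)
{
    struct blk_major_name **p = &major_names[major_to_index(major)];

    assert(major < BLKDEV_MAJOR_MAX);
    node->major = major;
    node->next = NULL;
    while (*p)
        p = &(*p)->next;
    *p = node;
}

/* Old listing: walk the buckets in index order. */
static size_t list_by_bucket(unsigned int *out, size_t cap)
{
    size_t n = 0;

    for (unsigned int i = 0; i < BLKDEV_MAJOR_HASH_SIZE; i++)
        for (struct blk_major_name *p = major_names[i]; p && n < cap; p = p->next)
            out[n++] = p->major;
    return n;
}

/* New listing: walk majors in numeric order, picking matches per bucket. */
static size_t list_by_major(unsigned int *out, size_t cap)
{
    size_t n = 0;

    for (unsigned int major = 0; major < BLKDEV_MAJOR_MAX; major++)
        for (struct blk_major_name *p = major_names[major_to_index(major)];
             p && n < cap; p = p->next)
            if (p->major == major)
                out[n++] = p->major;
    return n;
}
```

    With majors 1, 8, 259, and 511 registered, the bucket walk places
    511 directly after 1, while the per-major walk returns them in
    numeric order.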

    Signed-off-by: Logan Gunthorpe
    Cc: Jens Axboe
    Cc: Jeff Layton
    Cc: "J. Bruce Fields"
    Signed-off-by: Greg Kroah-Hartman

    Logan Gunthorpe
     

21 Jun, 2017

1 commit

  • The variable 'disk_type' is never modified so constify it.

    Signed-off-by: Bart Van Assche
    Reviewed-by: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Omar Sandoval
    Cc: Ming Lei
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

03 May, 2017

1 commit

  • Pull documentation update from Jonathan Corbet:
    "A reasonably busy cycle for documentation this time around. There is a
    new guide for user-space API documents, rather sparsely populated at
    the moment, but it's a start. Markus improved the infrastructure for
    converting diagrams. Mauro has converted much of the USB documentation
    over to RST. Plus the usual set of fixes, improvements, and tweaks.

    There's a bit more than the usual amount of reaching out of
    Documentation/ to fix comments elsewhere in the tree; I have acks for
    those where I could get them"

    * tag 'docs-4.12' of git://git.lwn.net/linux: (74 commits)
    docs: Fix a couple typos
    docs: Fix a spelling error in vfio-mediated-device.txt
    docs: Fix a spelling error in ioctl-number.txt
    MAINTAINERS: update file entry for HSI subsystem
    Documentation: allow installing man pages to a user defined directory
    Doc/PM: Sync with intel_powerclamp code behavior
    zr364xx.rst: usb/devices is now at /sys/kernel/debug/
    usb.rst: move documentation from proc_usb_info.txt to USB ReST book
    convert philips.txt to ReST and add to media docs
    docs-rst: usb: update old usbfs-related documentation
    arm: Documentation: update a path name
    docs: process/4.Coding.rst: Fix a couple of document refs
    docs-rst: fix usb cross-references
    usb: gadget.h: be consistent at kernel doc macros
    usb: composite.h: fix two warnings when building docs
    usb: get rid of some ReST doc build errors
    usb.rst: get rid of some Sphinx errors
    usb/URB.txt: convert to ReST and update it
    usb/persist.txt: convert to ReST and add to driver-api book
    usb/hotplug.txt: convert to ReST and add to driver-api book
    ...

    Linus Torvalds
     

28 Apr, 2017

1 commit

  • Commit 99e6608c9e74 "block: Add badblock management for gendisks"
    allowed for drivers like pmem and software-raid to advertise a list of
    bad media areas. However, it inadvertently added a 'badblocks' attribute
    to all block devices. Let's clean this up by having the 'badblocks' attribute
    not be visible when the driver has not populated a 'struct badblocks'
    instance in the gendisk.
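
    A visibility check of this kind could look roughly like the sketch
    below. All types and the disk_attr_visible() name are hypothetical
    user-space stand-ins; a return value of 0 is taken to mean "do not
    expose the attribute".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct badblocks_model { int count; };
struct disk_model      { struct badblocks_model *bb; };
struct attr_model      { const char *name; unsigned short mode; };

/* Return the mode to expose, or 0 to hide the attribute. */
static unsigned short disk_attr_visible(const struct disk_model *disk,
                                        const struct attr_model *attr)
{
    assert(disk != NULL && attr != NULL);
    if (strcmp(attr->name, "badblocks") == 0 && disk->bb == NULL)
        return 0;   /* driver never populated a bad-block list */
    return attr->mode;
}
```

    Other attributes are unaffected; only 'badblocks' depends on
    whether the driver supplied a list.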

    Cc: Jens Axboe
    Cc: Christoph Hellwig
    Cc: Martin K. Petersen
    Reported-by: Vishal Verma
    Signed-off-by: Dan Williams
    Tested-by: Vishal Verma
    Signed-off-by: Jens Axboe

    Dan Williams
     

03 Apr, 2017

1 commit

  • ./lib/string.c:134: WARNING: Inline emphasis start-string without end-string.
    ./mm/filemap.c:522: WARNING: Inline interpreted text or phrase reference start-string without end-string.
    ./mm/filemap.c:1283: ERROR: Unexpected indentation.
    ./mm/filemap.c:3003: WARNING: Inline interpreted text or phrase reference start-string without end-string.
    ./mm/vmalloc.c:1544: WARNING: Inline emphasis start-string without end-string.
    ./mm/page_alloc.c:4245: ERROR: Unexpected indentation.
    ./ipc/util.c:676: ERROR: Unexpected indentation.
    ./drivers/pci/irq.c:35: WARNING: Block quote ends without a blank line; unexpected unindent.
    ./security/security.c:109: ERROR: Unexpected indentation.
    ./security/security.c:110: WARNING: Definition list ends without a blank line; unexpected unindent.
    ./block/genhd.c:275: WARNING: Inline strong start-string without end-string.
    ./block/genhd.c:283: WARNING: Inline strong start-string without end-string.
    ./include/linux/clk.h:134: WARNING: Inline emphasis start-string without end-string.
    ./include/linux/clk.h:134: WARNING: Inline emphasis start-string without end-string.
    ./ipc/util.c:477: ERROR: Unknown target name: "s".

    Signed-off-by: Mauro Carvalho Chehab
    Acked-by: Bjorn Helgaas
    Signed-off-by: Jonathan Corbet

    mchehab@s-opensource.com
     

23 Mar, 2017

1 commit

  • When device open races with device shutdown, we can get the following
    oops in scsi_disk_get():

    [11863.044351] general protection fault: 0000 [#1] SMP
    [11863.045561] Modules linked in: scsi_debug xfs libcrc32c netconsole btrfs raid6_pq zlib_deflate lzo_compress xor [last unloaded: loop]
    [11863.047853] CPU: 3 PID: 13042 Comm: hald-probe-stor Tainted: G W 4.10.0-rc2-xen+ #35
    [11863.048030] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    [11863.048030] task: ffff88007f438200 task.stack: ffffc90000fd0000
    [11863.048030] RIP: 0010:scsi_disk_get+0x43/0x70
    [11863.048030] RSP: 0018:ffffc90000fd3a08 EFLAGS: 00010202
    [11863.048030] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007f56d000 RCX: 0000000000000000
    [11863.048030] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffffff81a8d880
    [11863.048030] RBP: ffffc90000fd3a18 R08: 0000000000000000 R09: 0000000000000001
    [11863.059217] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffffa
    [11863.059217] R13: ffff880078872800 R14: ffff880070915540 R15: 000000000000001d
    [11863.059217] FS: 00007f2611f71800(0000) GS:ffff88007f0c0000(0000) knlGS:0000000000000000
    [11863.059217] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [11863.059217] CR2: 000000000060e048 CR3: 00000000778d4000 CR4: 00000000000006e0
    [11863.059217] Call Trace:
    [11863.059217] ? disk_get_part+0x22/0x1f0
    [11863.059217] sd_open+0x39/0x130
    [11863.059217] __blkdev_get+0x69/0x430
    [11863.059217] ? bd_acquire+0x7f/0xc0
    [11863.059217] ? bd_acquire+0x96/0xc0
    [11863.059217] ? blkdev_get+0x350/0x350
    [11863.059217] blkdev_get+0x126/0x350
    [11863.059217] ? _raw_spin_unlock+0x2b/0x40
    [11863.059217] ? bd_acquire+0x7f/0xc0
    [11863.059217] ? blkdev_get+0x350/0x350
    [11863.059217] blkdev_open+0x65/0x80
    ...

    As you can see RAX value is already poisoned showing that gendisk we got
    is already freed. The problem is that get_gendisk() looks up device
    number in ext_devt_idr and then does get_disk() which does kobject_get()
    on the disk's kobject. However the disk gets removed from ext_devt_idr
    only in disk_release() (through blk_free_devt()) at which moment it has
    already 0 refcount and is already on its way to be freed. Indeed we've
    got a warning from kobject_get() about 0 refcount shortly before the
    oops.

    We fix the problem by using kobject_get_unless_zero() in get_disk() so
    that get_disk() cannot get reference on a disk that is already being
    freed.
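
    The "get unless zero" idea can be sketched with C11 atomics. This
    is a user-space model under assumed names (kobj_model,
    get_unless_zero), not the kernel's kobject implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted object. */
struct kobj_model {
    atomic_int refcount;
};

/* Take a reference only if the count has not already dropped to zero. */
static bool get_unless_zero(struct kobj_model *k)
{
    int old;

    assert(k != NULL);
    old = atomic_load(&k->refcount);
    while (old != 0) {
        /* On failure, 'old' is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&k->refcount, &old, old + 1))
            return true;
    }
    return false;   /* object is already on its way to being freed */
}
```

    A lookup that finds an object with a zero count simply fails,
    instead of resurrecting something that is being torn down.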

    Tested-by: Lekshmi Pillai
    Reviewed-by: Bart Van Assche
    Acked-by: Tejun Heo
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     

09 Mar, 2017

2 commits

  • This reverts commit 0dba1314d4f81115dce711292ec7981d17231064. It causes
    leaking of device numbers for SCSI when SCSI registers multiple gendisks
    for one request_queue in succession. It can be easily reproduced using
    Omar's script [1] on kernel with CONFIG_DEBUG_TEST_DRIVER_REMOVE.
    Furthermore the protection provided by this commit is not needed anymore
    as the problem it was fixing got also fixed by commit 165a5e22fafb
    "block: Move bdi_unregister() to del_gendisk()".

    [1]: http://marc.info/?l=linux-block&m=148554717109098&w=2

    Signed-off-by: Jan Kara
    Acked-by: Dan Williams
    Tested-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jan Kara
     
  • Commit 165a5e22fafb "block: Move bdi_unregister() to del_gendisk()"
    added disk->queue dereference to del_gendisk(). Although del_gendisk()
    is not supposed to be called without disk->queue valid and
    blk_unregister_queue() warns in that case, this change will make it oops
    instead. Return to the old more robust behavior of just warning when
    del_gendisk() gets called for gendisk with disk->queue being NULL.

    Reported-by: Dan Carpenter
    Signed-off-by: Jan Kara
    Tested-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jan Kara
     

03 Mar, 2017

1 commit

  • Commit 6cd18e711dd8 "block: destroy bdi before blockdev is
    unregistered." moved bdi unregistration (at that time through
    bdi_destroy()) from blk_release_queue() to blk_cleanup_queue() because
    it needs to happen before blk_unregister_region() call in del_gendisk()
    for MD. SCSI though will free up the device number from sd_remove()
    called through a maze of callbacks from device_del() in
    __scsi_remove_device() before blk_cleanup_queue() and thus similar races
    as described in 6cd18e711dd8 can happen for SCSI as well as reported by
    Omar [1].

    Moving bdi_unregister() to del_gendisk() works for MD and fixes the
    problem for SCSI since del_gendisk() gets called from sd_remove() before
    freeing the device number.

    This also makes device_add_disk() (calling bdi_register_owner()) more
    symmetric with del_gendisk().

    [1] http://marc.info/?l=linux-block&m=148554717109098&w=2

    Tested-by: Lekshmi Pillai
    Acked-by: Tejun Heo
    Signed-off-by: Jan Kara
    Tested-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jan Kara
     

22 Feb, 2017

2 commits

  • Iteration over partitions in del_gendisk() omits part0. Add
    bdev_unhash_inode() call for the whole device. Otherwise if the device
    number gets reused, bdev inode will be still associated with the old
    (stale) bdi.

    Tested-by: Lekshmi Pillai
    Acked-by: Tejun Heo
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     
  • Move bdev_unhash_inode() after invalidate_partition() as
    invalidate_partition() looks up bdev and it cannot find the right bdev
    inode after bdev_unhash_inode() is called. Thus invalidate_partition()
    would not invalidate page cache of the previously used bdev. Also use
    part_devt() when calling bdev_unhash_inode() instead of manually
    creating the device number.

    Tested-by: Lekshmi Pillai
    Acked-by: Tejun Heo
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara