13 May, 2017

1 commit

  • Pull libnvdimm fixes from Dan Williams:
    "Incremental fixes and a small feature addition on top of the main
    libnvdimm 4.12 pull request:

    - Geert noticed that tinyconfig was bloated by BLOCK selecting DAX.
    The size regression is fixed by moving all dax helpers into the
    dax-core and only specifying "select DAX" for FS_DAX and
    dax-capable drivers. He also asked for clarification of the
    NR_DEV_DAX config option which, on closer look, does not need to be
    a config option at all. Mike also throws in a DEV_DAX_PMEM fixup
    for good measure.

    - Ben's attention to detail on -stable patch submissions caught a
    case where the recent fixes to arch_copy_from_iter_pmem() missed a
    condition where we strand dirty data in the cache. This is tagged
    for -stable and will also be included in the rework of the pmem api
    to a proposed {memcpy,copy_user}_flushcache() interface for 4.13.

    - Vishal adds a feature that missed the initial pull due to pending
    review feedback. It allows the kernel to clear media errors when
    initializing a BTT (atomic sector update driver) instance on a pmem
    namespace.

    - Ross noticed that the dax_device + dax_operations conversion broke
    __dax_zero_page_range(). The nvdimm unit tests fail to check this
    path, but xfstests immediately trips over it. No excuse for missing
    this before submitting the 4.12 pull request.

    These all pass the nvdimm unit tests and an xfstests spot check. The
    set has received a build success notification from the kbuild robot"

    * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    filesystem-dax: fix broken __dax_zero_page_range() conversion
    libnvdimm, btt: ensure that initializing metadata clears poison
    libnvdimm: add an atomic vs process context flag to rw_bytes
    x86, pmem: Fix cache flushing for iovec write < 8 bytes
    device-dax: kill NR_DEV_DAX
    block, dax: move "select DAX" from BLOCK to FS_DAX
    device-dax: Tell kbuild DEV_DAX_PMEM depends on DEV_DAX

    Linus Torvalds
     

09 May, 2017

1 commit

  • For configurations that do not enable DAX filesystems or drivers, do not
    require the DAX core to be built.

    Given that the 'direct_access' method has been removed from
    'block_device_operations', we can also go ahead and remove the
    block-related dax helper functions from fs/block_dev.c to
    drivers/dax/super.c. This keeps dax details out of the block layer and
    lets the DAX core be built as a module in the FS_DAX=n case.

    Filesystems need to include dax.h to call bdev_dax_supported().

    Cc: linux-xfs@vger.kernel.org
    Cc: Jens Axboe
    Cc: "Theodore Ts'o"
    Cc: Matthew Wilcox
    Cc: Alexander Viro
    Cc: "Darrick J. Wong"
    Cc: Ross Zwisler
    Reviewed-by: Jan Kara
    Reported-by: Geert Uytterhoeven
    Signed-off-by: Dan Williams

    Dan Williams
     

06 May, 2017

1 commit

  • Pull libnvdimm updates from Dan Williams:
    "The bulk of this has been in multiple -next releases. There were a few
    late breaking fixes and small features that got added in the last
    couple days, but the whole set has received a build success
    notification from the kbuild robot.

    Change summary:

    - Region media error reporting: A libnvdimm region device is the
    parent to one or more namespaces. To date, media errors have been
    reported via the "badblocks" attribute attached to pmem block
    devices for namespaces in "raw" or "memory" mode. Given that
    namespaces can be in "device-dax" or "btt-sector" mode this new
    interface reports media errors generically, i.e. independent of
    namespace modes or state.

    This subsequently allows userspace tooling to craft "ACPI 6.1
    Section 9.20.7.6 Function Index 4 - Clear Uncorrectable Error"
    requests and submit them via the ioctl path for NVDIMM root bus
    devices.

    - Introduce 'struct dax_device' and 'struct dax_operations': Prompted
    by a request from Linus and feedback from Christoph this allows for
    dax capable drivers to publish their own custom dax operations.
    This fixes the broken assumption that all dax operations are
    related to a persistent memory device, and makes it easier for
    other architectures and platforms to add customized persistent
    memory support.

    - 'libnvdimm' core updates: A new "deep_flush" sysfs attribute is
    available for storage appliance applications to manually trigger
    memory controllers to drain write-pending buffers that would
    otherwise be flushed automatically by the platform ADR
    (asynchronous-DRAM-refresh) mechanism at a power loss event.
    Support for "locked" DIMMs is included to prevent namespaces from
    surfacing when the namespace label data area is locked. Finally,
    fixes for various reported deadlocks and crashes, also tagged for
    -stable.

    - ACPI / nfit driver updates: General updates of the nfit driver to
    add DSM command overrides, ACPI 6.1 health state flags support, DSM
    payload debug available by default, and various fixes.

    Acknowledgements that came after the branch was pushed:

    - commit 565851c972b5 "device-dax: fix sysfs attribute deadlock":
    Tested-by: Yi Zhang

    - commit 23f498448362 "libnvdimm: rework region badblocks clearing"
    Tested-by: Toshi Kani "

    * tag 'libnvdimm-for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (52 commits)
    libnvdimm, pfn: fix 'npfns' vs section alignment
    libnvdimm: handle locked label storage areas
    libnvdimm: convert NDD_ flags to use bitops, introduce NDD_LOCKED
    brd: fix uninitialized use of brd->dax_dev
    block, dax: use correct format string in bdev_dax_supported
    device-dax: fix sysfs attribute deadlock
    libnvdimm: restore "libnvdimm: band aid btt vs clear poison locking"
    libnvdimm: fix nvdimm_bus_lock() vs device_lock() ordering
    libnvdimm: rework region badblocks clearing
    acpi, nfit: kill ACPI_NFIT_DEBUG
    libnvdimm: fix clear length of nvdimm_forget_poison()
    libnvdimm, pmem: fix a NULL pointer BUG in nd_pmem_notify
    libnvdimm, region: sysfs trigger for nvdimm_flush()
    libnvdimm: fix phys_addr for nvdimm_clear_poison
    x86, dax, pmem: remove indirection around memcpy_from_pmem()
    block: remove block_device_operations ->direct_access()
    block, dax: convert bdev_dax_supported() to dax_direct_access()
    filesystem-dax: convert to dax_direct_access()
    Revert "block: use DAX for partition table reads"
    ext2, ext4, xfs: retrieve dax_device for iomap operations
    ...

    Linus Torvalds
     

04 May, 2017

1 commit

  • invalidate_bdev() calls cleancache_invalidate_inode() iff ->nrpages != 0
    which doesn't make any sense.

    Make sure that invalidate_bdev() always calls cleancache_invalidate_inode()
    regardless of mapping->nrpages value.

    Fixes: c515e1fd361c ("mm/fs: add hooks to support cleancache")
    Link: http://lkml.kernel.org/r/20170424164135.22350-3-aryabinin@virtuozzo.com
    Signed-off-by: Andrey Ryabinin
    Reviewed-by: Jan Kara
    Acked-by: Konrad Rzeszutek Wilk
    Cc: Alexander Viro
    Cc: Ross Zwisler
    Cc: Jens Axboe
    Cc: Johannes Weiner
    Cc: Alexey Kuznetsov
    Cc: Christoph Hellwig
    Cc: Nikolay Borisov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     

02 May, 2017

1 commit

  • The new message has an incorrect format string, causing a warning in some
    configurations:

    fs/block_dev.c: In function 'bdev_dax_supported':
    fs/block_dev.c:779:5: error: format '%d' expects argument of type 'int', but argument 2 has type 'long int' [-Werror=format=]
    "error: dax access failed (%d)", len);

    This changes it to use the correct %ld instead of %d.

    Fixes: 2093f2e9dfec ("block, dax: convert bdev_dax_supported() to dax_direct_access()")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Dan Williams

    Arnd Bergmann
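
    The same class of fix can be shown in userspace. This is a minimal
    sketch, not the kernel code: format_dax_error() is a hypothetical helper
    that only reuses the message text from the warning above.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not kernel code). 'len' is a long, so the
 * conversion must be %ld; using %d here is what -Wformat complains about.
 */
static int format_dax_error(char *buf, size_t size, long len)
{
	return snprintf(buf, size, "error: dax access failed (%ld)", len);
}
```

    With %ld the full value of 'len' is printed on platforms where long is
    wider than int.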
     

26 Apr, 2017

2 commits


22 Apr, 2017

1 commit

  • Commit 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
    introduced blk_integrity_revalidate(), which seems to assume ownership
    of the stable pages flag and unilaterally clears it if no blk_integrity
    profile is registered:

    if (bi->profile)
            disk->queue->backing_dev_info->capabilities |=
                    BDI_CAP_STABLE_WRITES;
    else
            disk->queue->backing_dev_info->capabilities &=
                    ~BDI_CAP_STABLE_WRITES;

    It's called from revalidate_disk() and rescan_partitions(), making it
    impossible to enable stable pages for drivers that support partitions
    and don't use blk_integrity: while the call in revalidate_disk() can be
    trivially worked around (see zram, which doesn't support partitions and
    hence gets away with zram_revalidate_disk()), rescan_partitions() can
    be triggered from userspace at any time. This breaks rbd, where the
    ceph messenger is responsible for generating/verifying CRCs.

    Since blk_integrity_{un,}register() "must" be used for (un)registering
    the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES
    setting there. This way drivers that call blk_integrity_register() and
    use integrity infrastructure won't interfere with drivers that don't
    but still want stable pages.

    Fixes: 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
    Cc: "Martin K. Petersen"
    Cc: Christoph Hellwig
    Cc: Mike Snitzer
    Cc: stable@vger.kernel.org # 4.4+, needs backporting
    Tested-by: Dan Williams
    Signed-off-by: Ilya Dryomov
    Signed-off-by: Jens Axboe

    Ilya Dryomov
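
    The ownership change can be modelled in userspace. This is a sketch under
    stated assumptions: struct model_disk, CAP_STABLE_WRITES and all model_*
    functions are hypothetical stand-ins, not the kernel structures or API.

```c
#include <stddef.h>

#define CAP_STABLE_WRITES 0x1u	/* stand-in for BDI_CAP_STABLE_WRITES */

struct model_disk {
	unsigned int capabilities;	/* stand-in for the bdi capabilities */
	const void *profile;		/* stand-in for the integrity profile */
};

/* Old behaviour: revalidation clears the flag when no profile is set. */
static void model_revalidate_old(struct model_disk *d)
{
	if (d->profile)
		d->capabilities |= CAP_STABLE_WRITES;
	else
		d->capabilities &= ~CAP_STABLE_WRITES;
}

/* New behaviour: only (un)registering a profile touches the flag. */
static void model_integrity_register(struct model_disk *d, const void *profile)
{
	d->profile = profile;
	d->capabilities |= CAP_STABLE_WRITES;
}

static void model_integrity_unregister(struct model_disk *d)
{
	d->profile = NULL;
	d->capabilities &= ~CAP_STABLE_WRITES;
}

/* New behaviour: revalidation leaves the flag alone. */
static void model_revalidate_new(struct model_disk *d)
{
	(void)d;
}
```

    A driver that sets the flag itself and never registers a profile keeps it
    across model_revalidate_new(), but loses it on model_revalidate_old().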
     

21 Apr, 2017

2 commits

  • Replace bdev_direct_access() with dax_direct_access() that uses
    dax_device and dax_operations instead of a block_device and
    block_device_operations for dax. Once all consumers of the old api have
    been converted bdev_direct_access() will be deleted.

    Given that block device partitioning decisions can cause dax page
    alignment constraints to be violated this also introduces the
    bdev_dax_pgoff() helper. It handles calculating a logical pgoff relative
    to the dax_device and also checks for page alignment.

    Signed-off-by: Dan Williams

    Dan Williams
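
    The page offset calculation and alignment check described above might
    look roughly like the following userspace sketch. model_dax_pgoff(), its
    parameters and the 4096-byte page size are assumptions for illustration,
    not the signature of the kernel helper.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SECTOR_SIZE 512u
#define MODEL_PAGE_SIZE   4096u	/* assumed page size for the sketch */

/*
 * 'part_start' is the partition's first sector on the whole device and
 * 'sector' is relative to the partition. On success returns 0 and stores
 * the page offset relative to the start of the device.
 */
static int model_dax_pgoff(uint64_t part_start, uint64_t sector, size_t size,
			   uint64_t *pgoff)
{
	uint64_t byte_off = (part_start + sector) * MODEL_SECTOR_SIZE;

	/* A misaligned partition start or request size cannot be mapped. */
	if (byte_off % MODEL_PAGE_SIZE || size % MODEL_PAGE_SIZE)
		return -EINVAL;
	if (pgoff)
		*pgoff = byte_off / MODEL_PAGE_SIZE;
	return 0;
}
```

    A partition starting at sector 2049 fails the check for any page-sized
    request, which is the partitioning case the commit message refers to.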
     
  • This is leftover dead code that has since been replaced by
    bdev_dax_supported().

    Signed-off-by: Dan Williams

    Dan Williams
     

09 Apr, 2017

2 commits


23 Mar, 2017

2 commits

  • When block device is closed, we call inode_detach_wb() in __blkdev_put()
    which sets inode->i_wb to NULL. That is contrary to expectations that
    inode->i_wb stays valid once set during the whole inode's lifetime and
    leads to oops in wb_get() in locked_inode_to_wb_and_lock_list() because
    inode_to_wb() returned NULL.

    The reason why we called inode_detach_wb() is not valid anymore though.
    BDI is guaranteed to stay around until we call bdi_put() from
    bdev_evict_inode() so we can postpone calling inode_detach_wb() to that
    moment.

    Also add a warning to catch if someone uses inode_detach_wb() in a
    dangerous way.

    Reported-by: Thiago Jung Bauermann
    Acked-by: Tejun Heo
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     
  • When disk->fops->open() in __blkdev_get() returns -ERESTARTSYS, we
    restart the process of opening the block device. However we forget to
    switch bdev->bd_bdi back to noop_backing_dev_info and as a result bdev
    inode will be pointing to a stale bdi. Fix the problem by setting
    bdev->bd_bdi later when __blkdev_get() is already guaranteed to succeed.

    Acked-by: Tejun Heo
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     

02 Mar, 2017

1 commit

  • So far we initialized bd_bdi only in bdget(). That is fine for normal
    bdev inodes however for the special case of the root inode of
    blockdev_superblock that function is never called and thus bd_bdi is
    left uninitialized. As a result bdev_evict_inode() may oops doing
    bdi_put(root->bd_bdi) on that inode as can be seen when doing:

    mount -t bdev none /mnt

    Fix the problem by initializing bd_bdi when first allocating the inode
    and then reinitializing bd_bdi in bdev_evict_inode().

    Thanks to syzkaller team for finding the problem.

    Reported-by: Dmitry Vyukov
    Fixes: b1d2dc5659b4 ("block: Make blk_get_backing_dev_info() safe without open bdev")
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     

28 Feb, 2017

1 commit

  • Replace all 1 << inode->i_blkbits and (1 << inode->i_blkbits) in fs
    branch with i_blocksize().

    This patch also fixes multiple checkpatch warnings: WARNING: Prefer
    'unsigned int' to bare use of 'unsigned'

    Thanks to Andrew Morton for suggesting more appropriate function instead
    of macro.

    [geliangtang@gmail.com: truncate: use i_blocksize()]
    Link: http://lkml.kernel.org/r/9c8b2cd83c8f5653805d43debde9fa8817e02fc4.1484895804.git.geliangtang@gmail.com
    Link: http://lkml.kernel.org/r/1481319905-10126-1-git-send-email-fabf@skynet.be
    Signed-off-by: Fabian Frederick
    Signed-off-by: Geliang Tang
    Cc: Alexander Viro
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
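
    A helper of this kind can be sketched as follows. struct model_inode and
    model_i_blocksize() are hypothetical userspace stand-ins that only mirror
    the i_blkbits shift named above.

```c
struct model_inode {
	unsigned char i_blkbits;	/* log2 of the block size */
};

/* Function-style replacement for open-coded "1 << inode->i_blkbits". */
static inline unsigned int model_i_blocksize(const struct model_inode *node)
{
	return 1u << node->i_blkbits;
}
```

    Using a function rather than a macro gives the result a fixed unsigned
    int type and type-checks the argument.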
     

22 Feb, 2017

1 commit

  • When a device gets removed, block device inode is unhashed so that it is
    not used anymore (bdget() will not find it anymore). Later when a new device
    gets created with the same device number, we create new block device
    inode. However there may be file system device inodes whose i_bdev still
    points to the original block device inode and thus we get two active
    block device inodes for the same device. They will share the same
    gendisk so the only visible differences will be that page caches will
    not be coherent and BDIs will be different (the old block device inode
    still points to unregistered BDI).

    Fix the problem by checking in bd_acquire() whether i_bdev still points
    to active block device inode and re-lookup the block device if not. That
    way any open of a block device happening after the old device has been
    removed will get correct block device inode.

    Tested-by: Lekshmi Pillai
    Acked-by: Tejun Heo
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     

18 Feb, 2017

1 commit


02 Feb, 2017

2 commits

  • Currently blk_get_backing_dev_info() is not safe to be called when the
    block device is not open as bdev->bd_disk is NULL in that case. However
    inode_to_bdi() uses this function and may be called from flusher
    worker or other writeback related functions without bdev being open
    which leads to crashes such as:

    [113031.075540] Unable to handle kernel paging request for data at address 0x00000000
    [113031.075614] Faulting instruction address: 0xc0000000003692e0
    0:mon> t
    [c0000000fb65f900] c00000000036cb6c writeback_sb_inodes+0x30c/0x590
    [c0000000fb65fa10] c00000000036ced4 __writeback_inodes_wb+0xe4/0x150
    [c0000000fb65fa70] c00000000036d33c wb_writeback+0x30c/0x450
    [c0000000fb65fb40] c00000000036e198 wb_workfn+0x268/0x580
    [c0000000fb65fc50] c0000000000f3470 process_one_work+0x1e0/0x590
    [c0000000fb65fce0] c0000000000f38c8 worker_thread+0xa8/0x660
    [c0000000fb65fd80] c0000000000fc4b0 kthread+0x110/0x130
    [c0000000fb65fe30] c0000000000098f0 ret_from_kernel_thread+0x5c/0x6c

    Signed-off-by: Jens Axboe

    Jan Kara
     
  • Currently, block device inodes stay around after corresponding gendisk
    has died until memory reclaim finds them and frees them. Since we will
    make block device inode pin the bdi, we want to free the block device
    inode as soon as the device goes away so that bdi does not stay around
    unnecessarily. Furthermore we need to avoid issues when new device with
    the same major,minor pair gets created since reusing the bdi structure
    would be rather difficult in this case.

    Unhashing block device inode on gendisk destruction nicely deals with
    these problems. Once last block device inode reference is dropped (which
    may be directly in del_gendisk()), the inode gets evicted. Furthermore if
    the major,minor pair gets reallocated, we are guaranteed to get new
    block device inode even if old block device inode is not yet evicted and
    thus we avoid issues with possible reuse of bdi.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     

24 Jan, 2017

1 commit

  • We can't dereference the dio structure after submitting the last bio for
    this request, as I/O completion might have happened before the code is
    run. Introduce a local is_sync variable instead.

    Fixes: 542ff7bf ("block: new direct I/O implementation")
    Signed-off-by: Christoph Hellwig
    Reported-by: Matias Bjørling
    Tested-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Christoph Hellwig
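
    The pattern of copying a field into a local before the object can be
    freed is sketched below in userspace. struct model_dio and both functions
    are hypothetical; freeing in model_submit_last() merely simulates a
    completion that runs before the submitter continues.

```c
#include <stdbool.h>
#include <stdlib.h>

struct model_dio {
	bool is_sync;
};

/*
 * Stand-in for submitting the last bio: completion may run right away and
 * free the dio, so the caller must not touch 'dio' afterwards.
 */
static void model_submit_last(struct model_dio *dio)
{
	free(dio);
}

/* Returns whether the request was synchronous. */
static bool model_direct_io(bool sync)
{
	struct model_dio *dio = malloc(sizeof(*dio));
	bool is_sync;

	if (!dio)
		abort();
	dio->is_sync = sync;
	is_sync = dio->is_sync;	/* copy while the dio is still valid */

	model_submit_last(dio);	/* dio may be gone from here on */

	return is_sync;		/* only the local copy is used */
}
```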
     

05 Jan, 2017

1 commit

  • Pull block layer fixes from Jens Axboe:
    "A set of fixes for the current series, one fixing a regression with
    block size < page cache size in the alias series from Jan. Outside of
    that, two small cleanups for wbt from Bart, a nvme pull request from
    Christoph, and a few small fixes of documentation updates"

    * 'for-linus' of git://git.kernel.dk/linux-block:
    block: fix up io_poll documentation
    block: Avoid that sparse complains about context imbalance in __wbt_wait()
    block: Make wbt_wait() definition consistent with declaration
    clean_bdev_aliases: Prevent cleaning blocks that are not in block range
    genhd: remove dead and duplicated scsi code
    block: add back plugging in __blkdev_direct_IO
    nvmet/fcloop: remove some logically dead code performing redundant ret checks
    nvmet: fix KATO offset in Set Features
    nvme/fc: simplify error handling of nvme_fc_create_hw_io_queues
    nvme/fc: correct some printk information
    nvme/scsi: Remove START STOP emulation
    nvme/pci: Delete misleading queue-wrap comment
    nvme/pci: Fix whitespace problem
    nvme: simplify stripe quirk
    nvme: update maintainers information

    Linus Torvalds
     

25 Dec, 2016

1 commit


23 Dec, 2016

1 commit


14 Dec, 2016

2 commits

  • For sync direct IO, generic_file_direct_write/generic_file_read_iter
    will update file access position. Don't duplicate the update in
    .direct_IO. The duplicate update prevented my raid array from assembling.

    Cc: Christoph Hellwig
    Cc: Jens Axboe
    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     
  • bdev->bd_contains is not stable before calling __blkdev_get().
    When __blkdev_get() is called on a partition with ->bd_openers == 0
    it sets
    bdev->bd_contains = bdev;
    which is not correct for a partition.
    After a call to __blkdev_get() succeeds, ->bd_openers will be > 0
    and then ->bd_contains is stable.

    When FMODE_EXCL is used, blkdev_get() calls
    bd_start_claiming() -> bd_prepare_to_claim() -> bd_may_claim()

    This call happens before __blkdev_get() is called, so ->bd_contains
    is not stable. So bd_may_claim() cannot safely use ->bd_contains.
    It currently tries to use it, and this can lead to a BUG_ON().

    This happens when a whole device is already open with a bd_holder (in
    use by dm in my particular example) and two threads race to open a
    partition of that device for the first time, one opening with O_EXCL and
    one without.

    The thread that doesn't use O_EXCL gets through blkdev_get() to
    __blkdev_get(), gains the ->bd_mutex, and sets bdev->bd_contains = bdev;

    Immediately thereafter the other thread, using FMODE_EXCL, calls
    bd_start_claiming() from blkdev_get(). This should fail because the
    whole device has a holder, but because bdev->bd_contains == bdev
    bd_may_claim() incorrectly reports success.
    This thread continues and blocks on bd_mutex.

    The first thread then sets bdev->bd_contains correctly and drops the mutex.
    The thread using FMODE_EXCL then continues and when it calls bd_may_claim()
    again in:
    BUG_ON(!bd_may_claim(bdev, whole, holder));
    The BUG_ON fires.

    Fix this by removing the dependency on ->bd_contains in
    bd_may_claim(). As bd_may_claim() has direct access to the whole
    device, it can simply test if the target bdev is the whole device.

    Fixes: 6b4517a7913a ("block: implement bd_claiming and claiming block")
    Cc: stable@vger.kernel.org (v2.6.35+)
    Signed-off-by: NeilBrown
    Signed-off-by: Jens Axboe

    NeilBrown
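
    The difference between the two tests can be modelled in userspace. This
    is a simplified sketch: struct model_bdev and both functions are
    hypothetical, and any further cases the kernel function handles are
    omitted.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_bdev {
	void *bd_holder;		/* current exclusive holder, or NULL */
	struct model_bdev *bd_contains;	/* not stable before the first open */
};

/* Old test: trusts bd_contains, which may transiently point at bdev. */
static bool model_may_claim_old(const struct model_bdev *bdev,
				const struct model_bdev *whole,
				const void *holder)
{
	if (bdev->bd_holder == holder)
		return true;
	if (bdev->bd_holder != NULL)
		return false;
	if (bdev->bd_contains == bdev)
		return true;	/* wrong for a partition during first open */
	if (whole->bd_holder != NULL)
		return false;
	return true;
}

/* New test: compares against the whole device that was passed in. */
static bool model_may_claim(const struct model_bdev *bdev,
			    const struct model_bdev *whole,
			    const void *holder)
{
	if (bdev->bd_holder == holder)
		return true;	/* already claimed by this holder */
	if (bdev->bd_holder != NULL)
		return false;	/* held by someone else */
	if (whole == bdev)
		return true;	/* claiming the whole device itself */
	if (whole->bd_holder != NULL)
		return false;	/* partition of a device that is held */
	return true;
}
```

    For a partition whose bd_contains temporarily points at itself while the
    whole device has a holder, the old test reports success and the new one
    refuses the claim.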
     

01 Dec, 2016

1 commit

  • If a block device is closed while iterate_bdevs() is handling it, the
    following NULL pointer dereference occurs because bdev->bd_disk is NULL
    in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in
    turn called by the mapping_cap_writeback_dirty() call in
    __filemap_fdatawrite_range()):

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000508
    IP: [] blk_get_backing_dev_info+0x10/0x20
    PGD 9e62067 PUD 9ee8067 PMD 0
    Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    Modules linked in:
    CPU: 1 PID: 2422 Comm: sync Not tainted 4.5.0-rc7+ #400
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
    task: ffff880009f4d700 ti: ffff880009f5c000 task.ti: ffff880009f5c000
    RIP: 0010:[] [] blk_get_backing_dev_info+0x10/0x20
    RSP: 0018:ffff880009f5fe68 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: ffff88000ec17a38 RCX: ffffffff81a4e940
    RDX: 7fffffffffffffff RSI: 0000000000000000 RDI: ffff88000ec176c0
    RBP: ffff880009f5fe68 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000001 R11: 0000000000000000 R12: ffff88000ec17860
    R13: ffffffff811b25c0 R14: ffff88000ec178e0 R15: ffff88000ec17a38
    FS: 00007faee505d700(0000) GS:ffff88000fb00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000508 CR3: 0000000009e8a000 CR4: 00000000000006e0
    Stack:
    ffff880009f5feb8 ffffffff8112e7f5 0000000000000000 7fffffffffffffff
    0000000000000000 0000000000000000 7fffffffffffffff 0000000000000001
    ffff88000ec178e0 ffff88000ec17860 ffff880009f5fec8 ffffffff8112e81f
    Call Trace:
    [] __filemap_fdatawrite_range+0x85/0x90
    [] filemap_fdatawrite+0x1f/0x30
    [] fdatawrite_one_bdev+0x16/0x20
    [] iterate_bdevs+0xf2/0x130
    [] sys_sync+0x63/0x90
    [] entry_SYSCALL_64_fastpath+0x12/0x76
    Code: 0f 1f 44 00 00 48 8b 87 f0 00 00 00 55 48 89 e5 8b 80 08 05 00 00 5d
    RIP [] blk_get_backing_dev_info+0x10/0x20
    RSP
    CR2: 0000000000000508
    ---[ end trace 2487336ceb3de62d ]---

    The crash is easily reproducible by running the following command, if an
    msleep(100) is inserted before the call to func() in iterate_bdevs():

    while :; do head -c1 /dev/nullb0; done > /dev/null & while :; do sync; done

    Fix it by holding the bd_mutex across the func() call and only calling
    func() if the bdev is opened.

    Cc: stable@vger.kernel.org
    Fixes: 5c0d6b60a0ba ("vfs: Create function for iterating over block devices")
    Reported-and-tested-by: Wei Fang
    Signed-off-by: Rabin Vincent
    Signed-off-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Rabin Vincent
     

22 Nov, 2016

3 commits

  • Some drivers often use external bvec table, so introduce
    this helper for this case. It is always safe to access the
    bio->bi_io_vec in this way for this case.

    After converting to this usage, it will become a bit easier
    to evaluate the remaining direct access to bio->bi_io_vec,
    so it can help to prepare for the following multipage bvec
    support.

    Signed-off-by: Ming Lei
    Reviewed-by: Christoph Hellwig

    Fixed up the new O_DIRECT cases.

    Signed-off-by: Jens Axboe

    Ming Lei
     
  • We store the bits in the bdev sector size locally, but we don't use
    the calculation anymore. All we do with it is shift it back up to
    the bdev sector size. So let's just use that directly and kill the
    variable and bits calculation.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • A direct I/O alignment must always be checked against the device block
    size, but the I/O offset (bio->bi_iter.bi_sector) must always use 512B
    sector units, and not the actual logical block size.

    Signed-off-by: Damien Le Moal
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Damien Le Moal
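
    The two units can be kept apart as in the following userspace sketch.
    model_dio_start_sector() and its parameters are assumptions for
    illustration only.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Checks a direct I/O request against the logical block size 'lbs' (a
 * power of two, e.g. 512 or 4096) and, on success, stores the starting
 * sector in 512-byte units.
 */
static int model_dio_start_sector(uint64_t pos, uint64_t len, unsigned int lbs,
				  uint64_t *sector)
{
	if ((pos | len) & (lbs - 1))
		return -EINVAL;	/* alignment uses the logical block size */
	*sector = pos >> 9;	/* the offset always uses 512-byte sectors */
	return 0;
}
```

    On a device with 4096-byte logical blocks, byte offset 8192 is sector 16,
    not block number 2.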
     

18 Nov, 2016

4 commits


12 Oct, 2016

1 commit

  • After much discussion, it seems that the fallocate feature flag
    FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
    FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been whitelisted
    for zeroing SCSI UNMAP. Punch still requires that FALLOC_FL_KEEP_SIZE is
    set. A length that goes past the end of the device will be clamped to the
    device size if KEEP_SIZE is set; or will return -EINVAL if not. Both
    start and length must be aligned to the device's logical block size.

    Since the semantics of fallocate are fairly well established already, wire
    up the two pieces. The other fallocate variants (collapse range, insert
    range, and allocate blocks) are not supported.

    Link: http://lkml.kernel.org/r/147518379992.22791.8849838163218235007.stgit@birch.djwong.org
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Bart Van Assche
    Cc: Theodore Ts'o
    Cc: Martin K. Petersen
    Cc: Mike Snitzer # tweaked header
    Cc: Brian Foster
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Darrick J. Wong
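
    The argument rules listed above can be expressed as a small userspace
    check. This is a sketch, not the kernel implementation: the MODEL_* flags
    are local stand-ins for the FALLOC_FL_* flags, and the -EOPNOTSUPP return
    for unsupported modes and the -EINVAL return for a start at or past the
    end of the device are assumptions.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_KEEP_SIZE  0x01	/* stand-in for FALLOC_FL_KEEP_SIZE */
#define MODEL_PUNCH_HOLE 0x02	/* stand-in for FALLOC_FL_PUNCH_HOLE */
#define MODEL_ZERO_RANGE 0x10	/* stand-in for FALLOC_FL_ZERO_RANGE */

/* Returns the (possibly clamped) length, or a negative errno. */
static int64_t model_blkdev_fallocate_len(int mode, uint64_t start,
					  uint64_t len, uint64_t dev_size,
					  unsigned int lbs)
{
	int op = mode & ~MODEL_KEEP_SIZE;

	/* Only zero-range, or punch-hole together with KEEP_SIZE. */
	if (op == MODEL_PUNCH_HOLE) {
		if (!(mode & MODEL_KEEP_SIZE))
			return -EOPNOTSUPP;	/* assumed error code */
	} else if (op != MODEL_ZERO_RANGE) {
		return -EOPNOTSUPP;		/* assumed error code */
	}

	if (start >= dev_size)
		return -EINVAL;			/* assumed behaviour */
	if (len > dev_size - start) {
		if (!(mode & MODEL_KEEP_SIZE))
			return -EINVAL;
		len = dev_size - start;		/* clamp to the device size */
	}

	/* Both start and length must be logical block aligned. */
	if ((start | len) & (lbs - 1))
		return -EINVAL;
	return (int64_t)len;
}
```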
     

06 Oct, 2016

1 commit

  • When triggering thaw-filesystems via magic sysrq, the system enters a
    loop in do_thaw_one(), as thaw_bdev() still returns success if
    bd_fsfreeze_count == 0. To fix this, let thaw_bdev() always return an
    error in that case (and simplify the code a bit at the same time).

    Reviewed-by: Eric Farman
    Reviewed-by: Cornelia Huck
    Signed-off-by: Pierre Morel
    Reviewed-by: Jan Kara
    Signed-off-by: Jens Axboe

    Pierre Morel
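
    Why the return value matters for a caller that thaws in a loop can be
    seen in this userspace sketch. struct model_bdev_freeze, both functions
    and the choice of -EINVAL are assumptions for illustration.

```c
#include <errno.h>

struct model_bdev_freeze {
	int bd_fsfreeze_count;
};

/* Returns 0 when a freeze reference was dropped, -EINVAL when not frozen. */
static int model_thaw(struct model_bdev_freeze *b)
{
	if (b->bd_fsfreeze_count == 0)
		return -EINVAL;	/* report it so that looping callers stop */
	b->bd_fsfreeze_count--;
	return 0;
}

/* Caller that keeps thawing until an error is returned. */
static int model_thaw_all(struct model_bdev_freeze *b)
{
	int thawed = 0;

	while (model_thaw(b) == 0)
		thawed++;
	return thawed;
}
```

    If model_thaw() returned 0 for a count of zero, the loop in
    model_thaw_all() would never terminate.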
     

14 Sep, 2016

1 commit

  • DAX support for block devices was removed in commits 03cdad
    ("block: disable block device DAX by default") and 99a01cd
    ("block: remove BLK_DEV_DAX config option"), but we still kept a call to
    dax_do_io and some unneeded i_flags manipulations introduced in commit
    bbab37 ("block: Add support for DAX reads/writes to block devices").

    Remove those leftovers.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Acked-by: Dan Williams
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

25 Aug, 2016

1 commit

  • When freeze_bdev() is called twice on the same block device without a
    mounted filesystem, get_super() will return NULL, which will lead to a
    NULL-ptr dereference later in drop_super().

    Check get_super() result to fix that.

    Note, that this is a purely theoretical issue. We have only 3
    freeze_bdev() callers. 2 of them are in filesystem code and used on a
    device with mounted fs. The third one in lock_fs() has protection in
    upper-layer code against freezing block device the second time without
    thawing it first.

    Signed-off-by: Andrey Ryabinin
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Andrey Ryabinin
     

22 Aug, 2016

1 commit

  • I got this:

    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] PREEMPT SMP KASAN
    Dumping ftrace buffer:
    (ftrace buffer empty)
    CPU: 0 PID: 5505 Comm: syz-executor Not tainted 4.8.0-rc2+ #161
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
    task: ffff880113415940 task.stack: ffff880118350000
    RIP: 0010:[] [] bd_mount+0x52/0xa0
    RSP: 0018:ffff880118357ca0 EFLAGS: 00010207
    RAX: dffffc0000000000 RBX: ffffffffffffffff RCX: ffffc90000bb6000
    RDX: 0000000000000018 RSI: ffffffff846d6b20 RDI: 00000000000000c7
    RBP: ffff880118357cb0 R08: ffff880115967c68 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801188211e8
    R13: ffffffff847baa20 R14: ffff8801139cb000 R15: 0000000000000080
    FS: 00007fa3ff6c0700(0000) GS:ffff88011aa00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fc1d8cc7e78 CR3: 0000000109f20000 CR4: 00000000000006f0
    DR0: 000000000000001e DR1: 000000000000001e DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
    Stack:
    ffff880112cfd6c0 ffff8801188211e8 ffff880118357cf0 ffffffff8167f207
    ffffffff816d7a1e ffff880112a413c0 ffffffff847baa20 ffff8801188211e8
    0000000000000080 ffff880112cfd6c0 ffff880118357d38 ffffffff816dce0a
    Call Trace:
    [] mount_fs+0x97/0x2e0
    [] ? alloc_vfsmnt+0x55e/0x760
    [] vfs_kern_mount+0x7a/0x300
    [] ? _raw_read_unlock+0x2c/0x50
    [] do_mount+0x3d7/0x2730
    [] ? trace_do_page_fault+0x1f4/0x3a0
    [] ? copy_mount_string+0x40/0x40
    [] ? memset+0x31/0x40
    [] ? copy_mount_options+0x1ee/0x320
    [] SyS_mount+0xb2/0x120
    [] ? copy_mnt_ns+0x970/0x970
    [] do_syscall_64+0x1c4/0x4e0
    [] entry_SYSCALL64_slow_path+0x25/0x25
    Code: 83 e8 63 1b fc ff 48 85 c0 48 89 c3 74 4c e8 56 35 d1 ff 48 8d bb c8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 3c 02 00 75 36 4c 8b a3 c8 00 00 00 48 b8 00 00 00 00 00 fc
    RIP [] bd_mount+0x52/0xa0
    RSP
    ---[ end trace 13690ad962168b98 ]---

    mount_pseudo() returns ERR_PTR(), not NULL, on error.

    Fixes: 3684aa7099e0 ("block-dev: enable writeback cgroup support")
    Cc: Shaohua Li
    Cc: Tejun Heo
    Cc: Jens Axboe
    Cc: stable@vger.kernel.org
    Signed-off-by: Vegard Nossum
    Signed-off-by: Jens Axboe

    Vegard Nossum
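
    The error-pointer convention behind this fix can be modelled in
    userspace. The model_* helpers below are hypothetical re-implementations
    written for this sketch, and model_mount() is only a stand-in for a
    function that reports failure through an encoded pointer.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value and decode it again. */
static inline void *model_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long model_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if 'ptr' lies in the top MODEL_MAX_ERRNO values of the address space. */
static inline int model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Stand-in for a function that never returns NULL on failure. */
static void *model_mount(int fail)
{
	static int dentry;

	return fail ? model_err_ptr(-ENOMEM) : (void *)&dentry;
}

/* Returns 0 on success or a negative errno. */
static int model_bd_mount(int fail)
{
	void *dent = model_mount(fail);

	/* A NULL check would let the encoded error through. */
	if (model_is_err(dent))
		return (int)model_ptr_err(dent);
	return 0;
}
```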
     

08 Aug, 2016

1 commit

  • Commit abf545484d31 changed it from an 'rw' flags type to the
    newer ops based interface, but now we're effectively leaking
    some bdev internals to the rest of the kernel. Since we only
    care about whether it's a read or a write at that level, just
    pass in a bool 'is_write' parameter instead.

    Then we can also move op_is_write() and friends back under
    CONFIG_BLOCK protection.

    Reviewed-by: Mike Christie
    Signed-off-by: Jens Axboe

    Jens Axboe