06 Oct, 2020

1 commit

  • All remaining callers of bdget() outside of fs/block_dev.c want to get a
    reference to the struct block_device for a given struct hd_struct. Add
    a helper just for that and then mark bdget static.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

24 Sep, 2020

3 commits


10 Sep, 2020

2 commits


08 Sep, 2020

1 commit

  • Discarding blocks and buffers under a mounted filesystem is hardly
    anything an admin wants to do. Usually it will confuse the filesystem and
    sometimes the loss of buffer_head state (including the b_private field)
    can even cause crashes like:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    PGD 0 P4D 0
    Oops: 0002 [#1] SMP PTI
    CPU: 4 PID: 203778 Comm: jbd2/dm-3-8 Kdump: loaded Tainted: G O --------- - - 4.18.0-147.5.0.5.h126.eulerosv2r9.x86_64 #1
    Hardware name: Huawei RH2288H V3/BC11HGSA0, BIOS 1.57 08/11/2015
    RIP: 0010:jbd2_journal_grab_journal_head+0x1b/0x40 [jbd2]
    ...
    Call Trace:
    __jbd2_journal_insert_checkpoint+0x23/0x70 [jbd2]
    jbd2_journal_commit_transaction+0x155f/0x1b60 [jbd2]
    kjournald2+0xbd/0x270 [jbd2]

    So if we don't have the block device open with O_EXCL already, claim the
    block device while we truncate the buffer cache. This makes sure any
    exclusive block device user (such as a filesystem) cannot operate on the
    device while we are discarding the buffer cache.
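
    The claim-then-truncate ordering described above can be sketched in
    user space roughly as follows. This is only a toy model: struct
    toy_bdev, toy_claim(), toy_abort_claim() and toy_truncate_range() are
    hypothetical names invented for illustration, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a block device; not the kernel's struct. */
struct toy_bdev {
    const void *holder;   /* current exclusive holder, NULL if none */
    int truncations;      /* counts successful cache truncations */
};

/* Try to claim the device for `who`; fails if someone else holds it. */
static bool toy_claim(struct toy_bdev *b, const void *who)
{
    assert(b != NULL && who != NULL);
    if (b->holder && b->holder != who)
        return false;
    b->holder = who;
    return true;
}

/* Give up a temporary claim taken by `who`. */
static void toy_abort_claim(struct toy_bdev *b, const void *who)
{
    if (b->holder == who)
        b->holder = NULL;
}

/*
 * Truncate cached buffers. If the caller did not already open the
 * device exclusively, take a temporary claim first so that no other
 * exclusive user (e.g. a mounted filesystem) can be active meanwhile.
 * Returns 0 on success, -1 if the device is busy.
 */
static int toy_truncate_range(struct toy_bdev *b, bool opened_excl,
                              const void *who)
{
    if (!opened_excl && !toy_claim(b, who))
        return -1;          /* another exclusive user holds the device */

    b->truncations++;       /* placeholder for the real cache truncation */

    if (!opened_excl)
        toy_abort_claim(b, who);
    return 0;
}
```

    In this model a caller without an exclusive open gets -1 while a
    filesystem holds the device, and the temporary claim is dropped again
    once the truncation is done.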

    Reported-by: Ye Bin
    Signed-off-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    [axboe: fix !CONFIG_BLOCK error in truncate_bdev_range()]
    Signed-off-by: Jens Axboe

    Jan Kara
     

02 Sep, 2020

7 commits

  • Remove the now unused helper.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Josef Bacik
    Reviewed-by: Johannes Thumshirn
    Acked-by: Song Liu
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • revalidate_disk is a relatively awkward helper for driver use, as it
    first calls an optional driver method and then updates the block device
    size, while most callers either don't need the method call at all, or
    want to keep state between the caller and the called method.

    Add a revalidate_disk_size helper that just performs the update of the
    block device size from the gendisk one, and switch all drivers that do
    not implement ->revalidate_disk to use the new helper instead of
    revalidate_disk().

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Josef Bacik
    Reviewed-by: Johannes Thumshirn
    Acked-by: Song Liu
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Replace bd_invalidated with a new BDEV_NEED_PART_SCAN flag in a bd_flags
    variable to better describe the condition.
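
    As a rough illustration of replacing a dedicated field with a named
    bit in a flags word, consider the sketch below. TOY_NEED_PART_SCAN and
    struct toy_bdev are hypothetical, the bit number is arbitrary, and the
    plain (non-atomic) bit operations are a simplification of whatever the
    kernel actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of the idea: one named bit in a flags word. */
#define TOY_NEED_PART_SCAN 0    /* bit number; illustrative value */

struct toy_bdev {
    unsigned long flags;
};

/* Record that a partition scan is needed (non-atomic in this sketch). */
static void toy_set_need_part_scan(struct toy_bdev *b)
{
    assert(b != NULL);
    b->flags |= 1UL << TOY_NEED_PART_SCAN;
}

/* Test and clear in one step, so a pending rescan is consumed once. */
static bool toy_test_and_clear_need_part_scan(struct toy_bdev *b)
{
    assert(b != NULL);
    bool was_set = (b->flags & (1UL << TOY_NEED_PART_SCAN)) != 0;

    b->flags &= ~(1UL << TOY_NEED_PART_SCAN);
    return was_set;
}
```

    The name of the bit documents why the state exists, and further flags
    can share the same word later.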

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Josef Bacik
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • bd_invalidated is set by check_disk_change or in add_disk to initiate a
    partition scan. Move it from check_disk_size_change which is called
    from both revalidate_disk() and bdev_disk_changed() to only the latter,
    as that is what is called from the block device open code (and nbd) to
    deal with the bd_invalidated event. revalidate_disk() on the other hand
    is mostly used to propagate a size update from the gendisk to the block
    device, which is entirely unrelated.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Josef Bacik
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • devcgroup_inode_permission is never called for the recursive case, so
    move it out into blkdev_get.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Two different callers use two different mutexes for updating the
    block device size, which obviously doesn't help to actually protect
    against concurrent updates from the different callers. In addition,
    one of the locks, bd_mutex, is rather prone to deadlocks with other
    parts of the block stack that use it for high level synchronization.

    Switch to using a new spinlock protecting just the size updates, as
    that is all we need, and make sure everyone does the update through
    the proper helper.
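
    The idea of one small lock that guards only the size, with a single
    helper that every caller goes through, could look roughly like this in
    user space. The spinlock is modelled with a C11 atomic_flag; all names
    (struct toy_bdev, toy_set_size() and so on) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device with a lock that guards only the size field. */
struct toy_bdev {
    atomic_flag size_lock;  /* toy spinlock */
    uint64_t size_bytes;
};

static void toy_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;   /* spin until the holder releases the lock */
}

static void toy_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* The single helper every caller is expected to use for size updates. */
static void toy_set_size(struct toy_bdev *b, uint64_t new_size)
{
    assert(b != NULL);
    toy_lock(&b->size_lock);
    b->size_bytes = new_size;
    toy_unlock(&b->size_lock);
}
```

    Because the lock is held only around the assignment, it cannot take
    part in the lock-ordering problems of a long-held mutex.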

    This fixes a bug reported with the nvme revalidating disks during a
    hot removal operation, which can currently deadlock on bd_mutex.

    Reported-by: Xianting Tian
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Replace bd_set_size with a version that takes the number of sectors
    instead, as that fits most of the current and future callers much better.
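
    For reference, the relationship between a sector count and a byte
    size is a shift by 9, since block layer sectors are 512 bytes. A
    minimal sketch, with the hypothetical names TOY_SECTOR_SHIFT and
    toy_sectors_to_bytes():

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SECTOR_SHIFT 9  /* 512-byte sectors */

/* Convert a sector count to a byte size. */
static uint64_t toy_sectors_to_bytes(uint64_t nr_sectors)
{
    /* The shift must not push any set bits out of the 64-bit result. */
    assert(nr_sectors <= (UINT64_MAX >> TOY_SECTOR_SHIFT));
    return nr_sectors << TOY_SECTOR_SHIFT;
}
```

    Callers that already track capacity in sectors can pass that value
    straight through instead of converting to bytes themselves.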

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

04 Aug, 2020

1 commit

  • Pull io_uring updates from Jens Axboe:
    "Lots of cleanups in here, hardening the code and/or making it easier
    to read and fixing bugs, but a core feature/change too adding support
    for real async buffered reads. With the latter in place, we just need
    buffered write async support and we're done relying on kthreads for
    the fast path. In detail:

    - Cleanup how memory accounting is done on ring setup/free (Bijan)

    - sq array offset calculation fixup (Dmitry)

    - Consistently handle blocking off O_DIRECT submission path (me)

    - Support proper async buffered reads, instead of relying on kthread
    offload for that. This uses the page waitqueue to drive retries
    from task_work, like we handle poll based retry. (me)

    - IO completion optimizations (me)

    - Fix race with accounting and ring fd install (me)

    - Support EPOLLEXCLUSIVE (Jiufei)

    - Get rid of the io_kiocb unionizing, made possible by shrinking
    other bits (Pavel)

    - Completion side cleanups (Pavel)

    - Cleanup REQ_F_ flags handling, and kill off many of them (Pavel)

    - Request environment grabbing cleanups (Pavel)

    - File and socket read/write cleanups (Pavel)

    - Improve kiocb_set_rw_flags() (Pavel)

    - Tons of fixes and cleanups (Pavel)

    - IORING_SQ_NEED_WAKEUP clear fix (Xiaoguang)"

    * tag 'for-5.9/io_uring-20200802' of git://git.kernel.dk/linux-block: (127 commits)
    io_uring: flip if handling after io_setup_async_rw
    fs: optimise kiocb_set_rw_flags()
    io_uring: don't touch 'ctx' after installing file descriptor
    io_uring: get rid of atomic FAA for cq_timeouts
    io_uring: consolidate *_check_overflow accounting
    io_uring: fix stalled deferred requests
    io_uring: fix racy overflow count reporting
    io_uring: deduplicate __io_complete_rw()
    io_uring: de-unionise io_kiocb
    io-wq: update hash bits
    io_uring: fix missing io_queue_linked_timeout()
    io_uring: mark ->work uninitialised after cleanup
    io_uring: deduplicate io_grab_files() calls
    io_uring: don't do opcode prep twice
    io_uring: clear IORING_SQ_NEED_WAKEUP after executing task works
    io_uring: batch put_task_struct()
    tasks: add put_task_struct_many()
    io_uring: return locked and pinned page accounting
    io_uring: don't miscount pinned memory
    io_uring: don't open-code recv kbuf managment
    ...

    Linus Torvalds
     

16 Jul, 2020

4 commits


09 Jul, 2020

1 commit

  • flush_disk has only two callers, so open code it there. That also helps
    clarify the error message for the particular case, and allows removing
    the setting of bd_invalidated in check_disk_size_change, which will be
    cleared again instantly.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

01 Jul, 2020

4 commits


24 Jun, 2020

1 commit


22 Jun, 2020

1 commit


18 Jun, 2020

1 commit


17 Jun, 2020

1 commit

  • In blkdev_get() we call __blkdev_get() to do some internal jobs, and if
    there are errors in __blkdev_get(), bdput() is called, which means we
    have released the refcount of the bdev (actually the refcount of the
    bdev inode). This means we cannot access bdev after that point. But
    bdev is actually still accessed in blkdev_get() after calling
    __blkdev_get(). This results in a use-after-free if the refcount is the
    last one we released in __blkdev_get(). Let's take a look at the
    following scenario:

    CPU0                CPU1                          CPU2
    blkdev_open         blkdev_open                   Remove disk
                          bd_acquire
                          blkdev_get
                            __blkdev_get              del_gendisk
                                                        bdev_unhash_inode
      bd_acquire              bdev_get_gendisk
        bd_forget               failed because of unhashed
          bdput
                              bdput (the last one)
                                bdev_evict_inode

                            access bdev => use after free

    [ 459.350216] BUG: KASAN: use-after-free in __lock_acquire+0x24c1/0x31b0
    [ 459.351190] Read of size 8 at addr ffff88806c815a80 by task syz-executor.0/20132
    [ 459.352347]
    [ 459.352594] CPU: 0 PID: 20132 Comm: syz-executor.0 Not tainted 4.19.90 #2
    [ 459.353628] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
    [ 459.354947] Call Trace:
    [ 459.355337] dump_stack+0x111/0x19e
    [ 459.355879] ? __lock_acquire+0x24c1/0x31b0
    [ 459.356523] print_address_description+0x60/0x223
    [ 459.357248] ? __lock_acquire+0x24c1/0x31b0
    [ 459.357887] kasan_report.cold+0xae/0x2d8
    [ 459.358503] __lock_acquire+0x24c1/0x31b0
    [ 459.359120] ? _raw_spin_unlock_irq+0x24/0x40
    [ 459.359784] ? lockdep_hardirqs_on+0x37b/0x580
    [ 459.360465] ? _raw_spin_unlock_irq+0x24/0x40
    [ 459.361123] ? finish_task_switch+0x125/0x600
    [ 459.361812] ? finish_task_switch+0xee/0x600
    [ 459.362471] ? mark_held_locks+0xf0/0xf0
    [ 459.363108] ? __schedule+0x96f/0x21d0
    [ 459.363716] lock_acquire+0x111/0x320
    [ 459.364285] ? blkdev_get+0xce/0xbe0
    [ 459.364846] ? blkdev_get+0xce/0xbe0
    [ 459.365390] __mutex_lock+0xf9/0x12a0
    [ 459.365948] ? blkdev_get+0xce/0xbe0
    [ 459.366493] ? bdev_evict_inode+0x1f0/0x1f0
    [ 459.367130] ? blkdev_get+0xce/0xbe0
    [ 459.367678] ? destroy_inode+0xbc/0x110
    [ 459.368261] ? mutex_trylock+0x1a0/0x1a0
    [ 459.368867] ? __blkdev_get+0x3e6/0x1280
    [ 459.369463] ? bdev_disk_changed+0x1d0/0x1d0
    [ 459.370114] ? blkdev_get+0xce/0xbe0
    [ 459.370656] blkdev_get+0xce/0xbe0
    [ 459.371178] ? find_held_lock+0x2c/0x110
    [ 459.371774] ? __blkdev_get+0x1280/0x1280
    [ 459.372383] ? lock_downgrade+0x680/0x680
    [ 459.373002] ? lock_acquire+0x111/0x320
    [ 459.373587] ? bd_acquire+0x21/0x2c0
    [ 459.374134] ? do_raw_spin_unlock+0x4f/0x250
    [ 459.374780] blkdev_open+0x202/0x290
    [ 459.375325] do_dentry_open+0x49e/0x1050
    [ 459.375924] ? blkdev_get_by_dev+0x70/0x70
    [ 459.376543] ? __x64_sys_fchdir+0x1f0/0x1f0
    [ 459.377192] ? inode_permission+0xbe/0x3a0
    [ 459.377818] path_openat+0x148c/0x3f50
    [ 459.378392] ? kmem_cache_alloc+0xd5/0x280
    [ 459.379016] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 459.379802] ? path_lookupat.isra.0+0x900/0x900
    [ 459.380489] ? __lock_is_held+0xad/0x140
    [ 459.381093] do_filp_open+0x1a1/0x280
    [ 459.381654] ? may_open_dev+0xf0/0xf0
    [ 459.382214] ? find_held_lock+0x2c/0x110
    [ 459.382816] ? lock_downgrade+0x680/0x680
    [ 459.383425] ? __lock_is_held+0xad/0x140
    [ 459.384024] ? do_raw_spin_unlock+0x4f/0x250
    [ 459.384668] ? _raw_spin_unlock+0x1f/0x30
    [ 459.385280] ? __alloc_fd+0x448/0x560
    [ 459.385841] do_sys_open+0x3c3/0x500
    [ 459.386386] ? filp_open+0x70/0x70
    [ 459.386911] ? trace_hardirqs_on_thunk+0x1a/0x1c
    [ 459.387610] ? trace_hardirqs_off_caller+0x55/0x1c0
    [ 459.388342] ? do_syscall_64+0x1a/0x520
    [ 459.388930] do_syscall_64+0xc3/0x520
    [ 459.389490] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 459.390248] RIP: 0033:0x416211
    [ 459.390720] Code: 75 14 b8 02 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83
    04 19 00 00 c3 48 83 ec 08 e8 0a fa ff ff 48 89 04 24 b8 02 00 00 00 0f
    05 8b 3c 24 48 89 c2 e8 53 fa ff ff 48 89 d0 48 83 c4 08 48 3d
    01
    [ 459.393483] RSP: 002b:00007fe45dfe9a60 EFLAGS: 00000293 ORIG_RAX: 0000000000000002
    [ 459.394610] RAX: ffffffffffffffda RBX: 00007fe45dfea6d4 RCX: 0000000000416211
    [ 459.395678] RDX: 00007fe45dfe9b0a RSI: 0000000000000002 RDI: 00007fe45dfe9b00
    [ 459.396758] RBP: 000000000076bf20 R08: 0000000000000000 R09: 000000000000000a
    [ 459.397930] R10: 0000000000000075 R11: 0000000000000293 R12: 00000000ffffffff
    [ 459.399022] R13: 0000000000000bd9 R14: 00000000004cdb80 R15: 000000000076bf2c
    [ 459.400168]
    [ 459.400430] Allocated by task 20132:
    [ 459.401038] kasan_kmalloc+0xbf/0xe0
    [ 459.401652] kmem_cache_alloc+0xd5/0x280
    [ 459.402330] bdev_alloc_inode+0x18/0x40
    [ 459.402970] alloc_inode+0x5f/0x180
    [ 459.403510] iget5_locked+0x57/0xd0
    [ 459.404095] bdget+0x94/0x4e0
    [ 459.404607] bd_acquire+0xfa/0x2c0
    [ 459.405113] blkdev_open+0x110/0x290
    [ 459.405702] do_dentry_open+0x49e/0x1050
    [ 459.406340] path_openat+0x148c/0x3f50
    [ 459.406926] do_filp_open+0x1a1/0x280
    [ 459.407471] do_sys_open+0x3c3/0x500
    [ 459.408010] do_syscall_64+0xc3/0x520
    [ 459.408572] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 459.409415]
    [ 459.409679] Freed by task 1262:
    [ 459.410212] __kasan_slab_free+0x129/0x170
    [ 459.410919] kmem_cache_free+0xb2/0x2a0
    [ 459.411564] rcu_process_callbacks+0xbb2/0x2320
    [ 459.412318] __do_softirq+0x225/0x8ac

    Fix this by delaying bdput() to the end of blkdev_get() which means we
    have finished accessing bdev.
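
    The ordering problem and the fix can be modelled with a toy reference
    count. In the sketch below struct toy_obj and all toy_* functions are
    hypothetical, and the "free" is only a flag so that the buggy path can
    be observed safely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for the bdev inode. */
struct toy_obj {
    int refs;
    bool freed;
    int touched_after_free; /* accesses made after the object was freed */
};

static void toy_put(struct toy_obj *o)
{
    assert(o->refs > 0);
    if (--o->refs == 0)
        o->freed = true;    /* a real object would be released here */
}

static void toy_touch(struct toy_obj *o)
{
    if (o->freed)
        o->touched_after_free++;
}

/* Inner step: always fails; optionally drops the reference itself. */
static int toy_inner_get(struct toy_obj *o, bool put_on_error)
{
    if (put_on_error)
        toy_put(o);
    return -1;
}

/* Buggy ordering: the inner step drops the ref, the caller still uses o. */
static int toy_get_buggy(struct toy_obj *o)
{
    int ret = toy_inner_get(o, true);

    toy_touch(o);           /* may run after the last reference is gone */
    return ret;
}

/* Fixed ordering: the caller drops the ref only after its last access. */
static int toy_get_fixed(struct toy_obj *o)
{
    int ret = toy_inner_get(o, false);

    toy_touch(o);
    if (ret)
        toy_put(o);
    return ret;
}
```

    With a single remaining reference, the buggy variant records one
    access after the free, while the fixed variant records none.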

    Fixes: 77ea887e433a ("implement in-kernel gendisk events handling")
    Reported-by: Hulk Robot
    Signed-off-by: Jason Yan
    Tested-by: Sedat Dilek
    Reviewed-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Dan Carpenter
    Cc: Christoph Hellwig
    Cc: Jens Axboe
    Cc: Ming Lei
    Cc: Jan Kara
    Cc: Dan Carpenter
    Signed-off-by: Jens Axboe

    Jason Yan
     

03 Jun, 2020

4 commits

  • Pull block driver updates from Jens Axboe:
    "On top of the core changes, here are the block driver changes for this
    merge window:

    - NVMe changes:
    - NVMe over Fibre Channel protocol updates, which also reach
    over to drivers/scsi/lpfc (James Smart)
    - namespace revalidation support on the target (Anthony
    Iliopoulos)
    - gcc zero length array fix (Arnd Bergmann)
    - nvmet cleanups (Chaitanya Kulkarni)
    - misc cleanups and fixes (me, Keith Busch, Sagi Grimberg)
    - use a SRQ per completion vector (Max Gurtovoy)
    - fix handling of runtime changes to the queue count (Weiping
    Zhang)
    - t10 protection information support for nvme-rdma and
    nvmet-rdma (Israel Rukshin and Max Gurtovoy)
    - target side AEN improvements (Chaitanya Kulkarni)
    - various fixes and minor improvements all over, including the
    nvme part of the lpfc driver

    - Floppy code cleanup series (Willy, Denis)

    - Floppy contention fix (Jiri)

    - Loop CONFIGURE support (Martijn)

    - bcache fixes/improvements (Coly, Joe, Colin)

    - q->queuedata cleanups (Christoph)

    - Get rid of ioctl_by_bdev (Christoph, Stefan)

    - md/raid5 allocation fixes (Coly)

    - zero length array fixes (Gustavo)

    - swim3 task state fix (Xu)"

    * tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-block: (166 commits)
    bcache: configure the asynchronous registertion to be experimental
    bcache: asynchronous devices registration
    bcache: fix refcount underflow in bcache_device_free()
    bcache: Convert pr_ uses to a more typical style
    bcache: remove redundant variables i and n
    lpfc: Fix return value in __lpfc_nvme_ls_abort
    lpfc: fix axchg pointer reference after free and double frees
    lpfc: Fix pointer checks and comments in LS receive refactoring
    nvme: set dma alignment to qword
    nvmet: cleanups the loop in nvmet_async_events_process
    nvmet: fix memory leak when removing namespaces and controllers concurrently
    nvmet-rdma: add metadata/T10-PI support
    nvmet: add metadata support for block devices
    nvmet: add metadata/T10-PI support
    nvme: add Metadata Capabilities enumerations
    nvmet: rename nvmet_check_data_len to nvmet_check_transfer_len
    nvmet: rename nvmet_rw_len to nvmet_rw_data_len
    nvmet: add metadata characteristics for a namespace
    nvme-rdma: add metadata/T10-PI support
    nvme-rdma: introduce nvme_rdma_sgl structure
    ...

    Linus Torvalds
     
  • Pull block updates from Jens Axboe:
    "Core block changes that have been queued up for this release:

    - Remove dead blk-throttle and blk-wbt code (Guoqing)

    - Include pid in blktrace note traces (Jan)

    - Don't spew I/O errors on wouldblock termination (me)

    - Zone append addition (Johannes, Keith, Damien)

    - IO accounting improvements (Konstantin, Christoph)

    - blk-mq hardware map update improvements (Ming)

    - Scheduler dispatch improvement (Salman)

    - Inline block encryption support (Satya)

    - Request map fixes and improvements (Weiping)

    - blk-iocost tweaks (Tejun)

    - Fix for timeout failing with error injection (Keith)

    - Queue re-run fixes (Douglas)

    - CPU hotplug improvements (Christoph)

    - Queue entry/exit improvements (Christoph)

    - Move DMA drain handling to the few drivers that use it (Christoph)

    - Partition handling cleanups (Christoph)"

    * tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block: (127 commits)
    block: mark bio_wouldblock_error() bio with BIO_QUIET
    blk-wbt: rename __wbt_update_limits to wbt_update_limits
    blk-wbt: remove wbt_update_limits
    blk-throttle: remove tg_drain_bios
    blk-throttle: remove blk_throtl_drain
    null_blk: force complete for timeout request
    blk-mq: drain I/O when all CPUs in a hctx are offline
    blk-mq: add blk_mq_all_tag_iter
    blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx
    blk-mq: use BLK_MQ_NO_TAG in more places
    blk-mq: rename BLK_MQ_TAG_FAIL to BLK_MQ_NO_TAG
    blk-mq: move more request initialization to blk_mq_rq_ctx_init
    blk-mq: simplify the blk_mq_get_request calling convention
    blk-mq: remove the bio argument to ->prepare_request
    nvme: force complete cancelled requests
    blk-mq: blk-mq: provide forced completion method
    block: fix a warning when blkdev.h is included for !CONFIG_BLOCK builds
    block: blk-crypto-fallback: remove redundant initialization of variable err
    block: reduce part_stat_lock() scope
    block: use __this_cpu_add() instead of access by smp_processor_id()
    ...

    Linus Torvalds
     
  • Pull power management updates from Rafael Wysocki:
    "These rework the system-wide PM driver flags, make runtime switching
    of cpuidle governors easier, improve the user space hibernation
    interface code, add intel-speed-select interface documentation, add
    more debug messages to the ACPI code handling suspend to idle, update
    the cpufreq core and drivers, fix a minor issue in the cpuidle core
    and update two cpuidle drivers, improve the PM-runtime framework,
    update the Intel RAPL power capping driver, update devfreq core and
    drivers, and clean up the cpupower utility.

    Specifics:

    - Rework the system-wide PM driver flags to make them easier to
    understand and use and update their documentation (Rafael Wysocki,
    Alan Stern).

    - Allow cpuidle governors to be switched at run time regardless of
    the kernel configuration and update the related documentation
    accordingly (Hanjun Guo).

    - Improve the resume device handling in the user space hibernation
    interface code (Domenico Andreoli).

    - Document the intel-speed-select sysfs interface (Srinivas
    Pandruvada).

    - Make the ACPI code handling suspend to idle print more debug
    messages to help diagnose issues with it (Rafael Wysocki).

    - Fix a helper routine in the cpufreq core and correct a typo in the
    struct cpufreq_driver kerneldoc comment (Rafael Wysocki, Wang
    Wenhu).

    - Update cpufreq drivers:

    - Make the intel_pstate driver start in the passive mode by
    default on systems without HWP (Rafael Wysocki).

    - Add i.MX7ULP support to the imx-cpufreq-dt driver and add
    i.MX7ULP to the cpufreq-dt-platdev blacklist (Peng Fan).

    - Convert the qoriq cpufreq driver to a platform one, make the
    platform code create a suitable device object for it and add
    platform dependencies to it (Mian Yousaf Kaukab, Geert
    Uytterhoeven).

    - Fix wrong compatible binding in the qcom driver (Ansuel Smith).

    - Build the omap driver by default for ARCH_OMAP2PLUS (Anders
    Roxell).

    - Add r8a7742 SoC support to the dt cpufreq driver (Lad
    Prabhakar).

    - Update cpuidle core and drivers:

    - Fix three reference count leaks in error code paths in the
    cpuidle core (Qiushi Wu).

    - Convert Qualcomm SPM to a generic cpuidle driver (Stephan
    Gerhold).

    - Fix up the execution order when entering a domain idle state in
    the PSCI driver (Ulf Hansson).

    - Fix a reference counting issue related to clock management and
    clean up two oddities in the PM-runtime framework (Rafael Wysocki,
    Andy Shevchenko).

    - Add ElkhartLake support to the Intel RAPL power capping driver and
    remove an unused local MSR definition from it (Jacob Pan, Sumeet
    Pawnikar).

    - Update devfreq core and drivers:

    - Replace strncpy() with strscpy() in the devfreq core and use
    lockdep asserts instead of manual checks for a locked mutex in
    it (Dmitry Osipenko, Krzysztof Kozlowski).

    - Add a generic imx bus scaling driver and make it register an
    interconnect device (Leonard Crestez, Gustavo A. R. Silva).

    - Make the cpufreq notifier in the tegra30 driver take boosting
    into account and delete an unuseful error message from that
    driver (Dmitry Osipenko, Markus Elfring).

    - Remove unneeded semicolon from the cpupower code (Zou Wei)"

    * tag 'pm-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (51 commits)
    cpuidle: Fix three reference count leaks
    PM: runtime: Replace pm_runtime_callbacks_present()
    PM / devfreq: Use lockdep asserts instead of manual checks for locked mutex
    PM / devfreq: imx-bus: Fix inconsistent IS_ERR and PTR_ERR
    PM / devfreq: Replace strncpy with strscpy
    PM / devfreq: imx: Register interconnect device
    PM / devfreq: Add generic imx bus scaling driver
    PM / devfreq: tegra30: Delete an error message in tegra_devfreq_probe()
    PM / devfreq: tegra30: Make CPUFreq notifier to take into account boosting
    PM: hibernate: Restrict writes to the resume device
    PM: runtime: clk: Fix clk_pm_runtime_get() error path
    cpuidle: Convert Qualcomm SPM driver to a generic CPUidle driver
    ACPI: EC: PM: s2idle: Extend GPE dispatching debug message
    ACPI: PM: s2idle: Print type of wakeup debug messages
    powercap: RAPL: remove unused local MSR define
    PM: runtime: Make clear what we do when conditions are wrong in rpm_suspend()
    Documentation: admin-guide: pm: Document intel-speed-select
    PM: hibernate: Split off snapshot dev option
    PM: hibernate: Incorporate concurrency handling
    Documentation: ABI: make current_governer_ro as a candidate for removal
    ...

    Linus Torvalds
     
  • Implement the new readahead aop and convert all callers (block_dev,
    exfat, ext2, fat, gfs2, hpfs, isofs, jfs, nilfs2, ocfs2, omfs, qnx6,
    reiserfs & udf).

    The callers are all trivial except for GFS2 & OCFS2.

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Reviewed-by: Junxiao Bi # ocfs2
    Reviewed-by: Joseph Qi # ocfs2
    Reviewed-by: Dave Chinner
    Reviewed-by: John Hubbard
    Reviewed-by: Christoph Hellwig
    Reviewed-by: William Kucharski
    Cc: Chao Yu
    Cc: Cong Wang
    Cc: Darrick J. Wong
    Cc: Eric Biggers
    Cc: Gao Xiang
    Cc: Jaegeuk Kim
    Cc: Michal Hocko
    Cc: Zi Yan
    Cc: Johannes Thumshirn
    Cc: Miklos Szeredi
    Link: http://lkml.kernel.org/r/20200414150233.24495-17-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

27 May, 2020

1 commit

  • Hibernation via snapshot device requires write permission to the swap
    block device, the one that more often (but not necessarily) is used to
    store the hibernation image.

    With this patch, such permissions are granted iff:

    1) snapshot device config option is enabled
    2) swap partition is used as resume device

    In other circumstances the swap device is not writable from userspace.

    In order to achieve this, every write attempt to a swap device is
    checked against the device configured as part of the uswsusp API [0]
    using a pointer to the inode struct in memory. If the swap device being
    written was not configured for resuming, the write request is denied.
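
    The check described above amounts to comparing inode pointers. A
    minimal user-space sketch, in which struct toy_inode, the two globals
    and toy_may_write() are all hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inode: only records whether swapon marked it as swap. */
struct toy_inode {
    bool is_swap;
};

/* Stand-ins for the config option and the configured resume device. */
static bool toy_snapshot_enabled;
static const struct toy_inode *toy_resume_dev;

/*
 * Decide whether userspace may write to the device behind `inode`.
 * Non-swap devices are unaffected; a swap device is writable only when
 * the snapshot device is enabled and this inode is the resume device.
 */
static bool toy_may_write(const struct toy_inode *inode)
{
    assert(inode != NULL);
    if (!inode->is_swap)
        return true;
    return toy_snapshot_enabled && inode == toy_resume_dev;
}
```

    Comparing pointers rather than device numbers is what ties the check
    to the exact in-memory inode that was configured for resuming.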

    NOTE: this implementation works only for swap block devices, where the
    inode configured by swapon (which sets S_SWAPFILE) is the same used
    by SNAPSHOT_SET_SWAP_AREA.

    In the case of a swap file, SNAPSHOT_SET_SWAP_AREA instead receives the
    inode of the block device containing the filesystem where the swap file
    is located (plus an offset into it). That inode is never passed to
    swapon and therefore never has S_SWAPFILE set.

    As a result, the swap file itself (as a file) can never be written from
    userspace. It remains writable, however, if accessed directly through
    the containing block device, which is always writable by root.

    [0] Documentation/power/userland-swsusp.rst

    v2:
    - rename is_hibernate_snapshot_dev() to is_hibernate_resume_dev()
    - fix description so to correctly refer to the resume device

    Signed-off-by: Domenico Andreoli
    Acked-by: Darrick J. Wong
    Signed-off-by: Rafael J. Wysocki

    Domenico Andreoli
     

22 May, 2020

1 commit


21 May, 2020

1 commit


13 May, 2020

1 commit

  • Sync dio can be big, or may take a long time in discard or in case of
    IO failure.

    We already prevent hung tasks in submit_bio_wait() and blk_execute_rq(),
    so apply the same trick to prevent hung tasks in sync dio.

    Add a blk_io_schedule() helper that uses io_schedule_timeout() to
    prevent the task hung warning.
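
    The trick is to sleep in bounded slices so that the waiter wakes up
    before a hung-task watchdog would complain. Below is a tick-based toy
    model; struct toy_wait and the toy_* functions are hypothetical, and
    using half the watchdog limit as the slice is an assumption of this
    sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: time advances in ticks; names are illustrative. */
struct toy_wait {
    unsigned long remaining;     /* ticks until the I/O completes */
    unsigned long longest_sleep; /* longest uninterrupted sleep seen */
    unsigned long wakeups;       /* how many times the waiter woke up */
};

/* Sleep for at most `slice` ticks (0 means no limit), then return. */
static void toy_io_schedule(struct toy_wait *w, unsigned long slice)
{
    assert(w != NULL && w->remaining > 0);
    unsigned long t = w->remaining;

    if (slice && slice < t)
        t = slice;
    w->remaining -= t;
    w->wakeups++;
    if (t > w->longest_sleep)
        w->longest_sleep = t;
}

/* Wait for completion, never sleeping longer than half the watchdog limit. */
static void toy_wait_for_io(struct toy_wait *w, unsigned long watchdog_ticks)
{
    while (w->remaining)
        toy_io_schedule(w, watchdog_ticks / 2);
}
```

    An I/O that takes 100 ticks with a 60-tick watchdog is then waited for
    in slices of at most 30 ticks; a watchdog limit of 0 (disabled) falls
    back to one unbounded sleep.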

    Signed-off-by: Ming Lei
    Reviewed-by: Bart Van Assche
    Cc: Salman Qazi
    Cc: Jesse Barnes
    Cc: Christoph Hellwig
    Cc: Bart Van Assche
    Cc: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Ming Lei
     

10 May, 2020

1 commit

  • Pull in block-5.7 fixes for 5.8. Mostly to resolve a conflict with
    the blk-iocost changes, but we also need the base of the bdi
    use-after-free fix, as we build on top of it.

    * block-5.7:
    nvme: fix possible hang when ns scanning fails during error recovery
    nvme-pci: fix "slimmer CQ head update"
    bdi: add a ->dev_name field to struct backing_dev_info
    bdi: use bdi_dev_name() to get device name
    bdi: move bdi_dev_name out of line
    vboxsf: don't use the source name in the bdi name
    iocost: protect iocg->abs_vdebt with iocg->waitq.lock
    block: remove the bd_openers checks in blk_drop_partitions
    nvme: prevent double free in nvme_alloc_ns() error handling
    null_blk: Cleanup zoned device initialization
    null_blk: Fix zoned command handling
    block: remove unused header
    blk-iocost: Fix error on iocost_ioc_vrate_adj
    bdev: Reduce time holding bd_mutex in sync in blkdev_close()
    buffer: remove useless comment and WB_REASON_FREE_MORE_MEM, reason.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

24 Apr, 2020

1 commit


21 Apr, 2020

2 commits