Eric Lee / smarc-fsl-linux-kernel

30 Dec, 2019

1 commit

429120f3d block: fix splitting segments on boundary masks

We ran into a problem with a mpt3sas based controller, where we would
see random (and hard to reproduce) file corruption. The issue seemed
specific to this controller, but wasn't specific to the file system.
After a lot of debugging, we found out that it's caused by segments
spanning a 4G memory boundary. This shouldn't happen, as the default
setting for segment boundary masks is 4G.

Turns out there are two issues in get_max_segment_size():

1) The default segment boundary mask is bypassed

2) The segment start address isn't taken into account when checking
segment boundary limit

Fix these two issues by removing the bypass of the segment boundary
check even if the mask is set to the default value, and taking into
account the actual start address of the request when checking if a
segment needs splitting.

Cc: stable@vger.kernel.org # v5.1+
Reviewed-by: Chris Mason
Tested-by: Chris Mason
Fixes: dcebd755926b ("block: use bio_for_each_bvec() to compute multi-page bvec count")
Signed-off-by: Ming Lei

Dropped const on the page pointer, ppc page_to_phys() doesn't mark the
page as const...

Signed-off-by: Jens Axboe

Ming Lei
2019-12-30 23:51:18 +0800
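
The boundary check described in the commit above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel's get_max_segment_size(): the function name max_segment_len() and its parameters are hypothetical, and plain 64-bit integers stand in for the queue limits and the page's physical address.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model (hypothetical names, not kernel code): largest segment
 * length that may start at physical address start_phys without crossing
 * the boundary described by boundary_mask (0xffffffff means 4G).
 */
static uint64_t max_segment_len(uint64_t boundary_mask, uint64_t start_phys,
                                uint64_t max_segment_size)
{
    /* The mask must have the form 2^n - 1 and leave room for "+ 1". */
    assert(boundary_mask != UINT64_MAX);
    assert((boundary_mask & (boundary_mask + 1)) == 0);

    /* Position of the segment start inside its boundary window. */
    uint64_t offset = start_phys & boundary_mask;

    /* Bytes left before the next boundary is reached. */
    uint64_t room = boundary_mask - offset + 1;

    return room < max_segment_size ? room : max_segment_size;
}
```

With a 4G mask and a 64 KiB segment limit, a segment starting 4 KiB below a 4G boundary is limited to 4096 bytes, while one starting far from any boundary keeps the full 65536 bytes. Ignoring the start address would allow the first segment to spill over the boundary.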

29 Dec, 2019

1 commit

85a8ce62c block: add bio_truncate to fix guard_bio_eod

Some filesystems, such as vfat, may send a bio that crosses the device
boundary. Worse, an IO request that starts within the device boundaries
can contain more than one segment past EOD.

Commit dce30ca9e3b6 ("fs: fix guard_bio_eod to check for real EOD errors")
tries to fix this issue by returning -EIO in this situation. However,
that takes away the fs user code's chance to handle -EIO, and
sync_inodes_sb() may then hang forever.

Also, the current truncation of the last segment is dangerous because it
updates the last bvec: the bvec table is no longer immutable, and fs bio
users may not be able to retrieve the truncated pages via
bio_for_each_segment_all() in their .end_io callback.

Fix this issue by supporting multi-segment truncation. The approach is
also simpler:

- just update the bio size, since the block layer can build the correct
bvec from the updated bio size. The bvec table then becomes truly immutable.

- zero all truncated segments for read bio

Cc: Carlos Maiolino
Cc: linux-fsdevel@vger.kernel.org
Fixed-by: dce30ca9e3b6 ("fs: fix guard_bio_eod to check for real EOD errors")
Reported-by: syzbot+2b9e54155c8c25d8d165@syzkaller.appspotmail.com
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe

Ming Lei
2019-12-29 00:44:56 +0800
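
The two steps listed in the bio_truncate commit above (shrink only the size, zero the truncated part of a read) can be sketched in userspace. The types toy_segment and toy_bio and the function toy_bio_truncate() are hypothetical stand-ins for illustration and are not the kernel's struct bio or bio_truncate(). The model assumes the segment lengths add up to the bio size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a bvec and a bio. */
struct toy_segment {
    unsigned char *data;
    size_t len;
};

struct toy_bio {
    struct toy_segment *segs;
    size_t nr_segs;
    size_t size;    /* bytes covered by the I/O */
    int is_read;    /* non-zero for a read */
};

/*
 * Truncate the I/O to new_size bytes. The segment table is left untouched;
 * only the size changes. For reads, every byte past new_size is zeroed so
 * the caller never sees stale data in the truncated pages.
 */
static void toy_bio_truncate(struct toy_bio *bio, size_t new_size)
{
    size_t done = 0;    /* bytes covered by the segments visited so far */

    if (new_size >= bio->size)
        return;

    if (bio->is_read) {
        for (size_t i = 0; i < bio->nr_segs; i++) {
            struct toy_segment *seg = &bio->segs[i];

            if (done + seg->len > new_size) {
                /* Part of this segment that is still valid. */
                size_t keep = done < new_size ? new_size - done : 0;

                memset(seg->data + keep, 0, seg->len - keep);
            }
            done += seg->len;
        }
    }

    bio->size = new_size;
}
```

Because the segment lengths are never modified, a completion handler that walks all segments still finds every page, which is the immutability property the commit message asks for.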

21 Dec, 2019

7 commits

b2c0fcd28 compat_ioctl: block: handle Persistent Reservations

These were added to blkdev_ioctl() in linux-5.5 but not
blkdev_compat_ioctl, so add them now.

Cc: stable@vger.kernel.org # v4.4+
Fixes: bbd3e064362e ("block: add an API for Persistent Reservations")
Signed-off-by: Arnd Bergmann

Fold in followup patch from Arnd with missing pr.h header include.

Signed-off-by: Jens Axboe

Arnd Bergmann
2019-12-21 22:26:56 +0800
4b43f31d6 compat_ioctl: block: handle add zone open, close and finish ioctl

These were added to blkdev_ioctl() in linux-5.5 but not
blkdev_compat_ioctl, so add them now.

Fixes: e876df1fe0ad ("block: add zone open, close and finish ioctl support")
Reviewed-by: Damien Le Moal
Signed-off-by: Arnd Bergmann
Signed-off-by: Jens Axboe

Arnd Bergmann
2019-12-21 22:26:41 +0800
21d373409 compat_ioctl: block: handle BLKGETZONESZ/BLKGETNRZONES

These were added to blkdev_ioctl() in v4.20 but not blkdev_compat_ioctl,
so add them now.

Cc: stable@vger.kernel.org # v4.20+
Fixes: 72cd87576d1d ("block: Introduce BLKGETZONESZ ioctl")
Fixes: 65e4e3eee83d ("block: Introduce BLKGETNRZONES ioctl")
Reviewed-by: Damien Le Moal
Signed-off-by: Arnd Bergmann
Signed-off-by: Jens Axboe

Arnd Bergmann
2019-12-21 22:26:41 +0800
673bdf8ce compat_ioctl: block: handle BLKREPORTZONE/BLKRESETZONE

These were added to blkdev_ioctl() but not blkdev_compat_ioctl,
so add them now.

Cc: stable@vger.kernel.org # v4.10+
Fixes: 3ed05a987e0f ("blk-zoned: implement ioctls")
Reviewed-by: Damien Le Moal
Signed-off-by: Arnd Bergmann
Signed-off-by: Jens Axboe

Arnd Bergmann
2019-12-21 22:26:40 +0800
3b7995a98 block: fix memleak when __blk_rq_map_user_iov() is failed

While doing fuzz testing, I got the following memleak report:

BUG: memory leak
unreferenced object 0xffff88837af80000 (size 4096):
comm "memleak", pid 3557, jiffies 4294817681 (age 112.499s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
20 00 00 00 10 01 00 00 00 00 00 00 01 00 00 00 ...............
backtrace:
[] bio_alloc_bioset+0x393/0x590
[] bio_copy_user_iov+0x300/0xcd0
[] blk_rq_map_user_iov+0x2f1/0x5f0
[] blk_rq_map_user+0xf2/0x160
[] sg_common_write.isra.21+0x1094/0x1870
[] sg_write.part.25+0x5d9/0x950
[] sg_write+0x5f/0x8c
[] __vfs_write+0x7c/0x100
[] vfs_write+0x1c3/0x500
[] ksys_write+0xf9/0x200
[] do_syscall_64+0x9f/0x4f0
[] entry_SYSCALL_64_after_hwframe+0x49/0xbe

If __blk_rq_map_user_iov() fails in blk_rq_map_user_iov(), the bio(s)
allocated before the failure will leak. The refcount of each bio is
initialized to 1 and increased to 2 by bio_get(), but
__blk_rq_unmap_user() only decreases it to 1, so the bio cannot be
freed. Fix it by calling blk_rq_unmap_user().

Reviewed-by: Bob Liu
Reported-by: Hulk Robot
Signed-off-by: Yang Yingliang
Signed-off-by: Jens Axboe

Yang Yingliang
2019-12-21 02:52:01 +0800
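
The reference counting described in the memleak commit above can be traced with a toy model. All names below (toy_bio, toy_unmap_inner, toy_unmap) are hypothetical. The sketch assumes, as the commit message states, that the inner unmap helper drops one reference and that the outer one additionally drops the reference taken by the extra get.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted bio. */
struct toy_bio {
    int refcount;
    bool freed;
};

static void toy_bio_init(struct toy_bio *bio)
{
    bio->refcount = 1;      /* reference owned by the allocator */
    bio->freed = false;
}

static void toy_bio_get(struct toy_bio *bio)
{
    bio->refcount++;        /* extra reference held while mapped */
}

static void toy_bio_put(struct toy_bio *bio)
{
    if (--bio->refcount == 0)
        bio->freed = true;  /* last reference gone: object is released */
}

/* Models the inner unmap helper: drops a single reference. */
static void toy_unmap_inner(struct toy_bio *bio)
{
    toy_bio_put(bio);
}

/* Models the outer unmap helper: inner unmap plus the matching put. */
static void toy_unmap(struct toy_bio *bio)
{
    toy_unmap_inner(bio);
    toy_bio_put(bio);
}
```

Calling only toy_unmap_inner() on the error path leaves the count at 1 and the object allocated, which corresponds to the reported leak. Calling toy_unmap() brings the count to 0.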
b3c6a5997 block: Fix a lockdep complaint triggered by request queue flushing

Avoid that running test nvme/012 from the blktests suite triggers the
following false positive lockdep complaint:

============================================
WARNING: possible recursive locking detected
5.0.0-rc3-xfstests-00015-g1236f7d60242 #841 Not tainted
--------------------------------------------
ksoftirqd/1/16 is trying to acquire lock:
000000000282032e (&(&fq->mq_flush_lock)->rlock){..-.}, at: flush_end_io+0x4e/0x1d0

but task is already holding lock:
00000000cbadcbc2 (&(&fq->mq_flush_lock)->rlock){..-.}, at: flush_end_io+0x4e/0x1d0

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&(&fq->mq_flush_lock)->rlock);
lock(&(&fq->mq_flush_lock)->rlock);

*** DEADLOCK ***

May be due to missing lock nesting notation

1 lock held by ksoftirqd/1/16:
#0: 00000000cbadcbc2 (&(&fq->mq_flush_lock)->rlock){..-.}, at: flush_end_io+0x4e/0x1d0

stack backtrace:
CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.0.0-rc3-xfstests-00015-g1236f7d60242 #841
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
dump_stack+0x67/0x90
__lock_acquire.cold.45+0x2b4/0x313
lock_acquire+0x98/0x160
_raw_spin_lock_irqsave+0x3b/0x80
flush_end_io+0x4e/0x1d0
blk_mq_complete_request+0x76/0x110
nvmet_req_complete+0x15/0x110 [nvmet]
nvmet_bio_done+0x27/0x50 [nvmet]
blk_update_request+0xd7/0x2d0
blk_mq_end_request+0x1a/0x100
blk_flush_complete_seq+0xe5/0x350
flush_end_io+0x12f/0x1d0
blk_done_softirq+0x9f/0xd0
__do_softirq+0xca/0x440
run_ksoftirqd+0x24/0x50
smpboot_thread_fn+0x113/0x1e0
kthread+0x121/0x140
ret_from_fork+0x3a/0x50

Cc: Christoph Hellwig
Cc: Ming Lei
Cc: Hannes Reinecke
Signed-off-by: Bart Van Assche
Signed-off-by: Jens Axboe

Bart Van Assche
2019-12-21 02:52:01 +0800
c44a4edb2 block: Fix the type of 'sts' in bsg_queue_rq()

This patch fixes the following sparse warnings:

block/bsg-lib.c:269:19: warning: incorrect type in initializer (different base types)
block/bsg-lib.c:269:19: expected int sts
block/bsg-lib.c:269:19: got restricted blk_status_t [usertype]
block/bsg-lib.c:286:16: warning: incorrect type in return expression (different base types)
block/bsg-lib.c:286:16: expected restricted blk_status_t
block/bsg-lib.c:286:16: got int [assigned] sts

Cc: Martin Wilck
Fixes: d46fe2cb2dce ("block: drop device references in bsg_queue_rq()")
Signed-off-by: Bart Van Assche
Signed-off-by: Jens Axboe

Bart Van Assche
2019-12-21 02:52:01 +0800

18 Dec, 2019

1 commit

c58c1f834 block: end bio with BLK_STS_AGAIN in case of non-mq devs and REQ_NOWAIT

Non-mq devs do not honor REQ_NOWAIT so give a chance to the caller to repeat
request gracefully on -EAGAIN error.

The problem is well reproduced using io_uring:

mkfs.ext4 /dev/ram0
mount /dev/ram0 /mnt

# Preallocate a file
dd if=/dev/zero of=/mnt/file bs=1M count=1

# Start fio with io_uring and get -EIO
fio --rw=write --ioengine=io_uring --size=1M --direct=1 --name=job --filename=/mnt/file

Signed-off-by: Roman Penyaev
Signed-off-by: Jens Axboe

Roman Penyaev
2019-12-18 00:01:43 +0800

17 Dec, 2019

1 commit

d7bd15a13 iocost: over-budget forced IOs should schedule async delay

When over-budget IOs are force-issued through root cgroup,
iocg_kick_delay() adjusts the async delay accordingly but doesn't
actually schedule async throttle for the issuing task. This bug is
pretty well masked because sooner or later the offending threads are
gonna get directly throttled on regular IOs or have async delay
scheduled by mem_cgroup_throttle_swaprate().

However, it can affect control quality on filesystem metadata heavy
operations. Let's fix it by invoking blkcg_schedule_throttle() when
iocg_kick_delay() says async delay is needed.

Signed-off-by: Tejun Heo
Fixes: 7caa47151ab2 ("blkcg: implement blk-iocost")
Cc: stable@vger.kernel.org
Reported-by: Josef Bacik
Signed-off-by: Jens Axboe

Tejun Heo
2019-12-17 07:10:17 +0800

14 Dec, 2019

1 commit

f1fcd7786 Merge tag 'for-linus-20191212' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

- stable fix for the bi_size overflow. Not a corruption issue, but a
case where we could merge but disallowed (Andreas)

- NVMe pull request via Keith, with various fixes.

- MD pull request from Song.

- Merge window regression fix for the rq passthrough stats (Logan)

- Remove unused blkcg_drain_queue() function (Guoqing)

* tag 'for-linus-20191212' of git://git.kernel.dk/linux-block:
blk-cgroup: remove blkcg_drain_queue
block: fix NULL pointer dereference in account statistics with IDE
md: make sure desc_nr less than MD_SB_DISKS
md: raid1: check rdev before reference in raid1_sync_request func
raid5: need to set STRIPE_HANDLE for batch head
block: fix "check bi_size overflow before merge"
nvme/pci: Fix read queue count
nvme/pci Limit write queue sizes to possible cpus
nvme/pci: Fix write and poll queue types
nvme/pci: Remove last_cq_head
nvme: Namepace identification descriptor list is optional
nvme-fc: fix double-free scenarios on hw queues
nvme: else following return is not needed
nvme: add error message on mismatching controller ids
nvme_fc: add module to ops template to allow module references
nvmet-loop: Avoid preallocating big SGL for data
nvme-fc: Avoid preallocating big SGL for data
nvme-rdma: Avoid preallocating big SGL for data

Linus Torvalds
2019-12-14 06:27:19 +0800

13 Dec, 2019

1 commit

5addeae1b blk-cgroup: remove blkcg_drain_queue

Since blk_drain_queue has already been removed, this function
is not needed anymore.

Signed-off-by: Guoqing Jiang
Signed-off-by: Jens Axboe

Guoqing Jiang
2019-12-13 00:26:55 +0800

12 Dec, 2019

1 commit

ecb6186cf block: fix NULL pointer dereference in account statistics with IDE

The IDE driver creates some passthru requests which never get
submitted to the block layer in such a way that blk_account_io_start()
gets called. However, the driver still calls __blk_mq_end_request() in
ide_end_rq() which will call blk_account_io_completion() which tries
to dereference req->part which is never set. See ide_prep_sense() for
an example of where these requests come from.

To fix this, blk_account_io_completion() and blk_account_io_done()
should do nothing if req->part is not set.

The back trace of this bug is:

BUG: kernel NULL pointer dereference, address: 000002ac
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
*pde = 00000000
Oops: 0002 [#1]
CPU: 0 PID: 237 Comm: kworker/0:1H Not tainted
5.4.0-rc2-00011-g48d9b0d43105e #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1
04/01/2014
Workqueue: kblockd drive_rq_insert_work
EIP: blk_account_io_completion+0x7a/0xf0
Code: 89 54 24 08 31 d2 89 4c 24 04 31 c9 c7 04 24 02 00 00 00 c1 ee
09 e8 f5 21 a6 ff e8 70 5c a7 ff 8b 53 60 8d 04 bd 00 00 00 00 b4
02 ac 02 00 00 8b 9a 88 02 00 00 85 db 74 11 85 d2 74 51 8b
EAX: 00000000 EBX: f5b80000 ECX: 00000000 EDX: 00000000
ESI: 00000000 EDI: 00000000 EBP: f3031e70 ESP: f3031e54
DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010046
CR0: 80050033 CR2: 000002ac CR3: 03c25000 CR4: 000406d0
Call Trace:

blk_update_request+0x85/0x420
ide_end_rq+0x38/0xa0
ide_complete_rq+0x3d/0x70
cdrom_newpc_intr+0x258/0xba0
ide_intr+0x135/0x250
__handle_irq_event_percpu+0x3e/0x250
handle_irq_event_percpu+0x1f/0x50
handle_irq_event+0x32/0x60
handle_level_irq+0x6c/0x110
handle_irq+0x72/0xa0

do_IRQ+0x45/0xad
common_interrupt+0x115/0x11c

Fixes: 48d9b0d43105 ("block: account statistics for passthrough requests")
Reported-by: kernel test robot
Signed-off-by: Logan Gunthorpe
Signed-off-by: Jens Axboe

Logan Gunthorpe
2019-12-12 23:12:50 +0800

10 Dec, 2019

2 commits

cc90bc684 block: fix "check bi_size overflow before merge"

This partially reverts commit e3a5d8e386c3fb973fa75f2403622a8f3640ec06.

Commit e3a5d8e386c3 ("check bi_size overflow before merge") adds a bio_full
check to __bio_try_merge_page. This will cause __bio_try_merge_page to fail
when the last bi_io_vec has been reached. Instead, what we want here is only
the bi_size overflow check.

Fixes: e3a5d8e386c3 ("block: check bi_size overflow before merge")
Cc: stable@vger.kernel.org # v5.4+
Reviewed-by: Ming Lei
Signed-off-by: Andreas Gruenbacher
Signed-off-by: Jens Axboe

Andreas Gruenbacher
2019-12-10 13:04:35 +0800
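
The difference between the two checks discussed in the commit above can be shown with a small model. The struct toy_bio and the helper names are hypothetical. The sketch assumes only what the message says: merging into the last vector needs the size overflow check, not the "vector table is full" check.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the fields relevant to merging. */
struct toy_bio {
    unsigned int size;        /* bytes already in the bio */
    unsigned short vcnt;      /* vectors in use */
    unsigned short max_vecs;  /* capacity of the vector table */
};

/* True if adding len bytes would wrap the unsigned size counter. */
static bool size_would_overflow(const struct toy_bio *bio, unsigned int len)
{
    return bio->size > UINT_MAX - len;
}

/* Combined check: no free vector slot, or the size would overflow. */
static bool toy_bio_full(const struct toy_bio *bio, unsigned int len)
{
    return bio->vcnt >= bio->max_vecs || size_would_overflow(bio, len);
}

/*
 * Merging extends the last vector in place, so it needs no free slot.
 * Only the size overflow check applies.
 */
static bool can_merge_into_last_vec(const struct toy_bio *bio,
                                    unsigned int len)
{
    return !size_would_overflow(bio, len);
}
```

For a bio whose vector table is completely used, toy_bio_full() reports true even though extending the last vector would be fine. Using the combined check in the merge path therefore rejects merges that are actually possible.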
c593642c8 treewide: Use sizeof_field() macro

Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
at places where these are defined. Later patches will remove the unused
definition of FIELD_SIZEOF().

This patch was generated using the following script:

EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
do
	if [[ "$file" =~ $EXCLUDE_FILES ]]; then
		continue
	fi
	sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
done

Signed-off-by: Pankaj Bharadiya
Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
Co-developed-by: Kees Cook
Signed-off-by: Kees Cook
Acked-by: David Miller # for net

Pankaj Bharadiya
2019-12-10 02:36:44 +0800
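
For readers unfamiliar with the macro named in the commit above, here is a minimal userspace sketch of what a sizeof_field()-style macro does. The definition is the common null-pointer idiom and is given as an illustration. The struct toy_header and the helper tag_capacity() are hypothetical.

```c
#include <stddef.h>

/*
 * Size of one member of a struct type, without needing an instance.
 * sizeof does not evaluate its operand, so the null pointer is never
 * dereferenced.
 */
#define sizeof_field(TYPE, MEMBER) sizeof((((TYPE *)0)->MEMBER))

/* Hypothetical structure used only for this example. */
struct toy_header {
    char tag[6];
    unsigned short flags;
};

/* Typical use: size a copy or a bounds check from the member itself. */
static size_t tag_capacity(void)
{
    return sizeof_field(struct toy_header, tag);
}
```

The benefit over a hard-coded constant is that the value follows the struct definition if the member is ever resized.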

06 Dec, 2019

1 commit

ece841abb block: fix memleak of bio integrity data

7c20f11680a4 ("bio-integrity: stop abusing bi_end_io") moves
bio_integrity_free from bio_uninit() to bio_integrity_verify_fn()
and bio_endio(). This way looks wrong because bio may be freed
without calling bio_endio(), for example, blk_rq_unprep_clone() is
called from dm_mq_queue_rq() when the underlying queue of dm-mpath
is busy.

So the memory leak of bio integrity data is caused by commit 7c20f11680a4.

Fix this issue by re-adding bio_integrity_free() to bio_uninit().

Fixes: 7c20f11680a4 ("bio-integrity: stop abusing bi_end_io")
Reviewed-by: Christoph Hellwig
Signed-off-by: Justin Tee

Add commit log, and simplify/fix the original patch written by Justin.

Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe

Justin Tee
2019-12-06 02:38:36 +0800

05 Dec, 2019

1 commit

08802ed66 bfq-iosched: Ensure bio->bi_blkg is valid before using it

bio->bi_blkg will be NULL when the issue of the request
has bypassed the block layer as shown in the following oops:

Internal error: Oops: 96000005 [#1] SMP
CPU: 17 PID: 2996 Comm: scsi_id Not tainted 5.4.0 #4
Call trace:
percpu_counter_add_batch+0x38/0x4c8
bfqg_stats_update_legacy_io+0x9c/0x280
bfq_insert_requests+0xbac/0x2190
blk_mq_sched_insert_request+0x288/0x670
blk_execute_rq_nowait+0x140/0x178
blk_execute_rq+0x8c/0x140
sg_io+0x604/0x9c0
scsi_cmd_ioctl+0xe38/0x10a8
scsi_cmd_blk_ioctl+0xac/0xe8
sd_ioctl+0xe4/0x238
blkdev_ioctl+0x590/0x20e0
block_ioctl+0x60/0x98
do_vfs_ioctl+0xe0/0x1b58
ksys_ioctl+0x80/0xd8
__arm64_sys_ioctl+0x40/0x78
el0_svc_handler+0xc4/0x270

so ensure its validity before using it.

Fixes: fd41e60331b1 ("bfq-iosched: stop using blkg->stat_bytes and ->stat_ios")
Signed-off-by: Hou Tao
Signed-off-by: Jens Axboe

Hou Tao
2019-12-05 22:10:09 +0800

04 Dec, 2019

1 commit

6c6b35491 block: set the zone size in blk_revalidate_disk_zones atomically

The current zone revalidation code has a major problem in that it
doesn't update the zone size and q->nr_zones atomically, leading
to a short window where an out of bounds access to the zone arrays
is possible.

To fix this, move the setting of the zone size into the critical
section of blk_revalidate_disk_zones so that it gets updated together
with the zone bitmaps and q->nr_zones. This also slightly simplifies
the caller as it deduces the zone size from the report_zones.

This change also allows checking for a power of two zone size in generic
code.

Reported-by: Hans Holmberg
Reviewed-by: Javier González
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2019-12-04 01:18:22 +0800
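
The power of two check mentioned in the commit above is a standard bit trick, sketched here in userspace. The function names are hypothetical and this is not the kernel's implementation. The second helper illustrates one reason such a check is useful: with a power of two zone size, the zone number of a sector is a simple shift.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A power of two has exactly one bit set, so v & (v - 1) clears it to 0. */
static bool zone_size_is_valid(uint64_t zone_sectors)
{
    return zone_sectors != 0 && (zone_sectors & (zone_sectors - 1)) == 0;
}

/* Integer log2 for a non-zero value: index of the highest set bit. */
static unsigned int ilog2_u64(uint64_t v)
{
    unsigned int shift = 0;

    while (v >>= 1)
        shift++;
    return shift;
}

/* Zone index of a sector, valid only for power of two zone sizes. */
static uint64_t zone_number(uint64_t sector, uint64_t zone_sectors)
{
    assert(zone_size_is_valid(zone_sectors));
    return sector >> ilog2_u64(zone_sectors);
}
```

For example, with 524288 sectors per zone, sector 3 * 524288 + 5 falls into zone 3.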

03 Dec, 2019

5 commits

ae58954d8 block: don't handle bio based drivers in blk_revalidate_disk_zones

bio based drivers only need to update q->nr_zones. Do that manually
instead of overloading blk_revalidate_disk_zones to keep that function
simpler for the next round of changes that will rely even more on the
request based functionality.

Reviewed-by: Javier González
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2019-12-03 23:51:25 +0800
e94f58194 block: allocate the zone bitmaps lazily

Allocate the conventional zone bitmap and the sequential zone locking
bitmap only when we find a zone of the respective type. This avoids
wasting memory on the conventional zone bitmap for devices that only
have sequential zones, and will also prepare for other future changes.

Reviewed-by: Javier González
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2019-12-03 23:51:25 +0800
f216fdd77 block: replace seq_zones_bitmap with conv_zones_bitmap

Invert the meaning of seq_zones_bitmap by keeping a bitmap of
conventional zones. This allows not having a bitmap for devices
that do not have conventional zones.

Reviewed-by: Javier González
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2019-12-03 23:51:25 +0800
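
The inverted bitmap from the commit above can be modelled in a few lines. This is a userspace sketch with a hypothetical helper name, not the kernel's lookup code. A set bit marks a conventional zone, and a missing (NULL) bitmap means the device has no conventional zones at all.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Returns true if zone number zno is a sequential zone.
 * conv_zones_bitmap may be NULL for devices without conventional zones,
 * in which case every zone is sequential and no memory is needed.
 */
static bool zone_is_seq(const unsigned long *conv_zones_bitmap,
                        unsigned int zno)
{
    if (!conv_zones_bitmap)
        return true;

    unsigned long word = conv_zones_bitmap[zno / BITS_PER_WORD];

    return !((word >> (zno % BITS_PER_WORD)) & 1UL);
}
```

With the old polarity, a device consisting only of sequential zones needed a bitmap with every bit set. With the inverted meaning, the same device needs no bitmap.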
9b38bb4b1 block: simplify blkdev_nr_zones

Simplify the arguments to blkdev_nr_zones by passing a gendisk instead
of the block_device and capacity. This also removes the need for
__blkdev_nr_zones as all callers are outside the fast path and can
deal with the additional branch.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2019-12-03 23:51:24 +0800
bb5562828 block: remove the empty line at the end of blk-zoned.c

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2019-12-03 23:51:24 +0800

02 Dec, 2019

1 commit

0da522107 Merge tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground

Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann:
"As part of the cleanup of some remaining y2038 issues, I came to
fs/compat_ioctl.c, which still has a couple of commands that need
support for time64_t.

In completely unrelated work, I spent time on cleaning up parts of
this file in the past, moving things out into drivers instead.

After Al Viro reviewed an earlier version of this series and did a lot
more of that cleanup, I decided to try to completely eliminate the
rest of it and move it all into drivers.

This series incorporates some of Al's work and many patches of my own,
but in the end stops short of actually removing the last part, which
is the scsi ioctl handlers. I have patches for those as well, but they
need more testing or possibly a rewrite"

* tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits)
scsi: sd: enable compat ioctls for sed-opal
pktcdvd: add compat_ioctl handler
compat_ioctl: move SG_GET_REQUEST_TABLE handling
compat_ioctl: ppp: move simple commands into ppp_generic.c
compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t
compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic
compat_ioctl: unify copy-in of ppp filters
tty: handle compat PPP ioctls
compat_ioctl: move SIOCOUTQ out of compat_ioctl.c
compat_ioctl: handle SIOCOUTQNSD
af_unix: add compat_ioctl support
compat_ioctl: reimplement SG_IO handling
compat_ioctl: move WDIOC handling into wdt drivers
fs: compat_ioctl: move FITRIM emulation into file systems
gfs2: add compat_ioctl support
compat_ioctl: remove unused convert_in_user macro
compat_ioctl: remove last RAID handling code
compat_ioctl: remove /dev/raw ioctl translation
compat_ioctl: remove PCI ioctl translation
compat_ioctl: remove joystick ioctl translation
...

Linus Torvalds
2019-12-02 05:46:15 +0800

26 Nov, 2019

3 commits

7e5192b93 Merge tag 'for-5.5/disk-revalidate-20191122' of git://git.kernel.dk/linux-block

Pull disk revalidation updates from Jens Axboe:
"This continues the work that Jan Kara started to thoroughly cleanup
and consolidate how we handle rescans and revalidations"

* tag 'for-5.5/disk-revalidate-20191122' of git://git.kernel.dk/linux-block:
block: move clearing bd_invalidated into check_disk_size_change
block: remove (__)blkdev_reread_part as an exported API
block: fix bdev_disk_changed for non-partitioned devices
block: move rescan_partitions to fs/block_dev.c
block: merge invalidate_partitions into rescan_partitions
block: refactor rescan_partitions

Linus Torvalds
2019-11-26 03:37:01 +0800
464a47f45 Merge tag 'for-5.5/zoned-20191122' of git://git.kernel.dk/linux-block

Pull zoned block device update from Jens Axboe:
"Enhancements and improvements to the zoned device support"

* tag 'for-5.5/zoned-20191122' of git://git.kernel.dk/linux-block:
scsi: sd_zbc: Remove set but not used variable 'buflen'
block: rework zone reporting
scsi: sd_zbc: Cleanup sd_zbc_alloc_report_buffer()
null_blk: Add zone_nr_conv to features
null_blk: clean up report zones
null_blk: clean up the block device operations
block: Remove partition support for zoned block devices
block: Simplify report zones execution
block: cleanup the !zoned case in blk_revalidate_disk_zones
block: Enhance blk_revalidate_disk_zones()

Linus Torvalds
2019-11-26 03:22:37 +0800
ff6814b07 Merge tag 'for-5.5/block-20191121' of git://git.kernel.dk/linux-block

Pull core block updates from Jens Axboe:
"Due to more granular branches, this one is small and will be followed
with other core branches that add specific features. I meant to just
have a core and drivers branch, but due to external dependencies we
ended up adding a few more that are also core.

The changes are:

- Fixes and improvements for the zoned device support (Ajay, Damien)

- sed-opal table writing and datastore UID (Revanth)

- blk-cgroup (and bfq) blk-cgroup stat fixes (Tejun)

- Improvements to the block stats tracking (Pavel)

- Fix for overrunning sysfs buffer for large number of CPUs (Ming)

- Optimization for small IO (Ming, Christoph)

- Fix typo in RWH lifetime hint (Eugene)

- Dead code removal and documentation (Bart)

- Reduction in memory usage for queue and tag set (Bart)

- Kerneldoc header documentation (André)

- Device/partition revalidation fixes (Jan)

- Stats tracking for flush requests (Konstantin)

- Various other little fixes here and there (et al)"

* tag 'for-5.5/block-20191121' of git://git.kernel.dk/linux-block: (48 commits)
Revert "block: split bio if the only bvec's length is > SZ_4K"
block: add iostat counters for flush requests
block,bfq: Skip tracing hooks if possible
block: sed-opal: Introduce SUM_SET_LIST parameter and append it using 'add_token_u64'
blk-cgroup: cgroup_rstat_updated() shouldn't be called on cgroup1
block: Don't disable interrupts in trigger_softirq()
sbitmap: Delete sbitmap_any_bit_clear()
blk-mq: Delete blk_mq_has_free_tags() and blk_mq_can_queue()
block: split bio if the only bvec's length is > SZ_4K
block: still try to split bio if the bvec crosses pages
blk-cgroup: separate out blkg_rwstat under CONFIG_BLK_CGROUP_RWSTAT
blk-cgroup: reimplement basic IO stats using cgroup rstat
blk-cgroup: remove now unused blkg_print_stat_{bytes|ios}_recursive()
blk-throtl: stop using blkg->stat_bytes and ->stat_ios
bfq-iosched: stop using blkg->stat_bytes and ->stat_ios
bfq-iosched: relocate bfqg_*rwstat*() helpers
block: add zone open, close and finish ioctl support
block: add zone open, close and finish operations
block: Simplify REQ_OP_ZONE_RESET_ALL handling
block: Remove REQ_OP_ZONE_RESET plugging
...

Linus Torvalds
2019-11-26 02:59:41 +0800

22 Nov, 2019

2 commits

1e279153d Revert "block: split bio if the only bvec's length is > SZ_4K"

We really don't need this, as the slow path will do the right thing
anyway.

This reverts commit 6952a7f8446ee85ea9d10ab87b64797a031eaae3.

Signed-off-by: Jens Axboe

Jens Axboe
2019-11-22 01:16:12 +0800
b68663186 block: add iostat counters for flush requests

Requests that trigger flushing of the volatile writeback cache to disk
(barriers) have a significant effect on overall performance.

The block layer has a sophisticated engine for combining several flush
requests into one, but there are no statistics for the actual flushes
executed by the disk. Requests that trigger flushes are usually barriers,
i.e. zero-size writes.

This patch adds two iostat counters into /sys/class/block/$dev/stat and
/proc/diskstats - count of completed flush requests and their total time.

Signed-off-by: Konstantin Khlebnikov
Signed-off-by: Jens Axboe

Konstantin Khlebnikov
2019-11-22 00:06:47 +0800
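
As an illustration of how the two counters added by the commit above might be consumed, here is a userspace sketch that extracts them from one line of a block device stat file. It rests on an assumption not stated in the commit message: that the flush request count and flush time are the 16th and 17th whitespace-separated fields of the line. The struct and function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical container for the two flush counters. */
struct flush_stats {
    unsigned long long ios;       /* completed flush requests */
    unsigned long long ticks_ms;  /* total time spent on flushes */
};

#define STAT_FIELDS 17  /* assumed field count, flush counters last */

/*
 * Parse one stat line. Returns 0 on success, or -1 if the line has fewer
 * than STAT_FIELDS numeric fields (for example on an older kernel).
 */
static int parse_flush_stats(const char *line, struct flush_stats *out)
{
    unsigned long long field[STAT_FIELDS];
    int consumed = 0;

    for (int i = 0; i < STAT_FIELDS; i++) {
        int n = 0;

        /* %n records how many characters this conversion consumed. */
        if (sscanf(line + consumed, "%llu%n", &field[i], &n) != 1)
            return -1;
        consumed += n;
    }

    out->ios = field[15];
    out->ticks_ms = field[16];
    return 0;
}
```

A caller would read the line from /sys/class/block/$dev/stat, as named in the commit message, and treat a -1 return as "counters not available".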

21 Nov, 2019

1 commit

40d47c155 block,bfq: Skip tracing hooks if possible

In most cases blk_tracing is not active, but the bfq_log_bfqq macro
generates pid_str unconditionally, which results in significant overhead.

## Test
modprobe null_blk
echo bfq > /sys/block/nullb0/queue/scheduler
fio --name=t --ioengine=libaio --direct=1 --filename=/dev/nullb0 \
--runtime=30 --time_based=1 --rw=write --iodepth=128 --bs=4k

# Results
| | baseline | w/ patch | gain |
| iops | 113.19K | 126.42K | +11% |

Acked-by: Paolo Valente
Signed-off-by: Dmitry Monakhov
Signed-off-by: Jens Axboe

Dmitry Monakhov
2019-11-21 07:10:29 +0800

19 Nov, 2019

1 commit

c6da429ea block: sed-opal: Introduce SUM_SET_LIST parameter and append it using 'add_token_u64'

In function 'activate_lsp', rather than hard-coding the short atom
header(0x83), we need to let the function 'add_short_atom_header' append
the header based on the parameter being appended.

The parameter has been defined in Section 3.1.2.1 of
https://trustedcomputinggroup.org/wp-content/uploads/TCG_Storage-Opal_Feature_Set_Single_User_Mode_v1-00_r1-00-Final.pdf

Reviewed-by: Jon Derrick
Signed-off-by: Revanth Rajashekar
Signed-off-by: Jens Axboe

Revanth Rajashekar
2019-11-19 00:49:15 +0800
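
The idea in the sed-opal commit above, deriving the short atom header from the value instead of hard-coding 0x83, can be sketched as follows. The bit layout used here (0x80 short atom marker, 0x20 byte-string flag, 0x10 sign flag, low four bits length) is an assumption about the token encoding and should be checked against the specification. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed short atom header layout. */
#define SHORT_ATOM_ID          0x80
#define SHORT_ATOM_BYTESTRING  0x20
#define SHORT_ATOM_SIGNED      0x10
#define SHORT_ATOM_LEN_MASK    0x0f

/* Build the one-byte header for a short atom carrying len payload bytes. */
static uint8_t short_atom_header(bool bytestring, bool has_sign,
                                 unsigned int len)
{
    uint8_t atom = SHORT_ATOM_ID;

    if (bytestring)
        atom |= SHORT_ATOM_BYTESTRING;
    if (has_sign)
        atom |= SHORT_ATOM_SIGNED;
    atom |= len & SHORT_ATOM_LEN_MASK;

    return atom;
}

/* Number of bytes needed to store an unsigned value (at least one). */
static unsigned int u64_payload_len(uint64_t value)
{
    unsigned int len = 1;

    while (value >>= 8)
        len++;
    return len;
}
```

Under these assumptions, an unsigned integer that needs three bytes yields the header 0x83, the constant that was previously hard-coded. A value of a different width would get a matching header automatically.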

18 Nov, 2019

1 commit

de678bc63 block: Don't disable interrupts in trigger_softirq()

trigger_softirq() is always invoked as an SMP function call, which
always runs with interrupts disabled.

Don't disable interrupts in trigger_softirq() because interrupts are
already disabled.

Reviewed-by: Ming Lei
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Jens Axboe

Sebastian Andrzej Siewior
2019-11-18 22:29:22 +0800

15 Nov, 2019

1 commit

8b37bc277 iocost: check active_list of all the ancestors in iocg_activate()

iocg_activate() has a bug: it checks the same active_list over and over
again. The intention of the code was to check whether all the ancestors
and the iocg itself have already been activated. So fix it.

Fixes: 7caa47151ab2 ("blkcg: implement blk-iocost")
Acked-by: Tejun Heo
Signed-off-by: Jiufei Xue
Signed-off-by: Jens Axboe

Jiufei Xue
2019-11-15 04:56:54 +0800

14 Nov, 2019

6 commits

f0b870df8 block: remove (__)blkdev_reread_part as an exported API

In general drivers should never mess with partition tables directly.
Unfortunately s390 and loop do for somewhat historic reasons, but they
can use bdev_disk_changed directly instead when we export it as they
satisfy the sanity checks we have in __blkdev_reread_part.

Signed-off-by: Christoph Hellwig
Reviewed-by: Stefan Haberland [dasd]
Reviewed-by: Jan Kara
Signed-off-by: Jens Axboe

Christoph Hellwig
2019-11-14 22:43:59 +0800
142fe8f4b block: fix bdev_disk_changed for non-partitioned devices

We still have to set the capacity to 0 if invalidating or call
revalidate_disk if not even if the disk has no partitions. Fix
that by merging rescan_partitions into bdev_disk_changed and just
stubbing out blk_add_partitions and blk_drop_partitions for
non-partitioned devices.

Signed-off-by: Christoph Hellwig
Reviewed-by: Jan Kara
Signed-off-by: Jens Axboe

Christoph Hellwig
2019-11-14 22:43:53 +0800
a1548b674 block: move rescan_partitions to fs/block_dev.c

Large parts of rescan_partitions aren't about partitions, and
moving it to block_dev.c will allow for some further cleanups by
merging it into its only caller.

Signed-off-by: Christoph Hellwig
Reviewed-by: Jan Kara
Signed-off-by: Jens Axboe

Christoph Hellwig
2019-11-14 22:43:21 +0800
6917d0689 block: merge invalidate_partitions into rescan_partitions

A lot of the logic in invalidate_partitions and rescan_partitions is
shared. Merge the two functions to simplify things. There is a small
behavior change in that we now send the kevent change notice also if we
were not invalidating but no partitions were found, which seems like
the right thing to do.

Signed-off-by: Christoph Hellwig
Reviewed-by: Jan Kara
Signed-off-by: Jens Axboe

Christoph Hellwig
2019-11-14 22:42:41 +0800
f902b0260 block: refactor rescan_partitions

Split out a helper that adds one single partition, and another one
calling that dealing with the parsed_partitions state. This makes
it much more obvious how we clean up all state and start again when
using the rescan label.

Signed-off-by: Christoph Hellwig
Reviewed-by: Jan Kara
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe

Christoph Hellwig
2019-11-14 22:40:55 +0800
478de3380 block, bfq: deschedule empty bfq_queues not referred by any process

Since commit 3726112ec731 ("block, bfq: re-schedule empty queues if
they deserve I/O plugging"), to prevent the service guarantees of a
bfq_queue from being violated, the bfq_queue may be left busy, i.e.,
scheduled for service, even if empty (see comments in
__bfq_bfqq_expire() for details). But, if no process will send
requests to the bfq_queue any longer, then there is no point in
keeping the bfq_queue scheduled for service.

In addition, keeping the bfq_queue scheduled for service, but with no
process reference any longer, may cause the bfq_queue to be freed when
descheduled from service. But this is assumed to never happen, and
causes a UAF if it happens. This, in turn, caused crashes [1, 2].

This commit fixes this issue by descheduling an empty bfq_queue when
it is left with no process reference.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1767539
[2] https://bugzilla.kernel.org/show_bug.cgi?id=205447

Fixes: 3726112ec731 ("block, bfq: re-schedule empty queues if they deserve I/O plugging")
Reported-by: Chris Evich
Reported-by: Patrick Dung
Reported-by: Thorsten Schubert
Tested-by: Thorsten Schubert
Tested-by: Oleksandr Natalenko
Signed-off-by: Paolo Valente
Signed-off-by: Jens Axboe

Paolo Valente
2019-11-14 22:00:54 +0800