Eric Lee / smarc-fsl-linux-kernel

29 Feb, 2020

1 commit

1eb78bc92 floppy: check FDC index for errors before assigning it

commit 2e90ca68b0d2f5548804f22f0dd61145516171e3 upstream.

Jordy Zomer reported a KASAN out-of-bounds read in the floppy driver in
wait_til_ready().

Which on the face of it can't happen, since as Willy Tarreau points out,
the function does no particular memory access. Except through the FDCS
macro, which just indexes a static allocation through the current fdc,
which is always checked against N_FDC.

Except the checking happens after we've already assigned the value.

The floppy driver is a disgrace (a lot of it going back to my original
horrid "design"), and has no real maintainer. Nobody has the hardware,
and nobody really cares. But it still gets used in virtual environments
because it's one of those things that everybody supports.

The whole thing should be re-written, or at least parts of it should be
seriously cleaned up. The 'current fdc' index, which is used by the
FDCS macro, and which is often shadowed by a local 'fdc' variable, is a
prime example of how not to write code.

But because nobody has the hardware or the motivation, let's just fix up
the immediate problem with a nasty band-aid: test the fdc index before
actually assigning it to the static 'fdc' variable.
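
The band-aid described above can be sketched in plain C. This is a minimal
userspace illustration, not the driver's code: N_FDC mirrors the driver's
constant (its value here is chosen for the example), while fdc_state,
current_fdc, set_current_fdc() and current_fdc_state() are hypothetical
names.

```c
#include <stdbool.h>
#include <stddef.h>

#define N_FDC 2 /* number of controllers; value chosen for illustration */

/* Hypothetical stand-ins for the driver's static state. */
static int fdc_state[N_FDC];
static int current_fdc;

/*
 * Validate the index first and assign it only when it is in range, so
 * that nothing can index fdc_state[] through an unchecked current_fdc.
 */
bool set_current_fdc(int new_fdc)
{
	if (new_fdc < 0 || new_fdc >= N_FDC)
		return false;	/* reject: current_fdc keeps its old value */

	current_fdc = new_fdc;
	return true;
}

/* Access the state of the currently selected controller. */
int *current_fdc_state(void)
{
	return &fdc_state[current_fdc];
}
```

Because a rejected index never reaches current_fdc, every later access
through the shared index stays inside the static array.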

Reported-by: Jordy Zomer
Cc: Willy Tarreau
Cc: Dan Carpenter
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Linus Torvalds
2020-02-29 00:22:14 +0800

24 Feb, 2020

4 commits

17bddc85f brd: check and limit max_part par

[ Upstream commit c8ab422553c81a0eb070329c63725df1cd1425bc ]

In brd_init, rd_nr brd_device structures are first allocated and added
to brd_devices; brd_devices is then traversed and each brd_device is
registered by calling add_disk. When a brd_device is allocated,
disk->first_minor is set to i * max_part. If rd_nr * max_part is larger
than MINORMASK, two different brd_devices may get the same devt, and
only one of them can be added successfully. When brd.ko is later removed
with rmmod, brd_exit oopses.

To reproduce:
# modprobe brd rd_nr=3 rd_size=102400 max_part=1048576
# rmmod brd
The oops then appears.

Oops log:
[ 726.613722] Call trace:
[ 726.614175] kernfs_find_ns+0x24/0x130
[ 726.614852] kernfs_find_and_get_ns+0x44/0x68
[ 726.615749] sysfs_remove_group+0x38/0xb0
[ 726.616520] blk_trace_remove_sysfs+0x1c/0x28
[ 726.617320] blk_unregister_queue+0x98/0x100
[ 726.618105] del_gendisk+0x144/0x2b8
[ 726.618759] brd_exit+0x68/0x560 [brd]
[ 726.619501] __arm64_sys_delete_module+0x19c/0x2a0
[ 726.620384] el0_svc_common+0x78/0x130
[ 726.621057] el0_svc_handler+0x38/0x78
[ 726.621738] el0_svc+0x8/0xc
[ 726.622259] Code: aa0203f6 aa0103f7 aa1e03e0 d503201f (7940e260)

To fix this, add a brd_check_and_reset_par function that checks and limits
the max_part parameter.
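
The devt collision can be reproduced with a small userspace model. This is
a sketch under stated assumptions: MINORBITS is 20 as in the kernel's
kdev_t.h, DISK_MAX_PARTS is assumed to be 256, and mkdev(), brd_devt() and
limit_max_part() are hypothetical helpers, not the functions added by the
patch.

```c
#include <stdint.h>

#define MINORBITS 20			/* as in the kernel's kdev_t.h */
#define MINORMASK ((1U << MINORBITS) - 1)
#define DISK_MAX_PARTS 256		/* assumed upper bound for max_part */

/* Userspace model of the kernel's MKDEV(): major in the high bits. */
uint32_t mkdev(uint32_t major, uint32_t minor)
{
	return (major << MINORBITS) | minor;
}

/* devt of the i-th ramdisk when first_minor = i * max_part. */
uint32_t brd_devt(uint32_t major, uint32_t i, uint32_t max_part)
{
	return mkdev(major, i * max_part);
}

/* Hypothetical clamp: keep max_part within [1, DISK_MAX_PARTS]. */
uint32_t limit_max_part(uint32_t max_part)
{
	if (max_part == 0)
		return 1;
	if (max_part > DISK_MAX_PARTS)
		return DISK_MAX_PARTS;
	return max_part;
}
```

With major 1 and max_part=1048576, the minor of the second disk spills
into the major bits, so brd_devt(1, 0, 1048576) and brd_devt(1, 1, 1048576)
are equal. After clamping max_part to 256 the two values differ.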

--
V5->V6:
- remove useless code

V4->V5:(suggested by Ming Lei)
- make sure max_part is not larger than DISK_MAX_PARTS

V3->V4:(suggested by Ming Lei)
- remove useless change
- add one limit of max_part

V2->V3: (suggested by Ming Lei)
- clear .minors when running out of consecutive minor space in brd_alloc
- remove limit of rd_nr

V1->V2:
- add more checks in brd_check_par_valid as suggested by Ming Lei.

Signed-off-by: Zhiqiang Liu
Reviewed-by: Bob Liu
Reviewed-by: Ming Lei
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin

Zhiqiang Liu
2020-02-24 15:37:02 +0800
1687b204a rbd: work around -Wuninitialized warning

[ Upstream commit a55e601b2f02df5db7070e9a37bd655c9c576a52 ]

gcc -O3 warns about a dummy variable that is passed
down into rbd_img_fill_nodata without being initialized:

drivers/block/rbd.c: In function 'rbd_img_fill_nodata':
drivers/block/rbd.c:2573:13: error: 'dummy' is used uninitialized in this function [-Werror=uninitialized]
fctx->iter = *fctx->pos;

Since this is a dummy, I assume the warning is harmless, but
it's better to initialize it anyway and avoid the warning.

Fixes: mmtom ("init/Kconfig: enable -O3 for all arches")
Signed-off-by: Arnd Bergmann
Reviewed-by: Ilya Dryomov
Signed-off-by: Ilya Dryomov
Signed-off-by: Sasha Levin

Arnd Bergmann
2020-02-24 15:36:59 +0800
b0d5c881d drivers/block/zram/zram_drv.c: fix error return codes not being returned in writeback_store

[ Upstream commit 3b82a051c10143639a378dcd12019f2353cc9054 ]

Currently, when an error code (-EIO or -ENOSPC) is set in the for-loop of
writeback_store, it is overwritten by a ret = len assignment at the end of
the function and the error code is lost. Fix this by assigning ret = len
at the start of the function and removing the assignment from the end,
so that ret is preserved when an error code is assigned to it.
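
The pattern can be sketched as follows. store_model() is a hypothetical
userspace stand-in, not writeback_store itself, and its loop is simplified
to stop at the first error.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical model of a store handler: walks `n` per-item results
 * (0 on success, negative errno on failure) and returns either the
 * consumed length or the error that stopped the loop.
 */
ssize_t store_model(const int *results, size_t n, size_t len)
{
	ssize_t ret = (ssize_t)len;	/* success value assigned up front */
	size_t i;

	for (i = 0; i < n; i++) {
		if (results[i] < 0) {
			ret = results[i];	/* e.g. -EIO or -ENOSPC */
			break;
		}
	}

	/* No "ret = len" here: that would overwrite the error. */
	return ret;
}
```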

Addresses Coverity ("Unused value")

Link: http://lkml.kernel.org/r/20191128122958.178290-1-colin.king@canonical.com
Fixes: a939888ec38b ("zram: support idle/huge page writeback")
Signed-off-by: Colin Ian King
Acked-by: Minchan Kim
Cc: Sergey Senozhatsky
Cc: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin

Colin Ian King
2020-02-24 15:36:31 +0800
25cbba5d4 nbd: add a flush_workqueue in nbd_start_device

[ Upstream commit 5c0dd228b5fc30a3b732c7ae2657e0161ec7ed80 ]

When a kzalloc fails, nbd may end up trying to destroy the
workqueue from inside the workqueue.

Suppose num_connections is m (2 < m), the first n (1 < n < m)
kzalloc calls succeed, and call n + 1 fails. nbd_start_device
then returns ENOMEM to nbd_start_device_ioctl, which returns
immediately without running flush_workqueue. However, n recv
threads still exist. If nbd_release runs first, a recv thread
may drop the last config_refs and try to destroy the workqueue
from inside the workqueue.

To fix it, add a flush_workqueue in nbd_start_device.

Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")
Signed-off-by: Sun Ke
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin

Sun Ke
2020-02-24 15:36:31 +0800

23 Jan, 2020

1 commit

200f8b968 xen/blkfront: Adjust indentation in xlvbd_alloc_gendisk

commit 589b72894f53124a39d1bb3c0cecaf9dcabac417 upstream.

Clang warns:

../drivers/block/xen-blkfront.c:1117:4: warning: misleading indentation;
statement is not part of the previous 'if' [-Wmisleading-indentation]
nr_parts = PARTS_PER_DISK;
^
../drivers/block/xen-blkfront.c:1115:3: note: previous statement is here
if (err)
^

This is because there is a space at the beginning of this line; remove
it so that the indentation is consistent according to the Linux kernel
coding style and clang no longer warns.

While we are here, the previous line has some trailing whitespace; clean
that up as well.

Fixes: c80a420995e7 ("xen-blkfront: handle Xen major numbers other than XENVBD")
Link: https://github.com/ClangBuiltLinux/linux/issues/791
Signed-off-by: Nathan Chancellor
Reviewed-by: Juergen Gross
Acked-by: Roger Pau Monné
Signed-off-by: Juergen Gross
Signed-off-by: Greg Kroah-Hartman

Nathan Chancellor
2020-01-23 15:22:54 +0800

09 Jan, 2020

2 commits

50de69fd6 xen/blkback: Avoid unmapping unmapped grant pages

[ Upstream commit f9bd84a8a845d82f9b5a081a7ae68c98a11d2e84 ]

For each I/O request, blkback first maps the foreign pages for the
request to its local pages. If an allocation of a local page for the
mapping fails, it should unmap every mapping already made for the
request.

However, blkback's handling mechanism for the allocation failure does
not mark the remaining foreign pages as unmapped. Therefore, the unmap
function merely tries to unmap every valid grant page for the request,
including the pages not mapped due to the allocation failure. On a
system that fails the allocation frequently, this problem leads to
the following kernel crash.

[ 372.012538] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
[ 372.012546] IP: [] gnttab_unmap_refs.part.7+0x1c/0x40
[ 372.012557] PGD 16f3e9067 PUD 16426e067 PMD 0
[ 372.012562] Oops: 0002 [#1] SMP
[ 372.012566] Modules linked in: act_police sch_ingress cls_u32
...
[ 372.012746] Call Trace:
[ 372.012752] [] gnttab_unmap_refs+0x34/0x40
[ 372.012759] [] xen_blkbk_unmap+0x83/0x150 [xen_blkback]
...
[ 372.012802] [] dispatch_rw_block_io+0x970/0x980 [xen_blkback]
...
Decompressing Linux... Parsing ELF... done.
Booting the kernel.
[ 0.000000] Initializing cgroup subsys cpuset

This commit fixes this problem by marking the grant pages of the given
request that were not mapped due to the allocation failure as invalid.
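
A minimal sketch of the idea, assuming a plain array of integer handles.
INVALID_HANDLE, mark_unmapped_invalid() and unmap_valid() are hypothetical
names for illustration and do not correspond to the blkback or grant-table
API.

```c
#include <stddef.h>

#define INVALID_HANDLE (-1)	/* hypothetical "not mapped" marker */

/*
 * Model of a failed map: the first `mapped` pages got a handle, the
 * rest were never mapped and must be marked invalid explicitly.
 */
void mark_unmapped_invalid(int *handles, size_t total, size_t mapped)
{
	size_t i;

	for (i = mapped; i < total; i++)
		handles[i] = INVALID_HANDLE;
}

/* Unmap only pages with a valid handle; returns how many were unmapped. */
size_t unmap_valid(int *handles, size_t total)
{
	size_t i, unmapped = 0;

	for (i = 0; i < total; i++) {
		if (handles[i] == INVALID_HANDLE)
			continue;	/* never mapped: nothing to undo */
		handles[i] = INVALID_HANDLE;
		unmapped++;
	}
	return unmapped;
}
```

Without the marking step, stale values left in the unmapped slots would
look like valid handles to the unmap loop.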

Fixes: c6cc142dac52 ("xen-blkback: use balloon pages for all mappings")

Reviewed-by: David Woodhouse
Reviewed-by: Maximilian Heyne
Reviewed-by: Paul Durrant
Reviewed-by: Roger Pau Monné
Signed-off-by: SeongJae Park
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin

SeongJae Park
2020-01-09 17:20:07 +0800
ec177a46e xen-blkback: prevent premature module unload

[ Upstream commit fa2ac657f9783f0891b2935490afe9a7fd29d3fa ]

Objects allocated by xen_blkif_alloc come from the 'blkif_cache' kmem
cache. This cache is destroyed when xen-blkif is unloaded so it is
necessary to wait for the deferred free routine used for such objects to
complete. This necessity was missed in commit 14855954f636 "xen-blkback:
allow module to be cleanly unloaded". This patch fixes the problem by
taking/releasing extra module references in xen_blkif_alloc/free()
respectively.

Signed-off-by: Paul Durrant
Reviewed-by: Roger Pau Monné
Signed-off-by: Juergen Gross
Signed-off-by: Sasha Levin

Paul Durrant
2020-01-09 17:19:51 +0800

31 Dec, 2019

2 commits

b3ead320d nbd: fix shutdown and recv work deadlock v2

commit 1c05839aa973cfae8c3db964a21f9c0eef8fcc21 upstream.

This fixes a regression added with:

commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4
Author: Mike Christie
Date: Sun Aug 4 14:10:06 2019 -0500

nbd: fix max number of supported devs

where we can deadlock during device shutdown. The problem occurs if
the recv_work's nbd_config_put occurs after nbd_start_device_ioctl has
returned and the userspace app has dropped its reference via closing
the device and running nbd_release. The recv_work nbd_config_put call
would then drop the refcount to zero and try to destroy the config which
would try to do destroy_workqueue from the recv work.

This patch just has nbd_start_device_ioctl do a flush_workqueue when it
wakes so we know after the ioctl returns running works have exited. This
also fixes a possible race where we could try to reuse the device while
old recv_works are still running.

Cc: stable@vger.kernel.org
Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")
Signed-off-by: Mike Christie
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman

Mike Christie
2019-12-31 23:46:34 +0800
1fca50561 loop: fix no-unmap write-zeroes request behavior

[ Upstream commit efcfec579f6139528c9e6925eca2bc4a36da65c6 ]

Currently, if the loop device receives a WRITE_ZEROES request, it asks
the underlying filesystem to punch out the range. This behavior is
correct if unmapping is allowed. However, a NOUNMAP request means that
the caller doesn't want us to free the storage backing the range, so
punching out the range is incorrect behavior.

To satisfy a NOUNMAP | WRITE_ZEROES request, loop should ask the
underlying filesystem to FALLOC_FL_ZERO_RANGE, which is (according to
the fallocate documentation) required to ensure that the entire range is
backed by real storage, which suffices for our purposes.
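
The decision described above reduces to a single branch. The sketch below
uses a hypothetical enum in place of the real fallocate mode flags, and
write_zeroes_mode() is an illustrative name, not the loop driver's
function.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two fallocate modes discussed above. */
enum zero_mode {
	ZERO_BY_PUNCH_HOLE,	/* may free the backing storage */
	ZERO_BY_ZERO_RANGE,	/* keeps the range backed by storage */
};

/* Pick how to satisfy a write-zeroes request. */
enum zero_mode write_zeroes_mode(bool nounmap)
{
	return nounmap ? ZERO_BY_ZERO_RANGE : ZERO_BY_PUNCH_HOLE;
}
```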

Fixes: 19372e2769179dd ("loop: implement REQ_OP_WRITE_ZEROES")
Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin

Darrick J. Wong
2019-12-31 23:44:31 +0800

29 Nov, 2019

1 commit

abf404dfa nbd: prevent memory leak

commit 03bf73c315edca28f47451913177e14cd040a216 upstream.

In nbd_add_socket, when krealloc succeeds but the allocation of nsock
fails, the reallocated memory is leaked. The correct behaviour is to assign
the reallocated memory to config->socks right after krealloc succeeds.
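
A userspace sketch of the corrected ordering, with realloc() and calloc()
standing in for krealloc() and kzalloc(). The struct layout, add_socket()
and the fail_entry_alloc test hook are hypothetical and only illustrate
the ownership rule.

```c
#include <stdbool.h>
#include <stdlib.h>

struct sock_entry { int fd; };

struct config {
	struct sock_entry **socks;
	size_t num;
};

/*
 * Grow the array, publish it at once, then allocate the new entry.
 * `fail_entry_alloc` simulates the second allocation failing.
 */
int add_socket(struct config *cfg, int fd, bool fail_entry_alloc)
{
	struct sock_entry **socks;
	struct sock_entry *entry;

	socks = realloc(cfg->socks, (cfg->num + 1) * sizeof(*socks));
	if (!socks)
		return -1;

	/* Publish the grown array immediately so it is never orphaned. */
	cfg->socks = socks;

	entry = fail_entry_alloc ? NULL : calloc(1, sizeof(*entry));
	if (!entry)
		return -1;	/* cfg->socks still owns the array */

	entry->fd = fd;
	cfg->socks[cfg->num++] = entry;
	return 0;
}
```

If the assignment to cfg->socks came only after the entry allocation, a
failure in between would leave the grown array reachable from nothing.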

Reviewed-by: Josef Bacik
Signed-off-by: Navid Emamdoost
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman

Navid Emamdoost
2019-11-29 17:09:47 +0800

22 Nov, 2019

1 commit

be5fa3aac Merge tag 'for-linus-20191121' of git://git.kernel.dk/linux-block

Pull block fix from Jens Axboe:
"Just a single fix for an issue in nbd introduced in this cycle"

* tag 'for-linus-20191121' of git://git.kernel.dk/linux-block:
nbd:fix memory leak in nbd_get_socket()

Linus Torvalds
2019-11-22 04:04:50 +0800

20 Nov, 2019

1 commit

dff10bbea nbd:fix memory leak in nbd_get_socket()

Before returning NULL, put the sock first.

Cc: stable@vger.kernel.org
Fixes: cf1b2326b734 ("nbd: verify socket is supported during setup")
Reviewed-by: Josef Bacik
Reviewed-by: Mike Christie
Signed-off-by: Sun Ke
Signed-off-by: Jens Axboe

Sun Ke
2019-11-20 00:23:26 +0800

16 Nov, 2019

1 commit

b226c9e1f Merge tag 'for-linus-20191115' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"A few fixes that should make it into this release. This contains:

- io_uring:
- The timeout command assumes sequence == 0 means that we want
one completion, but this kind of overloading is unfortunate as
it prevents users from doing a pure time based wait. Since
this operation was introduced in this cycle, let's correct it
now, while we can. (me)
- One-liner to fix an issue with dependent links and fixed
buffer reads. The actual IO completed fine, but the link got
severed since we stored the wrong expected value. (me)
- Add TIMEOUT to list of opcodes that don't need a file. (Pavel)

- rsxx missing workqueue destroy calls. Old bug. (Chuhong)

- Fix blk-iocost active list check (Jiufei)

- Fix impossible-to-hit overflow merge condition, that still hit some
folks very rarely (Junichi)

- Fix bfq hang issue from 5.3. This didn't get marked for stable, but
will go into stable post this merge (Paolo)"

* tag 'for-linus-20191115' of git://git.kernel.dk/linux-block:
rsxx: add missed destroy_workqueue calls in remove
iocost: check active_list of all the ancestors in iocg_activate()
block, bfq: deschedule empty bfq_queues not referred by any process
io_uring: ensure registered buffer import returns the IO length
io_uring: Fix getting file for timeout
block: check bi_size overflow before merge
io_uring: make timeout sequence == 0 mean no sequence

Linus Torvalds
2019-11-16 05:02:34 +0800

15 Nov, 2019

2 commits

dcb77e4b2 rsxx: add missed destroy_workqueue calls in remove

The driver misses calling destroy_workqueue in remove like what is done
when probe fails.
Add the missed calls to fix it.

Signed-off-by: Chuhong Yuan
Signed-off-by: Jens Axboe

Chuhong Yuan
2019-11-15 04:59:49 +0800
633739b2f rbd: silence bogus uninitialized warning in rbd_object_map_update_finish()

Some versions of gcc (so far 6.3 and 7.4) throw a warning:

drivers/block/rbd.c: In function 'rbd_object_map_callback':
drivers/block/rbd.c:2124:21: warning: 'current_state' may be used uninitialized in this function [-Wmaybe-uninitialized]
(current_state == OBJECT_EXISTS && state == OBJECT_EXISTS_CLEAN))
drivers/block/rbd.c:2092:23: note: 'current_state' was declared here
u8 state, new_state, current_state;
^~~~~~~~~~~~~

It's bogus because all current_state accesses are guarded by
has_current_state.

Reported-by: kbuild test robot
Signed-off-by: Ilya Dryomov
Reviewed-by: Dongsheng Yang

Ilya Dryomov
2019-11-15 02:00:53 +0800

08 Nov, 2019

1 commit

8e9c52301 block: drbd: remove a stray unlock in __drbd_send_protocol()

There are two callers of this function and they both unlock the mutex so
this ends up being a double unlock.

Fixes: 44ed167da748 ("drbd: rcu_read_lock() and rcu_dereference() for tconn->net_conf")
Signed-off-by: Dan Carpenter
Signed-off-by: Jens Axboe

Dan Carpenter
2019-11-08 21:55:22 +0800

26 Oct, 2019

3 commits

cf1b2326b nbd: verify socket is supported during setup

nbd requires socket families to support the shutdown method so the nbd
recv workqueue can be woken up from its sock_recvmsg call. If the socket
does not support the callout we will leave recv works running or get hangs
later when the device or module is removed.

This adds a check during socket connection/reconnection to make sure the
socket being passed in supports the needed callout.

Reported-by: syzbot+24c12fa8d218ed26011a@syzkaller.appspotmail.com
Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")
Tested-by: Richard W.M. Jones
Signed-off-by: Mike Christie
Signed-off-by: Jens Axboe

Mike Christie
2019-10-26 04:37:21 +0800
7ce23e8e0 nbd: handle racing with error'ed out commands

We hit the following warning in production

print_req_error: I/O error, dev nbd0, sector 7213934408 flags 80700
------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 25 PID: 32407 at lib/refcount.c:190 refcount_sub_and_test_checked+0x53/0x60
Workqueue: knbd-recv recv_work [nbd]
RIP: 0010:refcount_sub_and_test_checked+0x53/0x60
Call Trace:
blk_mq_free_request+0xb7/0xf0
blk_mq_complete_request+0x62/0xf0
recv_work+0x29/0xa1 [nbd]
process_one_work+0x1f5/0x3f0
worker_thread+0x2d/0x3d0
? rescuer_thread+0x340/0x340
kthread+0x111/0x130
? kthread_create_on_node+0x60/0x60
ret_from_fork+0x1f/0x30
---[ end trace b079c3c67f98bb7c ]---

This was preceded by us timing out everything and shutting down the
sockets for the device. The problem is we had a request in the queue at
the same time, so we completed the request twice. This can actually
happen in a lot of cases: we fail to get a ref on our config, we only
have one connection and just error out the command, etc.

Fix this by checking cmd->status in nbd_read_stat. We only change this
under the cmd->lock, so we are safe to check this here and see if we've
already error'ed this command out, which would indicate that we've
completed it as well.

Reviewed-by: Mike Christie
Signed-off-by: Josef Bacik

Signed-off-by: Jens Axboe

Josef Bacik
2019-10-26 04:20:03 +0800
de6346ecb nbd: protect cmd->status with cmd->lock

We already do this for the most part, except in timeout and clear_req.
For the timeout case we take the lock after we grab a ref on the config,
but that isn't really necessary because we're safe to touch the cmd at
this point, so just move the order around.

For the clear_req case this is initiated by the user, so again is safe.

Reviewed-by: Mike Christie
Signed-off-by: Josef Bacik
Signed-off-by: Jens Axboe

Josef Bacik
2019-10-26 04:20:01 +0800

19 Oct, 2019

1 commit

f7daefe42 zram: fix race between backing_dev_show and backing_dev_store

CPU0:                               CPU1:
backing_dev_show                    backing_dev_store
    ......                              ......
    file = zram->backing_dev;
    down_read(&zram->init_lock);        down_read(&zram->init_lock)
    file_path(file, ...);               zram->backing_dev = backing_dev;
    up_read(&zram->init_lock);          up_read(&zram->init_lock);

backing_dev_show reads zram->backing_dev too early, before taking
init_lock, so the value can be NULL when it is read and non-NULL later.

backtrace:
d_path+0xcc/0x174
file_path+0x10/0x18
backing_dev_show+0x40/0xb4
dev_attr_show+0x20/0x54
sysfs_kf_seq_show+0x9c/0x10c
kernfs_seq_show+0x28/0x30
seq_read+0x184/0x488
kernfs_fop_read+0x5c/0x1a4
__vfs_read+0x44/0x128
vfs_read+0xa0/0x138
SyS_read+0x54/0xb4

Link: http://lkml.kernel.org/r/1571046839-16814-1-git-send-email-chenwandun@huawei.com
Signed-off-by: Chenwandun
Acked-by: Minchan Kim
Cc: Sergey Senozhatsky
Cc: Jens Axboe
Cc: [4.14+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Chenwandun
2019-10-19 18:32:32 +0800

15 Oct, 2019

1 commit

25e6be212 rbd: cancel lock_dwork if the wait is interrupted

There is a warning message in my test with below steps:

# rbd bench --io-type write --io-size 4K --io-threads 1 --io-pattern rand test &
# sleep 5
# pkill -9 rbd
# rbd map test &
# sleep 5
# pkill rbd

The reason is that rbd_add_acquire_lock() is interruptible: when the
wait on ->acquire_wait is killed, the lock_dwork could still be
running.

1. do_rbd_add() 2. lock_dwork
rbd_add_acquire_lock()
- queue_delayed_work()
lock_dwork queued
- wait_for_completion_killable_timeout() lock_dwork)

Then when we reach the rbd_dev_free(), WARN_ON is triggered because
lock_state is not RBD_LOCK_STATE_UNLOCKED.

To fix it, this commit makes sure the lock_dwork has finished before
calling rbd_dev_image_unlock().

On the other hand, this would not happen in do_rbd_remove(), because
after rbd is mapped, lock_dwork will only be queued for an IO request, and
the request will not continue until lock_dwork has finished. When we call
rbd_dev_image_unlock() in do_rbd_remove(), all requests are done.
That means lock_state should not be locked again after
rbd_dev_image_unlock().

[ Cancel lock_dwork in rbd_add_acquire_lock(), only if the wait is
interrupted. ]

Fixes: 637cd060537d ("rbd: new exclusive lock wait/wake code")
Signed-off-by: Dongsheng Yang
Reviewed-by: Ilya Dryomov
Signed-off-by: Ilya Dryomov

Dongsheng Yang
2019-10-15 23:43:15 +0800

11 Oct, 2019

1 commit

297cbcccc Merge tag 'for-linus-20191010' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

- Fix wbt performance regression introduced with the blk-rq-qos
refactoring (Harshad)

- Fix io_uring fileset removal inadvertently killing the workqueue (me)

- Fix io_uring typo in linked command nonblock submission (Pavel)

- Remove spurious io_uring wakeups on request free (Pavel)

- Fix null_blk zoned command error return (Keith)

- Don't use freezable workqueues for backing_dev, also means we can
revert a previous libata hack (Mika)

- Fix nbd sysfs mutex dropped too soon at removal time (Xiubo)

* tag 'for-linus-20191010' of git://git.kernel.dk/linux-block:
nbd: fix possible sysfs duplicate warning
null_blk: Fix zoned command return code
io_uring: only flush workqueues on fileset removal
io_uring: remove wait loop spurious wakeups
blk-wbt: fix performance regression in wbt scale_up/scale_down
Revert "libata, freezer: avoid block device removal while system is frozen"
bdi: Do not use freezable workqueue
io_uring: fix reversed nonblock flag for link submission

Linus Torvalds
2019-10-11 23:45:32 +0800

10 Oct, 2019

2 commits

862488105 nbd: fix possible sysfs duplicate warning

1. nbd_put takes the mutex and drops nbd->ref to 0. It then does
idr_remove and drops the mutex.

2. nbd_genl_connect takes the mutex. idr_find/idr_for_each fails
to find an existing device, so it does nbd_dev_add.

3. Just before nbd_put calls nbd_dev_remove, or before nbd_dev_remove
has finished completely, nbd_dev_add may try to add_disk, and we can hit:

debugfs: Directory 'nbd1' with parent 'block' already present!

This patch makes sure all the disk add/remove work is done while
holding the nbd_index_mutex lock.

Reported-by: Mike Christie
Reviewed-by: Josef Bacik
Signed-off-by: Xiubo Li
Signed-off-by: Jens Axboe

Xiubo Li
2019-10-10 23:44:56 +0800
79a85e214 null_blk: Fix zoned command return code

The return code from null_handle_zoned() sets the cmd->error value.
Returning OK status when an error occurred overwrites the intended
cmd->error. Return the appropriate error code instead of setting the
error in the cmd.

Fixes: fceb5d1b19cbe626 ("null_blk: create a helper for zoned devices")
Cc: Chaitanya Kulkarni
Signed-off-by: Keith Busch
Signed-off-by: Jens Axboe

Keith Busch
2019-10-10 11:00:20 +0800

05 Oct, 2019

1 commit

c4bd70e8c Merge tag 'for-linus-2019-10-03' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

- Mandate timespec64 for the io_uring timeout ABI (Arnd)

- Set of NVMe changes via Sagi:
- controller removal race fix from Balbir
- quirk additions from Gabriel and Jian-Hong
- nvme-pci power state save fix from Mario
- Add 64bit user commands (for 64bit registers) from Marta
- nvme-rdma/nvme-tcp fixes from Max, Mark and Me
- Minor cleanups and nits from James, Dan and John

- Two s390 dasd fixes (Jan, Stefan)

- Have loop change block size in DIO mode (Martijn)

- paride pg header ifdef guard (Masahiro)

- Two blk-mq queue scheduler tweaks, fixing an ordering issue on zoned
devices and suboptimal performance on others (Ming)

* tag 'for-linus-2019-10-03' of git://git.kernel.dk/linux-block: (22 commits)
block: sed-opal: fix sparse warning: convert __be64 data
block: sed-opal: fix sparse warning: obsolete array init.
block: pg: add header include guard
Revert "s390/dasd: Add discard support for ESE volumes"
s390/dasd: Fix error handling during online processing
io_uring: use __kernel_timespec in timeout ABI
loop: change queue block size to match when using DIO
blk-mq: apply normal plugging for HDD
blk-mq: honor IO scheduler for multiqueue devices
nvme-rdma: fix possible use-after-free in connect timeout
nvme: Move ctrl sqsize to generic space
nvme: Add ctrl attributes for queue_count and sqsize
nvme: allow 64-bit results in passthru commands
nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T
nvmet-tcp: remove superflous check on request sgl
Added QUIRKs for ADATA XPG SX8200 Pro 512GB
nvme-rdma: Fix max_hw_sectors calculation
nvme: fix an error code in nvme_init_subsystem()
nvme-pci: Save PCI state before putting drive into deepest state
nvme-tcp: fix wrong stop condition in io_work
...

Linus Torvalds
2019-10-05 00:56:51 +0800

01 Oct, 2019

1 commit

85560117d loop: change queue block size to match when using DIO

The loop driver assumes that if the passed in fd is opened with
O_DIRECT, the caller wants to use direct I/O on the loop device.
However, if the underlying block device has a different block size than
the loop block queue, direct I/O can't be enabled. Instead of requiring
userspace to manually change the blocksize and re-enable direct I/O,
just change the queue block sizes to match, as well as the io_min size.

Reviewed-by: Christoph Hellwig
Signed-off-by: Martijn Coenen
Signed-off-by: Jens Axboe

Martijn Coenen
2019-10-01 23:36:01 +0800

27 Sep, 2019

1 commit

cbafe18c7 Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:

- almost all of the rest of -mm

- various other subsystems

Subsystems affected by this patch series:
memcg, misc, core-kernel, lib, checkpatch, reiserfs, fat, fork,
cpumask, kexec, uaccess, kconfig, kgdb, bug, ipc, lzo, kasan, madvise,
cleanups, pagemap

* emailed patches from Andrew Morton : (77 commits)
arch/sparc/include/asm/pgtable_64.h: fix build
mm: treewide: clarify pgtable_page_{ctor,dtor}() naming
ntfs: remove (un)?likely() from IS_ERR() conditions
IB/hfi1: remove unlikely() from IS_ERR*() condition
xfs: remove unlikely() from WARN_ON() condition
wimax/i2400m: remove unlikely() from WARN*() condition
fs: remove unlikely() from WARN_ON() condition
xen/events: remove unlikely() from WARN() condition
checkpatch: check for nested (un)?likely() calls
hexagon: drop empty and unused free_initrd_mem
mm: factor out common parts between MADV_COLD and MADV_PAGEOUT
mm: introduce MADV_PAGEOUT
mm: change PAGEREF_RECLAIM_CLEAN with PAGE_REFRECLAIM
mm: introduce MADV_COLD
mm: untag user pointers in mmap/munmap/mremap/brk
vfio/type1: untag user pointers in vaddr_get_pfn
tee/shm: untag user pointers in tee_shm_register
media/v4l2-core: untag user pointers in videobuf_dma_contig_user_get
drm/radeon: untag user pointers in radeon_gem_userptr_ioctl
drm/amdgpu: untag user pointers
...

Linus Torvalds
2019-09-27 01:29:42 +0800

26 Sep, 2019

2 commits

315cc066b augmented rbtree: add new RB_DECLARE_CALLBACKS_MAX macro

Add RB_DECLARE_CALLBACKS_MAX, which generates augmented rbtree callbacks
for the case where the augmented value is a scalar whose definition
follows a max(f(node)) pattern. This actually covers all present uses of
RB_DECLARE_CALLBACKS, and saves some (source) code duplication in the
various RBCOMPUTE function definitions.
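
The max(f(node)) pattern can be illustrated without the kernel's rbtree
machinery. In this sketch, struct node and compute_max() are hypothetical;
they show only the per-node recomputation that such a callback performs,
not the code the macro generates.

```c
#include <stddef.h>

/* Hypothetical node: `value` plays the role of f(node). */
struct node {
	struct node *left, *right;
	unsigned long value;
	unsigned long subtree_max;	/* augmented: max over the subtree */
};

/* Recompute one node's augmented value from itself and its children. */
unsigned long compute_max(const struct node *n)
{
	unsigned long max = n->value;

	if (n->left && n->left->subtree_max > max)
		max = n->left->subtree_max;
	if (n->right && n->right->subtree_max > max)
		max = n->right->subtree_max;
	return max;
}
```

Since every user follows this same shape, only the scalar field and the
function computing f(node) differ from one tree to the next.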

[walken@google.com: fix mm/vmalloc.c]
Link: http://lkml.kernel.org/r/CANN689FXgK13wDYNh1zKxdipeTuALG4eKvKpsdZqKFJ-rvtGiQ@mail.gmail.com
[walken@google.com: re-add check to check_augmented()]
Link: http://lkml.kernel.org/r/20190727022027.GA86863@google.com
Link: http://lkml.kernel.org/r/20190703040156.56953-3-walken@google.com
Signed-off-by: Michel Lespinasse
Acked-by: Peter Zijlstra (Intel)
Cc: David Howells
Cc: Davidlohr Bueso
Cc: Uladzislau Rezki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michel Lespinasse
2019-09-26 08:51:39 +0800
f41def397 Merge tag 'ceph-for-5.4-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
"The highlights are:

- automatic recovery of a blacklisted filesystem session (Zheng Yan).
This is disabled by default and can be enabled by mounting with the
new "recover_session=clean" option.

- serialize buffered reads and O_DIRECT writes (Jeff Layton). Care is
taken to avoid serializing O_DIRECT reads and writes with each
other, this is based on the exclusion scheme from NFS.

- handle large osdmaps better in the face of fragmented memory
(myself)

- don't limit what security.* xattrs can be get or set (Jeff Layton).
We were overly restrictive here, unnecessarily preventing things
like file capability sets stored in security.capability from
working.

- allow copy_file_range() within the same inode and across different
filesystems within the same cluster (Luis Henriques)"

* tag 'ceph-for-5.4-rc1' of git://github.com/ceph/ceph-client: (41 commits)
ceph: call ceph_mdsc_destroy from destroy_fs_client
libceph: use ceph_kvmalloc() for osdmap arrays
libceph: avoid a __vmalloc() deadlock in ceph_kvmalloc()
ceph: allow object copies across different filesystems in the same cluster
ceph: include ceph_debug.h in cache.c
ceph: move static keyword to the front of declarations
rbd: pull rbd_img_request_create() dout out into the callers
ceph: reconnect connection if session hang in opening state
libceph: drop unused con parameter of calc_target()
ceph: use release_pages() directly
rbd: fix response length parameter for encoded strings
ceph: allow arbitrary security.* xattrs
ceph: only set CEPH_I_SEC_INITED if we got a MAC label
ceph: turn ceph_security_invalidate_secctx into static inline
ceph: add buffered/direct exclusionary locking for reads and writes
libceph: handle OSD op ceph_pagelist_append() errors
ceph: don't return a value from void function
ceph: don't freeze during write page faults
ceph: update the mtime when truncating up
ceph: fix indentation in __get_snap_name()
...

Linus Torvalds
2019-09-26 01:21:13 +0800

23 Sep, 2019

1 commit

eb09b3cc4 pktcdvd: remove warning on attempting to register non-passthrough dev

Anatoly reports that he gets the below warning when booting -git on
a sparc64 box on debian unstable:

...
[ 13.352975] aes_sparc64: Using sparc64 aes opcodes optimized AES
implementation
[ 13.428002] ------------[ cut here ]------------
[ 13.428081] WARNING: CPU: 21 PID: 586 at
drivers/block/pktcdvd.c:2597 pkt_setup_dev+0x2e4/0x5a0 [pktcdvd]
[ 13.428147] Attempt to register a non-SCSI queue
[ 13.428184] Modules linked in: pktcdvd libdes cdrom aes_sparc64
n2_rng md5_sparc64 sha512_sparc64 rng_core sha256_sparc64 flash
sha1_sparc64 ip_tables x_tables ipv6 crc_ccitt nf_defrag_ipv6 autofs4
ext4 crc16 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy
async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear
md_mod crc32c_sparc64
[ 13.428452] CPU: 21 PID: 586 Comm: pktsetup Not tainted
5.3.0-10169-g574cc4539762 #1234
[ 13.428507] Call Trace:
[ 13.428542] [00000000004635c0] __warn+0xc0/0x100
[ 13.428582] [0000000000463634] warn_slowpath_fmt+0x34/0x60
[ 13.428626] [000000001045b244] pkt_setup_dev+0x2e4/0x5a0 [pktcdvd]
[ 13.428674] [000000001045ccf4] pkt_ctl_ioctl+0x94/0x220 [pktcdvd]
[ 13.428724] [00000000006b95c8] do_vfs_ioctl+0x628/0x6e0
[ 13.428764] [00000000006b96c8] ksys_ioctl+0x48/0x80
[ 13.428803] [00000000006b9714] sys_ioctl+0x14/0x40
[ 13.428847] [0000000000406294] linux_sparc_syscall+0x34/0x44
[ 13.428890] irq event stamp: 4181
[ 13.428924] hardirqs last enabled at (4189): []
console_unlock+0x634/0x6c0
[ 13.428984] hardirqs last disabled at (4196): []
console_unlock+0x100/0x6c0
[ 13.429048] softirqs last enabled at (3978): []
__do_softirq+0x498/0x520
[ 13.429110] softirqs last disabled at (3967): []
do_softirq_own_stack+0x34/0x60
[ 13.429172] ---[ end trace 2220ca468f32967d ]---
[ 13.430018] pktcdvd: setup of pktcdvd device failed
[ 13.455589] des_sparc64: Using sparc64 des opcodes optimized DES
implementation
[ 13.515334] camellia_sparc64: Using sparc64 camellia opcodes
optimized CAMELLIA implementation
[ 13.522856] pktcdvd: setup of pktcdvd device failed
[ 13.529327] pktcdvd: setup of pktcdvd device failed
[ 13.532932] pktcdvd: setup of pktcdvd device failed
[ 13.536165] pktcdvd: setup of pktcdvd device failed
[ 13.539372] pktcdvd: setup of pktcdvd device failed
[ 13.542834] pktcdvd: setup of pktcdvd device failed
[ 13.546536] pktcdvd: setup of pktcdvd device failed
[ 15.431071] XFS (dm-0): Mounting V5 Filesystem
...

Apparently Debian auto-attaches any cdrom-like device to pktcdvd, which
can lead to the above warning. There's really no reason to warn in this
situation, so kill it.

Reported-by: Anatoly Pugachev
Signed-off-by: Jens Axboe

Jens Axboe
2019-09-23 00:01:05 +0800

18 Sep, 2019

3 commits

8454d6856 nbd: fix possible page fault for nbd disk

When the NBD_CFLAG_DESTROY_ON_DISCONNECT flag is set and the socket is
closed because the server daemon is restarted, starting a new connection
with the old nbd_index just before the last DISCONNECT has fully completed
causes random crashes, like:

[ 110.151949] block nbd1: Receive control failed (result -32)
[ 110.152024] BUG: unable to handle page fault for address: 0000058000000840
[ 110.152063] #PF: supervisor read access in kernel mode
[ 110.152083] #PF: error_code(0x0000) - not-present page
[ 110.152094] PGD 0 P4D 0
[ 110.152106] Oops: 0000 [#1] SMP PTI
[ 110.152120] CPU: 0 PID: 6698 Comm: kworker/u5:1 Kdump: loaded Not tainted 5.3.0-rc4+ #2
[ 110.152136] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 110.152166] Workqueue: knbd-recv recv_work [nbd]
[ 110.152187] RIP: 0010:__dev_printk+0xd/0x67
[ 110.152206] Code: 10 e8 c5 fd ff ff 48 8b 4c 24 18 65 48 33 0c 25 28 00 [...]
[ 110.152244] RSP: 0018:ffffa41581f13d18 EFLAGS: 00010206
[ 110.152256] RAX: ffffa41581f13d30 RBX: ffff96dd7374e900 RCX: 0000000000000000
[ 110.152271] RDX: ffffa41581f13d20 RSI: 00000580000007f0 RDI: ffffffff970ec24f
[ 110.152285] RBP: ffffa41581f13d80 R08: ffff96dd7fc17908 R09: 0000000000002e56
[ 110.152299] R10: ffffffff970ec24f R11: 0000000000000003 R12: ffff96dd7374e900
[ 110.152313] R13: 0000000000000000 R14: ffff96dd7374e9d8 R15: ffff96dd6e3b02c8
[ 110.152329] FS: 0000000000000000(0000) GS:ffff96dd7fc00000(0000) knlGS:0000000000000000
[ 110.152362] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 110.152383] CR2: 0000058000000840 CR3: 0000000067cc6002 CR4: 00000000001606f0
[ 110.152401] Call Trace:
[ 110.152422] _dev_err+0x6c/0x83
[ 110.152435] nbd_read_stat.cold+0xda/0x578 [nbd]
[ 110.152448] ? __switch_to_asm+0x34/0x70
[ 110.152468] ? __switch_to_asm+0x40/0x70
[ 110.152478] ? __switch_to_asm+0x34/0x70
[ 110.152491] ? __switch_to_asm+0x40/0x70
[ 110.152501] ? __switch_to_asm+0x34/0x70
[ 110.152511] ? __switch_to_asm+0x40/0x70
[ 110.152522] ? __switch_to_asm+0x34/0x70
[ 110.152533] recv_work+0x35/0x9e [nbd]
[ 110.152547] process_one_work+0x19d/0x340
[ 110.152558] worker_thread+0x50/0x3b0
[ 110.152568] kthread+0xfb/0x130
[ 110.152577] ? process_one_work+0x340/0x340
[ 110.152609] ? kthread_park+0x80/0x80
[ 110.152637] ret_from_fork+0x35/0x40

This is very easy to reproduce by running the nbd-runner.

Reviewed-by: Josef Bacik
Signed-off-by: Xiubo Li
Signed-off-by: Jens Axboe

Xiubo Li
2019-09-18 10:03:49 +0800

ec76a7b92 nbd: rename the runtime flags as NBD_RT_ prefixed

Preparation for fixing the destroy-on-disconnect crash.

Reviewed-by: Josef Bacik
Signed-off-by: Xiubo Li
Signed-off-by: Jens Axboe

Xiubo Li
2019-09-18 10:03:49 +0800

7ad67ca55 Merge tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

- Two NVMe pull requests:
    - ana log parse fix from Anton
    - nvme quirks support for Apple devices from Ben
    - fix missing bio completion tracing for multipath stack devices
      from Hannes and Mikhail
    - IP TOS settings for nvme rdma and tcp transports from Israel
    - rq_dma_dir cleanups from Israel
    - tracing for Get LBA Status command from Minwoo
    - Some nvme-tcp cleanups from Minwoo, Potnuri and Myself
    - Some consolidation between the fabrics transports for handling
      the CAP register
    - reset race with ns scanning fix for fabrics (move fabrics
      commands to a dedicated request queue with a different lifetime
      from the admin request queue)
    - controller reset and namespace scan races fixes
    - nvme discovery log change uevent support
    - naming improvements from Keith
    - multiple discovery controllers reject fix from James
    - some regular cleanups from various people

- Series fixing (and re-fixing) null_blk debug printing and nr_devices
checks (André)

- A few pull requests from Song, with fixes from Andy, Guoqing,
Guilherme, Neil, Nigel, and Yufen.

- REQ_OP_ZONE_RESET_ALL support (Chaitanya)

- Bio merge handling unification (Christoph)

- Pick default elevator correctly for devices with special needs
(Damien)

- Block stats fixes (Hou)

- Timeout and support devices nbd fixes (Mike)

- Series fixing races around elevator switching and device add/remove
(Ming)

- sed-opal cleanups (Revanth)

- Per device weight support for BFQ (Fam)

- Support for blk-iocost, a new model that can properly account cost of
IO workloads. (Tejun)

- blk-cgroup writeback fixes (Tejun)

- paride queue init fixes (zhengbin)

- blk_set_runtime_active() cleanup (Stanley)

- Block segment mapping optimizations (Bart)

- lightnvm fixes (Hans/Minwoo/YueHaibing)

- Various little fixes and cleanups

* tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-block: (186 commits)
null_blk: format pr_* logs with pr_fmt
null_blk: match the type of parameter nr_devices
null_blk: do not fail the module load with zero devices
block: also check RQF_STATS in blk_mq_need_time_stamp()
block: make rq sector size accessible for block stats
bfq: Fix bfq linkage error
raid5: use bio_end_sector in r5_next_bio
raid5: remove STRIPE_OPS_REQ_PENDING
md: add feature flag MD_FEATURE_RAID0_LAYOUT
md/raid0: avoid RAID0 data corruption due to layout confusion.
raid5: don't set STRIPE_HANDLE to stripe which is in batch list
raid5: don't increment read_errors on EILSEQ return
nvmet: fix a wrong error status returned in error log page
nvme: send discovery log page change events to userspace
nvme: add uevent variables for controller devices
nvme: enable aen regardless of the presence of I/O queues
nvme-fabrics: allow discovery subsystems accept a kato
nvmet: Use PTR_ERR_OR_ZERO() in nvmet_init_discovery()
nvme: Remove redundant assignment of cq vector
nvme: Assign subsys instance from first ctrl
...

Linus Torvalds
2019-09-18 07:57:47 +0800

16 Sep, 2019

5 commits

9c7eddf1b null_blk: format pr_* logs with pr_fmt

Instead of writing "null_blk: " at the beginning of each
pr_err/info/warn log message, format messages using pr_fmt() macro.
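
The prefixing pattern described above can be sketched in userspace C. This is only an illustration under stated assumptions: sketch_pr_err() and log_invalid_queue_mode() are hypothetical names, the message text is made up, and the helper writes into a buffer instead of the kernel log. It relies on the GNU `##__VA_ARGS__` extension, as kernel code does.

```c
#include <stddef.h>
#include <stdio.h>

/* Defined once, ahead of the logging helpers, so that every message
 * picks up the same prefix without repeating it at each call site. */
#define pr_fmt(fmt) "null_blk: " fmt

/* Hypothetical stand-in for pr_err(): formats into a caller-supplied
 * buffer so the result can be inspected. Not the kernel's macro. */
#define sketch_pr_err(buf, size, fmt, ...) \
	snprintf((buf), (size), pr_fmt(fmt), ##__VA_ARGS__)

/* Example call site: the string literal no longer carries the prefix. */
static int log_invalid_queue_mode(char *buf, size_t size, int mode)
{
	return sketch_pr_err(buf, size, "invalid queue mode %d\n", mode);
}
```

Because the prefix is pasted in by string-literal concatenation at compile time, changing it means touching a single definition rather than every message.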

Reviewed-by: Chaitanya Kulkarni
Signed-off-by: André Almeida
Signed-off-by: Jens Axboe

André Almeida
2019-09-16 22:38:29 +0800

701dfc428 null_blk: match the type of parameter nr_devices

Since the variable nr_devices is an unsigned int, the module_param()
should also use this type. Change the type so they can match.
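
Why the declared parameter type should agree with the variable can be shown with a small userspace sketch. parse_uint_param() is a hypothetical helper, not a kernel function; it assumes that a parser for an unsigned parameter ought to reject negative input rather than let it wrap to a large positive value.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical userspace analogue of parsing an unsigned int parameter.
 * Returns 0 on success and stores the value in *out, else -EINVAL. */
static int parse_uint_param(const char *s, unsigned int *out)
{
	char *end;
	unsigned long v;

	/* strtoul() would accept a leading '-' and wrap the value, so
	 * require the string to start with a digit. */
	if (s == NULL || !isdigit((unsigned char)*s))
		return -EINVAL;

	errno = 0;
	v = strtoul(s, &end, 10);
	if (errno != 0 || *end != '\0' || v > UINT_MAX)
		return -EINVAL;

	*out = (unsigned int)v;
	return 0;
}
```

With a signed parser feeding an unsigned variable, an input such as "-1" would be accepted and then reinterpreted as a very large count; here it is refused and the destination is left unchanged.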

Fixes: f7c4ce890dd2 ("null_blk: validate the number of devices")
Reviewed-by: Chaitanya Kulkarni
Signed-off-by: André Almeida
Signed-off-by: Jens Axboe

André Almeida
2019-09-16 22:38:27 +0800

446745350 null_blk: do not fail the module load with zero devices

The module load should fail only if there is something wrong with the
configuration or if an error prevents it from working properly. The module
should be loadable with (nr_devices == 0), since that neither triggers
errors nor leaves it in a malfunctioning state. Preventing loading with zero
devices also breaks applications that configure this module using the
configfs API. Remove the nr_devices check to fix this.

Fixes: f7c4ce890dd2 ("null_blk: validate the number of devices")
Reviewed-by: Chaitanya Kulkarni
Signed-off-by: André Almeida
Signed-off-by: Jens Axboe

André Almeida
2019-09-16 22:38:26 +0800

21ed05a8b rbd: pull rbd_img_request_create() dout out into the callers

Make it more informative: log op_type, offset and length for block
layer requests and initiating obj_req for child requests.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2019-09-16 18:06:25 +0800

5435d2069 rbd: fix response length parameter for encoded strings

rbd_dev_image_id() allocates space for length but passes a smaller
value to rbd_obj_method_sync(). rbd_dev_v2_object_prefix() doesn't
allocate space for length. Fix both to be consistent.
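
The sizing rule can be sketched as follows, assuming an encoded string is a 32-bit little-endian length followed by that many bytes. SKETCH_NAME_LEN_MAX, SKETCH_REPLY_SIZE and decode_prefixed_string() are hypothetical names for illustration and are not taken from the rbd driver.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NAME_LEN_MAX 64 /* hypothetical cap on the string payload */

/* The reply buffer has to hold the length prefix as well as the payload,
 * and the same size should be used for allocation and for the request. */
#define SKETCH_REPLY_SIZE (sizeof(uint32_t) + SKETCH_NAME_LEN_MAX)

/* Decode a length-prefixed string from a reply of reply_len bytes.
 * Returns the payload length, or -1 if the reply is truncated or the
 * string does not fit in out (including its terminating NUL). */
static int decode_prefixed_string(const uint8_t *reply, size_t reply_len,
				  char *out, size_t out_size)
{
	uint32_t len;

	if (reply_len < sizeof(uint32_t))
		return -1;

	/* Assemble the little-endian 32-bit length byte by byte. */
	len = (uint32_t)reply[0] | (uint32_t)reply[1] << 8 |
	      (uint32_t)reply[2] << 16 | (uint32_t)reply[3] << 24;

	if (len > SKETCH_NAME_LEN_MAX ||
	    len > reply_len - sizeof(uint32_t) || len >= out_size)
		return -1;

	memcpy(out, reply + sizeof(uint32_t), len);
	out[len] = '\0';
	return (int)len;
}
```

If the buffer were allocated as SKETCH_REPLY_SIZE but the request advertised only SKETCH_NAME_LEN_MAX, a maximum-length string would lose its last four bytes to the prefix, which is the kind of inconsistency the commit message describes.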

Signed-off-by: Dongsheng Yang
Reviewed-by: Ilya Dryomov
Signed-off-by: Ilya Dryomov

Dongsheng Yang
2019-09-16 18:06:25 +0800

12 Sep, 2019

1 commit

f7c4ce890 null_blk: validate the number of devices

A negative number of devices is nonsensical, so change the type to
unsigned. If the number of devices is 0, it is impossible for userspace
to interact with the module, so refuse loading the driver for that case.
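
A minimal sketch of the check described here, assuming the refusal is reported as -EINVAL. validate_nr_devices() is a hypothetical name and this is not the driver's actual code.

```c
#include <errno.h>

/* Hypothetical init-time check for the device count. */
static int validate_nr_devices(unsigned int nr_devices)
{
	/* With an unsigned type a negative count cannot be represented,
	 * so only the zero case needs an explicit test. */
	if (nr_devices == 0)
		return -EINVAL;
	return 0;
}
```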

Signed-off-by: André Almeida
Signed-off-by: Jens Axboe

André Almeida
2019-09-12 06:04:25 +0800