08 Nov, 2018

1 commit

  • syzbot is reporting a NULL pointer dereference [1] caused by a race
    between ioctl(loop_fd, LOOP_CLR_FD, 0) and
    ioctl(other_loop_fd, LOOP_SET_FD, loop_fd): loop_validate_file()
    traverses other loop devices without holding their corresponding
    lo->lo_ctl_mutex locks.

    Since ioctl() requests on loop devices are not frequent operations,
    we don't need fine-grained locking. Let's use a global lock so that
    the traversal in loop_validate_file() is safe (sketched below).

    Note that syzbot is also reporting a circular locking dependency
    between bdev->bd_mutex and lo->lo_ctl_mutex [2], caused by calling
    blkdev_reread_part() with the lock held. This patch does not
    address that.

    [1] https://syzkaller.appspot.com/bug?id=f3cfe26e785d85f9ee259f385515291d21bd80a3
    [2] https://syzkaller.appspot.com/bug?id=bf154052f0eea4bc7712499e4569505907d15889

    Signed-off-by: Tetsuo Handa
    Reported-by: syzbot
    Reviewed-by: Jan Kara
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Tetsuo Handa
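
    A minimal sketch of the global-lock idea (illustrative, not the
    literal patch; one mutex serializes every loop ioctl so the
    cross-device traversal cannot race; lo_do_ioctl() is a hypothetical
    helper):

      #include <linux/mutex.h>

      /* global: replaces per-device locking on the ioctl paths */
      static DEFINE_MUTEX(loop_ctl_mutex);

      static int lo_ioctl(struct block_device *bdev, fmode_t mode,
                          unsigned int cmd, unsigned long arg)
      {
              int err;

              mutex_lock(&loop_ctl_mutex);
              /* loop_validate_file() can now walk other loop devices'
               * lo->lo_backing_file safely: LOOP_SET_FD/LOOP_CLR_FD on
               * any device takes the same lock. */
              err = lo_do_ioctl(bdev, cmd, arg);  /* hypothetical helper */
              mutex_unlock(&loop_ctl_mutex);
              return err;
      }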
     

08 May, 2018

1 commit

  • syzbot is hitting a WARN() triggered by memory allocation fault
    injection [1]: the loop module calls sysfs_remove_group() even when
    sysfs_create_group() failed. Fix this by remembering whether
    sysfs_create_group() succeeded (sketched below).

    [1] https://syzkaller.appspot.com/bug?id=3f86c0edf75c86d2633aeb9dd69eccc70bc7e90b

    Signed-off-by: Tetsuo Handa
    Reported-by: syzbot
    Reviewed-by: Greg Kroah-Hartman

    Renamed sysfs_ready -> sysfs_inited.

    Signed-off-by: Jens Axboe

    Tetsuo Handa
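
    A sketch of the fix (the flag name follows the rename note above;
    surrounding code abbreviated):

      struct loop_device {
              /* ... */
              bool sysfs_inited;  /* set only if sysfs_create_group() succeeded */
      };

      static void loop_sysfs_init(struct loop_device *lo)
      {
              lo->sysfs_inited = !sysfs_create_group(
                              &disk_to_dev(lo->lo_disk)->kobj,
                              &loop_attribute_group);
      }

      static void loop_sysfs_exit(struct loop_device *lo)
      {
              if (lo->sysfs_inited)
                      sysfs_remove_group(&disk_to_dev(lo->lo_disk)->kobj,
                                         &loop_attribute_group);
      }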
     

26 Sep, 2017

1 commit

  • The loop block device handles IO in a separate thread. The IO it
    actually dispatches isn't cloned from the IO the loop device
    received, so the dispatched IO loses the cgroup context.

    I'm ignoring the buffered IO case for now, which is quite
    complicated. Making the loop thread aware of the cgroup context
    doesn't really help there: the loop device only writes to a single
    file, and in the current writeback cgroup implementation a file can
    only belong to one cgroup.

    For the direct IO case, we could work around the issue in theory.
    For example, say we assign cgroup1 5M/s of bandwidth for the loop
    device and cgroup2 10M/s. We could create a special cgroup for the
    loop thread and assign at least 15M/s to the underlying disk. That
    would throttle the two cgroups correctly, but it is tricky to set
    up.

    This patch addresses the issue directly. We record the bio's css in
    the loop command. When the loop thread handles the command, we use
    the API introduced in patch 1 to set the css for the current task,
    and the bio layer then uses that css for new IO (from patch 3); see
    the sketch after this entry.

    Acked-by: Tejun Heo
    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
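
    A sketch of the mechanism (kthread_associate_blkcg() is the helper
    introduced by patch 1 of this series; the rest is simplified):

      struct loop_cmd {
              /* ... */
              struct cgroup_subsys_state *css;  /* cgroup of the submitting bio */
      };

      static void loop_handle_cmd(struct loop_cmd *cmd)
      {
              /* dispatch as if we were running in the submitter's cgroup */
              kthread_associate_blkcg(cmd->css);
              /* ... submit direct IO to the backing file ... */
              kthread_associate_blkcg(NULL);  /* restore */
      }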
     

25 Sep, 2017

1 commit

  • When the request is completed, lo_complete_rq() checks cmd->use_aio.
    However, if this is in fact an aio request, cmd->use_aio will have
    already been reused as cmd->ref by lo_rw_aio*. Fix it by not using a
    union (see the sketch below). On x86_64 there's a hole after the
    union anyway, so this doesn't make struct loop_cmd any bigger.

    Fixes: 92d773324b7e ("block/loop: fix use after free")
    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
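
    The shape of the fix, as described above (other fields omitted):

      struct loop_cmd {
              /* previously: union { bool use_aio; atomic_t ref; }; */
              bool use_aio;  /* still readable when the request completes */
              atomic_t ref;  /* used only by the aio path (lo_rw_aio*) */
              /* ... */
      };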
     

01 Sep, 2017

2 commits

  • Currently loop disables merging. While that makes sense for
    buffered IO mode, direct IO mode can benefit from request merging.
    Without merging, loop can send small IOs to the underlying disk and
    hurt performance; see the sketch below.

    Reviewed-by: Omar Sandoval
    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
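
    A sketch of toggling merging with the IO mode (the queue flag and
    helpers are standard block-layer APIs of that era; the exact hook
    point in the driver is my assumption):

      if (lo->use_dio)
              queue_flag_clear_unlocked(QUEUE_FLAG_NOMERGES, lo->lo_queue);
      else
              queue_flag_set_unlocked(QUEUE_FLAG_NOMERGES, lo->lo_queue);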
     
  • This is only used for setting the soft block size on the struct
    block_device once and then never used again.

    Reviewed-by: Ming Lei
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     

08 Jun, 2017

1 commit

  • When generating bootable VM images, certain systems (most notably
    s390x) require devices with a 4k blocksize. This patch implements
    a new flag, 'LO_FLAGS_BLOCKSIZE', which sets the physical
    blocksize to that of the underlying device and allows changing the
    logical blocksize up to the physical blocksize (sketched below).

    Signed-off-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Hannes Reinecke
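
    A sketch of the implied queue-limit updates (the helpers are real
    block-layer APIs; backing_bdev and bsize are illustrative names):

      unsigned int pbsize = bdev_physical_block_size(backing_bdev);

      blk_queue_physical_block_size(lo->lo_queue, pbsize);
      /* logical size requested via the ioctl, capped at the physical size */
      if (bsize <= pbsize)
              blk_queue_logical_block_size(lo->lo_queue, bsize);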
     

24 Sep, 2015

3 commits

  • There are at least three advantages to using direct I/O and AIO on
    reads/writes to loop's backing file (a submission sketch follows
    this entry):

    1) the double cache can be avoided, so memory usage decreases
    a lot

    2) unlike user-space direct I/O, there is no cost for
    pinning pages

    3) context switches are avoided while still obtaining good throughput
    - with buffered file reads, top random I/O throughput is often
    obtained only when requests are submitted concurrently from lots of
    tasks; but sequential I/O mostly hits the page cache, so concurrent
    submission often introduces unnecessary context switches without
    improving throughput much. There was a discussion [1] about using
    non-blocking I/O to improve this for applications.
    - with direct I/O and AIO, concurrent submission can be avoided
    while random read throughput is unaffected

    xfstests (-g auto, ext4) basically passes when running with direct
    I/O (aio); the one exception is generic/232, but that fails with
    loop buffered I/O (4.2-rc6-next-20150814) too.

    The fio test results for performance follow:
    4-job fio test inside an ext4 file system over a loop block device

    1) How to run
    - KVM: 4 VCPUs, 2G RAM
    - linux kernel: 4.2-rc6-next-20150814(base) with the patchset
    - the loop block device is backed by one image on an SSD
    - linux psync, 4 jobs, size 1500M, ext4 over the loop block device
    - test result: IOPS from fio output

    2) Throughput(IOPS) becomes a bit better with direct I/O(aio)
    -------------------------------------------------------------
    test cases    | randread | read   | randwrite | write  |
    -------------------------------------------------------------
    base          | 8015     | 113811 | 67442     | 106978 |
    -------------------------------------------------------------
    base+loop aio | 8136     | 125040 | 67811     | 111376 |
    -------------------------------------------------------------

    - this is likely because more page cache is available to the
    application, or because one extra page copy is avoided in the
    direct I/O case

    3) context switches
    - context switches decreased by ~50% with loop direct I/O (aio)
    compared with loop buffered I/O (4.2-rc6-next-20150814)

    4) memory usage from /proc/meminfo
    ---------------------------------------------------------
                              | Buffers | Cached
    ---------------------------------------------------------
    base                      | > 760MB | ~950MB
    ---------------------------------------------------------
    base+loop direct I/O(aio) | < 5MB   | ~1.6GB
    ---------------------------------------------------------

    - so there is much more page cache available for applications with
    direct I/O

    [1] https://lwn.net/Articles/612483/

    Signed-off-by: Ming Lei
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Ming Lei
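
    A sketch of driving async direct I/O to the backing file with a
    kiocb (the completion-callback name matches the driver; iterator
    setup and error handling are elided):

      struct iov_iter iter;
      ssize_t ret;
      /* iter is set up over the request's bio_vec pages (omitted) */

      cmd->iocb.ki_pos = pos;
      cmd->iocb.ki_filp = file;
      cmd->iocb.ki_flags = IOCB_DIRECT;
      cmd->iocb.ki_complete = lo_rw_aio_complete;  /* async completion */

      ret = file->f_op->read_iter(&cmd->iocb, &iter);  /* or ->write_iter() */
      if (ret != -EIOCBQUEUED)
              cmd->iocb.ki_complete(&cmd->iocb, ret, 0);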
     
  • This patch provides one interface for enabling direct IO
    from user space:

    - userspace (such as losetup) can pass a 'file' that was
    opened/fcntl'd with O_DIRECT (see the usage sketch below)

    Also, __loop_update_dio() is introduced to check whether direct I/O
    can be used with the current loop settings.

    The last big change is the introduction of the LO_FLAGS_DIRECT_IO
    flag, which lets userspace know whether direct IO is used to access
    the backing file.

    Cc: linux-api@vger.kernel.org
    Signed-off-by: Ming Lei
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Ming Lei
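
    A userspace usage sketch (the ioctls, struct and flag are the UAPI
    this message describes; error handling omitted):

      #define _GNU_SOURCE          /* for O_DIRECT */
      #include <fcntl.h>
      #include <stdio.h>
      #include <sys/ioctl.h>
      #include <linux/loop.h>

      int main(void)
      {
              int backing = open("image.img", O_RDWR | O_DIRECT);
              int loopfd  = open("/dev/loop0", O_RDWR);
              struct loop_info64 info;

              ioctl(loopfd, LOOP_SET_FD, backing);
              ioctl(loopfd, LOOP_GET_STATUS64, &info);
              if (info.lo_flags & LO_FLAGS_DIRECT_IO)
                      puts("kernel uses direct I/O on the backing file");
              return 0;
      }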
     
  • The following patch will use dio/aio to submit IO to the backing
    file; after that we no longer need to schedule IO concurrently from
    a workqueue, so use kthread_work to cut the context switch cost a
    lot (sketched below).

    For the non-AIO case, a single thread had been used for a very long
    time; it was only converted to a workqueue in v4.0, which already
    caused a performance regression for Fedora live booting. In the
    discussion [1], even though submitting I/O concurrently via a
    workqueue can improve random read throughput, it may hurt
    sequential read performance at the same time, so it is better to
    restore the single-thread behaviour.

    For the upcoming AIO support, if loop really faces such high
    performance requirements, multiple hw queues with a per-hwq kthread
    would be a better fit than the current workqueue approach.

    [1] http://marc.info/?t=143082678400002&r=1&w=2

    Signed-off-by: Ming Lei
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Ming Lei
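
    A sketch of the kthread_work plumbing (shown with the current
    spellings of the kthread_worker API; the 2015-era names put the
    verb first, e.g. init_kthread_worker()):

      /* once per loop device */
      kthread_init_worker(&lo->worker);
      lo->worker_task = kthread_run(kthread_worker_fn, &lo->worker,
                                    "loop%d", lo->lo_number);

      /* per request, instead of queueing onto a workqueue */
      kthread_init_work(&cmd->work, loop_queue_work);
      kthread_queue_work(&lo->worker, &cmd->work);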
     

20 May, 2015

1 commit

  • lo_ctl_mutex is held while running all ioctl handlers, and in some
    of those handlers ioctl_by_bdev(BLKRRPART) is called to reread
    partitions, which requires bd_mutex.

    So failure is easy to trigger, because trylock(bd_mutex) may fail
    inside blkdev_reread_part(). The lock context follows:

    blkid or other application:
        ->open()
            ->mutex_lock(bd_mutex)
                ->lo_open()
                    ->mutex_lock(lo_ctl_mutex)

    losetup (set fd ioctl):
        ->mutex_lock(lo_ctl_mutex)
            ->ioctl_by_bdev(BLKRRPART)
                ->trylock(bd_mutex)

    This patch tries to eliminate the ABBA lock dependency by removing
    lo_ctl_mutex from lo_open(), with the following approach (see the
    sketch after this entry):

    1) make lo_refcnt an atomic_t and avoid acquiring lo_ctl_mutex in
    lo_open():
    - open vs. add/del loop is not a problem, thanks to loop_index_mutex
    - freeze the request queue during clr_fd, so no I/O can arrive
    until clearing the fd has completed, which matches the effect of
    holding lo_ctl_mutex in lo_open()
    - both open() and release() are already serialized by bd_mutex

    2) don't hold lo_ctl_mutex while decreasing/checking lo_refcnt in
    lo_release(); lo_ctl_mutex is then only required for the last
    release.

    Reviewed-by: Christoph Hellwig
    Tested-by: Jarod Wilson
    Acked-by: Jarod Wilson
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
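
    A sketch of the resulting lockless open path (field names from the
    message; error paths omitted):

      static int lo_open(struct block_device *bdev, fmode_t mode)
      {
              struct loop_device *lo = bdev->bd_disk->private_data;

              /* bd_mutex already serializes open() vs. release(), and
               * loop_index_mutex covers add/del, so a plain atomic
               * reference is enough here; no lo_ctl_mutex needed. */
              atomic_inc(&lo->lo_refcnt);
              return 0;
      }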
     

06 May, 2015

1 commit

  • Documentation/workqueue.txt says:

        If there is dependency among multiple work items used
        during memory reclaim, they should be queued to separate
        wq each with WQ_MEM_RECLAIM.

    Loop devices can be stacked, so we have to convert to a per-device
    workqueue (sketched below). One example is the Fedora live CD.

    Fixes: b5dd2f6047ca108001328aac0e8588edd15f1778
    Cc: stable@vger.kernel.org (v4.0)
    Cc: Justin M. Forbes
    Signed-off-by: Ming Lei
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Ming Lei
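
    A sketch of the per-device allocation (alloc_workqueue() and
    WQ_MEM_RECLAIM are the real APIs; the name format string is my
    assumption):

      lo->wq = alloc_workqueue("kloopd%d", WQ_MEM_RECLAIM, 16,
                               lo->lo_number);
      if (!lo->wq)
              return -ENOMEM;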
     

03 Jan, 2015

2 commits

  • Looks like we pull it in through other ways on x86, but we fail
    on sparc:

        In file included from drivers/block/cryptoloop.c:30:0:
        drivers/block/loop.h:63:24: error: field 'tag_set' has incomplete type
          struct blk_mq_tag_set tag_set;

    Add the include to loop.h, kill it from loop.c (see below).

    Signed-off-by: Jens Axboe

    Jens Axboe
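
    The implied one-liner (the header that defines struct
    blk_mq_tag_set):

      /* drivers/block/loop.h */
      #include <linux/blk-mq.h>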
     
  • The conversion is fairly straightforward: a workqueue is used to
    dispatch the loop block device's requests, and the big change is
    that requests are now submitted to the backing file/device
    concurrently from that workqueue, so throughput may improve a lot.
    Since write requests over the same file often run exclusively, they
    are not handled concurrently, to avoid extra context switch cost,
    possible lock contention and work scheduling cost. Also, with
    blk-mq there is an opportunity to get loop I/O merged before it is
    submitted to the backing file/device. A sketch of the queue_rq
    plumbing follows this entry.

    In the following test:
    - base: v3.19-rc2-2041231
    - loop over file in ext4 file system on SSD disk
    - bs: 4k, libaio, io depth: 64, O_DIRECT, num of jobs: 1
    - throughput: IOPS

    ------------------------------------------------------
    |           | base  | base with loop-mq | delta  |
    ------------------------------------------------------
    | randread  | 1740  | 25318             | +1355% |
    ------------------------------------------------------
    | read      | 42196 | 51771             | +22.6% |
    ------------------------------------------------------
    | randwrite | 35709 | 34624             | -3%    |
    ------------------------------------------------------
    | write     | 39137 | 40326             | +3%    |
    ------------------------------------------------------

    So loop-mq improves throughput for both read and randread, while
    write and randwrite performance is basically unhurt.

    Another benefit is that the loop driver code gets much simpler
    after the blk-mq conversion, so the patch can be considered a
    cleanup too.

    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
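
    A sketch of the queue_rq side of such a conversion (shown with
    modern type names; in the v3.19 era the handler returned
    BLK_MQ_RQ_QUEUE_OK, and loop_wq is the driver's workqueue):

      static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
                                        const struct blk_mq_queue_data *bd)
      {
              struct loop_cmd *cmd = blk_mq_rq_to_pdu(bd->rq);

              blk_mq_start_request(bd->rq);
              /* the actual file IO runs asynchronously in the work item */
              queue_work(loop_wq, &cmd->work);
              return BLK_STS_OK;
      }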
     
