Eric Lee / smarc-fsl-linux-kernel

06 Mar, 2018

1 commit

0f3e9c97e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

All of the conflicts were cases of overlapping changes.

In net/core/devlink.c, we have to make care that the
resouce size_params have become a struct member rather
than a pointer to such an object.

Signed-off-by: David S. Miller

David S. Miller
2018-03-06 14:20:46 +0800

05 Mar, 2018

1 commit

af8c08162 Merge tag 'for-4.16-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux ... Browse Code »

Pull btrfs fixes from David Sterba:

- when NR_CPUS is large, a SRCU structure can significantly inflate
size of the main filesystem structure that would not be possible to
allocate by kmalloc, so the kvalloc fallback is used

- improved error handling

- fix endiannes when printing some filesystem attributes via sysfs,
this is could happen when a filesystem is moved between different
endianity hosts

- send fixes: the NO_HOLE mode should not send a write operation for a
file hole

- fix log replay for for special files followed by file hardlinks

- fix log replay failure after unlink and link combination

- fix max chunk size calculation for DUP allocation

* tag 'for-4.16-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: fix log replay failure after unlink and link combination
Btrfs: fix log replay failure after linking special file and fsync
Btrfs: send, fix issuing write op when processing hole in no data mode
btrfs: use proper endianness accessors for super_copy
btrfs: alloc_chunk: fix DUP stripe size handling
btrfs: Handle btrfs_set_extent_delalloc failure in relocate_file_extent_cluster
btrfs: handle failure of add_pending_csums
btrfs: use kvzalloc to allocate btrfs_fs_info

Linus Torvalds
2018-03-05 03:04:27 +0800

03 Mar, 2018

2 commits

2833419a6 Merge tag 'ceph-for-4.16-rc4' of git://github.com/ceph/ceph-client ... Browse Code »

Pull ceph fixes from Ilya Dryomov:
"A cap handling fix from Zhi that ensures that metadata writeback isn't
delayed and three error path memory leak fixups from Chengguang"

* tag 'ceph-for-4.16-rc4' of git://github.com/ceph/ceph-client:
ceph: fix potential memory leak in init_caches()
ceph: fix dentry leak when failing to init debugfs
libceph, ceph: avoid memory leak when specifying same option several times
ceph: flush dirty caps of unlinked inode ASAP

Linus Torvalds
2018-03-03 02:05:10 +0800
fb6d47a59 Merge tag 'for-linus-20180302' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block fixes from Jens Axboe:
"A collection of fixes for this series. This is a little larger than
usual at this time, but that's mainly because I was out on vacation
last week. Nothing in here is major in any way, it's just two weeks of
fixes. This contains:

- NVMe pull from Keith, with a set of fixes from the usual suspects.

- mq-deadline zone unlock fix from Damien, fixing an issue with the
SMR zone locking added for 4.16.

- two bcache fixes sent in by Michael, with changes from Coly and
Tang.

- comment typo fix from Eric for blktrace.

- return-value error handling fix for nbd, from Gustavo.

- fix a direct-io case where we don't defer to a completion handler,
making us sleep from IRQ device completion. From Jan.

- a small series from Jan fixing up holes around handling of bdev
references.

- small set of regression fixes from Jiufei, mostly fixing problems
around the gendisk pointer -> partition index change.

- regression fix from Ming, fixing a boundary issue with the discard
page cache invalidation.

- two-patch series from Ming, fixing both a core blk-mq-sched and
kyber issue around token freeing on a requeue condition"

* tag 'for-linus-20180302' of git://git.kernel.dk/linux-block: (24 commits)
block: fix a typo
block: display the correct diskname for bio
block: fix the count of PGPGOUT for WRITE_SAME
mq-deadline: Make sure to always unlock zones
nvmet: fix PSDT field check in command format
nvme-multipath: fix sysfs dangerously created links
nbd: fix return value in error handling path
bcache: fix kcrashes with fio in RAID5 backend dev
bcache: correct flash only vols (check all uuids)
blktrace_api.h: fix comment for struct blk_user_trace_setup
blockdev: Avoid two active bdev inodes for one device
genhd: Fix BUG in blkdev_open()
genhd: Fix use after free in __blkdev_get()
genhd: Add helper put_disk_and_module()
genhd: Rename get_disk() to get_disk_and_module()
genhd: Fix leaked module reference for NVME devices
direct-io: Fix sleep in atomic due to sync AIO
nvme-pci: Fix nvme queue cleanup if IRQ setup fails
block: kyber: fix domain token leak during requeue
blk-mq: don't call io sched's .requeue_request when requeueing rq to ->dispatch
...

Linus Torvalds
2018-03-03 01:35:36 +0800

01 Mar, 2018

10 commits

1c7892495 ceph: fix potential memory leak in init_caches() ... Browse Code »

There is lack of cache destroy operation for ceph_file_cachep
when failing from fscache register.

Signed-off-by: Chengguang Xu
Reviewed-by: Ilya Dryomov
Signed-off-by: Ilya Dryomov

Chengguang Xu
2018-03-01 23:39:47 +0800
1f250e929 Btrfs: fix log replay failure after unlink and link combination ... Browse Code »

If we have a file with 2 (or more) hard links in the same directory,
remove one of the hard links, create a new file (or link an existing file)
in the same directory with the name of the removed hard link, and then
finally fsync the new file, we end up with a log that fails to replay,
causing a mount failure.

Example:

$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt

$ mkdir /mnt/testdir
$ touch /mnt/testdir/foo
$ ln /mnt/testdir/foo /mnt/testdir/bar

$ sync

$ unlink /mnt/testdir/bar
$ touch /mnt/testdir/bar
$ xfs_io -c "fsync" /mnt/testdir/bar

$ mount /dev/sdb /mnt
mount: mount(2) failed: /mnt: No such file or directory

When replaying the log, for that example, we also see the following in
dmesg/syslog:

[71813.671307] BTRFS info (device dm-0): failed to delete reference to bar, inode 258 parent 257
[71813.674204] ------------[ cut here ]------------
[71813.675694] BTRFS: Transaction aborted (error -2)
[71813.677236] WARNING: CPU: 1 PID: 13231 at fs/btrfs/inode.c:4128 __btrfs_unlink_inode+0x17b/0x355 [btrfs]
[71813.679669] Modules linked in: btrfs xfs f2fs dm_flakey dm_mod dax ghash_clmulni_intel ppdev pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper evdev psmouse i2c_piix4 parport_pc i2c_core pcspkr sg serio_raw parport button sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod ata_generic sd_mod virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel floppy virtio e1000 scsi_mod [last unloaded: btrfs]
[71813.679669] CPU: 1 PID: 13231 Comm: mount Tainted: G W 4.15.0-rc9-btrfs-next-56+ #1
[71813.679669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[71813.679669] RIP: 0010:__btrfs_unlink_inode+0x17b/0x355 [btrfs]
[71813.679669] RSP: 0018:ffffc90001cef738 EFLAGS: 00010286
[71813.679669] RAX: 0000000000000025 RBX: ffff880217ce4708 RCX: 0000000000000001
[71813.679669] RDX: 0000000000000000 RSI: ffffffff81c14bae RDI: 00000000ffffffff
[71813.679669] RBP: ffffc90001cef7c0 R08: 0000000000000001 R09: 0000000000000001
[71813.679669] R10: ffffc90001cef5e0 R11: ffffffff8343f007 R12: ffff880217d474c8
[71813.679669] R13: 00000000fffffffe R14: ffff88021ccf1548 R15: 0000000000000101
[71813.679669] FS: 00007f7cee84c480(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
[71813.679669] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[71813.679669] CR2: 00007f7cedc1abf9 CR3: 00000002354b4003 CR4: 00000000001606e0
[71813.679669] Call Trace:
[71813.679669] btrfs_unlink_inode+0x17/0x41 [btrfs]
[71813.679669] drop_one_dir_item+0xfa/0x131 [btrfs]
[71813.679669] add_inode_ref+0x71e/0x851 [btrfs]
[71813.679669] ? __lock_is_held+0x39/0x71
[71813.679669] ? replay_one_buffer+0x53/0x53a [btrfs]
[71813.679669] replay_one_buffer+0x4a4/0x53a [btrfs]
[71813.679669] ? rcu_read_unlock+0x3a/0x57
[71813.679669] ? __lock_is_held+0x39/0x71
[71813.679669] walk_up_log_tree+0x101/0x1d2 [btrfs]
[71813.679669] walk_log_tree+0xad/0x188 [btrfs]
[71813.679669] btrfs_recover_log_trees+0x1fa/0x31e [btrfs]
[71813.679669] ? replay_one_extent+0x544/0x544 [btrfs]
[71813.679669] open_ctree+0x1cf6/0x2209 [btrfs]
[71813.679669] btrfs_mount_root+0x368/0x482 [btrfs]
[71813.679669] ? trace_hardirqs_on_caller+0x14c/0x1a6
[71813.679669] ? __lockdep_init_map+0x176/0x1c2
[71813.679669] ? mount_fs+0x64/0x10b
[71813.679669] mount_fs+0x64/0x10b
[71813.679669] vfs_kern_mount+0x68/0xce
[71813.679669] btrfs_mount+0x13e/0x772 [btrfs]
[71813.679669] ? trace_hardirqs_on_caller+0x14c/0x1a6
[71813.679669] ? __lockdep_init_map+0x176/0x1c2
[71813.679669] ? mount_fs+0x64/0x10b
[71813.679669] mount_fs+0x64/0x10b
[71813.679669] vfs_kern_mount+0x68/0xce
[71813.679669] do_mount+0x6e5/0x973
[71813.679669] ? memdup_user+0x3e/0x5c
[71813.679669] SyS_mount+0x72/0x98
[71813.679669] entry_SYSCALL_64_fastpath+0x1e/0x8b
[71813.679669] RIP: 0033:0x7f7cedf150ba
[71813.679669] RSP: 002b:00007ffca71da688 EFLAGS: 00000206
[71813.679669] Code: 7f a0 e8 51 0c fd ff 48 8b 43 50 f0 0f ba a8 30 2c 00 00 02 72 17 41 83 fd fb 74 11 44 89 ee 48 c7 c7 7d 11 7f a0 e8 38 f5 8d e0 ff 44 89 e9 ba 20 10 00 00 eb 4d 48 8b 4d b0 48 8b 75 88 4c
[71813.679669] ---[ end trace 83bd473fc5b4663b ]---
[71813.854764] BTRFS: error (device dm-0) in __btrfs_unlink_inode:4128: errno=-2 No such entry
[71813.886994] BTRFS: error (device dm-0) in btrfs_replay_log:2307: errno=-2 No such entry (Failed to recover log tree)
[71813.903357] BTRFS error (device dm-0): cleaner transaction attach returned -30
[71814.128078] BTRFS error (device dm-0): open_ctree failed

This happens because the log has inode reference items for both inode 258
(the first file we created) and inode 259 (the second file created), and
when processing the reference item for inode 258, we replace the
corresponding item in the subvolume tree (which has two names, "foo" and
"bar") witht he one in the log (which only has one name, "foo") without
removing the corresponding dir index keys from the parent directory.
Later, when processing the inode reference item for inode 259, which has
a name of "bar" associated to it, we notice that dir index entries exist
for that name and for a different inode, so we attempt to unlink that
name, which fails because the inode reference item for inode 258 no longer
has the name "bar" associated to it, making a call to btrfs_unlink_inode()
fail with a -ENOENT error.

Fix this by unlinking all the names in an inode reference item from a
subvolume tree that are not present in the inode reference item found in
the log tree, before overwriting it with the item from the log tree.

Signed-off-by: Filipe Manana
Signed-off-by: David Sterba

Filipe Manana
2018-03-01 23:18:40 +0800
9a6509c4d Btrfs: fix log replay failure after linking special file and fsync ... Browse Code »

If in the same transaction we rename a special file (fifo, character/block
device or symbolic link), create a hard link for it having its old name
then sync the log, we will end up with a log that can not be replayed and
at when attempting to replay it, an EEXIST error is returned and mounting
the filesystem fails. Example scenario:

$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt
$ mkdir /mnt/testdir
$ mkfifo /mnt/testdir/foo
# Make sure everything done so far is durably persisted.
$ sync

# Create some unrelated file and fsync it, this is just to create a log
# tree. The file must be in the same directory as our special file.
$ touch /mnt/testdir/f1
$ xfs_io -c "fsync" /mnt/testdir/f1

# Rename our special file and then create a hard link with its old name.
$ mv /mnt/testdir/foo /mnt/testdir/bar
$ ln /mnt/testdir/bar /mnt/testdir/foo

# Create some other unrelated file and fsync it, this is just to persist
# the log tree which was modified by the previous rename and link
# operations. Alternatively we could have modified file f1 and fsync it.
$ touch /mnt/f2
$ xfs_io -c "fsync" /mnt/f2

$ mount /dev/sdc /mnt
mount: mount /dev/sdc on /mnt failed: File exists

This happens because when both the log tree and the subvolume's tree have
an entry in the directory "testdir" with the same name, that is, there
is one key (258 INODE_REF 257) in the subvolume tree and another one in
the log tree (where 258 is the inode number of our special file and 257
is the inode for directory "testdir"). Only the data of those two keys
differs, in the subvolume tree the index field for inode reference has
a value of 3 while the log tree it has a value of 5. Because the same key
exists in both trees, but have different index, the log replay fails with
an -EEXIST error when attempting to replay the inode reference from the
log tree.

Fix this by setting the last_unlink_trans field of the inode (our special
file) to the current transaction id when a hard link is created, as this
forces logging the parent directory inode, solving the conflict at log
replay time.

A new generic test case for fstests was also submitted.

Signed-off-by: Filipe Manana
Signed-off-by: David Sterba

Filipe Manana
2018-03-01 23:18:34 +0800
d4dfc0f4d Btrfs: send, fix issuing write op when processing hole in no data mode ... Browse Code »

When doing an incremental send of a filesystem with the no-holes feature
enabled, we end up issuing a write operation when using the no data mode
send flag, instead of issuing an update extent operation. Fix this by
issuing the update extent operation instead.

Trivial reproducer:

$ mkfs.btrfs -f -O no-holes /dev/sdc
$ mkfs.btrfs -f /dev/sdd
$ mount /dev/sdc /mnt/sdc
$ mount /dev/sdd /mnt/sdd

$ xfs_io -f -c "pwrite -S 0xab 0 32K" /mnt/sdc/foobar
$ btrfs subvolume snapshot -r /mnt/sdc /mnt/sdc/snap1

$ xfs_io -c "fpunch 8K 8K" /mnt/sdc/foobar
$ btrfs subvolume snapshot -r /mnt/sdc /mnt/sdc/snap2

$ btrfs send /mnt/sdc/snap1 | btrfs receive /mnt/sdd
$ btrfs send --no-data -p /mnt/sdc/snap1 /mnt/sdc/snap2 \
| btrfs receive -vv /mnt/sdd

Before this change the output of the second receive command is:

receiving snapshot snap2 uuid=f6922049-8c22-e544-9ff9-fc6755918447...
utimes
write foobar, offset 8192, len 8192
utimes foobar
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=f6922049-8c22-e544-9ff9-...

After this change it is:

receiving snapshot snap2 uuid=564d36a3-ebc8-7343-aec9-bf6fda278e64...
utimes
update_extent foobar: offset=8192, len=8192
utimes foobar
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=564d36a3-ebc8-7343-aec9-bf6fda278e64...

Signed-off-by: Filipe Manana
Signed-off-by: David Sterba

Filipe Manana
2018-03-01 23:18:07 +0800
3c181c12c btrfs: use proper endianness accessors for super_copy ... Browse Code »

The fs_info::super_copy is a byte copy of the on-disk structure and all
members must use the accessor macros/functions to obtain the right
value. This was missing in update_super_roots and in sysfs readers.

Moving between opposite endianness hosts will report bogus numbers in
sysfs, and mount may fail as the root will not be restored correctly. If
the filesystem is always used on a same endian host, this will not be a
problem.

Fix this by using the btrfs_set_super...() functions to set
fs_info::super_copy values, and for the sysfs, use the cached
fs_info::nodesize/sectorsize values.

CC: stable@vger.kernel.org
Fixes: df93589a17378 ("btrfs: export more from FS_INFO to sysfs")
Signed-off-by: Anand Jain
Reviewed-by: Liu Bo
Reviewed-by: David Sterba
[ update changelog ]
Signed-off-by: David Sterba

Anand Jain
2018-03-01 23:17:27 +0800
92e222df7 btrfs: alloc_chunk: fix DUP stripe size handling ... Browse Code »

In case of using DUP, we search for enough unallocated disk space on a
device to hold two stripes.

The devices_info[ndevs-1].max_avail that holds the amount of unallocated
space found is directly assigned to stripe_size, while it's actually
twice the stripe size.

Later on in the code, an unconditional division of stripe_size by
dev_stripes corrects the value, but in the meantime there's a check to
see if the stripe_size does not exceed max_chunk_size. Since during this
check stripe_size is twice the amount as intended, the check will reduce
the stripe_size to max_chunk_size if the actual correct to be used
stripe_size is more than half the amount of max_chunk_size.

The unconditional division later tries to correct stripe_size, but will
actually make sure we can't allocate more than half the max_chunk_size.

Fix this by moving the division by dev_stripes before the max chunk size
check, so it always contains the right value, instead of putting a duct
tape division in further on to get it fixed again.

Since in all other cases than DUP, dev_stripes is 1, this change only
affects DUP.

Other attempts in the past were made to fix this:
* 37db63a400 "Btrfs: fix max chunk size check in chunk allocator" tried
to fix the same problem, but still resulted in part of the code acting
on a wrongly doubled stripe_size value.
* 86db25785a "Btrfs: fix max chunk size on raid5/6" unintentionally
broke this fix again.

The real problem was already introduced with the rest of the code in
73c5de0051.

The user visible result however will be that the max chunk size for DUP
will suddenly double, while it's actually acting according to the limits
in the code again like it was 5 years ago.

Reported-by: Naohiro Aota
Link: https://www.spinics.net/lists/linux-btrfs/msg69752.html
Fixes: 73c5de0051 ("btrfs: quasi-round-robin for chunk allocation")
Fixes: 86db25785a ("Btrfs: fix max chunk size on raid5/6")
Signed-off-by: Hans van Kranenburg
Reviewed-by: David Sterba
[ update comment ]
Signed-off-by: David Sterba

Hans van Kranenburg
2018-03-01 23:16:47 +0800
765f3cebf btrfs: Handle btrfs_set_extent_delalloc failure in relocate_file_extent_cluster ... Browse Code »

Essentially duplicate the error handling from the above block which
handles the !PageUptodate(page) case and additionally clear
EXTENT_BOUNDARY.

Signed-off-by: Nikolay Borisov
Reviewed-by: Josef Bacik
Signed-off-by: David Sterba

Nikolay Borisov
2018-03-01 23:16:12 +0800
ac01f26a2 btrfs: handle failure of add_pending_csums ... Browse Code »

add_pending_csums was added as part of the new data=ordered
implementation in e6dcd2dc9c48 ("Btrfs: New data=ordered
implementation"). Even back then it called the btrfs_csum_file_blocks
which can fail but it never bothered handling the failure. In ENOMEM
situation this could lead to the filesystem failing to write the
checksums for a particular extent and not detect this. On read this
could lead to the filesystem erroring out due to crc mismatch. Fix it by
propagating failure from add_pending_csums and handling them.

Signed-off-by: Nikolay Borisov
Reviewed-by: Josef Bacik
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Nikolay Borisov
2018-03-01 23:16:00 +0800
a8fd1f717 btrfs: use kvzalloc to allocate btrfs_fs_info ... Browse Code »

The srcu_struct in btrfs_fs_info scales in size with NR_CPUS. On
kernels built with NR_CPUS=8192, this can result in kmalloc failures
that prevent mounting.

There is work in progress to try to resolve this for every user of
srcu_struct but using kvzalloc will work around the failures until
that is complete.

As an example with NR_CPUS=512 on x86_64: the overall size of
subvol_srcu is 3460 bytes, fs_info is 6496.

Signed-off-by: Jeff Mahoney
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Jeff Mahoney
2018-03-01 23:15:36 +0800
c02be2334 Merge tag 'xfs-4.16-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux ... Browse Code »

Pull xfs fixes from Darrick Wong:

- fix some compiler warnings

- fix block reservations for transactions created during log recovery

- fix resource leaks when respecifying mount options

* tag 'xfs-4.16-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix potential memory leak in mount option parsing
xfs: reserve blocks for refcount / rmap log item recovery
xfs: use memset to initialize xfs_scrub_agfl_info

Linus Torvalds
2018-03-01 03:40:51 +0800

28 Feb, 2018

2 commits

02df428ca net: Convert simple pernet_operations ... Browse Code »

These pernet_operations make pretty simple actions
like variable initialization on init, debug checks
on exit, and so on, and they obviously are able
to be executed in parallel with any others:

vrf_net_ops
lockd_net_ops
grace_net_ops
xfrm6_tunnel_net_ops
kcm_net_ops
tcf_net_ops

Signed-off-by: Kirill Tkhai
Signed-off-by: David S. Miller

Kirill Tkhai
2018-02-28 00:01:35 +0800
7300bd94e net: Convert nfs_net_ops ... Browse Code »

These pernet_operations just create and destroy /proc entries
and net_generic()->cb_ident_idr IDR. So, we are able to mark
them async.

Signed-off-by: Kirill Tkhai
Signed-off-by: David S. Miller

Kirill Tkhai
2018-02-28 00:01:35 +0800

27 Feb, 2018

5 commits

5b4c845ea xfs: fix potential memory leak in mount option parsing ... Browse Code »

When specifying string type mount option (e.g., logdev)
several times in a mount, current option parsing may
cause memory leak. Hence, call kfree for previous one
in this case.

Signed-off-by: Chengguang Xu
Reviewed-by: Eric Sandeen
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Chengguang Xu
2018-02-27 02:02:13 +0800
560e7cb2f blockdev: Avoid two active bdev inodes for one device ... Browse Code »

When blkdev_open() races with device removal and creation it can happen
that unhashed bdev inode gets associated with newly created gendisk
like:

CPU0 CPU1
blkdev_open()
bdev = bd_acquire()
del_gendisk()
bdev_unhash_inode(bdev);
remove device
create new device with the same number
__blkdev_get()
disk = get_gendisk()
- gets reference to gendisk of the new device

Now another blkdev_open() will not find original 'bdev' as it got
unhashed, create a new one and associate it with the same 'disk' at
which point problems start as we have two independent page caches for
one device.

Fix the problem by verifying that the bdev inode didn't get unhashed
before we acquired gendisk reference. That way we make sure gendisk can
get associated only with visible bdev inodes.

Tested-by: Hou Tao
Signed-off-by: Jan Kara
Signed-off-by: Jens Axboe

Jan Kara
2018-02-27 00:48:42 +0800
897366537 genhd: Fix use after free in __blkdev_get() ... Browse Code »

When two blkdev_open() calls race with device removal and recreation,
__blkdev_get() can use looked up gendisk after it is freed:

CPU0 CPU1 CPU2
del_gendisk(disk);
bdev_unhash_inode(inode);
blkdev_open() blkdev_open()
bdev = bd_acquire(inode);
- creates and returns new inode
bdev = bd_acquire(inode);
- returns the same inode
__blkdev_get(devt) __blkdev_get(devt)
disk = get_gendisk(devt);
- got structure of device going away

disk = get_gendisk(devt);
- got new device structure
if (!bdev->bd_openers) {
does the first open
}
if (!bdev->bd_openers)
- false
} else {
put_disk_and_module(disk)
- remember this was old device - this was last ref and disk is
now freed
}
disk_unblock_events(disk); -> oops

Fix the problem by making sure we drop reference to disk in
__blkdev_get() only after we are really done with it.

Reported-by: Hou Tao
Tested-by: Hou Tao
Signed-off-by: Jan Kara
Signed-off-by: Jens Axboe

Jan Kara
2018-02-27 00:48:42 +0800
9df6c2991 genhd: Add helper put_disk_and_module() ... Browse Code »

Add a proper counterpart to get_disk_and_module() -
put_disk_and_module(). Currently it is opencoded in several places.

Signed-off-by: Jan Kara
Signed-off-by: Jens Axboe

Jan Kara
2018-02-27 00:48:42 +0800
d9c10e5b8 direct-io: Fix sleep in atomic due to sync AIO ... Browse Code »

Commit e864f39569f4 "fs: add RWF_DSYNC aand RWF_SYNC" added additional
way for direct IO to become synchronous and thus trigger fsync from the
IO completion handler. Then commit 9830f4be159b "fs: Use RWF_* flags for
AIO operations" allowed these flags to be set for AIO as well. However
that commit forgot to update the condition checking whether the IO
completion handling should be defered to a workqueue and thus AIO DIO
with RWF_[D]SYNC set will call fsync() from IRQ context resulting in
sleep in atomic.

Fix the problem by checking directly iocb flags (the same way as it is
done in dio_complete()) instead of checking all conditions that could
lead to IO being synchronous.

CC: Christoph Hellwig
CC: Goldwyn Rodrigues
CC: stable@vger.kernel.org
Reported-by: Mark Rutland
Tested-by: Mark Rutland
Fixes: 9830f4be159b29399d107bffb99e0132bc5aedd4
Signed-off-by: Jan Kara
Signed-off-by: Jens Axboe

Jan Kara
2018-02-27 00:05:35 +0800

26 Feb, 2018

4 commits

18106734b ceph: fix dentry leak when failing to init debugfs ... Browse Code »

When failing from ceph_fs_debugfs_init() in ceph_real_mount(),
there is lack of dput of root_dentry and it causes slab errors,
so change the calling order of ceph_fs_debugfs_init() and
open_root_dentry() and do some cleanups to avoid this issue.

Signed-off-by: Chengguang Xu
Reviewed-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Chengguang Xu
2018-02-26 23:20:07 +0800
937441f3a libceph, ceph: avoid memory leak when specifying same option several times ... Browse Code »

When parsing string option, in order to avoid memory leak we need to
carefully free it first in case of specifying same option several times.

Signed-off-by: Chengguang Xu
Reviewed-by: Ilya Dryomov
Signed-off-by: Ilya Dryomov

Chengguang Xu
2018-02-26 23:19:30 +0800
6ef0bc6dd ceph: flush dirty caps of unlinked inode ASAP ... Browse Code »

Client should release unlinked inode from its cache ASAP. But client
can't release inode with dirty caps.

Link: http://tracker.ceph.com/issues/22886
Signed-off-by: Zhi Zhang
Reviewed-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Zhi Zhang
2018-02-26 23:19:16 +0800
c89be5242 Merge tag 'nfs-for-4.16-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull NFS client bugfixes from Trond Myklebust:

- fix a broken cast in nfs4_callback_recallany()

- fix an Oops during NFSv4 migration events

- make struct nlmclnt_fl_close_lock_ops static

* tag 'nfs-for-4.16-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: make struct nlmclnt_fl_close_lock_ops static
nfs: system crashes after NFS4ERR_MOVED recovery
NFSv4: Fix broken cast in nfs4_callback_recallany()

Linus Torvalds
2018-02-26 05:43:18 +0800

24 Feb, 2018

1 commit

f74290fdb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Browse Code »

David S. Miller
2018-02-24 13:04:20 +0800

23 Feb, 2018

7 commits

bae6cfe8a Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/eb… ... Browse Code »

…iederm/user-namespace

Pull siginfo fix from Eric Biederman:
"This fixes a build error that only shows up on blackfin"

* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
fs/signalfd: fix build error for BUS_MCEERR_AR

Linus Torvalds
2018-02-23 09:04:06 +0800
b31c2bdcd xfs: reserve blocks for refcount / rmap log item recovery ... Browse Code »

During log recovery, the per-AG reservations aren't yet set up, so log
recovery has to reserve enough blocks to handle all possible btree
splits.

Reported-by: Dave Chinner
Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner

Darrick J. Wong
2018-02-23 06:41:25 +0800
86516eff3 xfs: use memset to initialize xfs_scrub_agfl_info ... Browse Code »

Apparently different gcc versions have competing and
incompatible notions of how to initialize at declaration,
so just give up and fall back to the time-tested memset().

Signed-off-by: Eric Sandeen
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Eric Sandeen
2018-02-23 06:41:25 +0800
9026e820c fs/signalfd: fix build error for BUS_MCEERR_AR ... Browse Code »

Fix build error in fs/signalfd.c by using same method that is used in
kernel/signal.c: separate blocks for different signal si_code values.

./fs/signalfd.c: error: 'BUS_MCEERR_AR' undeclared (first use in this function)

Reported-by: Geert Uytterhoeven
Signed-off-by: Randy Dunlap
Cc: Alexander Viro
Signed-off-by: Eric W. Biederman

Randy Dunlap
2018-02-23 05:00:07 +0800
bef3efbeb efivarfs: Limit the rate for non-root to read files ... Browse Code »

Each read from a file in efivarfs results in two calls to EFI
(one to get the file size, another to get the actual data).

On X86 these EFI calls result in broadcast system management
interrupts (SMI) which affect performance of the whole system.
A malicious user can loop performing reads from efivarfs bringing
the system to its knees.

Linus suggested per-user rate limit to solve this.

So we add a ratelimit structure to "user_struct" and initialize
it for the root user for no limit. When allocating user_struct for
other users we set the limit to 100 per second. This could be used
for other places that want to limit the rate of some detrimental
user action.

In efivarfs if the limit is exceeded when reading, we take an
interruptible nap for 50ms and check the rate limit again.

Signed-off-by: Tony Luck
Acked-by: Ard Biesheuvel
Signed-off-by: Linus Torvalds

Luck, Tony
2018-02-23 02:21:02 +0800
1b7204064 NFS: make struct nlmclnt_fl_close_lock_ops static ... Browse Code »

The structure nlmclnt_fl_close_lock_ops s local to the source and does
not need to be in global scope, so make it static.

Cleans up sparse warning:
fs/nfs/nfs3proc.c:876:33: warning: symbol 'nlmclnt_fl_close_lock_ops' was not
declared. Should it be static?

Signed-off-by: Colin Ian King
Signed-off-by: Trond Myklebust

Colin Ian King
2018-02-23 01:23:01 +0800
ad86f605c nfs: system crashes after NFS4ERR_MOVED recovery ... Browse Code »

nfs4_update_server unconditionally releases the nfs_client for the
source server. If migration fails, this can cause the source server's
nfs_client struct to be left with a low reference count, resulting in
use-after-free. Also, adjust reference count handling for ELOOP.

NFS: state manager: migration failed on NFSv4 server nfsvmu10 with error 6
WARNING: CPU: 16 PID: 17960 at fs/nfs/client.c:281 nfs_put_client+0xfa/0x110 [nfs]()
nfs_put_client+0xfa/0x110 [nfs]
nfs4_run_state_manager+0x30/0x40 [nfsv4]
kthread+0xd8/0xf0

BUG: unable to handle kernel NULL pointer dereference at 00000000000002a8
nfs4_xdr_enc_write+0x6b/0x160 [nfsv4]
rpcauth_wrap_req+0xac/0xf0 [sunrpc]
call_transmit+0x18c/0x2c0 [sunrpc]
__rpc_execute+0xa6/0x490 [sunrpc]
rpc_async_schedule+0x15/0x20 [sunrpc]
process_one_work+0x160/0x470
worker_thread+0x112/0x540
? rescuer_thread+0x3f0/0x3f0
kthread+0xd8/0xf0

This bug was introduced by 32e62b7c ("NFS: Add nfs4_update_server"),
but the fix applies cleanly to 52442f9b ("NFS4: Avoid migration loops")

Reported-by: Helen Chao
Fixes: 52442f9b11b7 ("NFS4: Avoid migration loops")
Signed-off-by: Bill Baker
Reviewed-by: Chuck Lever
Signed-off-by: Trond Myklebust

Bill.Baker@oracle.com
2018-02-23 01:17:42 +0800

22 Feb, 2018

1 commit

6d243a235 NFSv4: Fix broken cast in nfs4_callback_recallany() ... Browse Code »

Passing a pointer to a unsigned integer to test_bit() is broken.

Signed-off-by: Trond Myklebust

Trond Myklebust
2018-02-22 05:35:50 +0800

20 Feb, 2018

1 commit

f5c0c6f42 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Browse Code »

David S. Miller
2018-02-20 07:46:11 +0800

17 Feb, 2018

1 commit

da370f1d6 Merge tag 'for-4.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux ... Browse Code »

Pull btrfs fixes from David Sterba:
"We have a few assorted fixes, some of them show up during fstests so I
gave them more testing"

* tag 'for-4.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: Fix use-after-free when cleaning up fs_devs with a single stale device
Btrfs: fix null pointer dereference when replacing missing device
btrfs: remove spurious WARN_ON(ref->count < 0) in find_parent_nodes
btrfs: Ignore errors from btrfs_qgroup_trace_extent_post
Btrfs: fix unexpected -EEXIST when creating new inode
Btrfs: fix use-after-free on root->orphan_block_rsv
Btrfs: fix btrfs_evict_inode to handle abnormal inodes correctly
Btrfs: fix extent state leak from tree log
Btrfs: fix crash due to not cleaning up tree log block's dirty bits
Btrfs: fix deadlock in run_delalloc_nocow

Linus Torvalds
2018-02-17 01:26:18 +0800

16 Feb, 2018

1 commit

24dce0800 net: Export open_related_ns() ... Browse Code »

This function will be used to obtain net of tun device.

Signed-off-by: Kirill Tkhai
Signed-off-by: David S. Miller

Kirill Tkhai
2018-02-16 04:34:42 +0800

15 Feb, 2018

2 commits

e525de3ab Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull x86 fixes from Ingo Molnar:
"Misc fixes all across the map:

- /proc/kcore vsyscall related fixes
- LTO fix
- build warning fix
- CPU hotplug fix
- Kconfig NR_CPUS cleanups
- cpu_has() cleanups/robustification
- .gitignore fix
- memory-failure unmapping fix
- UV platform fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm, mm/hwpoison: Don't unconditionally unmap kernel 1:1 pages
x86/error_inject: Make just_return_func() globally visible
x86/platform/UV: Fix GAM Range Table entries less than 1GB
x86/build: Add arch/x86/tools/insn_decoder_test to .gitignore
x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a physical CPU
x86/mm/kcore: Add vsyscall page to /proc/kcore conditionally
vfs/proc/kcore, x86/mm/kcore: Fix SMAP fault when dumping vsyscall user page
x86/Kconfig: Further simplify the NR_CPUS config
x86/Kconfig: Simplify NR_CPUS config
x86/MCE: Fix build warning introduced by "x86: do not use print_symbol()"
x86/cpufeature: Update _static_cpu_has() to use all named variables
x86/cpufeature: Reindent _static_cpu_has()

Linus Torvalds
2018-02-15 09:31:51 +0800
6556677a8 Merge tag 'gfs2-4.16.rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 ... Browse Code »

Pull gfs2 fix from Bob Peterson:
"Fix regressions in the gfs2 iomap for block_map implementation we
recently discovered in commit 3974320ca6"

* tag 'gfs2-4.16.rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Fixes to "Implement iomap for block_map"

Linus Torvalds
2018-02-15 02:14:59 +0800

14 Feb, 2018

1 commit

49edd5bf4 gfs2: Fixes to "Implement iomap for block_map" ... Browse Code »

It turns out that commit 3974320ca6 "Implement iomap for block_map"
introduced a few bugs that trigger occasional failures with xfstest
generic/476:

In gfs2_iomap_begin, we jump to do_alloc when we determine that we are
beyond the end of the allocated metadata (height > ip->i_height).
There, we can end up calling hole_size with a metapath that doesn't
match the current metadata tree, which doesn't make sense. After
untangling the code at do_alloc, fix this by checking if the block we
are looking for is within the range of allocated metadata.

In addition, add a BUG() in case gfs2_iomap_begin is accidentally called
for reading stuffed files: this is handled separately. Make sure we
don't truncate iomap->length for reads beyond the end of the file; in
that case, the entire range counts as a hole.

Finally, revert to taking a bitmap write lock when doing allocations.
It's unclear why that change didn't lead to any failures during testing.

Signed-off-by: Andreas Gruenbacher
Signed-off-by: Bob Peterson

Andreas Gruenbacher
2018-02-14 04:38:10 +0800