06 Mar, 2018

1 commit


05 Mar, 2018

1 commit

  • Pull btrfs fixes from David Sterba:

    - when NR_CPUS is large, an SRCU structure can significantly inflate
    the size of the main filesystem structure, making it impossible to
    allocate with kmalloc, so the kvzalloc fallback is used

    - improved error handling

    - fix endianness when printing some filesystem attributes via sysfs;
    this could happen when a filesystem is moved between hosts of
    different endianness

    - send fixes: the NO_HOLE mode should not send a write operation for a
    file hole

    - fix log replay for special files followed by file hardlinks

    - fix log replay failure after unlink and link combination

    - fix max chunk size calculation for DUP allocation

    * tag 'for-4.16-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
    Btrfs: fix log replay failure after unlink and link combination
    Btrfs: fix log replay failure after linking special file and fsync
    Btrfs: send, fix issuing write op when processing hole in no data mode
    btrfs: use proper endianness accessors for super_copy
    btrfs: alloc_chunk: fix DUP stripe size handling
    btrfs: Handle btrfs_set_extent_delalloc failure in relocate_file_extent_cluster
    btrfs: handle failure of add_pending_csums
    btrfs: use kvzalloc to allocate btrfs_fs_info

    Linus Torvalds
     

03 Mar, 2018

2 commits

  • Pull ceph fixes from Ilya Dryomov:
    "A cap handling fix from Zhi that ensures that metadata writeback isn't
    delayed and three error path memory leak fixups from Chengguang"

    * tag 'ceph-for-4.16-rc4' of git://github.com/ceph/ceph-client:
    ceph: fix potential memory leak in init_caches()
    ceph: fix dentry leak when failing to init debugfs
    libceph, ceph: avoid memory leak when specifying same option several times
    ceph: flush dirty caps of unlinked inode ASAP

    Linus Torvalds
     
  • Pull block fixes from Jens Axboe:
    "A collection of fixes for this series. This is a little larger than
    usual at this time, but that's mainly because I was out on vacation
    last week. Nothing in here is major in any way, it's just two weeks of
    fixes. This contains:

    - NVMe pull from Keith, with a set of fixes from the usual suspects.

    - mq-deadline zone unlock fix from Damien, fixing an issue with the
    SMR zone locking added for 4.16.

    - two bcache fixes sent in by Michael, with changes from Coly and
    Tang.

    - comment typo fix from Eric for blktrace.

    - return-value error handling fix for nbd, from Gustavo.

    - fix a direct-io case where we don't defer to a completion handler,
    making us sleep from IRQ device completion. From Jan.

    - a small series from Jan fixing up holes around handling of bdev
    references.

    - small set of regression fixes from Jiufei, mostly fixing problems
    around the gendisk pointer -> partition index change.

    - regression fix from Ming, fixing a boundary issue with the discard
    page cache invalidation.

    - two-patch series from Ming, fixing both a core blk-mq-sched and
    kyber issue around token freeing on a requeue condition"

    * tag 'for-linus-20180302' of git://git.kernel.dk/linux-block: (24 commits)
    block: fix a typo
    block: display the correct diskname for bio
    block: fix the count of PGPGOUT for WRITE_SAME
    mq-deadline: Make sure to always unlock zones
    nvmet: fix PSDT field check in command format
    nvme-multipath: fix sysfs dangerously created links
    nbd: fix return value in error handling path
    bcache: fix kcrashes with fio in RAID5 backend dev
    bcache: correct flash only vols (check all uuids)
    blktrace_api.h: fix comment for struct blk_user_trace_setup
    blockdev: Avoid two active bdev inodes for one device
    genhd: Fix BUG in blkdev_open()
    genhd: Fix use after free in __blkdev_get()
    genhd: Add helper put_disk_and_module()
    genhd: Rename get_disk() to get_disk_and_module()
    genhd: Fix leaked module reference for NVME devices
    direct-io: Fix sleep in atomic due to sync AIO
    nvme-pci: Fix nvme queue cleanup if IRQ setup fails
    block: kyber: fix domain token leak during requeue
    blk-mq: don't call io sched's .requeue_request when requeueing rq to ->dispatch
    ...

    Linus Torvalds
     

01 Mar, 2018

10 commits

  • The cache destroy operation for ceph_file_cachep is missing on the
    error path taken when fscache registration fails.

    Signed-off-by: Chengguang Xu
    Reviewed-by: Ilya Dryomov
    Signed-off-by: Ilya Dryomov

    Chengguang Xu
     
  • If we have a file with 2 (or more) hard links in the same directory,
    remove one of the hard links, create a new file (or link an existing file)
    in the same directory with the name of the removed hard link, and then
    finally fsync the new file, we end up with a log that fails to replay,
    causing a mount failure.

    Example:

    $ mkfs.btrfs -f /dev/sdb
    $ mount /dev/sdb /mnt

    $ mkdir /mnt/testdir
    $ touch /mnt/testdir/foo
    $ ln /mnt/testdir/foo /mnt/testdir/bar

    $ sync

    $ unlink /mnt/testdir/bar
    $ touch /mnt/testdir/bar
    $ xfs_io -c "fsync" /mnt/testdir/bar

    $ mount /dev/sdb /mnt
    mount: mount(2) failed: /mnt: No such file or directory

    When replaying the log, for that example, we also see the following in
    dmesg/syslog:

    [71813.671307] BTRFS info (device dm-0): failed to delete reference to bar, inode 258 parent 257
    [71813.674204] ------------[ cut here ]------------
    [71813.675694] BTRFS: Transaction aborted (error -2)
    [71813.677236] WARNING: CPU: 1 PID: 13231 at fs/btrfs/inode.c:4128 __btrfs_unlink_inode+0x17b/0x355 [btrfs]
    [71813.679669] Modules linked in: btrfs xfs f2fs dm_flakey dm_mod dax ghash_clmulni_intel ppdev pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper evdev psmouse i2c_piix4 parport_pc i2c_core pcspkr sg serio_raw parport button sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod ata_generic sd_mod virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel floppy virtio e1000 scsi_mod [last unloaded: btrfs]
    [71813.679669] CPU: 1 PID: 13231 Comm: mount Tainted: G W 4.15.0-rc9-btrfs-next-56+ #1
    [71813.679669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
    [71813.679669] RIP: 0010:__btrfs_unlink_inode+0x17b/0x355 [btrfs]
    [71813.679669] RSP: 0018:ffffc90001cef738 EFLAGS: 00010286
    [71813.679669] RAX: 0000000000000025 RBX: ffff880217ce4708 RCX: 0000000000000001
    [71813.679669] RDX: 0000000000000000 RSI: ffffffff81c14bae RDI: 00000000ffffffff
    [71813.679669] RBP: ffffc90001cef7c0 R08: 0000000000000001 R09: 0000000000000001
    [71813.679669] R10: ffffc90001cef5e0 R11: ffffffff8343f007 R12: ffff880217d474c8
    [71813.679669] R13: 00000000fffffffe R14: ffff88021ccf1548 R15: 0000000000000101
    [71813.679669] FS: 00007f7cee84c480(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
    [71813.679669] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [71813.679669] CR2: 00007f7cedc1abf9 CR3: 00000002354b4003 CR4: 00000000001606e0
    [71813.679669] Call Trace:
    [71813.679669] btrfs_unlink_inode+0x17/0x41 [btrfs]
    [71813.679669] drop_one_dir_item+0xfa/0x131 [btrfs]
    [71813.679669] add_inode_ref+0x71e/0x851 [btrfs]
    [71813.679669] ? __lock_is_held+0x39/0x71
    [71813.679669] ? replay_one_buffer+0x53/0x53a [btrfs]
    [71813.679669] replay_one_buffer+0x4a4/0x53a [btrfs]
    [71813.679669] ? rcu_read_unlock+0x3a/0x57
    [71813.679669] ? __lock_is_held+0x39/0x71
    [71813.679669] walk_up_log_tree+0x101/0x1d2 [btrfs]
    [71813.679669] walk_log_tree+0xad/0x188 [btrfs]
    [71813.679669] btrfs_recover_log_trees+0x1fa/0x31e [btrfs]
    [71813.679669] ? replay_one_extent+0x544/0x544 [btrfs]
    [71813.679669] open_ctree+0x1cf6/0x2209 [btrfs]
    [71813.679669] btrfs_mount_root+0x368/0x482 [btrfs]
    [71813.679669] ? trace_hardirqs_on_caller+0x14c/0x1a6
    [71813.679669] ? __lockdep_init_map+0x176/0x1c2
    [71813.679669] ? mount_fs+0x64/0x10b
    [71813.679669] mount_fs+0x64/0x10b
    [71813.679669] vfs_kern_mount+0x68/0xce
    [71813.679669] btrfs_mount+0x13e/0x772 [btrfs]
    [71813.679669] ? trace_hardirqs_on_caller+0x14c/0x1a6
    [71813.679669] ? __lockdep_init_map+0x176/0x1c2
    [71813.679669] ? mount_fs+0x64/0x10b
    [71813.679669] mount_fs+0x64/0x10b
    [71813.679669] vfs_kern_mount+0x68/0xce
    [71813.679669] do_mount+0x6e5/0x973
    [71813.679669] ? memdup_user+0x3e/0x5c
    [71813.679669] SyS_mount+0x72/0x98
    [71813.679669] entry_SYSCALL_64_fastpath+0x1e/0x8b
    [71813.679669] RIP: 0033:0x7f7cedf150ba
    [71813.679669] RSP: 002b:00007ffca71da688 EFLAGS: 00000206
    [71813.679669] Code: 7f a0 e8 51 0c fd ff 48 8b 43 50 f0 0f ba a8 30 2c 00 00 02 72 17 41 83 fd fb 74 11 44 89 ee 48 c7 c7 7d 11 7f a0 e8 38 f5 8d e0 ff 44 89 e9 ba 20 10 00 00 eb 4d 48 8b 4d b0 48 8b 75 88 4c
    [71813.679669] ---[ end trace 83bd473fc5b4663b ]---
    [71813.854764] BTRFS: error (device dm-0) in __btrfs_unlink_inode:4128: errno=-2 No such entry
    [71813.886994] BTRFS: error (device dm-0) in btrfs_replay_log:2307: errno=-2 No such entry (Failed to recover log tree)
    [71813.903357] BTRFS error (device dm-0): cleaner transaction attach returned -30
    [71814.128078] BTRFS error (device dm-0): open_ctree failed

    This happens because the log has inode reference items for both inode 258
    (the first file we created) and inode 259 (the second file created), and
    when processing the reference item for inode 258, we replace the
    corresponding item in the subvolume tree (which has two names, "foo" and
    "bar") with the one in the log (which only has one name, "foo") without
    removing the corresponding dir index keys from the parent directory.
    Later, when processing the inode reference item for inode 259, which has
    a name of "bar" associated to it, we notice that dir index entries exist
    for that name and for a different inode, so we attempt to unlink that
    name, which fails because the inode reference item for inode 258 no longer
    has the name "bar" associated to it, making a call to btrfs_unlink_inode()
    fail with a -ENOENT error.

    Fix this by unlinking all the names in an inode reference item from a
    subvolume tree that are not present in the inode reference item found in
    the log tree, before overwriting it with the item from the log tree.
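
    The failure and the fix can be illustrated with a small userspace model.
    This is only a sketch, not the btrfs code: the dictionaries and the
    helper names unlink(), replay_ref() and run() are hypothetical stand-ins
    for the directory index and the inode reference items.

```python
def unlink(dir_index, refs, inode, name):
    """Remove one name; fail (like -ENOENT) if the inode ref lacks it."""
    if name not in refs.get(inode, set()):
        raise FileNotFoundError(name)
    refs[inode].discard(name)
    del dir_index[name]


def replay_ref(dir_index, refs, inode, log_names, fixed):
    """Replay the log's reference item for one inode."""
    if fixed:
        # Fixed behaviour: unlink names missing from the log item first,
        # so the directory index stays consistent with the reference item.
        for stale in sorted(refs.get(inode, set()) - log_names):
            unlink(dir_index, refs, inode, stale)
    for name in sorted(log_names):
        owner = dir_index.get(name)
        if owner is not None and owner != inode:
            # The name belongs to a different inode: unlink it there first.
            unlink(dir_index, refs, owner, name)
        dir_index[name] = inode
    # Overwrite the reference item with the one from the log.
    refs[inode] = set(log_names)


def run(fixed):
    # Subvolume tree state after the sync: inode 258 is named foo and bar.
    dir_index = {"foo": 258, "bar": 258}
    refs = {258: {"foo", "bar"}}
    try:
        replay_ref(dir_index, refs, 258, {"foo"}, fixed)
        replay_ref(dir_index, refs, 259, {"bar"}, fixed)
    except FileNotFoundError:
        return "ENOENT"
    return dir_index
```

    In this model run(False) ends in the ENOENT case, because the stale
    directory entry for "bar" still points at inode 258, while run(True)
    leaves "foo" on inode 258 and "bar" on inode 259.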

    Signed-off-by: Filipe Manana
    Signed-off-by: David Sterba

    Filipe Manana
     
  • If in the same transaction we rename a special file (fifo, character/block
    device or symbolic link), create a hard link for it with its old name,
    and then sync the log, we end up with a log that cannot be replayed:
    when attempting to replay it, an EEXIST error is returned and mounting
    the filesystem fails. Example scenario:

    $ mkfs.btrfs -f /dev/sdc
    $ mount /dev/sdc /mnt
    $ mkdir /mnt/testdir
    $ mkfifo /mnt/testdir/foo
    # Make sure everything done so far is durably persisted.
    $ sync

    # Create some unrelated file and fsync it, this is just to create a log
    # tree. The file must be in the same directory as our special file.
    $ touch /mnt/testdir/f1
    $ xfs_io -c "fsync" /mnt/testdir/f1

    # Rename our special file and then create a hard link with its old name.
    $ mv /mnt/testdir/foo /mnt/testdir/bar
    $ ln /mnt/testdir/bar /mnt/testdir/foo

    # Create some other unrelated file and fsync it, this is just to persist
    # the log tree which was modified by the previous rename and link
    # operations. Alternatively we could have modified file f1 and fsync it.
    $ touch /mnt/f2
    $ xfs_io -c "fsync" /mnt/f2

    $ mount /dev/sdc /mnt
    mount: mount /dev/sdc on /mnt failed: File exists

    This happens because both the log tree and the subvolume's tree have
    an entry in the directory "testdir" with the same name, that is, there
    is one key (258 INODE_REF 257) in the subvolume tree and another one in
    the log tree (where 258 is the inode number of our special file and 257
    is the inode for directory "testdir"). Only the data of those two keys
    differs: in the subvolume tree the index field for the inode reference
    has a value of 3, while in the log tree it has a value of 5. Because the
    same key exists in both trees but with a different index, the log replay
    fails with an -EEXIST error when attempting to replay the inode
    reference from the log tree.

    Fix this by setting the last_unlink_trans field of the inode (our special
    file) to the current transaction id when a hard link is created, as this
    forces logging the parent directory inode, solving the conflict at log
    replay time.

    A new generic test case for fstests was also submitted.

    Signed-off-by: Filipe Manana
    Signed-off-by: David Sterba

    Filipe Manana
     
  • When doing an incremental send of a filesystem with the no-holes feature
    enabled, we end up issuing a write operation when using the no data mode
    send flag, instead of issuing an update extent operation. Fix this by
    issuing the update extent operation instead.

    Trivial reproducer:

    $ mkfs.btrfs -f -O no-holes /dev/sdc
    $ mkfs.btrfs -f /dev/sdd
    $ mount /dev/sdc /mnt/sdc
    $ mount /dev/sdd /mnt/sdd

    $ xfs_io -f -c "pwrite -S 0xab 0 32K" /mnt/sdc/foobar
    $ btrfs subvolume snapshot -r /mnt/sdc /mnt/sdc/snap1

    $ xfs_io -c "fpunch 8K 8K" /mnt/sdc/foobar
    $ btrfs subvolume snapshot -r /mnt/sdc /mnt/sdc/snap2

    $ btrfs send /mnt/sdc/snap1 | btrfs receive /mnt/sdd
    $ btrfs send --no-data -p /mnt/sdc/snap1 /mnt/sdc/snap2 \
    | btrfs receive -vv /mnt/sdd

    Before this change the output of the second receive command is:

    receiving snapshot snap2 uuid=f6922049-8c22-e544-9ff9-fc6755918447...
    utimes
    write foobar, offset 8192, len 8192
    utimes foobar
    BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=f6922049-8c22-e544-9ff9-...

    After this change it is:

    receiving snapshot snap2 uuid=564d36a3-ebc8-7343-aec9-bf6fda278e64...
    utimes
    update_extent foobar: offset=8192, len=8192
    utimes foobar
    BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=564d36a3-ebc8-7343-aec9-bf6fda278e64...

    Signed-off-by: Filipe Manana
    Signed-off-by: David Sterba

    Filipe Manana
     
  • The fs_info::super_copy is a byte copy of the on-disk structure and all
    members must use the accessor macros/functions to obtain the right
    value. This was missing in update_super_roots and in sysfs readers.

    Moving a filesystem between hosts of opposite endianness will report
    bogus numbers in sysfs, and mount may fail as the root will not be
    restored correctly. If the filesystem is always used on hosts of the
    same endianness, this will not be a problem.

    Fix this by using the btrfs_set_super...() functions to set
    fs_info::super_copy values, and for the sysfs, use the cached
    fs_info::nodesize/sectorsize values.
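
    A short userspace illustration of why the accessors matter. It is a
    sketch only: read_le32() and read_as_big_endian_host() are hypothetical
    helpers, and 16384 is just an example nodesize. The on-disk format is
    little-endian, so a raw read on a big-endian host yields a bogus value.

```python
import struct


def read_le32(raw):
    """Accessor-style read: always decode the bytes as little-endian."""
    return struct.unpack("<I", raw)[0]


def read_as_big_endian_host(raw):
    """What a big-endian host gets when it reads the raw bytes directly."""
    return struct.unpack(">I", raw)[0]


# Example value stored the way it would sit in the byte copy of the super block.
nodesize_on_disk = struct.pack("<I", 16384)

print(read_le32(nodesize_on_disk))                # 16384
print(read_as_big_endian_host(nodesize_on_disk))  # 4194304
```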

    CC: stable@vger.kernel.org
    Fixes: df93589a17378 ("btrfs: export more from FS_INFO to sysfs")
    Signed-off-by: Anand Jain
    Reviewed-by: Liu Bo
    Reviewed-by: David Sterba
    [ update changelog ]
    Signed-off-by: David Sterba

    Anand Jain
     
  • In case of using DUP, we search for enough unallocated disk space on a
    device to hold two stripes.

    The devices_info[ndevs-1].max_avail that holds the amount of unallocated
    space found is directly assigned to stripe_size, while it's actually
    twice the stripe size.

    Later on in the code, an unconditional division of stripe_size by
    dev_stripes corrects the value, but in the meantime there's a check to
    see if the stripe_size does not exceed max_chunk_size. Since during this
    check stripe_size is twice the intended amount, the check will reduce
    the stripe_size to max_chunk_size if the correct stripe_size is more
    than half of max_chunk_size.

    The unconditional division later tries to correct stripe_size, but will
    actually make sure we can't allocate more than half the max_chunk_size.

    Fix this by moving the division by dev_stripes before the max chunk size
    check, so it always contains the right value, instead of putting a duct
    tape division in further on to get it fixed again.

    Since in all other cases than DUP, dev_stripes is 1, this change only
    affects DUP.

    Other attempts in the past were made to fix this:
    * 37db63a400 "Btrfs: fix max chunk size check in chunk allocator" tried
    to fix the same problem, but still resulted in part of the code acting
    on a wrongly doubled stripe_size value.
    * 86db25785a "Btrfs: fix max chunk size on raid5/6" unintentionally
    broke this fix again.

    The real problem was already introduced with the rest of the code in
    73c5de0051.

    The user visible result however will be that the max chunk size for DUP
    will suddenly double, while it's actually acting according to the limits
    in the code again like it was 5 years ago.
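
    The effect of the ordering can be checked with a simplified model. This
    is a sketch under assumptions, not the allocator code: the two functions
    below are hypothetical, they ignore alignment and the number of data
    stripes, and the sizes are example values.

```python
GiB = 1 << 30


def stripe_size_old(max_avail, dev_stripes, max_chunk_size):
    """Old order: clamp first, divide by dev_stripes afterwards."""
    stripe_size = max_avail  # for DUP this is twice the real stripe size
    if stripe_size > max_chunk_size:
        stripe_size = max_chunk_size
    return stripe_size // dev_stripes  # late division halves the clamped value


def stripe_size_new(max_avail, dev_stripes, max_chunk_size):
    """New order: divide by dev_stripes first, then clamp."""
    stripe_size = max_avail // dev_stripes
    if stripe_size > max_chunk_size:
        stripe_size = max_chunk_size
    return stripe_size


# DUP: two stripes on one device, 4 GiB unallocated, 1 GiB chunk limit.
print(stripe_size_old(4 * GiB, 2, GiB) // (1 << 20))  # 512 (MiB)
print(stripe_size_new(4 * GiB, 2, GiB) // (1 << 20))  # 1024 (MiB)
```

    With dev_stripes equal to 1 both functions return the same value, which
    matches the statement that only DUP is affected.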

    Reported-by: Naohiro Aota
    Link: https://www.spinics.net/lists/linux-btrfs/msg69752.html
    Fixes: 73c5de0051 ("btrfs: quasi-round-robin for chunk allocation")
    Fixes: 86db25785a ("Btrfs: fix max chunk size on raid5/6")
    Signed-off-by: Hans van Kranenburg
    Reviewed-by: David Sterba
    [ update comment ]
    Signed-off-by: David Sterba

    Hans van Kranenburg
     
  • Essentially duplicate the error handling from the above block which
    handles the !PageUptodate(page) case and additionally clear
    EXTENT_BOUNDARY.

    Signed-off-by: Nikolay Borisov
    Reviewed-by: Josef Bacik
    Signed-off-by: David Sterba

    Nikolay Borisov
     
  • add_pending_csums was added as part of the new data=ordered
    implementation in e6dcd2dc9c48 ("Btrfs: New data=ordered
    implementation"). Even back then it called btrfs_csum_file_blocks,
    which can fail, but it never handled the failure. In an ENOMEM
    situation this could lead to the filesystem failing to write the
    checksums for a particular extent without detecting it. On read this
    could lead to the filesystem erroring out due to a crc mismatch. Fix it
    by propagating failures from add_pending_csums and handling them.

    Signed-off-by: Nikolay Borisov
    Reviewed-by: Josef Bacik
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Nikolay Borisov
     
  • The srcu_struct in btrfs_fs_info scales in size with NR_CPUS. On
    kernels built with NR_CPUS=8192, this can result in kmalloc failures
    that prevent mounting.

    There is work in progress to try to resolve this for every user of
    srcu_struct but using kvzalloc will work around the failures until
    that is complete.

    As an example with NR_CPUS=512 on x86_64: the overall size of
    subvol_srcu is 3460 bytes, fs_info is 6496.

    Signed-off-by: Jeff Mahoney
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Jeff Mahoney
     
  • Pull xfs fixes from Darrick Wong:

    - fix some compiler warnings

    - fix block reservations for transactions created during log recovery

    - fix resource leaks when respecifying mount options

    * tag 'xfs-4.16-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
    xfs: fix potential memory leak in mount option parsing
    xfs: reserve blocks for refcount / rmap log item recovery
    xfs: use memset to initialize xfs_scrub_agfl_info

    Linus Torvalds
     

28 Feb, 2018

2 commits

  • These pernet_operations perform pretty simple actions
    such as variable initialization on init and debug checks
    on exit, so they can obviously be executed
    in parallel with any others:

    vrf_net_ops
    lockd_net_ops
    grace_net_ops
    xfrm6_tunnel_net_ops
    kcm_net_ops
    tcf_net_ops

    Signed-off-by: Kirill Tkhai
    Signed-off-by: David S. Miller

    Kirill Tkhai
     
  • These pernet_operations just create and destroy /proc entries
    and net_generic()->cb_ident_idr IDR. So, we are able to mark
    them async.

    Signed-off-by: Kirill Tkhai
    Signed-off-by: David S. Miller

    Kirill Tkhai
     

27 Feb, 2018

5 commits

  • When a string-type mount option (e.g., logdev) is specified
    several times in a single mount, the current option parsing may
    leak memory. Hence, call kfree on the previous value
    in this case.

    Signed-off-by: Chengguang Xu
    Reviewed-by: Eric Sandeen
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Chengguang Xu
     
  • When blkdev_open() races with device removal and creation, it can happen
    that an unhashed bdev inode gets associated with a newly created gendisk,
    like this:

    CPU0                                    CPU1
    blkdev_open()
      bdev = bd_acquire()
                                            del_gendisk()
                                              bdev_unhash_inode(bdev);
                                            remove device
                                            create new device with the same number
      __blkdev_get()
        disk = get_gendisk()
          - gets reference to gendisk of the new device

    Now another blkdev_open() will not find original 'bdev' as it got
    unhashed, create a new one and associate it with the same 'disk' at
    which point problems start as we have two independent page caches for
    one device.

    Fix the problem by verifying that the bdev inode didn't get unhashed
    before we acquired gendisk reference. That way we make sure gendisk can
    get associated only with visible bdev inodes.

    Tested-by: Hou Tao
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     
  • When two blkdev_open() calls race with device removal and recreation,
    __blkdev_get() can use looked up gendisk after it is freed:

    CPU0                            CPU1                        CPU2
                                                                del_gendisk(disk);
                                                                  bdev_unhash_inode(inode);
    blkdev_open()                   blkdev_open()
      bdev = bd_acquire(inode);
        - creates and returns new inode
                                      bdev = bd_acquire(inode);
                                        - returns the same inode
      __blkdev_get(devt)              __blkdev_get(devt)
        disk = get_gendisk(devt);
          - got structure of device going away


                                        disk = get_gendisk(devt);
                                          - got new device structure
                                        if (!bdev->bd_openers) {
                                          does the first open
                                        }
        if (!bdev->bd_openers)
          - false
        } else {
          put_disk_and_module(disk)
            - remember this was old device - this was last ref and disk is
              now freed
        }
        disk_unblock_events(disk); -> oops

    Fix the problem by making sure we drop reference to disk in
    __blkdev_get() only after we are really done with it.

    Reported-by: Hou Tao
    Tested-by: Hou Tao
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     
  • Add a proper counterpart to get_disk_and_module() -
    put_disk_and_module(). Currently it is opencoded in several places.

    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     
  • Commit e864f39569f4 "fs: add RWF_DSYNC aand RWF_SYNC" added an additional
    way for direct IO to become synchronous and thus trigger fsync from the
    IO completion handler. Then commit 9830f4be159b "fs: Use RWF_* flags for
    AIO operations" allowed these flags to be set for AIO as well. However,
    that commit forgot to update the condition checking whether the IO
    completion handling should be deferred to a workqueue, and thus AIO DIO
    with RWF_[D]SYNC set will call fsync() from IRQ context, resulting in
    sleep in atomic.

    Fix the problem by checking directly iocb flags (the same way as it is
    done in dio_complete()) instead of checking all conditions that could
    lead to IO being synchronous.

    CC: Christoph Hellwig
    CC: Goldwyn Rodrigues
    CC: stable@vger.kernel.org
    Reported-by: Mark Rutland
    Tested-by: Mark Rutland
    Fixes: 9830f4be159b29399d107bffb99e0132bc5aedd4
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     

26 Feb, 2018

4 commits


24 Feb, 2018

1 commit


23 Feb, 2018

7 commits

  • …iederm/user-namespace

    Pull siginfo fix from Eric Biederman:
    "This fixes a build error that only shows up on blackfin"

    * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    fs/signalfd: fix build error for BUS_MCEERR_AR

    Linus Torvalds
     
  • During log recovery, the per-AG reservations aren't yet set up, so log
    recovery has to reserve enough blocks to handle all possible btree
    splits.

    Reported-by: Dave Chinner
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Dave Chinner

    Darrick J. Wong
     
  • Apparently different gcc versions have competing and
    incompatible notions of how to initialize at declaration,
    so just give up and fall back to the time-tested memset().

    Signed-off-by: Eric Sandeen
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Eric Sandeen
     
  • Fix build error in fs/signalfd.c by using same method that is used in
    kernel/signal.c: separate blocks for different signal si_code values.

    ./fs/signalfd.c: error: 'BUS_MCEERR_AR' undeclared (first use in this function)

    Reported-by: Geert Uytterhoeven
    Signed-off-by: Randy Dunlap
    Cc: Alexander Viro
    Signed-off-by: Eric W. Biederman

    Randy Dunlap
     
  • Each read from a file in efivarfs results in two calls to EFI
    (one to get the file size, another to get the actual data).

    On X86 these EFI calls result in broadcast system management
    interrupts (SMI) which affect performance of the whole system.
    A malicious user can loop performing reads from efivarfs bringing
    the system to its knees.

    Linus suggested a per-user rate limit to solve this.

    So we add a ratelimit structure to "user_struct" and initialize
    it for the root user for no limit. When allocating user_struct for
    other users we set the limit to 100 per second. This could be used
    for other places that want to limit the rate of some detrimental
    user action.

    In efivarfs if the limit is exceeded when reading, we take an
    interruptible nap for 50ms and check the rate limit again.
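
    The scheme described above can be sketched in userspace as follows. This
    is a simplified fixed-window model under assumptions, not the kernel's
    ratelimit code: the RateLimit class and limited_read() are hypothetical
    names, and a burst of None stands in for the unlimited root user.

```python
import time


class RateLimit:
    """Allow at most `burst` events per `interval` seconds (fixed window)."""

    def __init__(self, burst, interval=1.0, clock=time.monotonic):
        self.burst = burst  # None means no limit, as for the root user
        self.interval = interval
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def allow(self):
        if self.burst is None:
            return True
        now = self.clock()
        if now - self.window_start >= self.interval:
            # A new window begins: reset the event counter.
            self.window_start = now
            self.count = 0
        if self.count < self.burst:
            self.count += 1
            return True
        return False


def limited_read(limit, do_read, sleep=time.sleep, nap=0.05):
    """Nap for 50ms and re-check the limit until the read is allowed."""
    while not limit.allow():
        sleep(nap)
    return do_read()
```

    A non-root user would get RateLimit(100), so the first 100 reads in a
    second go straight through and later ones wait in 50ms steps until the
    window rolls over.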

    Signed-off-by: Tony Luck
    Acked-by: Ard Biesheuvel
    Signed-off-by: Linus Torvalds

    Luck, Tony
     
  • The structure nlmclnt_fl_close_lock_ops is local to the source and does
    not need to be in global scope, so make it static.

    Cleans up sparse warning:
    fs/nfs/nfs3proc.c:876:33: warning: symbol 'nlmclnt_fl_close_lock_ops' was not
    declared. Should it be static?

    Signed-off-by: Colin Ian King
    Signed-off-by: Trond Myklebust

    Colin Ian King
     
  • nfs4_update_server unconditionally releases the nfs_client for the
    source server. If migration fails, this can cause the source server's
    nfs_client struct to be left with a low reference count, resulting in
    use-after-free. Also, adjust reference count handling for ELOOP.

    NFS: state manager: migration failed on NFSv4 server nfsvmu10 with error 6
    WARNING: CPU: 16 PID: 17960 at fs/nfs/client.c:281 nfs_put_client+0xfa/0x110 [nfs]()
    nfs_put_client+0xfa/0x110 [nfs]
    nfs4_run_state_manager+0x30/0x40 [nfsv4]
    kthread+0xd8/0xf0

    BUG: unable to handle kernel NULL pointer dereference at 00000000000002a8
    nfs4_xdr_enc_write+0x6b/0x160 [nfsv4]
    rpcauth_wrap_req+0xac/0xf0 [sunrpc]
    call_transmit+0x18c/0x2c0 [sunrpc]
    __rpc_execute+0xa6/0x490 [sunrpc]
    rpc_async_schedule+0x15/0x20 [sunrpc]
    process_one_work+0x160/0x470
    worker_thread+0x112/0x540
    ? rescuer_thread+0x3f0/0x3f0
    kthread+0xd8/0xf0

    This bug was introduced by 32e62b7c ("NFS: Add nfs4_update_server"),
    but the fix applies cleanly to 52442f9b ("NFS4: Avoid migration loops")

    Reported-by: Helen Chao
    Fixes: 52442f9b11b7 ("NFS4: Avoid migration loops")
    Signed-off-by: Bill Baker
    Reviewed-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Bill.Baker@oracle.com
     

22 Feb, 2018

1 commit


20 Feb, 2018

1 commit


17 Feb, 2018

1 commit

  • Pull btrfs fixes from David Sterba:
    "We have a few assorted fixes, some of them show up during fstests so I
    gave them more testing"

    * tag 'for-4.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
    btrfs: Fix use-after-free when cleaning up fs_devs with a single stale device
    Btrfs: fix null pointer dereference when replacing missing device
    btrfs: remove spurious WARN_ON(ref->count < 0) in find_parent_nodes
    btrfs: Ignore errors from btrfs_qgroup_trace_extent_post
    Btrfs: fix unexpected -EEXIST when creating new inode
    Btrfs: fix use-after-free on root->orphan_block_rsv
    Btrfs: fix btrfs_evict_inode to handle abnormal inodes correctly
    Btrfs: fix extent state leak from tree log
    Btrfs: fix crash due to not cleaning up tree log block's dirty bits
    Btrfs: fix deadlock in run_delalloc_nocow

    Linus Torvalds
     

16 Feb, 2018

1 commit


15 Feb, 2018

2 commits

  • Pull x86 fixes from Ingo Molnar:
    "Misc fixes all across the map:

    - /proc/kcore vsyscall related fixes
    - LTO fix
    - build warning fix
    - CPU hotplug fix
    - Kconfig NR_CPUS cleanups
    - cpu_has() cleanups/robustification
    - .gitignore fix
    - memory-failure unmapping fix
    - UV platform fix"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/mm, mm/hwpoison: Don't unconditionally unmap kernel 1:1 pages
    x86/error_inject: Make just_return_func() globally visible
    x86/platform/UV: Fix GAM Range Table entries less than 1GB
    x86/build: Add arch/x86/tools/insn_decoder_test to .gitignore
    x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a physical CPU
    x86/mm/kcore: Add vsyscall page to /proc/kcore conditionally
    vfs/proc/kcore, x86/mm/kcore: Fix SMAP fault when dumping vsyscall user page
    x86/Kconfig: Further simplify the NR_CPUS config
    x86/Kconfig: Simplify NR_CPUS config
    x86/MCE: Fix build warning introduced by "x86: do not use print_symbol()"
    x86/cpufeature: Update _static_cpu_has() to use all named variables
    x86/cpufeature: Reindent _static_cpu_has()

    Linus Torvalds
     
  • Pull gfs2 fix from Bob Peterson:
    "Fix regressions in the gfs2 iomap for block_map implementation we
    recently discovered in commit 3974320ca6"

    * tag 'gfs2-4.16.rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
    gfs2: Fixes to "Implement iomap for block_map"

    Linus Torvalds
     

14 Feb, 2018

1 commit

  • It turns out that commit 3974320ca6 "Implement iomap for block_map"
    introduced a few bugs that trigger occasional failures with xfstest
    generic/476:

    In gfs2_iomap_begin, we jump to do_alloc when we determine that we are
    beyond the end of the allocated metadata (height > ip->i_height).
    There, we can end up calling hole_size with a metapath that doesn't
    match the current metadata tree, which doesn't make sense. After
    untangling the code at do_alloc, fix this by checking if the block we
    are looking for is within the range of allocated metadata.

    In addition, add a BUG() in case gfs2_iomap_begin is accidentally called
    for reading stuffed files: this is handled separately. Make sure we
    don't truncate iomap->length for reads beyond the end of the file; in
    that case, the entire range counts as a hole.

    Finally, revert to taking a bitmap write lock when doing allocations.
    It's unclear why that change didn't lead to any failures during testing.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson

    Andreas Gruenbacher