14 Dec, 2016

1 commit

  • Pull block layer updates from Jens Axboe:
    "This is the main block pull request this series. Contrary to previous
    release, I've kept the core and driver changes in the same branch. We
    always ended up having dependencies between the two for obvious
    reasons, so makes more sense to keep them together. That said, I'll
    probably try and keep more topical branches going forward, especially
    for cycles that end up being as busy as this one.

    The major parts of this pull request is:

    - Improved support for O_DIRECT on block devices, with a small
    private implementation instead of using the pig that is
    fs/direct-io.c. From Christoph.

    - Request completion tracking in a scalable fashion. This is utilized
    by two components in this pull, the new hybrid polling and the
    writeback queue throttling code.

    - Improved support for polling with O_DIRECT, adding a hybrid mode
    that combines pure polling with an initial sleep. From me.

    - Support for automatic throttling of writeback queues on the block
    side. This uses feedback from the device completion latencies to
    scale the queue on the block side up or down. From me.

    - Support from SMR drives in the block layer and for SD. From Hannes
    and Shaun.

    - Multi-connection support for nbd. From Josef.

    - Cleanup of request and bio flags, so we have a clear split between
    which are bio (or rq) private, and which ones are shared. From
    Christoph.

    - A set of patches from Bart, that improve how we handle queue
    stopping and starting in blk-mq.

    - Support for WRITE_ZEROES from Chaitanya.

    - Lightnvm updates from Javier/Matias.

    - Supoort for FC for the nvme-over-fabrics code. From James Smart.

    - A bunch of fixes from a whole slew of people, too many to name
    here"

    * 'for-4.10/block' of git://git.kernel.dk/linux-block: (182 commits)
    blk-stat: fix a few cases of missing batch flushing
    blk-flush: run the queue when inserting blk-mq flush
    elevator: make the rqhash helpers exported
    blk-mq: abstract out blk_mq_dispatch_rq_list() helper
    blk-mq: add blk_mq_start_stopped_hw_queue()
    block: improve handling of the magic discard payload
    blk-wbt: don't throttle discard or write zeroes
    nbd: use dev_err_ratelimited in io path
    nbd: reset the setup task for NBD_CLEAR_SOCK
    nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME
    nvme-fabrics: Add target support for FC transport
    nvme-fabrics: Add host support for FC transport
    nvme-fabrics: Add FC transport LLDD api definitions
    nvme-fabrics: Add FC transport FC-NVME definitions
    nvme-fabrics: Add FC transport error codes to nvme.h
    Add type 0x28 NVME type code to scsi fc headers
    nvme-fabrics: patch target code in prep for FC transport support
    nvme-fabrics: set sqe.command_id in core not transports
    parser: add u64 number parser
    nvme-rdma: align to generic ib_event logging helper
    ...

    Linus Torvalds
     

01 Dec, 2016

1 commit

  • The ER records are printed without explicit log level presuming line
    continuation until "\n". After the commit 4bcc595ccd8 (printk:
    reinstate KERN_CONT for printing continuation lines), the ER records are
    printed a character per line.

    Adding KERN_CONT to appropriate printk statements restores the printout
    behavior.

    Signed-off-by: Mike Rapoport
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

01 Nov, 2016

1 commit


18 Oct, 2016

1 commit

  • When isofs_mount() is called to mount a device read-write, it returns
    EACCES even before it checks that the device actually contains an isofs
    filesystem. This may confuse mount(8) which then tries to mount all
    subsequent filesystem types in read-only mode.

    Fix the problem by returning EACCES only once we verify that the device
    indeed contains an iso9660 filesystem.

    CC: stable@vger.kernel.org
    Fixes: 17b7f7cf58926844e1dd40f5eb5348d481deca6a
    Reported-by: Kent Overstreet
    Reported-by: Karel Zak
    Signed-off-by: Jan Kara

    Jan Kara
     

01 Aug, 2016

1 commit


29 Jul, 2016

2 commits

  • Pull vfs updates from Al Viro:
    "Assorted cleanups and fixes.

    Probably the most interesting part long-term is ->d_init() - that will
    have a bunch of followups in (at least) ceph and lustre, but we'll
    need to sort the barrier-related rules before it can get used for
    really non-trivial stuff.

    Another fun thing is the merge of ->d_iput() callers (dentry_iput()
    and dentry_unlink_inode()) and a bunch of ->d_compare() ones (all
    except the one in __d_lookup_lru())"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
    fs/dcache.c: avoid soft-lockup in dput()
    vfs: new d_init method
    vfs: Update lookup_dcache() comment
    bdev: get rid of ->bd_inodes
    Remove last traces of ->sync_page
    new helper: d_same_name()
    dentry_cmp(): use lockless_dereference() instead of smp_read_barrier_depends()
    vfs: clean up documentation
    vfs: document ->d_real()
    vfs: merge .d_select_inode() into .d_real()
    unify dentry_iput() and dentry_unlink_inode()
    binfmt_misc: ->s_root is not going anywhere
    drop redundant ->owner initializations
    ufs: get rid of redundant checks
    orangefs: constify inode_operations
    missed comment updates from ->direct_IO() prototype change
    file_inode(f)->i_mapping is f->f_mapping
    trim fsnotify hooks a bit
    9p: new helper - v9fs_parent_fid()
    debugfs: ->d_parent is never NULL or negative
    ...

    Linus Torvalds
     
  • This changes the vfs dentry hashing to mix in the parent pointer at the
    _beginning_ of the hash, rather than at the end.

    That actually improves both the hash and the code generation, because we
    can move more of the computation to the "static" part of the dcache
    setup, and do less at lookup runtime.

    It turns out that a lot of other hash users also really wanted to mix in
    a base pointer as a 'salt' for the hash, and so the slightly extended
    interface ends up working well for other cases too.

    Users that want a string hash that is purely about the string pass in a
    'salt' pointer of NULL.

    * merge branch 'salted-string-hash':
    fs/dcache.c: Save one 32-bit multiply in dcache lookup
    vfs: make the string hashes salt the hash

    Linus Torvalds
     

01 Jul, 2016

1 commit


11 Jun, 2016

1 commit

  • We always mixed in the parent pointer into the dentry name hash, but we
    did it late at lookup time. It turns out that we can simplify that
    lookup-time action by salting the hash with the parent pointer early
    instead of late.

    A few other users of our string hashes also wanted to mix in their own
    pointers into the hash, and those are updated to use the same mechanism.

    Hash users that don't have any particular initial salt can just use the
    NULL pointer as a no-salt.

    Cc: Vegard Nossum
    Cc: George Spelvin
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

08 Jun, 2016

1 commit


10 May, 2016

1 commit


09 May, 2016

2 commits


08 May, 2016

1 commit

  • Payloads of NM entries are not supposed to contain NUL. When we run
    into such, only the part prior to the first NUL goes into the
    concatenation (i.e. the directory entry name being encoded by a bunch
    of NM entries). We do stop when the amount collected so far + the
    claimed amount in the current NM entry exceed 254. So far, so good,
    but what we return as the total length is the sum of *claimed*
    sizes, not the actual amount collected. And that can grow pretty
    large - not unlimited, since you'd need to put CE entries in
    between to be able to get more than the maximum that could be
    contained in one isofs directory entry / continuation chunk and
    we are stop once we'd encountered 32 CEs, but you can get about 8Kb
    easily. And that's what will be passed to readdir callback as the
    name length. 8Kb __copy_to_user() from a buffer allocated by
    __get_free_page()

    Cc: stable@vger.kernel.org # 0.98pl6+ (yes, really)
    Signed-off-by: Al Viro

    Al Viro
     

05 Apr, 2016

1 commit

  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
    ago with promise that one day it will be possible to implement page
    cache with bigger chunks than PAGE_SIZE.

    This promise never materialized. And unlikely will.

    We have many places where PAGE_CACHE_SIZE assumed to be equal to
    PAGE_SIZE. And it's constant source of confusion on whether
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

    - >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is revert of changes to
    PAGE_CAHCE_ALIGN definition: we are going to drop it later.

    There are few places in the code where coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation also
    will be addressed with the separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

15 Jan, 2016

1 commit

  • Mark those kmem allocations that are known to be easily triggered from
    userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
    memcg. For the list, see below:

    - threadinfo
    - task_struct
    - task_delay_info
    - pid
    - cred
    - mm_struct
    - vm_area_struct and vm_region (nommu)
    - anon_vma and anon_vma_chain
    - signal_struct
    - sighand_struct
    - fs_struct
    - files_struct
    - fdtable and fdtable->full_fds_bits
    - dentry and external_name
    - inode for all filesystems. This is the most tedious part, because
    most filesystems overwrite the alloc_inode method.

    The list is far from complete, so feel free to add more objects.
    Nevertheless, it should be close to "account everything" approach and
    keep most workloads within bounds. Malevolent users will be able to
    breach the limit, but this was possible even with the former "account
    everything" approach (simply because it did not account everything in
    fact).

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Vladimir Davydov
    Acked-by: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: Tejun Heo
    Cc: Greg Thelen
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

09 Dec, 2015

1 commit

  • kmap() in page_follow_link_light() needed to go - allowing to hold
    an arbitrary number of kmaps for long is a great way to deadlocking
    the system.

    new helper (inode_nohighmem(inode)) needs to be used for pagecache
    symlinks inodes; done for all in-tree cases. page_follow_link_light()
    instrumented to yell about anything missed.

    Signed-off-by: Al Viro

    Al Viro
     

16 Apr, 2015

1 commit


07 Jan, 2015

1 commit


19 Dec, 2014

1 commit

  • We didn't check length of rock ridge ER records before printing them.
    Thus corrupted isofs image can cause us to access and print some memory
    behind the buffer with obvious consequences.

    Reported-and-tested-by: Carl Henrik Lunde
    CC: stable@vger.kernel.org
    Signed-off-by: Jan Kara

    Jan Kara
     

17 Dec, 2014

1 commit


15 Dec, 2014

1 commit

  • Rock Ridge extensions define so called Continuation Entries (CE) which
    define where is further space with Rock Ridge data. Corrupted isofs
    image can contain arbitrarily long chain of these, including a one
    containing loop and thus causing kernel to end in an infinite loop when
    traversing these entries.

    Limit the traversal to 32 entries which should be more than enough space
    to store all the Rock Ridge data.

    Reported-by: P J P
    CC: stable@vger.kernel.org
    Signed-off-by: Jan Kara

    Jan Kara
     

20 Nov, 2014

1 commit

  • With the isofs_hash() function removed, isofs_hash_ms() is the only user
    of isofs_hash_common(), but it's defined inside of an #ifdef, which triggers
    this gcc warning in ARM axm55xx_defconfig starting with v3.18-rc3:

    fs/isofs/inode.c:177:1: warning: 'isofs_hash_common' defined but not used [-Wunused-function]

    This patch moves the function inside of the same #ifdef section to avoid that
    warning, which seems the best compromise of a relatively harmless patch for
    a late -rc.

    Signed-off-by: Arnd Bergmann
    Fixes: b0afd8e5db7b ("isofs: don't bother with ->d_op for normal case")
    Signed-off-by: Al Viro

    Arnd Bergmann
     

31 Oct, 2014

1 commit


29 Oct, 2014

1 commit


14 Oct, 2014

1 commit

  • The kernel used to contain two functions for length-delimited,
    case-insensitive string comparison, strnicmp with correct semantics and
    a slightly buggy strncasecmp. The latter is the POSIX name, so strnicmp
    was renamed to strncasecmp, and strnicmp made into a wrapper for the new
    strncasecmp to avoid breaking existing users.

    To allow the compat wrapper strnicmp to be removed at some point in the
    future, and to avoid the extra indirection cost, do
    s/strnicmp/strncasecmp/g.

    Signed-off-by: Rasmus Villemoes
    Reviewed-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rasmus Villemoes
     

20 Aug, 2014

1 commit

  • We did not check relocated directory in any way when processing Rock
    Ridge 'CL' tag. Thus a corrupted isofs image can possibly have a CL
    entry pointing to another CL entry leading to possibly unbounded
    recursion in kernel code and thus stack overflow or deadlocks (if there
    is a loop created from CL entries).

    Fix the problem by not allowing CL entry to point to a directory entry
    with CL entry (such use makes no good sense anyway) and by checking
    whether CL entry doesn't point to itself.

    CC: stable@vger.kernel.org
    Reported-by: Chris Evans
    Signed-off-by: Jan Kara

    Jan Kara
     

09 Aug, 2014

1 commit

  • Now with 64bit bzImage and kexec tools, we support ramdisk that size is
    bigger than 2g, as we could put it above 4G.

    Found compressed initramfs image could not be decompressed properly. It
    turns out that image length is int during decompress detection, and it
    will become < 0 when length is more than 2G. Furthermore, during
    decompressing len as int is used for inbuf count, that has problem too.

    Change len to long, that should be ok as on 32 bit platform long is
    32bits.

    Tested with following compressed initramfs image as root with kexec.
    gzip, bzip2, xz, lzma, lzop, lz4.
    run time for populate_rootfs():
    size name Nehalem-EX Westmere-EX Ivybridge-EX
    9034400256 root_img : 26s 24s 30s
    3561095057 root_img.lz4 : 28s 27s 27s
    3459554629 root_img.lzo : 29s 29s 28s
    3219399480 root_img.gz : 64s 62s 49s
    2251594592 root_img.xz : 262s 260s 183s
    2226366598 root_img.lzma: 386s 376s 277s
    2901482513 root_img.bz2 : 635s 599s

    Signed-off-by: Yinghai Lu
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Rashika Kheria
    Cc: Josh Triplett
    Cc: Kyungsik Lee
    Cc: P J P
    Cc: Al Viro
    Cc: Tetsuo Handa
    Cc: "Daniel M. Weeks"
    Cc: Alexandre Courbot
    Cc: Jan Beulich
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     

08 Apr, 2014

1 commit

  • Pull ext3 improvements, cleanups, reiserfs fix from Jan Kara:
    "various cleanups for ext2, ext3, udf, isofs, a documentation update
    for quota, and a fix of a race in reiserfs readdir implementation"

    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    reiserfs: fix race in readdir
    ext2: acl: remove unneeded include of linux/capability.h
    ext3: explicitly remove inode from orphan list after failed direct io
    fs/isofs/inode.c add __init to init_inodecache()
    ext3: Speedup WB_SYNC_ALL pass
    fs/quota/Kconfig: Update filesystems
    ext3: Update outdated comment before ext3_ordered_writepage()
    ext3: Update PF_MEMALLOC handling in ext3_write_inode()
    ext2/3: use prandom_u32() instead of get_random_bytes()
    ext3: remove an unneeded check in ext3_new_blocks()
    ext3: remove unneeded check in ext3_ordered_writepage()
    fs: Mark function as static in ext3/xattr_security.c
    fs: Mark function as static in ext3/dir.c
    fs: Mark function as static in ext2/xattr_security.c
    ext3: Add __init macro to init_inodecache
    ext2: Add __init macro to init_inodecache
    udf: Add __init macro to init_inodecache
    fs: udf: parse_options: blocksize check

    Linus Torvalds
     

13 Mar, 2014

2 commits

  • Previously, the no-op "mount -o mount /dev/xxx" operation when the
    file system is already mounted read-write causes an implied,
    unconditional syncfs(). This seems pretty stupid, and it's certainly
    documented or guaraunteed to do this, nor is it particularly useful,
    except in the case where the file system was mounted rw and is getting
    remounted read-only.

    However, it's possible that there might be some file systems that are
    actually depending on this behavior. In most file systems, it's
    probably fine to only call sync_filesystem() when transitioning from
    read-write to read-only, and there are some file systems where this is
    not needed at all (for example, for a pseudo-filesystem or something
    like romfs).

    Signed-off-by: "Theodore Ts'o"
    Cc: linux-fsdevel@vger.kernel.org
    Cc: Christoph Hellwig
    Cc: Artem Bityutskiy
    Cc: Adrian Hunter
    Cc: Evgeniy Dushistov
    Cc: Jan Kara
    Cc: OGAWA Hirofumi
    Cc: Anders Larsen
    Cc: Phillip Lougher
    Cc: Kees Cook
    Cc: Mikulas Patocka
    Cc: Petr Vandrovec
    Cc: xfs@oss.sgi.com
    Cc: linux-btrfs@vger.kernel.org
    Cc: linux-cifs@vger.kernel.org
    Cc: samba-technical@lists.samba.org
    Cc: codalist@coda.cs.cmu.edu
    Cc: linux-ext4@vger.kernel.org
    Cc: linux-f2fs-devel@lists.sourceforge.net
    Cc: fuse-devel@lists.sourceforge.net
    Cc: cluster-devel@redhat.com
    Cc: linux-mtd@lists.infradead.org
    Cc: jfs-discussion@lists.sourceforge.net
    Cc: linux-nfs@vger.kernel.org
    Cc: linux-nilfs@vger.kernel.org
    Cc: linux-ntfs-dev@lists.sourceforge.net
    Cc: ocfs2-devel@oss.oracle.com
    Cc: reiserfs-devel@vger.kernel.org

    Theodore Ts'o
     
  • init_inodecache is only called by __init init_iso9660_fs

    Signed-off-by: Fabian Frederick
    Signed-off-by: Jan Kara

    Fabian Frederick
     

25 Oct, 2013

1 commit


01 Aug, 2013

1 commit

  • Refuse RW mount of isofs filesystem. So far we just silently changed it
    to RO mount but when the media is writeable, block layer won't notice
    this change and thus will think device is used RW and will block eject
    button of the drive. That is unexpected by users because for
    non-writeable media eject button works just fine.

    Userspace mount(8) command handles this just fine and retries mounting
    with MS_RDONLY set so userspace shouldn't see any regression. Plus any
    tool mounting isofs is likely confronted with the case of read-only
    media where block layer already refuses to mount the filesystem without
    MS_RDONLY set so our behavior shouldn't be anything new for it.

    Reported-by: Hui Wang
    Signed-off-by: Jan Kara

    Jan Kara
     

29 Jun, 2013

2 commits

  • Instances either don't look at it at all (the majority of cases) or
    only want it to find the superblock (which can be had as dentry->d_sb).
    A few cases that want more are actually safe with dentry->d_inode -
    the only precaution needed is the check that it hadn't been replaced with
    NULL by rmdir() or by overwriting rename(), which case should be simply
    treated as cache miss.

    Signed-off-by: Linus Torvalds
    Signed-off-by: Al Viro

    Linus Torvalds
     
  • Signed-off-by: Al Viro

    Al Viro
     

13 Mar, 2013

1 commit

  • I had assumed that the only use of module aliases for filesystems
    prior to "fs: Limit sys_mount to only request filesystem modules."
    was in request_module. It turns out I was wrong. At least mkinitcpio
    in Arch linux uses these aliases.

    So readd the preexising aliases, to keep from breaking userspace.

    Userspace eventually will have to follow and use the same aliases the
    kernel does. So at some point we may be delete these aliases without
    problems. However that day is not today.

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

04 Mar, 2013

1 commit

  • Modify the request_module to prefix the file system type with "fs-"
    and add aliases to all of the filesystems that can be built as modules
    to match.

    A common practice is to build all of the kernel code and leave code
    that is not commonly needed as modules, with the result that many
    users are exposed to any bug anywhere in the kernel.

    Looking for filesystems with a fs- prefix limits the pool of possible
    modules that can be loaded by mount to just filesystems trivially
    making things safer with no real cost.

    Using aliases means user space can control the policy of which
    filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
    with blacklist and alias directives. Allowing simple, safe,
    well understood work-arounds to known problematic software.

    This also addresses a rare but unfortunate problem where the filesystem
    name is not the same as it's module name and module auto-loading
    would not work. While writing this patch I saw a handful of such
    cases. The most significant being autofs that lives in the module
    autofs4.

    This is relevant to user namespaces because we can reach the request
    module in get_fs_type() without having any special permissions, and
    people get uncomfortable when a user specified string (in this case
    the filesystem type) goes all of the way to request_module.

    After having looked at this issue I don't think there is any
    particular reason to perform any filtering or permission checks beyond
    making it clear in the module request that we want a filesystem
    module. The common pattern in the kernel is to call request_module()
    without regards to the users permissions. In general all a filesystem
    module does once loaded is call register_filesystem() and go to sleep.
    Which means there is not much attack surface exposed by loading a
    filesytem module unless the filesystem is mounted. In a user
    namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
    which most filesystems do not set today.

    Acked-by: Serge Hallyn
    Acked-by: Kees Cook
    Reported-by: Kees Cook
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

26 Feb, 2013

1 commit


23 Feb, 2013

1 commit


10 Oct, 2012

1 commit

  • Fuzzing with trinity oopsed on the 1st instruction of shmem_fh_to_dentry(),
    u64 inum = fid->raw[2];
    which is unhelpfully reported as at the end of shmem_alloc_inode():

    BUG: unable to handle kernel paging request at ffff880061cd3000
    IP: [] shmem_alloc_inode+0x40/0x40
    Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    Call Trace:
    [] ? exportfs_decode_fh+0x79/0x2d0
    [] do_handle_open+0x163/0x2c0
    [] sys_open_by_handle_at+0xc/0x10
    [] tracesys+0xe1/0xe6

    Right, tmpfs is being stupid to access fid->raw[2] before validating that
    fh_len includes it: the buffer kmalloc'ed by do_sys_name_to_handle() may
    fall at the end of a page, and the next page not be present.

    But some other filesystems (ceph, gfs2, isofs, reiserfs, xfs) are being
    careless about fh_len too, in fh_to_dentry() and/or fh_to_parent(), and
    could oops in the same way: add the missing fh_len checks to those.

    Reported-by: Sasha Levin
    Signed-off-by: Hugh Dickins
    Cc: Al Viro
    Cc: Sage Weil
    Cc: Steven Whitehouse
    Cc: Christoph Hellwig
    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro

    Hugh Dickins