11 Oct, 2016

4 commits

  • Pull more vfs updates from Al Viro:
    "rename2() work from Miklos + current_time() from Deepa"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: Replace current_fs_time() with current_time()
    fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
    fs: Replace CURRENT_TIME with current_time() for inode timestamps
    fs: proc: Delete inode time initializations in proc_alloc_inode()
    vfs: Add current_time() api
    vfs: add note about i_op->rename changes to porting
    fs: rename "rename2" i_op to "rename"
    vfs: remove unused i_op->rename
    fs: make remaining filesystems use .rename2
    libfs: support RENAME_NOREPLACE in simple_rename()
    fs: support RENAME_NOREPLACE for local filesystems
    ncpfs: fix unused variable warning

    Linus Torvalds
     
  • Al Viro
     
  • Pull vfs xattr updates from Al Viro:
    "xattr stuff from Andreas

    This completes the switch to xattr_handler ->get()/->set() from
    ->getxattr/->setxattr/->removexattr"

    * 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    vfs: Remove {get,set,remove}xattr inode operations
    xattr: Stop calling {get,set,remove}xattr inode operations
    vfs: Check for the IOP_XATTR flag in listxattr
    xattr: Add __vfs_{get,set,remove}xattr helpers
    libfs: Use IOP_XATTR flag for empty directory handling
    vfs: Use IOP_XATTR flag for bad-inode handling
    vfs: Add IOP_XATTR inode operations flag
    vfs: Move xattr_resolve_name to the front of fs/xattr.c
    ecryptfs: Switch to generic xattr handlers
    sockfs: Get rid of getxattr iop
    sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
    kernfs: Switch to generic xattr handlers
    hfs: Switch to generic xattr handlers
    jffs2: Remove jffs2_{get,set,remove}xattr macros
    xattr: Remove unnecessary NULL attribute name check

    Linus Torvalds
     
  • Pull misc vfs updates from Al Viro:
    "Assorted misc bits and pieces.

    There are several single-topic branches left after this (rename2
    series from Miklos, current_time series from Deepa Dinamani, xattr
    series from Andreas, uaccess stuff from me) and I'd prefer to
    send those separately"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits)
    proc: switch auxv to use of __mem_open()
    hpfs: support FIEMAP
    cifs: get rid of unused arguments of CIFSSMBWrite()
    posix_acl: uapi header split
    posix_acl: xattr representation cleanups
    fs/aio.c: eliminate redundant loads in put_aio_ring_file
    fs/internal.h: add const to ns_dentry_operations declaration
    compat: remove compat_printk()
    fs/buffer.c: make __getblk_slow() static
    proc: unsigned file descriptors
    fs/file: more unsigned file descriptors
    fs: compat: remove redundant check of nr_segs
    cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]
    cifs: don't use memcpy() to copy struct iov_iter
    get rid of separate multipage fault-in primitives
    fs: Avoid premature clearing of capabilities
    fs: Give dentry to inode_change_ok() instead of inode
    fuse: Propagate dentry down to inode_change_ok()
    ceph: Propagate dentry down to inode_change_ok()
    xfs: Propagate dentry down to inode_change_ok()
    ...

    Linus Torvalds
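The RENAME_NOREPLACE support listed above comes down to two checks: reject rename flags the filesystem does not implement, and refuse to overwrite an existing target. Below is a minimal sketch under stated assumptions: toy_rename_check() and its target_exists parameter are hypothetical, and the flag values are the ones from the uapi <linux/fs.h> header. In the kernel, the VFS returns -EEXIST for an existing target before it calls into the filesystem. The sketch folds both layers into one function for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Flag values as defined in the uapi header <linux/fs.h>. */
#ifndef RENAME_NOREPLACE
#define RENAME_NOREPLACE (1 << 0)
#endif
#ifndef RENAME_EXCHANGE
#define RENAME_EXCHANGE  (1 << 1)
#endif

/*
 * Hypothetical flag handling for a simple filesystem that supports only
 * RENAME_NOREPLACE. target_exists stands in for the destination lookup.
 */
static int toy_rename_check(unsigned int flags, bool target_exists)
{
	/* Any flag other than RENAME_NOREPLACE is not implemented here. */
	if (flags & ~RENAME_NOREPLACE)
		return -EINVAL;
	/* RENAME_NOREPLACE: never overwrite an existing destination. */
	if ((flags & RENAME_NOREPLACE) && target_exists)
		return -EEXIST;
	return 0;
}
```

Without flags, the call behaves like a plain rename(2) and may replace the target.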
     

08 Oct, 2016

2 commits

  • The {get,set,remove}xattr inode operations are no longer used; remove them.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     
  • Pull VFS splice updates from Al Viro:
    "There's a bunch of branches this cycle, both mine and from other folks
    and I'd rather send pull requests separately.

    This one is the conversion of ->splice_read() to ITER_PIPE iov_iter
    (and introduction of such). Gets rid of a lot of code in fs/splice.c
    and elsewhere; there will be followups, but these are for the next
    cycle... Some pipe/splice-related cleanups from Miklos in the same
    branch as well"

    * 'work.splice_read' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    pipe: fix comment in pipe_buf_operations
    pipe: add pipe_buf_steal() helper
    pipe: add pipe_buf_confirm() helper
    pipe: add pipe_buf_release() helper
    pipe: add pipe_buf_get() helper
    relay: simplify relay_file_read()
    switch default_file_splice_read() to use of pipe-backed iov_iter
    switch generic_file_splice_read() to use of ->read_iter()
    new iov_iter flavour: pipe-backed
    fuse_dev_splice_read(): switch to add_to_pipe()
    skb_splice_bits(): get rid of callback
    new helper: add_to_pipe()
    splice: lift pipe_lock out of splice_to_pipe()
    splice: switch get_iovec_page_array() to iov_iter
    splice_to_pipe(): don't open-code wakeup_pipe_readers()
    consistent treatment of EFAULT on O_DIRECT read/write

    Linus Torvalds
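After this series, filesystems no longer implement ->getxattr/->setxattr/->removexattr. Instead, the VFS picks an xattr_handler by the attribute name's prefix (xattr_resolve_name() in fs/xattr.c). Below is a simplified sketch of that prefix dispatch. The struct, table and function names are hypothetical, and the real code handles more cases, for example handlers that match a whole name rather than a prefix. Unknown names fail with -EOPNOTSUPP, in line with the sockfs change above.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct xattr_handler. */
struct toy_xattr_handler {
	const char *prefix;	/* e.g. "user." */
	int id;			/* identifies the handler in this sketch */
};

static const struct toy_xattr_handler toy_handlers[] = {
	{ "user.", 1 },
	{ "trusted.", 2 },
	{ "security.", 3 },
};

/*
 * Find the handler whose prefix matches *name and strip that prefix.
 * Returns the handler id, -EINVAL for a bare prefix, or -EOPNOTSUPP
 * when no handler claims the name.
 */
static int toy_resolve_name(const char **name)
{
	for (size_t i = 0; i < sizeof(toy_handlers) / sizeof(toy_handlers[0]); i++) {
		const char *prefix = toy_handlers[i].prefix;
		size_t len = strlen(prefix);

		if (strncmp(*name, prefix, len) != 0)
			continue;
		if ((*name)[len] == '\0')
			return -EINVAL;	/* "user." alone is not a valid name */
		*name += len;		/* the handler only sees the suffix */
		return toy_handlers[i].id;
	}
	return -EOPNOTSUPP;
}
```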
     

06 Oct, 2016

1 commit


05 Oct, 2016

1 commit

  • Pull gfs2 updates from Bob Peterson:
    "We've only got six GFS2 patches for this merge window. In patch
    order:

    - Fabian Frederick submitted a nice cleanup that uses the BIT macro
    rather than bit shifting.

    - Andreas Gruenbacher contributed a patch that fixes a long-standing
    annoyance whereby GFS2 warned about dirty pages.

    - Andreas also fixed a problem with the recent extended attribute
    readahead feature.

    - Chao Yu contributed a patch that checks the return code from
    function register_shrinker and reacts accordingly. Previously, it
    was not checked.

    - Andreas Gruenbacher also fixed a problem whereby incore file
    timestamps were forgotten if the file was invalidated. This merely
    moves the assignment inside the inode glock where it belongs.

    - Andreas also fixed a problem where incore timestamps were not
    initialized"

    * tag 'gfs2-4.8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
    gfs2: Initialize atime of I_NEW inodes
    gfs2: Update file times after grabbing glock
    gfs2: fix to detect failure of register_shrinker
    gfs2: Fix extended attribute readahead optimization
    gfs2: Remove dirty buffer warning from gfs2_releasepage
    GFS2: use BIT() macro

    Linus Torvalds
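The register_shrinker fix follows the usual kernel init pattern: check every step, and on failure undo the earlier steps in reverse order. Below is a minimal sketch of that goto-based unwinding. Every toy_* name is hypothetical, and toy_shrinker_fails only simulates a failing register_shrinker().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical module-init steps; each returns 0 or a negative errno. */
static bool toy_cache_ready;
static bool toy_shrinker_fails;	/* simulates register_shrinker() failing */

static int toy_create_caches(void)   { toy_cache_ready = true; return 0; }
static void toy_destroy_caches(void) { toy_cache_ready = false; }

static int toy_register_shrinker(void)
{
	return toy_shrinker_fails ? -ENOMEM : 0;
}

static int toy_module_init(void)
{
	int error;

	error = toy_create_caches();
	if (error)
		goto fail;
	error = toy_register_shrinker();
	if (error)
		goto fail_caches;	/* before the fix, this result was ignored */
	return 0;

fail_caches:
	toy_destroy_caches();	/* undo earlier steps in reverse order */
fail:
	return error;
}
```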
     

28 Sep, 2016

1 commit

  • CURRENT_TIME macro is not appropriate for filesystems as it
    doesn't use the right granularity for filesystem timestamps.
    Use current_time() instead.

    CURRENT_TIME is also not y2038 safe.

    This is also in preparation for the patch that transitions
    vfs timestamps to use 64-bit time and hence makes them
    y2038 safe. As part of the effort current_time() will be
    extended to do range checks. Hence, it is necessary for all
    file system timestamps to use current_time(). Also,
    current_time() will be transitioned along with vfs to be
    y2038 safe.

    Note that whenever a single call to current_time() is used
    to change timestamps in different inodes, it is because they
    share the same time granularity.

    Signed-off-by: Deepa Dinamani
    Reviewed-by: Arnd Bergmann
    Acked-by: Felipe Balbi
    Acked-by: Steven Whitehouse
    Acked-by: Ryusuke Konishi
    Acked-by: David Sterba
    Signed-off-by: Al Viro

    Deepa Dinamani
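Unlike CURRENT_TIME, current_time() takes the inode and rounds the current time down to the granularity its filesystem can store, which is the superblock's s_time_gran, in nanoseconds. Below is a minimal userspace sketch of that truncation step. It assumes the granularity lies between 1 ns and one second, and the toy_ts type and the function name are hypothetical.

```c
#include <assert.h>

#define TOY_NSEC_PER_SEC 1000000000L

/* Hypothetical timestamp type standing in for the kernel's timespec. */
struct toy_ts {
	long long sec;
	long nsec;	/* 0 .. TOY_NSEC_PER_SEC - 1 */
};

/*
 * Round a timestamp down to a granularity given in nanoseconds.
 * gran == 1 keeps full precision; gran == TOY_NSEC_PER_SEC keeps
 * whole seconds only.
 */
static struct toy_ts toy_trunc_to_gran(struct toy_ts t, long gran)
{
	assert(gran >= 1 && gran <= TOY_NSEC_PER_SEC);
	t.nsec -= t.nsec % gran;
	return t;
}
```

A filesystem with 1 µs granularity would use gran = 1000. The commit note about sharing one current_time() call across inodes relies on all of them having the same gran.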
     

27 Sep, 2016

3 commits


22 Sep, 2016

3 commits

  • inode_change_ok() will be responsible for clearing capabilities and IMA
    extended attributes and as such will need the dentry. Pass it a dentry
    as an argument instead of an inode. Also rename inode_change_ok()
    to setattr_prepare() to better reflect that it also makes some
    modifications in addition to checks.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Jan Kara
     
  • When file permissions are modified via chmod(2) and the user is neither
    in the owning group nor has CAP_FSETID, the setgid bit is cleared in
    inode_change_ok(). Setting a POSIX ACL via setxattr(2) sets the file
    permissions as well as the new ACL, but doesn't clear the setgid bit in
    a similar way; this allows bypassing the check in chmod(2). Fix that.

    References: CVE-2016-7097
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jeff Layton
    Signed-off-by: Jan Kara
    Signed-off-by: Andreas Gruenbacher

    Jan Kara
     
  • Since commit 1d3d4437eae1 ("vmscan: per-node deferred work"),
    register_shrinker can fail. Detect that failure; otherwise the shrinker
    may silently fail to register even though the gfs2 module initialized successfully.

    Signed-off-by: Chao Yu
    Signed-off-by: Bob Peterson

    Chao Yu
     

19 Aug, 2016

1 commit

  • Commit 39b0555f didn't check for a failing bio_add_page in
    gfs2_submit_bhs. This could cause I/O requests to get lost, and the
    affected buffer heads to stay locked forever. Fix that by submitting
    the current bio and allocating another one when bio_add_page fails. (It
    is guaranteed that we can add at least one page to a bio.)

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson

    Andreas Gruenbacher
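The gfs2_submit_bhs fix is a common pattern: when a bio is full, submit it and continue with a fresh one. Below is a minimal sketch with a hypothetical fixed-capacity bio. Like bio_add_page(), toy_bio_add_page() returns 0 when the page does not fit. All toy_* names and the capacity of 4 are assumptions for illustration.

```c
#include <assert.h>

#define TOY_BIO_MAX_PAGES 4	/* hypothetical capacity */

struct toy_bio {
	int nr_pages;
};

static int toy_submitted_pages;	/* pages handed to the "device" */
static int toy_bios_submitted;

/* Returns 0 when the page does not fit, 1 when it was added. */
static int toy_bio_add_page(struct toy_bio *bio)
{
	if (bio->nr_pages == TOY_BIO_MAX_PAGES)
		return 0;
	bio->nr_pages++;
	return 1;
}

static void toy_submit_bio(struct toy_bio *bio)
{
	toy_submitted_pages += bio->nr_pages;
	toy_bios_submitted++;
	bio->nr_pages = 0;	/* reuse the struct as a "new" bio */
}

/* Add every page; on failure, submit the full bio and start another. */
static void toy_submit_pages(int nr_pages)
{
	struct toy_bio bio = { 0 };

	for (int i = 0; i < nr_pages; i++) {
		if (!toy_bio_add_page(&bio)) {
			toy_submit_bio(&bio);
			/* An empty bio always accepts at least one page. */
			int added = toy_bio_add_page(&bio);
			assert(added);
		}
	}
	if (bio.nr_pages)
		toy_submit_bio(&bio);
}
```

With ten pages and a capacity of four, this submits three bios and loses no pages. The bug was that the failed page was silently dropped.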
     

18 Aug, 2016

1 commit

  • Unlike what its documentation suggests, the releasepage address space
    operation can currently be called on dirty pages via shrink_active_list.
    This may eventually be changed when the remaining code relying on the
    current behavior has been fixed, but until then, it makes no sense to
    warn on dirty buffers in gfs2_releasepage.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson

    Andreas Gruenbacher
     

07 Aug, 2016

1 commit

  • In most cases, EPERM is returned on an immutable inode, and there are only a
    few places returning EACCES. I noticed this when running LTP on
    overlayfs: setxattr03 failed due to an unexpected EACCES on an immutable
    inode.

    So convert all EACCES to EPERM on immutable inodes.

    Acked-by: Dave Chinner
    Signed-off-by: Eryu Guan
    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Eryu Guan
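The resulting convention distinguishes two kinds of failure. Immutability is a property of the inode and yields EPERM. Failing the permission bits yields EACCES. Below is a minimal sketch with a hypothetical inode struct and flag.

```c
#include <errno.h>
#include <stdbool.h>

#define TOY_S_IMMUTABLE 0x1	/* hypothetical inode flag */

struct toy_inode {
	unsigned int flags;
};

static int toy_may_write(const struct toy_inode *inode, bool has_write_perm)
{
	/* Immutable inodes refuse modification regardless of permissions. */
	if (inode->flags & TOY_S_IMMUTABLE)
		return -EPERM;
	/* Ordinary permission-bit failures still report EACCES. */
	if (!has_write_perm)
		return -EACCES;
	return 0;
}
```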
     

03 Aug, 2016

1 commit

  • Replace 1 << value shifts with the more explicit BIT() macro.

    Also fixes two bare unsigned definitions:

    WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
    + unsigned hsize = BIT(ip->i_depth);

    Signed-off-by: Fabian Frederick
    Signed-off-by: Bob Peterson

    Fabian Frederick
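The kernel's BIT() helper is defined as (1UL << (nr)). Below is a minimal sketch of the conversion using the same definition, with the unsigned int type that the checkpatch warning asks for. dir_hash_size() is a hypothetical name for the computation done with ip->i_depth.

```c
#include <assert.h>

/* Same definition as the kernel's BIT() helper. */
#define BIT(nr) (1UL << (nr))

/* Hash table size for a directory of the given depth. */
static unsigned int dir_hash_size(unsigned int depth)
{
	assert(depth < 32);	/* result must fit in an unsigned int */
	return BIT(depth);	/* previously written as 1 << depth */
}
```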
     

27 Jul, 2016

2 commits

  • Pull block driver updates from Jens Axboe:
    "This branch also contains core changes. I've come to the conclusion
    that from 4.9 and forward, I'll be doing just a single branch. We
    often have dependencies between core and drivers, and it's hard to
    always split them up appropriately without pulling core into drivers
    when that happens.

    That said, this contains:

    - separate secure erase type for the core block layer, from
    Christoph.

    - set of discard fixes, from Christoph.

    - bio shrinking fixes from Christoph, as a follow-up to the
    op/flags change in the core branch.

    - map and append request fixes from Christoph.

    - NVMeF (NVMe over Fabrics) code from Christoph. This is pretty
    exciting!

    - nvme-loop fixes from Arnd.

    - removal of ->driverfs_dev from Dan, after providing a
    device_add_disk() helper.

    - bcache fixes from Bhaktipriya and Yijing.

    - cdrom subchannel read fix from Vchannaiah.

    - set of lightnvm updates from Wenwei, Matias, Johannes, and Javier.

    - set of drbd updates and fixes from Fabian, Lars, and Philipp.

    - mg_disk error path fix from Bart.

    - user notification for failed device add for loop, from Minfei.

    - NVMe in general:
    + NVMe delay quirk from Guilherme.
    + SR-IOV support and command retry limits from Keith.
    + fix for memory-less NUMA node from Masayoshi.
    + use UINT_MAX for discard sectors, from Minfei.
    + cancel IO fixes from Ming.
    + don't allocate unused major, from Neil.
    + error code fixup from Dan.
    + use constants for PSDT/FUSE from James.
    + variable init fix from Jay.
    + fabrics fixes from Ming, Sagi, and Wei.
    + various fixes"

    * 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits)
    nvme/pci: Provide SR-IOV support
    nvme: initialize variable before logical OR'ing it
    block: unexport various bio mapping helpers
    scsi/osd: open code blk_make_request
    target: stop using blk_make_request
    block: simplify and export blk_rq_append_bio
    block: ensure bios return from blk_get_request are properly initialized
    virtio_blk: use blk_rq_map_kern
    memstick: don't allow REQ_TYPE_BLOCK_PC requests
    block: shrink bio size again
    block: simplify and cleanup bvec pool handling
    block: get rid of bio_rw and READA
    block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
    block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
    NVMe: don't allocate unused nvme_major
    nvme: avoid crashes when node 0 is memoryless node.
    nvme: Limit command retries
    loop: Make user notify for adding loop device failed
    nvme-loop: fix nvme-loop Kconfig dependencies
    nvmet: fix return value check in nvmet_subsys_alloc()
    ...

    Linus Torvalds
     
  • Pull core block updates from Jens Axboe:

    - the big change is the cleanup from Mike Christie, cleaning up our
    uses of command types and modified flags. This is what will throw
    some merge conflicts

    - regression fix for the above for btrfs, from Vincent

    - following up to the above, better packing of struct request from
    Christoph

    - a 2038 fix for blktrace from Arnd

    - a few trivial/spelling fixes from Bart Van Assche

    - a front merge check fix from Damien, which could cause issues on
    SMR drives

    - Atari partition fix from Gabriel

    - convert cfq to highres timers, since jiffies isn't granular enough
    for some devices these days. From Jan and Jeff

    - CFQ priority boost fix for idle classes, from me

    - cleanup series from Ming, improving our bio/bvec iteration

    - a direct issue fix for blk-mq from Omar

    - fix for plug merging not involving the IO scheduler, like we do for
    other types of merges. From Tahsin

    - expose DAX type internally and through sysfs. From Toshi and Yigal

    * 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits)
    block: Fix front merge check
    block: do not merge requests without consulting with io scheduler
    block: Fix spelling in a source code comment
    block: expose QUEUE_FLAG_DAX in sysfs
    block: add QUEUE_FLAG_DAX for devices to advertise their DAX support
    Btrfs: fix comparison in __btrfs_map_block()
    block: atari: Return early for unsupported sector size
    Doc: block: Fix a typo in queue-sysfs.txt
    cfq-iosched: Charge at least 1 jiffie instead of 1 ns
    cfq-iosched: Fix regression in bonnie++ rewrite performance
    cfq-iosched: Convert slice_resid from u64 to s64
    block: Convert fifo_time from ulong to u64
    blktrace: avoid using timespec
    block/blk-cgroup.c: Declare local symbols static
    block/bio-integrity.c: Add #include "blk.h"
    block/partition-generic.c: Remove a set-but-not-used variable
    block: bio: kill BIO_MAX_SIZE
    cfq-iosched: temporarily boost queue priority for idle classes
    block: drbd: avoid to use BIO_MAX_SIZE
    block: bio: remove BIO_MAX_SECTORS
    ...

    Linus Torvalds
     

25 Jul, 2016

1 commit

  • Pull gfs2 updates from Bob Peterson:
    "We've got ten patches this time, half of which are related to a
    plethora of nasty outcomes when inodes are transitioned from the
    unlinked state to the free state. Small file systems are particularly
    vulnerable to these problems, which manifest mainly as hangs, but
    also as file system corruption. The patches have been tested for
    many weeks with a very gruelling test, so I have a high
    level of confidence.

    - Andreas Gruenbacher wrote a series of five patches for various
    lockups during the transition of inodes from unlinked to free.

    The main patch is titled "Fix gfs2_lookup_by_inum lock inversion"
    and the other four are support and cleanup patches related to that.

    - Ben Marzinski contributed two patches with regard to a recreatable
    problem when gfs2 tries to write a page to a file that is being
    truncated, resulting in a BUG() in gfs2_remove_from_journal.

    Note that Ben had to export vfs function __block_write_full_page to
    get this to work properly. It's been posted for a long time and he
    talked to various VFS people about it, and nobody seemed to mind.

    - I contributed 3 patches:
    o The first one fixes a memory corruptor: a race in which one
    process can overwrite the gl_object pointer set by another
    process, causing kernel panic and other symptoms.
    o The second patch fixes another race that resulted in a
    false-positive BUG_ON. This occurred when resource group
    reservations were freed by one process while another process
    was trying to grab a new reservation in the same resource
    group.
    o The third patch fixes a problem with doing journal replay when
    the journals are not all the same size"

    * tag 'gfs2-4.7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
    GFS2: Fix gfs2_replay_incr_blk for multiple journal sizes
    GFS2: Check rs_free with rd_rsspin protection
    gfs2: writeout truncated pages
    fs: export __block_write_full_page
    gfs2: Lock holder cleanup
    gfs2: Large-filesystem fix for 32-bit systems
    gfs2: Get rid of gfs2_ilookup
    gfs2: Fix gfs2_lookup_by_inum lock inversion
    gfs2: Initialize iopen glock holder for new inodes
    GFS2: don't set rgrp gl_object until it's inserted into rgrp tree

    Linus Torvalds
     

22 Jul, 2016

1 commit

  • Before this patch, if you used gfs2_jadd to add new journals of a
    size smaller than the existing journals, replaying those new journals
    would withdraw. That's because function gfs2_replay_incr_blk was
    using the number of journal blocks (jd_block) from the superblock's
    journal pointer. In other words, "My journal's max size" rather than
    "the journal we're replaying's size." This patch changes the function
    to use the size of the pertinent journal rather than always using the
    journal we happen to be using.

    Signed-off-by: Bob Peterson

    Bob Peterson
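The fix changes which size bounds the wrap-around during replay. It is now the journal being replayed, not the mounting node's own journal. Below is a minimal sketch under stated assumptions: the trimmed-down descriptor and the function name are hypothetical, and only the jd_blocks field name comes from the commit text.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down journal descriptor. */
struct toy_jdesc {
	uint32_t jd_blocks;	/* size of *this* journal, in blocks */
};

/* Advance to the next block of the journal being replayed, wrapping at its end. */
static void toy_replay_incr_blk(const struct toy_jdesc *jd, uint32_t *blk)
{
	if (++*blk == jd->jd_blocks)
		*blk = 0;
}
```

If a larger journal's size were used here, the replay of a smaller journal would run past its last block instead of wrapping to 0.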
     

21 Jul, 2016

1 commit

  • bio_rw and READA are confusing leftovers of the old world order, combining
    values of the REQ_OP_ and REQ_ namespaces. For callers that don't
    special-case them, we mostly just replace bi_rw with bio_data_dir or
    op_is_write, except for the few cases where a switch over the REQ_OP_
    values makes more sense. Any check for READA is replaced with an
    explicit check for REQ_RAHEAD. Also remove the READA alias for
    REQ_RAHEAD.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Mike Christie
    Signed-off-by: Jens Axboe

    Christoph Hellwig
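The cleanup separates two questions. What is the operation (read or write)? And which modifier flags are set (for example readahead)? Below is a minimal sketch of that split. The encoding, with the operation in the low byte and flags above it, and all TOY_ names are hypothetical and do not reproduce the kernel's actual bit layout.

```c
#include <stdbool.h>

/* Hypothetical encoding: operation in the low byte, flags above it. */
enum toy_req_op { TOY_REQ_OP_READ = 0, TOY_REQ_OP_WRITE = 1 };
#define TOY_REQ_RAHEAD (1u << 8)	/* readahead is a flag, not an op */

static unsigned int toy_op(unsigned int opf)   { return opf & 0xff; }
static bool toy_op_is_write(unsigned int op)   { return op != TOY_REQ_OP_READ; }

/* Explicit flag test, replacing comparisons against a READA "op". */
static bool toy_is_readahead(unsigned int opf) { return (opf & TOY_REQ_RAHEAD) != 0; }
```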
     

13 Jul, 2016

1 commit

  • For the last process to close a file opened for write, function
    gfs2_rsqa_delete was deleting the file's inode's block reservation
    out of the rgrp reservations tree. Then it was checking to make sure
    rs_free was 0, but it was performing the check outside the protection
    of rd_rsspin spin_lock. The rd_rsspin spin_lock protection is needed
    to prevent a race between the process freeing the reservation and
    another who is allocating a new set of blocks inside the same rgrp
    for the same inode, thus changing its value.

    Signed-off-by: Bob Peterson

    Bob Peterson
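The point of this fix is that the sanity check on rs_free must happen inside the same rd_rsspin critical section as the update. Otherwise, a concurrent allocator can change rs_free in between. Below is a userspace sketch that replaces spin_lock with a C11 atomic_flag spinlock. The field names echo the commit text, but the structs and functions are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Minimal spinlock built on a C11 atomic_flag (stand-in for spin_lock). */
struct toy_spinlock {
	atomic_flag locked;
};

static void toy_spin_lock(struct toy_spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
		;	/* busy-wait until the holder releases it */
}

static void toy_spin_unlock(struct toy_spinlock *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

struct toy_rgrp {
	struct toy_spinlock rd_rsspin;
	unsigned int rd_reserved;	/* blocks held by reservations */
};

struct toy_blkreserv {
	struct toy_rgrp *rgd;
	unsigned int rs_free;		/* blocks still reserved */
};

static void toy_rs_delete(struct toy_blkreserv *rs)
{
	struct toy_rgrp *rgd = rs->rgd;

	toy_spin_lock(&rgd->rd_rsspin);
	rgd->rd_reserved -= rs->rs_free;	/* give the blocks back */
	rs->rs_free = 0;
	/* Checked before unlocking, so no other task can change rs_free first. */
	assert(rs->rs_free == 0);
	toy_spin_unlock(&rgd->rd_rsspin);
}
```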
     

06 Jul, 2016

1 commit


27 Jun, 2016

5 commits

  • When gfs2 attempts to write a page to a file that is being truncated,
    and notices that the page is completely outside of the file size, it
    tries to invalidate it. However, this may require a transaction for
    journaled data files to revoke any buffers from the page on the active
    items list. Unfortunately, this can happen inside a log flush, where a
    transaction cannot be started. Also, gfs2 may need to be able to remove
    the buffer from the ail1 list before it can finish the log flush.

    To deal with this, when writing a page of a file with data journalling
    enabled gfs2 now skips the check to see if the write is outside the file
    size, and simply writes it anyway. This situation can only occur when
    the truncate code still has the file locked exclusively, and hasn't
    marked this block as free in the metadata (which happens later in
    trunc_dealloc). After gfs2 writes this page out, the truncation code
    will shortly invalidate it and write out any revokes if necessary.

    To do this, gfs2 now implements its own version of block_write_full_page
    without the check, and calls the newly exported __block_write_full_page.
    It also no longer calls gfs2_writepage_common from gfs2_jdata_writepage.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Bob Peterson

    Benjamin Marzinski
     
  • Make the code more readable by cleaning up the different ways of
    initializing lock holders and checking for initialized lock holders:
    mark lock holders as uninitialized by setting the holder's glock to NULL
    (gfs2_holder_mark_uninitialized) instead of zeroing out the entire
    object or using a separate flag. Recognize initialized holders by their
    non-NULL glock (gfs2_holder_initialized). Don't zero out holder objects
    which are immediately initialized via gfs2_holder_init or
    gfs2_glock_nq_init.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson

    Andreas Gruenbacher
     
  • Commit ff34245d switched from iget5_locked to iget_locked among other
    things, but iget_locked doesn't work for filesystems larger than 2^32
    blocks on 32-bit systems. Switch back to iget5_locked. Filesystems
    larger than 2^32 blocks are unrealistic to work well on 32-bit systems,
    so this is mostly a code cleanliness fix.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson

    Andreas Gruenbacher
     
  • Now that gfs2_lookup_by_inum only takes the inode glock for new inodes
    (and not for cached inodes anymore), there no longer is a need to
    optimize the cached-inode case in gfs2_get_dentry or delete_work_func,
    and gfs2_ilookup can be removed.

    In addition, gfs2_get_dentry wasn't checking the GFS2_DIF_SYSTEM flag in
    i_diskflags in the gfs2_ilookup case (see gfs2_lookup_by_inum); this
    inconsistency goes away as well.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson

    Andreas Gruenbacher
     
  • The current gfs2_lookup_by_inum takes the glock of a presumed inode
    identified by block number, verifies that the block is indeed an inode,
    and then instantiates and reads the new inode via gfs2_inode_lookup.

    However, instantiating a new inode may block on freeing a previous
    instance of that inode (__wait_on_freeing_inode), and freeing an inode
    requires taking the glock that is already held, leading to lock inversion and
    deadlock.

    Fix this by first instantiating the new inode, then verifying that the
    block is an inode (if required), and then reading in the new inode, all
    in gfs2_inode_lookup.

    If the block we are looking for is not an inode, we discard the new
    inode via iget_failed, which marks inodes as bad and unhashes them.
    Other tasks waiting on that inode will get a bad inode back from
    ilookup or iget_locked; in that case, retry the lookup.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson

    Andreas Gruenbacher
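The lock holder cleanup makes the glock pointer itself the "initialized" marker. Marking a holder uninitialized only clears that pointer instead of zeroing the whole object. Below is a minimal sketch modeled on gfs2_holder_mark_uninitialized() and gfs2_holder_initialized(). The trimmed-down structs and the toy_ names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_glock {
	int id;
};

/* Hypothetical, trimmed-down lock holder. */
struct toy_holder {
	struct toy_glock *gh_gl;	/* NULL means "not initialized" */
	unsigned int gh_state;
};

static void toy_holder_mark_uninitialized(struct toy_holder *gh)
{
	gh->gh_gl = NULL;	/* no need to zero the whole struct */
}

static bool toy_holder_initialized(const struct toy_holder *gh)
{
	return gh->gh_gl != NULL;
}

static void toy_holder_init(struct toy_glock *gl, unsigned int state,
			    struct toy_holder *gh)
{
	gh->gh_gl = gl;
	gh->gh_state = state;
}
```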
     

17 Jun, 2016

1 commit


10 Jun, 2016

1 commit

  • Before this patch, function read_rindex_entry would set an rgrp
    glock's gl_object pointer to the rgrp itself before inserting the rgrp into
    the rgrp rbtree. The problem is: if another process was also reading
    the rgrp in, and had already inserted its newly created rgrp, then
    the second call to read_rindex_entry would overwrite that value,
    then return a bad return code to the caller. Later, other functions
    would reference the now-freed rgrp memory by way of gl_object.
    In some cases, that could result in gfs2_rgrp_brelse being called
    twice for the same rgrp: once for the failed attempt and once for
    the "real" rgrp release. Eventually the kernel would panic.
    There are also a number of other things that could go wrong when
    a kernel module is accessing freed storage. For example, this could
    result in rgrp corruption because the fake rgrp would point to a
    fake bitmap in memory too, causing gfs2_inplace_reserve to search
    some random memory for free blocks, and find some, since we were
    never setting rgd->rd_bits to NULL before freeing it.

    This patch fixes the problem by not setting gl_object until we
    have successfully inserted the rgrp into the rbtree. Also, it sets
    rd_bits to NULL as it frees them, which will ensure any accidental
    access to the wrong rgrp will result in a kernel panic rather than
    file system corruption, which is preferred.

    Signed-off-by: Bob Peterson

    Bob Peterson
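The fix is an ordering rule: publish a pointer only after the object has been inserted successfully, so a losing racer never leaves a dangling gl_object behind. Below is a minimal sketch in which an array stands in for the rgrp rbtree. Every struct and function name is hypothetical. The failure path also clears rd_bits after freeing it, mirroring the second half of the fix.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for an rgrp, its glock, and the rgrp tree. */
struct toy_rgrp {
	unsigned long addr;
	void *rd_bits;
};

struct toy_glock {
	struct toy_rgrp *gl_object;
};

#define TOY_MAX_RGRPS 8
static struct toy_rgrp *toy_tree[TOY_MAX_RGRPS];
static int toy_tree_len;

/* Insert unless an rgrp with the same address is already present. */
static bool toy_tree_insert(struct toy_rgrp *rgd)
{
	for (int i = 0; i < toy_tree_len; i++)
		if (toy_tree[i]->addr == rgd->addr)
			return false;	/* another reader inserted it first */
	if (toy_tree_len == TOY_MAX_RGRPS)
		return false;
	toy_tree[toy_tree_len++] = rgd;
	return true;
}

static int toy_read_rindex_entry(struct toy_glock *gl, unsigned long addr)
{
	struct toy_rgrp *rgd = malloc(sizeof(*rgd));

	if (!rgd)
		return -ENOMEM;
	rgd->addr = addr;
	rgd->rd_bits = malloc(64);

	if (!toy_tree_insert(rgd)) {
		/* gl->gl_object was never pointed at this rgd. */
		free(rgd->rd_bits);
		rgd->rd_bits = NULL;	/* clear freed bitmaps, as in the fix */
		free(rgd);
		return -EEXIST;
	}
	gl->gl_object = rgd;	/* publish only after a successful insert */
	return 0;
}
```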
     

08 Jun, 2016

4 commits


28 May, 2016

3 commits

  • Pull vfs fixes from Al Viro:
    "Followups to the parallel lookup work:

    - update docs

    - restore killability of the places that used to take ->i_mutex
    killably now that we have down_write_killable() merged

    - Additionally, it turns out that I missed a prerequisite for
    security_d_instantiate() stuff - ->getxattr() wasn't the only thing
    that could be called before dentry is attached to inode; with smack
    we needed the same treatment applied to ->setxattr() as well"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    switch ->setxattr() to passing dentry and inode separately
    switch xattr_handler->set() to passing dentry and inode separately
    restore killability of old mutex_lock_killable(&inode->i_mutex) users
    add down_write_killable_nested()
    update D/f/directory-locking

    Linus Torvalds
     
  • Most users of IS_ERR_VALUE() in the kernel are wrong, as they
    pass an 'int' into a function that takes an 'unsigned long'
    argument. This happens to work because the type is sign-extended
    on 64-bit architectures before it gets converted into an
    unsigned type.

    However, anything that passes an 'unsigned short' or 'unsigned int'
    argument into IS_ERR_VALUE() is guaranteed to be broken, as are
    8-bit integers and types that are wider than 'unsigned long'.

    Andrzej Hajda has already fixed a lot of the worst abusers that
    were causing actual bugs, but it would be nice to prevent any
    users that are not passing 'unsigned long' arguments.

    This patch changes all users of IS_ERR_VALUE() that I could find
    on 32-bit ARM randconfig builds and x86 allmodconfig. For the
    moment, this doesn't change the definition of IS_ERR_VALUE()
    because there are probably still architecture specific users
    elsewhere.

    Almost all the warnings I got are for files that are better off
    using 'if (err)' or 'if (err < 0)'.
    The only legitimate user I could find that we get a warning for
    is the (32-bit only) freescale fman driver, so I did not remove
    the IS_ERR_VALUE() there but changed the type to 'unsigned long'.
    For 9pfs, I just worked around one user whose calling conventions
    are so obscure that I did not dare change the behavior.

    I was using this definition for testing:

    #define IS_ERR_VALUE(x) ((unsigned long*)NULL == (typeof (x)*)NULL && \
    unlikely((unsigned long long)(x) >= (unsigned long long)(typeof(x))-MAX_ERRNO))

    which ends up making all 16-bit or wider types work correctly with
    the most plausible interpretation of what IS_ERR_VALUE() was supposed
    to return according to its users, but also causes a compile-time
    warning for any users that do not pass an 'unsigned long' argument.

    I suggested this approach earlier this year, but back then we ended
    up deciding to just fix the users that are obviously broken. After
    the initial warning that caused me to get involved in the discussion
    (fs/gfs2/dir.c) showed up again in the mainline kernel, Linus
    asked me to send the whole thing again.

    [ Updated the 9p parts as per Al Viro - Linus ]

    Signed-off-by: Arnd Bergmann
    Cc: Andrzej Hajda
    Cc: Andrew Morton
    Link: https://lkml.org/lkml/2016/1/7/363
    Link: https://lkml.org/lkml/2016/5/27/486
    Acked-by: Srinivas Kandagatla # For nvmem part
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     
  • Preparation for a similar switch in ->setxattr() (see the next commit for
    the rationale).

    Signed-off-by: Al Viro

    Al Viro
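The IS_ERR_VALUE() problem described above can be reproduced in userspace. A negative int converts to a huge unsigned long and is recognized as an error. The same value stored in an unsigned int is zero-extended on 64-bit targets and falls below the error range. Below is a sketch using a simplified form of the macro. The (void *) cast and unlikely() of the kernel version are dropped, and the _SKETCH name is hypothetical. MAX_ERRNO is 4095, as in the kernel.

```c
#include <stdbool.h>

#define MAX_ERRNO 4095

/* Simplified form of the kernel macro: compare as unsigned long. */
#define IS_ERR_VALUE_SKETCH(x) ((unsigned long)(x) >= (unsigned long)-MAX_ERRNO)

static bool err_from_int(int err)
{
	/* Negative int -> unsigned long wraps to a value near ULONG_MAX: works. */
	return IS_ERR_VALUE_SKETCH(err);
}

static bool err_from_uint(unsigned int err)
{
	/* unsigned int -> unsigned long zero-extends on 64-bit: check fails there. */
	return IS_ERR_VALUE_SKETCH(err);
}
```

On a 64-bit build, err_from_uint((unsigned int)-5) returns false. On a 32-bit build, where long and int have the same width, it returns true. This is why most callers are better served by 'if (err < 0)'.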