04 Oct, 2018

1 commit

  • [ Upstream commit 826d7bc9f013d01e92997883d2fd0c25f4af1f1c ]

    If the flock owner process is dead and its pid has been already freed,
    pid translation won't work, but we still want to show flock owner pid
    number when inspecting /proc/$PID/fdinfo/$FD in init pidns.

    Reproducer:
    process A           process A1          process A2
    fork()--------->
    exit()              open()
                        flock()
                        fork()--------->
                        exit()              sleep()

    Before the patch:
    ================
    (root@vz7)/: cat /proc/${PID_A2}/fdinfo/3
    pos: 4
    flags: 02100002
    mnt_id: 257
    lock: (root@vz7)/:

    After the patch:
    ===============
    (root@vz7)/:cat /proc/${PID_A2}/fdinfo/3
    pos: 4
    flags: 02100002
    mnt_id: 295
    lock: 1: FLOCK ADVISORY WRITE ${PID_A1} b6:f8a61:529946 0 EOF

    Fixes: 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks")
    Signed-off-by: Konstantin Khorenko
    Acked-by: Andrey Vagin
    Reviewed-by: Benjamin Coddington
    Signed-off-by: Jeff Layton
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Konstantin Khorenko
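
The reproducer above can be sketched in userspace roughly as follows. This is a minimal sketch, not the test used by the commit author: the function names `spawn_orphaned_flock()` and `flock_is_held()` and the pipe used to report the pid of A2 are assumptions made for illustration. Unlike the original scenario, the caller (process A) stays alive and reaps A1, so that A1's pid is freed while A2 still holds the inherited descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork A1, which opens and flock()s `path`, forks A2 and exits.
 * A2 inherits the locked descriptor and sleeps for `secs` seconds.
 * Returns the pid of A2, or -1 on failure. */
pid_t spawn_orphaned_flock(const char *path, unsigned secs)
{
    int pfd[2];
    assert(path != NULL);
    if (pipe(pfd) < 0)
        return -1;

    pid_t a1 = fork();
    if (a1 < 0)
        return -1;
    if (a1 == 0) {                      /* process A1 */
        close(pfd[0]);
        int fd = open(path, O_RDWR | O_CREAT, 0600);
        if (fd < 0 || flock(fd, LOCK_EX) < 0)
            _exit(1);
        pid_t a2 = fork();
        if (a2 < 0)
            _exit(1);
        if (a2 == 0) {                  /* process A2 keeps fd open */
            close(pfd[1]);
            sleep(secs);
            _exit(0);
        }
        if (write(pfd[1], &a2, sizeof a2) != (ssize_t)sizeof a2)
            _exit(1);
        _exit(0);                       /* lock owner A1 goes away */
    }

    close(pfd[1]);
    pid_t a2 = -1;
    ssize_t n = read(pfd[0], &a2, sizeof a2);
    close(pfd[0]);
    waitpid(a1, NULL, 0);               /* reap A1 so its pid is freed */
    return n == (ssize_t)sizeof a2 ? a2 : -1;
}

/* Returns 1 if some other open file description holds a flock on
 * `path`, 0 if the lock was free, -1 on error. */
int flock_is_held(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    int rc = flock(fd, LOCK_EX | LOCK_NB);
    int held = (rc < 0 && errno == EWOULDBLOCK) ? 1 : (rc == 0 ? 0 : -1);
    close(fd);                          /* drops the lock if we took it */
    return held;
}
```

While A2 is sleeping, `cat /proc/<pid of A2>/fdinfo/<fd>` is where the `lock:` line shown above would be expected to appear.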
     

22 Jul, 2017

1 commit

  • When locks.c moved to using file_lock_context, the check for any locks that
    were not released was moved from the __fput() to destroy_inode() path in
    commit 8634b51f6ca2 ("locks: convert lease handling to file_lock_context").
    This warning has been quite useful for catching bugs, particularly in NFS
    where lock handling still sees some churn.

    Let's bring back the warning for leaked locks on __fput, as this warning is
    much more likely to be seen and reported by users.

    Signed-off-by: Benjamin Coddington
    Signed-off-by: Jeff Layton

    Benjamin Coddington
     

16 Jul, 2017

2 commits

  • Since commit c69899a17ca4 "NFSv4: Update of VFS byte range lock must be
    atomic with the stateid update", NFSv4 has been inserting locks in rpciod
    worker context. The result is that the file_lock's fl_nspid is the
    kworker's pid instead of the original userspace pid.

    The fl_nspid is only used to represent the namespaced virtual pid number
    when displaying locks or returning from F_GETLK. There's no reason to set
    it for every inserted lock, since we can usually just look it up from
    fl_pid. So, instead of looking up and holding struct pid for every lock,
    let's just look up the virtual pid number from fl_pid when it is needed.
    That means we can remove fl_nspid entirely.

    The translation and presentation of fl_pid should handle the following four
    cases:

    1 - F_GETLK on a remote file with a remote lock:
    In this case, the filesystem should determine the l_pid to return here.
    Filesystems should indicate that the fl_pid represents a non-local pid
    value that should not be translated by returning an fl_pid
    Signed-off-by: Jeff Layton

    Benjamin Coddington
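
From userspace, the pid discussed here is what F_GETLK reports in `l_pid`. The following is a minimal sketch for a local file and a local lock only; the helper names `posix_lock_owner()` and `spawn_lock_holder()` and the two synchronisation pipes are assumptions for illustration and are not part of the commit.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ask the kernel who holds a lock conflicting with a whole-file write
 * lock. Returns the owner's pid, 0 if there is none, -1 on error. */
pid_t posix_lock_owner(int fd)
{
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
    if (fcntl(fd, F_GETLK, &fl) < 0)
        return -1;
    return fl.l_type == F_UNLCK ? 0 : fl.l_pid;
}

/* Fork a child that write-locks the whole file and then blocks until
 * the caller closes *release_fd. Returns the child's pid or -1. */
pid_t spawn_lock_holder(const char *path, int *release_fd)
{
    int ready[2], release[2];
    assert(path != NULL && release_fd != NULL);
    if (pipe(ready) < 0 || pipe(release) < 0)
        return -1;

    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        char c = 'r';
        close(ready[0]);
        close(release[1]);
        int fd = open(path, O_RDWR | O_CREAT, 0600);
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
        if (fd < 0 || fcntl(fd, F_SETLK, &fl) < 0)
            _exit(1);
        if (write(ready[1], &c, 1) != 1)    /* tell parent: lock is set */
            _exit(1);
        if (read(release[0], &c, 1) < 0)    /* returns 0 at EOF */
            _exit(1);
        _exit(0);                           /* exit releases the lock */
    }

    close(ready[1]);
    close(release[0]);
    char c;
    ssize_t n = read(ready[0], &c, 1);
    close(ready[0]);
    if (n != 1) {
        close(release[1]);
        waitpid(child, NULL, 0);
        return -1;
    }
    *release_fd = release[1];
    return child;
}
```

Because the pid is looked up at F_GETLK time, a caller in the same pid namespace as the holder should see the same number that fork() returned.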
     
  • Struct file_lock is fairly large, so let's save some space on the stack by
    using an allocation for struct file_lock in fcntl_getlk(), just as we do
    for fcntl_setlk().

    Signed-off-by: Benjamin Coddington
    Signed-off-by: Jeff Layton

    Benjamin Coddington
     

27 May, 2017

2 commits


21 Apr, 2017

1 commit

  • Set FL_CLOSE in fl_flags as in locks_remove_posix() when clearing locks.
    NFS will check for this flag to ensure an unlock is sent in a following
    patch.

    Fuse handles flock and posix locks differently for FL_CLOSE, and so
    requires a fixup to retain the existing behavior for flock.

    Signed-off-by: Benjamin Coddington
    Reviewed-by: Jeff Layton
    Acked-by: Miklos Szeredi
    Signed-off-by: Trond Myklebust

    Benjamin Coddington
     

25 Dec, 2016

1 commit


18 Oct, 2016

1 commit

  • I overlooked a few code-paths that can lead to
    locks_delete_global_locks().

    Reported-by: Dmitry Vyukov
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Jeff Layton
    Cc: Al Viro
    Cc: Bruce Fields
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-fsdevel@vger.kernel.org
    Cc: syzkaller
    Link: http://lkml.kernel.org/r/20161008081228.GF3142@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

11 Oct, 2016

2 commits

  • Pull more vfs updates from Al Viro:
    ">rename2() work from Miklos + current_time() from Deepa"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: Replace current_fs_time() with current_time()
    fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
    fs: Replace CURRENT_TIME with current_time() for inode timestamps
    fs: proc: Delete inode time initializations in proc_alloc_inode()
    vfs: Add current_time() api
    vfs: add note about i_op->rename changes to porting
    fs: rename "rename2" i_op to "rename"
    vfs: remove unused i_op->rename
    fs: make remaining filesystems use .rename2
    libfs: support RENAME_NOREPLACE in simple_rename()
    fs: support RENAME_NOREPLACE for local filesystems
    ncpfs: fix unused variable warning

    Linus Torvalds
     
  • Pull misc vfs updates from Al Viro:
    "Assorted misc bits and pieces.

    There are several single-topic branches left after this (rename2
    series from Miklos, current_time series from Deepa Dinamani, xattr
    series from Andreas, uaccess stuff from me) and I'd prefer to
    send those separately"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits)
    proc: switch auxv to use of __mem_open()
    hpfs: support FIEMAP
    cifs: get rid of unused arguments of CIFSSMBWrite()
    posix_acl: uapi header split
    posix_acl: xattr representation cleanups
    fs/aio.c: eliminate redundant loads in put_aio_ring_file
    fs/internal.h: add const to ns_dentry_operations declaration
    compat: remove compat_printk()
    fs/buffer.c: make __getblk_slow() static
    proc: unsigned file descriptors
    fs/file: more unsigned file descriptors
    fs: compat: remove redundant check of nr_segs
    cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]
    cifs: don't use memcpy() to copy struct iov_iter
    get rid of separate multipage fault-in primitives
    fs: Avoid premature clearing of capabilities
    fs: Give dentry to inode_change_ok() instead of inode
    fuse: Propagate dentry down to inode_change_ok()
    ceph: Propagate dentry down to inode_change_ok()
    xfs: Propagate dentry down to inode_change_ok()
    ...

    Linus Torvalds
     

05 Oct, 2016

1 commit


28 Sep, 2016

1 commit

  • current_fs_time() uses struct super_block* as an argument.
    As per Linus's suggestion, this is changed to take struct
    inode* as a parameter instead. This is because the function
    is primarily meant for vfs inode timestamps.
    Also the function was renamed as per Arnd's suggestion.

    Change all calls to current_fs_time() to use the new
    current_time() function instead. current_fs_time() will be
    deleted.

    Signed-off-by: Deepa Dinamani
    Signed-off-by: Al Viro

    Deepa Dinamani
     

22 Sep, 2016

3 commits

  • Avoid spurious preemption.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Al Viro
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dave@stgolabs.net
    Cc: der.herr@hofr.at
    Cc: paulmck@linux.vnet.ibm.com
    Cc: riel@redhat.com
    Cc: tj@kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • As Oleg suggested, replace file_lock_list with a structure containing
    the hlist head and a spinlock.

    This completely removes the lglock from fs/locks.

    Suggested-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Al Viro
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dave@stgolabs.net
    Cc: der.herr@hofr.at
    Cc: paulmck@linux.vnet.ibm.com
    Cc: riel@redhat.com
    Cc: tj@kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Replace the global part of the lglock with a percpu-rwsem.

    Since fcl_lock is a spinlock and itself nests under i_lock, which too
    is a spinlock we cannot acquire sleeping locks at
    locks_{insert,remove}_global_locks().

    We can however wrap all fcl_lock acquisitions with percpu_down_read
    such that all invocations of locks_{insert,remove}_global_locks() have
    that read lock held.

    This allows us to replace the lg_global part of the lglock with the
    write side of the rwsem.

    In the absence of writers, percpu_{down,up}_read() are free of atomic
    instructions. This further avoids the very long preempt-disable
    regions caused by lglock on larger machines.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Al Viro
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dave@stgolabs.net
    Cc: der.herr@hofr.at
    Cc: paulmck@linux.vnet.ibm.com
    Cc: riel@redhat.com
    Cc: tj@kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

16 Sep, 2016

2 commits

  • The problem with writecount is: we want consistent handling of it for
    underlying filesystems as well as overlayfs. Making sure i_writecount is
    correct on all layers is difficult. Instead this patch makes sure that
    when write access is acquired, it's always done on the underlying writable
    layer (called the upper layer). We must also make sure to look at the
    writecount on this layer when checking for conflicting leases.

    Open for write already updates the upper layer's writecount, leaving only
    truncate.

    For truncate, copy-up must happen before get_write_access() so that the
    writecount is updated on the upper layer. The problem with this is that if
    something fails after that (e.g. break_lease() is interrupted), the copy-up
    was done needlessly. Probably not a big deal in practice.

    Another interesting case is if there's a denywrite on a lower file that is
    then opened for write or truncated. With this patch these will succeed,
    which is somewhat counterintuitive. But I think it's still acceptable,
    considering that the copy-up does actually create a different file, so the
    old, denywrite mapping won't be touched.

    On non-overlayfs d_real() is an identity function and d_real_inode() is
    equivalent to d_inode() so this patch doesn't change behavior in that case.

    Signed-off-by: Miklos Szeredi
    Acked-by: Jeff Layton
    Cc: "J. Bruce Fields"

    Miklos Szeredi
     
  • This patch allows flock, posix locks, ofd locks and leases to work
    correctly on overlayfs.

    Instead of using the underlying inode for storing lock context use the
    overlay inode. This allows locks to be persistent across copy-up.

    This is done by introducing locks_inode() helper and using it instead of
    file_inode() to get the inode in locking code. For non-overlayfs the two
    are equivalent, except for an extra pointer dereference in locks_inode().

    Since lock operations are in "struct file_operations" we must also make
    sure not to call underlying filesystem's lock operations. Introduce a
    super block flag MS_NOREMOTELOCK to this effect.

    Signed-off-by: Miklos Szeredi
    Acked-by: Jeff Layton
    Cc: "J. Bruce Fields"

    Miklos Szeredi
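
The difference between the two helpers can be modelled in plain C. This is a sketch with heavily simplified, hypothetical stand-ins for the kernel structures (only the fields needed here); it mirrors the description above, where locks_inode() goes through the dentry and therefore costs one extra pointer dereference.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption: only the
 * fields relevant to this example are modelled). */
struct inode  { unsigned long i_ino; };
struct dentry { struct inode *d_inode; };
struct path   { struct dentry *dentry; };
struct file   { struct path f_path; struct inode *f_inode; };

/* Inode holding the file's data; on overlayfs this is the underlying
 * inode. */
static inline struct inode *file_inode(const struct file *f)
{
    return f->f_inode;
}

/* Inode used to store lock context; reached through the dentry, so on
 * overlayfs this is the overlay inode. */
static inline struct inode *locks_inode(const struct file *f)
{
    assert(f->f_path.dentry != NULL);
    return f->f_path.dentry->d_inode;
}
```

For a file on an ordinary filesystem both helpers return the same inode; for an overlayfs file they differ, which is why every lock operation has to use the same helper consistently.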
     

19 Aug, 2016

1 commit

  • On busy container servers reading /proc/locks shows all the locks
    created by all clients. This can cause large latency spikes. In my
    case I observed lsof taking up to 5-10 seconds while processing around
    50k locks. Fix this by limiting the locks shown only to those created
    in the same pidns as the one the proc fs was mounted in. When reading
    /proc/locks from the init_pid_ns proc instance, perform no
    filtering.

    [ jlayton: reformat comments for 80 columns ]

    Signed-off-by: Nikolay Borisov
    Suggested-by: Eric W. Biederman
    Signed-off-by: Jeff Layton

    Nikolay Borisov
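
Tools such as lsof have to parse every line that /proc/locks returns, which is why the number of lines shown matters. As a rough illustration, one line can be parsed as below. This is a sketch, not the code any particular tool uses: the struct and function names are hypothetical, the field layout is assumed from the `lock:` line format shown in the fdinfo example earlier in this log, and blocked-waiter lines (which carry an extra `->` marker) are not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed /proc/locks entry (hypothetical layout for this example). */
struct lock_entry {
    int id;
    char kind[16];      /* e.g. FLOCK, POSIX */
    char mode[16];      /* e.g. ADVISORY */
    char access[16];    /* e.g. READ, WRITE */
    int pid;            /* owner pid as seen from the reader's pidns */
    unsigned major, minor;
    unsigned long ino;
    long long start;
    char end[24];       /* a number or the string "EOF" */
};

/* Parse a single line; returns 0 on success, -1 if it does not match. */
int parse_lock_line(const char *line, struct lock_entry *e)
{
    assert(line != NULL && e != NULL);
    int n = sscanf(line, "%d: %15s %15s %15s %d %x:%x:%lu %lld %23s",
                   &e->id, e->kind, e->mode, e->access, &e->pid,
                   &e->major, &e->minor, &e->ino, &e->start, e->end);
    return n == 10 ? 0 : -1;
}

/* True if the entry describes a flock()-style lock. */
int lock_is_flock(const struct lock_entry *e)
{
    return strcmp(e->kind, "FLOCK") == 0;
}
```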
     

01 Jul, 2016

1 commit

  • (Another one for the f_path debacle.)

    ltp fcntl33 testcase caused an Oops in selinux_file_send_sigiotask.

    The reason is that generic_add_lease() used filp->f_path.dentry->inode
    while all the others use file_inode(). This makes a difference for files
    opened on overlayfs since the former will point to the overlay inode and the
    latter to the underlying inode.

    So generic_add_lease() added the lease to the overlay inode and
    generic_delete_lease() removed it from the underlying inode. When the file
    was released the lease remained on the overlay inode's lock list, resulting
    in use after free.

    Reported-by: Eryu Guan
    Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
    Cc:
    Signed-off-by: Miklos Szeredi
    Reviewed-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Miklos Szeredi
     

23 Jan, 2016

1 commit

  • parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
    inode_foo(inode) being mutex_foo(&inode->i_mutex).

    Please, use those for access to ->i_mutex; over the coming cycle
    ->i_mutex will become rwsem, with ->lookup() done with it held
    only shared.

    Signed-off-by: Al Viro

    Al Viro
     

13 Jan, 2016

1 commit

  • Pull vfs copy_file_range updates from Al Viro:
    "Several series around copy_file_range/CLONE"

    * 'work.copy_file_range' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    btrfs: use new dedupe data function pointer
    vfs: hoist the btrfs deduplication ioctl to the vfs
    vfs: wire up compat ioctl for CLONE/CLONE_RANGE
    cifs: avoid unused variable and label
    nfsd: implement the NFSv4.2 CLONE operation
    nfsd: Pass filehandle to nfs4_preprocess_stateid_op()
    vfs: pull btrfs clone API to vfs layer
    locks: new locks_mandatory_area calling convention
    vfs: Add vfs_copy_file_range() support for pagecache copies
    btrfs: add .copy_file_range file operation
    x86: add sys_copy_file_range to syscall tables
    vfs: add copy_file_range syscall and vfs helper

    Linus Torvalds
     

09 Jan, 2016

5 commits


08 Jan, 2016

1 commit

  • Dmitry reported that he was able to reproduce the WARN_ON_ONCE that
    fires in locks_free_lock_context when the flc_posix list isn't empty.

    The problem turns out to be that we're basically rebuilding the
    file_lock from scratch in fcntl_setlk when we discover that the setlk
    has raced with a close. If the l_whence field is SEEK_CUR or SEEK_END,
    then we may end up with fl_start and fl_end values that differ from
    when the lock was initially set, if the file position or length of the
    file has changed in the interim.

    Fix this by just reusing the same lock request structure, and simply
    override fl_type value with F_UNLCK as appropriate. That ensures that
    we really are unlocking the lock that was initially set.

    While we're there, make sure that we do pop a WARN_ON_ONCE if the
    removal ever fails. Also return -EBADF in this event, since that's
    what we would have returned if the close had happened earlier.

    Cc: Alexander Viro
    Cc:
    Fixes: c293621bbf67 (stale POSIX lock handling)
    Reported-by: Dmitry Vyukov
    Signed-off-by: Jeff Layton
    Acked-by: "J. Bruce Fields"

    Jeff Layton
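
Why rebuilding the request can unlock the wrong range is easy to see from how a relative lock request is turned into an absolute offset. The sketch below is a userspace model, not the kernel's conversion routine; `resolve_lock_start()` and its parameters are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Turn (l_whence, l_start) into an absolute start offset, given the
 * file position and file size at the time of the call. Returns -1 for
 * an unknown whence or a negative result. */
long long resolve_lock_start(int whence, long long l_start,
                             long long f_pos, long long i_size)
{
    long long base;

    assert(f_pos >= 0 && i_size >= 0);
    switch (whence) {
    case SEEK_SET: base = 0;      break;
    case SEEK_CUR: base = f_pos;  break;
    case SEEK_END: base = i_size; break;
    default:       return -1;
    }
    return base + l_start < 0 ? -1 : base + l_start;
}
```

With SEEK_CUR, resolving the same request at file position 100 and again at position 200 gives two different start offsets, so an unlock built from a second conversion may not cover the lock that was set. Reusing the already converted request, as the fix does, avoids the second conversion.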
     

18 Dec, 2015

1 commit

  • The Kconfig currently controlling compilation of this code is:

    config FILE_LOCKING
    bool "Enable POSIX file locking API" if EXPERT

    ...meaning that it currently is not being built as a module by anyone.

    Let's remove the couple of traces of modularity so that when reading the
    driver there is no doubt it is builtin-only.

    Since module_init translates to device_initcall in the non-modular
    case, the init ordering gets bumped to one level earlier when we
    use the more appropriate fs_initcall here. However we've made similar
    changes before without any fallout and none is expected here either.

    Cc: Jeff Layton
    Acked-by: Jeff Layton
    Cc: "J. Bruce Fields"
    Cc: Alexander Viro
    Cc: linux-fsdevel@vger.kernel.org
    Signed-off-by: Paul Gortmaker
    Signed-off-by: Jeff Layton

    Paul Gortmaker
     

08 Dec, 2015

1 commit


18 Nov, 2015

1 commit


16 Nov, 2015

1 commit

    Mandatory locking appears to be almost unused and buggy, and there
    appears to be no real interest in doing anything with it. Since effectively
    no one uses the code and since the code is buggy let's allow it to be
    disabled at compile time. I would just suggest removing the code but
    undoubtedly that will break some piece of userspace code somewhere.

    For the distributions that don't care about this piece of code
    this gives a nice starting point to make mandatory locking go away.

    Cc: Benjamin Coddington
    Cc: Dmitry Vyukov
    Cc: Jeff Layton
    Cc: J. Bruce Fields
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Jeff Layton

    Jeff Layton
     

23 Oct, 2015

3 commits


15 Oct, 2015

1 commit


21 Sep, 2015

1 commit

  • locks_get_lock_context() uses cmpxchg() to install i_flctx.
    cmpxchg() is a release operation which is correct. But it uses
    a plain load to load i_flctx. This is incorrect. Subsequent loads
    from i_flctx can hoist above the load of i_flctx pointer itself
    and observe uninitialized garbage there. This in turn can lead
    to corruption of ctx->flc_lock and other members.

    Documentation/memory-barriers.txt explicitly requires a barrier
    in such a context:
    "A load-load control dependency requires a full read memory barrier".

    Use smp_load_acquire() in locks_get_lock_context() and in a bunch
    of other functions that can proceed concurrently with
    locks_get_lock_context().

    The data race was found with KernelThreadSanitizer (KTSAN).

    Signed-off-by: Dmitry Vyukov
    Signed-off-by: Jeff Layton

    Dmitry Vyukov
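
The publish/consume pattern described here has a close userspace analogue in C11 atomics. The sketch below is an analogy only: it uses `<stdatomic.h>` instead of the kernel's cmpxchg() and smp_load_acquire(), and `struct lock_ctx`, `struct obj` and `get_lock_ctx()` are hypothetical stand-ins for the lock context, the inode and locks_get_lock_context().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct lock_ctx { int initialized; };              /* stand-in context */
struct obj { _Atomic(struct lock_ctx *) ctx; };    /* stand-in for i_flctx */

/* Return the object's context, allocating and installing it on first
 * use. Safe against concurrent callers racing to install. */
struct lock_ctx *get_lock_ctx(struct obj *o)
{
    assert(o != NULL);

    /* Acquire load: if we see a non-NULL pointer, we also see the
     * initialised contents written before it was published. */
    struct lock_ctx *ctx =
        atomic_load_explicit(&o->ctx, memory_order_acquire);
    if (ctx)
        return ctx;

    struct lock_ctx *fresh = malloc(sizeof *fresh);
    if (!fresh)
        return NULL;
    fresh->initialized = 1;

    /* Publish with release semantics; on failure `expected` holds the
     * winner's pointer, read with acquire semantics. */
    struct lock_ctx *expected = NULL;
    if (atomic_compare_exchange_strong_explicit(&o->ctx, &expected, fresh,
                                                memory_order_acq_rel,
                                                memory_order_acquire))
        return fresh;

    free(fresh);
    return expected;
}
```

A plain (relaxed) load in place of the acquire load would allow a reader to see the pointer without seeing the initialised fields, which is the class of bug the commit fixes.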
     

01 Sep, 2015

1 commit


13 Jul, 2015

2 commits