25 Mar, 2020

2 commits

  • [ Upstream commit dcf23ac3e846ca0cf626c155a0e3fcbbcf4fae8a ]

    There is measurable performance impact in some synthetic tests due to
    commit 6d390e4b5d48 (locks: fix a potential use-after-free problem when
    wakeup a waiter). Fix the race condition instead by clearing the
    fl_blocker pointer after the wake_up, using explicit acquire/release
    semantics.

    This does mean that we can no longer use the clearing of fl_blocker as
    the wait condition, so switch the waiters over to checking whether the
    fl_blocked_member list_head is empty.

    Reviewed-by: yangerkun
    Reviewed-by: NeilBrown
    Fixes: 6d390e4b5d48 (locks: fix a potential use-after-free problem when wakeup a waiter)
    Signed-off-by: Jeff Layton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Linus Torvalds
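
The ordering described in dcf23ac3e846 can be illustrated with a small userspace sketch using C11 atomics. This is not the kernel code: `struct req`, `req_block_on()`, `req_wake()`, `req_wait_done()` and `req_waker_finished()` are hypothetical names, `queued` stands in for "fl_blocked_member is not empty", and the locking that the kernel does under blocked_lock_lock is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct file_lock. */
struct req {
    _Atomic(struct req *) blocker; /* plays the role of fl_blocker */
    atomic_bool queued;            /* true while on the blocker's list */
    int wakeups;                   /* stands in for the wake_up() call */
};

/* Queue waiter w behind blocker b. */
static void req_block_on(struct req *w, struct req *b)
{
    assert(b != NULL);
    w->wakeups = 0;
    atomic_init(&w->queued, true);
    atomic_init(&w->blocker, b);
}

/* Waker side: unlink, wake, and only then clear the blocker pointer. */
static void req_wake(struct req *w)
{
    atomic_store_explicit(&w->queued, false, memory_order_relaxed);
    w->wakeups++; /* last touch of *w before the release store */
    atomic_store_explicit(&w->blocker, NULL, memory_order_release);
}

/* Waiter side: the wait condition is now "no longer queued". */
static bool req_wait_done(struct req *w)
{
    return !atomic_load_explicit(&w->queued, memory_order_relaxed);
}

/* Waiter side: lockless check that the waker is finished with *w. */
static bool req_waker_finished(struct req *w)
{
    return atomic_load_explicit(&w->blocker, memory_order_acquire) == NULL;
}
```

Because the release store is the waker's final access to the request, a waiter that observes a NULL blocker through the acquire load can free the request without racing against the wake-up.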
     
  • [ Upstream commit 6d390e4b5d48ec03bb87e63cf0a2bff5f4e116da ]

    '16306a61d3b7 ("fs/locks: always delete_block after waiting.")' added the
    logic to check waiter->fl_blocker without blocked_lock_lock. It will
    trigger a UAF when we try to wake up some waiter:

    Thread 1 has created a write flock a on a file; now thread 2 tries to
    unlock and delete flock a, while thread 3 tries to add flock b on the same file.

    Thread2                         Thread3
                                    flock syscall(create flock b)
                                    ...flock_lock_inode_wait
                                        flock_lock_inode(will insert
                                        our fl_blocked_member list
                                        to flock a's fl_blocked_requests)
                                    sleep
    flock syscall(unlock)
    ...flock_lock_inode_wait
        locks_delete_lock_ctx
        ...__locks_wake_up_blocks
            __locks_delete_blocks(
            b->fl_blocker = NULL)
            ...
                                    break by a signal
                                    locks_delete_block
                                        b->fl_blocker == NULL &&
                                        list_empty(&b->fl_blocked_requests)
                                        success, return directly
                                    locks_free_lock b
            wake_up(&b->fl_waiter)
            trigger UAF

    Fix it by removing this logic; this patch may also fix CVE-2019-19769.

    Cc: stable@vger.kernel.org
    Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.")
    Signed-off-by: yangerkun
    Signed-off-by: Jeff Layton
    Signed-off-by: Sasha Levin

    yangerkun
     

09 Jan, 2020

1 commit


28 Sep, 2019

1 commit

  • Pull nfsd updates from Bruce Fields:
    "Highlights:

    - Add a new knfsd file cache, so that we don't have to open and close
    on each (NFSv2/v3) READ or WRITE. This can speed up read and write
    in some cases. It also replaces our readahead cache.

    - Prevent silent data loss on write errors, by treating write errors
    like server reboots for the purposes of write caching, thus forcing
    clients to resend their writes.

    - Tweak the code that allocates sessions to be more forgiving, so
    that NFSv4.1 mounts are less likely to hang when a server already
    has a lot of clients.

    - Eliminate an arbitrary limit on NFSv4 ACL sizes; they should now be
    limited only by the backend filesystem and the maximum RPC size.

    - Allow the server to enforce use of the correct kerberos credentials
    when a client reclaims state after a reboot.

    And some miscellaneous smaller bugfixes and cleanup"

    * tag 'nfsd-5.4' of git://linux-nfs.org/~bfields/linux: (34 commits)
    sunrpc: clean up indentation issue
    nfsd: fix nfs read eof detection
    nfsd: Make nfsd_reset_boot_verifier_locked static
    nfsd: degraded slot-count more gracefully as allocation nears exhaustion.
    nfsd: handle drc over-allocation gracefully.
    nfsd: add support for upcall version 2
    nfsd: add a "GetVersion" upcall for nfsdcld
    nfsd: Reset the boot verifier on all write I/O errors
    nfsd: Don't garbage collect files that might contain write errors
    nfsd: Support the server resetting the boot verifier
    nfsd: nfsd_file cache entries should be per net namespace
    nfsd: eliminate an unnecessary acl size limit
    Deprecate nfsd fault injection
    nfsd: remove duplicated include from filecache.c
    nfsd: Fix the documentation for svcxdr_tmpalloc()
    nfsd: Fix up some unused variable warnings
    nfsd: close cached files prior to a REMOVE or RENAME that would replace target
    nfsd: rip out the raparms cache
    nfsd: have nfsd_test_lock use the nfsd_file cache
    nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
    ...

    Linus Torvalds
     

20 Aug, 2019

1 commit

  • In __break_lease(), the file lock 'new_fl' is allocated in lease_alloc().
    However, it is not deallocated in the following execution if
    smp_load_acquire() fails, leading to a memory leak bug. To fix this issue,
    free 'new_fl' before returning the error.

    Signed-off-by: Wenwen Wang
    Signed-off-by: Jeff Layton

    Wenwen Wang
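
The error-path pattern this fix describes (free the allocation before returning) might look like the following userspace sketch. All names here (`lease_alloc_sketch()`, `lease_free_sketch()`, `break_lease_sketch()`, the `live_leases` counter) are hypothetical, the `ctx` argument merely stands in for the pointer obtained via smp_load_acquire(), and the `-EINVAL` return value is illustrative rather than taken from the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct lease { int type; };

static int live_leases; /* outstanding allocations, tracked for the demo only */

static struct lease *lease_alloc_sketch(int type)
{
    struct lease *l = malloc(sizeof(*l));
    if (l) {
        l->type = type;
        live_leases++;
    }
    return l;
}

static void lease_free_sketch(struct lease *l)
{
    assert(l != NULL);
    free(l);
    live_leases--;
}

/* ctx stands in for the pointer read with smp_load_acquire(). */
static int break_lease_sketch(const void *ctx, int type)
{
    int error = 0;
    struct lease *new_fl = lease_alloc_sketch(type);

    if (!new_fl)
        return -ENOMEM;
    if (!ctx) {
        error = -EINVAL; /* illustrative value, not taken from the kernel */
        goto free_lock;  /* the early exit must still release new_fl */
    }
    /* ... the real function would examine the existing leases here ... */
free_lock:
    lease_free_sketch(new_fl);
    return error;
}
```

Routing every exit after the allocation through a single label keeps the allocation and the free paired, which is the usual way to avoid this class of leak.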
     

19 Aug, 2019

2 commits

  • Have them keep an nfsd_file reference instead of a struct file.

    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust
    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • With the new file caching infrastructure in nfsd, we can end up holding
    files open for an indefinite period of time, even when they are still
    idle. This may prevent the kernel from handing out leases on the file,
    which is something we don't want to block.

    Fix this by running a SRCU notifier call chain on any
    lease attempt. nfsd can then purge the cache for that inode before
    returning.

    Since SRCU is only conditionally compiled in, we must only define the
    new chain if it's enabled, and users of the chain must ensure that
    SRCU is enabled.

    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust
    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     

25 Jul, 2019

1 commit

    Since commit 778fc546f749c588aa2f ("locks: fix tracking of inprogress
    lease breaks"), lease breaks don't change @fl_type but modify
    @fl_flags. However, the procfs part hasn't been updated.

    Previously, for a breaking lease the target type was printed (see
    target_leasetype()), as returned by fcntl(F_GETLEASE). But now it's always
    "READ", as F_UNLCK no longer means "breaking". Unlike the previous
    one, this behaviour doesn't provide a complete description of the lease.

    Here are the /proc/pid/fdinfo/ outputs for a lease (the same for READ and
    WRITE) broken by O_WRONLY.
    -- before:
    lock: 1: LEASE BREAKING READ 2558 08:03:815793 0 EOF
    -- after:
    lock: 1: LEASE BREAKING UNLCK 2558 08:03:815793 0 EOF

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jeff Layton

    Pavel Begunkov
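
The idea of reporting a breaking lease's target type from its flags, rather than from its stored type, can be sketched as below. This is modeled loosely on the description above and is not kernel code: the flag names `SK_UNLOCK_PENDING` and `SK_DOWNGRADE_PENDING`, the enum, and both helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum lease_type { LT_READ, LT_WRITE, LT_UNLCK };

/* Hypothetical flag bits describing an in-progress lease break. */
#define SK_UNLOCK_PENDING    0x1u
#define SK_DOWNGRADE_PENDING 0x2u

struct lease {
    enum lease_type type; /* unchanged while the break is in progress */
    unsigned int flags;
};

/* Type the lease will have once the pending break completes. */
static enum lease_type target_type(const struct lease *l)
{
    assert(l != NULL);
    if (l->flags & SK_UNLOCK_PENDING)
        return LT_UNLCK;
    if (l->flags & SK_DOWNGRADE_PENDING)
        return LT_READ;
    return l->type;
}

/* Label as it might appear in a status line. */
static const char *type_label(enum lease_type t)
{
    switch (t) {
    case LT_READ:
        return "READ";
    case LT_WRITE:
        return "WRITE";
    default:
        return "UNLCK";
    }
}
```

With this mapping, a lease that is being broken by a writer reports "UNLCK" regardless of whether it was originally a read or a write lease, matching the "after" output quoted in the commit message.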
     

11 Jul, 2019

1 commit

  • Pull nfsd updates from Bruce Fields:
    "Highlights:

    - Add a new /proc/fs/nfsd/clients/ directory which exposes some
    long-requested information about NFSv4 clients (like open files)
    and allows forced revocation of client state.

    - Replace the global duplicate reply cache by a cache per network
    namespace; previously, a request in one network namespace could
    incorrectly match an entry from another, though we haven't seen
    this in production. This is the last remaining container bug that
    I'm aware of; at this point you should be able to run separate
    nfsd's in each network namespace, each with their own set of
    exports, and everything should work.

    - Cleanup and modify lock code to show the pid of lockd as the owner
    of NLM locks. This is the correct version of the bugfix originally
    attempted in b8eee0e90f97 ("lockd: Show pid of lockd for remote
    locks")"

    * tag 'nfsd-5.3' of git://linux-nfs.org/~bfields/linux: (34 commits)
    nfsd: Make __get_nfsdfs_client() static
    nfsd: Make two functions static
    nfsd: Fix misuse of strlcpy
    sunrpc/cache: remove the exporting of cache_seq_next
    nfsd: decode implementation id
    nfsd: create xdr_netobj_dup helper
    nfsd: allow forced expiration of NFSv4 clients
    nfsd: create get_nfsdfs_clp helper
    nfsd4: show layout stateids
    nfsd: show lock and deleg stateids
    nfsd4: add file to display list of client's opens
    nfsd: add more information to client info file
    nfsd: escape high characters in binary data
    nfsd: copy client's address including port number to cl_addr
    nfsd4: add a client info file
    nfsd: make client/ directory names small ints
    nfsd: add nfsd/clients directory
    nfsd4: use reference count to free client
    nfsd: rename cl_refcount
    nfsd: persist nfsd filesystem across mounts
    ...

    Linus Torvalds
     

04 Jul, 2019

1 commit


19 Jun, 2019

2 commits

  • check_conflicting_open() is checking for existing fd's open for read or
    for write before allowing to take a write lease. The check that was
    implemented using i_count and d_count is an approximation that has
    several false positives. For example, overlayfs since v4.19 takes an
    extra reference on the dentry; an open with O_PATH takes a reference on
    the dentry although the file cannot be read or written.

    Change the implementation to use i_readcount and i_writecount to
    eliminate the false positive conflicts and allow a write lease to be
    taken on an overlayfs file.

    The change of behavior with existing fd's open with O_PATH is symmetric
    w.r.t. current behavior of lease breakers - an open with O_PATH currently
    does not break a write lease.

    This increases the size of struct inode by 4 bytes on 32bit archs when
    CONFIG_FILE_LOCKING is defined and CONFIG_IMA was not already
    defined.

    Signed-off-by: Amir Goldstein
    Signed-off-by: Jeff Layton

    Amir Goldstein
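
A counter-based conflict check of the kind described above could be sketched as follows. It is a simplified userspace model, not the kernel function: `struct inode_counts` models i_readcount and i_writecount, `struct open_file` models the requester's own descriptor, and all names are hypothetical. A descriptor opened for both reading and writing is assumed to count as a writer only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum lease_kind { LEASE_READ, LEASE_WRITE };

/* Models i_readcount / i_writecount on the inode. */
struct inode_counts {
    int readers; /* descriptors open read-only */
    int writers; /* descriptors open for write */
};

/* Models the descriptor on which the lease is requested. */
struct open_file {
    bool can_read;
    bool can_write;
};

static int check_conflicting_open_sketch(const struct inode_counts *ino,
                                         const struct open_file *self,
                                         enum lease_kind kind)
{
    int self_readers = 0, self_writers = 0;

    assert(ino != NULL && self != NULL);

    /* A read lease only conflicts with descriptors open for write. */
    if (kind == LEASE_READ)
        return ino->writers > 0 ? -EAGAIN : 0;

    /* A write lease tolerates only the requester's own descriptor. */
    if (self->can_write)
        self_writers = 1;
    else if (self->can_read)
        self_readers = 1;

    if (ino->writers != self_writers || ino->readers != self_readers)
        return -EAGAIN;
    return 0;
}
```

Since an O_PATH open or an extra dentry reference changes neither counter, such holders no longer register as conflicts in this model.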
     
  • Signed-off-by: Ira Weiny
    Signed-off-by: Jeff Layton

    Ira Weiny
     

21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

16 May, 2019

1 commit

  • Pull nfsd updates from Bruce Fields:
    "This consists mostly of nfsd container work:

    Scott Mayhew revived an old api that communicates with a userspace
    daemon to manage some on-disk state that's used to track clients
    across server reboots. We've been using a usermode_helper upcall for
    that, but it's tough to run those with the right namespaces, so a
    daemon is much friendlier to container use cases.

    Trond fixed nfsd's handling of user credentials in user namespaces. He
    also contributed patches that allow containers to support different
    sets of NFS protocol versions.

    The only remaining container bug I'm aware of is that the NFS reply
    cache is shared between all containers. If anyone's aware of other
    gaps in our container support, let me know.

    The rest of this is miscellaneous bugfixes"

    * tag 'nfsd-5.2' of git://linux-nfs.org/~bfields/linux: (23 commits)
    nfsd: update callback done processing
    locks: move checks from locks_free_lock() to locks_release_private()
    nfsd: fh_drop_write in nfsd_unlink
    nfsd: allow fh_want_write to be called twice
    nfsd: knfsd must use the container user namespace
    SUNRPC: rsi_parse() should use the current user namespace
    SUNRPC: Fix the server AUTH_UNIX userspace mappings
    lockd: Pass the user cred from knfsd when starting the lockd server
    SUNRPC: Temporary sockets should inherit the cred from their parent
    SUNRPC: Cache the process user cred in the RPC server listener
    nfsd: Allow containers to set supported nfs versions
    nfsd: Add custom rpcbind callbacks for knfsd
    SUNRPC: Allow further customisation of RPC program registration
    SUNRPC: Clean up generic dispatcher code
    SUNRPC: Add a callback to initialise server requests
    SUNRPC/nfs: Fix return value for nfs4_callback_compound()
    nfsd: handle legacy client tracking records sent by nfsdcld
    nfsd: re-order client tracking method selection
    nfsd: keep a tally of RECLAIM_COMPLETE operations when using nfsdcld
    nfsd: un-deprecate nfsdcld
    ...

    Linus Torvalds
     

08 May, 2019

1 commit

  • …kernel/git/gustavoars/linux

    Pull Wimplicit-fallthrough updates from Gustavo A. R. Silva:
    "Mark switch cases where we are expecting to fall through.

    This is part of the ongoing efforts to enable -Wimplicit-fallthrough.

    Most of them have been baking in linux-next for a whole development
    cycle. And with Stephen Rothwell's help, we've had linux-next
    nag-emails going out for newly introduced code that triggers
    -Wimplicit-fallthrough to avoid gaining more of these cases while we
    work to remove the ones that are already present.

    We are getting close to completing this work. Currently, there are
    only 32 of 2311 of these cases left to be addressed in linux-next. I'm
    auditing every case; I take a look into the code and analyze it in
    order to determine if I'm dealing with an actual bug or a false
    positive, as explained here:

    https://lore.kernel.org/lkml/c2fad584-1705-a5f2-d63c-824e9b96cf50@embeddedor.com/

    While working on this, I've found and fixed several missing
    break/return bugs, some of them introduced more than 5 years ago.

    Once this work is finished, we'll be able to universally enable
    "-Wimplicit-fallthrough" to avoid any of these kinds of bugs from
    entering the kernel again"

    * tag 'Wimplicit-fallthrough-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (27 commits)
    memstick: mark expected switch fall-throughs
    drm/nouveau/nvkm: mark expected switch fall-throughs
    NFC: st21nfca: Fix fall-through warnings
    NFC: pn533: mark expected switch fall-throughs
    block: Mark expected switch fall-throughs
    ASN.1: mark expected switch fall-through
    lib/cmdline.c: mark expected switch fall-throughs
    lib: zstd: Mark expected switch fall-throughs
    scsi: sym53c8xx_2: sym_nvram: Mark expected switch fall-through
    scsi: sym53c8xx_2: sym_hipd: mark expected switch fall-throughs
    scsi: ppa: mark expected switch fall-through
    scsi: osst: mark expected switch fall-throughs
    scsi: lpfc: lpfc_scsi: Mark expected switch fall-throughs
    scsi: lpfc: lpfc_nvme: Mark expected switch fall-through
    scsi: lpfc: lpfc_nportdisc: Mark expected switch fall-through
    scsi: lpfc: lpfc_hbadisc: Mark expected switch fall-throughs
    scsi: lpfc: lpfc_els: Mark expected switch fall-throughs
    scsi: lpfc: lpfc_ct: Mark expected switch fall-throughs
    scsi: imm: mark expected switch fall-throughs
    scsi: csiostor: csio_wr: mark expected switch fall-through
    ...

    Linus Torvalds
     

24 Apr, 2019

1 commit

  • Code that allocates locks using locks_alloc_lock() will free it
    using locks_free_lock(), and will benefit from the BUG_ON()
    consistency checks therein.

    However some code (nfsd and lockd) allocate a lock embedded in
    some other data structure, and so free the lock themselves after
    calling locks_release_private(). This path does not benefit from
    the consistency checks.

    To help catch future errors, move the BUG_ON() checks to
    locks_release_private() - which locks_free_lock() already calls.
    This ensures that all users of locks will find out if the lock
    isn't detached properly before being freed.

    Signed-off-by: NeilBrown
    Reviewed-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    NeilBrown
     

09 Apr, 2019

1 commit

  • In preparation for enabling -Wimplicit-fallthrough, mark switch cases
    where we are expecting to fall through.

    This patch fixes the following warnings:

    fs/affs/affs.h:124:38: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/configfs/dir.c:1692:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/configfs/dir.c:1694:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ceph/file.c:249:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ext4/hash.c:233:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ext4/hash.c:246:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ext2/inode.c:1237:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ext2/inode.c:1244:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ext4/indirect.c:1182:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ext4/indirect.c:1188:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ext4/indirect.c:1432:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ext4/indirect.c:1440:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/f2fs/node.c:618:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/f2fs/node.c:620:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/btrfs/ref-verify.c:522:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/gfs2/bmap.c:711:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/gfs2/bmap.c:722:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/jffs2/fs.c:339:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/nfsd/nfs4proc.c:429:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ufs/util.h:62:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ufs/util.h:43:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/fcntl.c:770:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/seq_file.c:319:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/libfs.c:148:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/libfs.c:150:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/signalfd.c:178:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/locks.c:1473:16: warning: this statement may fall through [-Wimplicit-fallthrough=]

    Warning level 3 was used: -Wimplicit-fallthrough=3

    This patch is part of the ongoing efforts to enable
    -Wimplicit-fallthrough.

    Reviewed-by: Kees Cook
    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
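
For readers unfamiliar with the annotation, a marked fall-through might look like the sketch below. The function and constants are hypothetical and unrelated to the files listed above; the point is only the placement of the comment, which GCC recognizes at warning level 3.

```c
#include <assert.h>

enum { PERM_READ = 1, PERM_WRITE = 2 };

/* Hypothetical example: higher levels include the lower level's bits. */
static int perms_for(int level)
{
    int perms = 0;

    assert(level >= 0);
    switch (level) {
    case 2:
        perms |= PERM_WRITE;
        /* fall through */
    case 1:
        perms |= PERM_READ;
        break;
    default:
        break;
    }
    return perms;
}
```

Without the comment, compiling with -Wimplicit-fallthrough=3 would warn at `case 1:`, because control reaches it from the preceding case without a `break`.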
     

25 Mar, 2019

1 commit

  • Andreas reported that he was seeing the tdbtorture test fail in some
    cases with -EDEADLCK when it wasn't before. Some debugging showed that
    deadlock detection was sometimes discovering the caller's lock request
    itself in a dependency chain.

    While we remove the request from the blocked_lock_hash prior to
    reattempting to acquire it, any locks that are blocked on that request
    will still be present in the hash and will still have their fl_blocker
    pointer set to the current request.

    This causes posix_locks_deadlock to find a deadlock dependency chain
    when it shouldn't, as a lock request cannot block itself.

    We are going to end up waking all of those blocked locks anyway when we
    go to reinsert the request back into the blocked_lock_hash, so just do
    it prior to checking for deadlocks. This ensures that any lock blocked
    on the current request will no longer be part of any blocked request
    chain.

    URL: https://bugzilla.kernel.org/show_bug.cgi?id=202975
    Fixes: 5946c4319ebb ("fs/locks: allow a lock request to block other requests.")
    Cc: stable@vger.kernel.org
    Reported-by: Andreas Schneider
    Signed-off-by: Neil Brown
    Signed-off-by: Jeff Layton

    Jeff Layton
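
The false positive described above can be reproduced in a toy model of the deadlock walk. This is a sketch under stated assumptions, not the kernel algorithm: owners are plain integers, the blocked_lock_hash is replaced by a small array, and `waiting_for()`, `would_deadlock()` and `MAX_DEADLK_ITERATIONS_SKETCH` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEADLK_ITERATIONS_SKETCH 10

struct req {
    int owner;           /* simplified lock owner identity */
    struct req *blocker; /* plays the role of fl_blocker */
};

/* Find what `owner` is waiting for: follow its blocked request to the top. */
static const struct req *waiting_for(struct req *const *blocked, size_t n,
                                     int owner)
{
    for (size_t i = 0; i < n; i++) {
        /* Everything in the blocked set must be blocked on something. */
        assert(blocked[i]->blocker != NULL);
        if (blocked[i]->owner == owner) {
            const struct req *r = blocked[i];
            while (r->blocker)
                r = r->blocker;
            return r;
        }
    }
    return NULL;
}

/* Would `caller` deadlock if it waited on `block`? */
static bool would_deadlock(struct req *const *blocked, size_t n,
                           const struct req *caller, const struct req *block)
{
    int i = 0;

    while ((block = waiting_for(blocked, n, block->owner)) != NULL) {
        if (i++ > MAX_DEADLK_ITERATIONS_SKETCH)
            return false;
        if (block->owner == caller->owner)
            return true;
    }
    return false;
}
```

If a request that is still queued behind the caller's own request remains in the blocked set, the walk ends at the caller and reports a deadlock; once those waiters have been woken and removed, the same query finds no chain.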
     

28 Feb, 2019

1 commit

  • Effectively revert commit:

    87709e28dc7c ("fs/locks: Use percpu_down_read_preempt_disable()")

    This is causing major pain for PREEMPT_RT.

    Sebastian did a lot of lockperf runs on 2 and 4 node machines with all
    preemption modes (PREEMPT=n should be an obvious NOP for this patch
    and thus serves as a good control) and no results showed significance
    over 2-sigma (the PREEMPT=n results were almost empty at 1-sigma).

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

03 Jan, 2019

1 commit

  • After moving all requests from
    fl->fl_blocked_requests
    to
    new->fl_blocked_requests

    it is nonsensical to do anything to all the remaining elements, there
    aren't any. This should do something to all the requests that have been
    moved. For simplicity, it does it to all requests in the target list.

    Setting "f->fl_blocker = new" to all members of new->fl_blocked_requests
    is "obviously correct" as it preserves the invariant of the linkage
    among requests.

    Reported-by: syzbot+239d99847eb49ecb3899@syzkaller.appspotmail.com
    Fixes: 5946c4319ebb ("fs/locks: allow a lock request to block other requests.")
    Signed-off-by: NeilBrown
    Signed-off-by: Jeff Layton

    NeilBrown
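
The invariant this fix restores (every moved request must point at its new blocker) can be shown with a small sketch. Fixed-size arrays replace the kernel's list_heads, and `struct req`, `MAX_BLOCKED` and `move_blocked()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCKED 8

struct req {
    struct req *blocker;              /* plays the role of fl_blocker */
    struct req *blocked[MAX_BLOCKED]; /* plays the role of fl_blocked_requests */
    size_t nblocked;
};

/* Move every request blocked on src over to dst and re-point it at dst. */
static void move_blocked(struct req *dst, struct req *src)
{
    size_t i;

    assert(dst->nblocked + src->nblocked <= MAX_BLOCKED);
    for (i = 0; i < src->nblocked; i++)
        dst->blocked[dst->nblocked++] = src->blocked[i];
    src->nblocked = 0;

    /*
     * Walk the TARGET list: the source is empty by now, so iterating
     * over it would update nothing.
     */
    for (i = 0; i < dst->nblocked; i++)
        dst->blocked[i]->blocker = dst;
}
```

Updating every entry in the target list, including any that were already there, is slightly more work than strictly needed but keeps the linkage invariant trivially true.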
     

17 Dec, 2018

1 commit


07 Dec, 2018

5 commits

  • - spaces before tabs,
    - spaces at the end of lines,
    - multiple blank lines,
    - blank lines before EXPORT_SYMBOL,
    can all go.

    Signed-off-by: NeilBrown
    Reviewed-by: J. Bruce Fields
    Signed-off-by: Jeff Layton

    NeilBrown
     
  • posix_unblock_lock() is not specific to posix locks, and behaves
    nearly identically to locks_delete_block() - the former returning a
    status while the latter doesn't.

    So discard posix_unblock_lock() and use locks_delete_block() instead,
    after giving that function an appropriate return value.

    Signed-off-by: NeilBrown
    Reviewed-by: J. Bruce Fields
    Signed-off-by: Jeff Layton

    NeilBrown
     
  • When we find an existing lock which conflicts with a request,
    and the request wants to wait, we currently add the request
    to a list. When the lock is removed, the whole list is woken.
    This can cause the thundering-herd problem.
    To reduce the problem, we make use of the (new) fact that
    a pending request can itself have a list of blocked requests.
    When we find a conflict, we look through the existing blocked requests.
    If any one of them blocks the new request, the new request is attached
    below that request, otherwise it is added to the list of blocked
    requests, which are now known to be mutually non-conflicting.

    This way, when the lock is released, only a set of non-conflicting
    locks will be woken, the rest can stay asleep.
    If the lock request cannot be granted and the request needs to be
    requeued, all the other requests it blocks will then be woken.

    To make this more concrete:

    If you have a many-core machine, and have many threads all wanting to
    briefly lock a given file (udev is known to do this), you can get quite
    poor performance.

    When one thread releases a lock, it wakes up all other threads that
    are waiting (classic thundering-herd) - one will get the lock and the
    others go to sleep.
    When you have few cores, this is not very noticeable: by the time the
    4th or 5th thread gets enough CPU time to try to claim the lock, the
    earlier threads have claimed it, done what was needed, and released.
    So with few cores, many of the threads don't end up contending.
    With 50+ cores, lots of threads can get the CPU at the same time,
    and the contention can easily be measured.

    This patchset creates a tree of pending lock requests in which siblings
    don't conflict and each lock request does conflict with its parent.
    When a lock is released, only requests which don't conflict with each
    other are woken.

    Testing shows that lock-acquisitions-per-second is now fairly stable
    even as the number of contending processes goes to 1000. Without this
    patch, locks-per-second drops off steeply after a few 10s of
    processes.

    There is a small cost to this extra complexity.
    At 20 processes running a particular test on 72 cores, the lock
    acquisitions per second drops from 1.8 million to 1.4 million with
    this patch. For 100 processes, this patch still provides 1.4 million
    while without this patch there are about 700,000.

    Reported-and-tested-by: Martin Wilck
    Signed-off-by: NeilBrown
    Reviewed-by: J. Bruce Fields
    Signed-off-by: Jeff Layton

    NeilBrown
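
The insertion rule described above (descend under the first blocked request that conflicts with the newcomer, otherwise join the current list) can be sketched like this. It is a simplified model, not the kernel implementation: locks are exclusive byte ranges, children are kept in a fixed-size array, and `struct req`, `conflicts()` and `insert_block()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 8

struct req {
    long start, end;                    /* inclusive byte range, exclusive lock */
    struct req *blocker;                /* parent in the tree */
    struct req *children[MAX_CHILDREN]; /* requests blocked directly on this one */
    size_t nchildren;
};

/* Two exclusive range locks conflict when their ranges overlap. */
static bool conflicts(const struct req *a, const struct req *b)
{
    return a->start <= b->end && b->start <= a->end;
}

/* Attach waiter somewhere in the tree rooted at blocker. */
static void insert_block(struct req *blocker, struct req *waiter)
{
    size_t i;

again:
    for (i = 0; i < blocker->nchildren; i++) {
        if (conflicts(blocker->children[i], waiter)) {
            /* Queue behind the conflicting sibling instead. */
            blocker = blocker->children[i];
            goto again;
        }
    }
    /* No child conflicts, so siblings stay mutually non-conflicting. */
    assert(blocker->nchildren < MAX_CHILDREN);
    waiter->blocker = blocker;
    blocker->children[blocker->nchildren++] = waiter;
}
```

When the lock at the root is released, only its direct children need waking, and by construction none of them conflict with each other.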
     
  • posix_locks_conflict() and flock_locks_conflict() both return int.
    leases_conflict() returns bool.

    This inconsistency will cause problems for the next patch if not
    fixed.

    So change posix_locks_conflict() and flock_locks_conflict() to return
    bool.
    Also change the locks_conflict() helper.

    And convert some
    return (foo);
    to
    return foo;

    Signed-off-by: NeilBrown
    Reviewed-by: J. Bruce Fields
    Signed-off-by: Jeff Layton

    NeilBrown
     
  • Now that requests can block other requests, we
    need to be careful to always clean up those blocked
    requests.
    Any time that we wait for a request, we might have
    other requests attached, and when we stop waiting,
    we must clean them up.
    If the lock was granted, the requests might have been
    moved to the new lock, though when merged with a
    pre-existing lock, this might not happen.
    In all cases we don't want blocked locks to remain
    attached, so we remove them to be safe.

    Signed-off-by: NeilBrown
    Reviewed-by: J. Bruce Fields
    Tested-by: syzbot+a4a3d526b4157113ec6a@syzkaller.appspotmail.com
    Tested-by: kernel test robot
    Signed-off-by: Jeff Layton

    NeilBrown
     

01 Dec, 2018

4 commits

  • Currently, a lock can block pending requests, but all pending
    requests are equal. If lots of pending requests are
    mutually exclusive, this means they will all be woken up
    and all but one will fail. This can hurt performance.

    So we will allow pending requests to block other requests.
    Only the first request will be woken, and it will wake the others.

    This patch doesn't implement this fully, but prepares the way.

    - It acknowledges that a request might be blocking other requests,
    and when the request is converted to a lock, those blocked
    requests are moved across.
    - When a request is requeued or discarded, all blocked requests are
    woken.
    - When deadlock-detection looks for the lock which blocks a
    given request, we follow the chain of ->fl_blocker all
    the way to the top.

    Tested-by: kernel test robot
    Signed-off-by: NeilBrown
    Reviewed-by: J. Bruce Fields
    Signed-off-by: Jeff Layton

    NeilBrown
     
  • Both locks_remove_posix() and locks_remove_flock() use a
    struct file_lock without calling locks_init_lock() on it.
    This means the various list_heads are not initialized, which
    will become a problem with a later patch.

    So change them both to initialize properly. For flock locks,
    this involves using flock_make_lock(), and changing it to
    allow a file_lock to be passed in, so memory allocation isn't
    always needed.

    Signed-off-by: NeilBrown
    Reviewed-by: J. Bruce Fields
    Signed-off-by: Jeff Layton

    NeilBrown
     
  • This functionality will be useful in future patches, so
    split it out from locks_wake_up_blocks().

    Signed-off-by: NeilBrown
    Reviewed-by: J. Bruce Fields
    Signed-off-by: Jeff Layton

    NeilBrown
     
  • struct file_lock contains an 'fl_next' pointer which
    is used to point to the lock that this request is blocked
    waiting for. So rename it to fl_blocker.

    The fl_blocked list_head in an active lock is the head of a list of
    blocked requests. In a request it is a node in that list.
    These are two distinct uses, so replace with two list_heads
    with different names.
    fl_blocked_requests is the head of a list of blocked requests
    fl_blocked_member is a node in that list.

    The two different list_heads are never used at the same time, but that
    will change in a future patch.

    Note that a tracepoint is changed to report fl_blocker instead
    of fl_next.

    Signed-off-by: NeilBrown
    Reviewed-by: J. Bruce Fields
    Signed-off-by: Jeff Layton

    NeilBrown
     

22 Aug, 2018

2 commits

  • Pull overlayfs updates from Miklos Szeredi:
    "This contains two new features:

    - Stack file operations: this allows removal of several hacks from
    the VFS, proper interaction of read-only open files with copy-up,
    possibility to implement fs modifying ioctls properly, and others.

    - Metadata only copy-up: when file is on lower layer and only
    metadata is modified (except size) then only copy up the metadata
    and continue to use the data from the lower file"

    * tag 'ovl-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (66 commits)
    ovl: Enable metadata only feature
    ovl: Do not do metacopy only for ioctl modifying file attr
    ovl: Do not do metadata only copy-up for truncate operation
    ovl: add helper to force data copy-up
    ovl: Check redirect on index as well
    ovl: Set redirect on upper inode when it is linked
    ovl: Set redirect on metacopy files upon rename
    ovl: Do not set dentry type ORIGIN for broken hardlinks
    ovl: Add an inode flag OVL_CONST_INO
    ovl: Treat metacopy dentries as type OVL_PATH_MERGE
    ovl: Check redirects for metacopy files
    ovl: Move some dir related ovl_lookup_single() code in else block
    ovl: Do not expose metacopy only dentry from d_real()
    ovl: Open file with data except for the case of fsync
    ovl: Add helper ovl_inode_realdata()
    ovl: Store lower data inode in ovl_inode
    ovl: Fix ovl_getattr() to get number of blocks from lower
    ovl: Add helper ovl_dentry_lowerdata() to get lower data dentry
    ovl: Copy up meta inode data from lowest data inode
    ovl: Modify ovl_lookup() and friends to lookup metacopy dentry
    ...

    Linus Torvalds
     
  • …iederm/user-namespace

    Pull core signal handling updates from Eric Biederman:
    "It was observed that a periodic timer in combination with a
    sufficiently expensive fork could prevent fork from ever completing.
    This contains the changes to remove the need for that restart.

    This set of changes is split into several parts:

    - The first part makes PIDTYPE_TGID a proper pid type instead of
    something only for very special cases. This part starts using
    PIDTYPE_TGID enough so that in __send_signal where signals are
    actually delivered we know if the signal is being sent to a group
    of processes or just a single process.

    - With that prep work out of the way the logic in fork is modified so
    that fork logically makes signals received while it is running
    appear to be received after the fork completes"

    * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (22 commits)
    signal: Don't send signals to tasks that don't exist
    signal: Don't restart fork when signals come in.
    fork: Have new threads join on-going signal group stops
    fork: Skip setting TIF_SIGPENDING in ptrace_init_task
    signal: Add calculate_sigpending()
    fork: Unconditionally exit if a fatal signal is pending
    fork: Move and describe why the code examines PIDNS_ADDING
    signal: Push pid type down into complete_signal.
    signal: Push pid type down into __send_signal
    signal: Push pid type down into send_signal
    signal: Pass pid type into do_send_sig_info
    signal: Pass pid type into send_sigio_to_task & send_sigurg_to_task
    signal: Pass pid type into group_send_sig_info
    signal: Pass pid and pid type into send_sigqueue
    posix-timers: Noralize good_sigevent
    signal: Use PIDTYPE_TGID to clearly store where file signals will be sent
    pid: Implement PIDTYPE_TGID
    pids: Move the pgrp and session pid pointers from task_struct to signal_struct
    kvm: Don't open code task_pid in kvm_vcpu_ioctl
    pids: Compute task_tgid using signal->leader_pid
    ...

    Linus Torvalds
     

14 Aug, 2018

1 commit

  • Pull file locking updates from Jeff Layton:
    "Just a couple of patches from Konstantin to fix /proc/locks when the
    process that set the lock has exited, and a new tracepoint for the
    flock() codepath. Also threw in mailmap entries for my addresses and a
    comment cleanup"

    * tag 'locks-v4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
    locks: remove misleading obsolete comment
    mailmap: remap some of my email addresses to kernel.org address
    locks: add tracepoint in flock codepath
    fs/lock: show locks taken by processes from another pidns
    fs/lock: skip lock owner pid translation in case we are in init_pid_ns

    Linus Torvalds
     

09 Aug, 2018

1 commit

  • The spinlock handling in this file has changed significantly since this
    comment was written, and the file_lock_lock is no more. In addition,
    this overall comment no longer applies. Deleting an entry now requires
    both locks.

    Signed-off-by: Jeff Layton

    Jeff Layton
     

07 Aug, 2018

1 commit


21 Jul, 2018

1 commit

  • When f_setown is called, a pid and a pid type are stored. Replace the use
    of PIDTYPE_PID with PIDTYPE_TGID, as PIDTYPE_TGID goes to the entire thread
    group. Replace the use of PIDTYPE_MAX with PIDTYPE_PID, as PIDTYPE_PID now
    is only for a thread.

    Update the users of __f_setown to use PIDTYPE_TGID instead of
    PIDTYPE_PID.

    For now the code continues to capture task_pid (when task_tgid would
    really be appropriate), and iterate on PIDTYPE_PID (even when type ==
    PIDTYPE_TGID) out of an abundance of caution to preserve existing
    behavior.

    Oleg Nesterov suggested that the test used to ensure we use PIDTYPE_PID
    for tgid lookup also be used to avoid taking the tasklist lock.

    Suggested-by: Oleg Nesterov
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

18 Jul, 2018

2 commits

  • This partially reverts commit c568d68341be7030f5647def68851e469b21ca11.

    Overlayfs files will now automatically get the correct locks, no need to
    hack overlay support in VFS.

    It is a partial revert, because it leaves the locks_inode() calls in place
    and defines locks_inode() to file_inode(). We could revert those as well,
    but it would be unnecessary code churn and it makes sense to document that
    we are getting the inode for locking purposes.

    Don't revert MS_NOREMOTELOCK yet since that has been part of the userspace
    API for some time (though not in a useful way). Will try to remove
    internal flags later when the dust around the new mount API settles.

    Signed-off-by: Miklos Szeredi
    Acked-by: Jeff Layton

    Miklos Szeredi
     
  • This reverts commit 4d0c5ba2ff79ef9f5188998b29fd28fcb05f3667.

    We now get write access on both overlay and underlying layers so this patch
    is no longer needed for correct operation.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

15 Jun, 2018

1 commit

  • Pull inode timestamps conversion to timespec64 from Arnd Bergmann:
    "This is a late set of changes from Deepa Dinamani doing an automated
    treewide conversion of the inode and iattr structures from 'timespec'
    to 'timespec64', to push the conversion from the VFS layer into the
    individual file systems.

    As Deepa writes:

    'The series aims to switch vfs timestamps to use struct timespec64.
    Currently vfs uses struct timespec, which is not y2038 safe.

    The series involves the following:
    1. Add vfs helper functions for supporting struct timespec64
    timestamps.
    2. Cast prints of vfs timestamps to avoid warnings after the switch.
    3. Simplify code using vfs timestamps so that the actual replacement
    becomes easy.
    4. Convert vfs timestamps to use struct timespec64 using a script.
    This is a flag day patch.

    Next steps:
    1. Convert APIs that can handle timespec64, instead of converting
    timestamps at the boundaries.
    2. Update internal data structures to avoid timestamp conversions'

    Thomas Gleixner adds:

    'I think there is no point to drag that out for the next merge
    window. The whole thing needs to be done in one go for the core
    changes which means that you're going to play that catchup game
    forever. Let's get over with it towards the end of the merge window'"

    * tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
    pstore: Remove bogus format string definition
    vfs: change inode times to use struct timespec64
    pstore: Convert internal records to timespec64
    udf: Simplify calls to udf_disk_stamp_to_time
    fs: nfs: get rid of memcpys for inode times
    ceph: make inode time prints to be long long
    lustre: Use long long type to print inode time
    fs: add timespec64_truncate()

    Linus Torvalds
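
The y2038 limit mentioned in the pull message can be made concrete with a little arithmetic. This is a minimal userspace sketch, not kernel code: the helper name format_utc is hypothetical, and it only formats a seconds-since-epoch value so the largest value a signed 32-bit counter can hold becomes visible.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: format `secs` (seconds since the Unix epoch) as a
 * UTC timestamp in `buf`. Returns 0 on success, -1 on failure.
 */
int format_utc(int64_t secs, char *buf, size_t len)
{
    time_t t = (time_t)secs;
    struct tm *tm;

    assert(buf != NULL && len > 0);

    tm = gmtime(&t);
    if (tm == NULL)
        return -1;

    /* strftime returns 0 if the result did not fit in the buffer. */
    return strftime(buf, len, "%Y-%m-%d %H:%M:%S", tm) > 0 ? 0 : -1;
}
```

Passing INT32_MAX gives 2038-01-19 03:14:07, the last second a signed 32-bit seconds field can represent; a 64-bit seconds field, as in struct timespec64, has no such near-term limit.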
     

14 Jun, 2018

1 commit

  • Currently if we face a lock taken by a process invisible in the current
    pidns we skip the lock completely, but this

    1) makes the output not that nice
    (root@vz7)/: cat /proc/${PID_A2}/fdinfo/3
    pos: 4
    flags: 02100002
    mnt_id: 257
    lock: (root@vz7)/:

    2) makes it more difficult to debug issues with leaked flocks
    if you get an error on lock, but don't see any locks in /proc/$id/fdinfo/$file

    Let's show information about such locks again as previously, but
    show zero in the owner pid field.

    After the patch:
    ===============
    (root@vz7)/:cat /proc/${PID_A2}/fdinfo/3
    pos: 4
    flags: 02100002
    mnt_id: 295
    lock: 1: FLOCK ADVISORY WRITE 0 b6:f8a61:529946 0 EOF

    Fixes: 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks")
    Signed-off-by: Konstantin Khorenko
    Acked-by: Andrey Vagin
    Reviewed-by: Benjamin Coddington
    Signed-off-by: Jeff Layton

    Konstantin Khorenko
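
The fdinfo output quoted above can be reproduced from a small program. The following is a sketch under assumptions: read_fdinfo_lock_line is a hypothetical helper name, and it relies on /proc being mounted and on the kernel reporting lock lines in /proc/self/fdinfo/<fd> in the format shown in the commit message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy the first "lock:" line found in
 * /proc/self/fdinfo/<fd> into `buf`.
 *
 * Returns 0 if a lock line was found, -1 if the file could not be opened
 * or contains no lock line (buf is then set to the empty string).
 */
int read_fdinfo_lock_line(int fd, char *buf, size_t len)
{
    char path[64];
    FILE *fp;
    int found = -1;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", fd);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    while (fgets(buf, (int)len, fp) != NULL) {
        if (strncmp(buf, "lock:", 5) == 0) {
            found = 0;
            break;
        }
    }
    fclose(fp);

    if (found != 0)
        buf[0] = '\0';
    return found;
}
```

After a successful flock(fd, LOCK_EX) on a descriptor, the returned line is expected to resemble the "lock: 1: FLOCK ADVISORY WRITE ..." line above. The owner pid field holds the caller's pid when the lock owner is visible in the reader's pid namespace and, per this patch, zero when it is not.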