24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove fall-through
    markings where they are unnecessary.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
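    A minimal sketch of how a fallthrough pseudo-keyword can be defined and
    used, along the lines of the kernel's definition in
    include/linux/compiler_attributes.h (treat the exact header contents as
    an assumption; stages_remaining() is a made-up example function):

```c
#include <assert.h>

/* Older compilers may not provide __has_attribute at all. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/*
 * Expand to the statement attribute when the compiler supports it;
 * otherwise expand to an empty statement so the code still compiles.
 */
#if __has_attribute(__fallthrough__)
#define fallthrough __attribute__((__fallthrough__))
#else
#define fallthrough do {} while (0) /* fallthrough */
#endif

/* Hypothetical example: count the setup stages left to run from `stage`. */
static int stages_remaining(int stage)
{
	int n = 0;

	switch (stage) {
	case 0:
		n++;
		fallthrough;	/* intentionally continue into case 1 */
	case 1:
		n++;
		fallthrough;	/* intentionally continue into case 2 */
	case 2:
		n++;
		break;
	default:
		break;
	}
	return n;
}
```

    Unlike a /* fall through */ comment, the macro does not depend on how a
    given compiler or tool parses comments, so -Wimplicit-fallthrough can
    be enforced uniformly.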
     

10 Aug, 2020

1 commit

  • Pull NFS server updates from Chuck Lever:
    "Highlights:
    - Support for user extended attributes on NFS (RFC 8276)
    - Further reduce unnecessary NFSv4 delegation recalls

    Notable fixes:
    - Fix recent krb5p regression
    - Address a few resource leaks and a rare NULL dereference

    Other:
    - De-duplicate RPC/RDMA error handling and other utility functions
    - Replace storage and display of kernel memory addresses by tracepoints"

    * tag 'nfsd-5.9' of git://git.linux-nfs.org/projects/cel/cel-2.6: (38 commits)
    svcrdma: CM event handler clean up
    svcrdma: Remove transport reference counting
    svcrdma: Fix another Receive buffer leak
    SUNRPC: Refresh the show_rqstp_flags() macro
    nfsd: netns.h: delete a duplicated word
    SUNRPC: Fix ("SUNRPC: Add "@len" parameter to gss_unwrap()")
    nfsd: avoid a NULL dereference in __cld_pipe_upcall()
    nfsd4: a client's own opens needn't prevent delegations
    nfsd: Use seq_putc() in two functions
    svcrdma: Display chunk completion ID when posting a rw_ctxt
    svcrdma: Record send_ctxt completion ID in trace_svcrdma_post_send()
    svcrdma: Introduce Send completion IDs
    svcrdma: Record Receive completion ID in svc_rdma_decode_rqst
    svcrdma: Introduce Receive completion IDs
    svcrdma: Introduce infrastructure to support completion IDs
    svcrdma: Add common XDR encoders for RDMA and Read segments
    svcrdma: Add common XDR decoders for RDMA and Read segments
    SUNRPC: Add helpers for decoding list discriminators symbolically
    svcrdma: Remove declarations for functions long removed
    svcrdma: Clean up trace_svcrdma_send_failed() tracepoint
    ...

    Linus Torvalds
     

04 Aug, 2020

1 commit


14 Jul, 2020

1 commit

  • We recently fixed lease breaking so that a client's actions won't break
    its own delegations.

    But we still have an unnecessary self-conflict when granting
    delegations: a client's own write opens will prevent us from handing out
    a read delegation even when no other client has the file open for write.

    Fix that by turning off the checks for conflicting opens under
    vfs_setlease, and instead performing those checks in the nfsd code.

    We don't depend much on locks here: instead we acquire the delegation,
    then check for conflicts, and drop the delegation again if we find any.

    The check beforehand is an optimization of sorts, just to avoid
    acquiring the delegation unnecessarily. There's a race where the first
    check could cause us to deny the delegation when we could have granted
    it. But, that's OK, delegation grants are optional (and probably not
    even a good idea in that case).

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Chuck Lever

    J. Bruce Fields
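    The acquire-then-recheck pattern described above can be sketched in a
    single-threaded model. Everything here (struct nfs_file_model,
    try_grant_read_deleg(), the fixed-size open table) is hypothetical and
    stands in for the real nfsd and vfs_setlease() structures; in the kernel
    the second check matters because another client's open can race in
    between the two checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OPENS 8

/* Hypothetical, simplified model of a file as nfsd sees it. */
struct open_rec {
	int client_id;
	bool for_write;
};

struct nfs_file_model {
	struct open_rec opens[MAX_OPENS];
	size_t nr_opens;
	int deleg_holder;	/* client holding the read delegation, or -1 */
};

/* Only another client's write open conflicts with a read delegation. */
static bool has_conflicting_open(const struct nfs_file_model *f, int client_id)
{
	for (size_t i = 0; i < f->nr_opens; i++)
		if (f->opens[i].for_write && f->opens[i].client_id != client_id)
			return true;
	return false;
}

static bool try_grant_read_deleg(struct nfs_file_model *f, int client_id)
{
	/*
	 * Optimization: skip acquiring the delegation when a conflict is
	 * already visible. In real code this check can race and wrongly
	 * deny a grant, which is acceptable because delegations are optional.
	 */
	if (has_conflicting_open(f, client_id))
		return false;

	f->deleg_holder = client_id;		/* acquire the delegation */

	/*
	 * Authoritative check after acquiring. In this single-threaded model
	 * it always agrees with the first one; in the kernel an open from
	 * another client could have arrived in between.
	 */
	if (has_conflicting_open(f, client_id)) {
		f->deleg_holder = -1;		/* drop it again */
		return false;
	}
	return true;
}
```

    A client's own write open no longer blocks its read delegation, while
    another client's write open still does.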
     

12 Jun, 2020

1 commit

  • Pull nfsd updates from Bruce Fields:
    "Highlights:

    - Keep nfsd clients from unnecessarily breaking their own
    delegations.

    Note this requires a small kthreadd addition. The result is Tejun
    Heo's suggestion (see link), and he was OK with this going through
    my tree.

    - Patch nfsd/clients/ to display filenames, and to fix byte-order
    when displaying stateid's.

    - fix a module loading/unloading bug, from Neil Brown.

    - A big series from Chuck Lever with RPC/RDMA and tracing
    improvements, laying some groundwork for RPC-over-TLS"

    Link: https://lore.kernel.org/r/1588348912-24781-1-git-send-email-bfields@redhat.com

    * tag 'nfsd-5.8' of git://linux-nfs.org/~bfields/linux: (49 commits)
    sunrpc: use kmemdup_nul() in gssp_stringify()
    nfsd: safer handling of corrupted c_type
    nfsd4: make drc_slab global, not per-net
    SUNRPC: Remove unreachable error condition in rpcb_getport_async()
    nfsd: Fix svc_xprt refcnt leak when setup callback client failed
    sunrpc: clean up properly in gss_mech_unregister()
    sunrpc: svcauth_gss_register_pseudoflavor must reject duplicate registrations.
    sunrpc: check that domain table is empty at module unload.
    NFSD: Fix improperly-formatted Doxygen comments
    NFSD: Squash an annoying compiler warning
    SUNRPC: Clean up request deferral tracepoints
    NFSD: Add tracepoints for monitoring NFSD callbacks
    NFSD: Add tracepoints to the NFSD state management code
    NFSD: Add tracepoints to NFSD's duplicate reply cache
    SUNRPC: svc_show_status() macro should have enum definitions
    SUNRPC: Restructure svc_udp_recvfrom()
    SUNRPC: Refactor svc_recvfrom()
    SUNRPC: Clean up svc_release_skb() functions
    SUNRPC: Refactor recvfrom path dealing with incomplete TCP receives
    SUNRPC: Replace dprintk() call sites in TCP receive path
    ...

    Linus Torvalds
     

05 Jun, 2020

1 commit

  • Pull proc updates from Eric Biederman:
    "This has four sets of changes:

    - modernize proc to support multiple private instances

    - ensure we see the exit of each process tid exactly once

    - remove has_group_leader_pid

    - use pids not tasks in posix-cpu-timers lookup

    Alexey updated proc so each mount of proc uses a new superblock. This
    allows people to actually use mount options with proc with no fear of
    messing up another mount of proc. Given the kernel's internal mounts
    of proc for things like UML, this was a real problem, and resulted in
    Android's hidepid mount options being ignored and introducing security
    issues.

    The rest of the changes are small cleanups and fixes that came out of
    my work to allow this change to proc. In essence it is swapping the
    pids in de_thread during exec which removes a special case the code
    had to handle. Then updating the code to stop handling that special
    case"

    * 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    proc: proc_pid_ns takes super_block as an argument
    remove the no longer needed pid_alive() check in __task_pid_nr_ns()
    posix-cpu-timers: Replace __get_task_for_clock with pid_for_clock
    posix-cpu-timers: Replace cpu_timer_pid_type with clock_pid_type
    posix-cpu-timers: Extend rcu_read_lock removing task_struct references
    signal: Remove has_group_leader_pid
    exec: Remove BUG_ON(has_group_leader_pid)
    posix-cpu-timer: Unify the now redundant code in lookup_task
    posix-cpu-timer: Tidy up group_leader logic in lookup_task
    proc: Ensure we see the exit of each process tid exactly once
    rculist: Add hlists_swap_heads_rcu
    proc: Use PIDTYPE_TGID in next_tgid
    Use proc_pid_ns() to get pid_namespace from the proc superblock
    proc: use named enums for better readability
    proc: use human-readable values for hidepid
    docs: proc: add documentation for "hidepid=4" and "subset=pid" options and new mount behavior
    proc: add option to mount only a pids subset
    proc: instantiate only pids that we can ptrace on 'hidepid=4' mount option
    proc: allow to mount many instances of proc in one pid namespace
    proc: rename struct proc_fs_info to proc_fs_opts

    Linus Torvalds
     

03 Jun, 2020

1 commit


19 May, 2020

1 commit

  • syzbot found that

    touch /proc/testfile

    causes NULL pointer dereference at tomoyo_get_local_path()
    because inode of the dentry is NULL.

    Before c59f415a7cb6, Tomoyo received pid_ns from proc's s_fs_info
    directly. Since proc_pid_ns() can only work with an inode, using it in
    tomoyo_get_local_path() was wrong.

    To avoid creating more functions for getting proc_ns, change the
    argument type of the proc_pid_ns() function. Then, Tomoyo can use
    the existing super_block to get pid_ns.

    Link: https://lkml.kernel.org/r/0000000000002f0c7505a5b0e04c@google.com
    Link: https://lkml.kernel.org/r/20200518180738.2939611-1-gladkov.alexey@gmail.com
    Reported-by: syzbot+c1af344512918c61362c@syzkaller.appspotmail.com
    Fixes: c59f415a7cb6 ("Use proc_pid_ns() to get pid_namespace from the proc superblock")
    Signed-off-by: Alexey Gladkov
    Signed-off-by: Eric W. Biederman

    Alexey Gladkov
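    A userspace sketch of why taking a super_block is safer than taking an
    inode: a dentry whose inode is NULL (a negative dentry, as in the
    touch /proc/testfile case above) still has a valid d_sb. The structures
    below are hypothetical and reduced to the fields needed; the helper
    names mirror proc_sb_info()/proc_pid_ns() but are not the kernel
    definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for the kernel structures. */
struct pid_namespace { int level; };
struct proc_fs_info { struct pid_namespace *pid_ns; };
struct super_block { void *s_fs_info; };
struct inode { struct super_block *i_sb; };
struct dentry {
	struct inode *d_inode;		/* NULL for a negative dentry */
	struct super_block *d_sb;	/* valid even when d_inode is NULL */
};

/* Helper that hides the concrete type stored in s_fs_info. */
static struct proc_fs_info *proc_sb_info(struct super_block *sb)
{
	return sb->s_fs_info;
}

/* Takes a super_block, so callers without an inode can still use it. */
static struct pid_namespace *proc_pid_ns(struct super_block *sb)
{
	return proc_sb_info(sb)->pid_ns;
}

/* Caller in the style of tomoyo_get_local_path(): use d_sb, not d_inode. */
static struct pid_namespace *ns_for_dentry(const struct dentry *dentry)
{
	return proc_pid_ns(dentry->d_sb);
}
```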
     

09 May, 2020

1 commit

  • We currently revoke read delegations on any write open or any operation
    that modifies file data or metadata (including rename, link, and
    unlink). But if the delegation in question is the only read delegation
    and is held by the client performing the operation, that's not really
    necessary.

    It's not always possible to prevent this in the NFSv4.0 case, because
    there's not always a way to determine which client an NFSv4.0 delegation
    came from. (In theory we could try to guess this from the transport
    layer, e.g., by assuming all traffic on a given TCP connection comes
    from the same client. But that's not really correct.)

    In the NFSv4.1 case the session layer always tells us the client.

    This patch should remove such self-conflicts in all cases where we can
    reliably determine the client from the compound.

    To do that we need to track "who" is performing a given (possibly
    lease-breaking) file operation. We're doing that by storing the
    information in the svc_rqst and using kthread_data() to map the current
    task back to a svc_rqst.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
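    A single-file sketch of the "who is breaking the lease" check, using a
    C11 _Thread_local pointer as a userspace analogue of mapping the current
    kthread back to its svc_rqst via kthread_data(). struct request,
    current_request and must_break_delegation() are hypothetical names, and
    the sketch only covers the single-read-delegation case described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-request state, standing in for struct svc_rqst. */
struct request {
	int client_id;	/* -1 if the client cannot be determined reliably */
};

/*
 * Stand-in for kthread_data(current): each server thread records the
 * request it is currently processing.
 */
static _Thread_local struct request *current_request;

/* Must a read delegation held by `holder` be broken for this operation? */
static bool must_break_delegation(int holder)
{
	struct request *rq = current_request;

	if (rq == NULL || rq->client_id < 0)
		return true;		/* unknown breaker: stay conservative */
	return rq->client_id != holder;	/* no self-conflict */
}
```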
     

05 May, 2020

1 commit


25 Apr, 2020

1 commit

  • A special helper should be used to get the pid_namespace from the
    procfs superblock. This will avoid errors if the type of s_fs_info
    changes.

    Link: https://lore.kernel.org/lkml/20200423200316.164518-3-gladkov.alexey@gmail.com/
    Link: https://lore.kernel.org/lkml/20200423112858.95820-1-gladkov.alexey@gmail.com/
    Link: https://lore.kernel.org/lkml/06B50A1C-406F-4057-BFA8-3A7729EA7469@lca.pw/
    Signed-off-by: Alexey Gladkov
    Signed-off-by: Eric W. Biederman

    Alexey Gladkov
     

19 Mar, 2020

1 commit

  • There is measurable performance impact in some synthetic tests due to
    commit 6d390e4b5d48 (locks: fix a potential use-after-free problem when
    wakeup a waiter). Fix the race condition instead by clearing the
    fl_blocker pointer after the wake_up, using explicit acquire/release
    semantics.

    This does mean that we can no longer use the clearing of fl_blocker as
    the wait condition, so switch the waiters over to checking whether the
    fl_blocked_member list_head is empty.

    Reviewed-by: yangerkun
    Reviewed-by: NeilBrown
    Fixes: 6d390e4b5d48 (locks: fix a potential use-after-free problem when wakeup a waiter)
    Signed-off-by: Jeff Layton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
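    A single-threaded C11 sketch of the ordering described above, using
    <stdatomic.h> release/acquire in place of the kernel's
    smp_store_release()/smp_load_acquire(). struct fake_lock and its helpers
    are hypothetical; the real locks_delete_block() fast path also checks
    that fl_blocked_requests is empty, and real waiters sleep on a wait
    queue rather than polling.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct file_lock. */
struct fake_lock {
	_Atomic(struct fake_lock *) blocker;	/* like fl_blocker */
	atomic_bool on_blocked_list;		/* like !list_empty(&fl_blocked_member) */
	atomic_int wakeups;			/* counts wake_up(&fl_wait) calls */
};

/* Waker side, in the order used by the fix. */
static void wake_waiter(struct fake_lock *waiter)
{
	/* 1. Unlink from the blocker's list: the wait condition becomes true. */
	atomic_store_explicit(&waiter->on_blocked_list, false, memory_order_relaxed);
	/* 2. Wake the waiter. */
	atomic_fetch_add_explicit(&waiter->wakeups, 1, memory_order_relaxed);
	/*
	 * 3. Last access to *waiter: publish "done" with release semantics.
	 *    After this store the waiter may free the lock.
	 */
	atomic_store_explicit(&waiter->blocker, NULL, memory_order_release);
}

/* The waiter's sleep condition tests list membership, not fl_blocker. */
static bool wait_done(struct fake_lock *waiter)
{
	return !atomic_load_explicit(&waiter->on_blocked_list, memory_order_relaxed);
}

/* Lockless fast path before freeing: pairs with the release store above. */
static bool safe_to_free(struct fake_lock *waiter)
{
	return atomic_load_explicit(&waiter->blocker, memory_order_acquire) == NULL;
}
```

    Because the blocker pointer is cleared last, a waiter that observes it
    as NULL also observes every earlier write by the waker, including the
    wake-up, so freeing the lock at that point cannot race with wake_up().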
     

07 Mar, 2020

1 commit

  • Commit 16306a61d3b7 ("fs/locks: always delete_block after waiting.")
    added logic to check waiter->fl_blocker without holding
    blocked_lock_lock. This can trigger a UAF when we try to wake up a
    waiter:

    Thread 1 has created a write flock a on a file; now thread 2 tries to
    unlock and delete flock a, while thread 3 tries to add flock b on the
    same file.

    Thread2                             Thread3
                                        flock syscall(create flock b)
                                        ...flock_lock_inode_wait
                                          flock_lock_inode(will insert
                                          our fl_blocked_member list
                                          to flock a's fl_blocked_requests)
                                          sleep
    flock syscall(unlock)
    ...flock_lock_inode_wait
      locks_delete_lock_ctx
      ...__locks_wake_up_blocks
        __locks_delete_blocks(
          b->fl_blocker = NULL)
        ...
                                        break by a signal
                                        locks_delete_block
                                          b->fl_blocker == NULL &&
                                          list_empty(&b->fl_blocked_requests)
                                          success, return directly
                                        locks_free_lock b
        wake_up(&b->fl_waiter)
        trigger UAF

    Fix it by removing this logic. This patch may also fix CVE-2019-19769.

    Cc: stable@vger.kernel.org
    Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.")
    Signed-off-by: yangerkun
    Signed-off-by: Jeff Layton

    yangerkun
     

29 Dec, 2019

1 commit


28 Sep, 2019

1 commit

  • Pull nfsd updates from Bruce Fields:
    "Highlights:

    - Add a new knfsd file cache, so that we don't have to open and close
    on each (NFSv2/v3) READ or WRITE. This can speed up read and write
    in some cases. It also replaces our readahead cache.

    - Prevent silent data loss on write errors, by treating write errors
    like server reboots for the purposes of write caching, thus forcing
    clients to resend their writes.

    - Tweak the code that allocates sessions to be more forgiving, so
    that NFSv4.1 mounts are less likely to hang when a server already
    has a lot of clients.

    - Eliminate an arbitrary limit on NFSv4 ACL sizes; they should now be
    limited only by the backend filesystem and the maximum RPC size.

    - Allow the server to enforce use of the correct kerberos credentials
    when a client reclaims state after a reboot.

    And some miscellaneous smaller bugfixes and cleanup"

    * tag 'nfsd-5.4' of git://linux-nfs.org/~bfields/linux: (34 commits)
    sunrpc: clean up indentation issue
    nfsd: fix nfs read eof detection
    nfsd: Make nfsd_reset_boot_verifier_locked static
    nfsd: degraded slot-count more gracefully as allocation nears exhaustion.
    nfsd: handle drc over-allocation gracefully.
    nfsd: add support for upcall version 2
    nfsd: add a "GetVersion" upcall for nfsdcld
    nfsd: Reset the boot verifier on all write I/O errors
    nfsd: Don't garbage collect files that might contain write errors
    nfsd: Support the server resetting the boot verifier
    nfsd: nfsd_file cache entries should be per net namespace
    nfsd: eliminate an unnecessary acl size limit
    Deprecate nfsd fault injection
    nfsd: remove duplicated include from filecache.c
    nfsd: Fix the documentation for svcxdr_tmpalloc()
    nfsd: Fix up some unused variable warnings
    nfsd: close cached files prior to a REMOVE or RENAME that would replace target
    nfsd: rip out the raparms cache
    nfsd: have nfsd_test_lock use the nfsd_file cache
    nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
    ...

    Linus Torvalds
     

20 Aug, 2019

1 commit

  • In __break_lease(), the file lock 'new_fl' is allocated in lease_alloc().
    However, it is not deallocated on the path taken when
    smp_load_acquire() returns a NULL lock context, leading to a memory
    leak. To fix this issue, free 'new_fl' before returning.

    Signed-off-by: Wenwen Wang
    Signed-off-by: Jeff Layton

    Wenwen Wang
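    The usual kernel idiom for this kind of fix is a single exit label that
    releases everything allocated so far. The sketch below shows that shape
    with userspace malloc()/free() and a counter that makes a leak visible;
    all names are hypothetical and the lease-breaking logic itself is
    elided. Because new_fl here is only a comparison template, every path
    frees it.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int live_leases;	/* outstanding allocations, to make leaks visible */

/* Hypothetical stand-in for the template lock built by lease_alloc(). */
struct fake_lease { int type; };

static struct fake_lease *fake_lease_alloc(int type)
{
	struct fake_lease *fl = malloc(sizeof(*fl));

	if (fl) {
		fl->type = type;
		live_leases++;
	}
	return fl;
}

static void fake_lease_free(struct fake_lease *fl)
{
	if (fl) {
		live_leases--;
		free(fl);
	}
}

/* `ctx` plays the role of smp_load_acquire(&inode->i_flctx). */
static int fake_break_lease(const void *ctx, int type)
{
	int error = 0;
	struct fake_lease *new_fl = fake_lease_alloc(type);

	if (!new_fl)
		return -ENOMEM;

	if (!ctx)
		goto free_lock;	/* before the fix, this path returned without freeing */

	/* ... compare new_fl against existing leases and break conflicts ... */

free_lock:
	fake_lease_free(new_fl);
	return error;
}
```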
     

19 Aug, 2019

2 commits

  • Have them keep an nfsd_file reference instead of a struct file.

    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust
    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • With the new file caching infrastructure in nfsd, we can end up holding
    files open for an indefinite period of time, even while they are
    idle. This may prevent the kernel from handing out leases on the file,
    which is something we don't want to block.

    Fix this by running an SRCU notifier call chain on every lease
    attempt. nfsd can then purge the cache for that inode before
    returning.

    Since SRCU is only conditionally compiled in, we must only define the
    new chain if it's enabled, and users of the chain must ensure that
    SRCU is enabled.

    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust
    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Jeff Layton
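    The notifier idea can be sketched as a plain callback list invoked at
    the start of each lease attempt. This is a hypothetical, single-threaded
    stand-in: the real chain is an SRCU notifier, so callbacks can run
    without a global lock, and struct lease_nb, sketch_setlease() and
    purge_cache() are invented names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical notifier block: a callback plus a link to the next one. */
struct lease_nb {
	void (*call)(struct lease_nb *nb, int inode_id);
	struct lease_nb *next;
};

static struct lease_nb *lease_chain;	/* head of the chain */

static void lease_chain_register(struct lease_nb *nb)
{
	nb->next = lease_chain;
	lease_chain = nb;
}

/* Called at the start of every lease attempt, before conflict checks. */
static int sketch_setlease(int inode_id)
{
	for (struct lease_nb *nb = lease_chain; nb; nb = nb->next)
		nb->call(nb, inode_id);
	/* ... the real code would now check for conflicting opens ... */
	return 0;
}

/* An nfsd-like subscriber that drops its cached open for the inode. */
static int cached_inode = 42;	/* inode id currently held open by the cache */
static int purges;

static void purge_cache(struct lease_nb *nb, int inode_id)
{
	(void)nb;
	if (cached_inode == inode_id) {
		cached_inode = -1;	/* close the idle cached file */
		purges++;
	}
}
```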
     

25 Jul, 2019

1 commit

  • Since commit 778fc546f749c588aa2f ("locks: fix tracking of inprogress
    lease breaks"), lease breaks don't change @fl_type but modify
    @fl_flags. However, the procfs code hasn't been updated.

    Previously, for a breaking lease the target type was printed (see
    target_leasetype()), as returned by fcntl(F_GETLEASE). But now it's
    always "READ", as F_UNLCK no longer means "breaking". Unlike the
    previous output, this doesn't provide a complete description of the
    lease.

    Here are the /proc/pid/fdinfo/ outputs for a lease (the same for READ
    and WRITE) broken by O_WRONLY:
    -- before:
    lock: 1: LEASE BREAKING READ 2558 08:03:815793 0 EOF
    -- after:
    lock: 1: LEASE BREAKING UNLCK 2558 08:03:815793 0 EOF

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jeff Layton

    Pavel Begunkov
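    A sketch of the idea behind the fix: while a lease is breaking, the
    printed type should be the target type (what the lease is being broken
    to), derived from the pending-break flags rather than from the unchanged
    type field. The flag names and values below are hypothetical stand-ins
    for the kernel's internal FL_* bits; F_RDLCK, F_WRLCK and F_UNLCK come
    from <fcntl.h>.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>

/* Hypothetical flag values; the kernel's FL_* bits are internal. */
#define SKETCH_UNLOCK_PENDING		0x1
#define SKETCH_DOWNGRADE_PENDING	0x2

struct sketch_lease {
	short type;		/* F_RDLCK or F_WRLCK: unchanged while breaking */
	unsigned int flags;	/* pending-break state lives here */
};

/* What the lease is being broken to, i.e. what F_GETLEASE would report. */
static short target_leasetype(const struct sketch_lease *fl)
{
	if (fl->flags & SKETCH_UNLOCK_PENDING)
		return F_UNLCK;
	if (fl->flags & SKETCH_DOWNGRADE_PENDING)
		return F_RDLCK;
	return fl->type;
}

/* Name printed in the fdinfo/proc locks line. */
static const char *lease_type_name(const struct sketch_lease *fl)
{
	short type = target_leasetype(fl);

	return type == F_WRLCK ? "WRITE" : type == F_RDLCK ? "READ" : "UNLCK";
}
```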
     

11 Jul, 2019

1 commit

  • Pull nfsd updates from Bruce Fields:
    "Highlights:

    - Add a new /proc/fs/nfsd/clients/ directory which exposes some
    long-requested information about NFSv4 clients (like open files)
    and allows forced revocation of client state.

    - Replace the global duplicate reply cache by a cache per network
    namespace; previously, a request in one network namespace could
    incorrectly match an entry from another, though we haven't seen
    this in production. This is the last remaining container bug that
    I'm aware of; at this point you should be able to run separate
    nfsd's in each network namespace, each with their own set of
    exports, and everything should work.

    - Cleanup and modify lock code to show the pid of lockd as the owner
    of NLM locks. This is the correct version of the bugfix originally
    attempted in b8eee0e90f97 ("lockd: Show pid of lockd for remote
    locks")"

    * tag 'nfsd-5.3' of git://linux-nfs.org/~bfields/linux: (34 commits)
    nfsd: Make __get_nfsdfs_client() static
    nfsd: Make two functions static
    nfsd: Fix misuse of strlcpy
    sunrpc/cache: remove the exporting of cache_seq_next
    nfsd: decode implementation id
    nfsd: create xdr_netobj_dup helper
    nfsd: allow forced expiration of NFSv4 clients
    nfsd: create get_nfsdfs_clp helper
    nfsd4: show layout stateids
    nfsd: show lock and deleg stateids
    nfsd4: add file to display list of client's opens
    nfsd: add more information to client info file
    nfsd: escape high characters in binary data
    nfsd: copy client's address including port number to cl_addr
    nfsd4: add a client info file
    nfsd: make client/ directory names small ints
    nfsd: add nfsd/clients directory
    nfsd4: use reference count to free client
    nfsd: rename cl_refcount
    nfsd: persist nfsd filesystem across mounts
    ...

    Linus Torvalds
     

04 Jul, 2019

1 commit


19 Jun, 2019

2 commits

  • check_conflicting_open() checks for existing fds open for read or for
    write before allowing a write lease to be taken. The check was
    implemented using i_count and d_count, an approximation that has
    several false positives. For example, overlayfs since v4.19 takes an
    extra reference on the dentry, and an open with O_PATH takes a
    reference on the dentry although the file can be neither read nor
    written.

    Change the implementation to use i_readcount and i_writecount to
    eliminate the false positive conflicts and allow a write lease to be
    taken on an overlayfs file.

    The change of behavior for existing fds opened with O_PATH is
    symmetric with the current behavior of lease breakers: an open with
    O_PATH currently does not break a write lease.

    This increases the size of struct inode by 4 bytes on 32bit archs when
    CONFIG_FILE_LOCKING is defined and CONFIG_IMA was not already
    defined.

    Signed-off-by: Amir Goldstein
    Signed-off-by: Jeff Layton

    Amir Goldstein
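    A userspace sketch of the counting rule: a write lease is allowed only
    if the requester's own fd accounts for every read and write open of the
    inode. struct sketch_inode and sketch_check_conflicting_open() are
    hypothetical simplifications; the kernel reads the atomic
    i_readcount/i_writecount fields and derives the requester's own
    contribution from f_mode. An O_PATH open increments neither counter,
    which is why it no longer causes a false conflict.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical counters mirroring inode->i_readcount / i_writecount. */
struct sketch_inode {
	int readcount;	/* opens for read only */
	int writecount;	/* opens for write (an O_PATH open touches neither) */
};

/*
 * May the holder of one fd (opened for write if self_writes, otherwise
 * read-only) take a lease of type `arg`?
 */
static int sketch_check_conflicting_open(const struct sketch_inode *inode,
					 int arg, bool self_writes)
{
	int self_wcount = self_writes ? 1 : 0;
	int self_rcount = self_writes ? 0 : 1;

	if (arg == F_RDLCK)
		return inode->writecount > 0 ? -EAGAIN : 0;
	if (arg != F_WRLCK)
		return 0;

	/* Every open must be the requester's own. */
	if (inode->writecount != self_wcount || inode->readcount != self_rcount)
		return -EAGAIN;
	return 0;
}
```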
     
  • Signed-off-by: Ira Weiny
    Signed-off-by: Jeff Layton

    Ira Weiny
     

21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

16 May, 2019

1 commit

  • Pull nfsd updates from Bruce Fields:
    "This consists mostly of nfsd container work:

    Scott Mayhew revived an old api that communicates with a userspace
    daemon to manage some on-disk state that's used to track clients
    across server reboots. We've been using a usermode_helper upcall for
    that, but it's tough to run those with the right namespaces, so a
    daemon is much friendlier to container use cases.

    Trond fixed nfsd's handling of user credentials in user namespaces. He
    also contributed patches that allow containers to support different
    sets of NFS protocol versions.

    The only remaining container bug I'm aware of is that the NFS reply
    cache is shared between all containers. If anyone's aware of other
    gaps in our container support, let me know.

    The rest of this is miscellaneous bugfixes"

    * tag 'nfsd-5.2' of git://linux-nfs.org/~bfields/linux: (23 commits)
    nfsd: update callback done processing
    locks: move checks from locks_free_lock() to locks_release_private()
    nfsd: fh_drop_write in nfsd_unlink
    nfsd: allow fh_want_write to be called twice
    nfsd: knfsd must use the container user namespace
    SUNRPC: rsi_parse() should use the current user namespace
    SUNRPC: Fix the server AUTH_UNIX userspace mappings
    lockd: Pass the user cred from knfsd when starting the lockd server
    SUNRPC: Temporary sockets should inherit the cred from their parent
    SUNRPC: Cache the process user cred in the RPC server listener
    nfsd: Allow containers to set supported nfs versions
    nfsd: Add custom rpcbind callbacks for knfsd
    SUNRPC: Allow further customisation of RPC program registration
    SUNRPC: Clean up generic dispatcher code
    SUNRPC: Add a callback to initialise server requests
    SUNRPC/nfs: Fix return value for nfs4_callback_compound()
    nfsd: handle legacy client tracking records sent by nfsdcld
    nfsd: re-order client tracking method selection
    nfsd: keep a tally of RECLAIM_COMPLETE operations when using nfsdcld
    nfsd: un-deprecate nfsdcld
    ...

    Linus Torvalds
     

08 May, 2019

1 commit

  • …kernel/git/gustavoars/linux

    Pull Wimplicit-fallthrough updates from Gustavo A. R. Silva:
    "Mark switch cases where we are expecting to fall through.

    This is part of the ongoing efforts to enable -Wimplicit-fallthrough.

    Most of them have been baking in linux-next for a whole development
    cycle. And with Stephen Rothwell's help, we've had linux-next
    nag-emails going out for newly introduced code that triggers
    -Wimplicit-fallthrough to avoid gaining more of these cases while we
    work to remove the ones that are already present.

    We are getting close to completing this work. Currently, there are
    only 32 of 2311 of these cases left to be addressed in linux-next. I'm
    auditing every case; I take a look into the code and analyze it in
    order to determine if I'm dealing with an actual bug or a false
    positive, as explained here:

    https://lore.kernel.org/lkml/c2fad584-1705-a5f2-d63c-824e9b96cf50@embeddedor.com/

    While working on this, I've found and fixed several missing
    break/return bugs, some of them introduced more than 5 years ago.

    Once this work is finished, we'll be able to universally enable
    "-Wimplicit-fallthrough" to prevent these kinds of bugs from
    entering the kernel again"

    * tag 'Wimplicit-fallthrough-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (27 commits)
    memstick: mark expected switch fall-throughs
    drm/nouveau/nvkm: mark expected switch fall-throughs
    NFC: st21nfca: Fix fall-through warnings
    NFC: pn533: mark expected switch fall-throughs
    block: Mark expected switch fall-throughs
    ASN.1: mark expected switch fall-through
    lib/cmdline.c: mark expected switch fall-throughs
    lib: zstd: Mark expected switch fall-throughs
    scsi: sym53c8xx_2: sym_nvram: Mark expected switch fall-through
    scsi: sym53c8xx_2: sym_hipd: mark expected switch fall-throughs
    scsi: ppa: mark expected switch fall-through
    scsi: osst: mark expected switch fall-throughs
    scsi: lpfc: lpfc_scsi: Mark expected switch fall-throughs
    scsi: lpfc: lpfc_nvme: Mark expected switch fall-through
    scsi: lpfc: lpfc_nportdisc: Mark expected switch fall-through
    scsi: lpfc: lpfc_hbadisc: Mark expected switch fall-throughs
    scsi: lpfc: lpfc_els: Mark expected switch fall-throughs
    scsi: lpfc: lpfc_ct: Mark expected switch fall-throughs
    scsi: imm: mark expected switch fall-throughs
    scsi: csiostor: csio_wr: mark expected switch fall-through
    ...

    Linus Torvalds
     

24 Apr, 2019

1 commit

  • Code that allocates locks using locks_alloc_lock() will free it
    using locks_free_lock(), and will benefit from the BUG_ON()
    consistency checks therein.

    However, some code (nfsd and lockd) allocates a lock embedded in
    some other data structure, and so frees the lock itself after
    calling locks_release_private(). This path does not benefit from
    the consistency checks.

    To help catch future errors, move the BUG_ON() checks to
    locks_release_private(), which locks_free_lock() already calls.
    This ensures that all users of locks will find out if a lock
    isn't properly detached before being freed.

    Signed-off-by: NeilBrown
    Reviewed-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    NeilBrown
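    A sketch of the resulting structure, with assert() standing in for
    BUG_ON(): because the checks live in the release helper, a lock
    embedded in a larger structure gets the same checking as one freed
    through the allocator path. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reduced lock: only its "still attached?" state. */
struct sketch_lock {
	bool on_list;		/* like !list_empty(&fl_list) */
	bool is_blocked;	/* like !list_empty(&fl_blocked_member) */
	bool has_waiters;	/* like !list_empty(&fl_blocked_requests) */
};

/* assert() plays the role of BUG_ON(); every release path runs it. */
static void sketch_release_private(struct sketch_lock *fl)
{
	assert(!fl->on_list);
	assert(!fl->is_blocked);
	assert(!fl->has_waiters);
	/* ... release private data (lock manager ops, etc.) ... */
}

/* Heap-allocated locks are freed through the same checked path. */
static void sketch_free_lock(struct sketch_lock *fl)
{
	sketch_release_private(fl);
	free(fl);
}

/* A lock embedded in a bigger structure, as nfsd and lockd do. */
struct sketch_owner_state {
	int id;
	struct sketch_lock lock;
};
```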
     

09 Apr, 2019

1 commit

  • In preparation for enabling -Wimplicit-fallthrough, mark switch cases
    where we are expecting to fall through.

    This patch fixes the following warnings:

    fs/affs/affs.h:124:38: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/configfs/dir.c:1692:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/configfs/dir.c:1694:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ceph/file.c:249:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ext4/hash.c:233:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ext4/hash.c:246:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ext2/inode.c:1237:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ext2/inode.c:1244:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ext4/indirect.c:1182:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ext4/indirect.c:1188:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ext4/indirect.c:1432:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ext4/indirect.c:1440:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/f2fs/node.c:618:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/f2fs/node.c:620:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/btrfs/ref-verify.c:522:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/gfs2/bmap.c:711:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/gfs2/bmap.c:722:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/jffs2/fs.c:339:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/nfsd/nfs4proc.c:429:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ufs/util.h:62:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ufs/util.h:43:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/fcntl.c:770:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/seq_file.c:319:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/libfs.c:148:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/libfs.c:150:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/signalfd.c:178:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/locks.c:1473:16: warning: this statement may fall through [-Wimplicit-fallthrough=]

    Warning level 3 was used: -Wimplicit-fallthrough=3

    This patch is part of the ongoing efforts to enable
    -Wimplicit-fallthrough.

    Reviewed-by: Kees Cook
    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
     

25 Mar, 2019

1 commit

  • Andreas reported that he was seeing the tdbtorture test fail in some
    cases with -EDEADLK when it wasn't before. Some debugging showed that
    deadlock detection was sometimes discovering the caller's lock request
    itself in a dependency chain.

    While we remove the request from the blocked_lock_hash prior to
    reattempting to acquire it, any locks that are blocked on that request
    will still be present in the hash and will still have their fl_blocker
    pointer set to the current request.

    This causes posix_locks_deadlock to find a deadlock dependency chain
    when it shouldn't, as a lock request cannot block itself.

    We are going to end up waking all of those blocked locks anyway when we
    go to reinsert the request back into the blocked_lock_hash, so just do
    it prior to checking for deadlocks. This ensures that any lock blocked
    on the current request will no longer be part of any blocked request
    chain.

    URL: https://bugzilla.kernel.org/show_bug.cgi?id=202975
    Fixes: 5946c4319ebb ("fs/locks: allow a lock request to block other requests.")
    Cc: stable@vger.kernel.org
    Reported-by: Andreas Schneider
    Signed-off-by: Neil Brown
    Signed-off-by: Jeff Layton

    Jeff Layton
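    A reduced model of the false positive and the fix. would_deadlock() is
    loosely modeled on posix_locks_deadlock() but simplified: it follows a
    single blocker hop per owner instead of walking to the top of the
    blocker tree, and a plain array stands in for blocked_hash. All names
    are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BLOCKED 8

/* Hypothetical, heavily simplified lock request. */
struct req {
	int owner;
	struct req *blocker;	/* what this request waits on (like fl_blocker) */
};

/* Stand-in for blocked_hash: the set of currently blocked requests. */
struct blocked_set {
	struct req *items[MAX_BLOCKED];
	size_t n;
};

static void blocked_add(struct blocked_set *s, struct req *r)
{
	assert(s->n < MAX_BLOCKED);
	s->items[s->n++] = r;
}

/* What is the owner of `fl` itself waiting for, if anything? */
static struct req *owner_waiting_for(const struct blocked_set *s, const struct req *fl)
{
	for (size_t i = 0; i < s->n; i++)
		if (s->items[i]->owner == fl->owner)
			return s->items[i]->blocker;
	return NULL;
}

/* Would `caller` waiting on `block_fl` close a cycle back to its owner? */
static bool would_deadlock(const struct blocked_set *s, const struct req *caller,
			   const struct req *block_fl)
{
	int steps = 0;

	while ((block_fl = owner_waiting_for(s, block_fl)) != NULL) {
		if (block_fl->owner == caller->owner)
			return true;
		if (++steps > MAX_BLOCKED)
			break;	/* bound the walk, like MAX_DEADLK_ITERATIONS */
	}
	return false;
}

/* The fix: wake (unblock and remove) everything blocked on `caller` first. */
static void wake_blocked_on(struct blocked_set *s, struct req *caller)
{
	size_t j = 0;

	for (size_t i = 0; i < s->n; i++) {
		if (s->items[i]->blocker == caller)
			s->items[i]->blocker = NULL;	/* woken: dropped from the set */
		else
			s->items[j++] = s->items[i];	/* still blocked: keep */
	}
	s->n = j;
}
```

    In the test scenario, owner 2 holds a lock that owner 1 now conflicts
    with, while owner 2 also has a request queued behind owner 1's own
    request. Without the wake-up, the walk reaches owner 1's request and
    reports a deadlock that does not exist.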
     

28 Feb, 2019

1 commit

  • Effectively revert commit:

    87709e28dc7c ("fs/locks: Use percpu_down_read_preempt_disable()")

    This is causing major pain for PREEMPT_RT.

    Sebastian did a lot of lockperf runs on 2 and 4 node machines with all
    preemption modes (PREEMPT=n should be an obvious NOP for this patch
    and thus serves as a good control) and no results showed significance
    over 2-sigma (the PREEMPT=n results were almost empty at 1-sigma).

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

03 Jan, 2019

1 commit

  • After moving all requests from
    fl->fl_blocked_requests
    to
    new->fl_blocked_requests

    it is nonsensical to do anything to all the remaining elements: there
    aren't any. This should do something to all the requests that have been
    moved. For simplicity, it does it to all requests in the target list.

    Setting "f->fl_blocker = new" for all members of new->fl_blocked_requests
    is "obviously correct" as it preserves the invariant of the linkage
    among requests.

    Reported-by: syzbot+239d99847eb49ecb3899@syzkaller.appspotmail.com
    Fixes: 5946c4319ebb ("fs/locks: allow a lock request to block other requests.")
    Signed-off-by: NeilBrown
    Signed-off-by: Jeff Layton

    NeilBrown
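    A sketch of the corrected move, using fixed-size arrays instead of
    list_heads. struct sketch_lock and sketch_move_blocks() are hypothetical
    names; the point is that the fl_blocker fix-up loop must walk the
    target list after the splice.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WAITERS 8

/* Hypothetical reduced lock: a blocker pointer plus the requests it blocks. */
struct sketch_lock {
	struct sketch_lock *blocker;			/* like fl_blocker */
	struct sketch_lock *blocked[MAX_WAITERS];	/* like fl_blocked_requests */
	size_t nr_blocked;
};

/* Move all requests blocked on `fl` so that they are blocked on `new`. */
static void sketch_move_blocks(struct sketch_lock *new, struct sketch_lock *fl)
{
	/* Splice fl's waiters onto new's list and empty fl's list. */
	for (size_t i = 0; i < fl->nr_blocked; i++) {
		assert(new->nr_blocked < MAX_WAITERS);
		new->blocked[new->nr_blocked++] = fl->blocked[i];
	}
	fl->nr_blocked = 0;

	/*
	 * Fix up the blocker pointer on the target list. Walking fl's list
	 * here would visit nothing, since it was just emptied.
	 */
	for (size_t i = 0; i < new->nr_blocked; i++)
		new->blocked[i]->blocker = new;
}
```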
     

17 Dec, 2018

1 commit


07 Dec, 2018

5 commits

  • - spaces before tabs,
    - spaces at the end of lines,
    - multiple blank lines,
    - blank lines before EXPORT_SYMBOL,
    can all go.

    Signed-off-by: NeilBrown
    Reviewed-by: J. Bruce Fields
    Signed-off-by: Jeff Layton

    NeilBrown
     
  • posix_unblock_lock() is not specific to posix locks, and behaves
    nearly identically to locks_delete_block(), the former returning a
    status while the latter doesn't.

    So discard posix_unblock_lock() and use locks_delete_block() instead,
    after giving that function an appropriate return value.
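
    The sketch below models the merged helper in userspace. It assumes the
    same convention posix_unblock_lock() used: 0 when the waiter was blocked
    and has now been detached, -ENOENT when it was not blocked at all. The
    names struct req and delete_block() are hypothetical, not kernel API.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model: a waiter is "blocked" while blocker != NULL. */
struct req {
	struct req *blocker;
	struct req *first_child;  /* requests blocked on this one */
	struct req *next_sibling; /* link within the blocker's list */
};

/*
 * Detach @waiter from whatever it is blocked on.
 * Returns 0 if it was blocked, -ENOENT if it was not.
 */
static int delete_block(struct req *waiter)
{
	struct req **pp;

	if (waiter->blocker == NULL)
		return -ENOENT;

	/* Unlink waiter from its blocker's list of blocked requests. */
	for (pp = &waiter->blocker->first_child; *pp; pp = &(*pp)->next_sibling) {
		if (*pp == waiter) {
			*pp = waiter->next_sibling;
			break;
		}
	}
	waiter->next_sibling = NULL;
	waiter->blocker = NULL;
	return 0;
}
```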

    Signed-off-by: NeilBrown
    Reviewed-by: J. Bruce Fields
    Signed-off-by: Jeff Layton

    NeilBrown
     
  • When we find an existing lock which conflicts with a request,
    and the request wants to wait, we currently add the request
    to a list. When the lock is removed, the whole list is woken.
    This can cause the thundering-herd problem.
    To reduce the problem, we make use of the (new) fact that
    a pending request can itself have a list of blocked requests.
    When we find a conflict, we look through the existing blocked requests.
    If any one of them blocks the new request, the new request is attached
    below that request, otherwise it is added to the list of blocked
    requests, which are now known to be mutually non-conflicting.

    This way, when the lock is released, only a set of non-conflicting
    locks will be woken, the rest can stay asleep.
    If the lock request cannot be granted and the request needs to be
    requeued, all the other requests it blocks will then be woken.
    To make this more concrete:

    If you have a many-core machine, and have many threads all wanting to
    briefly lock a given file (udev is known to do this), you can get quite
    poor performance.

    When one thread releases a lock, it wakes up all other threads that
    are waiting (classic thundering-herd) - one will get the lock and the
    others go to sleep.
    When you have few cores, this is not very noticeable: by the time the
    4th or 5th thread gets enough CPU time to try to claim the lock, the
    earlier threads have claimed it, done what was needed, and released.
    So with few cores, many of the threads don't end up contending.
    With 50+ cores, lots of threads can get the CPU at the same time,
    and the contention can easily be measured.

    This patchset creates a tree of pending lock requests in which siblings
    don't conflict and each lock request does conflict with its parent.
    When a lock is released, only requests which don't conflict with each
    other are woken.
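
    To make the tree construction concrete, here is a hypothetical userspace
    model in which every request is an exclusive byte-range lock, so two
    requests conflict exactly when their ranges overlap. insert_block()
    descends into the first existing waiter that conflicts with the new one
    and repeats from there, which keeps siblings mutually non-conflicting.
    The names and the simplified conflict rule are assumptions for
    illustration, not the kernel's code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a lock or pending request: an exclusive byte range. */
struct req {
	long start, end;          /* inclusive range */
	struct req *blocker;
	struct req *first_child;
	struct req *next_sibling;
};

/* Two exclusive locks conflict when their ranges overlap. */
static bool conflict(const struct req *a, const struct req *b)
{
	return a->start <= b->end && b->start <= a->end;
}

/* Append @waiter to @blocker's list of blocked requests. */
static void add_child(struct req *blocker, struct req *waiter)
{
	struct req **pp = &blocker->first_child;

	while (*pp)
		pp = &(*pp)->next_sibling;
	*pp = waiter;
	waiter->next_sibling = NULL;
	waiter->blocker = blocker;
}

/*
 * Insert @waiter below @blocker. If an existing waiter conflicts with
 * the new one, descend into it and look again, so siblings never conflict.
 */
static void insert_block(struct req *blocker, struct req *waiter)
{
	struct req *fl;

restart:
	for (fl = blocker->first_child; fl; fl = fl->next_sibling) {
		if (conflict(fl, waiter)) {
			blocker = fl;
			goto restart;
		}
	}
	add_child(blocker, waiter);
}

/* On release only the direct children are woken; they cannot conflict. */
static size_t count_woken(const struct req *lock)
{
	const struct req *fl;
	size_t n = 0;

	for (fl = lock->first_child; fl; fl = fl->next_sibling)
		n++;
	return n;
}
```

    For example, with a lock on bytes 0-100 and requests for 0-10, 20-30 and
    5-8 inserted in that order, the third request lands under the first, and
    releasing the lock wakes only the first two.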

    Testing shows that lock-acquisitions-per-second is now fairly stable
    even as the number of contending processes goes to 1000. Without this
    patch, locks-per-second drops off steeply after a few tens of
    processes.

    There is a small cost to this extra complexity.
    At 20 processes running a particular test on 72 cores, the lock
    acquisitions per second drops from 1.8 million to 1.4 million with
    this patch. For 100 processes, this patch still provides 1.4 million
    while without this patch there are about 700,000.

    Reported-and-tested-by: Martin Wilck
    Signed-off-by: NeilBrown
    Reviewed-by: J. Bruce Fields
    Signed-off-by: Jeff Layton

    NeilBrown
     
  • posix_locks_conflict() and flock_locks_conflict() both return int.
    leases_conflict() returns bool.

    This inconsistency will cause problems for the next patch if not
    fixed.

    So change posix_locks_conflict() and flock_locks_conflict() to return
    bool.
    Also change the locks_conflict() helper.

    And convert some
    return (foo);
    to
    return foo;
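
    A plausible reason for the common bool type is that the next patch wants
    to hand any of these predicates to a shared helper through a single
    function-pointer type. The simplified sketch below shows that shape.
    struct lk, conflict_fn and the predicate bodies are hypothetical and
    ignore details such as lock flags and the real ownership rules.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified lock description; not struct file_lock. */
enum lock_type { RDLCK, WRLCK };

struct lk {
	int owner;
	enum lock_type type;
	long start, end; /* inclusive byte range */
};

/* All conflict predicates share one signature so they can be passed around. */
typedef bool (*conflict_fn)(const struct lk *caller, const struct lk *sys);

/* Only two read locks can coexist. */
static bool types_conflict(const struct lk *a, const struct lk *b)
{
	return a->type == WRLCK || b->type == WRLCK;
}

static bool posix_conflict(const struct lk *caller, const struct lk *sys)
{
	if (caller->owner == sys->owner)
		return false; /* an owner never conflicts with itself */
	if (caller->start > sys->end || sys->start > caller->end)
		return false; /* byte ranges do not overlap */
	return types_conflict(caller, sys);
}

static bool flock_conflict(const struct lk *caller, const struct lk *sys)
{
	if (caller->owner == sys->owner)
		return false; /* same owner (for flock, the same open file) */
	return types_conflict(caller, sys); /* flock locks cover the whole file */
}

/* A generic helper that works with any predicate of the shared type. */
static bool any_conflict(const struct lk *caller, const struct lk *held,
			 size_t n, conflict_fn conflict)
{
	for (size_t i = 0; i < n; i++)
		if (conflict(caller, &held[i]))
			return true;
	return false;
}
```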

    Signed-off-by: NeilBrown
    Reviewed-by: J. Bruce Fields
    Signed-off-by: Jeff Layton

    NeilBrown
     
  • Now that requests can block other requests, we
    need to be careful to always clean up those blocked
    requests.
    Any time that we wait for a request, we might have
    other requests attached, and when we stop waiting,
    we must clean them up.
    If the lock was granted, the requests might have been
    moved to the new lock, though when merged with a
    pre-existing lock, this might not happen.
    In all cases we don't want blocked locks to remain
    attached, so we remove them to be safe.
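
    The rule can be sketched in userspace as follows, assuming a hypothetical
    struct req that records its blocker and its own blocked requests.
    Whatever ends the wait, finish_wait() first wakes and detaches everything
    attached to the request, then unlinks the request from its own blocker,
    so nothing stays attached afterwards. In the kernel this cleanup roughly
    corresponds to calling locks_delete_block(); the code below is an
    illustration, not that function.

```c
#include <stddef.h>

/* Hypothetical model of a pending request. */
struct req {
	struct req *blocker;
	struct req *first_child;
	struct req *next_sibling;
	int woken; /* stands in for waking a waiting thread */
};

/* Wake every request blocked on @r and detach them from it. */
static void wake_up_blocks(struct req *r)
{
	struct req *w = r->first_child;

	while (w) {
		struct req *next = w->next_sibling;

		w->blocker = NULL;
		w->next_sibling = NULL;
		w->woken = 1;
		w = next;
	}
	r->first_child = NULL;
}

/* Remove @r from its blocker's list, if it has a blocker. */
static void unlink_from_blocker(struct req *r)
{
	struct req **pp;

	if (r->blocker == NULL)
		return;
	for (pp = &r->blocker->first_child; *pp; pp = &(*pp)->next_sibling) {
		if (*pp == r) {
			*pp = r->next_sibling;
			break;
		}
	}
	r->blocker = NULL;
	r->next_sibling = NULL;
}

/* Call whenever a wait ends: granted, merged, or interrupted. */
static void finish_wait(struct req *r)
{
	wake_up_blocks(r);
	unlink_from_blocker(r);
}
```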

    Signed-off-by: NeilBrown
    Reviewed-by: J. Bruce Fields
    Tested-by: syzbot+a4a3d526b4157113ec6a@syzkaller.appspotmail.com
    Tested-by: kernel test robot
    Signed-off-by: Jeff Layton

    NeilBrown
     

01 Dec, 2018

3 commits

  • Currently, a lock can block pending requests, but all pending
    requests are equal. If lots of pending requests are
    mutually exclusive, this means they will all be woken up
    and all but one will fail. This can hurt performance.

    So we will allow pending requests to block other requests.
    Only the first request will be woken, and it will wake the others.

    This patch doesn't implement this fully, but prepares the way.

    - It acknowledges that a request might be blocking other requests,
    and when the request is converted to a lock, those blocked
    requests are moved across.
    - When a request is requeued or discarded, all blocked requests are
    woken.
    - When deadlock-detection looks for the lock which blocks a
    given request, we follow the chain of ->fl_blocker all
    the way to the top.
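
    The chain-following step can be illustrated with a hypothetical model in
    which each request only records what it is blocked on. top_blocker()
    climbs until it reaches an entry with no blocker, which in this model is
    the granted lock at the root of the tree. The name is illustrative, not a
    kernel function.

```c
#include <stddef.h>

/* Hypothetical model: blocker is NULL for a granted lock (the tree root). */
struct req {
	struct req *blocker;
};

/*
 * Once pending requests can block other requests, what ultimately blocks
 * @r is the entry at the top of the chain, not necessarily r->blocker.
 */
static struct req *top_blocker(struct req *r)
{
	while (r->blocker != NULL)
		r = r->blocker;
	return r;
}
```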

    Tested-by: kernel test robot
    Signed-off-by: NeilBrown
    Reviewed-by: J. Bruce Fields
    Signed-off-by: Jeff Layton

    NeilBrown
     
  • Both locks_remove_posix() and locks_remove_flock() use a
    struct file_lock without calling locks_init_lock() on it.
    This means the various list_heads are not initialized, which
    will become a problem with a later patch.

    So change them both to initialize properly. For flock locks,
    this involves using flock_make_lock(), and changing it to
    allow a file_lock to be passed in, so memory allocation isn't
    always needed.
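
    The "pass in a file_lock so allocation isn't always needed" pattern can
    be sketched as below. struct lk, make_lock() and the list helpers are
    hypothetical userspace stand-ins (the kernel uses struct list_head,
    INIT_LIST_HEAD() and locks_init_lock()). The idea shown is that a
    caller-supplied structure, such as one on the stack, is initialised in
    place, so its list heads are valid empty lists rather than garbage.

```c
#include <stdlib.h>

/* Hypothetical list head: an empty list points at itself. */
struct list_node {
	struct list_node *next, *prev;
};

static void init_list(struct list_node *h)
{
	h->next = h;
	h->prev = h;
}

static int list_is_empty(const struct list_node *h)
{
	return h->next == h;
}

struct lk {
	struct list_node blocked_requests; /* requests waiting on this lock */
	struct list_node blocked_member;   /* link in our blocker's list */
	int type;
};

static void init_lock(struct lk *fl)
{
	init_list(&fl->blocked_requests);
	init_list(&fl->blocked_member);
	fl->type = 0;
}

/*
 * Fill in a lock description. If the caller supplies @fl, initialise it
 * in place; otherwise allocate one. Returns NULL only if allocation fails.
 */
static struct lk *make_lock(struct lk *fl, int type)
{
	if (fl == NULL) {
		fl = malloc(sizeof(*fl));
		if (fl == NULL)
			return NULL;
	}
	init_lock(fl);
	fl->type = type;
	return fl;
}
```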

    Signed-off-by: NeilBrown
    Reviewed-by: J. Bruce Fields
    Signed-off-by: Jeff Layton

    NeilBrown
     
  • This functionality will be useful in future patches, so
    split it out from locks_wake_up_blocks().
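
    Splitting a helper this way follows a common kernel idiom: an inner
    function that assumes the caller already holds the lock, plus a thin
    wrapper that takes it. A hypothetical userspace version using a pthread
    mutex is sketched below. wake_up_blocks_locked() and detach_and_wake()
    are illustrative names; in the kernel the inner helper is
    __locks_wake_up_blocks() and the lock is a spinlock.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical global lock protecting every blocker/waiter link. */
static pthread_mutex_t blocked_lock = PTHREAD_MUTEX_INITIALIZER;

struct req {
	struct req *blocker;
	struct req *first_child;
	struct req *next_sibling;
};

/* Caller must hold blocked_lock. Detaches every waiter on @r; returns how many. */
static int wake_up_blocks_locked(struct req *r)
{
	struct req *w = r->first_child;
	int n = 0;

	while (w) {
		struct req *next = w->next_sibling;

		w->blocker = NULL;
		w->next_sibling = NULL;
		/* a real implementation would wake the waiting thread here */
		n++;
		w = next;
	}
	r->first_child = NULL;
	return n;
}

/* Wrapper for callers that do not already hold blocked_lock. */
static int wake_up_blocks(struct req *r)
{
	int n;

	pthread_mutex_lock(&blocked_lock);
	n = wake_up_blocks_locked(r);
	pthread_mutex_unlock(&blocked_lock);
	return n;
}

/*
 * A caller doing several steps in one critical section reuses the inner
 * helper instead of taking the non-recursive mutex twice.
 */
static void detach_and_wake(struct req *r)
{
	pthread_mutex_lock(&blocked_lock);
	wake_up_blocks_locked(r);
	if (r->blocker) {
		struct req **pp = &r->blocker->first_child;

		while (*pp && *pp != r)
			pp = &(*pp)->next_sibling;
		if (*pp)
			*pp = r->next_sibling;
		r->blocker = NULL;
		r->next_sibling = NULL;
	}
	pthread_mutex_unlock(&blocked_lock);
}
```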

    Signed-off-by: NeilBrown
    Reviewed-by: J. Bruce Fields
    Signed-off-by: Jeff Layton

    NeilBrown