08 Sep, 2021

1 commit

  • This reverts commit 0f12156dff2862ac54235fc72703f18770769042.

    The kernel test robot reports a sizeable performance regression for this
    commit, and while it clearly does the right thing in theory, we'll need
    to look at just how to avoid or minimize the performance overhead of the
    memcg accounting.

    People already have suggestions on how to do that, but it's "future
    work".

    So revert it for now.

    Link: https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/
    Acked-by: Jens Axboe
    Acked-by: Shakeel Butt
    Acked-by: Roman Gushchin
    Cc: Tejun Heo
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

04 Sep, 2021

2 commits

  • Merge misc updates from Andrew Morton:
    "173 patches.

    Subsystems affected by this series: ia64, ocfs2, block, and mm (debug,
    pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
    bootmem, sparsemem, vmalloc, kasan, pagealloc, memory-failure,
    hugetlb, userfaultfd, vmscan, compaction, mempolicy, memblock,
    oom-kill, migration, ksm, percpu, vmstat, and madvise)"

    * emailed patches from Andrew Morton: (173 commits)
    mm/madvise: add MADV_WILLNEED to process_madvise()
    mm/vmstat: remove unneeded return value
    mm/vmstat: simplify the array size calculation
    mm/vmstat: correct some wrong comments
    mm/percpu,c: remove obsolete comments of pcpu_chunk_populated()
    selftests: vm: add COW time test for KSM pages
    selftests: vm: add KSM merging time test
    mm: KSM: fix data type
    selftests: vm: add KSM merging across nodes test
    selftests: vm: add KSM zero page merging test
    selftests: vm: add KSM unmerge test
    selftests: vm: add KSM merge test
    mm/migrate: correct kernel-doc notation
    mm: wire up syscall process_mrelease
    mm: introduce process_mrelease system call
    memblock: make memblock_find_in_range method private
    mm/mempolicy.c: use in_task() in mempolicy_slab_node()
    mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies
    mm/mempolicy: advertise new MPOL_PREFERRED_MANY
    mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY
    ...

    Linus Torvalds
     
  • A user can create file locks for each open file, forcing the kernel to
    allocate small but long-lived objects for each open file.

    It makes sense to account for these objects to limit the host's memory
    consumption from inside the memcg-limited container.

    Link: https://lkml.kernel.org/r/b009f4c7-f0ab-c0ec-8e83-918f47d677da@virtuozzo.com
    Signed-off-by: Vasily Averin
    Reviewed-by: Shakeel Butt
    Cc: Alexander Viro
    Cc: Alexey Dobriyan
    Cc: Andrei Vagin
    Cc: Borislav Petkov
    Cc: Christian Brauner
    Cc: Dmitry Safonov
    Cc: "Eric W. Biederman"
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: "J. Bruce Fields"
    Cc: Jeff Layton
    Cc: Jens Axboe
    Cc: Jiri Slaby
    Cc: Johannes Weiner
    Cc: Kirill Tkhai
    Cc: Michal Hocko
    Cc: Oleg Nesterov
    Cc: Roman Gushchin
    Cc: Serge Hallyn
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Vladimir Davydov
    Cc: Yutian Yang
    Cc: Zefan Li
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasily Averin
     

23 Aug, 2021

1 commit

  • We added CONFIG_MANDATORY_FILE_LOCKING in 2015, and soon after turned it
    off in Fedora and RHEL8. Several other distros have followed suit.

    I've heard of one problem in all that time: Someone migrated from an
    older distro that supported "-o mand" to one that didn't, and the host
    had a fstab entry with "mand" in it which broke on reboot. They didn't
    actually _use_ mandatory locking so they just removed the mount option
    and moved on.

    This patch rips out mandatory locking support wholesale from the kernel,
    along with the Kconfig option and the Documentation file. It also
    changes the mount code to ignore the "mand" mount option instead of
    erroring out, and to throw a big, ugly warning.

    Signed-off-by: Jeff Layton

    Jeff Layton
     

06 May, 2021

1 commit

  • Pull more nfsd updates from Chuck Lever:
    "Additional fixes and clean-ups for NFSD since tags/nfsd-5.13,
    including a fix to grant read delegations for files open for writing"

    * tag 'nfsd-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
    SUNRPC: Fix null pointer dereference in svc_rqst_free()
    SUNRPC: fix ternary sign expansion bug in tracing
    nfsd: Fix fall-through warnings for Clang
    nfsd: grant read delegations to clients holding writes
    nfsd: reshuffle some code
    nfsd: track filehandle aliasing in nfs4_files
    nfsd: hash nfs4_files by inode number
    nfsd: ensure new clients break delegations
    nfsd: removed unused argument in nfsd_startup_generic()
    nfsd: remove unused function
    svcrdma: Pass a useful error code to the send_err tracepoint
    svcrdma: Rename goto labels in svc_rdma_sendto()
    svcrdma: Don't leak send_ctxt on Send errors

    Linus Torvalds
     

27 Apr, 2021

1 commit

  • Pull file locking updates from Jeff Layton:
    "When we reworked the blocked locks into a tree structure instead of a
    flat list a few releases ago, we lost the ability to see all of the
    file locks in /proc/locks. Luo's patch fixes it to dump out all of the
    blocked locks instead, which restores the full output.

    This changes the format of /proc/locks as the blocked locks are shown
    at multiple levels of indentation now, but lslocks (the only common
    program I've ID'ed that scrapes this info) seems to be OK with that.

    Tian also contributed a small patch to remove a useless assignment"

    * tag 'locks-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
    fs/locks: remove useless assignment in fcntl_getlk
    fs/locks: print full locks information

    Linus Torvalds
     

20 Apr, 2021

1 commit

  • It's OK to grant a read delegation to a client that holds a write,
    as long as it's the only client holding the write.

    We originally tried to do this in commit 94415b06eb8a ("nfsd4: a
    client's own opens needn't prevent delegations"), which had to be
    reverted in commit 6ee65a773096 ("Revert "nfsd4: a client's own
    opens needn't prevent delegations"").

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Chuck Lever

    J. Bruce Fields
     

13 Apr, 2021

1 commit


11 Mar, 2021

1 commit

  • Commit fd7732e033e3 ("fs/locks: create a tree of dependent requests.")
    put blocked locks into a tree.

    As a result, a flat for loop can no longer visit every blocked lock,
    so the output is incomplete.

    To solve this problem, traverse the tree instead.

    Signed-off-by: Luo Longjun
    Signed-off-by: Jeff Layton

    Luo Longjun
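
To illustrate the traversal described in this entry, here is a minimal userspace sketch, not the kernel code. The `struct waiter` type and the `dump_blocked()` helper are hypothetical; the `first_child` and `next_sibling` pointers stand in for the kernel's list-based links between a lock and the requests blocked on it. A loop over siblings alone would miss every nested waiter, so the helper recurses and indents one level per depth.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a blocked lock request. */
struct waiter {
    int id;                        /* illustrative identifier */
    struct waiter *first_child;    /* first request blocked on this one */
    struct waiter *next_sibling;   /* next request blocked on the same parent */
};

/*
 * Print every waiter reachable from w, indenting by depth, and return
 * how many were printed. Recursing into first_child is what a flat
 * loop over next_sibling would skip.
 */
static int dump_blocked(const struct waiter *w, int depth, FILE *out)
{
    int count = 0;

    for (; w != NULL; w = w->next_sibling) {
        fprintf(out, "%*s-> %d\n", depth * 2, "", w->id);
        count += 1 + dump_blocked(w->first_child, depth + 1, out);
    }
    return count;
}
```

Calling `dump_blocked(first, 0, stdout)` on the first top-level waiter prints the whole tree; the return value makes it easy to check that no waiter was skipped.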
     

09 Mar, 2021

1 commit

  • This reverts commit 94415b06eb8aed13481646026dc995f04a3a534a.

    That commit claimed to allow a client to get a read delegation when it
    was the only writer. Actually it allowed a client to get a read
    delegation when *any* client has a write open!

    The main problem is that it's depending on nfs4_clnt_odstate structures
    that are actually only maintained for pnfs exports.

    This causes clients to miss writes performed by other clients, even when
    there have been intervening closes and opens, violating close-to-open
    cache consistency.

    We can do this a different way, but first we should just revert this.

    I've added pynfs 4.1 test DELEG19 to test for this, as I should have
    done originally!

    Cc: stable@vger.kernel.org
    Reported-by: Timo Rothenpieler
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Chuck Lever

    J. Bruce Fields
     

16 Dec, 2020

1 commit

  • …biederm/user-namespace

    Pull execve updates from Eric Biederman:
    "This set of changes ultimately fixes the interaction of posix file
    lock and exec. Fundamentally most of the change is just moving where
    unshare_files is called during exec, and tweaking the users of
    files_struct so that the count of files_struct is not unnecessarily
    played with.

    Along the way fcheck and related helpers were renamed to more
    accurately reflect what they do.

    There were also many other small changes that fell out, as this is the
    first time in a long time much of this code has been touched.

    Benchmarks haven't turned up any practical issues but Al Viro has
    observed a possibility for a lot of pounding on task_lock. So I have
    some changes in progress to convert put_files_struct to always rcu
    free files_struct. That wasn't ready for the merge window so that will
    have to wait until next time"

    * 'exec-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits)
    exec: Move io_uring_task_cancel after the point of no return
    coredump: Document coredump code exclusively used by cell spufs
    file: Remove get_files_struct
    file: Rename __close_fd_get_file close_fd_get_file
    file: Replace ksys_close with close_fd
    file: Rename __close_fd to close_fd and remove the files parameter
    file: Merge __alloc_fd into alloc_fd
    file: In f_dupfd read RLIMIT_NOFILE once.
    file: Merge __fd_install into fd_install
    proc/fd: In fdinfo seq_show don't use get_files_struct
    bpf/task_iter: In task_file_seq_get_next use task_lookup_next_fd_rcu
    proc/fd: In proc_readfd_common use task_lookup_next_fd_rcu
    file: Implement task_lookup_next_fd_rcu
    kcmp: In get_file_raw_ptr use task_lookup_fd_rcu
    proc/fd: In tid_fd_mode use task_lookup_fd_rcu
    file: Implement task_lookup_fd_rcu
    file: Rename fcheck lookup_fd_rcu
    file: Replace fcheck_files with files_lookup_fd_rcu
    file: Factor files_lookup_fd_locked out of fcheck_files
    file: Rename __fcheck_files to files_lookup_fd_raw
    ...

    Linus Torvalds
     

11 Dec, 2020

1 commit

  • To make it easy to tell where files->file_lock protection is being
    used when looking up a file, create files_lookup_fd_locked. Only allow
    this function to be called with the file_lock held.

    Update the callers of fcheck and fcheck_files that are called with the
    files->file_lock held to call files_lookup_fd_locked instead.

    Hopefully this makes it easier to quickly understand what is going on.

    The need for better names became apparent in the last round of
    discussion of this set of changes[1].

    [1] https://lkml.kernel.org/r/CAHk-=wj8BQbgJFLa+J0e=iT-1qpmCRTbPAJ8gd6MJQ=kbRPqyQ@mail.gmail.com
    Link: https://lkml.kernel.org/r/20201120231441.29911-8-ebiederm@xmission.com
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

26 Oct, 2020

2 commits

  • locks_delete_lock -> locks_delete_block

    Signed-off-by: Mauro Carvalho Chehab
    Signed-off-by: Jeff Layton

    Mauro Carvalho Chehab
     
  • When the sum of fl->fl_start and l->l_len overflows,
    UBSAN shows the following warning:

    UBSAN: Undefined behaviour in fs/locks.c:482:29
    signed integer overflow: 2 + 9223372036854775806
    cannot be represented in type 'long long int'
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0xe4/0x14e lib/dump_stack.c:118
    ubsan_epilogue+0xe/0x81 lib/ubsan.c:161
    handle_overflow+0x193/0x1e2 lib/ubsan.c:192
    flock64_to_posix_lock fs/locks.c:482 [inline]
    flock_to_posix_lock+0x595/0x690 fs/locks.c:515
    fcntl_setlk+0xf3/0xa90 fs/locks.c:2262
    do_fcntl+0x456/0xf60 fs/fcntl.c:387
    __do_sys_fcntl fs/fcntl.c:483 [inline]
    __se_sys_fcntl fs/fcntl.c:468 [inline]
    __x64_sys_fcntl+0x12d/0x180 fs/fcntl.c:468
    do_syscall_64+0xc8/0x5a0 arch/x86/entry/common.c:293
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Fix it by parenthesizing 'l->l_len - 1'.

    Signed-off-by: Luo Meng
    Signed-off-by: Jeff Layton

    Luo Meng
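
The effect of the parenthesization in this entry can be sketched in userspace C. This is an illustration under assumptions, not the kernel's `flock64_to_posix_lock()`: the `lock_end()` helper and its bounds check are hypothetical. With `start = 2` and `len = 9223372036854775806`, evaluating `start + len - 1` left to right overflows at `start + len`, while `start + (len - 1)` stays in range.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the inclusive end offset of a lock that
 * covers len bytes starting at start. Returns false when the inputs are
 * invalid or the end offset does not fit in int64_t.
 */
static bool lock_end(int64_t start, int64_t len, int64_t *end)
{
    if (start < 0 || len <= 0)
        return false;

    /* len >= 1, so (len - 1) cannot overflow; check headroom before adding. */
    if (len - 1 > INT64_MAX - start)
        return false;

    /* Parenthesized form: the intermediate value never exceeds the result. */
    *end = start + (len - 1);
    return true;
}
```

For the values in the UBSAN report, `lock_end(2, INT64_MAX - 1, &end)` succeeds and sets `end` to `INT64_MAX`, whereas the unparenthesized expression would have overflowed before the subtraction.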
     

24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
    fall-through markings where applicable.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
     

10 Aug, 2020

1 commit

  • Pull NFS server updates from Chuck Lever:
    "Highlights:
    - Support for user extended attributes on NFS (RFC 8276)
    - Further reduce unnecessary NFSv4 delegation recalls

    Notable fixes:
    - Fix recent krb5p regression
    - Address a few resource leaks and a rare NULL dereference

    Other:
    - De-duplicate RPC/RDMA error handling and other utility functions
    - Replace storage and display of kernel memory addresses by tracepoints"

    * tag 'nfsd-5.9' of git://git.linux-nfs.org/projects/cel/cel-2.6: (38 commits)
    svcrdma: CM event handler clean up
    svcrdma: Remove transport reference counting
    svcrdma: Fix another Receive buffer leak
    SUNRPC: Refresh the show_rqstp_flags() macro
    nfsd: netns.h: delete a duplicated word
    SUNRPC: Fix ("SUNRPC: Add "@len" parameter to gss_unwrap()")
    nfsd: avoid a NULL dereference in __cld_pipe_upcall()
    nfsd4: a client's own opens needn't prevent delegations
    nfsd: Use seq_putc() in two functions
    svcrdma: Display chunk completion ID when posting a rw_ctxt
    svcrdma: Record send_ctxt completion ID in trace_svcrdma_post_send()
    svcrdma: Introduce Send completion IDs
    svcrdma: Record Receive completion ID in svc_rdma_decode_rqst
    svcrdma: Introduce Receive completion IDs
    svcrdma: Introduce infrastructure to support completion IDs
    svcrdma: Add common XDR encoders for RDMA and Read segments
    svcrdma: Add common XDR decoders for RDMA and Read segments
    SUNRPC: Add helpers for decoding list discriminators symbolically
    svcrdma: Remove declarations for functions long removed
    svcrdma: Clean up trace_svcrdma_send_failed() tracepoint
    ...

    Linus Torvalds
     

04 Aug, 2020

1 commit


14 Jul, 2020

1 commit

  • We recently fixed lease breaking so that a client's actions won't break
    its own delegations.

    But we still have an unnecessary self-conflict when granting
    delegations: a client's own write opens will prevent us from handing out
    a read delegation even when no other client has the file open for write.

    Fix that by turning off the checks for conflicting opens under
    vfs_setlease, and instead performing those checks in the nfsd code.

    We don't depend much on locks here: instead we acquire the delegation,
    then check for conflicts, and drop the delegation again if we find any.

    The check beforehand is an optimization of sorts, just to avoid
    acquiring the delegation unnecessarily. There's a race where the first
    check could cause us to deny the delegation when we could have granted
    it. But, that's OK, delegation grants are optional (and probably not
    even a good idea in that case).

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Chuck Lever

    J. Bruce Fields
     

12 Jun, 2020

1 commit

  • Pull nfsd updates from Bruce Fields:
    "Highlights:

    - Keep nfsd clients from unnecessarily breaking their own
    delegations.

    Note this requires a small kthreadd addition. The result is Tejun
    Heo's suggestion (see link), and he was OK with this going through
    my tree.

    - Patch nfsd/clients/ to display filenames, and to fix byte-order
    when displaying stateid's.

    - fix a module loading/unloading bug, from Neil Brown.

    - A big series from Chuck Lever with RPC/RDMA and tracing
    improvements, and lay some groundwork for RPC-over-TLS"

    Link: https://lore.kernel.org/r/1588348912-24781-1-git-send-email-bfields@redhat.com

    * tag 'nfsd-5.8' of git://linux-nfs.org/~bfields/linux: (49 commits)
    sunrpc: use kmemdup_nul() in gssp_stringify()
    nfsd: safer handling of corrupted c_type
    nfsd4: make drc_slab global, not per-net
    SUNRPC: Remove unreachable error condition in rpcb_getport_async()
    nfsd: Fix svc_xprt refcnt leak when setup callback client failed
    sunrpc: clean up properly in gss_mech_unregister()
    sunrpc: svcauth_gss_register_pseudoflavor must reject duplicate registrations.
    sunrpc: check that domain table is empty at module unload.
    NFSD: Fix improperly-formatted Doxygen comments
    NFSD: Squash an annoying compiler warning
    SUNRPC: Clean up request deferral tracepoints
    NFSD: Add tracepoints for monitoring NFSD callbacks
    NFSD: Add tracepoints to the NFSD state management code
    NFSD: Add tracepoints to NFSD's duplicate reply cache
    SUNRPC: svc_show_status() macro should have enum definitions
    SUNRPC: Restructure svc_udp_recvfrom()
    SUNRPC: Refactor svc_recvfrom()
    SUNRPC: Clean up svc_release_skb() functions
    SUNRPC: Refactor recvfrom path dealing with incomplete TCP receives
    SUNRPC: Replace dprintk() call sites in TCP receive path
    ...

    Linus Torvalds
     

05 Jun, 2020

1 commit

  • Pull proc updates from Eric Biederman:
    "This has four sets of changes:

    - modernize proc to support multiple private instances

    - ensure we see the exit of each process tid exactly once

    - remove has_group_leader_pid

    - use pids not tasks in posix-cpu-timers lookup

    Alexey updated proc so each mount of proc uses a new superblock. This
    allows people to actually use mount options with proc with no fear of
    messing up another mount of proc. Given the kernel's internal mounts
    of proc for things like uml this was a real problem, and resulted in
    Android's hidepid mount options being ignored and introducing security
    issues.

    The rest of the changes are small cleanups and fixes that came out of
    my work to allow this change to proc. In essence it is swapping the
    pids in de_thread during exec which removes a special case the code
    had to handle. Then updating the code to stop handling that special
    case"

    * 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    proc: proc_pid_ns takes super_block as an argument
    remove the no longer needed pid_alive() check in __task_pid_nr_ns()
    posix-cpu-timers: Replace __get_task_for_clock with pid_for_clock
    posix-cpu-timers: Replace cpu_timer_pid_type with clock_pid_type
    posix-cpu-timers: Extend rcu_read_lock removing task_struct references
    signal: Remove has_group_leader_pid
    exec: Remove BUG_ON(has_group_leader_pid)
    posix-cpu-timer: Unify the now redundant code in lookup_task
    posix-cpu-timer: Tidy up group_leader logic in lookup_task
    proc: Ensure we see the exit of each process tid exactly once
    rculist: Add hlists_swap_heads_rcu
    proc: Use PIDTYPE_TGID in next_tgid
    Use proc_pid_ns() to get pid_namespace from the proc superblock
    proc: use named enums for better readability
    proc: use human-readable values for hidepid
    docs: proc: add documentation for "hidepid=4" and "subset=pid" options and new mount behavior
    proc: add option to mount only a pids subset
    proc: instantiate only pids that we can ptrace on 'hidepid=4' mount option
    proc: allow to mount many instances of proc in one pid namespace
    proc: rename struct proc_fs_info to proc_fs_opts

    Linus Torvalds
     

03 Jun, 2020

1 commit


19 May, 2020

1 commit

  • syzbot found that

    touch /proc/testfile

    causes NULL pointer dereference at tomoyo_get_local_path()
    because inode of the dentry is NULL.

    Before c59f415a7cb6, Tomoyo received pid_ns from proc's s_fs_info
    directly. Since proc_pid_ns() can only work with inode, using it in
    the tomoyo_get_local_path() was wrong.

    To avoid creating more functions for getting proc_ns, change the
    argument type of the proc_pid_ns() function. Then, Tomoyo can use
    the existing super_block to get pid_ns.

    Link: https://lkml.kernel.org/r/0000000000002f0c7505a5b0e04c@google.com
    Link: https://lkml.kernel.org/r/20200518180738.2939611-1-gladkov.alexey@gmail.com
    Reported-by: syzbot+c1af344512918c61362c@syzkaller.appspotmail.com
    Fixes: c59f415a7cb6 ("Use proc_pid_ns() to get pid_namespace from the proc superblock")
    Signed-off-by: Alexey Gladkov
    Signed-off-by: Eric W. Biederman

    Alexey Gladkov
     

09 May, 2020

1 commit

  • We currently revoke read delegations on any write open or any operation
    that modifies file data or metadata (including rename, link, and
    unlink). But if the delegation in question is the only read delegation
    and is held by the client performing the operation, that's not really
    necessary.

    It's not always possible to prevent this in the NFSv4.0 case, because
    there's not always a way to determine which client an NFSv4.0 delegation
    came from. (In theory we could try to guess this from the transport
    layer, e.g., by assuming all traffic on a given TCP connection comes
    from the same client. But that's not really correct.)

    In the NFSv4.1 case the session layer always tells us the client.

    This patch should remove such self-conflicts in all cases where we can
    reliably determine the client from the compound.

    To do that we need to track "who" is performing a given (possibly
    lease-breaking) file operation. We're doing that by storing the
    information in the svc_rqst and using kthread_data() to map the current
    task back to a svc_rqst.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

05 May, 2020

1 commit


25 Apr, 2020

1 commit

  • A special helper should be used to get the pid_namespace from the procfs
    superblock. This will avoid errors when s_fs_info changes type.

    Link: https://lore.kernel.org/lkml/20200423200316.164518-3-gladkov.alexey@gmail.com/
    Link: https://lore.kernel.org/lkml/20200423112858.95820-1-gladkov.alexey@gmail.com/
    Link: https://lore.kernel.org/lkml/06B50A1C-406F-4057-BFA8-3A7729EA7469@lca.pw/
    Signed-off-by: Alexey Gladkov
    Signed-off-by: Eric W. Biederman

    Alexey Gladkov
     

19 Mar, 2020

1 commit

  • There is measurable performance impact in some synthetic tests due to
    commit 6d390e4b5d48 (locks: fix a potential use-after-free problem when
    wakeup a waiter). Fix the race condition instead by clearing the
    fl_blocker pointer after the wake_up, using explicit acquire/release
    semantics.

    This does mean that we can no longer use the clearing of fl_blocker as
    the wait condition, so switch the waiters over to checking whether the
    fl_blocked_member list_head is empty.

    Reviewed-by: yangerkun
    Reviewed-by: NeilBrown
    Fixes: 6d390e4b5d48 (locks: fix a potential use-after-free problem when wakeup a waiter)
    Signed-off-by: Jeff Layton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
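
The acquire/release pairing mentioned in this entry can be sketched with C11 atomics. This is a simplified userspace model under assumptions, not the kernel implementation: `struct request`, `waker_finish()` and `waiter_may_free()` are hypothetical names, and the real code uses kernel primitives and additional list checks. The idea shown is only that the waker clears the pointer as its last access, with release semantics, and the waiter treats the request as safe to reuse or free only after an acquire load observes that store.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a blocked lock request. */
struct request {
    int status;                        /* written by the waker before release */
    _Atomic(struct request *) blocker; /* non-NULL while the waker may touch us */
};

/* Waker side: finish every access to r, then publish with a release store. */
static void waker_finish(struct request *r, int status)
{
    r->status = status;
    atomic_store_explicit(&r->blocker, (struct request *)NULL,
                          memory_order_release);
}

/*
 * Waiter side: the acquire load pairs with the release store above, so
 * once NULL is observed the waker's earlier writes are visible and the
 * waker no longer touches r.
 */
static bool waiter_may_free(struct request *r, int *status)
{
    if (atomic_load_explicit(&r->blocker, memory_order_acquire) != NULL)
        return false;
    *status = r->status;
    return true;
}
```

Because the store is the waker's final access, a waiter that sees `NULL` cannot race with a late write from the waker, which is the property the use-after-free fix depends on.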
     

07 Mar, 2020

1 commit

  • '16306a61d3b7 ("fs/locks: always delete_block after waiting.")' added
    logic to check waiter->fl_blocker without holding blocked_lock_lock. This
    can trigger a UAF when we try to wake up a waiter:

    Thread 1 has created a write flock a on a file. Now thread 2 tries to
    unlock and delete flock a, while thread 3 tries to add flock b on the
    same file.

    Thread2                           Thread3
                                      flock syscall(create flock b)
                                      ...flock_lock_inode_wait
                                          flock_lock_inode(will insert
                                          our fl_blocked_member list
                                          to flock a's fl_blocked_requests)
                                      sleep
    flock syscall(unlock)
    ...flock_lock_inode_wait
        locks_delete_lock_ctx
        ...__locks_wake_up_blocks
            __locks_delete_blocks(
            b->fl_blocker = NULL)
            ...
                                      break by a signal
                                      locks_delete_block
                                          b->fl_blocker == NULL &&
                                          list_empty(&b->fl_blocked_requests)
                                          success, return directly
                                      locks_free_lock b
            wake_up(&b->fl_waiter)
            trigger UAF

    Fix it by removing this logic. This patch may also fix CVE-2019-19769.

    Cc: stable@vger.kernel.org
    Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.")
    Signed-off-by: yangerkun
    Signed-off-by: Jeff Layton

    yangerkun
     

29 Dec, 2019

1 commit


28 Sep, 2019

1 commit

  • Pull nfsd updates from Bruce Fields:
    "Highlights:

    - Add a new knfsd file cache, so that we don't have to open and close
    on each (NFSv2/v3) READ or WRITE. This can speed up read and write
    in some cases. It also replaces our readahead cache.

    - Prevent silent data loss on write errors, by treating write errors
    like server reboots for the purposes of write caching, thus forcing
    clients to resend their writes.

    - Tweak the code that allocates sessions to be more forgiving, so
    that NFSv4.1 mounts are less likely to hang when a server already
    has a lot of clients.

    - Eliminate an arbitrary limit on NFSv4 ACL sizes; they should now be
    limited only by the backend filesystem and the maximum RPC size.

    - Allow the server to enforce use of the correct kerberos credentials
    when a client reclaims state after a reboot.

    And some miscellaneous smaller bugfixes and cleanup"

    * tag 'nfsd-5.4' of git://linux-nfs.org/~bfields/linux: (34 commits)
    sunrpc: clean up indentation issue
    nfsd: fix nfs read eof detection
    nfsd: Make nfsd_reset_boot_verifier_locked static
    nfsd: degraded slot-count more gracefully as allocation nears exhaustion.
    nfsd: handle drc over-allocation gracefully.
    nfsd: add support for upcall version 2
    nfsd: add a "GetVersion" upcall for nfsdcld
    nfsd: Reset the boot verifier on all write I/O errors
    nfsd: Don't garbage collect files that might contain write errors
    nfsd: Support the server resetting the boot verifier
    nfsd: nfsd_file cache entries should be per net namespace
    nfsd: eliminate an unnecessary acl size limit
    Deprecate nfsd fault injection
    nfsd: remove duplicated include from filecache.c
    nfsd: Fix the documentation for svcxdr_tmpalloc()
    nfsd: Fix up some unused variable warnings
    nfsd: close cached files prior to a REMOVE or RENAME that would replace target
    nfsd: rip out the raparms cache
    nfsd: have nfsd_test_lock use the nfsd_file cache
    nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
    ...

    Linus Torvalds
     

20 Aug, 2019

1 commit

  • In __break_lease(), the file lock 'new_fl' is allocated in lease_alloc().
    However, it is not deallocated in the following execution if
    smp_load_acquire() fails, leading to a memory leak bug. To fix this issue,
    free 'new_fl' before returning the error.

    Signed-off-by: Wenwen Wang
    Signed-off-by: Jeff Layton

    Wenwen Wang
     

19 Aug, 2019

2 commits

  • Have them keep an nfsd_file reference instead of a struct file.

    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • With the new file caching infrastructure in nfsd, we can end up holding
    files open for an indefinite period of time, even when they are still
    idle. This may prevent the kernel from handing out leases on the file,
    which is something we don't want to block.

    Fix this by running an SRCU notifier call chain on any lease attempt.
    nfsd can then purge the cache for that inode before returning.

    Since SRCU is only conditionally compiled in, we must only define the
    new chain if it's enabled, and users of the chain must ensure that
    SRCU is enabled.

    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     

25 Jul, 2019

1 commit

  • Since commit 778fc546f749c588aa2f ("locks: fix tracking of inprogress
    lease breaks"), lease breaks don't change @fl_type but modify
    @fl_flags. However, the procfs part hasn't been updated.

    Previously, for a breaking lease the target type was printed (see
    target_leasetype()), as returned by fcntl(F_GETLEASE). But now it's always
    "READ", as F_UNLCK no longer means "breaking". Unlike the previous
    one, this behaviour doesn't provide a complete description of the lease.

    Here are the /proc/pid/fdinfo/ outputs for a lease (the same for READ and
    WRITE) broken by O_WRONLY.
    -- before:
    lock: 1: LEASE BREAKING READ 2558 08:03:815793 0 EOF
    -- after:
    lock: 1: LEASE BREAKING UNLCK 2558 08:03:815793 0 EOF

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jeff Layton

    Pavel Begunkov
     

11 Jul, 2019

1 commit

  • Pull nfsd updates from Bruce Fields:
    "Highlights:

    - Add a new /proc/fs/nfsd/clients/ directory which exposes some
    long-requested information about NFSv4 clients (like open files)
    and allows forced revocation of client state.

    - Replace the global duplicate reply cache by a cache per network
    namespace; previously, a request in one network namespace could
    incorrectly match an entry from another, though we haven't seen
    this in production. This is the last remaining container bug that
    I'm aware of; at this point you should be able to run separate
    nfsd's in each network namespace, each with their own set of
    exports, and everything should work.

    - Clean up and modify lock code to show the pid of lockd as the owner
    of NLM locks. This is the correct version of the bugfix originally
    attempted in b8eee0e90f97 ("lockd: Show pid of lockd for remote
    locks")"

    * tag 'nfsd-5.3' of git://linux-nfs.org/~bfields/linux: (34 commits)
    nfsd: Make __get_nfsdfs_client() static
    nfsd: Make two functions static
    nfsd: Fix misuse of strlcpy
    sunrpc/cache: remove the exporting of cache_seq_next
    nfsd: decode implementation id
    nfsd: create xdr_netobj_dup helper
    nfsd: allow forced expiration of NFSv4 clients
    nfsd: create get_nfsdfs_clp helper
    nfsd4: show layout stateids
    nfsd: show lock and deleg stateids
    nfsd4: add file to display list of client's opens
    nfsd: add more information to client info file
    nfsd: escape high characters in binary data
    nfsd: copy client's address including port number to cl_addr
    nfsd4: add a client info file
    nfsd: make client/ directory names small ints
    nfsd: add nfsd/clients directory
    nfsd4: use reference count to free client
    nfsd: rename cl_refcount
    nfsd: persist nfsd filesystem across mounts
    ...

    Linus Torvalds
     

04 Jul, 2019

1 commit


19 Jun, 2019

2 commits

  • check_conflicting_open() checks for existing fd's open for read or
    for write before allowing a write lease to be taken. The check that was
    implemented using i_count and d_count is an approximation that has
    several false positives. For example, overlayfs since v4.19 takes an
    extra reference on the dentry; an open with O_PATH takes a reference on
    the dentry although the file cannot be read or written.

    Change the implementation to use i_readcount and i_writecount to
    eliminate the false positive conflicts and allow a write lease to be
    taken on an overlayfs file.

    The change of behavior with existing fd's open with O_PATH is symmetric
    w.r.t. current behavior of lease breakers - an open with O_PATH currently
    does not break a write lease.

    This increases the size of struct inode by 4 bytes on 32bit archs when
    CONFIG_FILE_LOCKING is defined and CONFIG_IMA was not already
    defined.

    Signed-off-by: Amir Goldstein
    Signed-off-by: Jeff Layton

    Amir Goldstein
     
  • Signed-off-by: Ira Weiny
    Signed-off-by: Jeff Layton

    Ira Weiny
     

21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

16 May, 2019

1 commit

  • Pull nfsd updates from Bruce Fields:
    "This consists mostly of nfsd container work:

    Scott Mayhew revived an old api that communicates with a userspace
    daemon to manage some on-disk state that's used to track clients
    across server reboots. We've been using a usermode_helper upcall for
    that, but it's tough to run those with the right namespaces, so a
    daemon is much friendlier to container use cases.

    Trond fixed nfsd's handling of user credentials in user namespaces. He
    also contributed patches that allow containers to support different
    sets of NFS protocol versions.

    The only remaining container bug I'm aware of is that the NFS reply
    cache is shared between all containers. If anyone's aware of other
    gaps in our container support, let me know.

    The rest of this is miscellaneous bugfixes"

    * tag 'nfsd-5.2' of git://linux-nfs.org/~bfields/linux: (23 commits)
    nfsd: update callback done processing
    locks: move checks from locks_free_lock() to locks_release_private()
    nfsd: fh_drop_write in nfsd_unlink
    nfsd: allow fh_want_write to be called twice
    nfsd: knfsd must use the container user namespace
    SUNRPC: rsi_parse() should use the current user namespace
    SUNRPC: Fix the server AUTH_UNIX userspace mappings
    lockd: Pass the user cred from knfsd when starting the lockd server
    SUNRPC: Temporary sockets should inherit the cred from their parent
    SUNRPC: Cache the process user cred in the RPC server listener
    nfsd: Allow containers to set supported nfs versions
    nfsd: Add custom rpcbind callbacks for knfsd
    SUNRPC: Allow further customisation of RPC program registration
    SUNRPC: Clean up generic dispatcher code
    SUNRPC: Add a callback to initialise server requests
    SUNRPC/nfs: Fix return value for nfs4_callback_compound()
    nfsd: handle legacy client tracking records sent by nfsdcld
    nfsd: re-order client tracking method selection
    nfsd: keep a tally of RECLAIM_COMPLETE operations when using nfsdcld
    nfsd: un-deprecate nfsdcld
    ...

    Linus Torvalds
     

08 May, 2019

1 commit

  • …kernel/git/gustavoars/linux

    Pull Wimplicit-fallthrough updates from Gustavo A. R. Silva:
    "Mark switch cases where we are expecting to fall through.

    This is part of the ongoing efforts to enable -Wimplicit-fallthrough.

    Most of them have been baking in linux-next for a whole development
    cycle. And with Stephen Rothwell's help, we've had linux-next
    nag-emails going out for newly introduced code that triggers
    -Wimplicit-fallthrough to avoid gaining more of these cases while we
    work to remove the ones that are already present.

    We are getting close to completing this work. Currently, there are
    only 32 of 2311 of these cases left to be addressed in linux-next. I'm
    auditing every case; I take a look into the code and analyze it in
    order to determine if I'm dealing with an actual bug or a false
    positive, as explained here:

    https://lore.kernel.org/lkml/c2fad584-1705-a5f2-d63c-824e9b96cf50@embeddedor.com/

    While working on this, I've found and fixed several missing
    break/return bugs, some of them introduced more than 5 years ago.

    Once this work is finished, we'll be able to universally enable
    "-Wimplicit-fallthrough" to prevent these kinds of bugs from
    entering the kernel again"

    * tag 'Wimplicit-fallthrough-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (27 commits)
    memstick: mark expected switch fall-throughs
    drm/nouveau/nvkm: mark expected switch fall-throughs
    NFC: st21nfca: Fix fall-through warnings
    NFC: pn533: mark expected switch fall-throughs
    block: Mark expected switch fall-throughs
    ASN.1: mark expected switch fall-through
    lib/cmdline.c: mark expected switch fall-throughs
    lib: zstd: Mark expected switch fall-throughs
    scsi: sym53c8xx_2: sym_nvram: Mark expected switch fall-through
    scsi: sym53c8xx_2: sym_hipd: mark expected switch fall-throughs
    scsi: ppa: mark expected switch fall-through
    scsi: osst: mark expected switch fall-throughs
    scsi: lpfc: lpfc_scsi: Mark expected switch fall-throughs
    scsi: lpfc: lpfc_nvme: Mark expected switch fall-through
    scsi: lpfc: lpfc_nportdisc: Mark expected switch fall-through
    scsi: lpfc: lpfc_hbadisc: Mark expected switch fall-throughs
    scsi: lpfc: lpfc_els: Mark expected switch fall-throughs
    scsi: lpfc: lpfc_ct: Mark expected switch fall-throughs
    scsi: imm: mark expected switch fall-throughs
    scsi: csiostor: csio_wr: mark expected switch fall-through
    ...

    Linus Torvalds
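
As an illustration of what "marking" an expected fall-through means in the pull request above, here is a small hedged sketch. The function `steps_from()` is hypothetical and not taken from any of the listed patches. It assumes the comment-style annotation that GCC's -Wimplicit-fallthrough recognizes at its default level; a case that falls through without such a marker would produce a warning when the option is enabled.

```c
#include <assert.h>

/*
 * Hypothetical example: count how many steps run when starting at a
 * given stage. Each higher stage deliberately continues into the lower
 * ones, so every intentional fall-through is marked with a comment.
 */
static int steps_from(int stage)
{
    int steps = 0;

    switch (stage) {
    case 2:
        steps++;
        /* fall through */
    case 1:
        steps++;
        /* fall through */
    case 0:
        steps++;
        break;
    default:
        /* Unknown stage: nothing to do. */
        break;
    }

    return steps;
}
```

Dropping one of the markers does not change what the code does. It only makes the intent ambiguous, and that ambiguity (an intentional fall-through versus a missing break) is what the audit described in the message had to resolve case by case.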