16 May, 2019

1 commit

  • Pull nfsd updates from Bruce Fields:
    "This consists mostly of nfsd container work:

    Scott Mayhew revived an old api that communicates with a userspace
    daemon to manage some on-disk state that's used to track clients
    across server reboots. We've been using a usermode_helper upcall for
    that, but it's tough to run those with the right namespaces, so a
    daemon is much friendlier to container use cases.

    Trond fixed nfsd's handling of user credentials in user namespaces. He
    also contributed patches that allow containers to support different
    sets of NFS protocol versions.

    The only remaining container bug I'm aware of is that the NFS reply
    cache is shared between all containers. If anyone's aware of other
    gaps in our container support, let me know.

    The rest of this is miscellaneous bugfixes"

    * tag 'nfsd-5.2' of git://linux-nfs.org/~bfields/linux: (23 commits)
    nfsd: update callback done processing
    locks: move checks from locks_free_lock() to locks_release_private()
    nfsd: fh_drop_write in nfsd_unlink
    nfsd: allow fh_want_write to be called twice
    nfsd: knfsd must use the container user namespace
    SUNRPC: rsi_parse() should use the current user namespace
    SUNRPC: Fix the server AUTH_UNIX userspace mappings
    lockd: Pass the user cred from knfsd when starting the lockd server
    SUNRPC: Temporary sockets should inherit the cred from their parent
    SUNRPC: Cache the process user cred in the RPC server listener
    nfsd: Allow containers to set supported nfs versions
    nfsd: Add custom rpcbind callbacks for knfsd
    SUNRPC: Allow further customisation of RPC program registration
    SUNRPC: Clean up generic dispatcher code
    SUNRPC: Add a callback to initialise server requests
    SUNRPC/nfs: Fix return value for nfs4_callback_compound()
    nfsd: handle legacy client tracking records sent by nfsdcld
    nfsd: re-order client tracking method selection
    nfsd: keep a tally of RECLAIM_COMPLETE operations when using nfsdcld
    nfsd: un-deprecate nfsdcld
    ...

    Linus Torvalds
     

15 May, 2019

32 commits

  • linux/dax.h is included more than once.

    Link: http://lkml.kernel.org/r/5c867e95.1c69fb81.4f15a.e5e4@mx.google.com
    Signed-off-by: Sabyasachi Gupta
    Acked-by: Souptick Joarder
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sabyasachi Gupta
     
  • linux/xattr.h is included more than once.

    Link: http://lkml.kernel.org/r/5c86803d.1c69fb81.1a7c6.2b78@mx.google.com
    Signed-off-by: Sabyasachi Gupta
    Acked-by: Souptick Joarder
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sabyasachi Gupta
     
  • linux/poll.h is included more than once.

    Link: http://lkml.kernel.org/r/5c86820f.1c69fb81.149f0.0834@mx.google.com
    Signed-off-by: Sabyasachi Gupta
    Acked-by: Souptick Joarder
    Cc: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sabyasachi Gupta
     
  • Fix sparse warning:

    fs/eventfd.c:26:1: warning:
    symbol 'eventfd_ida' was not declared. Should it be static?

    Link: http://lkml.kernel.org/r/20190413142348.34716-1-yuehaibing@huawei.com
    Signed-off-by: YueHaibing
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    YueHaibing
     
  • Finding the endpoints of an IPC channel is one of the essential tasks in
    understanding how a user program works. Procfs and netlink sockets provide
    enough hints to find endpoints for IPC channels like pipes, unix
    sockets, and pseudo terminals. However, there is no simple way to find
    endpoints for an eventfd file from userland. An inode number gives no
    hint: unlike pipes, all eventfd files share the same inode object.

    To provide a way to find the endpoints of an eventfd file, this patch adds
    an "eventfd-id" field to /proc/PID/fdinfo of eventfd as an identifier.
    Integers managed by an IDA are used as ids.

    A tool like lsof can utilize the information to print endpoints.
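
    As a rough illustration of how such a tool might consume the new field,
    the sketch below scans fdinfo text for an "eventfd-id" line. The helper
    names parse_eventfd_id() and read_eventfd_id() are hypothetical, and the
    exact "eventfd-id: <n>" line format is an assumption based on the
    description above, not something this log spells out.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan fdinfo text for a line of the form "eventfd-id: <n>" and store <n>
 * in *id. Returns 0 on success, -1 if no such line is present (for example
 * on a kernel without this patch). The line format is an assumption.
 */
int parse_eventfd_id(const char *fdinfo, long *id)
{
	static const char key[] = "eventfd-id:";
	const char *line = fdinfo;

	while (line && *line) {
		if (strncmp(line, key, sizeof(key) - 1) == 0 &&
		    sscanf(line + sizeof(key) - 1, "%ld", id) == 1)
			return 0;
		line = strchr(line, '\n');	/* advance to the next line */
		if (line)
			line++;
	}
	return -1;
}

/* Read /proc/self/fdinfo/<fd> and extract the id; returns -1 on any failure. */
int read_eventfd_id(int fd, long *id)
{
	char path[64], buf[1024];
	size_t n;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", fd);
	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return parse_eventfd_id(buf, id);
}
```

    Two descriptors that report the same id would then refer to the same
    eventfd object, which is the matching an lsof-like tool needs.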

    Link: http://lkml.kernel.org/r/20190327181823.20222-1-yamato@redhat.com
    Signed-off-by: Masatake YAMATO
    Cc: Al Viro
    Cc: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masatake YAMATO
     
  • ->recursion_depth is changed only by current, therefore decrementing can
    be done without taking any locks.

    Link: http://lkml.kernel.org/r/20190417213150.GA26474@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • fsync() needs to make sure the data & meta-data of a file are persistent
    after the return of fsync(), even when a power failure occurs later. In
    the case of fat-fs, the FAT belongs to the meta-data of the file, so we need
    to issue a flush after the writeback of the FAT instead of before.

    Also bail out early when any stage of fsync fails.

    Link: http://lkml.kernel.org/r/20190409030158.136316-1-houtao1@huawei.com
    Signed-off-by: Hou Tao
    Acked-by: OGAWA Hirofumi
    Cc: Al Viro
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hou Tao
     
  • csum_partial() gives different results for little-endian and big-endian
    hosts. This causes images created on little-endian hosts and mounted on
    big-endian hosts to see csum mismatches, which is an endianness bug.
    Sparse gives a warning as csum_partial() returns a restricted integer type
    __wsum and xattr_hash expects __u32. This warning acts as a reminder
    for this bug and should not be suppressed.

    This comment aims to convey these endianness issues.

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/20190423161831.GA15387@bharath12345-Inspiron-5559
    Signed-off-by: Bharath Vedartham
    Cc: Al Viro
    Cc: Jann Horn
    Cc: Jeff Mahoney
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bharath Vedartham
     
  • Commit eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
    made changes in the rare case when the ELF loader was directly invoked
    (e.g. to set a non-inheritable LD_LIBRARY_PATH, testing new versions of
    the loader), by moving into the mmap region to avoid both ET_EXEC and
    PIE binaries. This had the effect of also moving the brk region into
    mmap, which could lead to the stack and brk being arbitrarily close to
    each other. An unlucky process wouldn't get its requested stack size
    and stack allocations could end up scribbling on the heap.

    This is illustrated here. In the case of using the loader directly, brk
    (so helpfully identified as "[heap]") is allocated with the _loader_ not
    the binary. For example, with ASLR entirely disabled, you can see this
    more clearly:

    $ /bin/cat /proc/self/maps
    555555554000-55555555c000 r-xp 00000000 ... /bin/cat
    55555575b000-55555575c000 r--p 00007000 ... /bin/cat
    55555575c000-55555575d000 rw-p 00008000 ... /bin/cat
    55555575d000-55555577e000 rw-p 00000000 ... [heap]
    ...
    7ffff7ff7000-7ffff7ffa000 r--p 00000000 ... [vvar]
    7ffff7ffa000-7ffff7ffc000 r-xp 00000000 ... [vdso]
    7ffff7ffc000-7ffff7ffd000 r--p 00027000 ... /lib/x86_64-linux-gnu/ld-2.27.so
    7ffff7ffd000-7ffff7ffe000 rw-p 00028000 ... /lib/x86_64-linux-gnu/ld-2.27.so
    7ffff7ffe000-7ffff7fff000 rw-p 00000000 ...
    7ffffffde000-7ffffffff000 rw-p 00000000 ... [stack]

    $ /lib/x86_64-linux-gnu/ld-2.27.so /bin/cat /proc/self/maps
    ...
    7ffff7bcc000-7ffff7bd4000 r-xp 00000000 ... /bin/cat
    7ffff7bd4000-7ffff7dd3000 ---p 00008000 ... /bin/cat
    7ffff7dd3000-7ffff7dd4000 r--p 00007000 ... /bin/cat
    7ffff7dd4000-7ffff7dd5000 rw-p 00008000 ... /bin/cat
    7ffff7dd5000-7ffff7dfc000 r-xp 00000000 ... /lib/x86_64-linux-gnu/ld-2.27.so
    7ffff7fb2000-7ffff7fd6000 rw-p 00000000 ...
    7ffff7ff7000-7ffff7ffa000 r--p 00000000 ... [vvar]
    7ffff7ffa000-7ffff7ffc000 r-xp 00000000 ... [vdso]
    7ffff7ffc000-7ffff7ffd000 r--p 00027000 ... /lib/x86_64-linux-gnu/ld-2.27.so
    7ffff7ffd000-7ffff7ffe000 rw-p 00028000 ... /lib/x86_64-linux-gnu/ld-2.27.so
    7ffff7ffe000-7ffff8020000 rw-p 00000000 ... [heap]
    7ffffffde000-7ffffffff000 rw-p 00000000 ... [stack]

    The solution is to move brk out of mmap and into ELF_ET_DYN_BASE since
    nothing is there in the direct loader case (and ET_EXEC is still far
    away at 0x400000). Anything that ran before should still work (i.e.
    the ultimately-launched binary already had the brk very far from its
    text, so this should be no different from a COMPAT_BRK standpoint). The
    only risk I see here is that if someone started to suddenly depend on
    the entire memory space lower than the mmap region being available when
    launching binaries via direct loader execs, which seems highly
    unlikely, I'd hope: this would mean a binary would _not_ work when
    exec()ed normally.

    (Note that this is only done under CONFIG_ARCH_HAS_ELF_RANDOMIZATION
    when randomization is turned on.)

    Link: http://lkml.kernel.org/r/20190422225727.GA21011@beast
    Link: https://lkml.kernel.org/r/CAGXu5jJ5sj3emOT2QPxQkNQk0qbU6zEfu9=Omfhx_p0nCKPSjA@mail.gmail.com
    Fixes: eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
    Signed-off-by: Kees Cook
    Reported-by: Ali Saidi
    Cc: Ali Saidi
    Cc: Guenter Roeck
    Cc: Michal Hocko
    Cc: Matthew Wilcox
    Cc: Thomas Gleixner
    Cc: Jann Horn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Get "current_pt_regs" pointer right before usage.

    Space savings on x86_64:

    add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-180 (-180)
    Function old new delta
    load_elf_binary 5806 5626 -180 !!!

    Looks like the compiler doesn't know that "current_pt_regs" is a stable
    pointer (because it doesn't know ->stack isn't) even though it knows
    that "current" is a stable pointer. So it saves it in the very beginning
    and then tries to carry it through a lot of code.

    Here is what happens here:

    load_elf_binary()
    ...
    mov rax,QWORD PTR gs:0x14c00
    mov r13,QWORD PTR [rax+0x18] # r13 = current->stack
    call kmem_cache_alloc # first kmalloc

    [980 bytes later!]

    # let's spill that sucker because we need a register
    # for "load_bias" calculations at
    #
    # if (interpreter) {
    # load_bias = ELF_ET_DYN_BASE;
    # if (current->flags & PF_RANDOMIZE)
    # load_bias += arch_mmap_rnd();
    # elf_flags |= elf_fixed;
    # }
    mov QWORD PTR [rsp+0x68],r13

    If this is not _the_ root cause it is still eeeeh.

    After the patch things become much simpler:

    mov rax, QWORD PTR gs:0x14c00 # current
    mov rdx, QWORD PTR [rax+0x18] # current->stack
    movq [rdx+0x3fb8], 0 # fill pt_regs
    ...
    call finalize_exec

    Link: http://lkml.kernel.org/r/20190419200343.GA19788@avx2
    Signed-off-by: Alexey Dobriyan
    Tested-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • There are two places where mapping protections are calculated: one for
    executable, another one for interpreter -- take them out.

    ELF read and execute permissions are interchanged with Linux PROT_READ
    and PROT_EXEC, microoptimizations are welcome!
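
    The shared calculation can be sketched in userspace as follows. The
    function name elf_flags_to_prot() is made up for this illustration and
    is not claimed to be the helper the patch introduces. On Linux PF_R is 4
    and PF_X is 1, while PROT_READ is 1 and PROT_EXEC is 4, which is why the
    bits have to be translated rather than passed through.

```c
#include <elf.h>
#include <sys/mman.h>

/* Translate ELF program-header p_flags (PF_*) into mmap() protection bits (PROT_*). */
int elf_flags_to_prot(unsigned int p_flags)
{
	int prot = 0;

	if (p_flags & PF_R)
		prot |= PROT_READ;
	if (p_flags & PF_W)
		prot |= PROT_WRITE;
	if (p_flags & PF_X)
		prot |= PROT_EXEC;
	return prot;
}
```

    A text segment flagged PF_R | PF_X would map to PROT_READ | PROT_EXEC,
    and the same helper can serve both the executable and the interpreter.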

    Link: http://lkml.kernel.org/r/20190417213413.GB26474@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Link: http://lkml.kernel.org/r/20190416202002.GB24304@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Rewrite

        for (...) {
                if (->p_type == PT_INTERP) {
                        ...
                        break;
                }
        }

    loop into

        for (...) {
                if (->p_type != PT_INTERP)
                        continue;
                ...
                break;
        }

    Link: http://lkml.kernel.org/r/20190416201906.GA24304@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Link: http://lkml.kernel.org/r/20190314205042.GE18143@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • There is no reason for PT_INTERP filename to linger till the end of the
    whole loading process.

    Link: http://lkml.kernel.org/r/20190314204953.GD18143@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Nikitas Angelinas
    Reviewed-by: Andrew Morton
    Cc: Mukesh Ojha
    [nikitas.angelinas@gmail.com: fix GPF when dereferencing invalid interpreter]
    Link: http://lkml.kernel.org/r/20190330140032.GA1527@vostro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Link: http://lkml.kernel.org/r/20190314204707.GC18143@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • As pointed out by zoujc@lenovo.com, setup_arg_pages() already
    initialized current->mm->start_stack.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=202881
    Reported-by:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • The name clear_all_latency_tracing is misleading: in fact it only
    clears the per-task latency_record[], and we already have another
    function named clear_global_latency_tracing which clears the global
    latency_record[] buffer.

    Link: http://lkml.kernel.org/r/20190226114602.16902-1-linf@wangsu.com
    Signed-off-by: Lin Feng
    Cc: Alexey Dobriyan
    Cc: Fabian Frederick
    Cc: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lin Feng
     
  • Merge misc updates from Andrew Morton:

    - a few misc things and hotfixes

    - ocfs2

    - almost all of MM

    * emailed patches from Andrew Morton: (139 commits)
    kernel/memremap.c: remove the unused device_private_entry_fault() export
    mm: delete find_get_entries_tag
    mm/huge_memory.c: make __thp_get_unmapped_area static
    mm/mprotect.c: fix compilation warning because of unused 'mm' variable
    mm/page-writeback: introduce tracepoint for wait_on_page_writeback()
    mm/vmscan: simplify trace_reclaim_flags and trace_shrink_flags
    mm/Kconfig: update "Memory Model" help text
    mm/vmscan.c: don't disable irq again when count pgrefill for memcg
    mm: memblock: make keeping memblock memory opt-in rather than opt-out
    hugetlbfs: always use address space in inode for resv_map pointer
    mm/z3fold.c: support page migration
    mm/z3fold.c: add structure for buddy handles
    mm/z3fold.c: improve compression by extending search
    mm/z3fold.c: introduce helper functions
    mm/page_alloc.c: remove unnecessary parameter in rmqueue_pcplist
    mm/hmm: add ARCH_HAS_HMM_MIRROR ARCH_HAS_HMM_DEVICE Kconfig
    mm/vmscan.c: simplify shrink_inactive_list()
    fs/sync.c: sync_file_range(2) may use WB_SYNC_ALL writeback
    xen/privcmd-buf.c: convert to use vm_map_pages_zero()
    xen/gntdev.c: convert to use vm_map_pages()
    ...

    Linus Torvalds
     
  • Continuing discussion about 58b6e5e8f1ad ("hugetlbfs: fix memory leak for
    resv_map") brought up the issue that inode->i_mapping may not point to the
    address space embedded within the inode at inode eviction time. The
    hugetlbfs truncate routine handles this by explicitly using inode->i_data.
    However, code cleaning up the resv_map will still use the address space
    pointed to by inode->i_mapping. Luckily, private_data is NULL for address
    spaces in all such cases today, but there is no guarantee this will
    continue.

    Change all hugetlbfs code getting a resv_map pointer to explicitly get it
    from the address space embedded within the inode. In addition, add more
    comments in the code to indicate why this is being done.

    Link: http://lkml.kernel.org/r/20190419204435.16984-1-mike.kravetz@oracle.com
    Signed-off-by: Mike Kravetz
    Reported-by: Yufen Yu
    Cc: Michal Hocko
    Cc: Naoya Horiguchi
    Cc: "Kirill A . Shutemov"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     
  • 23d0127096cb ("fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE
    writeback") claims that sync_file_range(2) syscall was "created for
    userspace to be able to issue background writeout and so waiting for
    in-flight IO is undesirable there" and changes the writeback (back) to
    WB_SYNC_NONE.

    This claim is only partially true. It is true for users that use the flag
    SYNC_FILE_RANGE_WRITE by itself, as does PostgreSQL, the user that was the
    reason for changing to WB_SYNC_NONE writeback.

    However, that claim is not true for users that use the flag combination
    SYNC_FILE_RANGE_{WAIT_BEFORE|WRITE|WAIT_AFTER}. Those users explicitly
    requested to wait for in-flight IO as well as to write back dirty pages.

    Re-brand that flag combination as SYNC_FILE_RANGE_WRITE_AND_WAIT and use
    WB_SYNC_ALL writeback to perform the full range sync request.

    Link: http://lkml.kernel.org/r/20190409114922.30095-1-amir73il@gmail.com
    Link: http://lkml.kernel.org/r/20190419072938.31320-1-amir73il@gmail.com
    Fixes: 23d0127096cb ("fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE")
    Signed-off-by: Amir Goldstein
    Acked-by: Jan Kara
    Cc: Dave Chinner
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Amir Goldstein
     
  • This updates each existing invalidation to use the correct mmu notifier
    event that represents what is happening to the CPU page table. See the
    patch which introduced the events to see the rationale behind this.

    Link: http://lkml.kernel.org/r/20190326164747.24405-7-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Reviewed-by: Ralph Campbell
    Reviewed-by: Ira Weiny
    Cc: Christian König
    Cc: Joonas Lahtinen
    Cc: Jani Nikula
    Cc: Rodrigo Vivi
    Cc: Jan Kara
    Cc: Andrea Arcangeli
    Cc: Peter Xu
    Cc: Felix Kuehling
    Cc: Jason Gunthorpe
    Cc: Ross Zwisler
    Cc: Dan Williams
    Cc: Paolo Bonzini
    Cc: Radim Krcmar
    Cc: Michal Hocko
    Cc: Christian Koenig
    Cc: John Hubbard
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
  • CPU page table updates can happen for many reasons, not only as a result
    of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also as
    a result of kernel activities (memory compression, reclaim, migration,
    ...).

    Users of the mmu notifier API track changes to the CPU page table and take
    specific actions for them. However, the current API only provides the
    range of virtual addresses affected by the change, not why the change is
    happening.

    This patchset does the initial mechanical conversion of all the places that
    call mmu_notifier_range_init to also provide the default MMU_NOTIFY_UNMAP
    event as well as the vma if it is known (most invalidations happen against
    a given vma). Passing down the vma allows the users of mmu notifier to
    inspect the new vma page protection.

    MMU_NOTIFY_UNMAP is always the safe default, as users of mmu notifier
    should assume that everything in the range is going away when that event
    happens. A later patch converts the mm call paths to use a more appropriate
    event for each call.

    This is done as 2 patches so that no call site is forgotten, especially
    as it uses the following coccinelle patch:

    %vm_mm, E3, E4)
    ...>

    @@
    expression E1, E2, E3, E4;
    identifier FN, VMA;
    @@
    FN(..., struct vm_area_struct *VMA, ...) {
    }

    @@
    expression E1, E2, E3, E4;
    identifier FN, VMA;
    @@
    FN(...) {
    struct vm_area_struct *VMA;
    }

    @@
    expression E1, E2, E3, E4;
    identifier FN;
    @@
    FN(...) {
    }
    ---------------------------------------------------------------------->%

    Applied with:
    spatch --all-includes --sp-file mmu-notifier.spatch fs/proc/task_mmu.c --in-place
    spatch --sp-file mmu-notifier.spatch --dir kernel/events/ --in-place
    spatch --sp-file mmu-notifier.spatch --dir mm --in-place

    Link: http://lkml.kernel.org/r/20190326164747.24405-6-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Reviewed-by: Ralph Campbell
    Reviewed-by: Ira Weiny
    Cc: Christian König
    Cc: Joonas Lahtinen
    Cc: Jani Nikula
    Cc: Rodrigo Vivi
    Cc: Jan Kara
    Cc: Andrea Arcangeli
    Cc: Peter Xu
    Cc: Felix Kuehling
    Cc: Jason Gunthorpe
    Cc: Ross Zwisler
    Cc: Dan Williams
    Cc: Paolo Bonzini
    Cc: Radim Krcmar
    Cc: Michal Hocko
    Cc: Christian Koenig
    Cc: John Hubbard
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
  • hugetlb uses a fault mutex hash table to prevent concurrent page faults
    on the same pages. The key for shared and private mappings is
    different. Shared keys off address_space and file index. Private keys
    off mm and virtual address. Consider a private mapping of a populated
    hugetlbfs file. A fault will map the page from the file and if needed
    do a COW to map a writable page.

    Hugetlbfs hole punch uses the fault mutex to prevent mappings of file
    pages. It uses the address_space file index key. However, private
    mappings will use a different key and could race with this code to map
    the file page. This causes problems (BUG) for the page cache remove
    code as it expects the page to be unmapped. A sample stack is:

    page dumped because: VM_BUG_ON_PAGE(page_mapped(page))
    kernel BUG at mm/filemap.c:169!
    ...
    RIP: 0010:unaccount_page_cache_page+0x1b8/0x200
    ...
    Call Trace:
    __delete_from_page_cache+0x39/0x220
    delete_from_page_cache+0x45/0x70
    remove_inode_hugepages+0x13c/0x380
    ? __add_to_page_cache_locked+0x162/0x380
    hugetlbfs_fallocate+0x403/0x540
    ? _cond_resched+0x15/0x30
    ? __inode_security_revalidate+0x5d/0x70
    ? selinux_file_permission+0x100/0x130
    vfs_fallocate+0x13f/0x270
    ksys_fallocate+0x3c/0x80
    __x64_sys_fallocate+0x1a/0x20
    do_syscall_64+0x5b/0x180
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    There seems to be another potential COW issue/race with this approach
    of different private and shared keys as noted in commit 8382d914ebf7
    ("mm, hugetlb: improve page-fault scalability").

    Since every hugetlb mapping (even anon and private) is actually a file
    mapping, just use the address_space index key for all mappings. This
    results in potentially more hash collisions. However, this should not
    be the common case.
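
    The keying idea can be sketched as below. This is only an illustration
    under stated assumptions: fault_mutex_slot(), mix64() and the table size
    NUM_FAULT_MUTEXES are hypothetical, and the mixer is a generic stand-in
    rather than the hash the kernel uses. The point is that only the
    address_space pointer and the file index feed the hash.

```c
#include <stdint.h>

#define NUM_FAULT_MUTEXES 64u	/* hypothetical table size, a power of two */

/* Stand-in 64-bit mixer (splitmix64 finalizer); not the kernel's hash. */
static uint64_t mix64(uint64_t x)
{
	x ^= x >> 30;
	x *= 0xbf58476d1ce4e5b9ULL;
	x ^= x >> 27;
	x *= 0x94d049bb133111ebULL;
	x ^= x >> 31;
	return x;
}

/*
 * Pick a mutex slot from the file's address_space pointer and the page
 * index within the file. Neither the mm nor the virtual address takes
 * part, so shared and private mappings of one file page get the same slot.
 */
unsigned int fault_mutex_slot(const void *mapping, uint64_t idx)
{
	uint64_t h = mix64((uint64_t)(uintptr_t)mapping) ^ mix64(idx + 1);

	return (unsigned int)(h & (NUM_FAULT_MUTEXES - 1));
}
```

    With a single key space, a hole punch and a private-mapping fault on the
    same file page contend for the same mutex instead of racing.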

    Link: http://lkml.kernel.org/r/20190328234704.27083-3-mike.kravetz@oracle.com
    Link: http://lkml.kernel.org/r/20190412165235.t4sscoujczfhuiyt@linux-r8p5
    Fixes: b5cec28d36f5 ("hugetlbfs: truncate_hugepages() takes a range of pages")
    Signed-off-by: Mike Kravetz
    Reviewed-by: Naoya Horiguchi
    Reviewed-by: Davidlohr Bueso
    Cc: Joonsoo Kim
    Cc: "Kirill A . Shutemov"
    Cc: Michal Hocko
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     
  • MADV_DONTNEED is handled with mmap_sem taken in read mode. We call
    page_mkclean without holding mmap_sem.

    MADV_DONTNEED implies that pages in the region are unmapped and subsequent
    access to the pages in that range is handled as a new page fault. This
    implies that if we don't have parallel access to the region when
    MADV_DONTNEED is run we expect that range to be unallocated.

    w.r.t. page_mkclean() we need to make sure that we don't break the
    MADV_DONTNEED semantics. MADV_DONTNEED checks for pmd_none without holding
    pmd_lock. This implies we skip the pmd if we temporarily mark pmd none.
    Avoid doing that while marking the page clean.

    Keep the sequence the same for dax too, even though we don't support
    MADV_DONTNEED for dax mappings.

    The bug was noticed by code review and I didn't observe any failures w.r.t
    test run. This is similar to

    commit 58ceeb6bec86d9140f9d91d71a710e963523d063
    Author: Kirill A. Shutemov
    Date: Thu Apr 13 14:56:26 2017 -0700

    thp: fix MADV_DONTNEED vs. MADV_FREE race

    commit ced108037c2aa542b3ed8b7afd1576064ad1362a
    Author: Kirill A. Shutemov
    Date: Thu Apr 13 14:56:20 2017 -0700

    thp: fix MADV_DONTNEED vs. numa balancing race

    Link: http://lkml.kernel.org/r/20190321040610.14226-1-aneesh.kumar@linux.ibm.com
    Signed-off-by: Aneesh Kumar K.V
    Reviewed-by: Andrew Morton
    Cc: Dan Williams
    Cc: "Kirill A . Shutemov"
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     
  • To facilitate additional options to get_user_pages_fast() change the
    singular write parameter to be gup_flags.

    This patch does not change any functionality. New functionality will
    follow in subsequent patches.

    Some of the get_user_pages_fast() call sites were unchanged because they
    already passed FOLL_WRITE or 0 for the write parameter.

    NOTE: It was suggested to change the ordering of the get_user_pages_fast()
    arguments to ensure that callers were converted. This breaks the current
    GUP call site convention of having the returned pages be the final
    parameter. So the suggestion was rejected.
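
    The mechanical part of such a conversion can be sketched as follows. The
    helper write_to_gup_flags() is invented for illustration, and FOLL_WRITE
    is redefined locally (with the value it has in include/linux/mm.h) only
    so that the sketch builds outside the kernel tree.

```c
/* Kernel value of FOLL_WRITE, repeated here so the sketch builds in userspace. */
#define FOLL_WRITE 0x01u

/*
 * Map the old boolean "write" argument onto the new gup_flags argument.
 * A call site that used to pass "write" would pass the result instead, e.g.
 *   get_user_pages_fast(start, nr_pages, write ? FOLL_WRITE : 0, pages);
 */
unsigned int write_to_gup_flags(int write)
{
	return write ? FOLL_WRITE : 0;
}
```

    Call sites that already passed a literal FOLL_WRITE or 0 need no such
    mapping, which matches the note above about unchanged callers.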

    Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
    Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com
    Signed-off-by: Ira Weiny
    Reviewed-by: Mike Marshall
    Cc: Aneesh Kumar K.V
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Dan Williams
    Cc: "David S. Miller"
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: John Hubbard
    Cc: "Kirill A. Shutemov"
    Cc: Martin Schwidefsky
    Cc: Michal Hocko
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Thomas Gleixner
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ira Weiny
     
  • Patch series "Add FOLL_LONGTERM to GUP fast and use it".

    HFI1, qib, and mthca use get_user_pages_fast() due to its performance
    advantages. These pages can be held for a significant time. But
    get_user_pages_fast() does not protect against mapping FS DAX pages.

    Introduce FOLL_LONGTERM and use this flag in get_user_pages_fast() which
    retains the performance while also adding the FS DAX checks. XDP has also
    shown interest in using this functionality.[1]

    In addition we change get_user_pages() to use the new FOLL_LONGTERM flag
    and remove the specialized get_user_pages_longterm call.

    [1] https://lkml.org/lkml/2019/3/19/939

    This patch (of 7):

    This patch starts a series which aims to support FOLL_LONGTERM in
    get_user_pages_fast(). Some callers would like to do a longterm (user
    controlled) pin of pages with the fast variant of GUP for performance
    purposes.

    Rather than have a separate get_user_pages_longterm() call, introduce
    FOLL_LONGTERM and change the longterm callers to use it.

    This patch does not change any functionality. In the short term
    "longterm" or user controlled pins are unsafe for Filesystems and FS DAX
    in particular has been blocked. However, callers of get_user_pages_fast()
    were not "protected".

    FOLL_LONGTERM can _only_ be supported with get_user_pages[_fast]() as it
    requires vmas to determine if DAX is in use.

    NOTE: In merging with the CMA changes we opt to change the
    get_user_pages() call in check_and_migrate_cma_pages() to a call of
    __get_user_pages_locked() on the newly migrated pages. This makes the
    code read better in that we are calling __get_user_pages_locked() on the
    pages before and after a potential migration.

    As a side effect some of the interfaces are cleaned up but this is not the
    primary purpose of the series.

    In review[1] it was asked:

    > This I don't get - if you do lock down long term mappings performance
    > of the actual get_user_pages call shouldn't matter to start with.
    >
    > What do I miss?

    A couple of points.

    First "longterm" is a relative thing and at this point is probably a
    misnomer. This is really flagging a pin which is going to be given to
    hardware and can't move. I've thought of a couple of alternative names
    but I think we have to settle on if we are going to use FL_LAYOUT or
    something else to solve the "longterm" problem. Then I think we can
    change the flag to a better name.

    Second, it depends on how often you are registering memory. I have spoken
    with some RDMA users who consider MR in the performance path... For the
    overall application performance. I don't have the numbers as the tests
    for HFI1 were done a long time ago. But there was a significant
    advantage. Some of which is probably due to the fact that you don't have
    to hold mmap_sem.

    Finally, architecturally I think it would be good for everyone to use
    *_fast. There are patches submitted to the RDMA list which would allow
    the use of *_fast (they are reworking the use of mmap_sem) and as soon as they
    are accepted I'll submit a patch to convert the RDMA core as well. Also
    to this point others are looking to use *_fast.

    As an aside, Jason pointed out in my previous submission that *_fast and
    *_unlocked look very much the same. I agree and I think further cleanup
    will be coming. But I'm focused on getting the final solution for DAX at
    the moment.

    [1] https://lore.kernel.org/lkml/20190220180255.GA12020@iweiny-DESK2.sc.intel.com/T/#md6abad2569f3bf6c1f03686c8097ab6563e94965

    [ira.weiny@intel.com: v3]
    Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com
    Link: http://lkml.kernel.org/r/20190317183438.2057-2-ira.weiny@intel.com
    Signed-off-by: Ira Weiny
    Reviewed-by: Andrew Morton
    Cc: Aneesh Kumar K.V
    Cc: Michal Hocko
    Cc: John Hubbard
    Cc: "Kirill A. Shutemov"
    Cc: Peter Zijlstra
    Cc: Jason Gunthorpe
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "David S. Miller"
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: Ralf Baechle
    Cc: James Hogan
    Cc: Dan Williams
    Cc: Mike Marshall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ira Weiny
     
  • Userfaultfd can be misused to make it easier to exploit existing
    use-after-free (and similar) bugs that might otherwise only make a
    short window or race condition available. By using userfaultfd to
    stall a kernel thread, a malicious program can keep some state that it
    wrote stable for an extended period, which it can then access using an
    existing exploit. While it doesn't cause the exploit itself, and while
    it's not the only thing that can stall a kernel thread when accessing a
    memory location, it's one of the few that never needs privilege.

    We can add a flag allowing userfaultfd to be restricted, so that in
    general it won't be usable by arbitrary user programs, but in
    environments that require userfaultfd it can be turned back on.

    Add a global sysctl knob "vm.unprivileged_userfaultfd" to control
    whether userfaultfd is allowed by unprivileged users. When this is
    set to zero, only privileged users (root user, or users with the
    CAP_SYS_PTRACE capability) will be able to use the userfaultfd
    syscalls.
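
    The gate described above can be sketched as a small userspace model.
    This is only an illustration: struct uffd_policy and uffd_may_create()
    are hypothetical names, and the -EPERM return value is an assumption,
    not something stated in this changelog.

    ```c
    #include <assert.h>
    #include <errno.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical stand-in for the vm.unprivileged_userfaultfd sysctl. */
    struct uffd_policy {
        int unprivileged_userfaultfd; /* 0: privileged users only, 1: everyone */
    };

    /*
     * Model of the permission check: when the knob is zero, the caller
     * needs CAP_SYS_PTRACE. Returns 0 if allowed, -EPERM (assumed) if not.
     */
    static int uffd_may_create(const struct uffd_policy *p, bool has_cap_sys_ptrace)
    {
        assert(p != NULL);
        if (!p->unprivileged_userfaultfd && !has_cap_sys_ptrace)
            return -EPERM;
        return 0;
    }
    ```

    With the knob at 1 every caller passes; with the knob at 0 only a
    caller holding the capability does.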

    Andrea said:

    : The only difference between the bpf sysctl and the userfaultfd sysctl
    : this way is that the bpf sysctl adds the CAP_SYS_ADMIN capability
    : requirement, while userfaultfd adds the CAP_SYS_PTRACE requirement,
    : because the userfaultfd monitor is more likely to need CAP_SYS_PTRACE
    : already if it's doing other kind of tracking on processes runtime, in
    : addition of userfaultfd. In other words both syscalls works only for
    : root, when the two sysctl are opt-in set to 1.

    [dgilbert@redhat.com: changelog additions]
    [akpm@linux-foundation.org: documentation tweak, per Mike]
    Link: http://lkml.kernel.org/r/20190319030722.12441-2-peterx@redhat.com
    Signed-off-by: Peter Xu
    Suggested-by: Andrea Arcangeli
    Suggested-by: Mike Rapoport
    Reviewed-by: Mike Rapoport
    Reviewed-by: Andrea Arcangeli
    Cc: Paolo Bonzini
    Cc: Hugh Dickins
    Cc: Luis Chamberlain
    Cc: Maxime Coquelin
    Cc: Maya Gokhale
    Cc: Jerome Glisse
    Cc: Pavel Emelyanov
    Cc: Johannes Weiner
    Cc: Martin Cracauer
    Cc: Denis Plotnikov
    Cc: Marty McFadden
    Cc: Mike Kravetz
    Cc: Kees Cook
    Cc: Mel Gorman
    Cc: "Kirill A . Shutemov"
    Cc: "Dr . David Alan Gilbert"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Xu
     
  • In some cases, ocfs2_iget() reads the data of an inode which has been
    deleted for some reason. That makes the system panic. So we should
    check whether this inode has been deleted, and tell the caller that the
    inode is a bad inode.

    For example, ocfs2 is used as the backend of NFS, and the client is
    NFSv3. This issue can be reproduced by the following steps.

    On the NFS server side, the path is
    ..../patha/pathb

    Step 1: The process A was scheduled before calling the function fh_verify.

    Step 2: Process B is removing 'pathb', and has just completed the call
    to the function dput. The dentry of 'pathb' has then been deleted from
    the dcache, and all ancestors have been deleted as well. The link between
    dentry and inode was removed through the function hlist_del_init. The
    following is the call stack.
    dentry_iput->hlist_del_init(&dentry->d_u.d_alias)

    At this time, the inode is still in the dcache.

    Step 3: Process A calls the function ocfs2_get_dentry, which gets the
    inode from the dcache. The refcount of the inode is then 1. The following
    is the call stack.
    nfsd3_proc_getacl->fh_verify->exportfs_decode_fh->fh_to_dentry(ocfs2_get_dentry)

    Step 4: Dirty pages are flushed by bdi threads. So the inode of 'patha'
    is evicted, and this directory was deleted. But the inode of 'pathb'
    can't be evicted, because the refcount of the inode was 1.

    Step 5: Process A keeps running, and calls the function
    reconnect_path (in exportfs_decode_fh), which calls the ocfs2 function
    ocfs2_get_parent. It gets the block number of the parent
    directory (patha) by the name of ... Then it reads the data from disk by
    the block number. But this inode has been deleted, so the system panics.

       Process A                          | Process B
    1. in nfsd3_proc_getacl               |
    2.                                    | dput
    3. fh_to_dentry(ocfs2_get_dentry)     |
    4. bdi flush dirty cache              |
    5. ocfs2_iget                         |
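
    The idea of the fix (check whether the inode has been deleted before
    reading it, and report it to the caller instead) can be illustrated
    with a toy model. Everything here is hypothetical: struct toy_fs and
    toy_get_parent() are invented for the sketch, and returning -ESTALE is
    an assumption about a reasonable error code, not a statement about
    what the patch itself does.

    ```c
    #include <assert.h>
    #include <errno.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define TOY_NR_INODES 8

    /* Hypothetical allocation state: true while the inode still exists. */
    struct toy_fs {
        bool allocated[TOY_NR_INODES];
    };

    /*
     * Hypothetical parent lookup by block number. It refuses to read an
     * inode that has already been freed rather than reading stale data.
     */
    static int toy_get_parent(const struct toy_fs *fs, size_t blkno)
    {
        assert(fs != NULL);
        if (blkno >= TOY_NR_INODES)
            return -EINVAL;
        if (!fs->allocated[blkno])
            return -ESTALE; /* assumed error code for a deleted inode */
        return 0;           /* safe to go on and read the inode block */
    }
    ```

    In the race above, step 5 would then get an error back for 'patha'
    instead of reaching the validation failure that forces the panic.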

    [283465.542049] OCFS2: ERROR (device sdp): ocfs2_validate_inode_block:
    Invalid dinode #580640: OCFS2_VALID_FL not set

    [283465.545490] Kernel panic - not syncing: OCFS2: (device sdp): panic forced
    after error

    [283465.546889] CPU: 5 PID: 12416 Comm: nfsd Tainted: G W
    4.1.12-124.18.6.el6uek.bug28762940v3.x86_64 #2
    [283465.548382] Hardware name: VMware, Inc. VMware Virtual Platform/440BX
    Desktop Reference Platform, BIOS 6.00 09/21/2015
    [283465.549657] 0000000000000000 ffff8800a56fb7b8 ffffffff816e839c
    ffffffffa0514758
    [283465.550392] 000000000008dc20 ffff8800a56fb838 ffffffff816e62d3
    0000000000000008
    [283465.551056] ffff880000000010 ffff8800a56fb848 ffff8800a56fb7e8
    ffff88005df9f000
    [283465.551710] Call Trace:
    [283465.552516] [] dump_stack+0x63/0x81
    [283465.553291] [] panic+0xcb/0x21b
    [283465.554037] [] ocfs2_handle_error+0xf0/0xf0 [ocfs2]
    [283465.554882] [] __ocfs2_error+0x67/0x70 [ocfs2]
    [283465.555768] [] ocfs2_validate_inode_block+0x229/0x230
    [ocfs2]
    [283465.556683] [] ocfs2_read_blocks+0x46c/0x7b0 [ocfs2]
    [283465.557408] [] ? ocfs2_inode_cache_io_unlock+0x20/0x20
    [ocfs2]
    [283465.557973] [] ocfs2_read_inode_block_full+0x3b/0x60
    [ocfs2]
    [283465.558525] [] ocfs2_iget+0x4aa/0x880 [ocfs2]
    [283465.559082] [] ocfs2_get_parent+0x9e/0x220 [ocfs2]
    [283465.559622] [] reconnect_path+0xb5/0x300
    [283465.560156] [] exportfs_decode_fh+0xf6/0x2b0
    [283465.560708] [] ? nfsd_proc_getattr+0xa0/0xa0 [nfsd]
    [283465.561262] [] ? prepare_creds+0x26/0x110
    [283465.561932] [] fh_verify+0x350/0x660 [nfsd]
    [283465.562862] [] ? nfsd_cache_lookup+0x44/0x630 [nfsd]
    [283465.563697] [] nfsd3_proc_getattr+0x69/0xf0 [nfsd]
    [283465.564510] [] nfsd_dispatch+0xe0/0x290 [nfsd]
    [283465.565358] [] ? svc_tcp_adjust_wspace+0x12/0x30
    [sunrpc]
    [283465.566272] [] svc_process_common+0x412/0x6a0 [sunrpc]
    [283465.567155] [] svc_process+0x123/0x210 [sunrpc]
    [283465.568020] [] nfsd+0xff/0x170 [nfsd]
    [283465.568962] [] ? nfsd_destroy+0x80/0x80 [nfsd]
    [283465.570112] [] kthread+0xcb/0xf0
    [283465.571099] [] ? kthread_create_on_node+0x180/0x180
    [283465.572114] [] ret_from_fork+0x58/0x90
    [283465.573156] [] ? kthread_create_on_node+0x180/0x180

    Link: http://lkml.kernel.org/r/1554185919-3010-1-git-send-email-sunny.s.zhang@oracle.com
    Signed-off-by: Shuning Zhang
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: piaojun
    Cc: "Gang He"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shuning Zhang
     
  • Deduplicate the ocfs2 file type conversion implementation and remove
    OCFS2_FT_* definitions - file systems that use the same file types as
    defined by POSIX do not need to define their own versions and can use the
    common helper functions declared in fs_types.h and implemented in
    fs_types.c.

    Common implementation can be found via bbe7449e2599 ("fs: common
    implementation of file type").
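
    As a rough userspace illustration of what such a shared conversion
    helper does, the sketch below maps the type bits of a POSIX mode to a
    small file type enum. The toy_* names are hypothetical and this is not
    the kernel implementation; it only shows why one common mapping can
    serve every file system that uses the POSIX file types.

    ```c
    #define _DEFAULT_SOURCE /* expose S_IFMT and friends under strict -std= */
    #include <assert.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    /* Hypothetical file type codes, one per POSIX file type. */
    enum toy_ftype {
        TOY_FT_UNKNOWN,
        TOY_FT_REG_FILE,
        TOY_FT_DIR,
        TOY_FT_CHRDEV,
        TOY_FT_BLKDEV,
        TOY_FT_FIFO,
        TOY_FT_SOCK,
        TOY_FT_SYMLINK
    };

    static_assert(TOY_FT_SYMLINK == 7, "eight file types, numbered 0..7");

    /* Convert the file type bits of a mode to a toy_ftype value. */
    static enum toy_ftype toy_mode_to_ftype(mode_t mode)
    {
        switch (mode & S_IFMT) {
        case S_IFREG:  return TOY_FT_REG_FILE;
        case S_IFDIR:  return TOY_FT_DIR;
        case S_IFCHR:  return TOY_FT_CHRDEV;
        case S_IFBLK:  return TOY_FT_BLKDEV;
        case S_IFIFO:  return TOY_FT_FIFO;
        case S_IFSOCK: return TOY_FT_SOCK;
        case S_IFLNK:  return TOY_FT_SYMLINK;
        default:       return TOY_FT_UNKNOWN;
        }
    }
    ```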

    Link: http://lkml.kernel.org/r/20190326213919.GA20878@pathfinder
    Signed-off-by: Amir Goldstein
    Signed-off-by: Phillip Potter
    Reviewed-by: Jan Kara
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Joseph Qi
    Cc: Changwei Ge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Phillip Potter
     
  • Starting with c6f3c5ee40c1 ("mm/huge_memory.c: fix modifying of page
    protection by insert_pfn_pmd()") vmf_insert_pfn_pmd() internally calls
    pmdp_set_access_flags(). That helper enforces a pmd aligned @address
    argument via VM_BUG_ON() assertion.

    Update the implementation to take a 'struct vm_fault' argument directly
    and apply the address alignment fixup internally to fix crash signatures
    like:

    kernel BUG at arch/x86/mm/pgtable.c:515!
    invalid opcode: 0000 [#1] SMP NOPTI
    CPU: 51 PID: 43713 Comm: java Tainted: G OE 4.19.35 #1
    [..]
    RIP: 0010:pmdp_set_access_flags+0x48/0x50
    [..]
    Call Trace:
    vmf_insert_pfn_pmd+0x198/0x350
    dax_iomap_fault+0xe82/0x1190
    ext4_dax_huge_fault+0x103/0x1f0
    ? __switch_to_asm+0x40/0x70
    __handle_mm_fault+0x3f6/0x1370
    ? __switch_to_asm+0x34/0x70
    ? __switch_to_asm+0x40/0x70
    handle_mm_fault+0xda/0x200
    __do_page_fault+0x249/0x4f0
    do_page_fault+0x32/0x110
    ? page_fault+0x8/0x30
    page_fault+0x1e/0x30
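
    The alignment fixup amounts to masking the faulting address down to a
    PMD boundary before it is handed on. A minimal userspace sketch,
    assuming a 2 MiB PMD (shift of 21, as on x86-64); the TOY_* macros and
    struct toy_vm_fault are hypothetical stand-ins for the kernel's own
    definitions:

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Assumption: 2 MiB PMD entries, as on x86-64 with 4 KiB base pages. */
    #define TOY_PMD_SHIFT 21
    #define TOY_PMD_SIZE  ((uint64_t)1 << TOY_PMD_SHIFT)
    #define TOY_PMD_MASK  (~(TOY_PMD_SIZE - 1))

    /* Hypothetical cut-down fault descriptor carrying the raw address. */
    struct toy_vm_fault {
        uint64_t address;
    };

    /* Round the faulting address down to the start of its PMD range. */
    static uint64_t toy_pmd_aligned_address(const struct toy_vm_fault *vmf)
    {
        assert(vmf != NULL);
        return vmf->address & TOY_PMD_MASK;
    }
    ```

    Doing the masking inside the helper means callers can pass the fault
    descriptor as-is and cannot forget the alignment step.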

    Link: http://lkml.kernel.org/r/155741946350.372037.11148198430068238140.stgit@dwillia2-desk3.amr.corp.intel.com
    Fixes: c6f3c5ee40c1 ("mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd()")
    Signed-off-by: Dan Williams
    Reported-by: Piotr Balcer
    Tested-by: Yan Ma
    Tested-by: Pankaj Gupta
    Reviewed-by: Matthew Wilcox
    Reviewed-by: Jan Kara
    Reviewed-by: Aneesh Kumar K.V
    Cc: Chandan Rajendra
    Cc: Souptick Joarder
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • Pull overlayfs update from Miklos Szeredi:
    "Just bug fixes in this small update"

    * tag 'ovl-update-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    ovl: relax WARN_ON() for overlapping layers use case
    ovl: check the capability before cred overridden
    ovl: do not generate duplicate fsnotify events for "fake" path
    ovl: support stacked SEEK_HOLE/SEEK_DATA
    ovl: fix missing upper fs freeze protection on copy up for ioctl

    Linus Torvalds
     

14 May, 2019

5 commits

  • Pull fuse update from Miklos Szeredi:
    "Add more caching controls for userspace filesystems to use, as well as
    bug fixes and cleanups"

    * tag 'fuse-update-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: clean up fuse_alloc_inode
    fuse: Add ioctl flag for x32 compat ioctl
    fuse: Convert fusectl to use the new mount API
    fuse: fix changelog entry for protocol 7.9
    fuse: fix changelog entry for protocol 7.12
    fuse: document fuse_fsync_in.fsync_flags
    fuse: Add FOPEN_STREAM to use stream_open()
    fuse: require /dev/fuse reads to have enough buffer capacity
    fuse: retrieve: cap requested size to negotiated max_write
    fuse: allow filesystems to have precise control over data cache
    fuse: convert printk -> pr_*
    fuse: honor RLIMIT_FSIZE in fuse_file_fallocate
    fuse: fix writepages on 32bit

    Linus Torvalds
     
  • Pull f2fs updates from Jaegeuk Kim:
    "Another round of various bug fixes came in. Damien improved SMR drive
    support a bit, and Chao replaced BUG_ON() with reporting errors to the
    user, since we have hit these not from users but from crafted images.
    We've found a disk layout bug in large_nat_bits feature which supports
    very large NAT entries enabled at mkfs. If the feature is enabled, it
    will give a notice to run fsck to correct the on-disk layout.

    Enhancements:
    - reduce memory consumption for SMR drive
    - better discard handling for multiple partitions
    - tracepoints for f2fs_file_write_iter/f2fs_filemap_fault
    - allow to change CP_CHKSUM_OFFSET
    - detect wrong layout of large_nat_bitmap feature
    - enhance checking valid data indices

    Bug fixes:
    - Multiple partition support for SMR drive
    - deadlock problem in f2fs_balance_fs_bg
    - add boundary checks to fix abnormal behaviors on fuzzed images
    - inline_xattr space calculations
    - replace f2fs_bug_on with errors

    In addition, this series contains various memory boundary check and
    sanity check of on-disk consistency"

    * tag 'f2fs-for-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits)
    f2fs: fix to avoid accessing xattr across the boundary
    f2fs: fix to avoid potential race on sbi->unusable_block_count access/update
    f2fs: add tracepoint for f2fs_filemap_fault()
    f2fs: introduce DATA_GENERIC_ENHANCE
    f2fs: fix to handle error in f2fs_disable_checkpoint()
    f2fs: remove redundant check in f2fs_file_write_iter()
    f2fs: fix to be aware of readonly device in write_checkpoint()
    f2fs: fix to skip recovery on readonly device
    f2fs: fix to consider multiple device for readonly check
    f2fs: relocate chksum_offset for large_nat_bitmap feature
    f2fs: allow unfixed f2fs_checkpoint.checksum_offset
    f2fs: Replace spaces with tab
    f2fs: insert space before the open parenthesis '('
    f2fs: allow address pointer number of dnode aligning to specified size
    f2fs: introduce f2fs_read_single_page() for cleanup
    f2fs: mark is_extension_exist() inline
    f2fs: fix to set FI_UPDATE_WRITE correctly
    f2fs: fix to avoid panic in f2fs_inplace_write_data()
    f2fs: fix to do sanity check on valid block count of segment
    f2fs: fix to do sanity check on valid node/block count
    ...

    Linus Torvalds
     
  • If a call to kobject_init_and_add() fails, we must call kobject_put();
    otherwise we leak memory.

    Function gfs2_sys_fs_add always calls kobject_init_and_add() which
    always calls kobject_init().

    It is safe to leave object destruction up to the kobject release
    function and never free it manually.

    Remove call to kfree() and always call kobject_put() in the error path.
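
    The rule can be shown with a toy reference-counted object in
    userspace. This is a sketch only: struct toy_kobj and the toy_*
    functions are hypothetical and merely mimic the pattern, in which
    initialisation always takes a reference, so the error path drops that
    reference and lets the release callback do the freeing.

    ```c
    #include <assert.h>
    #include <errno.h>
    #include <stdlib.h>

    /* Hypothetical refcounted object with a release callback. */
    struct toy_kobj {
        int refcount;
        void (*release)(struct toy_kobj *);
    };

    static int toy_released; /* counts release calls, for the demo only */

    static void toy_release(struct toy_kobj *k)
    {
        toy_released++;
        free(k);
    }

    /* Always takes the initial reference, even when the "add" step fails. */
    static int toy_init_and_add(struct toy_kobj *k, int fail)
    {
        k->refcount = 1;
        k->release = toy_release;
        return fail ? -EINVAL : 0;
    }

    /* Drop a reference; the last put runs the release callback. */
    static void toy_put(struct toy_kobj *k)
    {
        assert(k->refcount > 0);
        if (--k->refcount == 0)
            k->release(k);
    }

    /* On failure: put, never free() directly. On success: hand back *out. */
    static int toy_sys_fs_add(int fail, struct toy_kobj **out)
    {
        struct toy_kobj *k = malloc(sizeof(*k));
        int err;

        *out = NULL;
        if (!k)
            return -ENOMEM;
        err = toy_init_and_add(k, fail);
        if (err) {
            toy_put(k); /* release callback frees the object */
            return err;
        }
        *out = k;
        return 0;
    }
    ```

    Because the object is only ever freed from the release callback, there
    is a single place where teardown happens and no double free between a
    manual free and a later put.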

    Signed-off-by: Tobin C. Harding
    Reviewed-by: Greg Kroah-Hartman
    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Linus Torvalds

    Tobin C. Harding
     
  • …nel/git/jack/linux-fs

    Pull fsnotify fixes from Jan Kara:
    "Two fsnotify fixes"

    * tag 'fsnotify_for_v5.2-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    fsnotify: fix unlink performance regression
    fsnotify: Clarify connector assignment in fsnotify_add_mark_list()

    Linus Torvalds
     
  • Pull misc filesystem updates from Jan Kara:
    "A couple of small bugfixes and cleanups for quota, udf, ext2, and
    reiserfs"

    * tag 'fs_for_v5.2-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    quota: check time limit when back out space/inode change
    fs/quota: erase unused but set variable warning
    quota: fix wrong indentation
    udf: fix an uninitialized read bug and remove dead code
    fs/reiserfs/journal.c: Make remove_journal_hash static
    quota: remove trailing whitespaces
    quota: code cleanup for __dquot_alloc_space()
    ext2: Adjust the comment of function ext2_alloc_branch
    udf: Explain handling of load_nls() failure

    Linus Torvalds
     

13 May, 2019

2 commits

  • Pull UBI/UBIFS updates from Richard Weinberger:

    - fscrypt framework usage updates

    - One huge fix for xattr unlink

    - Cleanup of fscrypt ifdefs

    - Fix for our new UBIFS auth feature

    * tag 'upstream-5.2-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
    ubi: wl: Fix uninitialized variable
    ubifs: Drop unnecessary setting of zbr->znode
    ubifs: Remove ifdefs around CONFIG_UBIFS_ATIME_SUPPORT
    ubifs: Remove #ifdef around CONFIG_FS_ENCRYPTION
    ubifs: Limit number of xattrs per inode
    ubifs: orphan: Handle xattrs like files
    ubifs: journal: Handle xattrs like files
    ubifs: find.c: replace swap function with built-in one
    ubifs: Do not skip hash checking in data nodes
    ubifs: work around high stack usage with clang
    ubifs: remove unused function __ubifs_shash_final
    ubifs: remove unnecessary #ifdef around fscrypt_ioctl_get_policy()
    ubifs: remove unnecessary calls to set up directory key

    Linus Torvalds
     
  • Pull UML updates from Richard Weinberger:

    - Kconfig cleanups

    - Fix cpu_all_mask() usage

    - Various bug fixes

    * tag 'for-linus-5.2-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/rw/uml:
    um: irq: don't set the chip for all irqs
    um: define set_pte_at() as a static inline function, not a macro
    um: remove uses of variable length arrays
    um: remove unused variable
    uml: fix a boot splat wrt use of cpu_all_mask
    um: Do not unlock mutex that is not hold.
    hostfs: fix mismatch between link_file definition and declaration
    arch: um: drivers: Kconfig: pedantic formatting
    arch: um: Kconfig: pedantic indention cleanups
    um: Revert to using stack for pt_regs in signal handling

    Linus Torvalds