22 Jul, 2019

1 commit

  • Pull cifs fixes from Steve French:
    "Two fixes for stable, one that had dependency on earlier patch in this
    merge window and can now go in, and a perf improvement in SMB3 open"

    * tag '5.3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
    cifs: update internal module number
    cifs: flush before set-info if we have writeable handles
    smb3: optimize open to not send query file internal info
    cifs: copy_file_range needs to strip setuid bits and update timestamps
    CIFS: fix deadlock in cached root handling

    Linus Torvalds
     

21 Jul, 2019

1 commit

  • Pull dcache and mountpoint updates from Al Viro:
    "Saner handling of refcounts to mountpoints.

    Transfer the counting reference from struct mount ->mnt_mountpoint
    over to struct mountpoint ->m_dentry. That allows us to get rid of the
    convoluted games with ordering of mount shutdowns.

    The cost is in teaching shrink_dcache_{parent,for_umount} to cope with
    mixed-filesystem shrink lists, which we'll also need for the Slab
    Movable Objects patchset"

    * 'work.dcache2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    switch the remnants of releasing the mountpoint away from fs_pin
    get rid of detach_mnt()
    make struct mountpoint bear the dentry reference to mountpoint, not struct mount
    Teach shrink_dcache_parent() to cope with mixed-filesystem shrink lists
    fs/namespace.c: shift put_mountpoint() to callers of unhash_mnt()
    __detach_mounts(): lookup_mountpoint() can't return ERR_PTR() anymore
    nfs: dget_parent() never returns NULL
    ceph: don't open-code the check for dead lockref

    Linus Torvalds
     

20 Jul, 2019

4 commits

  • Pull iomap split/cleanup from Darrick Wong:
    "As promised, here's the second part of the iomap merge for 5.3, in
    which we break up iomap.c into smaller files grouped by functional
    area so that it'll be easier in the long run to maintain cohesiveness
    of code units and to review incoming patches. There are no functional
    changes and fs/iomap.c split cleanly.

    Summary:

    - Regroup the fs/iomap.c code by major functional area so that we can
    start development for 5.4 from a more stable base"

    * tag 'iomap-5.3-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
    iomap: move internal declarations into fs/iomap/
    iomap: move the main iteration code into a separate file
    iomap: move the buffered IO code into a separate file
    iomap: move the direct IO code into a separate file
    iomap: move the SEEK_HOLE code into a separate file
    iomap: move the file mapping reporting code into a separate file
    iomap: move the swapfile code into a separate file
    iomap: start moving code to fs/iomap/

    Linus Torvalds
     
  • Pull adfs updates from Al Viro:
    "More ADFS patches from Russell King"

    * 'work.adfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs/adfs: add time stamp and file type helpers
    fs/adfs: super: limit idlen according to directory type
    fs/adfs: super: fix use-after-free bug
    fs/adfs: super: safely update options on remount
    fs/adfs: super: correct superblock flags
    fs/adfs: clean up indirect disc addresses and fragment IDs
    fs/adfs: clean up error message printing
    fs/adfs: use %pV for error messages
    fs/adfs: use format_version from disc_record
    fs/adfs: add helper to get filesystem size
    fs/adfs: add helper to get discrecord from map
    fs/adfs: correct disc record structure

    Linus Torvalds
     
  • Pull vfs mount updates from Al Viro:
    "The first part of mount updates.

    Convert filesystems to use the new mount API"

    * 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    mnt_init(): call shmem_init() unconditionally
    constify ksys_mount() string arguments
    don't bother with registering rootfs
    init_rootfs(): don't bother with init_ramfs_fs()
    vfs: Convert smackfs to use the new mount API
    vfs: Convert selinuxfs to use the new mount API
    vfs: Convert securityfs to use the new mount API
    vfs: Convert apparmorfs to use the new mount API
    vfs: Convert openpromfs to use the new mount API
    vfs: Convert xenfs to use the new mount API
    vfs: Convert gadgetfs to use the new mount API
    vfs: Convert oprofilefs to use the new mount API
    vfs: Convert ibmasmfs to use the new mount API
    vfs: Convert qib_fs/ipathfs to use the new mount API
    vfs: Convert efivarfs to use the new mount API
    vfs: Convert configfs to use the new mount API
    vfs: Convert binfmt_misc to use the new mount API
    convenience helper: get_tree_single()
    convenience helper get_tree_nodev()
    vfs: Kill sget_userns()
    ...

    Linus Torvalds
     
  • Merge yet more updates from Andrew Morton:
    "The rest of MM and a kernel-wide procfs cleanup.

    Summary of the more significant patches:

    - Patch series "mm/memory_hotplug: Factor out memory block
    device handling", v3. David Hildenbrand.

    Some spring-cleaning of the memory hotplug code, notably in
    drivers/base/memory.c

    - "mm: thp: fix false negative of shmem vma's THP eligibility". Yang
    Shi.

    Fix /proc/pid/smaps output for THP pages used in shmem.

    - "resource: fix locking in find_next_iomem_res()" + 1. Nadav Amit.

    Bugfix and speedup for kernel/resource.c

    - Patch series "mm: Further memory block device cleanups", David
    Hildenbrand.

    More spring-cleaning of the memory hotplug code.

    - Patch series "mm: Sub-section memory hotplug support". Dan
    Williams.

    Generalise the memory hotplug code so that pmem can use it more
    completely. Then remove the hacks from the libnvdimm code which
    were there to work around the memory-hotplug code's constraints.

    - "proc/sysctl: add shared variables for range check", Matteo Croce.

    We have about 250 instances of

    int zero;
    ...
    .extra1 = &zero,

    in the tree. This is a tree-wide sweep to make all those private
    "zero"s and "one"s use global variables.

    Alas, it isn't practical to make those two global integers const"

    * emailed patches from Andrew Morton: (38 commits)
    proc/sysctl: add shared variables for range check
    mm: migrate: remove unused mode argument
    mm/sparsemem: cleanup 'section number' data types
    libnvdimm/pfn: stop padding pmem namespaces to section alignment
    libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields
    mm/devm_memremap_pages: enable sub-section remap
    mm: document ZONE_DEVICE memory-model implications
    mm/sparsemem: support sub-section hotplug
    mm/sparsemem: prepare for sub-section ranges
    mm: kill is_dev_zone() helper
    mm/hotplug: kill is_dev_zone() usage in __remove_pages()
    mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap()
    mm/hotplug: prepare shrink_{zone, pgdat}_span for sub-section removal
    mm/sparsemem: add helpers track active portions of a section at boot
    mm/sparsemem: introduce a SECTION_IS_EARLY flag
    mm/sparsemem: introduce struct mem_section_usage
    drivers/base/memory.c: get rid of find_memory_block_hinted()
    mm/memory_hotplug: move and simplify walk_memory_blocks()
    mm/memory_hotplug: rename walk_memory_range() and pass start+size instead of pfns
    mm: make register_mem_sect_under_node() static
    ...

    Linus Torvalds
     

19 Jul, 2019

18 commits

  • In the sysctl code the proc_dointvec_minmax() function is often used to
    validate that the user-supplied value lies within an allowed range. This
    function uses the extra1 and extra2 members of struct ctl_table as the
    minimum and maximum allowed values.

    Where sysctl handlers are declared, every source file has some
    read-only variables containing just an integer whose address is assigned
    to the extra1 and extra2 members, so that the sysctl range is enforced.

    The special values 0, 1 and INT_MAX are very often used as range
    boundaries, leading to duplication of variables like zero=0, one=1,
    int_max=INT_MAX in different source files:

    $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l
    248

    Add a const int array containing the most commonly used values, some
    macros to refer more easily to the correct array member, and use them
    instead of creating a local one for every object file.
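
    The idea in the paragraph above can be sketched in plain userspace C.
    This is only an illustration under stated assumptions: the names
    shared_vals, SHARED_ZERO, SHARED_ONE, SHARED_INT_MAX, struct range_entry
    and value_in_range() are hypothetical stand-ins, not the identifiers or
    the handler used by the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* One shared, read-only table replaces per-file zero/one/int_max variables. */
static const int shared_vals[] = { 0, 1, INT_MAX };

/* Hypothetical macros naming each slot of the shared table. */
#define SHARED_ZERO    ((const void *)&shared_vals[0])
#define SHARED_ONE     ((const void *)&shared_vals[1])
#define SHARED_INT_MAX ((const void *)&shared_vals[2])

/* Simplified stand-in for struct ctl_table: only the two range bounds. */
struct range_entry {
    const void *extra1; /* minimum, or NULL for no lower bound */
    const void *extra2; /* maximum, or NULL for no upper bound */
};

/* Return 1 if value lies within [*extra1, *extra2], 0 otherwise. */
static int value_in_range(const struct range_entry *e, int value)
{
    assert(e != NULL);
    if (e->extra1 && value < *(const int *)e->extra1)
        return 0;
    if (e->extra2 && value > *(const int *)e->extra2)
        return 0;
    return 1;
}
```

    A table entry for a boolean knob would then be declared as
    { SHARED_ZERO, SHARED_ONE } without defining any local integers.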

    This is the bloat-o-meter output comparing the old and new binary
    compiled with the default Fedora config:

    # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o
    add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164)
    Data old new delta
    sysctl_vals - 12 +12
    __kstrtab_sysctl_vals - 12 +12
    max 14 10 -4
    int_max 16 - -16
    one 68 - -68
    zero 128 28 -100
    Total: Before=20583249, After=20583085, chg -0.00%

    [mcroce@redhat.com: tipc: remove two unused variables]
    Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com
    [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c]
    [arnd@arndb.de: proc/sysctl: make firmware loader table conditional]
    Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de
    [akpm@linux-foundation.org: fix fs/eventpoll.c]
    Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com
    Signed-off-by: Matteo Croce
    Signed-off-by: Arnd Bergmann
    Acked-by: Kees Cook
    Reviewed-by: Aaron Tomlin
    Cc: Matthew Wilcox
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matteo Croce
     
  • migrate_page_move_mapping() doesn't use the mode argument. Remove it
    and update callers accordingly.

    Link: http://lkml.kernel.org/r/20190508210301.8472-1-keith.busch@intel.com
    Signed-off-by: Keith Busch
    Reviewed-by: Zi Yan
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Keith Busch
     
  • Commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each
    vma") introduced the THPeligible bit for processes' smaps. But, when
    checking the eligibility for shmem vma, __transparent_hugepage_enabled()
    is called to override the result from shmem_huge_enabled(). It may
    result in the anonymous vma's THP flag overriding shmem's. For example,
    running a simple test which creates THP for shmem, but with anonymous THP
    disabled, when reading the process's smaps, it may show:

    7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test
    Size: 4096 kB
    ...
    [snip]
    ...
    ShmemPmdMapped: 4096 kB
    ...
    [snip]
    ...
    THPeligible: 0

    And, /proc/meminfo does show THP allocated and PMD mapped too:

    ShmemHugePages: 4096 kB
    ShmemPmdMapped: 4096 kB

    This doesn't make too much sense. The shmem objects should be treated
    separately from anonymous THP. Calling shmem_huge_enabled() with
    checking MMF_DISABLE_THP sounds good enough. And, we could skip the stack
    and dax vma checks since we have already checked that the vma is shmem.

    Also check if vma is suitable for THP by calling
    transhuge_vma_suitable().

    And minor fix to smaps output format and documentation.

    Link: http://lkml.kernel.org/r/1560401041-32207-3-git-send-email-yang.shi@linux.alibaba.com
    Fixes: 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each vma")
    Signed-off-by: Yang Shi
    Acked-by: Hugh Dickins
    Cc: Kirill A. Shutemov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Cc: David Rientjes
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yang Shi
     
  • To 2.21

    Signed-off-by: Steve French

    Steve French
     
  • Servers can defer destaging any data and updating the mtime until close().
    This means that if we do a setinfo to modify the mtime while other handles
    are open for write, the server may overwrite our setinfo timestamps when
    it flushes the file on close() of the writeable handle.

    To solve this we add an explicit flush when the mtime is about to
    be updated.

    This fixes "cp -p" to preserve mtime when copying a file onto an SMB2 share.

    CC: Stable
    Signed-off-by: Ronnie Sahlberg
    Reviewed-by: Pavel Shilovsky
    Signed-off-by: Steve French

    Ronnie Sahlberg
     
  • We can cut one third of the traffic on open by not querying the
    inode number explicitly via SMB3 query_info since it is now
    returned on open in the qfid context.

    This is better in multiple ways, and
    speeds up file open about 10% (more if network is slow).

    Reviewed-by: Pavel Shilovsky
    Signed-off-by: Steve French

    Steve French
     
  • Pull NFS client updates from Trond Myklebust:
    "Highlights include:

    Stable fixes:

    - SUNRPC: Ensure bvecs are re-synced when we re-encode the RPC
    request

    - Fix an Oops in ff_layout_track_ds_error due to a PTR_ERR()
    dereference

    - Revert buggy NFS readdirplus optimisation

    - NFSv4: Handle the special Linux file open access mode

    - pnfs: Fix a problem where we gratuitously start doing I/O through
    the MDS

    Features:

    - Allow NFS client to set up multiple TCP connections to the server
    using a new 'nconnect=X' mount option. Queue length is used to
    balance load.

    - Enhance statistics reporting to report on all transports when using
    multiple connections.

    - Speed up SUNRPC by removing bh-safe spinlocks

    - Add a mechanism to allow NFSv4 to request that containers set a
    unique per-host identifier for when the hostname is not set.

    - Ensure NFSv4 updates the lease_time after a clientid update

    Bugfixes and cleanup:

    - Fix use-after-free in rpcrdma_post_recvs

    - Fix a memory leak when nfs_match_client() is interrupted

    - Fix buggy file access checking in NFSv4 open for execute

    - disable unsupported client side deduplication

    - Fix spurious client disconnections

    - Fix occasional RDMA transport deadlock

    - Various RDMA cleanups

    - Various tracepoint fixes

    - Fix the TCP callback channel to guarantee the server can actually
    send the number of callback requests that was negotiated at mount
    time"

    * tag 'nfs-for-5.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (68 commits)
    pnfs/flexfiles: Add tracepoints for detecting pnfs fallback to MDS
    pnfs: Fix a problem where we gratuitously start doing I/O through the MDS
    SUNRPC: Optimise transport balancing code
    SUNRPC: Ensure the bvecs are reset when we re-encode the RPC request
    pnfs/flexfiles: Fix PTR_ERR() dereferences in ff_layout_track_ds_error
    NFSv4: Don't use the zero stateid with layoutget
    SUNRPC: Fix up backchannel slot table accounting
    SUNRPC: Fix initialisation of struct rpc_xprt_switch
    SUNRPC: Skip zero-refcount transports
    SUNRPC: Replace division by multiplication in calculation of queue length
    NFSv4: Validate the stateid before applying it to state recovery
    nfs4.0: Refetch lease_time after clientid update
    nfs4: Rename nfs41_setup_state_renewal
    nfs4: Make nfs4_proc_get_lease_time available for nfs4.0
    nfs: Fix copy-and-paste error in debug message
    NFS: Replace 16 seq_printf() calls by seq_puts()
    NFS: Use seq_putc() in nfs_show_stats()
    Revert "NFS: readdirplus optimization by cache mechanism" (memleak)
    SUNRPC: Fix transport accounting when caller specifies an rpc_xprt
    NFS: Record task, client ID, and XID in xdr_status trace points
    ...

    Linus Torvalds
     
  • Add tracepoints to allow debugging of the event chain leading to
    a pnfs fallback to doing I/O through the MDS.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • If the client has to stop in pnfs_update_layout() to wait for another
    layoutget to complete, it currently exits and defaults to I/O through
    the MDS if the layoutget was successful.

    Fixes: d03360aaf5cc ("pNFS: Ensure we return the error if someone kills...")
    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org # v4.20+

    Trond Myklebust
     
  • cifs has both source and destination inodes locked throughout the copy.
    Like ->write_iter(), we update mtime and strip setuid bits of destination
    file before copy and like ->read_iter(), we update atime of source file
    after copy.

    Signed-off-by: Amir Goldstein
    Signed-off-by: Steve French

    Amir Goldstein
     
  • Prevent deadlock between open_shroot() and
    cifs_mark_open_files_invalid() by releasing the lock before entering
    SMB2_open, taking it again after and checking if we still need to use
    the result.
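
    The locking pattern described above (drop the lock around the slow
    call, retake it, then re-check whether the result is still needed) can
    be sketched in userspace C with pthreads. Everything here is a
    hypothetical illustration: struct cached_handle, get_cached_handle()
    and fake_open() are invented names, and the sketch does not reproduce
    the actual open_shroot() code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical cache protected by a mutex. */
struct cached_handle {
    pthread_mutex_t lock;
    bool valid;
    int handle;
};

/* Hypothetical stand-in for a slow open that must not run under the lock. */
static int open_calls;
static int fake_open(void)
{
    return 100 + open_calls++;
}

static int get_cached_handle(struct cached_handle *c, int (*do_open)(void))
{
    int fresh;

    assert(c != NULL && do_open != NULL);

    pthread_mutex_lock(&c->lock);
    if (c->valid) {
        fresh = c->handle;
        pthread_mutex_unlock(&c->lock);
        return fresh;
    }
    /* Release the lock before the blocking call to avoid the deadlock. */
    pthread_mutex_unlock(&c->lock);

    fresh = do_open();

    /* Retake the lock and check whether our result is still needed. */
    pthread_mutex_lock(&c->lock);
    if (c->valid) {
        /* Someone else filled the cache meanwhile; real code would
         * also have to close the now-redundant handle here. */
        fresh = c->handle;
    } else {
        c->handle = fresh;
        c->valid = true;
    }
    pthread_mutex_unlock(&c->lock);
    return fresh;
}
```

    The re-check after retaking the lock is what makes it safe to have
    dropped it: a concurrent caller may have populated the cache while the
    open was in flight.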

    Link: https://lore.kernel.org/linux-cifs/684ed01c-cbca-2716-bc28-b0a59a0f8521@prodrive-technologies.com/T/#u
    Fixes: 3d4ef9a15343 ("smb3: fix redundant opens on root")
    Signed-off-by: Aurelien Aptel
    Reviewed-by: Pavel Shilovsky
    Signed-off-by: Steve French
    CC: Stable

    Aurelien Aptel
     
  • mirror->mirror_ds can be NULL if uninitialised, but can contain
    a PTR_ERR() if the call to GETDEVICEINFO failed.

    Fixes: 65990d1afbd2 ("pNFS/flexfiles: Fix a deadlock on LAYOUTGET")
    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org # 4.10+

    Trond Myklebust
     
  • The NFSv4.1 protocol explicitly forbids us from using the zero stateid
    together with layoutget, so when we see that nfs4_select_rw_stateid()
    is unable to return a valid delegation, lock or open stateid, then
    we should initiate recovery and retry.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Pull xfs cleanups from Darrick Wong:
    "We had a few more lateish cleanup patches come in for 5.3 -- a couple
    of syncups with the userspace libxfs code and a conversion of the XFS
    administrator's guide to ReST format.

    Summary:

    - Bring fs/xfs/libxfs/xfs_trans_inode.c in sync with userspace
    libxfs.

    - Convert the xfs administrator guide to rst and move it into the
    official admin guide under Documentation"

    * tag 'xfs-5.3-merge-13' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
    Documentation: filesystem: Convert xfs.txt to ReST
    xfs: sync up xfs_trans_inode with userspace
    xfs: move xfs_trans_inode.c to libxfs/

    Linus Torvalds
     
  • Pull cifs updates from Steve French:
    "Fixes (three for stable) and improvements including much faster
    encryption (SMB3.1.1 GCM)"

    * tag '4.3-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: (27 commits)
    smb3: smbdirect no longer experimental
    cifs: fix crash in smb2_compound_op()/smb2_set_next_command()
    cifs: fix crash in cifs_dfs_do_automount
    cifs: fix parsing of symbolic link error response
    cifs: refactor and clean up arguments in the reparse point parsing
    SMB3: query inode number on open via create context
    smb3: Send netname context during negotiate protocol
    smb3: do not send compression info by default
    smb3: add new mount option to retrieve mode from special ACE
    smb3: Allow query of symlinks stored as reparse points
    cifs: Fix a race condition with cifs_echo_request
    cifs: always add credits back for unsolicited PDUs
    fs: cifs: cifsssmb: Change return type of convert_ace_to_cifs_ace
    add some missing definitions
    cifs: fix typo in debug message with struct field ia_valid
    smb3: minor cleanup of compound_send_recv
    CIFS: Fix module dependency
    cifs: simplify code by removing CONFIG_CIFS_ACL ifdef
    cifs: Fix check for matching with existing mount
    cifs: Properly handle auto disabling of serverino option
    ...

    Linus Torvalds
     
  • Pull ceph updates from Ilya Dryomov:
    "Lots of exciting things this time!

    - support for rbd object-map and fast-diff features (myself). This
    will speed up reads, discards and things like snap diffs on sparse
    images.

    - ceph.snap.btime vxattr to expose snapshot creation time (David
    Disseldorp). This will be used to integrate with "Restore Previous
    Versions" feature added in Windows 7 for folks who reexport ceph
    through SMB.

    - security xattrs for ceph (Zheng Yan). Only selinux is supported for
    now due to the limitations of ->dentry_init_security().

    - support for MSG_ADDR2, FS_BTIME and FS_CHANGE_ATTR features (Jeff
    Layton). This is actually a single feature bit which was missing
    because of the filesystem pieces. With this in, the kernel client
    will finally be reported as "luminous" by "ceph features" -- it is
    still being reported as "jewel" even though all required Luminous
    features were implemented in 4.13.

    - stop NULL-terminating ceph vxattrs (Jeff Layton). The convention
    with xattrs is to not terminate and this was causing
    inconsistencies with ceph-fuse.

    - change filesystem time granularity from 1 us to 1 ns, again fixing
    an inconsistency with ceph-fuse (Luis Henriques).

    On top of this there are some additional dentry name handling and cap
    flushing fixes from Zheng. Finally, Jeff is formally taking over for
    Zheng as the filesystem maintainer"

    * tag 'ceph-for-5.3-rc1' of git://github.com/ceph/ceph-client: (71 commits)
    ceph: fix end offset in truncate_inode_pages_range call
    ceph: use generic_delete_inode() for ->drop_inode
    ceph: use ceph_evict_inode to cleanup inode's resource
    ceph: initialize superblock s_time_gran to 1
    MAINTAINERS: take over for Zheng as CephFS kernel client maintainer
    rbd: setallochint only if object doesn't exist
    rbd: support for object-map and fast-diff
    rbd: call rbd_dev_mapping_set() from rbd_dev_image_probe()
    libceph: export osd_req_op_data() macro
    libceph: change ceph_osdc_call() to take page vector for response
    libceph: bump CEPH_MSG_MAX_DATA_LEN (again)
    rbd: new exclusive lock wait/wake code
    rbd: quiescing lock should wait for image requests
    rbd: lock should be quiesced on reacquire
    rbd: introduce copyup state machine
    rbd: rename rbd_obj_setup_*() to rbd_obj_init_*()
    rbd: move OSD request allocation into object request state machines
    rbd: factor out __rbd_osd_setup_discard_ops()
    rbd: factor out rbd_osd_setup_copyup()
    rbd: introduce obj_req->osd_reqs list
    ...

    Linus Torvalds
     
  • Pull dax updates from Dan Williams:
    "The fruits of a bug hunt in the fsdax implementation with Willy and a
    small feature update for device-dax:

    - Fix a hang condition that started triggering after the Xarray
    conversion of fsdax in the v4.20 kernel.

    - Add a 'resource' (root-only physical base address) sysfs attribute
    to device-dax instances to correlate memory-blocks onlined via the
    kmem driver with a given device instance"

    * tag 'dax-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    dax: Fix missed wakeup with PMD faults
    device-dax: Add a 'resource' attribute

    Linus Torvalds
     
  • Pull libnvdimm updates from Dan Williams:
    "Primarily just the virtio_pmem driver:

    - virtio_pmem

    The new virtio_pmem facility introduces a paravirtualized
    persistent memory device that allows a guest VM to use DAX
    mechanisms to access a host-file with host-page-cache. It arranges
    for MAP_SYNC to be disabled and instead triggers a host fsync()
    when a 'write-cache flush' command is sent to the virtual disk
    device.

    - Miscellaneous small fixups"

    * tag 'libnvdimm-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    virtio_pmem: fix sparse warning
    xfs: disable map_sync for async flush
    ext4: disable map_sync for async flush
    dax: check synchronous mapping is supported
    dm: enable synchronous dax
    libnvdimm: add dax_dev sync flag
    virtio-pmem: Add virtio pmem driver
    libnvdimm: nd_region flush callback support
    libnvdimm, namespace: Drop uuid_t implementation detail

    Linus Torvalds
     

18 Jul, 2019

1 commit


17 Jul, 2019

15 commits

  • Merge more updates from Andrew Morton:
    "VM:
    - z3fold fixes and enhancements by Henry Burns and Vitaly Wool

    - more accurate reclaimed slab caches calculations by Yafang Shao

    - fix MAP_UNINITIALIZED UAPI symbol to not depend on config, by
    Christoph Hellwig

    - !CONFIG_MMU fixes by Christoph Hellwig

    - new novmcoredd parameter to omit device dumps from vmcore, by
    Kairui Song

    - new test_meminit module for testing heap and pagealloc
    initialization, by Alexander Potapenko

    - ioremap improvements for huge mappings, by Anshuman Khandual

    - generalize kprobe page fault handling, by Anshuman Khandual

    - device-dax hotplug fixes and improvements, by Pavel Tatashin

    - enable synchronous DAX fault on powerpc, by Aneesh Kumar K.V

    - add pte_devmap() support for arm64, by Robin Murphy

    - unify locked_vm accounting with a helper, by Daniel Jordan

    - several misc fixes

    core/lib:
    - new typeof_member() macro including some users, by Alexey Dobriyan

    - make BIT() and GENMASK() available in asm, by Masahiro Yamada

    - changed LIST_POISON2 on x86_64 to 0xdead000000000122 for better
    code generation, by Alexey Dobriyan

    - rbtree code size optimizations, by Michel Lespinasse

    - convert struct pid count to refcount_t, by Joel Fernandes

    get_maintainer.pl:
    - add --no-moderated switch to skip moderated ML's, by Joe Perches

    misc:
    - ptrace PTRACE_GET_SYSCALL_INFO interface

    - coda updates

    - gdb scripts, various"

    [ Using merge message suggestion from Vlastimil Babka, with some editing - Linus ]

    * emailed patches from Andrew Morton: (100 commits)
    fs/select.c: use struct_size() in kmalloc()
    mm: add account_locked_vm utility function
    arm64: mm: implement pte_devmap support
    mm: introduce ARCH_HAS_PTE_DEVMAP
    mm: clean up is_device_*_page() definitions
    mm/mmap: move common defines to mman-common.h
    mm: move MAP_SYNC to asm-generic/mman-common.h
    device-dax: "Hotremove" persistent memory that is used like normal RAM
    mm/hotplug: make remove_memory() interface usable
    device-dax: fix memory and resource leak if hotplug fails
    include/linux/lz4.h: fix spelling and copy-paste errors in documentation
    ipc/mqueue.c: only perform resource calculation if user valid
    include/asm-generic/bug.h: fix "cut here" for WARN_ON for __WARN_TAINT architectures
    scripts/gdb: add helpers to find and list devices
    scripts/gdb: add lx-genpd-summary command
    drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl
    kernel/pid.c: convert struct pid count to refcount_t
    drivers/rapidio/devices/rio_mport_cdev.c: NUL terminate some strings
    select: shift restore_saved_sigmask_unless() into poll_select_copy_remaining()
    select: change do_poll() to return -ERESTARTNOHAND rather than -EINTR
    ...

    Linus Torvalds
     
  • Move internal function declarations out of fs/internal.h into
    include/linux/iomap.h so that our transition is complete.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     
  • Move the main iteration code into a separate file so that we can group
    related functions in a single file instead of having a single enormous
    source file.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     
  • Move the buffered IO code into a separate file so that we can group
    related functions in a single file instead of having a single enormous
    source file.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     
  • Move the direct IO code into a separate file so that we can group
    related functions in a single file instead of having a single enormous
    source file.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     
  • Move the SEEK_HOLE/SEEK_DATA code into a separate file so that we can
    group related functions in a single file instead of having a single
    enormous source file.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     
  • Move the file mapping reporting code (FIEMAP/FIBMAP) into a separate
    file so that we can group related functions in a single file instead of
    having a single enormous source file.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     
  • Move the swapfile activation code into a separate file so that we can
    group related functions in a single file instead of having a single
    enormous source file.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     
  • We used to need rather convoluted ordering trickery to guarantee
    that dput() of ex-mountpoints happens before the final mntput()
    of the same. Since we don't need that anymore, there's no point
    playing with fs_pin for that.

    Signed-off-by: Al Viro

    Al Viro
     
  • Lift getting the original mount (dentry is actually not needed at all)
    of the mountpoint into the callers - to do_move_mount() and pivot_root()
    level. That simplifies the cleanup in those and allows saner
    arguments for attach_mnt_recursive().

    Signed-off-by: Al Viro

    Al Viro
     
  • Using dput_to_list() to shift the contributing reference from ->mnt_mountpoint
    to ->mnt_mp->m_dentry. Dentries are dropped (with dput_to_list()) as soon
    as struct mountpoint is destroyed; in cases where we are under namespace_sem
    we use the global list, shrinking it in namespace_unlock(). In case of
    detaching stuck MNT_LOCKed children at final mntput_no_expire() we use a local
    list and shrink it ourselves. ->mnt_ex_mountpoint crap is gone.

    Signed-off-by: Al Viro

    Al Viro
     
  • RocksDB can hang indefinitely when using a DAX file. This is due to
    a bug in the XArray conversion when handling a PMD fault and finding a
    PTE entry. We use the wrong index in the hash and end up waiting on
    the wrong waitqueue.

    There's actually no need to wait; if we find a PTE entry while looking
    for a PMD entry, we can return immediately as we know we should fall
    back to a PTE fault (which may not conflict with the lock held).

    We reuse the XA_RETRY_ENTRY to signal a conflicting entry was found.
    This value can never be found in an XArray while holding its lock, so
    it does not create an ambiguity.

    Cc:
    Link: http://lkml.kernel.org/r/CAPcyv4hwHpX-MkUEqxwdTj7wCCZCN4RV-L4jsnuwLGyL_UEG4A@mail.gmail.com
    Fixes: b15cd800682f ("dax: Convert page fault handlers to XArray")
    Signed-off-by: Matthew Wilcox (Oracle)
    Tested-by: Dan Williams
    Reported-by: Robert Barror
    Reported-by: Seema Pandit
    Reviewed-by: Jan Kara
    Signed-off-by: Dan Williams

    Matthew Wilcox (Oracle)
     
  • One of the more common cases of allocation size calculations is finding
    the size of a structure that has a zero-sized array at the end, along
    with memory for some number of elements for that array. For example:

    struct foo {
            int stuff;
            struct boo entry[];
    };

    size = sizeof(struct foo) + count * sizeof(struct boo);
    instance = kmalloc(size, GFP_KERNEL);

    Instead of leaving these open-coded and prone to type mistakes, we can now
    use the new struct_size() helper:

    instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);

    Also, notice that variable size is unnecessary, hence it is removed.

    This code was detected with the help of Coccinelle.
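
    For readers outside the kernel, the size calculation discussed in this
    message can be approximated in userspace C. The sketch below is an
    assumption-laden illustration, not the kernel helper: flex_size() and
    foo_alloc() are hypothetical names, struct boo is given an invented
    member, malloc() stands in for kmalloc(), and overflow is reported by
    saturating to SIZE_MAX.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct boo {
    int value; /* invented member, for illustration only */
};

struct foo {
    int stuff;
    struct boo entry[]; /* flexible array member */
};

/* Size of a header plus count trailing elements; SIZE_MAX on overflow. */
static size_t flex_size(size_t header, size_t elem, size_t count)
{
    assert(elem != 0);
    if (count > (SIZE_MAX - header) / elem)
        return SIZE_MAX;
    return header + count * elem;
}

/* Allocate a struct foo with room for count entries, or NULL on failure. */
static struct foo *foo_alloc(size_t count)
{
    size_t size = flex_size(sizeof(struct foo), sizeof(struct boo), count);

    if (size == SIZE_MAX)
        return NULL; /* refuse a request whose size would wrap around */
    return malloc(size);
}
```

    The open-coded form silently wraps around for a large count and
    allocates too little memory; checking before multiplying turns that
    case into an allocation failure instead.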

    Link: http://lkml.kernel.org/r/20190604164226.GA13823@embeddedor
    Signed-off-by: Gustavo A. R. Silva
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gustavo A. R. Silva
     
  • Now that restore_saved_sigmask_unless() is always called with the same
    argument right before poll_select_copy_remaining() we can move it into
    poll_select_copy_remaining() and make it the only caller of restore() in
    fs/select.c.

    The patch also renames poll_select_copy_remaining();
    poll_select_finish() looks better after this change.

    kern_select() doesn't use set_user_sigmask(), so in this case
    poll_select_finish() does restore_saved_sigmask_unless() "for no
    reason". But this won't hurt, and WARN_ON(!TIF_SIGPENDING) is still
    valid.

    Link: http://lkml.kernel.org/r/20190606140915.GC13440@redhat.com
    Signed-off-by: Oleg Nesterov
    Cc: Al Viro
    Cc: Arnd Bergmann
    Cc: David Laight
    Cc: Davidlohr Bueso
    Cc: Deepa Dinamani
    Cc: Eric W. Biederman
    Cc: Eric Wong
    Cc: Jason Baron
    Cc: Jens Axboe
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • do_poll() returns -EINTR if interrupted and after that all its callers
    have to translate it into -ERESTARTNOHAND. Change do_poll() to return
    -ERESTARTNOHAND and update (simplify) the callers.

    Note that this also unifies all users of restore_saved_sigmask_unless(),
    see the next patch.

    Linus:

    : The *right* return value will actually be then chosen by
    : poll_select_copy_remaining(), which will turn ERESTARTNOHAND to EINTR
    : when it can't update the timeout.
    :
    : Except for the cases that use restart_block and do that instead and
    : don't have the whole timeout restart issue as a result.

    Link: http://lkml.kernel.org/r/20190606140852.GB13440@redhat.com
    Signed-off-by: Oleg Nesterov
    Acked-by: Linus Torvalds
    Cc: Al Viro
    Cc: Arnd Bergmann
    Cc: David Laight
    Cc: Davidlohr Bueso
    Cc: Deepa Dinamani
    Cc: Eric W. Biederman
    Cc: Eric Wong
    Cc: Jason Baron
    Cc: Jens Axboe
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov