04 May, 2017

14 commits

  • Merge misc updates from Andrew Morton:

    - a few misc things

    - most of MM

    - KASAN updates

    * emailed patches from Andrew Morton: (102 commits)
    kasan: separate report parts by empty lines
    kasan: improve double-free report format
    kasan: print page description after stacks
    kasan: improve slab object description
    kasan: change report header
    kasan: simplify address description logic
    kasan: change allocation and freeing stack traces headers
    kasan: unify report headers
    kasan: introduce helper functions for determining bug type
    mm: hwpoison: call shake_page() after try_to_unmap() for mlocked page
    mm: hwpoison: call shake_page() unconditionally
    mm/swapfile.c: fix swap space leak in error path of swap_free_entries()
    mm/gup.c: fix access_ok() argument type
    mm/truncate: avoid pointless cleancache_invalidate_inode() calls.
    mm/truncate: bail out early from invalidate_inode_pages2_range() if mapping is empty
    fs/block_dev: always invalidate cleancache in invalidate_bdev()
    fs: fix data invalidation in the cleancache during direct IO
    zram: reduce load operation in page_same_filled
    zram: use zram_free_page instead of open-coded
    zram: introduce zram data accessor
    ...

    Linus Torvalds
     
  • invalidate_bdev() calls cleancache_invalidate_inode() iff ->nrpages != 0,
    which doesn't make any sense.

    Make sure that invalidate_bdev() always calls cleancache_invalidate_inode()
    regardless of mapping->nrpages value.

    Fixes: c515e1fd361c ("mm/fs: add hooks to support cleancache")
    Link: http://lkml.kernel.org/r/20170424164135.22350-3-aryabinin@virtuozzo.com
    Signed-off-by: Andrey Ryabinin
    Reviewed-by: Jan Kara
    Acked-by: Konrad Rzeszutek Wilk
    Cc: Alexander Viro
    Cc: Ross Zwisler
    Cc: Jens Axboe
    Cc: Johannes Weiner
    Cc: Alexey Kuznetsov
    Cc: Christoph Hellwig
    Cc: Nikolay Borisov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     
  • Patch series "Properly invalidate data in the cleancache", v2.

    We've noticed that after a direct IO write, a buffered read sometimes gets
    stale data coming from the cleancache. The reason is that some direct
    write hooks call invalidate_inode_pages2[_range]() conditionally iff
    mapping->nrpages is not zero, so we may not invalidate data in the
    cleancache.

    Another odd thing is that we check only for ->nrpages and don't check
    for ->nrexceptional, but invalidate_inode_pages2[_range] also
    invalidates exceptional entries. So we invalidate exceptional entries
    only if ->nrpages != 0? This doesn't feel right.

    - Patch 1 fixes direct IO writes by removing ->nrpages check.
    - Patch 2 fixes similar case in invalidate_bdev().
    Note: I only fixed the conditional cleancache_invalidate_inode() here.
    Do we also need to add an ->nrexceptional check into invalidate_bdev()?

    - Patches 3-4: some optimizations.

    This patch (of 4):

    Some direct IO write fs hooks call invalidate_inode_pages2[_range]()
    conditionally iff mapping->nrpages is not zero. This can't be right,
    because invalidate_inode_pages2[_range]() also invalidates data in the
    cleancache via a cleancache_invalidate_inode() call. So if the page cache
    is empty but there is some data in the cleancache, a buffered read after
    a direct IO write would get stale data from the cleancache.

    Also it doesn't feel right to check only for ->nrpages because
    invalidate_inode_pages2[_range] invalidates exceptional entries as well.

    Fix this by calling invalidate_inode_pages2[_range]() regardless of
    nrpages state.
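    To make the failure mode concrete, here is a minimal userspace simulation
    of the sequence described above. Everything in it is hypothetical:
    struct sim_mapping and the sim_*() helpers are stand-ins, not kernel
    APIs, a single integer models one page, and cleancache is modelled as a
    second-level copy that survives page-cache eviction. The `fixed` flag
    switches between the old conditional invalidation and the unconditional
    one.

```c
#include <stdbool.h>

/* Hypothetical model of one file page; not the kernel's data structures. */
struct sim_mapping {
	unsigned long nrpages;  /* pages currently in the page cache */
	int page_cache_data;    /* meaningful only if nrpages != 0 */
	bool cleancache_valid;  /* a copy of the page lives in cleancache */
	int cleancache_data;
	int disk_data;          /* what is actually on disk */
};

/* Models invalidate_inode_pages2(): drops the page cache AND cleancache. */
static void sim_invalidate(struct sim_mapping *m)
{
	m->nrpages = 0;
	m->cleancache_valid = false;
}

/* Clean page evicted from the page cache; its copy goes to cleancache. */
static void sim_evict(struct sim_mapping *m)
{
	if (m->nrpages) {
		m->cleancache_data = m->page_cache_data;
		m->cleancache_valid = true;
		m->nrpages = 0;
	}
}

/* Direct IO write bypasses the page cache and goes straight to disk. */
static void sim_direct_write(struct sim_mapping *m, int data, bool fixed)
{
	m->disk_data = data;
	/* Old behaviour: invalidate only if nrpages != 0. */
	if (fixed || m->nrpages)
		sim_invalidate(m);
}

/* Buffered read: page cache first, then cleancache, then disk. */
static int sim_buffered_read(struct sim_mapping *m)
{
	if (!m->nrpages) {
		m->page_cache_data = m->cleancache_valid ? m->cleancache_data
							 : m->disk_data;
		m->nrpages = 1;
	}
	return m->page_cache_data;
}
```

    With `fixed == false`, the read after the direct write returns the old
    value from the simulated cleancache even though the disk holds the new
    one; with `fixed == true`, it returns the new value.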

    Note: nfs, cifs and 9p don't need a similar fix because they never call
    cleancache_get_page() (neither directly nor via mpage_readpage[s]()), so
    they are not affected by this bug.

    Fixes: c515e1fd361c ("mm/fs: add hooks to support cleancache")
    Link: http://lkml.kernel.org/r/20170424164135.22350-2-aryabinin@virtuozzo.com
    Signed-off-by: Andrey Ryabinin
    Reviewed-by: Jan Kara
    Acked-by: Konrad Rzeszutek Wilk
    Cc: Alexander Viro
    Cc: Ross Zwisler
    Cc: Jens Axboe
    Cc: Johannes Weiner
    Cc: Alexey Kuznetsov
    Cc: Christoph Hellwig
    Cc: Nikolay Borisov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     
  • kjournald2 is central to the transaction commit processing. As such, any
    potential allocation from this kernel thread has to be GFP_NOFS. Make
    sure to mark the whole kernel thread GFP_NOFS with memalloc_nofs_save().

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/20170306131408.9828-8-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Suggested-by: Jan Kara
    Reviewed-by: Jan Kara
    Cc: Dave Chinner
    Cc: Theodore Ts'o
    Cc: Chris Mason
    Cc: David Sterba
    Cc: Brian Foster
    Cc: Darrick J. Wong
    Cc: Nikolay Borisov
    Cc: Peter Zijlstra
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Now that we have the memalloc_nofs_{save,restore} API, we can mark the
    whole transaction context as implicitly GFP_NOFS. All allocations will
    automatically inherit GFP_NOFS this way. This means that we do not have
    to mark any of those requests with GFP_NOFS, and moreover all the
    ext4_kv[mz]alloc(GFP_NOFS) calls are also safe now, because even the
    hardcoded GFP_KERNEL allocations deep inside vmalloc will be NOFS now.

    [akpm@linux-foundation.org: tweak comments]
    Link: http://lkml.kernel.org/r/20170306131408.9828-7-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Reviewed-by: Jan Kara
    Cc: Dave Chinner
    Cc: Theodore Ts'o
    Cc: Chris Mason
    Cc: David Sterba
    Cc: Brian Foster
    Cc: Darrick J. Wong
    Cc: Nikolay Borisov
    Cc: Peter Zijlstra
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • kmem_zalloc_large and _xfs_buf_map_pages use the memalloc_noio_{save,restore}
    API to prevent reclaim recursion into the fs, because vmalloc can
    invoke unconditional GFP_KERNEL allocations and these functions might be
    called from NOFS contexts. memalloc_noio_save enforces a GFP_NOIO
    context, which is even weaker than GFP_NOFS, and that seems to be
    unnecessary. Let's use memalloc_nofs_{save,restore} instead, as it
    should provide exactly what we need here: an implicit GFP_NOFS context.

    Link: http://lkml.kernel.org/r/20170306131408.9828-6-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Acked-by: Vlastimil Babka
    Cc: Dave Chinner
    Cc: Theodore Ts'o
    Cc: Chris Mason
    Cc: David Sterba
    Cc: Jan Kara
    Cc: Nikolay Borisov
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • GFP_NOFS context is currently used for the following 5 reasons:

    - to prevent deadlocks when the lock held by the allocation
    context would be needed during the memory reclaim

    - to prevent stack overflows during the reclaim because the
    allocation is performed from a deep context already

    - to prevent lockups when the allocation context depends on other
    reclaimers to make forward progress indirectly

    - just in case, because this would be safe from the fs POV

    - to silence lockdep false positives

    Unfortunately, overuse of this allocation context brings some problems
    to the MM. Memory reclaim is much weaker (especially during heavy FS
    metadata workloads), and the OOM killer cannot be invoked because the MM
    layer doesn't have enough information about how much memory is freeable
    by the FS layer.

    In many cases it is far from clear why the weaker context is even used,
    so it might be used unnecessarily. We would like to get rid of those
    as much as possible. One way to do that is to use the flag in scopes
    rather than isolated cases. Such a scope is declared when really
    necessary, tracked per task, and all the allocation requests from within
    the context will simply inherit the GFP_NOFS semantics.

    Not only is this easier to understand and maintain, because there are
    far fewer problematic contexts than specific allocation requests, but it
    also helps code paths where the FS layer interacts with other layers
    (e.g. crypto, security modules, MM etc.) and there is no easy way to
    convey the allocation context between the layers.

    Introduce the memalloc_nofs_{save,restore} API to control the scope of
    the GFP_NOFS allocation context. This basically copies the
    memalloc_noio_{save,restore} API we have for the other restricted
    allocation context, GFP_NOIO. The PF_MEMALLOC_NOFS flag already exists
    and is just an alias for PF_FSTRANS, which had been xfs specific until
    recently. There are no PF_FSTRANS users anymore, so let's just drop it.
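    A minimal sketch of the scope semantics, modelled in userspace: a plain
    global stands in for current->flags, and the flag value and sim_* names
    are hypothetical. The save/restore shape follows the
    memalloc_noio_{save,restore} pattern the text says is being copied: save
    returns only the previous state of the NOFS bit, and restore writes back
    just that bit.

```c
/* Hypothetical flag value; the real PF_MEMALLOC_NOFS bit differs. */
#define SIM_PF_MEMALLOC_NOFS 0x2u

/* Models current->flags for the single task in this sketch. */
static unsigned int sim_current_flags;

/* Remember whether NOFS was already set, then set it for this scope. */
static unsigned int sim_memalloc_nofs_save(void)
{
	unsigned int flags = sim_current_flags & SIM_PF_MEMALLOC_NOFS;

	sim_current_flags |= SIM_PF_MEMALLOC_NOFS;
	return flags;
}

/* Put back only the saved NOFS bit; all other task flags are untouched. */
static void sim_memalloc_nofs_restore(unsigned int flags)
{
	sim_current_flags = (sim_current_flags & ~SIM_PF_MEMALLOC_NOFS) | flags;
}
```

    Because restore puts back only the saved bit, scopes nest: an inner
    restore does not clear NOFS that an outer scope set.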

    PF_MEMALLOC_NOFS is now checked in the MM layer and drops __GFP_FS
    implicitly, the same way PF_MEMALLOC_NOIO drops __GFP_IO.
    memalloc_noio_flags is renamed to current_gfp_context because it now
    cares about both PF_MEMALLOC_NOFS and PF_MEMALLOC_NOIO contexts. Xfs code
    paths preserve their semantics. kmem_flags_convert() doesn't need to
    evaluate the flag anymore.
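    The masking rule can be modelled as a pure function. This is a sketch,
    not the kernel's code: the bit values are made up, the task flags are
    passed explicitly instead of being read from current, and NOIO takes
    precedence over NOFS because it is the more restrictive context (it
    implies NOFS as well).

```c
/* Hypothetical bit values for illustration; the real ones differ. */
#define SIM_GFP_RECLAIM 0x400u
#define SIM_GFP_IO      0x40u
#define SIM_GFP_FS      0x80u
#define SIM_GFP_KERNEL  (SIM_GFP_RECLAIM | SIM_GFP_IO | SIM_GFP_FS)

#define SIM_PF_MEMALLOC_NOIO 0x1u
#define SIM_PF_MEMALLOC_NOFS 0x2u

typedef unsigned int sim_gfp_t;

/*
 * Models current_gfp_context(): the task's scope flags strip bits from
 * whatever mask the caller passed in. NOIO drops both IO and FS.
 */
static sim_gfp_t sim_current_gfp_context(unsigned int task_flags,
					 sim_gfp_t gfp)
{
	if (task_flags & SIM_PF_MEMALLOC_NOIO)
		gfp &= ~(SIM_GFP_IO | SIM_GFP_FS);
	else if (task_flags & SIM_PF_MEMALLOC_NOFS)
		gfp &= ~SIM_GFP_FS;
	return gfp;
}
```

    A caller inside a NOFS scope can therefore keep passing a GFP_KERNEL-style
    mask; the scope, not the call site, decides that __GFP_FS is dropped.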

    This patch shouldn't introduce any functional changes.

    Let's hope that filesystems will drop direct GFP_NOFS (resp. ~__GFP_FS)
    usage as much as possible and only use properly documented
    memalloc_nofs_{save,restore} checkpoints where they are appropriate.

    [akpm@linux-foundation.org: fix comment typo, reflow comment]
    Link: http://lkml.kernel.org/r/20170306131408.9828-5-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Dave Chinner
    Cc: Theodore Ts'o
    Cc: Chris Mason
    Cc: David Sterba
    Cc: Jan Kara
    Cc: Brian Foster
    Cc: Darrick J. Wong
    Cc: Nikolay Borisov
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • xfs defined PF_FSTRANS quite some time ago to declare scoped GFP_NOFS
    semantics. We would like to make this concept more generic and use it
    for other filesystems as well. Let's start by giving the flag a more
    generic name, PF_MEMALLOC_NOFS, which is in line with the existing
    PF_MEMALLOC_NOIO already used for the same purpose for GFP_NOIO
    contexts. As a first step, replace all PF_FSTRANS usage in the xfs code
    before we introduce a full API for it, as xfs uses the flag directly
    anyway.

    This patch doesn't introduce any functional change.

    Link: http://lkml.kernel.org/r/20170306131408.9828-4-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Reviewed-by: Darrick J. Wong
    Reviewed-by: Brian Foster
    Acked-by: Vlastimil Babka
    Cc: Dave Chinner
    Cc: Theodore Ts'o
    Cc: Chris Mason
    Cc: David Sterba
    Cc: Jan Kara
    Cc: Nikolay Borisov
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Show MADV_FREE pages info of each vma in smaps. The interface is for
    diagnosis or monitoring purposes; userspace could use it to understand
    what happens in the application. Since userspace could dirty MADV_FREE
    pages without notice from the kernel, this interface is the only place
    we can get accurate accounting info about MADV_FREE pages.
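    As an illustration of consuming this from userspace, the sketch below
    sums the per-VMA MADV_FREE figures from smaps-formatted text. It assumes
    the field is reported as a "LazyFree:" line in kB, as described in
    Documentation/filesystems/proc.txt for kernels carrying this change; the
    function name is hypothetical and the sample input in the test is made
    up.

```c
#include <stdio.h>
#include <string.h>

/*
 * Sum the "LazyFree:" lines (in kB) of an smaps-formatted text buffer.
 * Lines that do not start with that field are ignored, so input from a
 * kernel without the field simply yields 0.
 */
static unsigned long sum_lazyfree_kb(const char *smaps_text)
{
	unsigned long total = 0;
	const char *line = smaps_text;

	while (line && *line) {
		unsigned long kb;

		if (sscanf(line, "LazyFree: %lu kB", &kb) == 1)
			total += kb;
		line = strchr(line, '\n');
		if (line)
			line++;  /* move past the newline to the next line */
	}
	return total;
}
```

    On a live system, the same function can be fed the contents of
    /proc/self/smaps.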

    [mhocko@kernel.org: update Documentation/filesystems/proc.txt]
    Link: http://lkml.kernel.org/r/89efde633559de1ec07444f2ef0f4963a97a2ce8.1487965799.git.shli@fb.com
    Signed-off-by: Shaohua Li
    Acked-by: Johannes Weiner
    Acked-by: Minchan Kim
    Acked-by: Michal Hocko
    Acked-by: Hillf Danton
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
     
  • Use offset_in_page() macro instead of open-coding.
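    For reference, a sketch of what the macro computes, assuming 4 KiB pages
    (the real PAGE_SIZE is per-architecture). The SIM_ and sim_ names are
    hypothetical stand-ins for PAGE_SIZE, PAGE_MASK and offset_in_page(),
    and open_coded_offset() shows a typical open-coded equivalent.

```c
#include <stdint.h>

/* Assumes 4 KiB pages for illustration only. */
#define SIM_PAGE_SHIFT 12
#define SIM_PAGE_SIZE  (1UL << SIM_PAGE_SHIFT)
#define SIM_PAGE_MASK  (~(SIM_PAGE_SIZE - 1))

/* Keep only the low bits of the address: the offset within its page. */
#define sim_offset_in_page(p) ((unsigned long)(p) & ~SIM_PAGE_MASK)

/* The open-coded form such a macro replaces. */
static unsigned long open_coded_offset(uintptr_t addr)
{
	return addr & (SIM_PAGE_SIZE - 1);
}
```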

    Link: http://lkml.kernel.org/r/4dbc77ccaaed98b183cf4dba58a4fa325fd65048.1492758503.git.geliangtang@gmail.com
    Signed-off-by: Geliang Tang
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geliang Tang
     
  • Configfs is the interface ocfs2-tools uses to pass configuration to the
    kernel, and $configfs_dir/cluster/$clustername/heartbeat/dead_threshold
    is the attribute used to configure the heartbeat dead threshold. The
    kernel has a default value for it, but the user can set
    O2CB_HEARTBEAT_THRESHOLD in /etc/sysconfig/o2cb to override it.

    Commit 45b997737a80 ("ocfs2/cluster: use per-attribute show and store
    methods") changed the heartbeat dead threshold name while ocfs2-tools
    did not, so ocfs2-tools won't set this tunable and the default value is
    always used. So revert it.

    Fixes: 45b997737a80 ("ocfs2/cluster: use per-attribute show and store methods")
    Link: http://lkml.kernel.org/r/1490665245-15374-1-git-send-email-junxiao.bi@oracle.com
    Signed-off-by: Junxiao Bi
    Acked-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
     
  • Use setup_timer() instead of init_timer() to simplify the code.

    Link: http://lkml.kernel.org/r/5e75bf07beb91e092d5aa36c36769949a480456a.1489060564.git.geliangtang@gmail.com
    Signed-off-by: Geliang Tang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geliang Tang
     
  • Pull quota, reiserfs, udf and ext2 updates from Jan Kara:
    "The branch contains changes to quota code so that it does not modify
    persistent flags in inode->i_flags (it was the only place in kernel
    doing that) and handle it inside filesystem's quotaon/off handlers
    instead.

    The branch also contains two UDF cleanups, a couple of reiserfs fixes
    and one fix for ext2 quota locking"

    * 'generic' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    ext4: Improve comments in ext4_quota_{on|off}()
    udf: use kmap_atomic for memcpy copying
    udf: use octal for permissions
    quota: Remove dquot_quotactl_ops
    reiserfs: Remove i_attrs_to_sd_attrs()
    reiserfs: Remove useless setting of i_flags
    jfs: Remove jfs_get_inode_flags()
    ext2: Remove ext2_get_inode_flags()
    ext4: Remove ext4_get_inode_flags()
    quota: Stop setting IMMUTABLE and NOATIME flags on quota files
    jfs: Set flags on quota files directly
    ext2: Set flags on quota files directly
    reiserfs: Set flags on quota files directly
    ext4: Set flags on quota files directly
    reiserfs: Protect dquot_writeback_dquots() by s_umount semaphore
    reiserfs: Make cancel_old_flush() reliable
    ext2: Call dquot_writeback_dquots() with s_umount held
    reiserfs: avoid a -Wmaybe-uninitialized warning

    Linus Torvalds
     
  • Pull fsnotify updates from Jan Kara:
    "The branch contains mainly a rework of fsnotify infrastructure fixing
    a shortcoming where we waited for responses to fanotify permission
    events with the SRCU read lock held, so when the process consuming
    events was slow to respond, the kernel stalled.

    It also contains several cleanups of unnecessary indirections in
    fsnotify framework and a bugfix from Amir fixing leakage of kernel
    internal errno to userspace"

    * 'fsnotify' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (37 commits)
    fanotify: don't expose EOPENSTALE to userspace
    fsnotify: remove a stray unlock
    fsnotify: Move ->free_mark callback to fsnotify_ops
    fsnotify: Add group pointer in fsnotify_init_mark()
    fsnotify: Drop inode_mark.c
    fsnotify: Remove fsnotify_find_{inode|vfsmount}_mark()
    fsnotify: Remove fsnotify_detach_group_marks()
    fsnotify: Rename fsnotify_clear_marks_by_group_flags()
    fsnotify: Inline fsnotify_clear_{inode|vfsmount}_mark_group()
    fsnotify: Remove fsnotify_recalc_{inode|vfsmount}_mask()
    fsnotify: Remove fsnotify_set_mark_{,ignored_}mask_locked()
    fanotify: Release SRCU lock when waiting for userspace response
    fsnotify: Pass fsnotify_iter_info into handle_event handler
    fsnotify: Provide framework for dropping SRCU lock in ->handle_event
    fsnotify: Remove special handling of mark destruction on group shutdown
    fsnotify: Detach mark from object list when last reference is dropped
    fsnotify: Move queueing of mark for destruction into fsnotify_put_mark()
    inotify: Do not drop mark reference under idr_lock
    fsnotify: Free fsnotify_mark_connector when there is no mark attached
    fsnotify: Lock object list with connector lock
    ...

    Linus Torvalds
     

03 May, 2017

9 commits

  • Pull security subsystem updates from James Morris:
    "Highlights:

    IMA:
    - provide ">" and "<" ..."

    * ... of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (98 commits)
    tpm: Fix reference count to main device
    tpm_tis: convert to using locality callbacks
    tpm: fix handling of the TPM 2.0 event logs
    tpm_crb: remove a cruft constant
    keys: select CONFIG_CRYPTO when selecting DH / KDF
    apparmor: Make path_max parameter readonly
    apparmor: fix parameters so that the permission test is bypassed at boot
    apparmor: fix invalid reference to index variable of iterator line 836
    apparmor: use SHASH_DESC_ON_STACK
    security/apparmor/lsm.c: set debug messages
    apparmor: fix boolreturn.cocci warnings
    Smack: Use GFP_KERNEL for smk_netlbl_mls().
    smack: fix double free in smack_parse_opts_str()
    KEYS: add SP800-56A KDF support for DH
    KEYS: Keyring asymmetric key restrict method with chaining
    KEYS: Restrict asymmetric key linkage using a specific keychain
    KEYS: Add a lookup_restriction function for the asymmetric key type
    KEYS: Add KEYCTL_RESTRICT_KEYRING
    KEYS: Consistent ordering for __key_link_begin and restrict check
    KEYS: Add an optional lookup_restriction hook to key_type
    ...

    Linus Torvalds
     
  • Pull trivial tree updates from Jiri Kosina.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
    tty: fix comment for __tty_alloc_driver()
    init/main: properly align the multi-line comment
    init/main: Fix double "the" in comment
    Fix dead URLs to ftp.kernel.org
    drivers: Clean up duplicated email address
    treewide: Fix typo in xml/driver-api/basics.xml
    tools/testing/selftests/powerpc: remove redundant CFLAGS in Makefile: "-Wall -O2 -Wall" -> "-O2 -Wall"
    selftests/timers: Spelling s/privledges/privileges/
    HID: picoLCD: Spelling s/REPORT_WRTIE_MEMORY/REPORT_WRITE_MEMORY/
    net: phy: dp83848: Fix Typo
    UBI: Fix typos
    Documentation: ftrace.txt: Correct nice value of 120 priority
    net: fec: Fix typo in error msg and comment
    treewide: Fix typos in printk

    Linus Torvalds
     
  • Pull livepatch updates from Jiri Kosina:

    - a per-task consistency model is being added for architectures that
    support reliable stack dumping (extending this currently rather
    trivial set is in the works).

    This extends the types of patches that can be applied by the live
    patching infrastructure. The code stems from the design proposal [1]
    made back in November 2014. It's a hybrid of SUSE's kGraft and RH's
    kpatch, combining the advantages of both: it uses kGraft's per-task
    consistency and syscall barrier switching combined with kpatch's stack
    trace switching. There are also a number of fallback options which
    make it quite flexible.

    Most of the heavy lifting was done by Josh Poimboeuf with help from
    Miroslav Benes and Petr Mladek.

    [1] https://lkml.kernel.org/r/20141107140458.GA21774@suse.cz

    - module load time patch optimization from Zhou Chengming

    - a few assorted small fixes

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
    livepatch: add missing printk newlines
    livepatch: Cancel transition a safe way for immediate patches
    livepatch: Reduce the time of finding module symbols
    livepatch: make klp_mutex proper part of API
    livepatch: allow removal of a disabled patch
    livepatch: add /proc/<pid>/patch_state
    livepatch: change to a per-task consistency model
    livepatch: store function sizes
    livepatch: use kstrtobool() in enabled_store()
    livepatch: move patching functions into patch.c
    livepatch: remove unnecessary object loaded check
    livepatch: separate enabled and patched states
    livepatch/s390: add TIF_PATCH_PENDING thread flag
    livepatch/s390: reorganize TIF thread flag bits
    livepatch/powerpc: add TIF_PATCH_PENDING thread flag
    livepatch/x86: add TIF_PATCH_PENDING thread flag
    livepatch: create temporary klp_update_patch_state() stub
    x86/entry: define _TIF_ALLWORK_MASK flags explicitly
    stacktrace/x86: add function for detecting reliable stack traces

    Linus Torvalds
     
  • Pull networking updates from David Miller:
    "Here are some highlights from the 2065 networking commits that
    happened this development cycle:

    1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri)

    2) Add a generic XDP driver, so that anyone can test XDP even if they
    lack a networking device whose driver has explicit XDP support
    (me).

    3) Sparc64 now has an eBPF JIT too (me)

    4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei
    Starovoitov)

    5) Make netfilter network namespace teardown less expensive (Florian
    Westphal)

    6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana)

    7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger)

    8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky)

    9) Multiqueue support in stmmac driver (Joao Pinto)

    10) Remove TCP timewait recycling; it never really could possibly work
    well in the real world, and timestamp randomization really zaps any
    hint of usability this feature had (Soheil Hassas Yeganeh)

    11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay
    Aleksandrov)

    12) Add socket busy poll support to epoll (Sridhar Samudrala)

    13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso,
    and several others)

    14) IPSEC hw offload infrastructure (Steffen Klassert)"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits)
    tipc: refactor function tipc_sk_recv_stream()
    tipc: refactor function tipc_sk_recvmsg()
    net: thunderx: Optimize page recycling for XDP
    net: thunderx: Support for XDP header adjustment
    net: thunderx: Add support for XDP_TX
    net: thunderx: Add support for XDP_DROP
    net: thunderx: Add basic XDP support
    net: thunderx: Cleanup receive buffer allocation
    net: thunderx: Optimize CQE_TX handling
    net: thunderx: Optimize RBDR descriptor handling
    net: thunderx: Support for page recycling
    ipx: call ipxitf_put() in ioctl error path
    net: sched: add helpers to handle extended actions
    qed*: Fix issues in the ptp filter config implementation.
    qede: Fix concurrency issue in PTP Tx path processing.
    stmmac: Add support for SIMATIC IOT2000 platform
    net: hns: fix ethtool_get_strings overflow in hns driver
    tcp: fix wraparound issue in tcp_lp
    bpf, arm64: fix jit branch offset related to ldimm64
    bpf, arm64: implement jiting of BPF_XADD
    ...

    Linus Torvalds
     
  • Pull fs/compat.c cleanups from Al Viro:
    "More moving of compat syscalls from fs/compat.c to fs/*.c where the
    native counterparts live.

    And death to compat_sys_getdents64() - the only architecture that used
    to need it was ia64, and _that_ has lost biarch support quite a few
    years ago"

    * 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs/compat.c: trim unused includes
    move compat_rw_copy_check_uvector() over to fs/read_write.c
    fhandle: move compat syscalls from compat.c
    open: move compat syscalls from compat.c
    stat: move compat syscalls from compat.c
    fcntl: move compat syscalls from compat.c
    readdir: move compat syscalls from compat.c
    statfs: move compat syscalls from compat.c
    utimes: move compat syscalls from compat.c
    move compat select-related syscalls to fs/select.c
    Remove compat_sys_getdents64()

    Linus Torvalds
     
  • Pull splice updates from Al Viro:
    "These actually missed the last cycle; the branch itself is from last
    December"

    * 'work.splice' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    make nr_pages calculation in default_file_splice_read() a bit less ugly
    splice/tee/vmsplice: validate flags
    splice_pipe_desc: kill ->flags
    remove spd_release_page()

    Linus Torvalds
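    The "validate flags" change is about rejecting unknown flag bits rather
    than silently ignoring them. A minimal sketch of such a check, assuming
    validation means that any bit outside the four documented SPLICE_F_*
    flags yields -EINVAL; the sim_ names are hypothetical, while the flag
    values are those documented for splice(2).

```c
#include <errno.h>

/* Values of the documented splice(2) flags. */
#define SIM_SPLICE_F_MOVE     0x01u
#define SIM_SPLICE_F_NONBLOCK 0x02u
#define SIM_SPLICE_F_MORE     0x04u
#define SIM_SPLICE_F_GIFT     0x08u
#define SIM_SPLICE_F_ALL (SIM_SPLICE_F_MOVE | SIM_SPLICE_F_NONBLOCK | \
			  SIM_SPLICE_F_MORE | SIM_SPLICE_F_GIFT)

/*
 * Reject any bit outside the known set instead of ignoring it, so the
 * unused bits stay available for future extensions.
 */
static int sim_validate_splice_flags(unsigned int flags)
{
	if (flags & ~SIM_SPLICE_F_ALL)
		return -EINVAL;
	return 0;
}
```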
     
  • Pull iov_iter updates from Al Viro:
    "Cleanups that sat in -next + -stable fodder that has just missed 4.11.

    There's more iov_iter work in my local tree, but I'd prefer to push
    the stuff that had been in -next first"

    * 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    iov_iter: don't revert iov buffer if csum error
    generic_file_read_iter(): make use of iov_iter_revert()
    generic_file_direct_write(): make use of iov_iter_revert()
    orangefs: use iov_iter_revert()
    sctp: switch to copy_from_iter_full()
    net/9p: switch to copy_from_iter_full()
    switch memcpy_from_msg() to copy_from_iter_full()
    rds: make use of iov_iter_revert()

    Linus Torvalds
     
  • Pull CIFS fixes from Steve French:
    "Three cifs/smb3 fixes - including two for stable"

    * 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
    cifs: don't check for failure from mempool_alloc()
    Do not return number of bytes written for ioctl CIFS_IOC_COPYCHUNK_FILE
    Fix match_prepath()

    Linus Torvalds
     
  • Pull pstore updates from Kees Cook:
    "This has a large internal refactoring along with several smaller
    fixes.

    - constify compression structures; Bhumika Goyal

    - restore powerpc dumping; Ankit Kumar

    - fix more bugs in the rarely exercised module unloading logic

    - reorganize filesystem locking to fix problems noticed by lockdep

    - refactor internal pstore APIs to make development and review
    easier:
    - improve error reporting
    - add kernel-doc structure and function comments
    - avoid insane argument passing by using a common record
    structure"

    * tag 'pstore-v4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (23 commits)
    pstore: Solve lockdep warning by moving inode locks
    pstore: Fix flags to enable dumps on powerpc
    pstore: Remove unused vmalloc.h in pmsg
    pstore: simplify write_user_compat()
    pstore: Remove write_buf() callback
    pstore: Replace arguments for write_buf_user() API
    pstore: Replace arguments for write_buf() API
    pstore: Replace arguments for erase() API
    pstore: Do not duplicate record metadata
    pstore: Allocate records on heap instead of stack
    pstore: Pass record contents instead of copying
    pstore: Always allocate buffer for decompression
    pstore: Replace arguments for write() API
    pstore: Replace arguments for read() API
    pstore: Switch pstore_mkfile to pass record
    pstore: Move record decompression to function
    pstore: Extract common arguments into structure
    pstore: Add kernel-doc for struct pstore_info
    pstore: Improve register_pstore() error reporting
    pstore: Avoid race in module unloading
    ...

    Linus Torvalds
     

02 May, 2017

4 commits

  • Pull x86/process updates from Ingo Molnar:
    "The main change in this cycle was to add the ARCH_[GET|SET]_CPUID
    prctl() ABI extension to control the availability of the CPUID
    instruction, analogously to the existing PR_GET|SET_TSC ABI that
    controls RDTSC.

    Motivation: the 'rr' user-space record-and-replay execution debugger
    would like to trap and emulate the CPUID instruction, which is
    normally unprivileged.

    Trapping CPUID is possible on IvyBridge and later Intel CPUs - expose
    this hardware capability"

    * 'x86-process-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/syscalls/32: Ignore arch_prctl for other architectures
    um/arch_prctl: Fix fallout from x86 arch_prctl() rework
    x86/arch_prctl: Add ARCH_[GET|SET]_CPUID
    x86/cpufeature: Detect CPUID faulting support
    x86/syscalls/32: Wire up arch_prctl on x86-32
    x86/arch_prctl: Add do_arch_prctl_common()
    x86/arch_prctl/64: Rename do_arch_prctl() to do_arch_prctl_64()
    x86/arch_prctl/64: Use SYSCALL_DEFINE2 to define sys_arch_prctl()
    x86/arch_prctl: Rename 'code' argument to 'option'
    x86/msr: Rename MISC_FEATURE_ENABLES to MISC_FEATURES_ENABLES
    x86/process: Optimize TIF_NOTSC switch
    x86/process: Correct and optimize TIF_BLOCKSTEP switch
    x86/process: Optimize TIF checks in __switch_to_xtra()

    Linus Torvalds
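    From user space, a tool such as rr could query and change the setting
    through arch_prctl(2). The sketch below calls it via syscall(2); the
    option values 0x1011 (ARCH_GET_CPUID) and 0x1012 (ARCH_SET_CPUID) are
    those of the x86 <asm/prctl.h> ABI added by this series, and the helper
    names are hypothetical. On kernels without the ABI, or on non-x86-64
    builds, the query reports -1.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef ARCH_GET_CPUID
#define ARCH_GET_CPUID 0x1011
#endif
#ifndef ARCH_SET_CPUID
#define ARCH_SET_CPUID 0x1012
#endif

/*
 * Returns 1 if CPUID is enabled for the calling thread, 0 if it faults,
 * or -1 if the query is unsupported here.
 */
int query_cpuid_enabled(void)
{
#if defined(__x86_64__) && defined(SYS_arch_prctl)
	long r = syscall(SYS_arch_prctl, ARCH_GET_CPUID, 0);

	return r < 0 ? -1 : (int)r;
#else
	return -1;
#endif
}

/*
 * enable == 0 makes CPUID fault in the calling thread; enable == 1 turns
 * it back on. Returns 0 or a negative errno; the kernel is expected to
 * refuse when the CPU lacks CPUID faulting support.
 */
int set_cpuid_enabled(int enable)
{
#if defined(__x86_64__) && defined(SYS_arch_prctl)
	if (syscall(SYS_arch_prctl, ARCH_SET_CPUID, enable) == 0)
		return 0;
	return -errno;
#else
	(void)enable;
	return -ENOSYS;
#endif
}
```

    A record-and-replay tool would disable CPUID in the traced task and then
    emulate each resulting fault; the query alone is harmless to run.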
     
  • Pull scheduler updates from Ingo Molnar:
    "The main changes in this cycle were:

    - another round of rq-clock handling debugging, robustization and
    fixes

    - PELT accounting improvements

    - CPU hotplug related ->cpus_allowed affinity handling fixes all
    around the tree

    - ... plus misc fixes, cleanups and updates"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
    sched/x86: Update reschedule warning text
    crypto: N2 - Replace racy task affinity logic
    cpufreq/sparc-us2e: Replace racy task affinity logic
    cpufreq/sparc-us3: Replace racy task affinity logic
    cpufreq/sh: Replace racy task affinity logic
    cpufreq/ia64: Replace racy task affinity logic
    ACPI/processor: Replace racy task affinity logic
    ACPI/processor: Fix error handling in __acpi_processor_start()
    sparc/sysfs: Replace racy task affinity logic
    powerpc/smp: Replace open coded task affinity logic
    ia64/sn/hwperf: Replace racy task affinity logic
    ia64/salinfo: Replace racy task affinity logic
    workqueue: Provide work_on_cpu_safe()
    ia64/topology: Remove cpus_allowed manipulation
    sched/fair: Move the PELT constants into a generated header
    sched/fair: Increase PELT accuracy for small tasks
    sched/fair: Fix comments
    sched/Documentation: Add 'sched-pelt' tool
    sched/fair: Fix corner case in __accumulate_sum()
    sched/core: Remove 'task' parameter and rename tsk_restore_flags() to current_restore_flags()
    ...

    Linus Torvalds
     
  • Pull uaccess unification updates from Al Viro:
    "This is the uaccess unification pile. It's _not_ the end of uaccess
    work, but the next batch of that will go into the next cycle. This one
    mostly takes copy_from_user() and friends out of arch/* and gets the
    zero-padding behaviour in sync for all architectures.

    Dealing with the nocache/writethrough mess is for the next cycle;
    fortunately, that's x86-only. Same for cleanups in iov_iter.c (I am
    sold on access_ok() in there, BTW; just not in this pile), same for
    reducing __copy_... callsites, strn*... stuff, etc. - there will be a
    pile about as large as this one in the next merge window.

    This one sat in -next for weeks. -3KLoC"

    * 'work.uaccess' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (96 commits)
    HAVE_ARCH_HARDENED_USERCOPY is unconditional now
    CONFIG_ARCH_HAS_RAW_COPY_USER is unconditional now
    m32r: switch to RAW_COPY_USER
    hexagon: switch to RAW_COPY_USER
    microblaze: switch to RAW_COPY_USER
    get rid of padding, switch to RAW_COPY_USER
    ia64: get rid of copy_in_user()
    ia64: sanitize __access_ok()
    ia64: get rid of 'segment' argument of __do_{get,put}_user()
    ia64: get rid of 'segment' argument of __{get,put}_user_check()
    ia64: add extable.h
    powerpc: get rid of zeroing, switch to RAW_COPY_USER
    esas2r: don't open-code memdup_user()
    alpha: fix stack smashing in old_adjtimex(2)
    don't open-code kernel_setsockopt()
    mips: switch to RAW_COPY_USER
    mips: get rid of tail-zeroing in primitives
    mips: make copy_from_user() zero tail explicitly
    mips: clean and reorder the forest of macros...
    mips: consolidate __invoke_... wrappers
    ...

    Linus Torvalds
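The zero-padding behaviour mentioned in this pull follows the copy_from_user() contract. When fewer bytes can be copied than were requested, the uncopied tail of the destination buffer is zeroed, and the return value is the number of bytes not copied. The following is a userspace sketch of that contract, not the kernel's implementation. `raw_copy_sim()` and `copy_from_user_sim()` are hypothetical names, and a "fault" is simulated by allowing only `readable` bytes to be read from the source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an arch raw_copy_from_user(): copies at most
 * `readable` bytes (simulating a fault beyond that point) and returns the
 * number of bytes it could NOT copy. */
static size_t raw_copy_sim(void *to, const void *from, size_t n, size_t readable)
{
	size_t done = n < readable ? n : readable;

	memcpy(to, from, done);
	return n - done;
}

/* Sketch of the unified copy_from_user() contract: after a short copy, the
 * tail of the destination is zero-filled so its previous contents are never
 * exposed, and the count of uncopied bytes is returned. */
static size_t copy_from_user_sim(void *to, const void *from, size_t n, size_t readable)
{
	size_t res = raw_copy_sim(to, from, n, readable);

	if (res)
		memset((char *)to + (n - res), 0, res);
	return res;
}
```

Because the zeroing sits in the common wrapper, each architecture only has to supply the raw primitive, and tail-zeroing no longer needs to be duplicated in every arch's assembly.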
     
  • Pull block layer updates from Jens Axboe:

    - Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ
    was initially a fork of CFQ, but subsequently changed to implement
    fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant
    to be used on desktop type single drives, providing good fairness.
    From Paolo.

    - Add Kyber IO scheduler. This is a full multiqueue aware scheduler,
    using a scalable token based algorithm that throttles IO based on
    live completion IO stats, similarly to blk-wbt. From Omar.

    - A series from Jan, moving users to separately allocated backing
    devices. This continues the work of separating backing device life
    times, solving various problems with hot removal.

    - A series of updates for lightnvm, mostly from Javier. Includes a
    'pblk' target that exposes an open channel SSD as a physical block
    device.

    - A series of fixes and improvements for nbd from Josef.

    - A series from Omar, removing queue sharing between devices on mostly
    legacy drivers. This helps us clean up other bits, if we know that a
    queue only has a single device backing. This has been overdue for
    more than a decade.

    - Fixes for the blk-stats, and improvements to unify the stats and user
    windows. This both improves blk-wbt, and enables other users to
    register a need to receive IO stats for a device. From Omar.

    - blk-throttle improvements from Shaohua. This provides a scalable
    framework for implementing prioritization - particularly for
    blk-mq, but applicable to any type of block device. The interface is
    marked experimental for now.

    - Bucketized IO stats for IO polling from Stephen Bates. This improves
    efficiency of polled workloads in the presence of mixed block size
    IO.

    - A few fixes for opal, from Scott.

    - A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics.
    From a variety of folks, mostly Sagi and James Smart.

    - A series from Bart, improving our exposed info and capabilities from
    the blk-mq debugfs support.

    - A series from Christoph, cleaning up how we handle WRITE_ZEROES.

    - A series from Christoph, cleaning up the block layer handling of how
    we track errors in a request. On top of being a nice cleanup, it also
    shrinks the size of struct request a bit.

    - Removal of mg_disk and hd (sorry Linus) by Christoph. The former was
    never used by platforms, and the latter has outlived its usefulness.

    - Various little bug fixes and cleanups from a wide variety of folks.

    * 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits)
    block: hide badblocks attribute by default
    blk-mq: unify hctx delay_work and run_work
    block: add kblock_mod_delayed_work_on()
    blk-mq: unify hctx delayed_run_work and run_work
    nbd: fix use after free on module unload
    MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler
    blk-mq-sched: alloate reserved tags out of normal pool
    mtip32xx: use runtime tag to initialize command header
    scsi: Implement blk_mq_ops.show_rq()
    blk-mq: Add blk_mq_ops.show_rq()
    blk-mq: Show operation, cmd_flags and rq_flags names
    blk-mq: Make blk_flags_show() callers append a newline character
    blk-mq: Move the "state" debugfs attribute one level down
    blk-mq: Unregister debugfs attributes earlier
    blk-mq: Only unregister hctxs for which registration succeeded
    blk-mq-debugfs: Rename functions for registering and unregistering the mq directory
    blk-mq: Let blk_mq_debugfs_register() look up the queue name
    blk-mq: Register /queue/mq after having registered /queue
    ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset
    ide-pm: always pass 0 error to __blk_end_request_all
    ...

    Linus Torvalds
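The bucketized polling stats idea is to keep separate completion statistics per request class, so that a mix of small and large polled I/Os does not share a single latency estimate. Below is a sketch of one plausible bucketing scheme. It assumes power-of-two size classes starting at 512 bytes, with reads and writes interleaved (even buckets for reads, odd buckets for writes). `NR_BKTS` and `poll_stats_bucket()` are hypothetical names, and the sketch illustrates the idea without reproducing blk-mq's code.

```c
#include <stdint.h>

#define NR_BKTS 16 /* hypothetical: 8 size classes x 2 directions */

/* floor(log2(x)) for x > 0 */
static int ilog2_u32(uint32_t x)
{
	int r = -1;

	while (x) {
		x >>= 1;
		r++;
	}
	return r;
}

/* ddir: 0 = read, 1 = write. Returns a bucket index in [0, NR_BKTS),
 * or -1 for requests smaller than 512 bytes. Requests larger than the
 * biggest size class fall into the last bucket for their direction. */
static int poll_stats_bucket(int ddir, uint32_t bytes)
{
	int bucket;

	if (bytes == 0)
		return -1;
	bucket = ddir + 2 * (ilog2_u32(bytes) - 9);
	if (bucket < 0)
		return -1;
	if (bucket >= NR_BKTS)
		return ddir + NR_BKTS - 2;
	return bucket;
}
```

For example, a 4 KiB read lands in bucket 6 and a 4 KiB write lands in bucket 7. A 1 MiB write is folded into bucket 15.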
     

29 Apr, 2017

1 commit


28 Apr, 2017

9 commits

  • mempool_alloc() cannot fail if the gfp flags allow it to
    sleep, and GFP_FS allows for sleeping.

    So these tests of the return value from mempool_alloc()
    are not needed.

    Signed-off-by: NeilBrown
    Signed-off-by: Steve French

    NeilBrown
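This reasoning depends on mempool_alloc() waiting, rather than failing, when its reserve is empty and the caller is allowed to sleep. The following userspace model shows that guarantee using a fixed reserve guarded by a pthread mutex and condition variable. `struct pool`, `pool_alloc()`, `pool_free()` and `give_back_thread()` are hypothetical names. The real mempool also tries the underlying allocator first and handles details that this model omits.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_MAX 8

struct pool {
	pthread_mutex_t lock;
	pthread_cond_t wait;      /* signalled whenever an element is returned */
	void *elems[POOL_MAX];
	int nr;                   /* number of free elements in the reserve */
};

static void pool_init(struct pool *p, void **elems, int nr)
{
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->wait, NULL);
	p->nr = 0;
	for (int i = 0; i < nr && i < POOL_MAX; i++)
		p->elems[p->nr++] = elems[i];
}

/* With may_sleep the caller blocks until an element is returned, so the
 * result is never NULL; without it, an empty reserve yields NULL. */
static void *pool_alloc(struct pool *p, bool may_sleep)
{
	void *e = NULL;

	pthread_mutex_lock(&p->lock);
	while (p->nr == 0 && may_sleep)
		pthread_cond_wait(&p->wait, &p->lock);
	if (p->nr > 0)
		e = p->elems[--p->nr];
	pthread_mutex_unlock(&p->lock);
	return e;
}

static void pool_free(struct pool *p, void *e)
{
	pthread_mutex_lock(&p->lock);
	if (p->nr < POOL_MAX)     /* reserve full: this model just drops it */
		p->elems[p->nr++] = e;
	pthread_cond_signal(&p->wait);
	pthread_mutex_unlock(&p->lock);
}

/* Demo helper: returns one element to the pool from another thread. */
struct give_back { struct pool *p; void *e; };

static void *give_back_thread(void *arg)
{
	struct give_back *g = arg;

	pool_free(g->p, g->e);
	return NULL;
}
```

Because a sleeping caller only leaves the wait loop once an element is available, a NULL check after such a call is dead code. The commit removes checks of exactly this kind.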
     
  • commit 620d8745b35d ("Introduce cifs_copy_file_range()") changes the
    behaviour of the cifs ioctl call CIFS_IOC_COPYCHUNK_FILE. In case of
    successful writes, it now returns the number of bytes written. This
    return value is treated as an error by the xfstest cifs/001. Depending
    on the errno set at that time, this may or may not result in the test
    failing.

    The patch fixes this by setting the return value to 0 in case of
    successful writes.

    Fixes: commit 620d8745b35d ("Introduce cifs_copy_file_range()")
    Reported-by: Eryu Guan
    Signed-off-by: Sachin Prabhu
    Acked-by: Pavel Shilovsky
    Cc: stable@vger.kernel.org
    Signed-off-by: Steve French

    Sachin Prabhu
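The fix restores the convention the test expects: a byte count from the copy routine means success and should be reported as 0, while negative values are errors. Here is a minimal sketch of that mapping. `copychunk_ioctl_result()` is a hypothetical helper, not the cifs code.

```c
#include <sys/types.h>

/* Hypothetical helper: collapse the result of a copy_file_range()-style
 * routine into an ioctl-style result. Negative values are errors and are
 * passed through; any non-negative byte count means success, i.e. 0. */
static long copychunk_ioctl_result(ssize_t rc)
{
	return rc < 0 ? (long)rc : 0;
}
```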
     
  • Incorrect return value for shares not using the prefix path means that
    we will never match superblocks for these shares.

    Fixes: commit c1d8b24d1819 ("Compare prepaths when comparing superblocks")
    Cc: stable@vger.kernel.org
    Signed-off-by: Sachin Prabhu
    Reviewed-by: Pavel Shilovsky
    Signed-off-by: Steve French

    Sachin Prabhu
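The comparison this fix touches can be written as a small predicate. Two mounts are compatible if both use a prefix path and the paths are equal, or if neither uses one. The sketch below is illustrative and uses hypothetical names, with a NULL string meaning "no prefix path". The bug described above corresponds to returning false in the "neither" case.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical predicate: a NULL prepath means the mount does not use a
 * prefix path. Superblocks may be shared only when the settings agree. */
static bool prepaths_match(const char *old_prepath, const char *new_prepath)
{
	bool old_set = old_prepath != NULL;
	bool new_set = new_prepath != NULL;

	if (old_set && new_set)
		return strcmp(old_prepath, new_prepath) == 0;
	/* Neither mount uses a prefix path: this must count as a match,
	 * otherwise such shares could never reuse an existing superblock. */
	return !old_set && !new_set;
}
```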
     
  • Lockdep complains about a possible deadlock between mount and unlink
    (which is technically impossible), but fixing this improves possible
    future multiple-backend support, and keeps locking in the right order.

    The lockdep warning could be triggered by unlinking a file in the
    pstore filesystem:

    -> #1 (&sb->s_type->i_mutex_key#14){++++++}:
    lock_acquire+0xc9/0x220
    down_write+0x3f/0x70
    pstore_mkfile+0x1f4/0x460
    pstore_get_records+0x17a/0x320
    pstore_fill_super+0xa4/0xc0
    mount_single+0x89/0xb0
    pstore_mount+0x13/0x20
    mount_fs+0xf/0x90
    vfs_kern_mount+0x66/0x170
    do_mount+0x190/0xd50
    SyS_mount+0x90/0xd0
    entry_SYSCALL_64_fastpath+0x1c/0xb1

    -> #0 (&psinfo->read_mutex){+.+.+.}:
    __lock_acquire+0x1ac0/0x1bb0
    lock_acquire+0xc9/0x220
    __mutex_lock+0x6e/0x990
    mutex_lock_nested+0x16/0x20
    pstore_unlink+0x3f/0xa0
    vfs_unlink+0xb5/0x190
    do_unlinkat+0x24c/0x2a0
    SyS_unlinkat+0x16/0x30
    entry_SYSCALL_64_fastpath+0x1c/0xb1

    Possible unsafe locking scenario:

    CPU0                                  CPU1
    ----                                  ----
    lock(&sb->s_type->i_mutex_key#14);
                                          lock(&psinfo->read_mutex);
                                          lock(&sb->s_type->i_mutex_key#14);
    lock(&psinfo->read_mutex);

    Reported-by: Marta Lofstedt
    Reported-by: Chris Wilson
    Signed-off-by: Kees Cook
    Acked-by: Namhyung Kim

    Kees Cook
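The scenario above is the classic ABBA inversion. One path takes the inode lock and then the read mutex, while the other takes them in the opposite order. The general remedy is to use one fixed acquisition order everywhere. The following userspace sketch uses pthreads, with `inode_lock` and `read_mutex` as hypothetical stand-ins for the two pstore locks. Both paths lock in the same order, so no wait cycle can form. It does not claim which order pstore itself settled on.

```c
#include <pthread.h>

static pthread_mutex_t inode_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t read_mutex = PTHREAD_MUTEX_INITIALIZER;
static long shared_records;

/* Single global order: inode_lock first, then read_mutex. */
static void lock_both(void)
{
	pthread_mutex_lock(&inode_lock);
	pthread_mutex_lock(&read_mutex);
}

static void unlock_both(void)
{
	pthread_mutex_unlock(&read_mutex);
	pthread_mutex_unlock(&inode_lock);
}

/* Stand-in for the mount path, which adds records. */
static void *mount_path(void *arg)
{
	long iters = *(long *)arg;

	for (long i = 0; i < iters; i++) {
		lock_both();
		shared_records++;
		unlock_both();
	}
	return NULL;
}

/* Stand-in for the unlink path, which removes records. Taking read_mutex
 * before inode_lock here would recreate the ABBA pattern lockdep reported. */
static void *unlink_path(void *arg)
{
	long iters = *(long *)arg;

	for (long i = 0; i < iters; i++) {
		lock_both();
		shared_records--;
		unlock_both();
	}
	return NULL;
}
```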
     
  • Since the vmalloc code has been removed from write_pmsg() in the commit
    "5bf6d1b pstore/pmsg: drop bounce buffer", remove the unused header
    vmalloc.h.

    Signed-off-by: Geliang Tang
    Signed-off-by: Kees Cook

    Geliang Tang
     
  • Pull nfsd fixes from Bruce Fields:
    "Thanks to Ari Kauppi and Tuomas Haanpää at Synopsys for spotting bugs
    in our NFSv2/v3 xdr code that could crash the server or leak memory"

    * tag 'nfsd-4.11-3' of git://linux-nfs.org/~bfields/linux:
    nfsd: stricter decoding of write-like NFSv2/v3 ops
    nfsd4: minor NFSv2/v3 write decoding cleanup
    nfsd: check for oversized NFSv2/v3 arguments

    Linus Torvalds
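The NFSv2/v3 fixes come down to not trusting lengths decoded from the wire. The following sketch shows that kind of check for a hypothetical decoder. It reads an XDR-style 4-byte big-endian length followed by opaque data padded to a 4-byte boundary, and it rejects any length that runs past the received buffer. `struct xdr_buf_sim` and `decode_opaque()` are illustrative names, not nfsd's.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state: p..end is the received request. */
struct xdr_buf_sim {
	const uint8_t *p;
	const uint8_t *end;
};

/* Decode a length-prefixed opaque field. On success *data points at the
 * payload, *len holds its length, and the cursor moves past the padding.
 * Returns false if the claimed length exceeds what was actually received. */
static bool decode_opaque(struct xdr_buf_sim *x, const uint8_t **data, uint32_t *len)
{
	size_t avail = (size_t)(x->end - x->p);
	uint32_t n;
	size_t padded;

	if (avail < 4)
		return false;
	n = (uint32_t)x->p[0] << 24 | (uint32_t)x->p[1] << 16 |
	    (uint32_t)x->p[2] << 8 | (uint32_t)x->p[3];
	avail -= 4;
	if (n > avail)                           /* oversized claim: reject */
		return false;
	padded = ((size_t)n + 3) & ~(size_t)3;   /* XDR pads to 4 bytes */
	if (padded > avail)                      /* padding missing: reject */
		return false;
	*data = x->p + 4;
	*len = n;
	x->p += 4 + padded;
	return true;
}
```

The `n > avail` test comes before the round-up so that a huge claimed length cannot wrap around when 3 is added on a 32-bit `size_t`.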
     
  • Pull ceph fix from Ilya Dryomov:
    "A fix for a kernel stack overflow bug in ceph setattr code, marked for
    stable"

    * tag 'ceph-for-4.11-rc9' of git://github.com/ceph/ceph-client:
    ceph: fix recursion between ceph_set_acl() and __ceph_setattr()

    Linus Torvalds
     
  • Pull vfs fixes from Al Viro:

    - fix orangefs handling of faults on write() - I'd missed that one back
    when orangefs was going through review.

    - readdir counterpart of "9p: cope with bogus responses from server in
    p9_client_{read,write}" - server might be lying or broken, and we'd
    better not overrun the kmalloc'ed buffer we are copying the results
    into.

    - NFS O_DIRECT read/write can leave iov_iter advanced by too much;
    that's what had been causing iov_iter_pipe() warnings davej had been
    seeing.

    - statx_timestamp.tv_nsec type fix (s32 -> u32). That one really should
    go in before 4.11.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    uapi: change the type of struct statx_timestamp.tv_nsec to unsigned
    fix nfs O_DIRECT advancing iov_iter too much
    p9_client_readdir() fix
    orangefs_bufmap_copy_from_iovec(): fix EFAULT handling

    Linus Torvalds
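The readdir fix applies the same rule as the p9_client_{read,write} change: a count reported by the server must not be trusted to fit the buffer that the client allocated. One way to handle a bogus count is to clamp it before copying, as the following sketch does. `copy_reply()` is a hypothetical helper, not the 9p client code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy a server reply into a client buffer of
 * `bufsize` bytes. `claimed` is the count the server reports and
 * `received` is how many bytes actually arrived. A broken or lying server
 * may claim more than either, so the copy is clamped to the smallest of
 * the three, and the clamped length is returned. */
static size_t copy_reply(char *buf, size_t bufsize, const char *reply,
			 size_t received, size_t claimed)
{
	size_t count = claimed;

	if (count > bufsize)
		count = bufsize;
	if (count > received)
		count = received;
	memcpy(buf, reply, count);
	return count;
}
```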
     
  • The change in commit 1e2f82d1e9d1 ("statx: Kill fd-with-NULL-path
    support in favour of AT_EMPTY_PATH") to error on a NULL pathname to
    statx() is inconsistent.

    It results in the error EINVAL for a NULL pathname. Other system calls
    with similar APIs (fchownat(), fstatat(), linkat()) return EFAULT.

    The solution is simply to remove the EINVAL check. As I already pointed
    out in [1], user_path_at*() and filename_lookup() will handle the NULL
    pathname as per the other APIs, to correctly produce the error EFAULT.

    [1] https://lkml.org/lkml/2017/4/26/561

    Signed-off-by: Michael Kerrisk
    Cc: David Howells
    Cc: Al Viro
    Cc: Eric Sandeen
    Signed-off-by: Linus Torvalds

    Michael Kerrisk (man-pages)
     

27 Apr, 2017

3 commits

  • Signed-off-by: David S. Miller

    David S. Miller
     
  • With the new statx() syscall, the following both allow the attributes of
    the file attached to a file descriptor to be retrieved:

    statx(dfd, NULL, 0, ...);

    and:

    statx(dfd, "", AT_EMPTY_PATH, ...);

    Change the code to reject the first option, though this means copying
    the path and engaging pathwalk for the fstat() equivalent. dfd can be a
    non-directory provided path is "".

    [ The timing of this isn't wonderful, but applying this now before we
    have statx() in any released kernel, before anybody starts using the
    NULL special case. - Linus ]

    Fixes: a528d35e8bfc ("statx: Add a system call to make enhanced file info available")
    Reported-by: Michael Kerrisk
    Signed-off-by: David Howells
    cc: Eric Sandeen
    cc: fstests@vger.kernel.org
    cc: linux-api@vger.kernel.org
    cc: linux-man@vger.kernel.org
    Signed-off-by: Linus Torvalds

    David Howells
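After this change, the fstat()-equivalent form of statx() is an empty pathname combined with AT_EMPTY_PATH. The sketch below issues that call through the raw `syscall(SYS_statx, ...)` interface, so it does not depend on a libc wrapper. It assumes Linux 4.11+ kernel and uapi headers. `statx_fd()` is a hypothetical helper name, and the AT_EMPTY_PATH fallback value is taken from the Linux uapi headers.

```c
#include <fcntl.h>       /* open(), O_RDONLY; AT_EMPTY_PATH if exposed */
#include <linux/stat.h>  /* struct statx, STATX_* (Linux 4.11+ headers) */
#include <sys/syscall.h> /* SYS_statx */
#include <unistd.h>      /* syscall(), close() */

#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000 /* value from the Linux uapi <linux/fcntl.h> */
#endif

/* Query the file behind an open descriptor: pathname "" plus AT_EMPTY_PATH,
 * not a NULL pathname. fd may refer to a non-directory.
 * Returns 0 on success, or -1 with errno set. */
static int statx_fd(int fd, unsigned int mask, struct statx *stx)
{
	return (int)syscall(SYS_statx, fd, "", AT_EMPTY_PATH, mask, stx);
}
```

With glibc 2.28 or later, defining _GNU_SOURCE before any include exposes a statx() wrapper that takes the same five arguments, in the same order.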
     
  • …uaccess.avr32', 'uaccess.bfin', 'uaccess.c6x', 'uaccess.cris', 'uaccess.frv', 'uaccess.h8300', 'uaccess.hexagon', 'uaccess.ia64', 'uaccess.m32r', 'uaccess.m68k', 'uaccess.metag', 'uaccess.microblaze', 'uaccess.mips', 'uaccess.mn10300', 'uaccess.nios2', 'uaccess.openrisc', 'uaccess.parisc', 'uaccess.powerpc', 'uaccess.s390', 'uaccess.score', 'uaccess.sh', 'uaccess.sparc', 'uaccess.tile', 'uaccess.um', 'uaccess.unicore32', 'uaccess.x86' and 'uaccess.xtensa' into work.uaccess

    Al Viro