06 Aug, 2020

1 commit

  • Pull networking updates from David Miller:

    1) Support 6GHz band in ath11k driver, from Rajkumar Manoharan.

    2) Support UDP segmentation in core TSO code, from Eric Dumazet.

    3) Allow flashing different flash images in cxgb4 driver, from Vishal
    Kulkarni.

    4) Add drop frames counter and flow status to tc flower offloading,
    from Po Liu.

    5) Support n-tuple filters in cxgb4, from Vishal Kulkarni.

    6) Various new indirect call avoidance, from Eric Dumazet and Brian
    Vazquez.

    7) Fix BPF verifier failures on 32-bit pointer arithmetic, from
    Yonghong Song.

    8) Support querying and setting hardware address of a port function via
    devlink, use this in mlx5, from Parav Pandit.

    9) Support hw ipsec offload on bonding slaves, from Jarod Wilson.

    10) Switch qca8k driver over to phylink, from Jonathan McDowell.

    11) In bpftool, show list of processes holding BPF FD references to
    maps, programs, links, and btf objects. From Andrii Nakryiko.

    12) Several conversions over to generic power management, from Vaibhav
    Gupta.

    13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry
    Yakunin.

    14) Various https url conversions, from Alexander A. Klimov.

    15) Timestamping and PHC support for mscc PHY driver, from Antoine
    Tenart.

    16) Support bpf iterating over tcp and udp sockets, from Yonghong Song.

    17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov.

    18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan.

    19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several
    drivers. From Luc Van Oostenryck.

    20) XDP support for xen-netfront, from Denis Kirjanov.

    21) Support receive buffer autotuning in MPTCP, from Florian Westphal.

    22) Support EF100 chip in sfc driver, from Edward Cree.

    23) Add XDP support to mvpp2 driver, from Matteo Croce.

    24) Support MPTCP in sock_diag, from Paolo Abeni.

    25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic
    infrastructure, from Jakub Kicinski.

    26) Several pci_ --> dma_ API conversions, from Christophe JAILLET.

    27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel.

    28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki.

    29) Refactor a lot of networking socket option handling code in order to
    avoid set_fs() calls, from Christoph Hellwig.

    30) Add rfc4884 support to icmp code, from Willem de Bruijn.

    31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei.

    32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin.

    33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin.

    34) Support TCP syncookies in MPTCP, from Florian Westphal.

    35) Fix several tricky cases of PMTU handling wrt. bridging, from Stefano
    Brivio.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits)
    net: thunderx: initialize VF's mailbox mutex before first usage
    usb: hso: remove bogus check for EINPROGRESS
    usb: hso: no complaint about kmalloc failure
    hso: fix bailout in error case of probe
    ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM
    selftests/net: relax cpu affinity requirement in msg_zerocopy test
    mptcp: be careful on subflow creation
    selftests: rtnetlink: make kci_test_encap() return sub-test result
    selftests: rtnetlink: correct the final return value for the test
    net: dsa: sja1105: use detected device id instead of DT one on mismatch
    tipc: set ub->ifindex for local ipv6 address
    ipv6: add ipv6_dev_find()
    net: openvswitch: silence suspicious RCU usage warning
    Revert "vxlan: fix tos value before xmit"
    ptp: only allow phase values lower than 1 period
    farsync: switch from 'pci_' to 'dma_' API
    wan: wanxl: switch from 'pci_' to 'dma_' API
    hv_netvsc: do not use VF device if link is down
    dpaa2-eth: Fix passing zero to 'PTR_ERR' warning
    net: macb: Properly handle phylink on at91sam9x
    ...

    Linus Torvalds
     

05 Aug, 2020

2 commits

  • …rnel/git/brauner/linux

    Pull checkpoint-restore updates from Christian Brauner:
    "This enables unprivileged checkpoint/restore of processes.

    Given that this work has been going on for quite some time the first
    sentence in this summary is hopefully more exciting than the actual
    final code changes required. Unprivileged checkpoint/restore has seen
    a frequent increase in interest over the last two years and has thus
    been one of the main topics for the combined containers &
    checkpoint/restore microconference since at least 2018 (cf. [1]).

    Here are just the three most frequent use-cases that were brought forward:

    - The JVM developers are integrating checkpoint/restore into a Java
    VM to significantly decrease the startup time.

    - In high-performance computing environments a resource manager will
    typically be distributing jobs where users are always running as
    non-root. Long-running and "large" processes with significant
    startup times are supposed to be checkpointed and restored with
    CRIU.

    - Container migration as a non-root user.

    In all of these scenarios it is either desirable or required to run
    without CAP_SYS_ADMIN. The userspace implementation of
    checkpoint/restore CRIU already has the pull request for supporting
    unprivileged checkpoint/restore up (cf. [2]).

    To enable unprivileged checkpoint/restore a new dedicated capability
    CAP_CHECKPOINT_RESTORE is introduced. This solution has last been
    discussed in 2019 in a talk by Google at Linux Plumbers (cf. [1]
    "Update on Task Migration at Google Using CRIU") with Adrian and
    Nicolas providing the implementation now over the last months. In
    essence, this allows the CRIU binary to be installed with the
    CAP_CHECKPOINT_RESTORE vfs capability set thereby enabling
    unprivileged users to restore processes.

    To make this possible the following permissions are altered:

    - Selecting a specific PID via clone3() set_tid relaxed from userns
    CAP_SYS_ADMIN to CAP_CHECKPOINT_RESTORE.

    - Selecting a specific PID via /proc/sys/kernel/ns_last_pid relaxed
    from userns CAP_SYS_ADMIN to CAP_CHECKPOINT_RESTORE.

    - Accessing /proc/pid/map_files relaxed from init userns
    CAP_SYS_ADMIN to init userns CAP_CHECKPOINT_RESTORE.

    - Changing /proc/self/exe from userns CAP_SYS_ADMIN to userns
    CAP_CHECKPOINT_RESTORE.

    Of these four changes the /proc/self/exe change deserves a few words
    because the reasoning behind even restricting /proc/self/exe changes
    in the first place is just full of historical quirks and tracking this
    down was a questionable version of fun that I'd like to spare others.

    In short, it is trivial to change /proc/self/exe as an unprivileged
    user, i.e. without userns CAP_SYS_ADMIN right now. Either via ptrace()
    or by simply intercepting the elf loader in userspace during exec.
    Nicolas was nice enough to even provide a POC for the latter (cf. [3])
    to illustrate this fact.

    The original patchset which introduced PR_SET_MM_MAP had no
    permissions around changing the exe link. They too argued that it is
    trivial to spoof the exe link already which is true. The argument
    brought up against this was that the Tomoyo LSM uses the exe link in
    tomoyo_manager() to detect whether the calling process is a policy
    manager. This caused changing the exe links to be guarded by userns
    CAP_SYS_ADMIN.

    All in all this rather seems like a "better guard it with something
    rather than nothing" argument which imho doesn't qualify as a great
    security policy. Again, because spoofing the exe link is possible for
    the calling process so even if this were security relevant it was
    broken back then and would be broken today. So technically, dropping
    all permissions around changing the exe link would probably be
    possible and would send a clearer message to any userspace that relies
    on /proc/self/exe for security reasons that they should stop doing
    this but for now we're only relaxing the exe link permissions from
    userns CAP_SYS_ADMIN to userns CAP_CHECKPOINT_RESTORE.

    There's a final uapi change in here. Changing the exe link used to
    accidentally return EINVAL when the caller lacked the necessary
    permissions instead of the more correct EPERM. This pr contains a
    commit fixing this. I assume that userspace won't notice or care and
    if they do I will revert this commit. But since we are changing the
    permissions anyway it seems like a good opportunity to try this fix.

    With these changes merged unprivileged checkpoint/restore will be
    possible and has already been tested by various users"

    [1] LPC 2018
    1. "Task Migration at Google Using CRIU"
    https://www.youtube.com/watch?v=yI_1cuhoDgA&t=12095
    2. "Securely Migrating Untrusted Workloads with CRIU"
    https://www.youtube.com/watch?v=yI_1cuhoDgA&t=14400
    LPC 2019
    1. "CRIU and the PID dance"
    https://www.youtube.com/watch?v=LN2CUgp8deo&list=PLVsQ_xZBEyN30ZA3Pc9MZMFzdjwyz26dO&index=9&t=2m48s
    2. "Update on Task Migration at Google Using CRIU"
    https://www.youtube.com/watch?v=LN2CUgp8deo&list=PLVsQ_xZBEyN30ZA3Pc9MZMFzdjwyz26dO&index=9&t=1h2m8s

    [2] https://github.com/checkpoint-restore/criu/pull/1155

    [3] https://github.com/nviennot/run_as_exe

    * tag 'cap-checkpoint-restore-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
    selftests: add clone3() CAP_CHECKPOINT_RESTORE test
    prctl: exe link permission error changed from -EINVAL to -EPERM
    prctl: Allow local CAP_CHECKPOINT_RESTORE to change /proc/self/exe
    proc: allow access in init userns for map_files with CAP_CHECKPOINT_RESTORE
    pid_namespace: use checkpoint_restore_ns_capable() for ns_last_pid
    pid: use checkpoint_restore_ns_capable() for set_tid
    capabilities: Introduce CAP_CHECKPOINT_RESTORE
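
A restore tool may want to know whether it actually holds the new capability before relying on the relaxed permissions described above. The following is a minimal sketch, not CRIU's code: it assumes capability number 40 for CAP_CHECKPOINT_RESTORE (as defined in linux/capability.h from v5.9) and reads the CapEff line of /proc/self/status. The helper names cap_mask_has() and read_effective_caps() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed capability number of CAP_CHECKPOINT_RESTORE (linux/capability.h, v5.9+). */
#define CAP_CHECKPOINT_RESTORE_NR 40

/* Return true if capability number `cap` is set in the 64-bit mask. */
bool cap_mask_has(uint64_t mask, int cap)
{
	assert(cap >= 0 && cap < 64);
	return (mask >> cap) & 1;
}

/*
 * Read the effective capability mask of the calling process from the
 * "CapEff:" line of /proc/self/status. Returns 0 on success, -1 on error.
 */
int read_effective_caps(uint64_t *mask)
{
	char line[256];
	unsigned long long value;
	int ret = -1;
	FILE *f = fopen("/proc/self/status", "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "CapEff: %llx", &value) == 1) {
			*mask = value;
			ret = 0;
			break;
		}
	}
	fclose(f);
	return ret;
}
```

A caller would combine the two, for example `read_effective_caps(&m) == 0 && cap_mask_has(m, CAP_CHECKPOINT_RESTORE_NR)`, and fall back to requiring CAP_SYS_ADMIN on older kernels.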

    Linus Torvalds
     
  • Pull seccomp updates from Kees Cook:
    "There are a bunch of clean ups and selftest improvements along with
    two major updates to the SECCOMP_RET_USER_NOTIF filter return:
    EPOLLHUP support to more easily detect the death of a monitored
    process, and being able to inject fds when intercepting syscalls that
    expect an fd-opening side-effect (needed by both container folks and
    Chrome). The latter continued the refactoring of __scm_install_fd()
    started by Christoph, and in the process found and fixed a handful of
    bugs in various callers.

    - Improved selftest coverage, timeouts, and reporting

    - Add EPOLLHUP support for SECCOMP_RET_USER_NOTIF (Christian Brauner)

    - Refactor __scm_install_fd() into __receive_fd() and fix buggy
    callers

    - Introduce 'addfd' command for SECCOMP_RET_USER_NOTIF (Sargun
    Dhillon)"

    * tag 'seccomp-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (30 commits)
    selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD
    seccomp: Introduce addfd ioctl to seccomp user notifier
    fs: Expand __receive_fd() to accept existing fd
    pidfd: Replace open-coded receive_fd()
    fs: Add receive_fd() wrapper for __receive_fd()
    fs: Move __scm_install_fd() to __receive_fd()
    net/scm: Regularize compat handling of scm_detach_fds()
    pidfd: Add missing sock updates for pidfd_getfd()
    net/compat: Add missing sock updates for SCM_RIGHTS
    selftests/seccomp: Check ENOSYS under tracing
    selftests/seccomp: Refactor to use fixture variants
    selftests/harness: Clean up kern-doc for fixtures
    seccomp: Use -1 marker for end of mode 1 syscall list
    seccomp: Fix ioctl number for SECCOMP_IOCTL_NOTIF_ID_VALID
    selftests/seccomp: Rename user_trap_syscall() to user_notif_syscall()
    selftests/seccomp: Make kcmp() less required
    seccomp: Use pr_fmt
    selftests/seccomp: Improve calibration loop
    selftests/seccomp: use 90s as timeout
    selftests/seccomp: Expand benchmark to per-filter measurements
    ...

    Linus Torvalds
     

04 Aug, 2020

1 commit

  • Pull core block updates from Jens Axboe:
    "Good amount of cleanups and tech debt removals in here, and as a
    result, the diffstat shows a nice net reduction in code.

    - Softirq completion cleanups (Christoph)

    - Stop using ->queuedata (Christoph)

    - Cleanup bd claiming (Christoph)

    - Use check_events, moving away from the legacy media change
    (Christoph)

    - Use inode i_blkbits consistently (Christoph)

    - Remove old unused writeback congestion bits (Christoph)

    - Cleanup/unify submission path (Christoph)

    - Use bio_uninit consistently, instead of bio_disassociate_blkg
    (Christoph)

    - sbitmap cleared bits handling (John)

    - Request merging blktrace event addition (Jan)

    - sysfs add/remove race fixes (Luis)

    - blk-mq tag fixes/optimizations (Ming)

    - Duplicate words in comments (Randy)

    - Flush deferral cleanup (Yufen)

    - IO context locking/retry fixes (John)

    - struct_size() usage (Gustavo)

    - blk-iocost fixes (Chengming)

    - blk-cgroup IO stats fixes (Boris)

    - Various little fixes"

    * tag 'for-5.9/block-20200802' of git://git.kernel.dk/linux-block: (135 commits)
    block: blk-timeout: delete duplicated word
    block: blk-mq-sched: delete duplicated word
    block: blk-mq: delete duplicated word
    block: genhd: delete duplicated words
    block: elevator: delete duplicated word and fix typos
    block: bio: delete duplicated words
    block: bfq-iosched: fix duplicated word
    iocost_monitor: start from the oldest usage index
    iocost: Fix check condition of iocg abs_vdebt
    block: Remove callback typedefs for blk_mq_ops
    block: Use non _rcu version of list functions for tag_set_list
    blk-cgroup: show global disk stats in root cgroup io.stat
    blk-cgroup: make iostat functions visible to stat printing
    block: improve discard bio alignment in __blkdev_issue_discard()
    block: change REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL to be odd numbers
    block: defer flush request no matter whether we have elevator
    block: make blk_timeout_init() static
    block: remove retry loop in ioc_release_fn()
    block: remove unnecessary ioc nested locking
    block: integrate bd_start_claiming into __blkdev_get
    ...

    Linus Torvalds
     

26 Jul, 2020

1 commit

  • This patch refactored target bpf_iter_init_seq_priv_t callback
    function to accept additional information. This will be needed
    in later patches for map element targets since a particular
    map should be passed to traverse elements for that particular
    map. In the future, other information may be passed to target
    as well, e.g., pid, cgroup id, etc. to customize the iterator.

    Signed-off-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200723184110.590156-1-yhs@fb.com

    Yonghong Song
     

20 Jul, 2020

1 commit

  • Opening files in /proc/pid/map_files when the current user is
    CAP_CHECKPOINT_RESTORE capable in the root namespace is useful for
    checkpointing and restoring to recover files that are unreachable via
    the file system such as deleted files, or memfd files.
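
Entries under /proc/pid/map_files are named after the address range of the mapping they represent, so a checkpointing tool can build the path from a line of /proc/pid/maps. A minimal sketch, assuming the usual "start-end" hexadecimal naming; map_files_path() is a hypothetical helper, not part of any existing tool:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Build the path of one /proc/<pid>/map_files entry. Entries are named
 * "<start>-<end>" using the hexadecimal bounds of the mapping.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int map_files_path(char *buf, size_t len, pid_t pid,
		   unsigned long start, unsigned long end)
{
	int n;

	assert(buf && len > 0 && start < end);
	n = snprintf(buf, len, "/proc/%ld/map_files/%lx-%lx",
		     (long)pid, start, end);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Opening the resulting path is what requires the capability; building it does not.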

    Signed-off-by: Adrian Reber
    Signed-off-by: Nicolas Viennot
    Reviewed-by: Cyrill Gorcunov
    Reviewed-by: Serge Hallyn
    Link: https://lore.kernel.org/r/20200719100418.2112740-5-areber@redhat.com
    Signed-off-by: Christian Brauner

    Adrian Reber
     

11 Jul, 2020

1 commit


04 Jul, 2020

2 commits


24 Jun, 2020

1 commit


21 Jun, 2020

1 commit

  • Pull tracing fixes from Steven Rostedt:

    - Have recordmcount work with > 64K sections (to support LTO)

    - kprobe RCU fixes

    - Correct a kprobe critical section with missing mutex

    - Remove redundant arch_disarm_kprobe() call

    - Fix lockup when kretprobe triggers within kprobe_flush_task()

    - Fix memory leak in fetch_op_data operations

    - Fix sleep in atomic in ftrace trace array sample code

    - Free up memory on failure in sample trace array code

    - Fix incorrect reporting of function_graph fields in format file

    - Fix quote within quote parsing in bootconfig

    - Fix return value of bootconfig tool

    - Add testcases for bootconfig tool

    - Fix maybe uninitialized warning in ftrace pid file code

    - Remove unused variable in tracing_iter_reset()

    - Fix some typos

    * tag 'trace-v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    ftrace: Fix maybe-uninitialized compiler warning
    tools/bootconfig: Add testcase for show-command and quotes test
    tools/bootconfig: Fix to return 0 if succeeded to show the bootconfig
    tools/bootconfig: Fix to use correct quotes for value
    proc/bootconfig: Fix to use correct quotes for value
    tracing: Remove unused event variable in tracing_iter_reset
    tracing/probe: Fix memleak in fetch_op_data operations
    trace: Fix typo in allocate_ftrace_ops()'s comment
    tracing: Make ftrace packed events have align of 1
    sample-trace-array: Remove trace_array 'sample-instance'
    sample-trace-array: Fix sleeping function called from invalid context
    kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
    kprobes: Remove redundant arch_disarm_kprobe() call
    kprobes: Fix to protect kick_kprobe_optimizer() by kprobe_mutex
    kprobes: Use non RCU traversal APIs on kprobe_tables if possible
    kprobes: Suppress the suspicious RCU warning on kprobes
    recordmcount: support >64k sections

    Linus Torvalds
     

18 Jun, 2020

1 commit


17 Jun, 2020

1 commit

  • Fix /proc/bootconfig to select double or single quotes
    correctly according to the value.

    If a bootconfig value includes a double quote character,
    we must use single-quotes to quote that value.

    To avoid checking the value for a double quote in two separate
    places, this also modifies the if() condition and its blocks.
    This is safe because xbc_array_for_each_value() correctly handles
    an array that has a single node.
    Thus,

    if (vnode && xbc_node_is_array(vnode)) {
        xbc_array_for_each_value(vnode) /* vnode->next != NULL */
            ...
    } else {
        snprintf(val); /* val is an empty string if !vnode */
    }

    is equivalent to

    if (vnode) {
        xbc_array_for_each_value(vnode) /* vnode->next can be NULL */
            ...
    } else {
        snprintf(""); /* value is always empty */
    }
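
The quoting rule itself (use single quotes when the value contains a double quote) can be sketched in user space as follows. This is an illustration only, not the kernel's code: pick_quote() and format_entry() are hypothetical helpers, and the `key = "value"` output layout is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Pick the quote character: single quotes if the value contains '"'. */
char pick_quote(const char *val)
{
	return strchr(val, '"') ? '\'' : '"';
}

/*
 * Format `key = <q>val<q>` into buf, where <q> is chosen by pick_quote().
 * Returns the snprintf() result (the length that would have been written).
 */
int format_entry(char *buf, size_t len, const char *key, const char *val)
{
	char q;

	assert(buf && key && val);
	q = pick_quote(val);
	return snprintf(buf, len, "%s = %c%s%c", key, q, val, q);
}
```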

    Link: http://lkml.kernel.org/r/159230244786.65555.3763894451251622488.stgit@devnote2

    Cc: stable@vger.kernel.org
    Fixes: c1a3c36017d4 ("proc: bootconfig: Add /proc/bootconfig to show boot config list")
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     

14 Jun, 2020

2 commits

  • Pull more Kbuild updates from Masahiro Yamada:

    - fix build rules in binderfs sample

    - fix build errors when Kbuild recurses to the top Makefile

    - convert '---help---' in Kconfig to 'help'

    * tag 'kbuild-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
    treewide: replace '---help---' in Kconfig files with 'help'
    kbuild: fix broken builds because of GZIP,BZIP2,LZOP variables
    samples: binderfs: really compile this sample and fix build issues

    Linus Torvalds
     
  • Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over
    '---help---'"), the number of '---help---' has been gradually
    decreasing, but there are still more than 2400 instances.

    This commit finishes the conversion. While I touched the lines,
    I also fixed the indentation.

    There are a variety of indentation styles found.

    a) 4 spaces + '---help---'
    b) 7 spaces + '---help---'
    c) 8 spaces + '---help---'
    d) 1 space + 1 tab + '---help---'
    e) 1 tab + '---help---' (correct indentation)
    f) 1 tab + 1 space + '---help---'
    g) 1 tab + 2 spaces + '---help---'

    In order to convert all of them to 1 tab + 'help', I ran the
    following command:

    $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'
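
To make the effect of that substitution concrete, here is a rough C equivalent for a single line. It is a sketch for illustration only: convert_help_line() is a hypothetical name, and it assumes the line has already had its trailing newline removed, as sed does.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Apply the substitution to one line: if the line is optional leading
 * whitespace followed by "---help---", rewrite that prefix to "\thelp"
 * and keep whatever follows. Other lines are copied unchanged.
 */
void convert_help_line(const char *in, char *out, size_t len)
{
	static const char marker[] = "---help---";
	const char *p = in;

	assert(in && out && len > 0);
	while (isspace((unsigned char)*p))
		p++;
	if (strncmp(p, marker, strlen(marker)) == 0)
		snprintf(out, len, "\thelp%s", p + strlen(marker));
	else
		snprintf(out, len, "%s", in);
}
```

All seven indentation variants listed above collapse to the same output, because the leading whitespace is discarded before the tab is written.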

    Signed-off-by: Masahiro Yamada

    Masahiro Yamada
     

13 Jun, 2020

2 commits

  • Pull proc fix from Eric Biederman:
    "Much to my surprise syzbot found a very old bug in proc that the
    recent changes made easier to reproduce. This bug is subtle enough it
    looks like it fooled everyone who should know better"

    * 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    proc: Use new_inode not new_inode_pseudo

    Linus Torvalds
     
  • Recently syzbot reported that unmounting proc when there is an ongoing
    inotify watch on the root directory of proc could result in a use
    after free when the watch is removed after the unmount of proc
    when the watcher exits.

    Commit 69879c01a0c3 ("proc: Remove the now unnecessary internal mount
    of proc") made it easier to unmount proc and allowed syzbot to see the
    problem, but looking at the code it has been around for a long time.

    Looking at the code the fsnotify watch should have been removed by
    fsnotify_sb_delete in generic_shutdown_super. Unfortunately the inode
    was allocated with new_inode_pseudo instead of new_inode so the inode
    was not on the sb->s_inodes list. Which prevented
    fsnotify_unmount_inodes from finding the inode and removing the watch
    as well as made it so the "VFS: Busy inodes after unmount" warning
    could not find the inodes to warn about them.

    Make all of the inodes in proc visible to generic_shutdown_super,
    and fsnotify_sb_delete by using new_inode instead of new_inode_pseudo.
    The only functional difference is that new_inode places the inodes
    on the sb->s_inodes list.

    I wrote a small test program and I can verify that without changes it
    can trigger this issue, and by replacing new_inode_pseudo with
    new_inode the issue goes away.
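
The mechanism can be modelled in a few lines of user-space C. This is a toy model under stated assumptions, not kernel code: the toy_* types and functions are hypothetical stand-ins, and a boolean `watched` flag stands in for an fsnotify mark.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_inode {
	bool watched;              /* stands in for an fsnotify mark */
	struct toy_inode *next;    /* link on the superblock's inode list */
};

struct toy_sb {
	struct toy_inode *s_inodes; /* inodes visible to shutdown */
};

/* Like new_inode_pseudo(): initialise the inode but do not list it. */
void toy_new_inode_pseudo(struct toy_sb *sb, struct toy_inode *inode)
{
	(void)sb;
	inode->watched = false;
	inode->next = NULL;
}

/* Like new_inode(): same initialisation, plus add to sb->s_inodes. */
void toy_new_inode(struct toy_sb *sb, struct toy_inode *inode)
{
	toy_new_inode_pseudo(sb, inode);
	inode->next = sb->s_inodes;
	sb->s_inodes = inode;
}

/* Like the unmount path: drop watches on every inode that is listed. */
void toy_shutdown_super(struct toy_sb *sb)
{
	struct toy_inode *i;

	assert(sb);
	for (i = sb->s_inodes; i; i = i->next)
		i->watched = false;
}
```

An inode created with toy_new_inode_pseudo() keeps its watch across toy_shutdown_super(), which mirrors the stale watch that later caused the use after free.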

    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/000000000000d788c905a7dfa3f4@google.com
    Reported-by: syzbot+7d2debdcdb3cb93c1e5e@syzkaller.appspotmail.com
    Fixes: 0097875bd415 ("proc: Implement /proc/thread-self to point at the directory of the current thread")
    Fixes: 021ada7dff22 ("procfs: switch /proc/self away from proc_dir_entry")
    Fixes: 51f0885e5415 ("vfs,proc: guarantee unique inodes in /proc")
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

11 Jun, 2020

4 commits

  • Pull sysctl fixes from Al Viro:
    "Fixups to regressions in sysctl series"

    * 'work.sysctl' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    sysctl: reject gigantic reads/write to sysctl files
    cdrom: fix an incorrect __user annotation on cdrom_sysctl_info
    trace: fix an incorrect __user annotation on stack_trace_sysctl
    random: fix an incorrect __user annotation on proc_do_entropy
    net/sysctl: remove leftover __user annotations on neigh_proc_dointvec*
    net/sysctl: use cpumask_parse in flow_limit_cpu_sysctl

    Linus Torvalds
     
  • Pull proc fix from Eric Biederman:
    "Syzbot found a NULL pointer dereference if kzalloc of s_fs_info fails"

    * 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    proc: s_fs_info may be NULL when proc_kill_sb is called

    Linus Torvalds
     
  • syzbot found that if proc_fill_super() fails before filling up sb->s_fs_info,
    deactivate_locked_super() will be called and sb->s_fs_info will be NULL.
    proc_kill_sb() does not expect fs_info to be NULL, which is wrong.

    Link: https://lore.kernel.org/lkml/0000000000002d7ca605a7b8b1c5@google.com
    Reported-by: syzbot+4abac52934a48af5ff19@syzkaller.appspotmail.com
    Fixes: fa10fed30f25 ("proc: allow to mount many instances of proc in one pid namespace")
    Signed-off-by: Alexey Gladkov
    Signed-off-by: Eric W. Biederman

    Alexey Gladkov
     
  • Instead of triggering a WARN_ON deep down in the page allocator just
    give up early on allocations that are way larger than the usual sysctl
    values.

    Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
    Reported-by: Vegard Nossum
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

10 Jun, 2020

4 commits

  • Convert comments that reference mmap_sem to reference mmap_lock instead.

    [akpm@linux-foundation.org: fix up linux-next leftovers]
    [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
    [akpm@linux-foundation.org: more linux-next fixups, per Michel]

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Convert the last few remaining mmap_sem rwsem calls to use the new mmap
    locking API. These were missed by coccinelle for some reason (I think
    coccinelle does not support some of the preprocessor constructs in these
    files?)

    [akpm@linux-foundation.org: convert linux-next leftovers]
    [akpm@linux-foundation.org: more linux-next leftovers]
    [akpm@linux-foundation.org: more linux-next leftovers]

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Daniel Jordan
    Reviewed-by: Laurent Dufour
    Reviewed-by: Vlastimil Babka
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-6-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • This change converts the existing mmap_sem rwsem calls to use the new mmap
    locking API instead.

    The change is generated using coccinelle with the following rule:

    // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

    @@
    expression mm;
    @@
    (
    -init_rwsem
    +mmap_init_lock
    |
    -down_write
    +mmap_write_lock
    |
    -down_write_killable
    +mmap_write_lock_killable
    |
    -down_write_trylock
    +mmap_write_trylock
    |
    -up_write
    +mmap_write_unlock
    |
    -downgrade_write
    +mmap_write_downgrade
    |
    -down_read
    +mmap_read_lock
    |
    -down_read_killable
    +mmap_read_lock_killable
    |
    -down_read_trylock
    +mmap_read_trylock
    |
    -up_read
    +mmap_read_unlock
    )
    -(&mm->mmap_sem)
    +(mm)
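
The point of the rule is that callers stop naming the mmap_sem field and pass the mm instead. A user-space sketch of that wrapper pattern follows. It is single-threaded and does no real blocking: struct toy_rwsem and struct toy_mm are hypothetical stand-ins for the kernel types, and only the trylock and unlock variants are modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Single-threaded stand-in for the kernel's rw_semaphore (no real blocking). */
struct toy_rwsem {
	int readers;   /* number of read holders */
	bool writer;   /* true while write-locked */
};

struct toy_mm {
	struct toy_rwsem mmap_sem;
};

/* Old style: callers reach into mm->mmap_sem themselves. */
bool down_read_trylock(struct toy_rwsem *sem)
{
	if (sem->writer)
		return false;
	sem->readers++;
	return true;
}

void up_read(struct toy_rwsem *sem)
{
	assert(sem->readers > 0);
	sem->readers--;
}

bool down_write_trylock(struct toy_rwsem *sem)
{
	if (sem->writer || sem->readers > 0)
		return false;
	sem->writer = true;
	return true;
}

void up_write(struct toy_rwsem *sem)
{
	assert(sem->writer);
	sem->writer = false;
}

/* New style: wrappers take the mm, so the lock field is an implementation detail. */
bool mmap_read_trylock(struct toy_mm *mm)  { return down_read_trylock(&mm->mmap_sem); }
void mmap_read_unlock(struct toy_mm *mm)   { up_read(&mm->mmap_sem); }
bool mmap_write_trylock(struct toy_mm *mm) { return down_write_trylock(&mm->mmap_sem); }
void mmap_write_unlock(struct toy_mm *mm)  { up_write(&mm->mmap_sem); }
```

Once every caller goes through the wrappers, the field can be renamed or the lock type changed in one place.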

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Daniel Jordan
    Reviewed-by: Laurent Dufour
    Reviewed-by: Vlastimil Babka
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Patch series "mm: consolidate definitions of page table accessors", v2.

    The low level page table accessors (pXY_index(), pXY_offset()) are
    duplicated across all architectures and sometimes more than once. For
    instance, we have 31 definitions of pgd_offset() for 25 supported
    architectures.

    Most of these definitions are actually identical and typically it boils
    down to, e.g.

    static inline unsigned long pmd_index(unsigned long address)
    {
        return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
    }

    static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
    {
        return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
    }

    These definitions can be shared among 90% of the arches provided
    XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.

    For architectures that really need a custom version there is always the
    possibility to override the generic version with the usual ifdef magic.
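
A stand-alone sketch of the generic accessor and the ifdef override pattern is shown here. The values PMD_SHIFT 21 and PTRS_PER_PMD 512 are assumptions chosen for the example (4 KiB pages with 512-entry tables), and pmd_offset_in() is a hypothetical simplification that indexes a plain array instead of going through pud_page_vaddr().

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for this sketch (4 KiB pages, 512-entry tables). */
#define PMD_SHIFT     21
#define PTRS_PER_PMD  512

typedef struct { unsigned long pmd; } pmd_t;

/* Generic definition, used unless the architecture supplied its own. */
#ifndef pmd_index
static inline unsigned long pmd_index(unsigned long address)
{
	return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
}
#define pmd_index pmd_index
#endif

/* Simplified pmd_offset(): index into a PMD table held in an array. */
static inline pmd_t *pmd_offset_in(pmd_t *table, unsigned long address)
{
	assert(table != NULL);
	return table + pmd_index(address);
}
```

An architecture that needs its own version would define pmd_index() and `#define pmd_index pmd_index` before this point, so the `#ifndef` skips the generic one.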

    These patches introduce include/linux/pgtable.h that replaces
    include/asm-generic/pgtable.h and add the definitions of the page table
    accessors to the new header.

    This patch (of 12):

    The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
    functions involving page table manipulations, e.g. pte_alloc() and
    pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h>
    in the files that include <linux/mm.h>.

    The include statements in such cases are removed with a simple loop:

    for f in $(git grep -l "include <linux/mm.h>") ; do
        sed -i -e '/include <asm\/pgtable.h>/ d' $f
    done

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matthew Wilcox
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Mike Rapoport
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
    Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

09 Jun, 2020

5 commits

  • Merge still more updates from Andrew Morton:
    "Various trees. Mainly those parts of MM whose linux-next dependents
    are now merged. I'm still sitting on ~160 patches which await merges
    from -next.

    Subsystems affected by this patch series: mm/proc, ipc, dynamic-debug,
    panic, lib, sysctl, mm/gup, mm/pagemap"

    * emailed patches from Andrew Morton: (52 commits)
    doc: cgroup: update note about conditions when oom killer is invoked
    module: move the set_fs hack for flush_icache_range to m68k
    nommu: use flush_icache_user_range in brk and mmap
    binfmt_flat: use flush_icache_user_range
    exec: use flush_icache_user_range in read_code
    exec: only build read_code when needed
    m68k: implement flush_icache_user_range
    arm: rename flush_cache_user_range to flush_icache_user_range
    xtensa: implement flush_icache_user_range
    sh: implement flush_icache_user_range
    asm-generic: add a flush_icache_user_range stub
    mm: rename flush_icache_user_range to flush_icache_user_page
    arm,sparc,unicore32: remove flush_icache_user_range
    riscv: use asm-generic/cacheflush.h
    powerpc: use asm-generic/cacheflush.h
    openrisc: use asm-generic/cacheflush.h
    m68knommu: use asm-generic/cacheflush.h
    microblaze: use asm-generic/cacheflush.h
    ia64: use asm-generic/cacheflush.h
    hexagon: use asm-generic/cacheflush.h
    ...

    Linus Torvalds
     
  • After a recent change introduced by Vlastimil's series [0], the kernel is
    now able to handle sysctl parameters on the kernel command line; also, the
    series introduced a simple infrastructure to convert legacy boot
    parameters (that duplicate sysctls) into sysctl aliases.

    This patch converts the watchdog parameters softlockup_panic and
    {hard,soft}lockup_all_cpu_backtrace to use the new alias infrastructure.
    It fixes the documentation too, since the alias only accepts values 0 or
    1, not the full range of integers.

    We also took the opportunity here to improve the documentation of the
    previously converted hung_task_panic (see the patch series [0]) and put
    the alias table in alphabetical order.

    [0] http://lkml.kernel.org/r/20200427180433.7029-1-vbabka@suse.cz

    Signed-off-by: Guilherme G. Piccoli
    Signed-off-by: Andrew Morton
    Acked-by: Vlastimil Babka
    Cc: Kees Cook
    Cc: Iurii Zaikin
    Cc: Luis Chamberlain
    Link: http://lkml.kernel.org/r/20200507214624.21911-1-gpiccoli@canonical.com
    Signed-off-by: Linus Torvalds

    Guilherme G. Piccoli
     
  • We can now handle sysctl parameters on kernel command line and have
    infrastructure to convert legacy command line options that duplicate
    sysctl to become a sysctl alias.

    This patch converts the hung_task_panic parameter. Note that the sysctl
    handler is more strict and allows only 0 and 1, while the legacy
    parameter allowed any non-zero value. But there is little reason anyone
    would not be using 1.
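
    The difference in strictness described above can be illustrated with a
    small sketch. This is only a user-space model in Python, not the kernel
    code; the function names legacy_hung_task_panic() and sysctl_bool() are
    hypothetical and chosen for illustration.

```python
def legacy_hung_task_panic(arg: str) -> int:
    """Model of the legacy parameter: any non-zero integer enables the panic."""
    return 1 if int(arg) != 0 else 0


def sysctl_bool(arg: str) -> int:
    """Model of the stricter sysctl handler: only 0 and 1 are accepted."""
    value = int(arg)
    if value not in (0, 1):
        raise ValueError(f"value {value} out of range, expected 0 or 1")
    return value


# "5" was accepted by the legacy parameter but is rejected by the stricter model.
print(legacy_hung_task_panic("5"))
try:
    sysctl_bool("5")
except ValueError as err:
    print("rejected:", err)
```

    Anyone who passed the usual value of 1 sees no change in behaviour.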

    Signed-off-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Reviewed-by: Kees Cook
    Acked-by: Michal Hocko
    Cc: Alexey Dobriyan
    Cc: Christian Brauner
    Cc: David Rientjes
    Cc: "Eric W . Biederman"
    Cc: Greg Kroah-Hartman
    Cc: "Guilherme G . Piccoli"
    Cc: Iurii Zaikin
    Cc: Ivan Teterevkov
    Cc: Luis Chamberlain
    Cc: Masami Hiramatsu
    Cc: Matthew Wilcox
    Cc: Michal Hocko
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200427180433.7029-4-vbabka@suse.cz
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • We can now handle sysctl parameters on kernel command line, but
    historically some parameters introduced their own command line
    equivalent, which we don't want to remove for compatibility reasons.

    We can, however, convert them to the generic infrastructure with a table
    translating the legacy command line parameters to their sysctl names,
    and removing the one-off param handlers.

    This patch adds the support and makes the first conversion to
    demonstrate it, on the (deprecated) numa_zonelist_order parameter.
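
    The table-driven translation can be sketched roughly as follows. This is
    an illustrative Python model, not the kernel implementation: the names
    SYSCTL_ALIASES and resolve_param() are hypothetical, and the "vm." prefix
    of the sysctl name is an assumption not stated in this message.

```python
from typing import Optional

# Hypothetical alias table: legacy boot parameter -> sysctl name.
# The "vm." prefix is an assumption made for this sketch.
SYSCTL_ALIASES = {
    "numa_zonelist_order": "vm.numa_zonelist_order",
}

SYSCTL_PREFIX = "sysctl."


def resolve_param(param: str) -> Optional[str]:
    """Return the sysctl name a boot parameter refers to, or None."""
    if param.startswith(SYSCTL_PREFIX):
        # Generic form: sysctl.<name>
        return param[len(SYSCTL_PREFIX):]
    # Legacy form: look the parameter up in the alias table.
    return SYSCTL_ALIASES.get(param)


print(resolve_param("numa_zonelist_order"))
print(resolve_param("sysctl.vm.numa_zonelist_order"))
print(resolve_param("quiet"))
```

    With such a table, a legacy parameter and its sysctl.* spelling resolve to
    the same name, so a single code path can handle both.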

    Signed-off-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Reviewed-by: Luis Chamberlain
    Acked-by: Kees Cook
    Acked-by: Michal Hocko
    Cc: Alexey Dobriyan
    Cc: Christian Brauner
    Cc: David Rientjes
    Cc: "Eric W . Biederman"
    Cc: Greg Kroah-Hartman
    Cc: "Guilherme G . Piccoli"
    Cc: Iurii Zaikin
    Cc: Ivan Teterevkov
    Cc: Masami Hiramatsu
    Cc: Matthew Wilcox
    Cc: Michal Hocko
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200427180433.7029-3-vbabka@suse.cz
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • Patch series "support setting sysctl parameters from kernel command line", v3.

    This series adds support for something that seems like many people
    always wanted but nobody added it yet, so here's the ability to set
    sysctl parameters via kernel command line options in the form of
    sysctl.vm.something=1

    The important part is Patch 1. The second, not so important part is an
    attempt to clean up legacy one-off parameters that do the same thing as
    a sysctl. I don't want to remove them completely for compatibility
    reasons, but with generic sysctl support the idea is to remove the
    one-off param handlers and treat the parameters as aliases for the
    sysctl variants.

    I have identified several parameters that mention sysctl counterparts in
    Documentation/admin-guide/kernel-parameters.txt but there might be more.
    The conversion also has varying levels of success:

    - numa_zonelist_order is converted in Patch 2 together with adding the
    necessary infrastructure. It's easy as it doesn't really do anything
    but warn on deprecated value these days.

    - hung_task_panic is converted in Patch 3, but there's a downside that
    now it only accepts 0 and 1, while previously it was any integer
    value

    - nmi_watchdog maps to two sysctls nmi_watchdog and hardlockup_panic,
    so there's no straightforward conversion possible

    - traceoff_on_warning is a flag without value and it would be required
    to handle that somehow in the conversion infrastructure, which seems
    pointless for a single flag

    This patch (of 5):

    A recently proposed patch to add vm_swappiness command line parameter in
    addition to existing sysctl [1] made me wonder why we don't have a
    general support for passing sysctl parameters via command line.

    Googling found only somebody else wondering the same [2], but I haven't
    found any prior discussion with reasons why not to do this.

    Setting the vm_swappiness issue aside (the underlying issue might be
    solved in a different way), quick search of kernel-parameters.txt shows
    there are already some that exist as both sysctl and kernel parameter -
    hung_task_panic, nmi_watchdog, numa_zonelist_order, traceoff_on_warning.

    A general mechanism would remove the need to add more of those one-offs
    and might be handy in situations where configuration by e.g.
    /etc/sysctl.d/ is impractical.

    Hence, this patch adds a new parse_args() pass that looks for parameters
    prefixed by 'sysctl.' and tries to interpret them as writes to the
    corresponding sys/ files using a temporary in-kernel procfs mount.
    This mechanism was suggested by Eric W. Biederman [3], as it handles
    all dynamically registered sysctl tables, even though we don't handle
    modular sysctls. Errors due to e.g. invalid parameter name or value
    are reported in the kernel log.

    The processing is hooked right before the init process is loaded, as
    some handlers might be more complicated than simple setters and might
    need some subsystems to be initialized. If, at that point, the init process
    can be started and eventually execute a process writing to /proc/sys/,
    then it should also be fine to do that from the kernel.

    Sysctls registered later on module load time are not set by this
    mechanism - it's expected that in such scenarios, setting sysctl values
    from userspace is practical enough.
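
    The mapping from 'sysctl.'-prefixed parameters to files under /proc/sys/
    can be sketched as below. This is a simplified Python model of the idea,
    not the kernel's parse_args() pass: the function name sysctl_writes() is
    hypothetical, the command line is split on whitespace only (no quoting),
    and vm.swappiness is used merely as an example sysctl.

```python
from typing import List, Tuple

SYSCTL_PREFIX = "sysctl."
PROC_SYS = "/proc/sys"


def sysctl_writes(cmdline: str) -> List[Tuple[str, str]]:
    """Return (path, value) pairs for every sysctl.<name>=<value> token."""
    writes = []
    for token in cmdline.split():
        if not token.startswith(SYSCTL_PREFIX):
            continue  # not a sysctl parameter, left to other handlers
        name, sep, value = token[len(SYSCTL_PREFIX):].partition("=")
        if not sep or not name:
            # Malformed parameter; this sketch simply skips it.
            continue
        # Dots in the sysctl name become path separators under /proc/sys.
        path = PROC_SYS + "/" + name.replace(".", "/")
        writes.append((path, value))
    return writes


for path, value in sysctl_writes("root=/dev/sda1 sysctl.vm.swappiness=40 quiet"):
    print(f"write {value!r} to {path}")
```

    In this model each resulting pair corresponds to one write of the value
    to the named file, which is what a userspace tool reading /etc/sysctl.d/
    would otherwise do later in boot.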

    [1] https://lore.kernel.org/r/BL0PR02MB560167492CA4094C91589930E9FC0@BL0PR02MB5601.namprd02.prod.outlook.com/
    [2] https://unix.stackexchange.com/questions/558802/how-to-set-sysctl-using-kernel-command-line-parameter
    [3] https://lore.kernel.org/r/87bloj2skm.fsf@x220.int.ebiederm.org/

    Signed-off-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Reviewed-by: Luis Chamberlain
    Reviewed-by: Masami Hiramatsu
    Acked-by: Kees Cook
    Acked-by: Michal Hocko
    Cc: Iurii Zaikin
    Cc: Ivan Teterevkov
    Cc: Michal Hocko
    Cc: David Rientjes
    Cc: Matthew Wilcox
    Cc: "Eric W . Biederman"
    Cc: "Guilherme G . Piccoli"
    Cc: Alexey Dobriyan
    Cc: Thomas Gleixner
    Cc: Greg Kroah-Hartman
    Cc: Christian Brauner
    Link: http://lkml.kernel.org/r/20200427180433.7029-1-vbabka@suse.cz
    Link: http://lkml.kernel.org/r/20200427180433.7029-2-vbabka@suse.cz
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     

08 Jun, 2020

1 commit

  • …git/jj/linux-apparmor

    Pull apparmor updates from John Johansen:
    "Features:
    - Replace zero-length array with flexible-array
    - add a valid state flags check
    - add consistency check between state and dfa diff encode flags
    - add apparmor subdir to proc attr interface
    - fail unpack if profile mode is unknown
    - add outofband transition and use it in xattr match
    - ensure that dfa state tables have entries

    Cleanups:
    - Use true and false for bool variable
    - Remove semicolon
    - Clean code by removing redundant instructions
    - Replace two seq_printf() calls by seq_puts() in aa_label_seq_xprint()
    - remove duplicate check of xattrs on profile attachment
    - remove useless aafs_create_symlink

    Bug fixes:
    - Fix memory leak of profile proxy
    - fix introspection of task mode for unconfined tasks
    - fix nnp subset test for unconfined
    - check/put label on apparmor_sk_clone_security()"

    * tag 'apparmor-pr-2020-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
    apparmor: Fix memory leak of profile proxy
    apparmor: fix introspection of of task mode for unconfined tasks
    apparmor: check/put label on apparmor_sk_clone_security()
    apparmor: Use true and false for bool variable
    security/apparmor/label.c: Clean code by removing redundant instructions
    apparmor: Replace zero-length array with flexible-array
    apparmor: ensure that dfa state tables have entries
    apparmor: remove duplicate check of xattrs on profile attachment.
    apparmor: add outofband transition and use it in xattr match
    apparmor: fail unpack if profile mode is unknown
    apparmor: fix nnp subset test for unconfined
    apparmor: remove useless aafs_create_symlink
    apparmor: add proc subdir to attrs
    apparmor: add consistency check between state and dfa diff encode flags
    apparmor: add a valid state flags check
    AppArmor: Remove semicolon
    apparmor: Replace two seq_printf() calls by seq_puts() in aa_label_seq_xprint()

    Linus Torvalds
     

05 Jun, 2020

3 commits

  • Merge yet more updates from Andrew Morton:

    - More MM work. 100ish more to go. Mike Rapoport's "mm: remove
    __ARCH_HAS_5LEVEL_HACK" series should fix the current ppc issue

    - Various other little subsystems

    * emailed patches from Andrew Morton: (127 commits)
    lib/ubsan.c: fix gcc-10 warnings
    tools/testing/selftests/vm: remove duplicate headers
    selftests: vm: pkeys: fix multilib builds for x86
    selftests: vm: pkeys: use the correct page size on powerpc
    selftests/vm/pkeys: override access right definitions on powerpc
    selftests/vm/pkeys: test correct behaviour of pkey-0
    selftests/vm/pkeys: introduce a sub-page allocator
    selftests/vm/pkeys: detect write violation on a mapped access-denied-key page
    selftests/vm/pkeys: associate key on a mapped page and detect write violation
    selftests/vm/pkeys: associate key on a mapped page and detect access violation
    selftests/vm/pkeys: improve checks to determine pkey support
    selftests/vm/pkeys: fix assertion in test_pkey_alloc_exhaust()
    selftests/vm/pkeys: fix number of reserved powerpc pkeys
    selftests/vm/pkeys: introduce powerpc support
    selftests/vm/pkeys: introduce generic pkey abstractions
    selftests: vm: pkeys: use the correct huge page size
    selftests/vm/pkeys: fix alloc_random_pkey() to make it really random
    selftests/vm/pkeys: fix assertion in pkey_disable_set/clear()
    selftests/vm/pkeys: fix pkey_disable_clear()
    selftests: vm: pkeys: add helpers for pkey bits
    ...

    Linus Torvalds
     
  • "catch" is a reserved keyword in C++, rename it to something both gcc and
    g++ accept.

    Rename "ign" for symmetry.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200331210905.GA31680@avx2
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Pull proc updates from Eric Biederman:
    "This has four sets of changes:

    - modernize proc to support multiple private instances

    - ensure we see the exit of each process tid exactly once

    - remove has_group_leader_pid

    - use pids not tasks in posix-cpu-timers lookup

    Alexey updated proc so each mount of proc uses a new superblock. This
    allows people to actually use mount options with proc with no fear of
    messing up another mount of proc. Given the kernel's internal mounts
    of proc for things like uml this was a real problem, and resulted in
    Android's hidepid mount options being ignored and introducing security
    issues.

    The rest of the changes are small cleanups and fixes that came out of
    my work to allow this change to proc. In essence it is swapping the
    pids in de_thread during exec which removes a special case the code
    had to handle. Then updating the code to stop handling that special
    case"

    * 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    proc: proc_pid_ns takes super_block as an argument
    remove the no longer needed pid_alive() check in __task_pid_nr_ns()
    posix-cpu-timers: Replace __get_task_for_clock with pid_for_clock
    posix-cpu-timers: Replace cpu_timer_pid_type with clock_pid_type
    posix-cpu-timers: Extend rcu_read_lock removing task_struct references
    signal: Remove has_group_leader_pid
    exec: Remove BUG_ON(has_group_leader_pid)
    posix-cpu-timer: Unify the now redundant code in lookup_task
    posix-cpu-timer: Tidy up group_leader logic in lookup_task
    proc: Ensure we see the exit of each process tid exactly once
    rculist: Add hlists_swap_heads_rcu
    proc: Use PIDTYPE_TGID in next_tgid
    Use proc_pid_ns() to get pid_namespace from the proc superblock
    proc: use named enums for better readability
    proc: use human-readable values for hidepid
    docs: proc: add documentation for "hidepid=4" and "subset=pid" options and new mount behavior
    proc: add option to mount only a pids subset
    proc: instantiate only pids that we can ptrace on 'hidepid=4' mount option
    proc: allow to mount many instances of proc in one pid namespace
    proc: rename struct proc_fs_info to proc_fs_opts

    Linus Torvalds
     

04 Jun, 2020

1 commit

  • Pull networking updates from David Miller:

    1) Allow setting bluetooth L2CAP modes via socket option, from Luiz
    Augusto von Dentz.

    2) Add GSO partial support to igc, from Sasha Neftin.

    3) Several cleanups and improvements to r8169 from Heiner Kallweit.

    4) Add IF_OPER_TESTING link state and use it when ethtool triggers a
    device self-test. From Andrew Lunn.

    5) Start moving away from custom driver versions, use the globally
    defined kernel version instead, from Leon Romanovsky.

    6) Support GRO via gro_cells in DSA layer, from Alexander Lobakin.

    7) Allow hard IRQ deferral during NAPI, from Eric Dumazet.

    8) Add sriov and vf support to hinic, from Luo bin.

    9) Support Media Redundancy Protocol (MRP) in the bridging code, from
    Horatiu Vultur.

    10) Support netmap in the nft_nat code, from Pablo Neira Ayuso.

    11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina
    Dubroca. Also add ipv6 support for espintcp.

    12) Lots of ReST conversions of the networking documentation, from Mauro
    Carvalho Chehab.

    13) Support configuration of ethtool rxnfc flows in bcmgenet driver,
    from Doug Berger.

    14) Allow to dump cgroup id and filter by it in inet_diag code, from
    Dmitry Yakunin.

    15) Add infrastructure to export netlink attribute policies to
    userspace, from Johannes Berg.

    16) Several optimizations to sch_fq scheduler, from Eric Dumazet.

    17) Fallback to the default qdisc if qdisc init fails because otherwise
    a packet scheduler init failure will make a device inoperative. From
    Jesper Dangaard Brouer.

    18) Several RISCV bpf jit optimizations, from Luke Nelson.

    19) Correct the return type of the ->ndo_start_xmit() method in several
    drivers, it's netdev_tx_t but many drivers were using
    'int'. From Yunjian Wang.

    20) Add an ethtool interface for PHY master/slave config, from Oleksij
    Rempel.

    21) Add BPF iterators, from Yonghong Song.

    22) Add cable test infrastructure, including ethtool interfaces, from
    Andrew Lunn. Marvell PHY driver is the first to support this
    facility.

    23) Remove zero-length arrays all over, from Gustavo A. R. Silva.

    24) Calculate and maintain an explicit frame size in XDP, from Jesper
    Dangaard Brouer.

    25) Add CAP_BPF, from Alexei Starovoitov.

    26) Support terse dumps in the packet scheduler, from Vlad Buslov.

    27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei.

    28) Add devm_register_netdev(), from Bartosz Golaszewski.

    29) Minimize qdisc resets, from Cong Wang.

    30) Get rid of kernel_getsockopt and kernel_setsockopt in order to
    eliminate set_fs/get_fs calls. From Christoph Hellwig.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits)
    selftests: net: ip_defrag: ignore EPERM
    net_failover: fixed rollback in net_failover_open()
    Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv"
    Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv"
    vmxnet3: allow rx flow hash ops only when rss is enabled
    hinic: add set_channels ethtool_ops support
    selftests/bpf: Add a default $(CXX) value
    tools/bpf: Don't use $(COMPILE.c)
    bpf, selftests: Use bpf_probe_read_kernel
    s390/bpf: Use bcr 0,%0 as tail call nop filler
    s390/bpf: Maintain 8-byte stack alignment
    selftests/bpf: Fix verifier test
    selftests/bpf: Fix sample_cnt shared between two threads
    bpf, selftests: Adapt cls_redirect to call csum_level helper
    bpf: Add csum_level helper for fixing up csum levels
    bpf: Fix up bpf_skb_adjust_room helper's skb csum setting
    sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf()
    crypto/chtls: IPv6 support for inline TLS
    Crypto/chcr: Fixes a coccinile check error
    Crypto/chcr: Fixes compilations warnings
    ...

    Linus Torvalds
     

03 Jun, 2020

3 commits

  • Merge updates from Andrew Morton:
    "A few little subsystems and a start of a lot of MM patches.

    Subsystems affected by this patch series: squashfs, ocfs2, parisc,
    vfs. With mm subsystems: slab-generic, slub, debug, pagecache, gup,
    swap, memcg, pagemap, memory-failure, vmalloc, kasan"

    * emailed patches from Andrew Morton: (128 commits)
    kasan: move kasan_report() into report.c
    mm/mm_init.c: report kasan-tag information stored in page->flags
    ubsan: entirely disable alignment checks under UBSAN_TRAP
    kasan: fix clang compilation warning due to stack protector
    x86/mm: remove vmalloc faulting
    mm: remove vmalloc_sync_(un)mappings()
    x86/mm/32: implement arch_sync_kernel_mappings()
    x86/mm/64: implement arch_sync_kernel_mappings()
    mm/ioremap: track which page-table levels were modified
    mm/vmalloc: track which page-table levels were modified
    mm: add functions to track page directory modifications
    s390: use __vmalloc_node in stack_alloc
    powerpc: use __vmalloc_node in alloc_vm_stack
    arm64: use __vmalloc_node in arch_alloc_vmap_stack
    mm: remove vmalloc_user_node_flags
    mm: switch the test_vmalloc module to use __vmalloc_node
    mm: remove __vmalloc_node_flags_caller
    mm: remove both instances of __vmalloc_node_flags
    mm: remove the prot argument to __vmalloc_node
    mm: remove the pgprot argument to __vmalloc
    ...

    Linus Torvalds
     
  • Currently, when reading /proc/PID/smaps, a PMD migration entry in the page
    table is simply ignored. To improve the accuracy of /proc/PID/smaps, this
    patch adds parsing and processing of such entries.

    To test the patch, we run pmbench to eat 400 MB memory in background,
    then run /usr/bin/migratepages and `cat /proc/PID/smaps` every second.
    The following issue can be reproduced within 60 seconds.

    Before the patch, for the fully populated 400 MB anonymous VMA, some THP
    pages under migration may be lost as below.

    7f3f6a7e5000-7f3f837e5000 rw-p 00000000 00:00 0
    Size: 409600 kB
    KernelPageSize: 4 kB
    MMUPageSize: 4 kB
    Rss: 407552 kB
    Pss: 407552 kB
    Shared_Clean: 0 kB
    Shared_Dirty: 0 kB
    Private_Clean: 0 kB
    Private_Dirty: 407552 kB
    Referenced: 301056 kB
    Anonymous: 407552 kB
    LazyFree: 0 kB
    AnonHugePages: 405504 kB
    ShmemPmdMapped: 0 kB
    FilePmdMapped: 0 kB
    Shared_Hugetlb: 0 kB
    Private_Hugetlb: 0 kB
    Swap: 0 kB
    SwapPss: 0 kB
    Locked: 0 kB
    THPeligible: 1
    VmFlags: rd wr mr mw me ac

    After the patch, the output is always:

    7f3f6a7e5000-7f3f837e5000 rw-p 00000000 00:00 0
    Size: 409600 kB
    KernelPageSize: 4 kB
    MMUPageSize: 4 kB
    Rss: 409600 kB
    Pss: 409600 kB
    Shared_Clean: 0 kB
    Shared_Dirty: 0 kB
    Private_Clean: 0 kB
    Private_Dirty: 409600 kB
    Referenced: 294912 kB
    Anonymous: 409600 kB
    LazyFree: 0 kB
    AnonHugePages: 407552 kB
    ShmemPmdMapped: 0 kB
    FilePmdMapped: 0 kB
    Shared_Hugetlb: 0 kB
    Private_Hugetlb: 0 kB
    Swap: 0 kB
    SwapPss: 0 kB
    Locked: 0 kB
    THPeligible: 1
    VmFlags: rd wr mr mw me ac
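
    The gap between the two listings can be checked with a short script. This
    is a minimal Python sketch that parses only the "kB" fields of smaps-style
    text; the helper name parse_smaps_kb() is hypothetical, and only a few of
    the fields quoted above are reproduced.

```python
def parse_smaps_kb(text: str) -> dict:
    """Collect 'Name: <number> kB' lines from smaps-style text into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only lines of the exact form "<key>: <digits> kB".
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


BEFORE = """\
7f3f6a7e5000-7f3f837e5000 rw-p 00000000 00:00 0
Size: 409600 kB
Rss: 407552 kB
AnonHugePages: 405504 kB
"""

AFTER = """\
7f3f6a7e5000-7f3f837e5000 rw-p 00000000 00:00 0
Size: 409600 kB
Rss: 409600 kB
AnonHugePages: 407552 kB
"""

before = parse_smaps_kb(BEFORE)
after = parse_smaps_kb(AFTER)

# Memory missing from the pre-patch report, in kB.
missing_rss_kb = after["Rss"] - before["Rss"]
missing_thp_kb = after["AnonHugePages"] - before["AnonHugePages"]
print(missing_rss_kb, missing_thp_kb)  # 2048 2048
```

    Both differences are 2048 kB, which would be consistent with a single
    2 MB huge page being skipped while under migration, assuming 2 MB
    PMD-sized pages (an assumption; the page size is not stated above).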

    Signed-off-by: "Huang, Ying"
    Signed-off-by: Andrew Morton
    Reviewed-by: Zi Yan
    Acked-by: Michal Hocko
    Acked-by: Kirill A. Shutemov
    Acked-by: Vlastimil Babka
    Cc: Andrea Arcangeli
    Cc: Alexey Dobriyan
    Cc: Konstantin Khlebnikov
    Cc: "Jérôme Glisse"
    Cc: Yang Shi
    Link: http://lkml.kernel.org/r/20200403123059.1846960-1-ying.huang@intel.com
    Signed-off-by: Linus Torvalds

    Huang Ying
     
  • After an NFS page has been written it is considered "unstable" until a
    COMMIT request succeeds. If the COMMIT fails, the page will be
    re-written.

    These "unstable" pages are currently accounted as "reclaimable", either
    in WB_RECLAIMABLE, or in NR_UNSTABLE_NFS which is included in a
    'reclaimable' count. This might have made sense when sending the COMMIT
    required a separate action by the VFS/MM (e.g. releasepage() used to
    send a COMMIT). However now that all writes generated by ->writepages()
    will automatically be followed by a COMMIT (since commit 919e3bd9a875
    ("NFS: Ensure we commit after writeback is complete")) it makes more
    sense to treat them as writeback pages.

    So this patch removes NR_UNSTABLE_NFS and accounts unstable pages in
    NR_WRITEBACK and WB_WRITEBACK.

    A particular effect of this change is that when
    wb_check_background_flush() calls wb_over_bg_threshold(), the latter
    will report 'true' a lot less often as the 'unstable' pages are no
    longer considered 'dirty' (as there is nothing that writeback can do
    about them anyway).

    Currently wb_check_background_flush() will trigger writeback to NFS even
    when there are relatively few dirty pages (if there are lots of unstable
    pages). This can result in small writes going to the server (tens of
    kilobytes rather than a megabyte), which hurts throughput. With this
    patch, there are fewer writes, each larger on average.
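
    The effect on the background threshold check can be illustrated with a
    toy model. This Python sketch is a deliberate simplification and not the
    kernel's wb_over_bg_threshold(); the function name, its parameters, and
    the page counts are hypothetical.

```python
def over_bg_threshold(dirty: int, unstable: int, bg_thresh: int,
                      count_unstable: bool) -> bool:
    """Toy check: is the 'dirty' page count above the background threshold?

    count_unstable=True models the old accounting, where unstable NFS pages
    were added to the dirty count; False models the accounting after the patch.
    """
    counted = dirty + (unstable if count_unstable else 0)
    return counted > bg_thresh


# Few dirty pages, many unstable pages (made-up numbers).
print(over_bg_threshold(dirty=100, unstable=5000, bg_thresh=1000, count_unstable=True))
print(over_bg_threshold(dirty=100, unstable=5000, bg_thresh=1000, count_unstable=False))
```

    In the old accounting the check fires even though only 100 pages could
    actually be written back, which is the small-write pattern described
    above.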

    Where the NR_UNSTABLE_NFS count was included in statistics
    virtual-files, the entry is retained, but the value is hard-coded as
    zero. Static trace points and warning printks which mentioned this
    counter no longer report it.

    [akpm@linux-foundation.org: re-layout comment]
    [akpm@linux-foundation.org: fix printk warning]
    Signed-off-by: NeilBrown
    Signed-off-by: Andrew Morton
    Reviewed-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    Acked-by: Trond Myklebust
    Acked-by: Michal Hocko [mm]
    Cc: Christoph Hellwig
    Cc: Chuck Lever
    Link: http://lkml.kernel.org/r/87d06j7gqa.fsf@notabene.neil.brown.name
    Signed-off-by: Linus Torvalds

    NeilBrown
     

02 Jun, 2020

2 commits

  • Pull documentation updates from Jonathan Corbet:
    "A fair amount of stuff this time around, dominated by yet another
    massive set from Mauro toward the completion of the RST conversion. I
    *really* hope we are getting close to the end of this. Meanwhile,
    those patches reach pretty far afield to update document references
    around the tree; there should be no actual code changes there. There
    will be, alas, more of the usual trivial merge conflicts.

    Beyond that we have more translations, improvements to the sphinx
    scripting, a number of additions to the sysctl documentation, and lots
    of fixes"

    * tag 'docs-5.8' of git://git.lwn.net/linux: (130 commits)
    Documentation: fixes to the maintainer-entry-profile template
    zswap: docs/vm: Fix typo accept_threshold_percent in zswap.rst
    tracing: Fix events.rst section numbering
    docs: acpi: fix old http link and improve document format
    docs: filesystems: add info about efivars content
    Documentation: LSM: Correct the basic LSM description
    mailmap: change email for Ricardo Ribalda
    docs: sysctl/kernel: document unaligned controls
    Documentation: admin-guide: update bug-hunting.rst
    docs: sysctl/kernel: document ngroups_max
    nvdimm: fixes to maintainter-entry-profile
    Documentation/features: Correct RISC-V kprobes support entry
    Documentation/features: Refresh the arch support status files
    Revert "docs: sysctl/kernel: document ngroups_max"
    docs: move locking-specific documents to locking/
    docs: move digsig docs to the security book
    docs: move the kref doc into the core-api book
    docs: add IRQ documentation at the core-api book
    docs: debugging-via-ohci1394.txt: add it to the core-api book
    docs: fix references for ipmi.rst file
    ...

    Linus Torvalds
     
  • Pull arm64 updates from Will Deacon:
    "A sizeable pile of arm64 updates for 5.8.

    Summary below, but the big two features are support for Branch Target
    Identification and Clang's Shadow Call stack. The latter is currently
    arm64-only, but the high-level parts are all in core code so it could
    easily be adopted by other architectures pending toolchain support

    Branch Target Identification (BTI):

    - Support for ARMv8.5-BTI in both user- and kernel-space. This allows
    branch targets to limit the types of branch from which they can be
    called and additionally prevents branching to arbitrary code,
    although kernel support requires a very recent toolchain.

    - Function annotation via SYM_FUNC_START() so that assembly functions
    are wrapped with the relevant "landing pad" instructions.

    - BPF and vDSO updates to use the new instructions.

    - Addition of a new HWCAP and exposure of BTI capability to userspace
    via ID register emulation, along with ELF loader support for the
    BTI feature in .note.gnu.property.

    - Non-critical fixes to CFI unwind annotations in the sigreturn
    trampoline.

    Shadow Call Stack (SCS):

    - Support for Clang's Shadow Call Stack feature, which reserves
    platform register x18 to point at a separate stack for each task
    that holds only return addresses. This protects function return
    control flow from buffer overruns on the main stack.

    - Save/restore of x18 across problematic boundaries (user-mode,
    hypervisor, EFI, suspend, etc).

    - Core support for SCS, should other architectures want to use it
    too.

    - SCS overflow checking on context-switch as part of the existing
    stack limit check if CONFIG_SCHED_STACK_END_CHECK=y.
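
    The idea behind a shadow call stack can be illustrated with a toy model.
    This Python sketch is purely conceptual: it does not reflect the arm64
    implementation or the use of register x18, and the class ToyTask and its
    methods are hypothetical.

```python
class ToyTask:
    """Toy task with a main stack and a separate shadow stack."""

    def __init__(self):
        self.main_stack = []    # return addresses mixed with local buffers
        self.shadow_stack = []  # holds only return addresses

    def call(self, return_addr: int, locals_count: int) -> None:
        """Enter a function: save the return address on both stacks."""
        self.main_stack.append(return_addr)
        self.main_stack.extend([0] * locals_count)
        self.shadow_stack.append(return_addr)

    def smash(self, value: int) -> None:
        """Simulate a buffer overrun that overwrites the whole main stack."""
        self.main_stack = [value] * len(self.main_stack)

    def ret(self, locals_count: int, use_shadow: bool) -> int:
        """Leave a function and report which address control returns to."""
        frame_start = len(self.main_stack) - locals_count - 1
        on_main = self.main_stack[frame_start]
        del self.main_stack[frame_start:]
        on_shadow = self.shadow_stack.pop()
        return on_shadow if use_shadow else on_main


protected = ToyTask()
protected.call(0x1000, locals_count=4)
protected.smash(0xDEAD)
print(hex(protected.ret(4, use_shadow=True)))

unprotected = ToyTask()
unprotected.call(0x1000, locals_count=4)
unprotected.smash(0xDEAD)
print(hex(unprotected.ret(4, use_shadow=False)))
```

    In this model the overrun corrupts only the main stack, so a return that
    takes its address from the shadow stack still goes to the original
    caller.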

    CPU feature detection:

    - Removed numerous "SANITY CHECK" errors when running on a system
    with mismatched AArch32 support at EL1. This is primarily a concern
    for KVM, which disabled support for 32-bit guests on such a system.

    - Addition of new ID registers and fields as the architecture has
    been extended.

    Perf and PMU drivers:

    - Minor fixes and cleanups to system PMU drivers.

    Hardware errata:

    - Unify KVM workarounds for VHE and nVHE configurations.

    - Sort vendor errata entries in Kconfig.

    Secure Monitor Call Calling Convention (SMCCC):

    - Update to the latest specification from Arm (v1.2).

    - Allow PSCI code to query the SMCCC version.

    Software Delegated Exception Interface (SDEI):

    - Unexport a bunch of unused symbols.

    - Minor fixes to handling of firmware data.

    Pointer authentication:

    - Add support for dumping the kernel PAC mask in vmcoreinfo so that
    the stack can be unwound by tools such as kdump.

    - Simplification of key initialisation during CPU bringup.

    BPF backend:

    - Improve immediate generation for logical and add/sub instructions.

    vDSO:

    - Minor fixes to the linker flags for consistency with other
    architectures and support for LLVM's unwinder.

    - Clean up logic to initialise and map the vDSO into userspace.

    ACPI:

    - Workaround for an ambiguity in the IORT specification relating to
    the "num_ids" field.

    - Support _DMA method for all named components rather than only PCIe
    root complexes.

    - Minor other IORT-related fixes.

    Miscellaneous:

    - Initialise debug traps early for KGDB and fix KDB cacheflushing
    deadlock.

    - Minor tweaks to early boot state (documentation update, set
    TEXT_OFFSET to 0x0, increase alignment of PE/COFF sections).

    - Refactoring and cleanup"

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits)
    KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h
    KVM: arm64: Check advertised Stage-2 page size capability
    arm64/cpufeature: Add get_arm64_ftr_reg_nowarn()
    ACPI/IORT: Remove the unused __get_pci_rid()
    arm64/cpuinfo: Add ID_MMFR4_EL1 into the cpuinfo_arm64 context
    arm64/cpufeature: Add remaining feature bits in ID_AA64PFR1 register
    arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register
    arm64/cpufeature: Add remaining feature bits in ID_AA64ISAR0 register
    arm64/cpufeature: Add remaining feature bits in ID_MMFR4 register
    arm64/cpufeature: Add remaining feature bits in ID_PFR0 register
    arm64/cpufeature: Introduce ID_MMFR5 CPU register
    arm64/cpufeature: Introduce ID_DFR1 CPU register
    arm64/cpufeature: Introduce ID_PFR2 CPU register
    arm64/cpufeature: Make doublelock a signed feature in ID_AA64DFR0
    arm64/cpufeature: Drop TraceFilt feature exposure from ID_DFR0 register
    arm64/cpufeature: Add explicit ftr_id_isar0[] for ID_ISAR0 register
    arm64: mm: Add asid_gen_match() helper
    firmware: smccc: Fix missing prototype warning for arm_smccc_version_init
    arm64: vdso: Fix CFI directives in sigreturn trampoline
    arm64: vdso: Don't prefix sigreturn trampoline with a BTI C instruction
    ...

    Linus Torvalds