23 Aug, 2018

1 commit

  • Rather than in vm_area_alloc(). To ensure that the various oddball
    stack-based vmas are in a good state. Some of the callers were zeroing
    them out, others were not.

    Acked-by: Kirill A. Shutemov
    Cc: Russell King
    Cc: Dmitry Vyukov
    Cc: Oleg Nesterov
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

21 Aug, 2018

3 commits

  • Pull tracing updates from Steven Rostedt:

    - Restructure of lockdep and latency tracers

    This is the biggest change. Joel Fernandes restructured the hooks
    from irqs and preemption disabling and enabling. He got rid of a lot
    of the preprocessor #ifdef mess that they caused.

    He turned both lockdep and the latency tracers to use trace events
    inserted in the preempt/irqs disabling paths. But unfortunately,
    these started to cause issues in corner cases. Thus, parts of the
    code were reverted back to where lockdep and the latency tracers just
    get called directly (without using the trace events). But because the
    original change cleaned up the code very nicely we kept that, as well
    as the trace events for preempt and irqs disabling, but they are
    limited to not being called in NMIs.

    - Have trace events use SRCU for "rcu idle" calls. This was required
    for the preempt/irqs off trace events. But it also had to not allow
    them to be called in NMI context. Waiting till Paul makes an NMI safe
    SRCU API.

    - New notrace SRCU API to allow trace events to use SRCU.

    - Addition of mcount-nop option support

    - SPDX headers replacing GPL templates.

    - Various other fixes and clean ups.

    - Some fixes are marked for stable, but were not fully tested before
    the merge window opened.

    * tag 'trace-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits)
    tracing: Fix SPDX format headers to use C++ style comments
    tracing: Add SPDX License format tags to tracing files
    tracing: Add SPDX License format to bpf_trace.c
    blktrace: Add SPDX License format header
    s390/ftrace: Add -mfentry and -mnop-mcount support
    tracing: Add -mcount-nop option support
    tracing: Avoid calling cc-option -mrecord-mcount for every Makefile
    tracing: Handle CC_FLAGS_FTRACE more accurately
    Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
    Uprobes: Simplify uprobe_register() body
    tracepoints: Free early tracepoints after RCU is initialized
    uprobes: Use synchronize_rcu() not synchronize_sched()
    tracing: Fix synchronizing to event changes with tracepoint_synchronize_unregister()
    ftrace: Remove unused pointer ftrace_swapper_pid
    tracing: More reverting of "tracing: Centralize preemptirq tracepoints and unify their usage"
    tracing/irqsoff: Handle preempt_count for different configs
    tracing: Partial revert of "tracing: Centralize preemptirq tracepoints and unify their usage"
    tracing: irqsoff: Account for additional preempt_disable
    trace: Use rcu_dereference_raw for hooks from trace-event subsystem
    tracing/kprobes: Fix within_notrace_func() to check only notrace functions
    ...

    Linus Torvalds
     
  • Pull livepatching updates from Jiri Kosina:
    "Code cleanups from Kamalesh Babulal"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
    livepatch: Validate module/old func name length
    livepatch: Remove reliable stacktrace check in klp_try_switch_task()

    Linus Torvalds
     
  • Jiri Kosina
     

20 Aug, 2018

1 commit

  • Pull networking fixes from David Miller:

    1) Fix races in IPVS, from Tan Hu.

    2) Missing unbind in matchall classifier, from Hangbin Liu.

    3) Missing act_ife action release, from Vlad Buslov.

    4) Cure lockdep splats in ila, from Cong Wang.

    5) veth queue leak on link delete, from Toshiaki Makita.

    6) Disable isdn's IIOCDBGVAR ioctl, it exposes kernel addresses. From
    Kees Cook.

    7) RCU usage fixup in XDP, from Tariq Toukan.

    8) Two TCP ULP fixes from Daniel Borkmann.

    9) r8169 needs REALTEK_PHY as a Kconfig dependency, from Heiner
    Kallweit.

    10) Always take tcf_lock with BH disabled, otherwise we can deadlock
    with rate estimator code paths. From Vlad Buslov.

    11) Don't use MSI-X on RTL8106e r8169 chips, they don't resume properly.
    From Jian-Hong Pan.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
    ip6_vti: fix creating fallback tunnel device for vti6
    ip_vti: fix a null pointer deferrence when create vti fallback tunnel
    r8169: don't use MSI-X on RTL8106e
    net: lan743x_ptp: convert to ktime_get_clocktai_ts64
    net: sched: always disable bh when taking tcf_lock
    ip6_vti: simplify stats handling in vti6_xmit
    bpf: fix redirect to map under tail calls
    r8169: add missing Kconfig dependency
    tools/bpf: fix bpf selftest test_cgroup_storage failure
    bpf, sockmap: fix sock_map_ctx_update_elem race with exist/noexist
    bpf, sockmap: fix map elem deletion race with smap_stop_sock
    bpf, sockmap: fix leakage of smap_psock_map_entry
    tcp, ulp: fix leftover icsk_ulp_ops preventing sock from reattach
    tcp, ulp: add alias for all ulp modules
    bpf: fix a rcu usage warning in bpf_prog_array_copy_core()
    samples/bpf: all XDP samples should unload xdp/bpf prog on SIGTERM
    net/xdp: Fix suspicious RCU usage warning
    net/mlx5e: Delete unneeded function argument
    Documentation: networking: ti-cpsw: correct cbs parameters for Eth1 100Mb
    isdn: Disable IIOCDBGVAR
    ...

    Linus Torvalds
     

19 Aug, 2018

1 commit

  • Daniel Borkmann says:

    ====================
    pull-request: bpf 2018-08-18

    The following pull-request contains BPF updates for your *net* tree.

    The main changes are:

    1) Fix a BPF selftest failure in test_cgroup_storage due to rlimit
    restrictions, from Yonghong.

    2) Fix a suspicious RCU rcu_dereference_check() warning triggered
    from removing a device's XDP memory allocator by using the correct
    rhashtable lookup function, from Tariq.

    3) A batch of BPF sockmap and ULP fixes mainly fixing leaks and races
    as well as enforcing module aliases for ULPs. Another fix for BPF
    map redirect to make them work again with tail calls, from Daniel.

    4) Fix XDP BPF samples to unload their programs upon SIGTERM, from Jesper.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

18 Aug, 2018

8 commits

  • Merge updates from Andrew Morton:

    - a few misc things

    - a few Y2038 fixes

    - ntfs fixes

    - arch/sh tweaks

    - ocfs2 updates

    - most of MM

    * emailed patches from Andrew Morton: (111 commits)
    mm/hmm.c: remove unused variables align_start and align_end
    fs/userfaultfd.c: remove redundant pointer uwq
    mm, vmacache: hash addresses based on pmd
    mm/list_lru: introduce list_lru_shrink_walk_irq()
    mm/list_lru.c: pass struct list_lru_node* as an argument to __list_lru_walk_one()
    mm/list_lru.c: move locking from __list_lru_walk_one() to its caller
    mm/list_lru.c: use list_lru_walk_one() in list_lru_walk_node()
    mm, swap: make CONFIG_THP_SWAP depend on CONFIG_SWAP
    mm/sparse: delete old sparse_init and enable new one
    mm/sparse: add new sparse_init_nid() and sparse_init()
    mm/sparse: move buffer init/fini to the common place
    mm/sparse: use the new sparse buffer functions in non-vmemmap
    mm/sparse: abstract sparse buffer allocations
    mm/hugetlb.c: don't zero 1GiB bootmem pages
    mm, page_alloc: double zone's batchsize
    mm/oom_kill.c: document oom_lock
    mm/hugetlb: remove gigantic page support for HIGHMEM
    mm, oom: remove sleep from under oom_lock
    kernel/dma: remove unsupported gfp_mask parameter from dma_alloc_from_contiguous()
    mm/cma: remove unsupported gfp_mask parameter from cma_alloc()
    ...

    Linus Torvalds
     
  • The CMA memory allocator doesn't support standard gfp flags for memory
    allocation, so there is no point in having gfp_mask as a parameter of the
    dma_alloc_from_contiguous() function. Replace it with a boolean no_warn
    argument, which covers everything the underlying cma_alloc() function
    supports.

    This avoids giving the false impression that this function supports
    standard gfp flags and that callers can pass __GFP_ZERO to get a zeroed
    buffer, which has already been an issue: see commit dd65a941f6ba ("arm64:
    dma-mapping: clear buffers allocated with FORCE_CONTIGUOUS flag").

    Link: http://lkml.kernel.org/r/20180709122020eucas1p21a71b092975cb4a3b9954ffc63f699d1~-sqUFoa-h2939329393eucas1p2Y@eucas1p2.samsung.com
    Signed-off-by: Marek Szyprowski
    Acked-by: Michał Nazarewicz
    Acked-by: Vlastimil Babka
    Reviewed-by: Christoph Hellwig
    Cc: Laura Abbott
    Cc: Michal Hocko
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marek Szyprowski
     
  • cma_alloc() doesn't really support gfp flags other than __GFP_NOWARN, so
    convert the gfp_mask parameter to a boolean no_warn parameter.

    This avoids giving the false impression that this function supports
    standard gfp flags and that callers can pass __GFP_ZERO to get a zeroed
    buffer, which has already been an issue: see commit dd65a941f6ba ("arm64:
    dma-mapping: clear buffers allocated with FORCE_CONTIGUOUS flag").
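
    The interface change can be illustrated with a small userspace sketch.
    This is not the kernel code: demo_cma_alloc() and
    demo_cma_alloc_zeroed() are hypothetical stand-ins built on malloc(),
    and only the shape of the interface (a boolean no_warn instead of a gfp
    mask, with zeroing left to the caller) follows the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for a CMA-style allocator: the only
 * knob is whether a failed allocation prints a warning. */
static void *demo_cma_alloc(size_t size, bool no_warn)
{
	void *buf;

	assert(size > 0); /* a zero-sized request is a caller bug in this sketch */

	buf = malloc(size);
	if (!buf && !no_warn)
		fprintf(stderr, "demo_cma_alloc: failed to allocate %zu bytes\n", size);
	return buf; /* contents are NOT zeroed */
}

/* There is no flag that asks for zeroing, so a caller that needs a
 * zeroed buffer clears it explicitly. */
static void *demo_cma_alloc_zeroed(size_t size, bool no_warn)
{
	void *buf = demo_cma_alloc(size, no_warn);

	if (buf)
		memset(buf, 0, size);
	return buf;
}
```

    With a boolean parameter there is nothing resembling __GFP_ZERO to pass
    by mistake, so the explicit memset() in the wrapper is the only way to
    get a cleared buffer.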

    Link: http://lkml.kernel.org/r/20180709122019eucas1p2340da484acfcc932537e6014f4fd2c29~-sqTPJKij2939229392eucas1p2j@eucas1p2.samsung.com
    Signed-off-by: Marek Szyprowski
    Acked-by: Michal Hocko
    Acked-by: Michał Nazarewicz
    Acked-by: Laura Abbott
    Acked-by: Vlastimil Babka
    Reviewed-by: Christoph Hellwig
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marek Szyprowski
     
  • KASAN learns about hotadded memory via the memory hotplug notifier.
    devm_memremap_pages() intentionally skips calling memory hotplug
    notifiers. So KASAN doesn't know anything about new memory added by
    devm_memremap_pages(). This causes a crash when KASAN tries to access
    non-existent shadow memory:

    BUG: unable to handle kernel paging request at ffffed0078000000
    RIP: 0010:check_memory_region+0x82/0x1e0
    Call Trace:
    memcpy+0x1f/0x50
    pmem_do_bvec+0x163/0x720
    pmem_make_request+0x305/0xac0
    generic_make_request+0x54f/0xcf0
    submit_bio+0x9c/0x370
    submit_bh_wbc+0x4c7/0x700
    block_read_full_page+0x5ef/0x870
    do_read_cache_page+0x2b8/0xb30
    read_dev_sector+0xbd/0x3f0
    read_lba.isra.0+0x277/0x670
    efi_partition+0x41a/0x18f0
    check_partition+0x30d/0x5e9
    rescan_partitions+0x18c/0x840
    __blkdev_get+0x859/0x1060
    blkdev_get+0x23f/0x810
    __device_add_disk+0x9c8/0xde0
    pmem_attach_disk+0x9a8/0xf50
    nvdimm_bus_probe+0xf3/0x3c0
    driver_probe_device+0x493/0xbd0
    bus_for_each_drv+0x118/0x1b0
    __device_attach+0x1cd/0x2b0
    bus_probe_device+0x1ac/0x260
    device_add+0x90d/0x1380
    nd_async_device_register+0xe/0x50
    async_run_entry_fn+0xc3/0x5d0
    process_one_work+0xa0a/0x1810
    worker_thread+0x87/0xe80
    kthread+0x2d7/0x390
    ret_from_fork+0x3a/0x50

    Add kasan_add_zero_shadow()/kasan_remove_zero_shadow() - post mm_init()
    interface to map/unmap kasan_zero_page at requested virtual addresses.
    And use it to add/remove the shadow memory for hotplugged/unplugged
    device memory.
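
    To relate the faulting address in the trace to the memory being
    accessed, the shadow mapping can be sketched as below. The shift and
    offset are assumptions: they are the values commonly used on x86_64
    with 4-level paging, and other architectures and configurations differ.
    The demo_ names are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86_64 (4-level paging) values; other configurations differ. */
#define DEMO_KASAN_SHADOW_SCALE_SHIFT 3
#define DEMO_KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL

/* One shadow byte describes 8 bytes of kernel address space. */
static uint64_t demo_mem_to_shadow(uint64_t addr)
{
	return (addr >> DEMO_KASAN_SHADOW_SCALE_SHIFT) + DEMO_KASAN_SHADOW_OFFSET;
}

/* Inverse mapping: first memory address covered by a shadow byte. */
static uint64_t demo_shadow_to_mem(uint64_t shadow)
{
	assert(shadow >= DEMO_KASAN_SHADOW_OFFSET); /* must be a shadow address */
	return (shadow - DEMO_KASAN_SHADOW_OFFSET) << DEMO_KASAN_SHADOW_SCALE_SHIFT;
}
```

    Under these assumptions, the faulting shadow address ffffed0078000000
    corresponds to memory starting at ffff8803c0000000, that is, KASAN was
    checking an access to memory for which no shadow had been mapped.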

    Link: http://lkml.kernel.org/r/20180629164932.740-1-aryabinin@virtuozzo.com
    Fixes: 41e94a851304 ("add devm_memremap_pages")
    Signed-off-by: Andrey Ryabinin
    Reported-by: Dave Chinner
    Reviewed-by: Dan Williams
    Tested-by: Dan Williams
    Cc: Dmitry Vyukov
    Cc: Alexander Potapenko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     
  • Patch series "Directed kmem charging", v8.

    The Linux kernel's memory cgroup allows limiting the memory usage of the
    jobs running on the system to provide isolation between the jobs. All
    the kernel memory allocated in the context of the job and marked with
    __GFP_ACCOUNT will also be included in the memory usage and be limited
    by the job's limit.

    The kernel memory can only be charged to the memcg of the process in
    whose context kernel memory was allocated. However there are cases
    where the allocated kernel memory should be charged to a memcg
    different from the current process's memcg. This patch series
    contains two such concrete use-cases i.e. fsnotify and buffer_head.

    The fsnotify event objects can consume a lot of system memory for large
    or unlimited queues if there is no listener or only a slow one. The events
    are allocated in the context of the event producer. However they should
    be charged to the event consumer. Similarly the buffer_head objects can
    be allocated in a memcg different from the memcg of the page for which
    buffer_head objects are being allocated.

    To solve this issue, this patch series introduces a mechanism to charge
    kernel memory to a given memcg. In case of fsnotify events, the memcg
    of the consumer can be used for charging and for buffer_head, the memcg
    of the page can be charged. For directed charging, the caller can use
    the scope API memalloc_[un]use_memcg() to specify the memcg to charge
    for all the __GFP_ACCOUNT allocations within the scope.
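
    The scope idea can be sketched in userspace as follows. This is only a
    model of the behavior described above, not the kernel implementation:
    struct demo_memcg and the demo_ functions are hypothetical, the state
    is global rather than per task, and scopes do not nest.

```c
#include <assert.h>
#include <stddef.h>

struct demo_memcg {
	const char *name;
	size_t charged; /* bytes charged so far */
};

/* Hypothetical state; the kernel tracks the equivalent per task. */
static struct demo_memcg *demo_current_memcg; /* cgroup of the running task */
static struct demo_memcg *demo_active_memcg;  /* override set by a scope, or NULL */

/* Open a scope: accounted allocations are charged to memcg from now on. */
static void demo_use_memcg(struct demo_memcg *memcg)
{
	assert(demo_active_memcg == NULL); /* this sketch does not nest scopes */
	demo_active_memcg = memcg;
}

/* Close the scope: charging falls back to the running task's cgroup. */
static void demo_unuse_memcg(void)
{
	demo_active_memcg = NULL;
}

/* Stand-in for an accounted allocation: charge the scoped cgroup if one
 * is set, otherwise the cgroup of the running task. */
static void demo_charge(size_t size)
{
	struct demo_memcg *target =
		demo_active_memcg ? demo_active_memcg : demo_current_memcg;

	if (target)
		target->charged += size;
}
```

    In the fsnotify case, the producer would open the scope with the
    listener's memcg around the event allocation, so the event is charged
    to the listener although it is allocated in the producer's context.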

    This patch (of 2):

    A lot of memory can be consumed by the events generated for huge or
    unlimited queues if there is no listener or only a slow one. This can cause
    system-level memory pressure or OOMs. So, it's better to account the
    fsnotify kmem caches to the memcg of the listener.

    However the listener can be in a different memcg than the memcg of the
    producer and these allocations happen in the context of the event
    producer. This patch introduces a remote memcg charging API which the
    producer can use to charge the allocations to the memcg of the listener.

    There are seven fsnotify kmem caches and among them allocations from
    dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and
    inotify_inode_mark_cachep happen in the context of a syscall from the
    listener. So, SLAB_ACCOUNT is enough for these caches.

    The objects from fsnotify_mark_connector_cachep are not accounted as
    they are small compared to the notification mark or events and it is
    unclear whom to account connector to since it is shared by all events
    attached to the inode.

    The allocations from the event caches happen in the context of the event
    producer. For such caches we will need to remote charge the allocations
    to the listener's memcg. Thus we save the memcg reference in the
    fsnotify_group structure of the listener.

    This patch also reorders the members of fsnotify_group so that the
    additional member fills existing holes, keeping the structure size
    unchanged, at least for 64-bit builds.

    [shakeelb@google.com: use GFP_KERNEL_ACCOUNT rather than open-coding it]
    Link: http://lkml.kernel.org/r/20180702215439.211597-1-shakeelb@google.com
    Link: http://lkml.kernel.org/r/20180627191250.209150-2-shakeelb@google.com
    Signed-off-by: Shakeel Butt
    Acked-by: Johannes Weiner
    Cc: Michal Hocko
    Cc: Jan Kara
    Cc: Amir Goldstein
    Cc: Greg Thelen
    Cc: Vladimir Davydov
    Cc: Roman Gushchin
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shakeel Butt
     
  • Commits 109980b894e9 ("bpf: don't select potentially stale ri->map
    from buggy xdp progs") and 7c3001313396 ("bpf: fix ri->map_owner
    pointer on bpf_prog_realloc") tried to ensure that buggy programs
    using the bpf_redirect_map() helper call do not leave stale maps behind.
    The idea was to add a map_owner cookie into the per-CPU struct redirect_info
    which was set to prog->aux by the prog making the helper call as a
    proof that the map is not stale since the prog is implicitly holding
    a reference to it. This owner cookie could later on get compared with
    the program calling into BPF whether they match and therefore the
    redirect could proceed with processing the map safely.

    In (obvious) hindsight, this approach breaks down when tail calls are
    involved since the original caller's prog->aux pointer does not have
    to match the one from one of the progs out of the tail call chain,
    and therefore the xdp buffer will be dropped instead of redirected.
    A way around that would be to fix the issue differently (which also
    allows to remove related work in fast path at the same time): once
    the life-time of a redirect map has come to its end we use its map
    free callback where we need to wait on synchronize_rcu() for current
    outstanding xdp buffers and remove such a map pointer from the
    redirect info if found to be present. At that time no program is
    using this map anymore so we simply invalidate the map pointers to
    NULL iff they previously pointed to that instance while making sure
    that the redirect path only reads out the map once.
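
    The invalidation step can be sketched with C11 atomics in userspace.
    This is an illustration under assumptions, not the kernel code: the
    per-CPU state is modelled as a plain array, the demo_ names are
    hypothetical, and the synchronize_rcu() wait that precedes the
    invalidation is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_NR_CPUS 4

struct demo_map { int id; };

/* Hypothetical per-CPU redirect state, one slot per CPU. */
static _Atomic(struct demo_map *) demo_ri_map[DEMO_NR_CPUS];

/* Called when a map is being freed: clear only the slots that still
 * point at this map, leaving pointers to other maps untouched. */
static void demo_clear_map_on_free(struct demo_map *map)
{
	struct demo_map *none = NULL;

	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		struct demo_map *expected = map;

		/* Swaps in NULL only if the slot currently holds map. */
		atomic_compare_exchange_strong(&demo_ri_map[cpu], &expected, none);
	}
}

/* Redirect path: fetch the pointer exactly once and clear the slot, so
 * later code works on a single consistent value. */
static struct demo_map *demo_take_map(int cpu)
{
	struct demo_map *none = NULL;

	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	return atomic_exchange(&demo_ri_map[cpu], none);
}
```

    The compare-and-exchange expresses the "iff they previously pointed to
    that instance" condition: a slot that has meanwhile been set to a
    different map is left alone.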

    Fixes: 97f91a7cf04f ("bpf: add bpf_redirect_map helper routine")
    Fixes: 109980b894e9 ("bpf: don't select potentially stale ri->map from buggy xdp progs")
    Reported-by: Sebastiano Miano
    Signed-off-by: Daniel Borkmann
    Acked-by: John Fastabend
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     
  • Pull modules updates from Jessica Yu:
    "Summary of modules changes for the 4.19 merge window:

    - Fix modules kallsyms for livepatch. Livepatch modules can have
    SHN_UNDEF symbols in their module symbol tables for later symbol
    resolution, but kallsyms shouldn't be returning these symbols

    - Some code cleanups and minor reshuffling in load_module() were done
    to log the module name when module signature verification fails"

    * tag 'modules-for-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
    kernel/module: Use kmemdup to replace kmalloc+memcpy
    ARM: module: fix modsign build error
    modsign: log module name in the event of an error
    module: replace VMLINUX_SYMBOL_STR() with __stringify() or string literal
    module: print sensible error code
    module: setup load info before module_sig_check()
    module: make it clear when we're handling the module copy in info->hdr
    module: exclude SHN_UNDEF symbols from kallsyms api

    Linus Torvalds
     
  • Pull fsnotify updates from Jan Kara:
    "fsnotify cleanups from Amir and a small inotify improvement"

    * tag 'fsnotify_for_v4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    inotify: Add flag IN_MASK_CREATE for inotify_add_watch()
    fanotify: factor out helpers to add/remove mark
    fsnotify: add helper to get mask from connector
    fsnotify: let connector point to an abstract object
    fsnotify: pass connp and object type to fsnotify_add_mark()
    fsnotify: use typedef fsnotify_connp_t for brevity

    Linus Torvalds
     

17 Aug, 2018

8 commits

  • The Linux kernel adopted the SPDX License format headers to ease license
    compliance management, and uses the C++ '//' style comments for the SPDX
    header tags. Some files in the tracing directory used the C style /* */
    comments for them. To be consistent across all files, replace the /* */
    C style SPDX tags with the C++ // SPDX tags.

    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • Add the SPDX License header to ease license compliance management.

    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • Add the SPDX License header to ease license compliance management.

    Acked-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • The current code in sock_map_ctx_update_elem() allows for BPF_EXIST
    and BPF_NOEXIST map update flags. While on array-like maps this approach
    is rather uncommon, e.g. bpf_fd_array_map_update_elem() and others
    enforce map update flags to be BPF_ANY such that xchg() can be used
    directly, the current implementation in sock map does not guarantee
    that such operation with BPF_EXIST / BPF_NOEXIST is atomic.

    The initial test does a READ_ONCE(stab->sock_map[i]) to fetch the
    socket from the slot which is then tested for NULL / non-NULL. However
    later after __sock_map_ctx_update_elem(), the actual update is done
    through osock = xchg(&stab->sock_map[i], sock). Problem is that in
    the meantime a different CPU could have updated / deleted a socket
    on that specific slot and thus flag constraints won't hold anymore.

    I've been thinking whether best would be to just break UAPI and do
    an enforcement of BPF_ANY to check if someone actually complains,
    however trouble is that already in BPF kselftest we use BPF_NOEXIST
    for the map update, and therefore it might have been copied into
    applications already. The fix to keep the current behavior intact
    would be to add a map lock similar to the sock hash bucket lock only
    for covering the whole map.
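
    A userspace sketch of the locking approach, assuming a pthread mutex in
    place of the kernel lock. The demo_ names and flag values are
    hypothetical; only the idea of performing the flag check and the slot
    update under one map-wide lock is taken from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define DEMO_SLOTS 8

enum demo_flags { DEMO_ANY, DEMO_NOEXIST, DEMO_EXIST };

static void *demo_slots[DEMO_SLOTS];
static pthread_mutex_t demo_map_lock = PTHREAD_MUTEX_INITIALIZER;

/* Check the flag constraint and swap the slot under one lock, so no other
 * thread can change the slot between the test and the update.
 * On success, *old receives the previous occupant (possibly NULL). */
static int demo_update_elem(unsigned int i, void *sock,
			    enum demo_flags flags, void **old)
{
	int err = 0;

	assert(old != NULL);
	if (i >= DEMO_SLOTS || !sock)
		return -EINVAL;

	pthread_mutex_lock(&demo_map_lock);
	if (flags == DEMO_NOEXIST && demo_slots[i]) {
		err = -EEXIST;
	} else if (flags == DEMO_EXIST && !demo_slots[i]) {
		err = -ENOENT;
	} else {
		*old = demo_slots[i];
		demo_slots[i] = sock;
	}
	pthread_mutex_unlock(&demo_map_lock);
	return err;
}
```

    Without the lock, a test followed by a separate exchange leaves a
    window in which another CPU can fill or empty the slot, which is the
    race described above.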

    Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
    Signed-off-by: Daniel Borkmann
    Acked-by: John Fastabend
    Acked-by: Song Liu
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     
  • The smap_start_sock() and smap_stop_sock() are each protected under
    the sock->sk_callback_lock from their call-sites except in the case
    of sock_map_delete_elem() where we drop the old socket from the map
    slot. This is racy because the same sock could be part of multiple
    sock maps, so we run smap_stop_sock() in parallel, and given at that
    point psock->strp_enabled might be true on both CPUs, we might for
    example wrongly restore the sk->sk_data_ready / sk->sk_write_space.
    Therefore, hold the sock->sk_callback_lock as well on delete. Looks
    like 2f857d04601a ("bpf: sockmap, remove STRPARSER map_flags and add
    multi-map support") had this right, but later on e9db4ef6bf4c ("bpf:
    sockhash fix omitted bucket lock in sock_close") removed it again
    from delete leaving this smap_stop_sock() instance unprotected.

    Fixes: e9db4ef6bf4c ("bpf: sockhash fix omitted bucket lock in sock_close")
    Signed-off-by: Daniel Borkmann
    Acked-by: John Fastabend
    Acked-by: Song Liu
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     
  • While working on sockmap I noticed that we do not always kfree the
    struct smap_psock_map_entry list elements which track psocks attached
    to maps. In the case of sock_hash_ctx_update_elem(), these map entries
    are allocated outside of __sock_map_ctx_update_elem() with their
    linkage to the socket hash table filled. In the case of sock array,
    the map entries are allocated inside of __sock_map_ctx_update_elem()
    and added with their linkage to the psock->maps. Both additions are
    under psock->maps_lock each.

    Now, we drop these elements from their psock->maps list in a few
    occasions: i) in sock array via smap_list_map_remove() when an entry
    is either deleted from the map from user space, or updated via
    user space or BPF program where we drop the old socket at that map
    slot, or the sock array is freed via sock_map_free() and drops all
    its elements; ii) for sock hash via smap_list_hash_remove() in exactly
    the same occasions as just described for sock array; iii) in the
    bpf_tcp_close() where we remove the elements from the list via
    psock_map_pop() and iterate over them dropping themselves from either
    sock array or sock hash; and last but not least iv) once again in
    smap_gc_work() which is a callback for deferring the work once the
    psock refcount hit zero and thus the socket is being destroyed.

    Problem is that the only case where we kfree() the list entry is
    in case iv), which at that point should have an empty list in
    normal cases. So in cases from i) to iii) we unlink the elements
    without freeing where they go out of reach from us. Hence fix is
    to properly kfree() them as well to stop the leakage. Given these
    are all handled under psock->maps_lock there is no need for deferred
    RCU freeing.
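
    The pattern of the fix, unlinking and freeing in the same step, can be
    sketched on a plain singly linked list. struct demo_entry and the
    demo_ functions are hypothetical userspace stand-ins; the caller is
    assumed to hold whatever lock protects the list.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_entry {
	struct demo_entry *next;
	int map_id; /* which map this entry links the socket to */
};

/* Push a new entry for map_id; returns 0 on success, -1 on allocation
 * failure. */
static int demo_list_add(struct demo_entry **head, int map_id)
{
	struct demo_entry *e;

	assert(head != NULL);
	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	e->map_id = map_id;
	e->next = *head;
	*head = e;
	return 0;
}

/* Unlink every entry for map_id and free it right away; returns the
 * number of entries freed. */
static int demo_list_remove(struct demo_entry **head, int map_id)
{
	int freed = 0;

	assert(head != NULL);
	while (*head) {
		struct demo_entry *e = *head;

		if (e->map_id == map_id) {
			*head = e->next; /* unlink ... */
			free(e);         /* ... and free, instead of leaking */
			freed++;
		} else {
			head = &e->next;
		}
	}
	return freed;
}
```

    Dropping the free() call from demo_list_remove() reproduces the kind of
    leak reported by kmemleak: the entry is no longer reachable from the
    list, yet it is never released.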

    I later also ran with kmemleak detector and it confirmed the finding
    as well where in the state before the fix the object goes unreferenced
    while after the patch no kmemleak report related to BPF showed up.

    [...]
    unreferenced object 0xffff880378eadae0 (size 64):
    comm "test_sockmap", pid 2225, jiffies 4294720701 (age 43.504s)
    hex dump (first 32 bytes):
    00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de ................
    50 4d 75 5d 03 88 ff ff 00 00 00 00 00 00 00 00 PMu]............
    backtrace:
    [] sock_map_ctx_update_elem.isra.21+0xd8/0x210
    [] bpf_sock_map_update+0x29/0x60
    [] ___bpf_prog_run+0x1e1f/0x4960
    [] 0xffffffffffffffff
    unreferenced object 0xffff880378ead240 (size 64):
    comm "test_sockmap", pid 2225, jiffies 4294720701 (age 43.504s)
    hex dump (first 32 bytes):
    00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de ................
    00 44 75 5d 03 88 ff ff 00 00 00 00 00 00 00 00 .Du]............
    backtrace:
    [] sock_map_ctx_update_elem.isra.21+0xd8/0x210
    [] sock_map_update_elem+0x125/0x240
    [] map_update_elem+0x4eb/0x7b0
    [] __x64_sys_bpf+0x1f9/0x360
    [] do_syscall_64+0x9a/0x300
    [] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [] 0xffffffffffffffff
    [...]

    Fixes: e9db4ef6bf4c ("bpf: sockhash fix omitted bucket lock in sock_close")
    Fixes: 54fedb42c653 ("bpf: sockmap, fix smap_list_map_remove when psock is in many maps")
    Fixes: 2f857d04601a ("bpf: sockmap, remove STRPARSER map_flags and add multi-map support")
    Signed-off-by: Daniel Borkmann
    Acked-by: John Fastabend
    Acked-by: Song Liu
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     
  • Commit 394e40a29788 ("bpf: extend bpf_prog_array to store pointers
    to the cgroup storage") refactored the bpf_prog_array_copy_core()
    to accommodate new structure bpf_prog_array_item which contains
    bpf_prog array itself.

    In the old code, we had
      perf_event_query_prog_array():
        mutex_lock(...)
        bpf_prog_array_copy_call():
          prog = rcu_dereference_check(array, 1)->progs
          bpf_prog_array_copy_core(prog, ...)
        mutex_unlock(...)

    With the above commit, we had
      perf_event_query_prog_array():
        mutex_lock(...)
        bpf_prog_array_copy_call():
          bpf_prog_array_copy_core(array, ...):
            item = rcu_dereference(array)->items;
            ...
        mutex_unlock(...)

    The new code will trigger a lockdep rcu checking warning.
    The fix is to change rcu_dereference() to rcu_dereference_check()
    to prevent such a warning.

    Reported-by: syzbot+6e72317008eef84a216b@syzkaller.appspotmail.com
    Fixes: 394e40a29788 ("bpf: extend bpf_prog_array to store pointers to the cgroup storage")
    Cc: Roman Gushchin
    Signed-off-by: Yonghong Song
    Acked-by: Alexei Starovoitov
    Acked-by: Roman Gushchin
    Signed-off-by: Daniel Borkmann

    Yonghong Song
     
  • Add the SPDX License header to ease license compliance management.

    Acked-by: Jens Axboe
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

16 Aug, 2018

8 commits

  • The gcc option -mnop-mcount generates the calls to the profiling functions
    as NOPs, which avoids having to patch the mcount jumps with NOP
    instructions initially.

    -mnop-mcount will be activated if the platform selects HAVE_NOP_MCOUNT
    and gcc actually supports it. In addition, CC_USING_NOP_MCOUNT is defined
    and can be used by architectures to adapt ftrace patching behavior.

    Link: http://lkml.kernel.org/r/patch-3.thread-aa7b8d.git-e02ed2dc082b.your-ad-here.call-01533557518-ext-9465@work.hours

    Signed-off-by: Vasily Gorbik
    Signed-off-by: Steven Rostedt (VMware)

    Vasily Gorbik
     
  • Pull drm updates from Dave Airlie:
    "This is the main drm pull request for 4.19.

    Rob has some new hardware support for new qualcomm hw that I'll send
    along separately. This has the display part of it, the remaining pull
    is for the acceleration engine.

    This also contains a wound-wait/wait-die mutex rework, Peter has acked
    it for merging via my tree.

    Otherwise mostly the usual level of activity. Summary:

    core:
    - Wound-wait/wait-die mutex rework
    - Add writeback connector type
    - Add "content type" property for HDMI
    - Move GEM bo to drm_framebuffer
    - Initial gpu scheduler documentation
    - GPU scheduler fixes for dying processes
    - Console deferred fbcon takeover support
    - Displayport support for CEC tunneling over AUX

    panel:
    - otm8009a panel driver fixes
    - Innolux TV123WAM and G070Y2-L01 panel driver
    - Ilitek ILI9881c panel driver
    - Rocktech RK070ER9427 LCD
    - EDT ETM0700G0EDH6 and EDT ETM0700G0BDH6
    - DLC DLC0700YZG-1
    - BOE HV070WSA-100
    - newhaven, nhd-4.3-480272ef-atxl LCD
    - DataImage SCF0700C48GGU18
    - Sharp LQ035Q7DB03
    - p079zca: Refactor to support multiple panels

    tinydrm:
    - ILI9341 display panel

    New driver:
    - vkms - virtual kms driver for testing.

    i915:
    - Icelake:
      Display enablement
      DSI support
      IRQ support
      Powerwell support
    - GPU reset fixes and improvements
    - Full ppgtt support refactoring
    - PSR fixes and improvements
    - Execlist improvements
    - GuC related fixes

    amdgpu:
    - Initial amdgpu documentation
    - JPEG engine support on VCN
    - CIK uses powerplay by default
    - Move to using core PCIE functionality for gens/lanes
    - DC/Powerplay interface rework
    - Stutter mode support for RV
    - Vega12 Powerplay updates
    - GFXOFF fixes
    - GPUVM fault debugging
    - Vega12 GFXOFF
    - DC improvements
    - DC i2c/aux changes
    - UVD 7.2 fixes
    - Powerplay fixes for Polaris12, CZ/ST
    - command submission bo_list fixes

    amdkfd:
    - Raven support
    - Power management fixes

    udl:
    - Cleanups and fixes

    nouveau:
    - misc fixes and cleanups.

    msm:
    - DPU1 support display controller in sdm845
    - GPU coredump support.

    vmwgfx:
    - Atomic modesetting validation fixes
    - Support for multisample surfaces

    armada:
    - Atomic modesetting support completed.

    exynos:
    - IPPv2 fixes
    - Move g2d to component framework
    - Suspend/resume support cleanups
    - Driver cleanups

    imx:
    - CSI configuration improvements
    - Driver cleanups
    - Use atomic suspend/resume helpers
    - ipu-v3 V4L2 XRGB32/XBGR32 support

    pl111:
    - Add Nomadik LCDC variant

    v3d:
    - GPU scheduler jobs management

    sun4i:
    - R40 display engine support
    - TCON TOP driver

    mediatek:
    - MT2712 SoC support

    rockchip:
    - vop fixes

    omapdrm:
    - Workaround for DRA7 errata i932
    - Fix mm_list locking

    mali-dp:
    - Writeback implementation
    - PM improvements
    - Internal error reporting debugfs

    tilcdc:
    - Single fix for deferred probing

    hdlcd:
    - Teardown fixes

    tda998x:
    - Converted to a bridge driver.

    etnaviv:
    - Misc fixes"

    * tag 'drm-next-2018-08-15' of git://anongit.freedesktop.org/drm/drm: (1506 commits)
    drm/amdgpu/sriov: give 8s for recover vram under RUNTIME
    drm/scheduler: fix param documentation
    drm/i2c: tda998x: correct PLL divider calculation
    drm/i2c: tda998x: get rid of private fill_modes function
    drm/i2c: tda998x: move mode_valid() to bridge
    drm/i2c: tda998x: register bridge outside of component helper
    drm/i2c: tda998x: cleanup from previous changes
    drm/i2c: tda998x: allocate tda998x_priv inside tda998x_create()
    drm/i2c: tda998x: convert to bridge driver
    drm/scheduler: fix timeout worker setup for out of order job completions
    drm/amd/display: display connected to dp-1 does not light up
    drm/amd/display: update clk for various HDMI color depths
    drm/amd/display: program display clock on cache match
    drm/amd/display: Add NULL check for enabling dp ss
    drm/amd/display: add vbios table check for enabling dp ss
    drm/amd/display: Don't share clk source between DP and HDMI
    drm/amd/display: Fix DP HBR2 Eye Diagram Pattern on Carrizo
    drm/amd/display: Use calculated disp_clk_khz value for dce110
    drm/amd/display: Implement custom degamma lut on dcn
    drm/amd/display: Destroy aux_engines only once
    ...

    Linus Torvalds
     
  • Pull networking updates from David Miller:
    "Highlights:

    - Gustavo A. R. Silva keeps working on the implicit switch fallthru
    changes.

    - Support 802.11ax High-Efficiency wireless in cfg80211 et al., from
    Luca Coelho.

    - Re-enable ASPM in r8169, from Kai-Heng Feng.

    - Add virtual XFRM interfaces, which avoids all of the limitations of
    existing IPSEC tunnels. From Steffen Klassert.

    - Convert GRO over to use a hash table, so that when we have many
    flows active we don't traverse a long list during accumulation.

    - Many new self tests for routing, TC, tunnels, etc. Too many
    contributors to mention them all, but I'm really happy to keep
    seeing this stuff.

    - Hardware timestamping support for dpaa_eth/fsl-fman from Yangbo Lu.

    - Lots of cleanups and fixes in L2TP code from Guillaume Nault.

    - Add IPSEC offload support to netdevsim, from Shannon Nelson.

    - Add support for slotting with non-uniform distribution to netem
    packet scheduler, from Yousuk Seung.

    - Add UDP GSO support to mlx5e, from Boris Pismenny.

    - Support offloading of Team LAG in NFP, from John Hurley.

    - Allow configuring TX queue selection based upon RX queue, from
    Amritha Nambiar.

    - Support ethtool ring size configuration in aquantia, from Anton
    Mikaev.

    - Support DSCP and flowlabel per-transport in SCTP, from Xin Long.

    - Support list based batching and stack traversal of SKBs, this is
    very exciting work. From Edward Cree.

    - Busyloop optimizations in vhost_net, from Toshiaki Makita.

    - Introduce the ETF qdisc, which allows time based transmissions. IGB
    can offload this in hardware. From Vinicius Costa Gomes.

    - Add parameter support to devlink, from Moshe Shemesh.

    - Several multiplication and division optimizations for BPF JIT in
    nfp driver, from Jiong Wang.

    - Lots of preparatory work to make more of the packet scheduler layer
    lockless, when possible, from Vlad Buslov.

    - Add ACK filter and NAT awareness to sch_cake packet scheduler, from
    Toke Høiland-Jørgensen.

    - Support regions and region snapshots in devlink, from Alex Vesker.

    - Allow attaching XDP programs to both HW and SW at the same time on
    a given device, with initial support in nfp. From Jakub Kicinski.

    - Add TLS RX offload and support in mlx5, from Ilya Lesokhin.

    - Use PHYLIB in r8169 driver, from Heiner Kallweit.

    - All sorts of changes to support Spectrum 2 in mlxsw driver, from
    Ido Schimmel.

    - PTP support in mv88e6xxx DSA driver, from Andrew Lunn.

    - Make TCP_USER_TIMEOUT socket option more accurate, from Jon
    Maxwell.

    - Support for templates in packet scheduler classifier, from Jiri
    Pirko.

    - IPV6 support in RDS, from Ka-Cheong Poon.

    - Native tproxy support in nf_tables, from Máté Eckl.

    - Maintain IP fragment queue in an rbtree, but optimize properly for
    in-order frags. From Peter Oskolkov.

    - Improve handling of ACKs on hole repairs, from Yuchung Cheng"
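
    The GRO hash table item above can be pictured with a small model. This
    is only a sketch of the general idea (bucket pending flows by a hash of
    the flow key so a lookup scans one short bucket instead of one long
    list). The names FlowKey and GroTable and the bucket count are
    assumptions made for illustration, not the kernel's data structures.

```python
# Illustrative model only: FlowKey, GroTable and NUM_BUCKETS are
# hypothetical names, not the kernel's GRO implementation.
from collections import namedtuple

FlowKey = namedtuple("FlowKey", "src dst sport dport proto")

NUM_BUCKETS = 8  # assumed small power of two


class GroTable:
    """Pending per-flow aggregates, spread over hash buckets."""

    def __init__(self, num_buckets=NUM_BUCKETS):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # Mask the hash down to a bucket index (bucket count is a power of two).
        return self.buckets[hash(key) & (len(self.buckets) - 1)]

    def receive(self, key, payload_len):
        """Merge a segment into its flow, scanning only one bucket."""
        bucket = self._bucket(key)
        for entry in bucket:
            if entry["key"] == key:
                entry["bytes"] += payload_len
                entry["segments"] += 1
                return entry
        entry = {"key": key, "bytes": payload_len, "segments": 1}
        bucket.append(entry)
        return entry
```

    With many active flows, each lookup touches roughly 1/NUM_BUCKETS of the
    pending entries, which is the effect the pull message describes.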

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1996 commits)
    bpf: test: fix spelling mistake "REUSEEPORT" -> "REUSEPORT"
    hv/netvsc: Fix NULL dereference at single queue mode fallback
    net: filter: mark expected switch fall-through
    xen-netfront: fix warn message as irq device name has '/'
    cxgb4: Add new T5 PCI device ids 0x50af and 0x50b0
    net: dsa: mv88e6xxx: missing unlock on error path
    rds: fix building with IPV6=m
    inet/connection_sock: prefer _THIS_IP_ to current_text_addr
    net: dsa: mv88e6xxx: bitwise vs logical bug
    net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd()
    ieee802154: hwsim: using right kind of iteration
    net: hns3: Add vlan filter setting by ethtool command -K
    net: hns3: Set tx ring' tc info when netdev is up
    net: hns3: Remove tx ring BD len register in hns3_enet
    net: hns3: Fix desc num set to default when setting channel
    net: hns3: Fix for phy link issue when using marvell phy driver
    net: hns3: Fix for information of phydev lost problem when down/up
    net: hns3: Fix for command format parsing error in hclge_is_all_function_id_zero
    net: hns3: Add support for serdes loopback selftest
    bnxt_en: take coredump_record structure off stack
    ...

    Linus Torvalds
     
  • Pull Kconfig consolidation from Masahiro Yamada:
    "Consolidation of Kconfig files by Christoph Hellwig.

    Move the source statements of arch-independent Kconfig files instead
    of duplicating the includes in every arch/$(SRCARCH)/Kconfig"

    * tag 'kconfig-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
    kconfig: add a Memory Management options" menu
    kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt
    kconfig: use a menu in arch/Kconfig to reduce clutter
    kconfig: include kernel/Kconfig.preempt from init/Kconfig
    Kconfig: consolidate the "Kernel hacking" menu
    kconfig: include common Kconfig files from top-level Kconfig
    kconfig: remove duplicate SWAP symbol defintions
    um: create a proper drivers Kconfig
    um: cleanup Kconfig files
    um: stop abusing KBUILD_KCONFIG

    Linus Torvalds
     
  • Pull Kbuild updates from Masahiro Yamada:

    - verify depmod is installed before modules_install

    - support build salt in case build ids must be unique between builds

    - allow users to specify additional host compiler flags via HOST*FLAGS,
    and rename internal variables to KBUILD_HOST*FLAGS

    - update buildtar script to drop vax support, add arm64 support

    - update builddeb script for better debarch support

    - document the pitfall of if_changed usage

    - fix parallel build of UML with O= option

    - make 'samples' target depend on headers_install to fix build errors

    - remove deprecated host-progs variable

    - add a new coccinelle script for refcount_t vs atomic_t check

    - improve double-test coccinelle script

    - misc cleanups and fixes

    * tag 'kbuild-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (41 commits)
    coccicheck: return proper error code on fail
    Coccinelle: doubletest: reduce side effect false positives
    kbuild: remove deprecated host-progs variable
    kbuild: make samples really depend on headers_install
    um: clean up archheaders recipe
    kbuild: add %asm-generic to no-dot-config-targets
    um: fix parallel building with O= option
    scripts: Add Python 3 support to tracing/draw_functrace.py
    builddeb: Add automatic support for sh{3,4}{,eb} architectures
    builddeb: Add automatic support for riscv* architectures
    builddeb: Add automatic support for m68k architecture
    builddeb: Add automatic support for or1k architecture
    builddeb: Add automatic support for sparc64 architecture
    builddeb: Add automatic support for mips{,64}r6{,el} architectures
    builddeb: Add automatic support for mips64el architecture
    builddeb: Add automatic support for ppc64 and powerpcspe architectures
    builddeb: Introduce functions to simplify kconfig tests in set_debarch
    builddeb: Drop check for 32-bit s390
    builddeb: Change architecture detection fallback to use dpkg-architecture
    builddeb: Skip architecture detection when KBUILD_DEBARCH is set
    ...

    Linus Torvalds
     
  • Pull printk updates from Petr Mladek:

    - Different vendors have different expectations about console
    quietness. Make it configurable to reduce bike-shedding about the
    upstream default

    - Decide about the message visibility when the message is stored. It
    avoids races caused by a delayed console handling

    - Always store printk() messages into the per-CPU buffers again in NMI.
    The only exception is when flushing trace log in panic(). There the
    risk of losing messages is worth an eventual reordering

    - Handle invalid %pO printf modifiers correctly

    - Better handle %p printf modifier tests before crng is initialized

    - Some clean up
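
    The second item above (deciding visibility when the message is stored)
    can be sketched as follows. This is a simplified model, not the printk
    code: LogBuffer and Record are hypothetical names, and the loglevel
    values 7 (default) and 4 (quiet) are assumed conventional defaults.

```python
# Simplified model: LogBuffer and Record are hypothetical, and the
# numeric loglevels are assumed defaults, not taken from this log.
from dataclasses import dataclass


@dataclass
class Record:
    level: int
    text: str
    suppressed: bool  # decided once, at store time


class LogBuffer:
    def __init__(self, console_loglevel=7):
        self.console_loglevel = console_loglevel
        self.records = []

    def store(self, level, text):
        # Lower numbers are more severe; a message is shown on the
        # console only if its level is below the console loglevel.
        suppressed = level >= self.console_loglevel
        self.records.append(Record(level, text, suppressed))

    def flush_to_console(self):
        # A later loglevel change does not alter earlier decisions.
        return [r.text for r in self.records if not r.suppressed]
```

    Because the flag is recorded with the message, a console that is
    handled late sees the same decision regardless of loglevel changes made
    in between.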

    * tag 'printk-for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
    lib/vsprintf: Do not handle %pO[^F] as %px
    printk: Fix warning about unused suppress_message_printing
    printk/nmi: Prevent deadlock when accessing the main log buffer in NMI
    printk: Create helper function to queue deferred console handling
    printk: Split the code for storing a message into the log buffer
    printk: Clean up syslog_print_all()
    printk: Remove unnecessary kmalloc() from syslog during clear
    printk: Make CONSOLE_LOGLEVEL_QUIET configurable
    printk: make sure to print log on console.
    lib/test_printf.c: accept "ptrval" as valid result for plain 'p' tests

    Linus Torvalds
     
  • Pull audit patches from Paul Moore:
    "Twelve audit patches for v4.19 and they run the full gamut from fixes
    to features.

    Notable changes include the ability to use the "exe" audit filter
    field in a wider variety of filter types, a fix for our comparison of
    GID/EGID in audit filter rules, better association of related audit
    records (connecting related audit records together into one audit
    event), and a fix for a potential use-after-free in audit_add_watch().

    All the patches pass the audit-testsuite and merge cleanly on your
    current master branch"

    * tag 'audit-pr-20180814' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
    audit: fix use-after-free in audit_add_watch
    audit: use ktime_get_coarse_real_ts64() for timestamps
    audit: use ktime_get_coarse_ts64() for time access
    audit: simplify audit_enabled check in audit_watch_log_rule_change()
    audit: check audit_enabled in audit_tree_log_remove_rule()
    cred: conditionally declare groups-related functions
    audit: eliminate audit_enabled magic number comparison
    audit: rename FILTER_TYPE to FILTER_EXCLUDE
    audit: Fix extended comparison of GID/EGID
    audit: tie ANOM_ABEND records to syscall
    audit: tie SECCOMP records to syscall
    audit: allow other filter list types for AUDIT_EXE

    Linus Torvalds
     
  • Pull security subsystem updates from James Morris:

    - kstrdup() return value fix from Eric Biggers

    - Add new security_load_data hook to differentiate security checking of
    kernel-loaded binaries in the case of there being no associated file
    descriptor, from Mimi Zohar.

    - Add ability to IMA to specify a policy at build-time, rather than
    just via command line params or by loading a custom policy, from
    Mimi.

    - Allow IMA and LSMs to prevent sysfs firmware load fallback (e.g. if
    using signed firmware), from Mimi.

    - Allow IMA to deny loading of kexec kernel images, as they cannot be
    measured by IMA, from Mimi.

    * 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
    security: check for kstrdup() failure in lsm_append()
    security: export security_kernel_load_data function
    ima: based on policy warn about loading firmware (pre-allocated buffer)
    module: replace the existing LSM hook in init_module
    ima: add build time policy
    ima: based on policy require signed firmware (sysfs fallback)
    firmware: add call to LSM hook before firmware sysfs fallback
    ima: based on policy require signed kexec kernel images
    kexec: add call to LSM hook in original kexec_load syscall
    security: define new LSM hook named security_kernel_load_data
    MAINTAINERS: remove the outdated "LINUX SECURITY MODULE (LSM) FRAMEWORK" entry

    Linus Torvalds
     

15 Aug, 2018

7 commits

  • Pull arm64 updates from Will Deacon:
    "A bunch of good stuff in here. Worth noting is that we've pulled in
    the x86/mm branch from -tip so that we can make use of the core
    ioremap changes which allow us to put down huge mappings in the
    vmalloc area without screwing up the TLB. Much of the positive
    diffstat is because of the rseq selftest for arm64.

    Summary:

    - Wire up support for qspinlock, replacing our trusty ticket lock
    code

    - Add an IPI to flush_icache_range() to ensure that stale
    instructions fetched into the pipeline are discarded along with the
    I-cache lines

    - Support for the GCC "stackleak" plugin

    - Support for restartable sequences, plus an arm64 port for the
    selftest

    - Kexec/kdump support on systems booting with ACPI

    - Rewrite of our syscall entry code in C, which allows us to zero the
    GPRs on entry from userspace

    - Support for chained PMU counters, allowing 64-bit event counters to
    be constructed on current CPUs

    - Ensure scheduler topology information is kept up-to-date with CPU
    hotplug events

    - Re-enable support for huge vmalloc/IO mappings now that the core
    code has the correct hooks to use break-before-make sequences

    - Miscellaneous, non-critical fixes and cleanups"
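
    The chained PMU counter item above can be illustrated with a toy model:
    one 32-bit counter counts the event, and a second counter counts the
    overflows of the first, so the pair reads as a single 64-bit value.
    ChainedCounter is a hypothetical name and this is a sketch of the idea
    only, not the arm64 perf driver.

```python
# Toy model of two chained 32-bit counters; ChainedCounter is a
# hypothetical name used for illustration.
MASK32 = 0xFFFFFFFF


class ChainedCounter:
    def __init__(self):
        self.low = 0   # counts the event itself
        self.high = 0  # counts overflows of the low counter

    def add(self, events):
        total = self.low + events
        # Every wrap of the low counter bumps the high counter.
        self.high = (self.high + (total >> 32)) & MASK32
        self.low = total & MASK32

    def read(self):
        # Combine both halves into one 64-bit value.
        return (self.high << 32) | self.low
```

    On real hardware the two halves are read separately, so a reader
    typically has to guard against the low half wrapping between the two
    reads; the model above ignores that.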

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (90 commits)
    arm64: alternative: Use true and false for boolean values
    arm64: kexec: Add comment to explain use of __flush_icache_range()
    arm64: sdei: Mark sdei stack helper functions as static
    arm64, kaslr: export offset in VMCOREINFO ELF notes
    arm64: perf: Add cap_user_time aarch64
    efi/libstub: Only disable stackleak plugin for arm64
    arm64: drop unused kernel_neon_begin_partial() macro
    arm64: kexec: machine_kexec should call __flush_icache_range
    arm64: svc: Ensure hardirq tracing is updated before return
    arm64: mm: Export __sync_icache_dcache() for xen-privcmd
    drivers/perf: arm-ccn: Use devm_ioremap_resource() to map memory
    arm64: Add support for STACKLEAK gcc plugin
    arm64: Add stack information to on_accessible_stack
    drivers/perf: hisi: update the sccl_id/ccl_id when MT is supported
    arm64: fix ACPI dependencies
    rseq/selftests: Add support for arm64
    arm64: acpi: fix alignment fault in accessing ACPI
    efi/arm: map UEFI memory map even w/o runtime services enabled
    efi/arm: preserve early mapping of UEFI memory map longer for BGRT
    drivers: acpi: add dependency of EFI for arm64
    ...

    Linus Torvalds
     
  • Commit 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once")
    breaks non-SMP builds.

    [ I suspect the 'bool' fields should just be made to be bitfields and be
    exposed regardless of configuration, but that's a separate cleanup
    that I'll leave to the owners of this file for later. - Linus ]

    Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once")
    Cc: Dave Hansen
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Signed-off-by: Abel Vesa
    Signed-off-by: Linus Torvalds

    Abel Vesa
     
  • Pull documentation update from Jonathan Corbet:
    "This was a moderately busy cycle for docs, with the usual collection
    of small fixes and updates.

    We also have new ktime_get_*() docs from Arnd, some kernel-doc fixes,
    a new set of Italian translations (I don't know if it's worth it, but
    it doesn't hurt - let's hope for the best), and some extensive early
    memory-management documentation improvements from Mike Rapoport"

    * tag 'docs-4.19' of git://git.lwn.net/linux: (52 commits)
    Documentation: corrections to console/console.txt
    Documentation: add ioctl number entry for v4l2-subdev.h
    Remove gendered language from management style documentation
    scripts/kernel-doc: Escape all literal braces in regexes
    docs/mm: add description of boot time memory management
    docs/mm: memblock: add overview documentation
    docs/mm: memblock: add kernel-doc description for memblock types
    docs/mm: memblock: add kernel-doc comments for memblock_add[_node]
    docs/mm: memblock: update kernel-doc comments
    mm/memblock: add a name for memblock flags enumeration
    docs/mm: bootmem: add overview documentation
    docs/mm: bootmem: add kernel-doc description of 'struct bootmem_data'
    docs/mm: bootmem: fix kernel-doc warnings
    docs/mm: nobootmem: fixup kernel-doc comments
    mm/bootmem: drop duplicated kernel-doc comments
    Documentation: vm.txt: Adding 'nr_hugepages_mempolicy' parameter description.
    doc:it_IT: translation for kernel-hacking
    docs: Fix the reference labels in Locking.rst
    doc: tracing: Fix a typo of trace_stat
    mm: Introduce new type vm_fault_t
    ...

    Linus Torvalds
     
  • Pull power management updates from Rafael Wysocki:
    "These add a new framework for CPU idle time injection, to be used by
    all of the idle injection code in the kernel in the future, fix some
    issues and add a number of relatively small extensions in multiple
    places.

    Specifics:

    - Add a new framework for CPU idle time injection (Daniel Lezcano).

    - Add AVS support to the armada-37xx cpufreq driver (Gregory
    CLEMENT).

    - Add support for current CPU frequency reporting to the ACPI CPPC
    cpufreq driver (George Cherian).

    - Rework the cooling device registration in the imx6q/thermal driver
    (Bastian Stender).

    - Make the pcc-cpufreq driver refuse to work with dynamic scaling
    governors on systems with many CPUs to avoid scalability issues
    with it (Rafael Wysocki).

    - Fix the intel_pstate driver to report different maximum CPU
    frequencies on systems where they really are different and to
    ignore the turbo active ratio if hardware-managed P-states (HWP)
    are in use; make it use the match_string() helper (Xie Yisheng,
    Srinivas Pandruvada).

    - Fix a minor deferred probe issue in the qcom-kryo cpufreq driver
    (Niklas Cassel).

    - Add a tracepoint for the tracking of frequency limits changes (from
    Android) to the cpufreq core (Ruchi Kandoi).

    - Fix a circular lock dependency between CPU hotplug and sysfs
    locking in the cpufreq core reported by lockdep (Waiman Long).

    - Avoid excessive error reports on driver registration failures in
    the ARM cpuidle driver (Sudeep Holla).

    - Add a new device links flag to the driver core to make links go
    away automatically on supplier driver removal (Vivek Gautam).

    - Eliminate potential race condition between system-wide power
    management transitions and system shutdown (Pingfan Liu).

    - Add a quirk to save NVS memory on system suspend for the ASUS 1025C
    laptop (Willy Tarreau).

    - Make more systems use suspend-to-idle (instead of ACPI S3) by
    default (Tristian Celestin).

    - Get rid of stack VLA usage in the low-level hibernation code on
    64-bit x86 (Kees Cook).

    - Fix error handling in the hibernation core and mark an expected
    fall-through switch in it (Chengguang Xu, Gustavo Silva).

    - Extend the generic power domains (genpd) framework to support
    attaching a device to a power domain by name (Ulf Hansson).

    - Fix device reference counting and user limits initialization in the
    devfreq core (Arvind Yadav, Matthias Kaehlcke).

    - Fix a few issues in the rk3399_dmc devfreq driver and improve its
    documentation (Enric Balletbo i Serra, Lin Huang, Nick Milner).

    - Drop a redundant error message from the exynos-ppmu devfreq driver
    (Markus Elfring)"

    * tag 'pm-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (35 commits)
    PM / reboot: Eliminate race between reboot and suspend
    PM / hibernate: Mark expected switch fall-through
    cpufreq: intel_pstate: Ignore turbo active ratio in HWP
    cpufreq: Fix a circular lock dependency problem
    cpu/hotplug: Add a cpus_read_trylock() function
    x86/power/hibernate_64: Remove VLA usage
    cpufreq: trace frequency limits change
    cpufreq: intel_pstate: Show different max frequency with turbo 3 and HWP
    cpufreq: pcc-cpufreq: Disable dynamic scaling on many-CPU systems
    cpufreq: qcom-kryo: Silently error out on EPROBE_DEFER
    cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC
    cpufreq: armada-37xx: Add AVS support
    dt-bindings: marvell: Add documentation for the Armada 3700 AVS binding
    PM / devfreq: rk3399_dmc: Fix duplicated opp table on reload.
    PM / devfreq: Init user limits from OPP limits, not viceversa
    PM / devfreq: rk3399_dmc: fix spelling mistakes.
    PM / devfreq: rk3399_dmc: do not print error when get supply and clk defer.
    dt-bindings: devfreq: rk3399_dmc: move interrupts to be optional.
    PM / devfreq: rk3399_dmc: remove wait for dcf irq event.
    dt-bindings: clock: add rk3399 DDR3 standard speed bins.
    ...

    Linus Torvalds
     
  • Pull dma-mapping updates from Christoph Hellwig:

    - a series from Robin to fix bus imposed dma limits by adding a
    separate mask for them to struct device instead of trying to squeeze
    a second meaning out of the existing dma mask as we did before.

    This has ACKs from the various other subsystems touched

    - a small swiotlb cleanup from Kees (acked by Konrad)

    - conversion of nios2 and sh to the new generic dma-noncoherent code.

    Various other architecture conversions will come through the
    architecture maintainers' trees.
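
    The first item above (a separate mask for bus-imposed limits) can be
    sketched like this. It is a minimal model under the assumption that a
    bus mask of 0 means "no bus restriction"; the function names are
    hypothetical and this is not the dma-mapping API.

```python
# Minimal model: function names are hypothetical, and a bus mask of 0
# is assumed to mean "no bus restriction".
def dma_bit_mask(bits):
    """Mask with the low `bits` bits set, e.g. 32 -> 0xFFFFFFFF."""
    return (1 << bits) - 1


def effective_dma_limit(dev_dma_mask, bus_dma_mask=0):
    """Highest address usable for DMA, honouring both masks."""
    if bus_dma_mask and bus_dma_mask < dev_dma_mask:
        return bus_dma_mask
    return dev_dma_mask
```

    Keeping the two masks separate means a driver can still set its own
    mask freely, while the bus limit is applied independently at mapping
    time.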

    * tag 'dma-mapping-4.19' of git://git.infradead.org/users/hch/dma-mapping:
    sh: use generic dma_noncoherent_ops
    sh: split arch/sh/mm/consistent.c
    sh: use dma_direct_ops for the CONFIG_DMA_COHERENT case
    sh: introduce a sh_cacheop_vaddr helper
    sh: simplify get_arch_dma_ops
    OF: Don't set default coherent DMA mask
    ACPI/IORT: Don't set default coherent DMA mask
    iommu/dma: Respect bus DMA limit for IOVAs
    of/device: Set bus DMA mask as appropriate
    ACPI/IORT: Set bus DMA mask as appropriate
    dma-mapping: Generalise dma_32bit_limit flag
    ACPI/IORT: Support address size limit for root complexes
    of/platform: Initialise default DMA masks
    nios2: use generic dma_noncoherent_ops
    swiotlb: clean up reporting
    dma-mapping: relax warning for per-device areas

    Linus Torvalds
     
  • Pull block updates from Jens Axboe:
    "First pull request for this merge window, there will also be a
    followup request with some stragglers.

    This pull request contains:

    - Fix for a thundering herd issue in the wbt block code (Anchal
    Agarwal)

    - A few NVMe pull requests:
    * Improved tracepoints (Keith)
    * Larger inline data support for RDMA (Steve Wise)
    * RDMA setup/teardown fixes (Sagi)
    * Effects log support for NVMe target (Chaitanya Kulkarni)
    * Buffered IO support for NVMe target (Chaitanya Kulkarni)
    * TP4004 (ANA) support (Christoph)
    * Various NVMe fixes

    - Block io-latency controller support. Much needed support for
    properly containing block devices. (Josef)

    - Series improving how we handle sense information on the stack
    (Kees)

    - Lightnvm fixes and updates/improvements (Mathias/Javier et al)

    - Zoned device support for null_blk (Matias)

    - AIX partition fixes (Mauricio Faria de Oliveira)

    - DIF checksum code made generic (Max Gurtovoy)

    - Add support for discard in iostats (Michael Callahan / Tejun)

    - Set of updates for BFQ (Paolo)

    - Removal of async write support for bsg (Christoph)

    - Bio page dirtying and clone fixups (Christoph)

    - Set of bcache fix/changes (via Coly)

    - Series improving blk-mq queue setup/teardown speed (Ming)

    - Series improving merging performance on blk-mq (Ming)

    - Lots of other fixes and cleanups from a slew of folks"

    * tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block: (190 commits)
    blkcg: Make blkg_root_lookup() work for queues in bypass mode
    bcache: fix error setting writeback_rate through sysfs interface
    null_blk: add lock drop/acquire annotation
    Blk-throttle: reduce tail io latency when iops limit is enforced
    block: paride: pd: mark expected switch fall-throughs
    block: Ensure that a request queue is dissociated from the cgroup controller
    block: Introduce blk_exit_queue()
    blkcg: Introduce blkg_root_lookup()
    block: Remove two superfluous #include directives
    blk-mq: count the hctx as active before allocating tag
    block: bvec_nr_vecs() returns value for wrong slab
    bcache: trivial - remove tailing backslash in macro BTREE_FLAG
    bcache: make the pr_err statement used for ENOENT only in sysfs_attatch section
    bcache: set max writeback rate when I/O request is idle
    bcache: add code comments for bset.c
    bcache: fix mistaken comments in request.c
    bcache: fix mistaken code comments in bcache.h
    bcache: add a comment in super.c
    bcache: avoid unncessary cache prefetch bch_btree_node_get()
    bcache: display rate debug parameters to 0 when writeback is not running
    ...

    Linus Torvalds
     
  • Merge L1 Terminal Fault fixes from Thomas Gleixner:
    "L1TF, aka L1 Terminal Fault, is yet another speculative hardware
    engineering trainwreck. It's a hardware vulnerability which allows
    unprivileged speculative access to data which is available in the
    Level 1 Data Cache when the page table entry controlling the virtual
    address, which is used for the access, has the Present bit cleared or
    other reserved bits set.

    If an instruction accesses a virtual address for which the relevant
    page table entry (PTE) has the Present bit cleared or other reserved
    bits set, then speculative execution ignores the invalid PTE and loads
    the referenced data if it is present in the Level 1 Data Cache, as if
    the page referenced by the address bits in the PTE was still present
    and accessible.

    While this is a purely speculative mechanism and the instruction will
    raise a page fault when it is retired eventually, the pure act of
    loading the data and making it available to other speculative
    instructions opens up the opportunity for side channel attacks to
    unprivileged malicious code, similar to the Meltdown attack.

    While Meltdown breaks the user space to kernel space protection, L1TF
    allows an attack on any physical memory address in the system and the
    attack works across all protection domains. It allows an attack on SGX
    and also works from inside virtual machines because the speculation
    bypasses the extended page table (EPT) protection mechanism.

    The associated CVEs are: CVE-2018-3615, CVE-2018-3620, CVE-2018-3646

    The mitigations provided by this pull request include:

    - Host side protection by inverting the upper address bits of a non
    present page table entry so the entry points to uncacheable memory.

    - Hypervisor protection by flushing L1 Data Cache on VMENTER.

    - SMT (HyperThreading) control knobs, which allow to 'turn off' SMT
    by offlining the sibling CPU threads. The knobs are available on
    the kernel command line and at runtime via sysfs

    - Control knobs for the hypervisor mitigation, related to L1D flush
    and SMT control. The knobs are available on the kernel command line
    and at runtime via sysfs

    - Extensive documentation about L1TF including various degrees of
    mitigations.

    Thanks to all people who have contributed to this in various ways -
    patches, review, testing, backporting - and the fruitful, sometimes
    heated, but at the end constructive discussions.

    There is work in progress to provide other forms of mitigations, which
    might be less horrible performance wise for a particular kind of
    workloads, but this is not yet ready for consumption due to their
    complexity and limitations"
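
    The host side mitigation described above (inverting the upper address
    bits of a non-present page table entry) can be illustrated with plain
    bit arithmetic. This is a simplified model, not the x86 code: the
    52-bit physical address width, the 4 KiB page size and all function
    names are assumptions made for the example.

```python
# Simplified model of PTE address inversion. PHYS_BITS, PAGE_SHIFT and
# the function names are assumptions, not the kernel's definitions.
PAGE_SHIFT = 12
PHYS_BITS = 52
PRESENT = 1 << 0
# Bits of the entry that hold the physical page address.
PFN_MASK = ((1 << PHYS_BITS) - 1) & ~((1 << PAGE_SHIFT) - 1)


def make_not_present(pte):
    """Clear Present and invert the address bits of the entry."""
    pte &= ~PRESENT
    return pte ^ PFN_MASK


def make_present(pte):
    """Undo the inversion and set Present again."""
    return (pte ^ PFN_MASK) | PRESENT


def address_seen_by_speculation(pte):
    """Address a speculative load would use if it ignored Present."""
    return pte & PFN_MASK
```

    For an entry that pointed at a low physical address, the inverted
    address has its top bits set, so it lands far above installed memory
    and a speculative load finds nothing for it in the L1 data cache. The
    other flag bits are left untouched, and the inversion is its own
    inverse.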

    * 'l1tf-final' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
    x86/microcode: Allow late microcode loading with SMT disabled
    tools headers: Synchronise x86 cpufeatures.h for L1TF additions
    x86/mm/kmmio: Make the tracer robust against L1TF
    x86/mm/pat: Make set_memory_np() L1TF safe
    x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert
    x86/speculation/l1tf: Invert all not present mappings
    cpu/hotplug: Fix SMT supported evaluation
    KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry
    x86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry
    x86/speculation: Simplify sysfs report of VMX L1TF vulnerability
    Documentation/l1tf: Remove Yonah processors from not vulnerable list
    x86/KVM/VMX: Don't set l1tf_flush_l1d from vmx_handle_external_intr()
    x86/irq: Let interrupt handlers set kvm_cpu_l1tf_flush_l1d
    x86: Don't include linux/irq.h from asm/hardirq.h
    x86/KVM/VMX: Introduce per-host-cpu analogue of l1tf_flush_l1d
    x86/irq: Demote irq_cpustat_t::__softirq_pending to u16
    x86/KVM/VMX: Move the l1tf_flush_l1d test to vmx_l1d_flush()
    x86/KVM/VMX: Replace 'vmx_l1d_flush_always' with 'vmx_l1d_flush_cond'
    x86/KVM/VMX: Don't set l1tf_flush_l1d to true from vmx_l1d_flush()
    cpu/hotplug: detect SMT disabled by BIOS
    ...

    Linus Torvalds
     

14 Aug, 2018

3 commits

  • Petr Mladek
     
  • Merge cpufreq changes for 4.19.

    These are driver extensions, some driver and core fixes and a new
    tracepoint for the tracking of frequency limits changes (coming from
    Android).

    * pm-cpufreq:
    cpufreq: intel_pstate: Ignore turbo active ratio in HWP
    cpufreq: Fix a circular lock dependency problem
    cpu/hotplug: Add a cpus_read_trylock() function
    cpufreq: trace frequency limits change
    cpufreq: intel_pstate: Show different max frequency with turbo 3 and HWP
    cpufreq: pcc-cpufreq: Disable dynamic scaling on many-CPU systems
    cpufreq: qcom-kryo: Silently error out on EPROBE_DEFER
    cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC
    cpufreq: armada-37xx: Add AVS support
    dt-bindings: marvell: Add documentation for the Armada 3700 AVS binding
    cpufreq: imx6q/thermal: imx: register cooling device depending on OF
    cpufreq: intel_pstate: use match_string() helper

    Rafael J. Wysocki
     
  • Merge changes in the PM core, system-wide PM infrastructure, generic
    power domains (genpd) framework, ACPI PM infrastructure and cpuidle
    for 4.19.

    * pm-core:
    driver core: Add flag to autoremove device link on supplier unbind
    driver core: Rename flag AUTOREMOVE to AUTOREMOVE_CONSUMER

    * pm-domains:
    PM / Domains: Introduce dev_pm_domain_attach_by_name()
    PM / Domains: Introduce option to attach a device by name to genpd
    PM / Domains: dt: Add a power-domain-names property

    * pm-sleep:
    PM / reboot: Eliminate race between reboot and suspend
    PM / hibernate: Mark expected switch fall-through
    x86/power/hibernate_64: Remove VLA usage
    PM / hibernate: cast PAGE_SIZE to int when comparing with error code

    * acpi-pm:
    ACPI / PM: save NVS memory for ASUS 1025C laptop
    ACPI / PM: Default to s2idle in all machines supporting LP S0

    * pm-cpuidle:
    ARM: cpuidle: silence error on driver registration failure

    Rafael J. Wysocki