29 Mar, 2019

1 commit

  • net_hash_mix() currently uses kernel address of a struct net,
    and is used in many places that could be used to reveal this
    address to a patient attacker, thus defeating KASLR, for
    the typical case (initial net namespace, &init_net is
    not dynamically allocated)

    I believe the original implementation tried to avoid spending
    too many cycles in this function, but security comes first.

    Also provide entropy regardless of CONFIG_NET_NS.

    Fixes: 0b4419162aa6 ("netns: introduce the net_hash_mix "salt" for hashes")
    Signed-off-by: Eric Dumazet
    Reported-by: Amit Klein
    Reported-by: Benny Pinkas
    Cc: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric Dumazet
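
The difference between an address-derived salt and a random one can be sketched in userspace. This is a toy model, not the kernel implementation: `Net`, `hash_mix_from_address` and `bucket` are hypothetical names, and `id()` merely stands in for a kernel address.

```python
import secrets


class Net:
    """Hypothetical stand-in for a network namespace object."""

    def __init__(self):
        # Salt drawn from a random source at creation time; it carries
        # no information about where the object lives in memory.
        self.hash_mix = secrets.randbits(32)


def hash_mix_from_address(obj):
    # The leaky approach: derive the salt from the object's address.
    # id() plays the role of the address here (a CPython detail).
    return (id(obj) >> 6) & 0xFFFFFFFF


def bucket(key, salt, nbuckets=256):
    # Toy hash: mix the salt into the key, then pick a bucket.
    mixed = ((key ^ salt) * 0x9E3779B1) & 0xFFFFFFFF
    return mixed % nbuckets
```

An observer who can recover the salt from bucket placement learns the address in the first variant and only a random number in the second.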
     

28 Mar, 2019

2 commits

  • With a recent link mode advertisement code update this helper
    providing local pause capability translation used for flow
    control link mode negotiation got broken.
    For eth drivers using this helper, the issue is apparent only
    if either PAUSE or ASYM_PAUSE is being advertised.

    Fixes: 3c1bcc8614db ("net: ethernet: Convert phydev advertize and supported from u32 to link mode")
    Signed-off-by: Claudiu Manoil
    Signed-off-by: David S. Miller

    Claudiu Manoil
     
  • Pull networking fixes from David Miller:
    "Fixes here and there, a couple new device IDs, as usual:

    1) Fix BQL race in dpaa2-eth driver, from Ioana Ciornei.

    2) Fix 64-bit division in iwlwifi, from Arnd Bergmann.

    3) Fix documentation for some eBPF helpers, from Quentin Monnet.

    4) Some UAPI bpf header sync with tools, also from Quentin Monnet.

    5) Set descriptor ownership bit at the right time for jumbo frames in
    stmmac driver, from Aaro Koskinen.

    6) Set IFF_UP properly in tun driver, from Eric Dumazet.

    7) Fix load/store doubleword instruction generation in powerpc eBPF
    JIT, from Naveen N. Rao.

    8) nla_nest_start() return value checks all over, from Kangjie Lu.

    9) Fix asoc_id handling in SCTP after the SCTP_*_ASSOC changes this
    merge window. From Marcelo Ricardo Leitner and Xin Long.

    10) Fix memory corruption with large MTUs in stmmac, from Aaro
    Koskinen.

    11) Do not use ipv4 header for ipv6 flows in TCP and DCCP, from Eric
    Dumazet.

    12) Fix topology subscription cancellation in tipc, from Erik Hugne.

    13) Memory leak in genetlink error path, from Yue Haibing.

    14) Validate control actions properly in packet scheduler, from Davide
    Caratti.

    15) Even if we get EEXIST, we still need to rehash if a shrink was
    delayed. From Herbert Xu.

    16) Fix interrupt mask handling in interrupt handler of r8169, from
    Heiner Kallweit.

    17) Fix leak in ehea driver, from Wen Yang"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (168 commits)
    dpaa2-eth: fix race condition with bql frame accounting
    chelsio: use BUG() instead of BUG_ON(1)
    net: devlink: skip info_get op call if it is not defined in dumpit
    net: phy: bcm54xx: Encode link speed and activity into LEDs
    tipc: change to check tipc_own_id to return in tipc_net_stop
    net: usb: aqc111: Extend HWID table by QNAP device
    net: sched: Kconfig: update reference link for PIE
    net: dsa: qca8k: extend slave-bus implementations
    net: dsa: qca8k: remove leftover phy accessors
    dt-bindings: net: dsa: qca8k: support internal mdio-bus
    dt-bindings: net: dsa: qca8k: fix example
    net: phy: don't clear BMCR in genphy_soft_reset
    bpf, libbpf: clarify bump in libbpf version info
    bpf, libbpf: fix version info and add it to shared object
    rxrpc: avoid clang -Wuninitialized warning
    tipc: tipc clang warning
    net: sched: fix cleanup NULL pointer exception in act_mirr
    r8169: fix cable re-plugging issue
    net: ethernet: ti: fix possible object reference leak
    net: ibm: fix possible object reference leak
    ...

    Linus Torvalds
     

27 Mar, 2019

1 commit


26 Mar, 2019

1 commit

  • This reverts commit 1aec4211204d9463d1fd209eb50453de16254599.

    Steven Rostedt reports that it causes a hang at bootup and bisected it
    to this commit.

    The trigger is apparently a module alias for "parport_lowlevel" that
    points to "parport_pc", which causes a hang with

    modprobe -q -- parport_lowlevel

    blocking forever with a backtrace like this:

    wait_for_completion_killable+0x1c/0x28
    call_usermodehelper_exec+0xa7/0x108
    __request_module+0x351/0x3d8
    get_lowlevel_driver+0x28/0x41 [parport]
    __parport_register_driver+0x39/0x1f4 [parport]
    daisy_drv_init+0x31/0x4f [parport]
    parport_bus_init+0x5d/0x7b [parport]
    parport_default_proc_register+0x26/0x1000 [parport]
    do_one_initcall+0xc2/0x1e0
    do_init_module+0x50/0x1d4
    load_module+0x1c2e/0x21b3
    sys_init_module+0xef/0x117

    Sudip says:
    "Due to the new device model daisy driver will now try to find the
    parallel ports while trying to register its driver so that it can bind
    with them. Now, since daisy driver is loaded while parport bus is
    initialising the list of parport is still empty and it tries to load
    the lowlevel driver, which has an alias set to parport_pc, now causes
    a deadlock"

    But I don't think the daisy driver should be loaded by the parport
    initialization in the first place, so let's revert the whole change.

    If the daisy driver can just initialize separately on its own (like a
    driver should), instead of hooking into the parport init sequence
    directly, this issue probably would go away.

    Reported-and-bisected-by: Steven Rostedt (VMware)
    Reported-by: Michal Kubecek
    Acked-by: Greg Kroah-Hartman
    Cc: Sudip Mukherjee
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

25 Mar, 2019

3 commits

  • Pull x86 fixes from Thomas Gleixner:
    "A set of x86 fixes:

    - Prevent potential NULL pointer dereferences in the HPET and HyperV
    code

    - Exclude the GART aperture from /proc/kcore to prevent kernel
    crashes on access

    - Use the correct macros for Cyrix I/O on Geode processors

    - Remove yet another kernel address printk leak

    - Announce microcode reload completion as requested by quite some
    people. Microcode loading has become popular recently.

    - Some 'Make Clang' happy fixlets

    - A few cleanups for recently added code"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/gart: Exclude GART aperture from kcore
    x86/hw_breakpoints: Make default case in hw_breakpoint_arch_parse() return an error
    x86/mm/pti: Make local symbols static
    x86/cpu/cyrix: Remove {get,set}Cx86_old macros used for Cyrix processors
    x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors
    x86/microcode: Announce reload operation's completion
    x86/hyperv: Prevent potential NULL pointer dereference
    x86/hpet: Prevent potential NULL pointer dereference
    x86/lib: Fix indentation issue, remove extra tab
    x86/boot: Restrict header scope to make Clang happy
    x86/mm: Don't leak kernel addresses
    x86/cpufeature: Fix various quality problems in the header

    Linus Torvalds
     
  • Pull irq fixes from Thomas Gleixner:
    "A set of fixes for the interrupt subsystem:

    - Remove secondary GIC support on systems w/o device-tree support

    - A set of small fixlets in various irqchip drivers

    - static and fall-through annotations

    - Kernel doc and typo fixes"

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    genirq: Mark expected switch case fall-through
    genirq/devres: Remove excess parameter from kernel doc
    irqchip/irq-mvebu-sei: Make mvebu_sei_ap806_caps static
    irqchip/mbigen: Don't clear eventid when freeing an MSI
    irqchip/stm32: Don't set rising configuration registers at init
    irqchip/stm32: Don't clear rising/falling config registers at init
    dt-bindings: irqchip: renesas-irqc: Document r8a774c0 support
    irqchip/mmp: Make mmp_irq_domain_ops static
    irqchip/brcmstb-l2: Make two init functions static
    genirq: Fix typo in comment of IRQD_MOVE_PCNTXT
    irqchip/gic-v3-its: Fix comparison logic in lpi_range_cmp
    irqchip/gic: Drop support for secondary GIC in non-DT systems
    irqchip/imx-irqsteer: Fix of_property_read_u32() error handling

    Linus Torvalds
     
  • Pull auxdisplay updates from Miguel Ojeda:
    "A few fixes and improvements for auxdisplay:

    - Series to fix a memory leak in hd44780 while introducing
    charlcd_free(). From Andy Shevchenko

    - Series to clean up the Kconfig menus and a couple of improvements
    for charlcd. From Mans Rullgard"

    * tag 'auxdisplay-for-linus-v5.1-rc2' of git://github.com/ojeda/linux:
    auxdisplay: charlcd: make backlight initial state configurable
    auxdisplay: charlcd: simplify init message display
    auxdisplay: deconfuse configuration
    auxdisplay: hd44780: Convert to use charlcd_free()
    auxdisplay: panel: Convert to use charlcd_free()
    auxdisplay: charlcd: Introduce charlcd_free() helper
    auxdisplay: charlcd: Move to_priv() to charlcd namespace
    auxdisplay: hd44780: Fix memory leak on ->remove()

    Linus Torvalds
     

24 Mar, 2019

3 commits

  • Pull io_uring fixes and improvements from Jens Axboe:
    "The first five in this series are heavily inspired by the work Al did
    on the aio side to fix the races there.

    The last two re-introduce a feature that was in io_uring before it got
    merged, but which I pulled since we didn't have a good way to have
    BVEC iters that already have a stable reference. These aren't
    necessarily related to block, it's just how io_uring pins fixed
    buffers"

    * tag 'io_uring-20190323' of git://git.kernel.dk/linux-block:
    block: add BIO_NO_PAGE_REF flag
    iov_iter: add ITER_BVEC_FLAG_NO_REF flag
    io_uring: mark me as the maintainer
    io_uring: retry bulk slab allocs as single allocs
    io_uring: fix poll races
    io_uring: fix fget/fput handling
    io_uring: add prepped flag
    io_uring: make io_read/write return an integer
    io_uring: use regular request ref counts

    Linus Torvalds
     
  • Pull block fixes from Jens Axboe:
    "A set of fixes/changes that should go into this series. This contains:

    - Kernel doc / comment updates (Bart, Shenghui)

    - Un-export of core-only used function (Bart)

    - Fix race on loop file access (Dongli)

    - pf/pcd queue cleanup fixes (me)

    - Use appropriate helper for RESTART bit set (Yufen)

    - Use named identifier for classic poll (Yufen)"

    * tag 'for-linus-20190323' of git://git.kernel.dk/linux-block:
    sbitmap: trivial - update comment for sbitmap_deferred_clear_bit
    blkcg: Fix kernel-doc warnings
    blk-iolatency: #include "blk.h"
    block: Unexport blk_mq_add_to_requeue_list()
    block: add BLK_MQ_POLL_CLASSIC for hybrid poll and return EINVAL for unexpected value
    blk-mq: remove unused 'nr_expired' from blk_mq_hw_ctx
    loop: access lo_backing_file only when the loop device is Lo_bound
    blk-mq: use blk_mq_sched_mark_restart_hctx to set RESTART
    paride/pcd: cleanup queues when detection fails
    paride/pf: cleanup queues when detection fails

    Linus Torvalds
     
  • Pull ceph fixes from Ilya Dryomov:
    "A follow up for the new alloc_size logic and a blacklisting fix,
    marked for stable"

    * tag 'ceph-for-5.1-rc2' of git://github.com/ceph/ceph-client:
    rbd: drop wait_for_latest_osdmap()
    libceph: wait for latest osdmap in ceph_monc_blacklist_add()
    rbd: set io_min, io_opt and discard_granularity to alloc_size

    Linus Torvalds
     

23 Mar, 2019

2 commits

  • On machines where the GART aperture is mapped over physical RAM,
    /proc/kcore contains the GART aperture range. Accessing the GART range via
    /proc/kcore results in a kernel crash.

    vmcore used to have the same issue, until it was fixed with commit
    2a3e83c6f96c ("x86/gart: Exclude GART aperture from vmcore"), leveraging
    existing hook infrastructure in vmcore to let /proc/vmcore return zeroes
    when attempting to read the aperture region, and so it won't read from the
    actual memory.

    Apply the same workaround for kcore. First implement the same hook
    infrastructure for kcore, then reuse the hook functions introduced in the
    previous vmcore fix. Just with some minor adjustment, rename some functions
    for more general usage, and simplify the hook infrastructure a bit as there
    is no module usage yet.

    Suggested-by: Baoquan He
    Signed-off-by: Kairui Song
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Jiri Bohac
    Acked-by: Baoquan He
    Cc: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Alexey Dobriyan
    Cc: Andrew Morton
    Cc: Omar Sandoval
    Cc: Dave Young
    Link: https://lkml.kernel.org/r/20190308030508.13548-1-kasong@redhat.com

    Kairui Song
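
The hook idea described above can be sketched as follows. This is a simplified userspace model under stated assumptions: `make_excluder` and `read_page` are hypothetical names, and `memory` is a plain dict standing in for physical pages.

```python
PAGE_SIZE = 4096


def make_excluder(start_pfn, end_pfn):
    """Build a hook reporting whether a page frame may be read."""
    def pfn_is_ram(pfn):
        # Page frames inside [start_pfn, end_pfn) are excluded.
        return not (start_pfn <= pfn < end_pfn)
    return pfn_is_ram


def read_page(memory, pfn, hook=None):
    # With a hook registered, excluded pages read back as zeroes and
    # the backing memory is never touched.
    if hook is not None and not hook(pfn):
        return bytes(PAGE_SIZE)
    return memory[pfn]
```

Without a hook the reader behaves as before, which mirrors the fact that the exclusion only applies on machines that register one.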
     
  • "sbitmap_batch_clear" should be "sbitmap_deferred_clear"

    Acked-by: Omar Sandoval
    Signed-off-by: Shenghui Wang
    Signed-off-by: Jens Axboe

    Shenghui Wang
     

22 Mar, 2019

3 commits

  • use RCU when accessing the action chain, to avoid use after free in the
    traffic path when 'goto chain' is replaced on existing TC actions (see
    script below). Since the control action is read in the traffic path
    without holding the action spinlock, we need to explicitly ensure that
    a->goto_chain is not NULL before dereferencing (i.e. it's not sufficient
    to rely on the value of TC_ACT_GOTO_CHAIN bits). Not doing so caused NULL
    dereferences in tcf_action_goto_chain_exec() when the following script:

    # tc chain add dev dd0 chain 42 ingress protocol ip flower \
    > ip_proto udp action pass index 4
    # tc filter add dev dd0 ingress protocol ip flower \
    > ip_proto udp action csum udp goto chain 42 index 66
    # tc chain del dev dd0 chain 42 ingress
    (start UDP traffic towards dd0)
    # tc action replace action csum udp pass index 66

    was run repeatedly for several hours.

    Suggested-by: Cong Wang
    Suggested-by: Vlad Buslov
    Signed-off-by: Davide Caratti
    Signed-off-by: David S. Miller

    Davide Caratti
     
  • callers of tcf_gact_goto_chain_index() can potentially read an old value
    of the chain index, or even dereference a NULL 'goto_chain' pointer,
    because 'goto_chain' and 'tcfa_action' are read in the traffic path
    without regard for concurrent writes in the control path. The most recent
    value of chain index can be read also from a->tcfa_action (it's encoded
    there together with TC_ACT_GOTO_CHAIN bits), so we don't really need to
    dereference 'goto_chain': just read the chain id from the control action.

    Fixes: e457d86ada27 ("net: sched: add couple of goto_chain helpers")
    Signed-off-by: Davide Caratti
    Signed-off-by: David S. Miller

    Davide Caratti
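
How the chain index rides along in the control action can be illustrated with a few lines of bit arithmetic. The constants below mirror the definitions in the `linux/pkt_cls.h` UAPI header as best I recall them; the helper names are hypothetical.

```python
TC_ACT_EXT_SHIFT = 28
TC_ACT_EXT_VAL_MASK = (1 << TC_ACT_EXT_SHIFT) - 1
TC_ACT_GOTO_CHAIN = 2 << TC_ACT_EXT_SHIFT


def make_goto_chain(chain_index):
    # Opcode in the top bits, chain index in the low 28 bits.
    return TC_ACT_GOTO_CHAIN | (chain_index & TC_ACT_EXT_VAL_MASK)


def is_goto_chain(action):
    # Compare only the opcode part of the combined value.
    return (action & ~TC_ACT_EXT_VAL_MASK) == TC_ACT_GOTO_CHAIN


def goto_chain_index(action):
    # The index is recovered from the same word, with no pointer
    # dereference involved.
    return action & TC_ACT_EXT_VAL_MASK
```

For example, `goto_chain_index(make_goto_chain(42))` gives back 42 from a single integer read.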
     
  • - pass a pointer to struct tcf_proto in each action's init() handler,
    to allow validating the control action, checking whether the chain
    exists and (eventually) refcounting it.
    - remove code that validates the control action after a successful call
    to the action's init() handler, and replace it with a test that forbids
    addition of actions having 'goto_chain' and NULL goto_chain pointer at
    the same time.
    - add tcf_action_check_ctrlact(), that will validate the control action
    and eventually allocate the action 'goto_chain' within the init()
    handler.
    - add tcf_action_set_ctrlact(), that will assign the control action and
    swap the current 'goto_chain' pointer with the new given one.

    This disallows 'goto_chain' on actions that don't initialize it properly
    in their init() handler, i.e. calling tcf_action_check_ctrlact() after
    successful IDR reservation and then calling tcf_action_set_ctrlact()
    to assign 'goto_chain' and 'tcfa_action' consistently.

    By doing this, the kernel no longer leaks refcounts when a valid
    'goto chain' handle is replaced in TC actions. The leak caused kmemleak
    splats like the following one:

    # tc chain add dev dd0 chain 42 ingress protocol ip flower \
    > ip_proto tcp action drop
    # tc chain add dev dd0 chain 43 ingress protocol ip flower \
    > ip_proto udp action drop
    # tc filter add dev dd0 ingress matchall \
    > action gact goto chain 42 index 66
    # tc filter replace dev dd0 ingress matchall \
    > action gact goto chain 43 index 66
    # echo scan >/sys/kernel/debug/kmemleak

    unreferenced object 0xffff93c0ee09f000 (size 1024):
    comm "tc", pid 2565, jiffies 4295339808 (age 65.426s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00 00 00 00 08 00 06 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    [] tc_ctl_chain+0x3d2/0x4c0
    [] rtnetlink_rcv_msg+0x263/0x2d0
    [] netlink_rcv_skb+0x4a/0x110
    [] netlink_unicast+0x1a0/0x250
    [] netlink_sendmsg+0x2c1/0x3c0
    [] sock_sendmsg+0x36/0x40
    [] ___sys_sendmsg+0x280/0x2f0
    [] __sys_sendmsg+0x5e/0xa0
    [] do_syscall_64+0x5b/0x180
    [] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [] 0xffffffffffffffff

    Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
    Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
    Signed-off-by: Davide Caratti
    Signed-off-by: David S. Miller

    Davide Caratti
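
The check-then-swap pattern can be modelled with plain reference counters. This is a sketch only: `Chain`, `Action`, `set_ctrlact` and `replace_goto` are hypothetical names that loosely follow the steps listed above, not the kernel functions themselves.

```python
class Chain:
    def __init__(self, index):
        self.index = index
        self.refcnt = 0

    def get(self):
        self.refcnt += 1
        return self

    def put(self):
        self.refcnt -= 1


class Action:
    def __init__(self):
        self.control = 0
        self.goto_chain = None


def set_ctrlact(action, control, new_chain):
    """Assign control action and chain together; hand back the old chain."""
    old = action.goto_chain
    action.control = control
    action.goto_chain = new_chain
    return old


def replace_goto(action, control, chain):
    new = chain.get()              # reference taken while validating
    old = set_ctrlact(action, control, new)
    if old is not None:
        old.put()                  # releasing the old chain avoids the leak
```

Replacing chain 42 with chain 43 on the same action leaves chain 42 with no outstanding reference, which is the situation the kmemleak report above shows going wrong.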
     

21 Mar, 2019

4 commits


20 Mar, 2019

1 commit

  • Because map updates are distributed lazily, an OSD may not know about
    the new blacklist for quite some time after "osd blacklist add" command
    is completed. This makes it possible for a blacklisted but still alive
    client to overwrite a post-blacklist update, resulting in data
    corruption.

    Waiting for latest osdmap in ceph_monc_blacklist_add() and thus using
    the post-blacklist epoch for all post-blacklist requests ensures that
    all such requests "wait" for the blacklist to come into force on their
    respective OSDs.

    Cc: stable@vger.kernel.org
    Fixes: 6305a3b41515 ("libceph: support for blacklisting clients")
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Jason Dillaman

    Ilya Dryomov
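
A toy model of the epoch rule may help. It assumes, as the message implies, that an OSD holds back any request stamped with a map epoch newer than its own; the `Osd` class and its methods are hypothetical.

```python
class Osd:
    """Hypothetical OSD holding one version (epoch) of the cluster map."""

    def __init__(self, epoch):
        self.epoch = epoch
        self.blacklist = set()
        self.waiting = []   # requests stamped with an epoch not seen yet
        self.applied = []

    def submit(self, client, req_epoch, data):
        if req_epoch > self.epoch:
            # The sender knows a newer map: hold the request until we catch up.
            self.waiting.append((client, req_epoch, data))
        elif client not in self.blacklist:
            self.applied.append(data)

    def update_map(self, epoch, blacklist):
        self.epoch = epoch
        self.blacklist = set(blacklist)
        ready = [r for r in self.waiting if r[1] <= epoch]
        self.waiting = [r for r in self.waiting if r[1] > epoch]
        for client, req_epoch, data in ready:
            self.submit(client, req_epoch, data)
```

In this model a write from the blacklisted client can still land before the map arrives, but never after a write that was stamped with the post-blacklist epoch.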
     

19 Mar, 2019

5 commits

  • There is no usage of 'nr_expired'.

    The 'nr_expired' was introduced by commit 1d9bd5161ba3 ("blk-mq: replace
    timeout synchronization with a RCU and generation based scheme"). Its usage
    was removed since commit 12f5b9314545 ("blk-mq: Remove generation
    seqeunce").

    Signed-off-by: Dongli Zhang
    Signed-off-by: Jens Axboe

    Dongli Zhang
     
  • sctp_hdr(skb) only works when skb->transport_header is set properly.

    But in Netfilter, skb->transport_header for ipv6 is not guaranteed
    to hold the right value for sctphdr, which would cause the checksum
    check to fail for sctp packets.

    So fix it by using offset, which is always right in all places.

    v1->v2:
    - Fix the changelog.

    Fixes: e6d8b64b34aa ("net: sctp: fix and consolidate SCTP checksumming code")
    Reported-by: Li Shuang
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • When using fanouts with AF_PACKET, the demux functions such as
    fanout_demux_cpu will return an index in the fanout socket array, which
    corresponds to the selected socket.

    The ordering of this array depends on the order the sockets were added
    to a given fanout group, so for FANOUT_CPU this means sockets are bound
    to cpus in the order they are configured, which is OK.

    However, when stopping then restarting the interface these sockets are
    bound to, the sockets are reassigned to the fanout group in the reverse
    order, due to the fact that they were inserted at the head of the
    interface's AF_PACKET socket list.

    This means that traffic that was directed to the first socket in the
    fanout group is now directed to the last one after an interface restart.

    In the case of FANOUT_CPU, traffic from CPU0 will be directed to the
    socket that used to receive traffic from the last CPU after an interface
    restart.

    This commit introduces a helper to add a socket at the tail of a list,
    then uses it to register AF_PACKET sockets.

    Note that this changes the order in which sockets are listed in /proc and
    with sock_diag.

    Fixes: dc99f600698d ("packet: Add fanout support")
    Signed-off-by: Maxime Chevallier
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Maxime Chevallier
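
The ordering effect is easy to reproduce with a list. In this sketch `fanout_after_restart` is a hypothetical helper: it builds the interface's socket list with head or tail insertion and then repopulates the fanout group in list order, as happens on an interface restart.

```python
def fanout_after_restart(sockets, insert_at_tail):
    """Return the fanout order seen after the interface is restarted."""
    sock_list = []
    for s in sockets:
        if insert_at_tail:
            sock_list.append(s)      # behaviour with the new helper
        else:
            sock_list.insert(0, s)   # head insertion reverses the order
    # On restart the group is repopulated by walking the socket list.
    return list(sock_list)
```

With head insertion, sockets configured as cpu0, cpu1, cpu2 come back as cpu2, cpu1, cpu0; with tail insertion the original order survives.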
     
  • If bio_iov_iter_get_pages() is called on an iov_iter that is flagged
    with NO_REF, then we don't need to add a page reference for the pages
    that we add.

    Add BIO_NO_PAGE_REF to track this in the bio, so IO completion knows
    not to drop a reference to these pages.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • For ITER_BVEC, if we're holding on to kernel pages, the caller
    doesn't need to grab a reference to the bvec pages, and drop that
    same reference on IO completion. This is essentially safe for any
    ITER_BVEC, but some use cases end up reusing pages and unconditionally
    dropping a page reference on completion. An example of that is
    sendfile(2), that ends up being a splice_in + splice_out on the
    pipe pages.

    Add a flag that tells us it's fine to not grab a page reference
    to the bvec pages, since that caller knows not to drop a reference
    when it's done with the pages.

    Signed-off-by: Jens Axboe

    Jens Axboe
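
The pairing between the iterator flag and the bio flag can be modelled with counters. This is only a sketch: `Page`, `submit` and `complete` are hypothetical, and the dict key `no_page_ref` stands in for BIO_NO_PAGE_REF.

```python
class Page:
    def __init__(self):
        self.refs = 1   # reference already held by the owner of the buffer


def submit(pages, no_ref):
    """Build a toy bio; take page references unless the iterator says not to."""
    if not no_ref:
        for p in pages:
            p.refs += 1
    # Remember the decision so completion stays symmetric.
    return {"pages": pages, "no_page_ref": no_ref}


def complete(bio):
    # Only drop references that submit() actually took.
    if not bio["no_page_ref"]:
        for p in bio["pages"]:
            p.refs -= 1
```

Either way the counts balance after completion; the flagged path simply skips the grab and drop pair.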
     

18 Mar, 2019

2 commits

  • To prevent a hardware memory leak when a DEVX DCT object is destroyed
    without first calling DRAIN DCT (e.g. under cleanup flow), its creation
    and destruction need to be managed via mlx5 core.

    In that case the DRAIN DCT command is called, and only once it has
    completed is the DESTROY DCT command called. Otherwise, the
    DESTROY DCT may fail and a hardware leak may occur.

    As of this change the DRAIN DCT command should no longer be exposed
    from DEVX; it is managed internally by the driver to work as expected by
    the device specification.

    Fixes: 7efce3691d33 ("IB/mlx5: Add obj create and destroy functionality")
    Signed-off-by: Yishai Hadas
    Reviewed-by: Artemy Kovalyov
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Yishai Hadas
     
  • Pull more Kbuild updates from Masahiro Yamada:

    - add more Build-Depends to Debian source package

    - prefix header search paths with $(srctree)/

    - make modpost show verbose section mismatch warnings

    - avoid hard-coded CROSS_COMPILE for h8300

    - fix regression for Debian make-kpkg command

    - add semantic patch to detect missing put_device()

    - fix some warnings of 'make deb-pkg'

    - optimize NOSTDINC_FLAGS evaluation

    - add warnings about redundant generic-y

    - clean up Makefiles and scripts

    * tag 'kbuild-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
    kconfig: remove stale lxdialog/.gitignore
    kbuild: force all architectures except um to include mandatory-y
    kbuild: warn redundant generic-y
    Revert "modsign: Abort modules_install when signing fails"
    kbuild: Make NOSTDINC_FLAGS a simply expanded variable
    kbuild: deb-pkg: avoid implicit effects
    coccinelle: semantic code search for missing put_device()
    kbuild: pkg: grep include/config/auto.conf instead of $KCONFIG_CONFIG
    kbuild: deb-pkg: introduce is_enabled and if_enabled_echo to builddeb
    kbuild: deb-pkg: add CONFIG_ prefix to kernel config options
    kbuild: add workaround for Debian make-kpkg
    kbuild: source include/config/auto.conf instead of ${KCONFIG_CONFIG}
    unicore32: simplify linker script generation for decompressor
    h8300: use cc-cross-prefix instead of hardcoding h8300-unknown-linux-
    kbuild: move archive command to scripts/Makefile.lib
    modpost: always show verbose warning for section mismatch
    ia64: prefix header search path with $(srctree)/
    libfdt: prefix header search paths with $(srctree)/
    deb-pkg: generate correct build dependencies

    Linus Torvalds
     

17 Mar, 2019

6 commits

  • The charlcd_free() is a counterpart to charlcd_alloc()
    and should be called symmetrically on tear down.

    Reviewed-by: Geert Uytterhoeven
    Signed-off-by: Andy Shevchenko
    Signed-off-by: Miguel Ojeda

    Andy Shevchenko
     
  • Currently, every arch/*/include/uapi/asm/Kbuild explicitly includes
    the common Kbuild.asm file. Factor out the duplicated include directives
    to scripts/Makefile.asm-generic so that no architecture would opt out
    of the mandatory-y mechanism.

    um is not forced to include mandatory-y since it is a very exceptional
    case which does not support UAPI.

    Signed-off-by: Masahiro Yamada

    Masahiro Yamada
     
  • Pull pidfd system call from Christian Brauner:
    "This introduces the ability to use file descriptors from /proc/<pid>/
    as stable handles on struct pid. Even if a pid is recycled the handle
    will not change. For a start these fds can be used to send signals to
    the processes they refer to.

    With the ability to use /proc/<pid> fds as stable handles on struct
    pid we can fix a long-standing issue where after a process has exited
    its pid can be reused by another process. If a caller sends a signal
    to a reused pid it will end up signaling the wrong process.

    With this patchset we enable a variety of use cases. One obvious
    example is that we can now safely delegate an important part of
    process management - sending signals - to processes other than the
    parent of a given process by sending file descriptors around via scm
    rights and not fearing that the given process will have been recycled
    in the meantime. It also allows for easy testing whether a given
    process is still alive or not by sending signal 0 to a pidfd which is
    quite handy.

    There has been some interest in this feature e.g. from systems
    management (systemd, glibc) and container managers. I have requested
    and gotten comments from glibc to make sure that this syscall is
    suitable for their needs as well. In the future I expect it to take on
    most other pid-based signal syscalls. But such features are left for
    the future once they are needed.

    This has been sitting in linux-next for quite a while and has not
    caused any issues. It comes with selftests which verify basic
    functionality and also test that a recycled pid cannot be signaled via
    a pidfd.

    Jon has written about a prior version of this patchset. It should
    cover the basic functionality since not a lot has changed since then:

    https://lwn.net/Articles/773459/

    The commit message for the syscall itself is extensively documenting
    the syscall, including its functionality and extensibility"

    * tag 'pidfd-v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
    selftests: add tests for pidfd_send_signal()
    signal: add pidfd_send_signal() syscall

    Linus Torvalds
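
The liveness probe mentioned above (sending signal 0 through a pidfd) can be tried from userspace. This sketch assumes Linux and Python 3.9 or later, where `signal.pidfd_send_signal()` is available; `pidfd_is_alive` and `probe_self` are hypothetical helper names, and the functions return None when the call cannot be made in the current environment.

```python
import os
import signal


def pidfd_is_alive(pidfd):
    """Probe a process through a pidfd by sending signal 0."""
    if not hasattr(signal, "pidfd_send_signal"):
        return None   # needs Python 3.9+ on Linux
    try:
        signal.pidfd_send_signal(pidfd, 0)
        return True
    except ProcessLookupError:
        return False  # the process behind the handle has exited
    except OSError:
        return None   # e.g. kernel too old or call filtered by a sandbox


def probe_self():
    # Opening the /proc/<pid> directory yields the handle described above.
    try:
        fd = os.open(f"/proc/{os.getpid()}", os.O_RDONLY | os.O_DIRECTORY)
    except OSError:
        return None
    try:
        return pidfd_is_alive(fd)
    finally:
        os.close(fd)
```

Because the handle refers to the process rather than to a number, the probe cannot be redirected to an unrelated process that later reuses the same pid.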
     
  • Pull device-dax updates from Dan Williams:
    "New device-dax infrastructure to allow persistent memory and other
    "reserved" / performance differentiated memories, to be assigned to
    the core-mm as "System RAM".

    Some users want to use persistent memory as additional volatile
    memory. They are willing to cope with potential performance
    differences, for example between DRAM and 3D Xpoint, and want to use
    typical Linux memory management apis rather than a userspace memory
    allocator layered over an mmap() of a dax file. The administration
    model is to decide how much Persistent Memory (pmem) to use as System
    RAM, create a device-dax-mode namespace of that size, and then assign
    it to the core-mm. The rationale for device-dax is that it is a
    generic memory-mapping driver that can be layered over any "special
    purpose" memory, not just pmem. On subsequent boots udev rules can be
    used to restore the memory assignment.

    One implication of using pmem as RAM is that mlock() no longer keeps
    data off persistent media. For this reason it is recommended to enable
    NVDIMM Security (previously merged for 5.0) to encrypt pmem contents
    at rest. We considered making this recommendation an actively enforced
    requirement, but in the end decided to leave it as a distribution /
    administrator policy to allow for emulation and test environments that
    lack security capable NVDIMMs.

    Summary:

    - Replace the /sys/class/dax device model with /sys/bus/dax, and
    include a compat driver so distributions can opt-in to the new ABI.

    - Allow for an alternative driver for the device-dax address-range

    - Introduce the 'kmem' driver to hotplug / assign a device-dax
    address-range to the core-mm.

    - Arrange for the device-dax target-node to be onlined so that the
    newly added memory range can be uniquely referenced by numa apis"

    NOTE! I'm not entirely happy with the whole "PMEM as RAM" model because
    we currently have special - and very annoying rules in the kernel about
    accessing PMEM only with the "MC safe" accessors, because machine checks
    inside the regular repeat string copy functions can be fatal in some
    (not described) circumstances.

    And apparently the PMEM modules can cause that a lot more than regular
    RAM. The argument is that this happens because PMEM doesn't necessarily
    get scrubbed at boot like RAM does, but that is planned to be added for
    the user space tooling.

    Quoting Dan from another email:
    "The exposure can be reduced in the volatile-RAM case by scanning for
    and clearing errors before it is onlined as RAM. The userspace tooling
    for that can be in place before v5.1-final. There's also runtime
    notifications of errors via acpi_nfit_uc_error_notify() from
    background scrubbers on the DIMM devices. With that mechanism the
    kernel could proactively clear newly discovered poison in the volatile
    case, but that would be additional development more suitable for v5.2.

    I understand the concern, and the need to highlight this issue by
    tapping the brakes on feature development, but I don't see PMEM as RAM
    making the situation worse when the exposure is also there via DAX in
    the PMEM case. Volatile-RAM is arguably a safer use case since it's
    possible to repair pages where the persistent case needs active
    application coordination"

    * tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    device-dax: "Hotplug" persistent memory for use like normal RAM
    mm/resource: Let walk_system_ram_range() search child resources
    mm/memory-hotplug: Allow memory resources to be children
    mm/resource: Move HMM pr_debug() deeper into resource code
    mm/resource: Return real error codes from walk failures
    device-dax: Add a 'modalias' attribute to DAX 'bus' devices
    device-dax: Add a 'target_node' attribute
    device-dax: Auto-bind device after successful new_id
    acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node
    device-dax: Add /sys/class/dax backwards compatibility
    device-dax: Add support for a dax override driver
    device-dax: Move resource pinning+mapping into the common driver
    device-dax: Introduce bus + driver model
    device-dax: Start defining a dax bus model
    device-dax: Remove multi-resource infrastructure
    device-dax: Kill dax_region base
    device-dax: Kill dax_region ida

    Linus Torvalds
     
  • Pull NFS client bugfixes from Trond Myklebust:
    "Highlights include:

    Bugfixes:
    - Fix an Oops in SUNRPC back channel tracepoints
    - Fix a SUNRPC client regression when handling oversized replies
    - Fix the minimal size for SUNRPC reply buffer allocation
    - rpc_decode_header() must always return a non-zero value on error
    - Fix a typo in pnfs_update_layout()

    Cleanup:
    - Remove redundant check for the reply length in call_decode()"

    * tag 'nfs-for-5.1-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    SUNRPC: Remove redundant check for the reply length in call_decode()
    SUNRPC: Handle the SYSTEM_ERR rpc error
    SUNRPC: rpc_decode_header() must always return a non-zero value on error
    SUNRPC: Use the ENOTCONN error on socket disconnect
    SUNRPC: Fix the minimal size for reply buffer allocation
    SUNRPC: Fix a client regression when handling oversized replies
    pNFS: Fix a typo in pnfs_update_layout
    fix null pointer deref in tracepoints in back channel

    Linus Torvalds
     
  • Daniel Borkmann says:

    ====================
    pull-request: bpf 2019-03-16

    The following pull-request contains BPF updates for your *net* tree.

    The main changes are:

    1) Fix a umem memory leak on cleanup in AF_XDP, from Björn.

    2) Fix BTF to properly resolve forward-declared enums into their corresponding
    full enum definition types during deduplication, from Andrii.

    3) Fix libbpf to reject invalid flags in xsk_socket__create(), from Magnus.

    4) Fix accessing invalid pointer returned from bpf_tcp_sock() and
    bpf_sk_fullsock() after bpf_sk_release() was called, from Martin.

    5) Fix generation of load/store DW instructions in PPC JIT, from Naveen.

    6) Various fixes in BPF helper function documentation in the bpf.h UAPI
    header used to generate the bpf-helpers(7) man page, from Quentin.

    7) Fix segfault in BPF test_progs when prog loading failed, from Yonghong.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

16 Mar, 2019

6 commits

  • When the umem is cleaned up, the task that created it might already be
    gone. If the task was gone, the xdp_umem_release function did not free
    the pages member of struct xdp_umem.

    It turned out that the task lookup was not needed at all; the code was
    left over from when we moved from task accounting to user accounting [1].

    This patch fixes the memory leak by removing the task lookup logic
    completely.
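
A minimal userspace sketch of the leak pattern described above, not the
kernel code itself. The names umem_model, creator_alive and both release
helpers are hypothetical stand-ins; the real struct xdp_umem and
xdp_umem_release() differ in detail. The leaky variant returns early when
the creating task is gone and so never frees the page array; the fixed
variant has no such lookup and always frees it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct xdp_umem. */
struct umem_model {
	void **pgs;          /* array of page pointers owned by the umem */
	size_t npgs;         /* number of entries in pgs */
	bool creator_alive;  /* models whether the creating task still exists */
};

/* Allocate the page array; returns 0 on success, -1 on failure. */
static int umem_model_init(struct umem_model *u, size_t npgs)
{
	u->pgs = calloc(npgs, sizeof(*u->pgs));
	if (!u->pgs)
		return -1;
	u->npgs = npgs;
	u->creator_alive = true;
	return 0;
}

/* Leaky pattern: cleanup depends on finding the creating task. */
static void umem_model_release_leaky(struct umem_model *u)
{
	if (!u->creator_alive)
		return;          /* early exit: pgs is never freed here */
	free(u->pgs);
	u->pgs = NULL;
}

/* Fixed pattern: no task lookup, the page array is always freed. */
static void umem_model_release_fixed(struct umem_model *u)
{
	free(u->pgs);
	u->pgs = NULL;
}
```

Calling umem_model_release_leaky() with creator_alive set to false leaves
pgs allocated, while umem_model_release_fixed() frees it unconditionally.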

    [1] https://lore.kernel.org/netdev/20180131135356.19134-3-bjorn.topel@gmail.com/

    Link: https://lore.kernel.org/netdev/c1cb2ca8-6a14-3980-8672-f3de0bb38dfd@suse.cz/
    Fixes: c0c77d8fb787 ("xsk: add user memory registration support sockopt")
    Reported-by: Jiri Slaby
    Signed-off-by: Björn Töpel
    Signed-off-by: Daniel Borkmann

    Björn Töpel
     
  • Adds missing Sphinx documentation to the functions in
    socket.c. Also fixes some whitespace.

    I also changed the style of older documentation in an
    effort to have a uniform documentation style.

    Signed-off-by: Pedro Tammela
    Signed-off-by: David S. Miller

    Pedro Tammela
     
  • Pull KVM updates from Paolo Bonzini:
    "ARM:
    - some cleanups
    - direct physical timer assignment
    - cache sanitization for 32-bit guests

    s390:
    - interrupt cleanup
    - introduction of the Guest Information Block
    - preparation for processor subfunctions in cpu models

    PPC:
    - bug fixes and improvements, especially related to machine checks
    and protection keys

    x86:
    - many, many cleanups, including removing a bunch of MMU code for
    unnecessary optimizations
    - AVIC fixes

    Generic:
    - memcg accounting"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (147 commits)
    kvm: vmx: fix formatting of a comment
    KVM: doc: Document the life cycle of a VM and its resources
    MAINTAINERS: Add KVM selftests to existing KVM entry
    Revert "KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()"
    KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char()
    KVM: PPC: Fix compilation when KVM is not enabled
    KVM: Minor cleanups for kvm_main.c
    KVM: s390: add debug logging for cpu model subfunctions
    KVM: s390: implement subfunction processor calls
    arm64: KVM: Fix architecturally invalid reset value for FPEXC32_EL2
    KVM: arm/arm64: Remove unused timer variable
    KVM: PPC: Book3S: Improve KVM reference counting
    KVM: PPC: Book3S HV: Fix build failure without IOMMU support
    Revert "KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()"
    x86: kvmguest: use TSC clocksource if invariant TSC is exposed
    KVM: Never start grow vCPU halt_poll_ns from value below halt_poll_ns_grow_start
    KVM: Expose the initial start value in grow_halt_poll_ns() as a module parameter
    KVM: grow_halt_poll_ns() should never shrink vCPU halt_poll_ns
    KVM: x86/mmu: Consolidate kvm_mmu_zap_all() and kvm_mmu_zap_mmio_sptes()
    KVM: x86/mmu: WARN if zapping a MMIO spte results in zapping children
    ...

    Linus Torvalds
     
  • Pull tracing fixes and cleanups from Steven Rostedt:
    "This contains a series of last minute clean ups, small fixes and error
    checks"

    * tag 'trace-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing/probe: Verify alloc_trace_*probe() result
    tracing/probe: Check event/group naming rule at parsing
    tracing/probe: Check the size of argument name and body
    tracing/probe: Check event name length correctly
    tracing/probe: Check maxactive error cases
    tracing: kdb: Fix ftdump to not sleep
    trace/probes: Remove kernel doc style from non kernel doc comment
    tracing/probes: Make reserved_field_names static

    Linus Torvalds
     
  • Pull ARM updates from Russell King:

    - An improvement from Ard Biesheuvel, who noted that the identity map
    setup was taking a long time due to flush_cache_louis().

    - Update a comment about dma_ops from Wolfram Sang.

    - Remove use of "-p" with ld, where this flag has been a no-op since
    2004.

    - Remove the printing of the virtual memory layout, which is no longer
    useful since we hide pointers.

    - Correct SCU help text.

    - Remove legacy TWD registration method.

    - Add pgprot_device() implementation for mapping PCI sysfs resource
    files.

    - Initialise PFN limits earlier for kmemleak.

    - Fix argument count to match macro definition (affects clang builds)

    - Use unified assembler language almost everywhere for clang, and other
    clang improvements (from Stefan Agner, Nathan Chancellor).

    - Support security extension for noMMU and other noMMU cleanups (from
    Vladimir Murzin).

    - Remove unnecessary SMP bringup code (which was incorrectly
    copy'n'pasted from the ARM platform implementations) and remove it from
    the arch code to discourage further copies of it appearing.

    - Add Cortex A9 erratum preventing kexec working on some SoCs.

    - AMBA bus identification updates from Mike Leach.

    - More use of raw spinlocks to avoid -RT kernel issues (from Yang Shi
    and Sebastian Andrzej Siewior).

    - MCPM hyp/svc mode mismatch fixes from Marek Szyprowski.

    * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (32 commits)
    ARM: 8849/1: NOMMU: Fix encodings for PMSAv8's PRBAR4/PRLAR4
    ARM: 8848/1: virt: Align GIC version check with arm64 counterpart
    ARM: 8847/1: pm: fix HYP/SVC mode mismatch when MCPM is used
    ARM: 8845/1: use unified assembler in c files
    ARM: 8844/1: use unified assembler in assembly files
    ARM: 8843/1: use unified assembler in headers
    ARM: 8841/1: use unified assembler in macros
    ARM: 8840/1: use a raw_spinlock_t in unwind
    ARM: 8839/1: kprobe: make patch_lock a raw_spinlock_t
    ARM: 8837/1: coresight: etmv4: Update ID register table to add UCI support
    ARM: 8836/1: drivers: amba: Update component matching to use the CoreSight UCI values.
    ARM: 8838/1: drivers: amba: Updates to component identification for driver matching.
    ARM: 8833/1: Ensure that NEON code always compiles with Clang
    ARM: avoid Cortex-A9 livelock on tight dmb loops
    ARM: smp: remove arch-provided "pen_release"
    ARM: actions: remove boot_lock and pen_release
    ARM: oxnas: remove CPU hotplug implementation
    ARM: qcom: remove unnecessary boot_lock
    ARM: 8832/1: NOMMU: Limit visibility for CONFIG_FLASH_{MEM_BASE,SIZE}
    ARM: 8831/1: NOMMU: pmsa-v8: remove unneeded semicolon
    ...

    Linus Torvalds
     
  • Pull NTB updates from Jon Mason:

    - fixes for switchtec debuggability and mapping table entries

    - NTB transport improvements

    - a reworking of the peer_db_addr for better abstraction

    * tag 'ntb-5.1' of git://github.com/jonmason/ntb:
    NTB: add new parameter to peer_db_addr() db_bit and db_data
    NTB: ntb_transport: Ensure the destination buffer is mapped for TX DMA
    NTB: ntb_transport: Free MWs in ntb_transport_link_cleanup()
    ntb_hw_switchtec: Added support of >=4G memory windows
    ntb_hw_switchtec: NT req id mapping table register entry number should be 512
    ntb_hw_switchtec: debug print 64bit aligned crosslink BAR Numbers

    Linus Torvalds