18 May, 2019

1 commit


15 May, 2019

1 commit

  • Commit f1b5618e013a ("vfs: Add a sample program for the new mount API")
    added sample programs that get built during the kernel build, but then
    cause 'git status' to worry about whether the resulting binaries should
    be managed by git.

    Tell git not to worry, and to ignore the sample binaries.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

10 May, 2019

2 commits

  • Ignore the pidfd-metadata binary so it doesn't show up in unwanted
    scenarios.

    Reported-by: Linus Torvalds
    Signed-off-by: Christian Brauner

    Christian Brauner
     
  • Pull rdma updates from Jason Gunthorpe:
    "This has been a smaller cycle than normal. One new driver was
    accepted, which is unusual, and at least one more driver remains in
    review on the list.

    Summary:

    - Driver fixes for hns, hfi1, nes, rxe, i40iw, mlx5, cxgb4,
    vmw_pvrdma

    - Many patches from MatthewW converting radix tree and IDR users to
    use xarray

    - Introduction of tracepoints to the MAD layer

    - Build large SGLs at the start for DMA mapping and get the driver to
    split them

    - Generally clean SGL handling code throughout the subsystem

    - Support for restricting RDMA devices to net namespaces for
    containers

    - Progress to remove object allocation boilerplate code from drivers

    - Change in how the mlx5 driver shows representor ports linked to VFs

    - mlx5 uapi feature to access the on chip SW ICM memory

    - Add a new driver for 'EFA'. This is HW that supports user space
    packet processing through QPs in Amazon's cloud"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (186 commits)
    RDMA/ipoib: Allow user space differentiate between valid dev_port
    IB/core, ipoib: Do not overreact to SM LID change event
    RDMA/device: Don't fire uevent before device is fully initialized
    lib/scatterlist: Remove leftover from sg_page_iter comment
    RDMA/efa: Add driver to Kconfig/Makefile
    RDMA/efa: Add the efa module
    RDMA/efa: Add EFA verbs implementation
    RDMA/efa: Add common command handlers
    RDMA/efa: Implement functions that submit and complete admin commands
    RDMA/efa: Add the ABI definitions
    RDMA/efa: Add the com service API definitions
    RDMA/efa: Add the efa_com.h file
    RDMA/efa: Add the efa.h header file
    RDMA/efa: Add EFA device definitions
    RDMA: Add EFA related definitions
    RDMA/umem: Remove hugetlb flag
    RDMA/bnxt_re: Use core helpers to get aligned DMA address
    RDMA/i40iw: Use core helpers to get aligned DMA address within a supported page size
    RDMA/verbs: Add a DMA iterator to return aligned contiguous memory blocks
    RDMA/umem: Add API to find best driver supported page size in an MR
    ...

    Linus Torvalds
     

09 May, 2019

1 commit

  • Pull Kbuild updates from Masahiro Yamada:

    - allow users to invoke 'make' out of the source tree

    - refactor scripts/mkmakefile

    - deprecate KBUILD_SRC, which was used to track the source tree
    location for O= build.

    - fix recordmcount.pl in case objdump output is localized

    - turn unresolved symbols in external modules to errors from warnings
    by default; pass KBUILD_MODPOST_WARN=1 to get them back to warnings

    - generate modules.builtin.modinfo to collect .modinfo data from
    built-in modules

    - misc Makefile cleanups

    * tag 'kbuild-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (21 commits)
    .gitignore: add more all*.config patterns
    moduleparam: Save information about built-in modules in separate file
    Remove MODULE_ALIAS() calls that take undefined macro
    .gitignore: add leading and trailing slashes to generated directories
    scripts/tags.sh: fix direct execution of scripts/tags.sh
    scripts: override locale from environment when running recordmcount.pl
    samples: kobject: allow CONFIG_SAMPLE_KOBJECT to become y
    samples: seccomp: turn CONFIG_SAMPLE_SECCOMP into a bool option
    kbuild: move Documentation to vmlinux-alldirs
    kbuild: move samples/ to KBUILD_VMLINUX_OBJS
    modpost: make KBUILD_MODPOST_WARN also configurable for external modules
    kbuild: check arch/$(SRCARCH)/include/generated before out-of-tree build
    kbuild: remove unneeded dependency for include/config/kernel.release
    memory: squash drivers/memory/Makefile.asm-offsets
    kbuild: use $(srctree) instead of KBUILD_SRC to check out-of-tree build
    kbuild: mkmakefile: generate a simple wrapper of top Makefile
    kbuild: mkmakefile: do not check the generated Makefile marker
    kbuild: allow Kbuild to start from any directory
    kbuild: pass $(MAKECMDGOALS) to sub-make as is
    kbuild: fix warning "overriding recipe for target 'Makefile'"
    ...

    Linus Torvalds
     

08 May, 2019

3 commits

  • Pull networking updates from David Miller:
    "Highlights:

    1) Support AES128-CCM ciphers in kTLS, from Vakul Garg.

    2) Add fib_sync_mem to control the amount of dirty memory we allow to
    queue up between synchronize RCU calls, from David Ahern.

    3) Make flow classifier more lockless, from Vlad Buslov.

    4) Add PHY downshift support to aquantia driver, from Heiner
    Kallweit.

    5) Add SKB cache for TCP rx and tx, from Eric Dumazet. This reduces
    contention on SLAB spinlocks in heavy RPC workloads.

    6) Partial GSO offload support in XFRM, from Boris Pismenny.

    7) Add fast link down support to ethtool, from Heiner Kallweit.

    8) Use siphash for IP ID generator, from Eric Dumazet.

    9) Pull nexthops even further out from ipv4/ipv6 routes and FIB
    entries, from David Ahern.

    10) Move skb->xmit_more into a per-cpu variable, from Florian
    Westphal.

    11) Improve eBPF verifier speed and increase maximum program size,
    from Alexei Starovoitov.

    12) Eliminate per-bucket spinlocks in rhashtable, and instead use bit
    spinlocks. From Neil Brown.

    13) Allow tunneling with GUE encap in ipvs, from Jacky Hu.

    14) Improve link partner cap detection in generic PHY code, from
    Heiner Kallweit.

    15) Add layer 2 encap support to bpf_skb_adjust_room(), from Alan
    Maguire.

    16) Remove SKB list implementation assumptions in SCTP, yours truly.

    17) Various cleanups, optimizations, and simplifications in r8169
    driver. From Heiner Kallweit.

    18) Add memory accounting on TX and RX path of SCTP, from Xin Long.

    19) Switch PHY drivers over to use dynamic feature detection, from
    Heiner Kallweit.

    20) Support flow steering without masking in dpaa2-eth, from Ioana
    Ciocoi.

    21) Implement ndo_get_devlink_port in netdevsim driver, from Jiri
    Pirko.

    22) Increase the strict parsing of current and future netlink
    attributes, also export such policies to userspace. From Johannes
    Berg.

    23) Allow DSA tag drivers to be modular, from Andrew Lunn.

    24) Remove legacy DSA probing support, also from Andrew Lunn.

    25) Allow ll_temac driver to be used on non-x86 platforms, from Esben
    Haabendal.

    26) Add a generic tracepoint for TX queue timeouts to ease debugging,
    from Cong Wang.

    27) More indirect call optimizations, from Paolo Abeni"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1763 commits)
    cxgb4: Fix error path in cxgb4_init_module
    net: phy: improve pause mode reporting in phy_print_status
    dt-bindings: net: Fix a typo in the phy-mode list for ethernet bindings
    net: macb: Change interrupt and napi enable order in open
    net: ll_temac: Improve error message on error IRQ
    net/sched: remove block pointer from common offload structure
    net: ethernet: support of_get_mac_address new ERR_PTR error
    net: usb: smsc: fix warning reported by kbuild test robot
    staging: octeon-ethernet: Fix of_get_mac_address ERR_PTR check
    net: dsa: support of_get_mac_address new ERR_PTR error
    net: dsa: sja1105: Fix status initialization in sja1105_get_ethtool_stats
    vrf: sit mtu should not be updated when vrf netdev is the link
    net: dsa: Fix error cleanup path in dsa_init_module
    l2tp: Fix possible NULL pointer dereference
    taprio: add null check on sched_nest to avoid potential null pointer dereference
    net: mvpp2: cls: fix less than zero check on a u32 variable
    net_sched: sch_fq: handle non connected flows
    net_sched: sch_fq: do not assume EDT packets are ordered
    net: hns3: use devm_kcalloc when allocating desc_cb
    net: hns3: some cleanup for struct hns3_enet_ring
    ...

    Linus Torvalds
     
  • Pull mount ABI updates from Al Viro:
    "The syscalls themselves, finally.

    That's not all there is to that stuff, but switching individual
    filesystems to new methods is fortunately independent from everything
    else, so e.g. NFS series can go through NFS tree, etc.

    As those conversions get done, we'll be finally able to get rid of a
    bunch of duplication in fs/super.c introduced in the beginning of the
    entire thing. I expect that to be finished in the next window..."

    * 'work.mount-syscalls' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    vfs: Add a sample program for the new mount API
    vfs: syscall: Add fspick() to select a superblock for reconfiguration
    vfs: syscall: Add fsmount() to create a mount for a superblock
    vfs: syscall: Add fsconfig() for configuring and managing a context
    vfs: Implement logging through fs_context
    vfs: syscall: Add fsopen() to prepare for superblock creation
    Make anon_inodes unconditional
    teach move_mount(2) to work with OPEN_TREE_CLONE
    vfs: syscall: Add move_mount(2) to move mounts around
    vfs: syscall: Add open_tree(2) to reference or clone a mount

    Linus Torvalds
     
  • Pull driver core/kobject updates from Greg KH:
    "Here is the "big" set of driver core patches for 5.2-rc1

    There are a number of ACPI patches in here as well, as Rafael said
    they should go through this tree due to the driver core changes they
    required. They have all been acked by the ACPI developers.

    There are also a number of small subsystem-specific changes in here,
    due to some changes to the kobject core code. Those too have all been
    acked by the various subsystem maintainers.

    As for content, it's pretty boring outside of the ACPI changes:
    - spdx cleanups
    - kobject documentation updates
    - default attribute groups for kobjects
    - other minor kobject/driver core fixes

    All have been in linux-next for a while with no reported issues"

    * tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (47 commits)
    kobject: clean up the kobject add documentation a bit more
    kobject: Fix kernel-doc comment first line
    kobject: Remove docstring reference to kset
    firmware_loader: Fix a typo ("syfs" -> "sysfs")
    kobject: fix dereference before null check on kobj
    Revert "driver core: platform: Fix the usage of platform device name(pdev->name)"
    init/config: Do not select BUILD_BIN2C for IKCONFIG
    Provide in-kernel headers to make extending kernel easier
    kobject: Improve doc clarity kobject_init_and_add()
    kobject: Improve docs for kobject_add/del
    driver core: platform: Fix the usage of platform device name(pdev->name)
    livepatch: Replace klp_ktype_patch's default_attrs with groups
    cpufreq: schedutil: Replace default_attrs field with groups
    padata: Replace padata_attr_type default_attrs field with groups
    irqdesc: Replace irq_kobj_type's default_attrs field with groups
    net-sysfs: Replace ktype default_attrs field with groups
    block: Replace all ktype default_attrs with groups
    samples/kobject: Replace foo_ktype's default_attrs field with groups
    kobject: Add support for default attribute groups to kobj_type
    driver core: Postpone DMA tear-down until after devres release for probe failure
    ...

    Linus Torvalds
     

07 May, 2019

1 commit

  • This is a sample program showing userspace how to get race-free access
    to process metadata from a pidfd. It is rather easy to do and userspace
    can actually simply reuse code that currently parses a process's status
    file in procfs.
    The program can easily be extended into a generic helper suitable for
    inclusion in a libc to make it even easier for userspace to gain metadata
    access.

    Since this came up in a discussion because this API is going to be used
    in various service managers: A lot of programs will have a whitelist
    seccomp filter that returns an error for all new syscalls. This
    means that programs might get confused if CLONE_PIDFD works but the
    later pidfd_send_signal() syscall doesn't. Hence, here's an ahead-of-time
    check that pidfd_send_signal() is supported:

    bool pidfd_send_signal_supported()
    {
        int procfd = open("/proc/self", O_DIRECTORY | O_RDONLY | O_CLOEXEC);
        if (procfd < 0)
            return false;

        /*
         * A process is always allowed to signal itself so
         * pidfd_send_signal() should never fail this test. If it does
         * it must mean it is not available, blocked by an LSM, seccomp,
         * or other.
         */
        return pidfd_send_signal(procfd, 0, NULL, 0) == 0;
    }

    Signed-off-by: Christian Brauner
    Co-developed-by: Jann Horn
    Signed-off-by: Jann Horn
    Reviewed-by: Oleg Nesterov
    Cc: Arnd Bergmann
    Cc: "Eric W. Biederman"
    Cc: Kees Cook
    Cc: Thomas Gleixner
    Cc: David Howells
    Cc: "Michael Kerrisk (man-pages)"
    Cc: Andy Lutomirsky
    Cc: Andrew Morton
    Cc: Aleksa Sarai
    Cc: Linus Torvalds
    Cc: Al Viro

    Christian Brauner
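
The check quoted in the commit message above calls a pidfd_send_signal() wrapper that a C library may not provide. A minimal self-contained sketch of the same idea, going through syscall(2) directly, might look like the following. The fallback syscall number 424 is an assumption that only applies when the installed headers predate the syscall (and it differs on some architectures), and closing the descriptor before returning is an addition of this sketch, not part of the quoted code.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: fallback number for headers that predate the syscall. */
#ifndef __NR_pidfd_send_signal
#define __NR_pidfd_send_signal 424
#endif

/*
 * Returns true if sending signal 0 to ourselves through a /proc/self
 * descriptor succeeds. Signal 0 performs only the permission and
 * existence checks, so nothing is actually delivered.
 */
static bool pidfd_send_signal_supported(void)
{
    int procfd = open("/proc/self", O_DIRECTORY | O_RDONLY | O_CLOEXEC);
    if (procfd < 0)
        return false;

    long ret = syscall(__NR_pidfd_send_signal, procfd, 0, NULL, 0);

    /* Preserve errno from the probe across close(). */
    int saved_errno = errno;
    close(procfd);
    errno = saved_errno;

    return ret == 0;
}
```

A caller would typically run this once at startup and fall back to kill(2) when it returns false. On failure, errno hints at the cause, but the exact value depends on the kernel version and on any seccomp or LSM policy in effect.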
     

03 May, 2019

3 commits


26 Apr, 2019

2 commits


05 Apr, 2019

1 commit

  • clang started to error on invalid asm clobber usage in x86 headers
    and many bpf program samples failed to build with the message:

    CLANG-bpf /data/users/ast/bpf-next/samples/bpf/xdp_redirect_kern.o
    In file included from /data/users/ast/bpf-next/samples/bpf/xdp_redirect_kern.c:14:
    In file included from ../include/linux/in.h:23:
    In file included from ../include/uapi/linux/in.h:24:
    In file included from ../include/linux/socket.h:8:
    In file included from ../include/linux/uio.h:14:
    In file included from ../include/crypto/hash.h:16:
    In file included from ../include/linux/crypto.h:26:
    In file included from ../include/linux/uaccess.h:5:
    In file included from ../include/linux/sched.h:15:
    In file included from ../include/linux/sem.h:5:
    In file included from ../include/uapi/linux/sem.h:5:
    In file included from ../include/linux/ipc.h:9:
    In file included from ../include/linux/refcount.h:72:
    ../arch/x86/include/asm/refcount.h:72:36: error: asm-specifier for input or output variable conflicts with asm clobber list
    r->refs.counter, e, "er", i, "cx");
    ^
    ../arch/x86/include/asm/refcount.h:86:27: error: asm-specifier for input or output variable conflicts with asm clobber list
    r->refs.counter, e, "cx");
    ^
    2 errors generated.

    Override volatile() to work around the problem.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann

    Alexei Starovoitov
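
The commit message above does not show what "overriding volatile()" looks like. One plausible shape, offered here as an assumption rather than as the content of the patch, is a function-like macro named volatile: the preprocessor expands it only where the token is followed by an opening parenthesis, so `asm volatile(...)` statements are replaced by an empty asm while ordinary `volatile` qualifiers are left alone. The function name below is hypothetical.

```c
#include <stdio.h>

/*
 * Assumed form of the override. Must come after any system headers.
 * Only "volatile(" is affected; "volatile int x" is not, because a
 * function-like macro needs a following parenthesis to expand.
 */
#define volatile(...) volatile("")

/* Hypothetical demo: the asm body would not assemble if it survived. */
static int value_after_override(void)
{
    volatile int v = 7;   /* plain qualifier, macro does not expand */

    /* Expands to: __asm__ volatile(""); operands are discarded too. */
    __asm__ volatile("this is not valid assembly" : "+r"(v));

    return v;
}
```

Because the whole argument list is swallowed by the macro, the constraint and clobber lists that clang rejected never reach the compiler. The trade-off is that every inline asm statement seen after the define becomes a no-op, which is only acceptable when the affected code is never actually executed in that build.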
     

04 Apr, 2019

1 commit


28 Mar, 2019

1 commit


22 Mar, 2019

1 commit


21 Mar, 2019

1 commit

  • Add a sample program to demonstrate fsopen/fsmount/move_mount to mount
    something.

    To make it compile on all arches, irrespective of whether or not syscall
    numbers are assigned, define the syscall number as -1 when it isn't
    assigned, which causes the kernel to return -ENOSYS.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
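
The fallback described in the commit message above can be sketched as follows. This is an illustration under stated assumptions, not the sample program itself: sample_fsopen() and errno_for_unassigned_syscall() are hypothetical names, and only the `#ifndef`/`#define ... -1` pattern comes from the description.

```c
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* If the headers for this architecture lack the number, use -1. */
#ifndef __NR_fsopen
#define __NR_fsopen -1
#endif

/* Hypothetical wrapper: compiles everywhere, fails at run time if unassigned. */
static inline int sample_fsopen(const char *fs_name, unsigned int flags)
{
    return (int)syscall(__NR_fsopen, fs_name, flags);
}

/*
 * Hypothetical probe: invoke syscall number -1 and report the errno.
 * The kernel is expected to answer with ENOSYS; a sandbox or seccomp
 * filter may substitute a different error.
 */
static inline int errno_for_unassigned_syscall(void)
{
    errno = 0;
    if (syscall(-1) == -1)
        return errno;
    return 0;
}
```

The benefit of this pattern is that the build never depends on the architecture's syscall table. A missing syscall shows up as an ordinary run-time error that the program can report.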
     

11 Mar, 2019

1 commit

  • Pull networking fixes from David Miller:
    "First batch of fixes in the new merge window:

    1) Double dst_cache free in act_tunnel_key, from Wenxu.

    2) Avoid NULL deref in IN_DEV_MFORWARD() by failing early in the
    ip_route_input_rcu() path, from Paolo Abeni.

    3) Fix appletalk compile regression, from Arnd Bergmann.

    4) If SLAB objects reach the TCP sendpage method we are in serious
    trouble, so put a debugging check there. From Vasily Averin.

    5) Memory leak in hsr layer, from Mao Wenan.

    6) Only test GSO type on GSO packets, from Willem de Bruijn.

    7) Fix crash in xsk_diag_put_umem(), from Eric Dumazet.

    8) Fix VNIC mailbox length in nfp, from Dirk van der Merwe.

    9) Fix race in ipv4 route exception handling, from Xin Long.

    10) Missing DMA memory barrier in hns3 driver, from Jian Shen.

    11) Use after free in __tcf_chain_put(), from Vlad Buslov.

    12) Handle inet_csk_reqsk_queue_add() failures, from Guillaume Nault.

    13) Return value correction when ip_mc_may_pull() fails, from Eric
    Dumazet.

    14) Use after free in x25_device_event(), also from Eric"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (72 commits)
    gro_cells: make sure device is up in gro_cells_receive()
    vxlan: test dev->flags & IFF_UP before calling gro_cells_receive()
    net/x25: fix use-after-free in x25_device_event()
    isdn: mISDNinfineon: fix potential NULL pointer dereference
    net: hns3: fix to stop multiple HNS reset due to the AER changes
    ip: fix ip_mc_may_pull() return value
    net: keep refcount warning in reqsk_free()
    net: stmmac: Avoid one more sometimes uninitialized Clang warning
    net: dsa: mv88e6xxx: Set correct interface mode for CPU/DSA ports
    rxrpc: Fix client call queueing, waiting for channel
    tcp: handle inet_csk_reqsk_queue_add() failures
    net: ethernet: sun: Zero initialize class in default case in niu_add_ethtool_tcam_entry
    8139too : Add support for U.S. Robotics USR997901A 10/100 Cardbus NIC
    fou, fou6: avoid uninit-value in gue_err() and gue6_err()
    net: sched: fix potential use-after-free in __tcf_chain_put()
    vhost: silence an unused-variable warning
    vsock/virtio: fix kernel panic from virtio_transport_reset_no_sock
    connector: fix unsafe usage of ->real_parent
    vxlan: do not need BH again in vxlan_cleanup()
    net: hns3: add dma_rmb() for rx description
    ...

    Linus Torvalds
     

10 Mar, 2019

2 commits

  • Pull media updates from Mauro Carvalho Chehab:

    - remove sensor drivers that got converted from soc_camera

    - remaining soc_camera drivers got moved to staging

    - some documentation cleanups and improvements

    - the imx staging driver now supports imx7

    - the ov9640, mt9m001 and mt9m111 got converted from soc_camera

    - the vim2m driver now does what an m2m convert driver is expected to do

    - epoll() fixes on media subsystems

    - several drivers fixes, typos, cleanups and improvements

    * tag 'media/v5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (346 commits)
    media: dvb/earth-pt1: fix wrong initialization for demod blocks
    media: vim2m: Address some coding style issues
    media: vim2m: don't use BUG()
    media: vim2m: speedup passthrough copy
    media: vim2m: add an horizontal scaler
    media: vim2m: don't accept YUYV anymore as output format
    media: vim2m: add vertical linear scaler
    media: vim2m: better handle cap/out buffers with different sizes
    media: vim2m: use different framesizes for bayer formats
    media: vim2m: add support for VIDIOC_ENUM_FRAMESIZES
    media: vim2m: ensure that width is multiple of two
    media: vim2m: improve debug messages
    media: vim2m: add bayer capture formats
    media: a few more typos at staging, pci, platform, radio and usb
    media: Documentation: fix several typos
    media: staging: fix several typos
    media: include: fix several typos
    media: common: fix several typos
    media: v4l2-core: fix several typos
    media: usb: fix several typos
    ...

    Linus Torvalds
     
  • Pull documentation updates from Jonathan Corbet:
    "A fairly routine cycle for docs - lots of typo fixes, some new
    documents, and more translations. There's also some LICENSES
    adjustments from Thomas"

    * tag 'docs-5.1' of git://git.lwn.net/linux: (74 commits)
    docs: Bring some order to filesystem documentation
    Documentation/locking/lockdep: Drop last two chars of sample states
    doc: rcu: Suspicious RCU usage is a warning
    docs: driver-api: iio: fix errors in documentation
    Documentation/process/howto: Update for 4.x -> 5.x versioning
    docs: Explicitly state that the 'Fixes:' tag shouldn't split lines
    doc: security: Add kern-doc for lsm_hooks.h
    doc: sctp: Merge and clean up rst files
    Docs: Correct /proc/stat path
    scripts/spdxcheck.py: fix C++ comment style detection
    doc: fix typos in license-rules.rst
    Documentation: fix admin-guide/README.rst minimum gcc version requirement
    doc: process: complete removal of info about -git patches
    doc: translations: sync translations 'remove info about -git patches'
    perf-security: wrap paragraphs on 72 columns
    perf-security: elaborate on perf_events/Perf privileged users
    perf-security: document collected perf_events/Perf data categories
    perf-security: document perf_events/Perf resource control
    sysfs.txt: add note on available attribute macros
    docs: kernel-doc: typo "if ... if" -> "if ... is"
    ...

    Linus Torvalds
     

09 Mar, 2019

1 commit

  • Pull livepatching updates from Jiri Kosina:

    - support for something we call 'atomic replace', which allows for much
    better handling of cumulative patches (which is something very useful
    for distros), from Jason Baron with help of Petr Mladek and Joe
    Lawrence

    - improvement of handling of tasks blocking finalization, from Miroslav
    Benes

    - update of MAINTAINERS file to reflect move towards group
    maintainership

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching: (22 commits)
    livepatch/selftests: use "$@" to preserve argument list
    livepatch: Module coming and going callbacks can proceed with all listed patches
    livepatch: Proper error handling in the shadow variables selftest
    livepatch: return -ENOMEM on ptr_id() allocation failure
    livepatch: Introduce klp_for_each_patch macro
    livepatch: core: Return EOPNOTSUPP instead of ENOSYS
    selftests/livepatch: add DYNAMIC_DEBUG config dependency
    livepatch: samples: non static warnings fix
    livepatch: update MAINTAINERS
    livepatch: Remove signal sysfs attribute
    livepatch: Send a fake signal periodically
    selftests/livepatch: introduce tests
    livepatch: Remove ordering (stacking) of the livepatches
    livepatch: Atomic replace and cumulative patches documentation
    livepatch: Remove Nop structures when unused
    livepatch: Add atomic replace
    livepatch: Use lists to manage patches, objects and functions
    livepatch: Simplify API by removing registration step
    livepatch: Don't block the removal of patches loaded after a forced transition
    livepatch: Consolidate klp_free functions
    ...

    Linus Torvalds
     

08 Mar, 2019

1 commit


07 Mar, 2019

1 commit


06 Mar, 2019

1 commit

  • Pull networking updates from David Miller:
    "Here we go, another merge window full of networking and #ebpf changes:

    1) Snoop DHCPACKS in batman-adv to learn MAC/IP pairs in the DHCP
    range without dealing with floods of ARP traffic, from Linus
    Lüssing.

    2) Throttle buffered multicast packet transmission in mt76, from
    Felix Fietkau.

    3) Support adaptive interrupt moderation in ice, from Brett Creeley.

    4) A lot of struct_size conversions, from Gustavo A. R. Silva.

    5) Add peek/push/pop commands to bpftool, as well as bash completion,
    from Stanislav Fomichev.

    6) Optimize sk_msg_clone(), from Vakul Garg.

    7) Add SO_BINDTOIFINDEX, from David Herrmann.

    8) Be more conservative with local resends due to local congestion,
    from Yuchung Cheng.

    9) Allow vetoing of unsupported VXLAN FDBs, from Petr Machata.

    10) Add health buffer support to devlink, from Eran Ben Elisha.

    11) Add TXQ scheduling API to mac80211, from Toke Høiland-Jørgensen.

    12) Add statistics to basic packet scheduler filter, from Cong Wang.

    13) Add GRE tunnel support for mlxsw Spectrum-2, from Nir Dotan.

    14) Lots of new IP tunneling forwarding tests, also from Nir Dotan.

    15) Add 3ad stats to bonding, from Nikolay Aleksandrov.

    16) Lots of probing improvements for bpftool, from Quentin Monnet.

    17) Various nfp driver #ebpf JIT improvements from Jakub Kicinski.

    18) Allow #ebpf programs to access gso_segs from skb shared info, from
    Eric Dumazet.

    19) Add sock_diag support for AF_XDP sockets, from Björn Töpel.

    20) Support 22260 iwlwifi devices, from Luca Coelho.

    21) Use rbtree for ipv6 defragmentation, from Peter Oskolkov.

    22) Add JMP32 instruction class support to #ebpf, from Jiong Wang.

    23) Add spinlock support to #ebpf, from Alexei Starovoitov.

    24) Support 256-bit keys and TLS 1.3 in ktls, from Dave Watson.

    25) Add device information API to devlink, from Jakub Kicinski.

    26) Add new timestamping socket options which are y2038 safe, from
    Deepa Dinamani.

    27) Add RX checksum offloading for various sh_eth chips, from Sergei
    Shtylyov.

    28) Flow offload infrastructure, from Pablo Neira Ayuso.

    29) Numerous cleanups, improvements, and bug fixes to the PHY layer
    and many drivers from Heiner Kallweit.

    30) Lots of changes to try and make packet scheduler classifiers run
    lockless as much as possible, from Vlad Buslov.

    31) Support BCM957504 chip in bnxt_en driver, from Erik Burrows.

    32) Add concurrency tests to tc-tests infrastructure, from Vlad
    Buslov.

    33) Add hwmon support to aquantia, from Heiner Kallweit.

    34) Allow 64-bit values for SO_MAX_PACING_RATE, from Eric Dumazet.

    And I would be remiss if I didn't thank the various major networking
    subsystem maintainers for integrating much of this work before I even
    saw it. Alexei Starovoitov, Daniel Borkmann, Pablo Neira Ayuso,
    Johannes Berg, Kalle Valo, and many others. Thank you!"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2207 commits)
    net/sched: avoid unused-label warning
    net: ignore sysctl_devconf_inherit_init_net without SYSCTL
    phy: mdio-mux: fix Kconfig dependencies
    net: phy: use phy_modify_mmd_changed in genphy_c45_an_config_aneg
    net: dsa: mv88e6xxx: add call to mv88e6xxx_ports_cmode_init to probe for new DSA framework
    selftest/net: Remove duplicate header
    sky2: Disable MSI on Dell Inspiron 1545 and Gateway P-79
    net/mlx5e: Update tx reporter status in case channels were successfully opened
    devlink: Add support for direct reporter health state update
    devlink: Update reporter state to error even if recover aborted
    sctp: call iov_iter_revert() after sending ABORT
    team: Free BPF filter when unregistering netdev
    ip6mr: Do not call __IP6_INC_STATS() from preemptible context
    isdn: mISDN: Fix potential NULL pointer dereference of kzalloc
    net: dsa: mv88e6xxx: support in-band signalling on SGMII ports with external PHYs
    cxgb4/chtls: Prefix adapter flags with CXGB4
    net-sysfs: Switch to bitmap_zalloc()
    mellanox: Switch to bitmap_zalloc()
    bpf: add test cases for non-pointer sanitiation logic
    mlxsw: i2c: Extend initialization by querying resources data
    ...

    Linus Torvalds
     

05 Mar, 2019

2 commits

  • Document change towards group maintainership of livepatching code
    samples/ warning fix from Nicholas Mc Guire

    Jiri Kosina
     
  • Pull VFIO updates from Alex Williamson:

    - Switch mdev to generic UUID API (Andy Shevchenko)

    - Fixup platform reset include paths (Masahiro Yamada)

    - Fix usage of MINORMASK (Chengguang Xu)

    - Remove noise from duplicate spapr table unsets (Alexey Kardashevskiy)

    - Restore device state after PM reset (Alex Williamson)

    - Ensure memory translation enabled for PCI ROM access (Eric Auger)

    * tag 'vfio-v5.1-rc1' of git://github.com/awilliam/linux-vfio:
    vfio_pci: Enable memory accesses before calling pci_map_rom
    vfio/pci: Restore device state on PM transition
    vfio/spapr_tce: Skip unsetting already unset table
    samples/vfio-mdev/mtty: expand minor range when registering chrdev region
    samples/vfio-mdev/mdpy: expand minor range when registering chrdev region
    samples/vfio-mdev/mbochs: expand minor range when registering chrdev region
    vfio: expand minor range when registering chrdev region
    vfio: platform: reset: fix up include directives to remove ccflags-y
    vfio-mdev: Switch to use new generic UUID API

    Linus Torvalds
     

03 Mar, 2019

3 commits

  • Script for testing HBM (Host Bandwidth Manager) framework.
    It creates a cgroup to use for testing and loads a BPF program to limit
    egress bandwidth. It then uses iperf3 or netperf to create
    loads. The output is the goodput in Mbps (unless -D is used).

    It can work on a single host using loopback or between two hosts (with netperf).
    When using loopback, it is recommended to also introduce a delay of at least
    1ms (-d=1), otherwise the assigned bandwidth is likely to be underutilized.

    USAGE: $name [out] [-b=|--bpf=] [-c=|--cc=] [-D]
    [-d=|--delay=] [--debug] [-E]
    [-f=|--flows=] [-h] [-i=|--id=] [-l]
    [-N] [-p=|--port=] [-P] [-q=]
    [-R] [-s=|--server=|--time=] [-w] [cubic|dctcp]
    Where:
      out              Egress (default egress)
      -b or --bpf      BPF program filename to load and attach.
                       Default is nrm_out_kern.o for egress,
      -c or --cc       TCP congestion control (cubic or dctcp)
      -d or --delay    Add a delay in ms using netem
      -D               In addition to the goodput in Mbps, it also outputs
                       other detailed information. This information is
                       test dependent (i.e. iperf3 or netperf).
      --debug          Print BPF trace buffer
      -E               Enable ECN (not required for dctcp)
      -f or --flows    Number of concurrent flows (default=1)
      -i or --id       cgroup id (an integer, default is 1)
      -l               Do not limit flows using loopback
      -N               Use netperf instead of iperf3
      -h               Help
      -p or --port     iperf3 port (default is 5201)
      -P               Use an iperf3 instance for each flow
      -q               Use the specified qdisc.
      -r or --rate     Rate in Mbps (default is 1Gbps)
      -R               Use TCP_RR for netperf. 1st flow has req
                       size of 10KB, rest of 1MB. Reply in all
                       cases is 1 byte.
                       More detailed output for each flow can be found
                       in the files netperf.., where is the
                       cgroup id as specified with the -i flag, and
                       is the flow id starting at 1 and increasing by 1 for
                       each flow (as specified by -f).
      -s or --server   hostname of netperf server. Used to create netperf
                       test traffic between two hosts (default is within host)
                       netserver must be running on the host.
      --stats          Get HBM stats (marked, dropped, etc.)
      -t or --time     duration of iperf3 in seconds (default=5)
      -w               Work conserving flag. cgroup can increase its
                       bandwidth beyond the rate limit specified
                       while there is available bandwidth. Current
                       implementation assumes there is only one NIC
                       (eth0), but can be extended to support multiple
                       NICs. This is just a proof of concept.
      cubic or dctcp   specify TCP CC to use

    Examples:
    ./do_hbm_test.sh -l -d=1 -D --stats
    Runs a 5 second test, using a single iperf3 flow and with the default
    rate limit of 1Gbps and a delay of 1ms (using netem) using the default
    TCP congestion control on the loopback device (hence we use "-l" to
    enforce bandwidth limit on loopback device). Since no direction is
    specified, it defaults to egress. Since no TCP CC algorithm is
    specified it uses the system default (Cubic for this test).
    With no -D flag, only the value of the AGGREGATE OUTPUT would show.
    id refers to the cgroup id and is useful when running multi cgroup
    tests (supported by a future patch).
    This patchset does not support calling TCP's congestion window
    reduction, even when packets are dropped by the BPF program, resulting
    in a large number of packets dropped. It is recommended that the current
    HBM implementation only be used with ECN enabled flows. A future patch
    will add support for reducing TCP's cwnd and will increase the
    performance of non-ECN enabled flows.
    Output:
    Details for HBM in cgroup 1
    id:1
    rate_mbps:493
    duration:4.8 secs
    packets:11355
    bytes_MB:590
    pkts_dropped:4497
    bytes_dropped_MB:292
    pkts_marked_percent: 39.60
    bytes_marked_percent: 49.49
    pkts_dropped_percent: 39.60
    bytes_dropped_percent: 49.49
    PING AVG DELAY:2.075
    AGGREGATE_GOODPUT:505
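
    The dropped percentages in the output above appear consistent with
    plain ratios of the printed counters (4497 of 11355 packets, 292 of
    590 MB). A minimal sketch of that arithmetic follows; the helper name
    percent_of is hypothetical, and the assumption that HBM derives the
    values this way is not confirmed by the commit message.

```c
#include <assert.h>

/*
 * Hypothetical helper: percentage of "part" relative to "total".
 * Assumes the reported *_dropped_percent values are simple ratios
 * of the counters printed alongside them.
 */
static double percent_of(double part, double total)
{
	assert(total > 0.0);	/* avoid dividing by zero */
	return 100.0 * part / total;
}
```

    With the counters from the first example, percent_of(4497, 11355)
    is about 39.60 and percent_of(292, 590) is about 49.49.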

    ./do_hbm_test.sh -l -d=1 -D --stats dctcp
    Same as above but using dctcp. Note that fewer bytes are dropped
    (0.01% vs. 49%).
    Output:
    Details for HBM in cgroup 1
    id:1
    rate_mbps:945
    duration:4.9 secs
    packets:16859
    bytes_MB:578
    pkts_dropped:1
    bytes_dropped_MB:0
    pkts_marked_percent: 28.74
    bytes_marked_percent: 45.15
    pkts_dropped_percent: 0.01
    bytes_dropped_percent: 0.01
    PING AVG DELAY:2.083
    AGGREGATE_GOODPUT:965

    ./do_hbm_test.sh -d=1 -D --stats
    As first example, but without limiting loopback device (i.e. no
    "-l" flag). Since there is no bandwidth limiting, no details for
    HBM are printed out.
    Output:
    Details for HBM in cgroup 1
    PING AVG DELAY:2.019
    AGGREGATE_GOODPUT:42655

    ./do_hbm.sh -l -d=1 -D --stats -f=2
    Uses iperf3 and does 2 flows
    ./do_hbm.sh -l -d=1 -D --stats -f=4 -P
    Uses iperf3 and does 4 flows, each flow as a separate process.
    ./do_hbm.sh -l -d=1 -D --stats -f=4 -N
    Uses netperf, 4 flows
    ./do_hbm.sh -f=1 -r=2000 -t=5 -N -D --stats dctcp -s=
    Uses netperf between two hosts. The remote host name is specified
    with -s= and you need to start the program netserver manually on
    the remote host. It will use 1 flow, a rate limit of 2Gbps and dctcp.
    ./do_hbm.sh -f=1 -r=2000 -t=5 -N -D --stats -w dctcp \
    -s=
    As previous, but allows use of extra bandwidth. For this test the
    rate is 8Gbps vs. 1Gbps of the previous test.

    Signed-off-by: Lawrence Brakmo
    Signed-off-by: Alexei Starovoitov

    brakmo
     
  • The program hbm creates a cgroup and attaches a BPF program to the
    cgroup for testing HBM (Host Bandwidth Manager) for egress traffic.
    One still needs to create network traffic. This can be done through
    netesto, netperf or iperf3.
    A follow-up patch contains a script to create traffic.

    USAGE: hbm [-d] [-l] [-n ] [-r ] [-s] [-t ]
    [-w] [-h] [prog]
    Where:
    -d Print BPF trace debug buffer
    -l Also limit flows doing loopback
    -n To create cgroup "/hbm#" and attach prog. Default is /hbm1
    This is convenient when testing HBM in more than 1 cgroup
    -r Rate limit in Mbps
    -s Get HBM stats (marked, dropped, etc.)
    -t Exit after specified seconds (default is 0)
    -w Work conserving flag. cgroup can increase its bandwidth
    beyond the rate limit specified while there is available
    bandwidth. Current implementation assumes there is only one
    NIC (eth0), but can be extended to support multiple NICs.
    Currently only supported for egress. Note, this is just
    a proof of concept.
    -h Print this info
    prog BPF program file name. Name defaults to hbm_out_kern.o

    More information about HBM can be found in the paper "BPF Host Resource
    Management" presented at the 2018 Linux Plumbers Conference, Networking Track
    (http://vger.kernel.org/lpc_net2018_talks/LPC%20BPF%20Network%20Resource%20Paper.pdf)

    Signed-off-by: Lawrence Brakmo
    Signed-off-by: Alexei Starovoitov

    brakmo
     
  • A cgroup skb BPF program to limit cgroup output bandwidth.
    It uses a modified virtual token bucket queue to limit average
    egress bandwidth. The implementation uses credits instead of tokens.
    Negative credits imply that queueing would have happened (this is
    a virtual queue, so no queueing is done by it; however, queueing may
    occur at the actual qdisc, which is not used for rate limiting).

    This implementation uses 3 thresholds, one to start marking packets and
    the other two to drop packets:
                                 CREDIT
        - <--------------------------|------------------------> +
              |    |          |      0
              |  Large pkt    |
              |  drop thresh  |
      Small pkt drop      Mark threshold
          thresh

    The effect of marking depends on the type of packet:
    a) If the packet is ECN enabled, then the packet is ECN CE marked.
    The current mark threshold is tuned for DCTCP.
    b) Else, it is dropped if it is a large packet.

    If the credit is below the drop threshold, the packet is dropped.
    Note that dropping a packet through the BPF program does not trigger CWR
    (Congestion Window Reduction) in TCP packets. A future patch will add
    support for triggering CWR.

    This BPF program actually uses 2 drop thresholds, one threshold
    for larger packets (>= 120 bytes) and another for smaller packets. This
    protects smaller packets such as SYNs, ACKs, etc.
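
    The credit scheme described above can be sketched in plain user-space
    C as follows. This is an illustration under stated assumptions, not the
    BPF program from the patch: the type and function names (struct vqueue,
    vq_decide), every threshold value and the credit cap are hypothetical,
    and not charging dropped packets is a simplification chosen for the
    sketch. Only the 120-byte large-packet boundary and the mark/drop rules
    come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only (assumptions, not taken from the patch). */
#define MAX_CREDIT		(100 * 1500)	/* cap on saved-up credit, bytes */
#define MARK_THRESH		(-20 * 1500)	/* start marking below this */
#define LARGE_DROP_THRESH	(-40 * 1500)	/* drop large pkts below this */
#define SMALL_DROP_THRESH	(-60 * 1500)	/* drop all pkts below this */
#define LARGE_PKT_BYTES		120		/* ">= 120 bytes" per the text */

enum vq_verdict { VQ_PASS, VQ_MARK_CE, VQ_DROP };

struct vqueue {
	int64_t credit;			/* bytes; negative = virtual backlog */
	uint64_t last_ns;		/* timestamp of the previous update */
	uint64_t rate_bytes_per_sec;	/* configured bandwidth limit */
};

static enum vq_verdict vq_decide(struct vqueue *q, uint64_t now_ns,
				 uint32_t len, bool ecn_capable)
{
	bool large = len >= LARGE_PKT_BYTES;
	uint64_t delta_ns;
	int64_t before;

	assert(q != NULL);

	/* Earn credit for the time elapsed since the last packet.
	 * The multiplication can overflow for very long idle gaps;
	 * that is ignored in this sketch. */
	delta_ns = now_ns - q->last_ns;
	q->credit += (int64_t)(delta_ns * q->rate_bytes_per_sec / 1000000000ULL);
	if (q->credit > MAX_CREDIT)
		q->credit = MAX_CREDIT;
	q->last_ns = now_ns;
	before = q->credit;

	/* Two drop thresholds: small packets (SYNs, ACKs) survive longer. */
	if (before < SMALL_DROP_THRESH ||
	    (large && before < LARGE_DROP_THRESH))
		return VQ_DROP;

	/* Below the mark threshold: CE-mark if ECN enabled,
	 * otherwise drop large packets. */
	if (before < MARK_THRESH && !ecn_capable && large)
		return VQ_DROP;

	q->credit -= len;	/* charge only packets that are sent */
	return (before < MARK_THRESH && ecn_capable) ? VQ_MARK_CE : VQ_PASS;
}
```

    A caller would invoke vq_decide() once per egress packet with a
    monotonic timestamp; a credit that stays positive means the cgroup is
    sending below its configured rate.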

    The default bandwidth limit is set at 1Gbps but this can be changed by
    a user program through a shared BPF map. In addition, by default this BPF
    program does not limit connections using loopback. This behavior can be
    overridden by the user program. There is also an option to calculate
    some statistics, such as percent of packets marked or dropped, which
    the user program can access.

    A later patch provides such a program (hbm.c)

    Signed-off-by: Lawrence Brakmo
    Signed-off-by: Alexei Starovoitov

    brakmo
     

02 Mar, 2019

1 commit

  • Compiling xdpsock_user.c with gcc 4.8.5, I hit the following
    compilation warning:
    HOSTCC samples/bpf/xdpsock_user.o
    /data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c: In function ‘main’:
    /data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c:449:6: warning: ‘idx_cq’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    u32 idx_cq, idx_fq;
    ^
    /data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c:606:7: warning: ‘idx_rx’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    u32 idx_rx, idx_tx = 0;
    ^
    /data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c:506:6: warning: ‘idx_rx’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    u32 idx_rx, idx_fq = 0;

    As an example, the code pattern looks like:
    u32 idx_fq;
    ...
    ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
    if (ret) {
    ...
    }
    ... idx_fq ...
    The compiler warns since it does not know whether idx_fq is assigned
    or not inside the library function xsk_ring_prod__reserve().

    Let us assign an initial value 0 to such auto variables to silence
    the compiler warnings.
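
    The pattern and the fix can be sketched in a self-contained way as
    below. ring_reserve() is a hypothetical stand-in for a library call
    such as xsk_ring_prod__reserve() that writes its out-parameter only on
    success; its signature and the index value 7 are made up for the
    sketch. Because the stand-in is visible in the same file, a compiler
    may not emit the warning here; the point is only the shape of the fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for an opaque library function: it stores a
 * start index through "idx" only when the reservation succeeds.
 */
static size_t ring_reserve(size_t free_slots, size_t nb, uint32_t *idx)
{
	assert(idx != NULL);
	if (nb > free_slots)
		return 0;	/* failure: *idx is left untouched */
	*idx = 7;		/* arbitrary start index for the sketch */
	return nb;
}

static uint32_t first_index(size_t free_slots, size_t nb)
{
	uint32_t idx = 0;	/* the fix: give the variable an initial value */

	if (ring_reserve(free_slots, nb, &idx) != nb)
		return 0;	/* nothing reserved */
	return idx;		/* defined on every path */
}
```

    Initializing to 0 changes no behavior when the call succeeds; it only
    guarantees a defined value on paths where the callee did not write
    the variable.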

    Fixes: 248c7f9c0e21 ("samples/bpf: convert xdpsock to use libbpf for AF_XDP access")
    Signed-off-by: Yonghong Song
    Acked-by: Jonathan Lemon
    Acked-by: Song Liu
    Signed-off-by: Daniel Borkmann

    Yonghong Song
     

01 Mar, 2019

3 commits

  • Some samples don't really need the magic of bpf_load,
    switch them to libbpf.

    v2: - specify program types.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Quentin Monnet
    Acked-by: Andrii Nakryiko
    Signed-off-by: Daniel Borkmann

    Jakub Kicinski
     
  • bpftool can do all the things load_sock_ops used to do, and more.
    Point users to bpftool instead of maintaining this sample utility.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Quentin Monnet
    Acked-by: Andrii Nakryiko
    Signed-off-by: Daniel Borkmann

    Jakub Kicinski
     
  • ping localhost may default to IPv6 on modern systems, but
    samples are trying to only parse IPv4. Force IPv4.

    samples/bpf/tracex1_user.c doesn't interpret the packet so
    we don't care which IP version will be used there.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Quentin Monnet
    Acked-by: Andrii Nakryiko
    Signed-off-by: Daniel Borkmann

    Jakub Kicinski
     

28 Feb, 2019

1 commit

  • Currently, running the samples "task_fd_query" and "tracex3" produces the
    following error. On kernel v5.0-rc* these samples are unavailable
    due to the removal of the function 'blk_start_request' in commit "a1ce35f"
    (the function was removed because the "Single Queue IO scheduler" no longer exists).

    $ sudo ./task_fd_query
    failed to create kprobe 'blk_start_request' error 'No such file or
    directory'

    This commit will change the function 'blk_start_request' to
    'blk_mq_start_request' to fix the broken sample.

    Signed-off-by: Daniel T. Lee
    Signed-off-by: Daniel Borkmann

    Daniel T. Lee
     

26 Feb, 2019

1 commit

  • This commit converts the xdpsock sample application to use the AF_XDP
    functions present in libbpf. This cuts down the size of it by nearly
    300 lines of code.

    The default ring sizes and the batch size have been increased and the
    size of the umem area has been decreased, so that the sample application
    provides higher throughput. Note also that the shared umem code
    has been removed from the sample as this is not supported by libbpf
    at this point in time.

    Tested-by: Björn Töpel
    Signed-off-by: Magnus Karlsson
    Signed-off-by: Daniel Borkmann

    Magnus Karlsson
     

22 Feb, 2019

1 commit

  • The xdp_redirect and xdp_redirect_map sample programs both load a dummy
    program onto the egress interfaces. However, the unload code checks these
    programs against the wrong id number, and thus refuses to unload them. Fix
    the comparison to avoid this.

    Fixes: 3b7a8ec2dec3 ("samples/bpf: Check the prog id before exiting")
    Signed-off-by: Toke Høiland-Jørgensen
    Acked-by: Maciej Fijalkowski
    Acked-by: Martin KaFai Lau
    Signed-off-by: Daniel Borkmann

    Toke Høiland-Jørgensen
     

18 Feb, 2019

1 commit

  • Linux 5.0-rc7

    * tag 'v5.0-rc7': (1667 commits)
    Linux 5.0-rc7
    Input: elan_i2c - add ACPI ID for touchpad in Lenovo V330-15ISK
    Input: st-keyscan - fix potential zalloc NULL dereference
    Input: apanel - switch to using brightness_set_blocking()
    powerpc/64s: Fix possible corruption on big endian due to pgd/pud_present()
    efi/arm: Revert "Defer persistent reservations until after paging_init()"
    arm64, mm, efi: Account for GICv3 LPI tables in static memblock reserve table
    sunrpc: fix 4 more call sites that were using stack memory with a scatterlist
    include/linux/module.h: copy __init/__exit attrs to init/cleanup_module
    Compiler Attributes: add support for __copy (gcc >= 9)
    lib/crc32.c: mark crc32_le_base/__crc32c_le_base aliases as __pure
    auxdisplay: ht16k33: fix potential user-after-free on module unload
    x86/platform/UV: Use efi_runtime_lock to serialise BIOS calls
    i2c: bcm2835: Clear current buffer pointers and counts after a transfer
    i2c: cadence: Fix the hold bit setting
    drm: Use array_size() when creating lease
    dm thin: fix bug where bio that overwrites thin block ignores FUA
    Revert "exec: load_script: don't blindly truncate shebang string"
    Revert "gfs2: read journal in large chunks to locate the head"
    net: ethernet: freescale: set FEC ethtool regs version
    ...

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     

13 Feb, 2019

1 commit