15 Aug, 2019

2 commits

  • Realtek provided information on how the new NIC-integrated PHYs
    expose whether they support 2.5G/5G/10G. This makes it possible to
    automatically differentiate 1Gbps and 2.5Gbps PHYs, and therefore to
    remove the fake PHY ID mechanism for RTL8125.
    So far RTL8125 supports 2.5Gbps only, but the register layout for
    faster modes has already been defined, so let's use this information
    to be future-proof.

    Signed-off-by: Heiner Kallweit
    Signed-off-by: David S. Miller

    Heiner Kallweit
     
  • Holger reported sporadic transmit timeouts, and it turned out that one
    path fails to ring the doorbell. The fix was suggested by Eric.

    Fixes: ef14358546b1 ("r8169: make use of xmit_more")
    Suggested-by: Eric Dumazet
    Tested-by: Holger Hoffstätte
    Signed-off-by: Heiner Kallweit
    Signed-off-by: David S. Miller

    Heiner Kallweit
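    The xmit_more contract can be illustrated with a small user-space model.
    A driver may defer the doorbell while the stack promises more packets,
    but if it stops the queue, no further packet will arrive to flush the
    deferred write, so it must ring the doorbell there as well. The struct,
    function names and ring size below are hypothetical; this is a sketch of
    the general rule, not the r8169 code, and it does not claim to reproduce
    the exact path the fix changed.

```c
#include <stdbool.h>
#include <assert.h>

#define RING_SIZE 4 /* hypothetical ring size */

struct tx_model {
	unsigned int queued;    /* descriptors handed to the NIC */
	unsigned int doorbells; /* how often the NIC was told to fetch */
	bool pending;           /* descriptors written, doorbell not rung */
	bool stopped;           /* queue stopped because the ring is full */
};

/* Queue one packet. 'more' mirrors the xmit_more hint: the stack promises
 * another packet soon, so the doorbell may be deferred. */
static void model_xmit(struct tx_model *tx, bool more)
{
	assert(!tx->stopped && tx->queued < RING_SIZE);
	tx->queued++;
	tx->pending = true;

	bool ring = !more;
	if (tx->queued == RING_SIZE) {
		/* Ring full: the stack will not call us again until
		 * completions free space, so the promised next packet never
		 * comes. Ring the doorbell here regardless of 'more'. */
		tx->stopped = true;
		ring = true;
	}
	if (ring) {
		tx->doorbells++;
		tx->pending = false;
	}
}
```

    With four deferred packets, the fourth fills the ring and stops the
    queue, so the doorbell is rung exactly once and nothing is left pending.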
     

14 Aug, 2019

18 commits

  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS updates for net-next

    The following patchset contains Netfilter/IPVS updates for net-next:

    1) Rename mss field to mss_option field in synproxy, from Fernando Mancera.

    2) Use SYSCTL_{ZERO,ONE} definitions in conntrack, from Matteo Croce.

    3) More strict validation of IPVS sysctl values, from Junwei Hu.

    4) Remove unnecessary spaces on the right-hand side of assignments,
    from yangxingwu.

    5) Add offload support for bitwise operation.

    6) Extend the nft_offload_reg structure to store immediate data.

    7) Collapse several ip_set header files into ip_set.h, from
    Jeremy Sowden.

    8) Make netfilter headers compile with CONFIG_KERNEL_HEADER_TEST=y,
    from Jeremy Sowden.

    9) Fix several sparse warnings due to missing prototypes, from
    Valdis Kletnieks.

    10) Use a static lock initialiser to ensure the connlabel spinlock is
    initialised at boot time, fixing sched/act_ct.c, patch
    from Florian Westphal.
    ====================

    Signed-off-by: Jakub Kicinski

    Jakub Kicinski
     
  • Hayes says:

    ====================
    v2:
    For patch #2, replace list_for_each_safe with list_for_each_entry_safe.
    Remove unlikely in WARN_ON. Adjust the coding style.

    For patch #4, replace list_for_each_safe with list_for_each_entry_safe.
    Remove "else" after "continue".

    For patch #5, replace sysfs with ethtool to modify rx_copybreak and
    rx_pending.

    v1:
    The different chips use different rx buffer sizes.

    Use skb_add_rx_frag() to reduce memory copies for RX.
    ====================

    Signed-off-by: Jakub Kicinski

    Jakub Kicinski
     
  • Allow rx_copybreak and rx_pending to be modified via
    ethtool.

    Signed-off-by: Hayes Wang
    Signed-off-by: Jakub Kicinski

    Hayes Wang
     
  • Use skb_add_rx_frag() to reduce memory copies of rx data.

    Use a new rx_used list to store the rx buffers which can't be
    reused yet.

    Besides, the total number of rx buffers may be increased or decreased
    dynamically, up to the limit of RTL8152_MAX_RX_AGG.

    Signed-off-by: Hayes Wang
    Signed-off-by: Jakub Kicinski

    Hayes Wang
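    A minimal sketch of the copy-versus-fragment idea behind
    skb_add_rx_frag(): copy at most a small threshold (here called
    copybreak, in the spirit of rx_copybreak) into the packet's own buffer
    and reference the remainder in place. struct rx_pkt and build_rx_pkt()
    are hypothetical user-space stand-ins; this is not how r8152 actually
    builds its skbs.

```c
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* Hypothetical packet: a small copied head plus a zero-copy fragment
 * that points into the receive buffer. */
struct rx_pkt {
	unsigned char head[256];   /* copied bytes (linear area) */
	size_t head_len;
	const unsigned char *frag; /* points into the rx buffer, no copy */
	size_t frag_len;
};

/* Copy at most 'copybreak' bytes; reference the rest as a fragment. */
static void build_rx_pkt(struct rx_pkt *pkt, const unsigned char *data,
			 size_t len, size_t copybreak)
{
	assert(copybreak <= sizeof(pkt->head));
	pkt->head_len = len < copybreak ? len : copybreak;
	memcpy(pkt->head, data, pkt->head_len);

	pkt->frag_len = len - pkt->head_len;
	pkt->frag = pkt->frag_len ? data + pkt->head_len : NULL;
}
```

    Because the fragment still points into the receive buffer, that buffer
    cannot be resubmitted until the packet is consumed, which is why
    buffers that can't be reused yet need somewhere to wait, such as the
    rx_used list.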
     
  • Replace kmalloc_node() with alloc_pages() for rx buffer.

    Signed-off-by: Hayes Wang
    Signed-off-by: Jakub Kicinski

    Hayes Wang
     
  • The original method uses an array to store the rx information. The
    new one uses a list to link each rx structure. This makes it possible
    to increase/decrease the number of rx structures dynamically.

    Signed-off-by: Hayes Wang
    Signed-off-by: Jakub Kicinski

    Hayes Wang
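    A list-based pool makes growing and shrinking cheap. The sketch keeps a
    free list, a used list for buffers still referenced by packet fragments
    (in the spirit of rx_used), and a cap on the total. MAX_RX_AGG stands in
    for RTL8152_MAX_RX_AGG with an assumed value, and all names are
    hypothetical; this models the bookkeeping only, not USB submission.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <assert.h>

/* Hypothetical cap standing in for RTL8152_MAX_RX_AGG (value assumed). */
#define MAX_RX_AGG 8

struct rx_agg {
	struct rx_agg *next;
	bool page_in_use;   /* still referenced by a packet fragment */
};

struct rx_pool {
	struct rx_agg *free_list;  /* ready to be resubmitted */
	struct rx_agg *used_list;  /* returned, but not reusable yet */
	unsigned int total;        /* buffers currently allocated */
};

static void push(struct rx_agg **list, struct rx_agg *agg)
{
	agg->next = *list;
	*list = agg;
}

/* Reuse a free buffer if possible, otherwise grow up to the cap. */
static struct rx_agg *pool_get(struct rx_pool *pool)
{
	struct rx_agg *agg = pool->free_list;

	if (agg) {
		pool->free_list = agg->next;
		return agg;
	}
	if (pool->total >= MAX_RX_AGG)
		return NULL;
	agg = calloc(1, sizeof(*agg));
	if (agg)
		pool->total++;
	return agg;
}

/* Park a buffer on the used list while a fragment still points into it. */
static void pool_put(struct rx_pool *pool, struct rx_agg *agg)
{
	assert(agg != NULL);
	push(agg->page_in_use ? &pool->used_list : &pool->free_list, agg);
}

/* Move buffers whose fragments have been released back to the free list. */
static void pool_recycle(struct rx_pool *pool)
{
	struct rx_agg **pp = &pool->used_list;

	while (*pp) {
		struct rx_agg *agg = *pp;

		if (agg->page_in_use) {
			pp = &agg->next;
			continue;
		}
		*pp = agg->next;
		push(&pool->free_list, agg);
	}
}

/* Shrink by releasing one idle buffer, if there is one. */
static void pool_shrink(struct rx_pool *pool)
{
	struct rx_agg *agg = pool->free_list;

	if (agg) {
		pool->free_list = agg->next;
		free(agg);
		pool->total--;
	}
}
```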
     
  • The different chips may accept different rx buffer sizes. The RTL8152
    supports 16K bytes, and RTL8153 supports 32K bytes.

    Signed-off-by: Hayes Wang
    Signed-off-by: Jakub Kicinski

    Hayes Wang
     
  • Heiner says:

    ====================
    So far phy_speed_down/up can be used up to 1Gbps only. Remove this
    restriction and add the needed helpers to phy-core.c.

    v2:
    - remove unused parameter in patch 1
    - rename __phy_speed_down to phy_speed_down_core in patch 2
    ====================

    Signed-off-by: Jakub Kicinski

    Jakub Kicinski
     
  • So far phy_speed_down/up can be used up to 1Gbps only. Remove this
    restriction by using the new helper phy_speed_down_core. The new member
    adv_old in struct phy_device is used by phy_speed_up to restore the
    modes that were advertised before phy_speed_down was called. Don't
    simply advertise what is supported, because a user may have
    intentionally removed modes from the advertisement.

    Signed-off-by: Heiner Kallweit
    Reviewed-by: Andrew Lunn
    Signed-off-by: Jakub Kicinski

    Heiner Kallweit
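    The save/restore behaviour can be modeled with a plain bitmask. The mode
    bits and speeds below are hypothetical (the kernel uses a much larger
    linkmode bitmap), and min_speed is passed in directly rather than
    resolved from the link partner's advertisement. mask_max_speed() takes
    any mask rather than only the supported modes, mirroring the reason for
    splitting __set_linkmode_max_speed.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Hypothetical link-mode bits; the kernel uses a much larger bitmap. */
#define MODE_10HALF   (1u << 0)
#define MODE_10FULL   (1u << 1)
#define MODE_100FULL  (1u << 2)
#define MODE_1000FULL (1u << 3)
#define MODE_2500FULL (1u << 4)

static const struct { uint32_t bit; int speed; } mode_speed[] = {
	{ MODE_10HALF, 10 }, { MODE_10FULL, 10 }, { MODE_100FULL, 100 },
	{ MODE_1000FULL, 1000 }, { MODE_2500FULL, 2500 },
};

struct phy_model {
	uint32_t supported;
	uint32_t advertising;
	uint32_t adv_old;   /* what was advertised before speed_down() */
};

/* Clear every mode faster than max_speed from an arbitrary mask. */
static uint32_t mask_max_speed(uint32_t modes, int max_speed)
{
	for (size_t i = 0; i < sizeof(mode_speed) / sizeof(mode_speed[0]); i++)
		if (mode_speed[i].speed > max_speed)
			modes &= ~mode_speed[i].bit;
	return modes;
}

static void speed_down(struct phy_model *phy, int min_speed)
{
	phy->adv_old = phy->advertising;
	phy->advertising = mask_max_speed(phy->advertising, min_speed);
}

static void speed_up(struct phy_model *phy)
{
	/* Restore the saved advertisement, not phy->supported: the user
	 * may have removed modes on purpose. */
	phy->advertising = phy->adv_old;
}
```

    If the user removed 10HALF before speed_down(phy, 10), only 10FULL is
    advertised afterwards, and speed_up() restores the user's advertisement
    without re-adding 10HALF.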
     
  • phy_speed_down_core provides most of the functionality for
    phy_speed_down. It makes use of new helper phy_resolve_min_speed that is
    based on the sorting of the settings[] array. In certain cases it may be
    helpful to be able to exclude legacy half duplex modes, therefore
    prepare phy_resolve_min_speed() for it.

    v2:
    - rename __phy_speed_down to phy_speed_down_core

    Signed-off-by: Heiner Kallweit
    Reviewed-by: Andrew Lunn
    Signed-off-by: Jakub Kicinski

    Heiner Kallweit
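    A sketch of a minimum-speed resolver over a table sorted fastest first,
    as described in this commit: scan from the slow end and return the first
    mode present in both our and the link partner's advertisement,
    optionally skipping half duplex. The table contents and bit assignments
    are hypothetical; returning -1 for "no common mode" mirrors the kernel's
    SPEED_UNKNOWN, which is redefined locally here rather than imported.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>
#include <assert.h>

enum duplex { HALF, FULL };

struct setting {
	int speed;
	enum duplex duplex;
	uint32_t bit;
};

/* Hypothetical table, sorted fastest first like the kernel's settings[]. */
static const struct setting settings[] = {
	{ 2500, FULL, 1u << 4 },
	{ 1000, FULL, 1u << 3 },
	{  100, FULL, 1u << 2 },
	{   10, FULL, 1u << 1 },
	{   10, HALF, 1u << 0 },
};

#define SPEED_UNKNOWN (-1)

/* Walk from the slow end; the first common mode gives the minimum speed. */
static int resolve_min_speed(uint32_t adv, uint32_t lp_adv, bool fdx_only)
{
	uint32_t common = adv & lp_adv;
	size_t i = sizeof(settings) / sizeof(settings[0]);

	while (i-- > 0) {
		if (!(common & settings[i].bit))
			continue;
		if (fdx_only && settings[i].duplex != FULL)
			continue;
		return settings[i].speed;
	}
	return SPEED_UNKNOWN;
}
```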
     
  • We will need the functionality of __set_linkmode_max_speed also for
    linkmode bitmaps other than phydev->supported. Therefore split it.

    v2:
    - remove unused parameter from __set_linkmode_max_speed

    Signed-off-by: Heiner Kallweit
    Reviewed-by: Andrew Lunn
    Signed-off-by: Jakub Kicinski

    Heiner Kallweit
     
  • It is enough for the caller of devlink_compat_switch_id_get() to hold
    the net device to guarantee that the devlink port is not destroyed
    concurrently. Remove the rtnl lock assertion and modify the comment to
    warn users that they must hold either the rtnl lock or a reference to
    the net device. This is necessary to accommodate a future
    implementation of rtnl-unlocked TC offload driver callbacks.

    Signed-off-by: Vlad Buslov
    Acked-by: Jiri Pirko
    Signed-off-by: Jakub Kicinski

    Vlad Buslov
     
  • Daniel Borkmann says:

    ====================
    The following pull-request contains BPF updates for your *net-next* tree.

    There is a small merge conflict in libbpf (Cc Andrii so he's in the loop
    as well):

    for (i = 1; i [...]
    if (!has_datasec && btf_is_var(t)) {
    /* replace VAR with INT */
    t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0);
    <<<<<<< HEAD
    /*
    * using size = 1 is the safest choice, 4 will be too
    * big and cause kernel BTF validation failure if
    * original variable took less than 4 bytes
    */
    t->size = 1;
    *(int *)(t+1) = BTF_INT_ENC(0, 0, 8);
    } else if (!has_datasec && kind == BTF_KIND_DATASEC) {
    =======
    t->size = sizeof(int);
    *(int *)(t + 1) = BTF_INT_ENC(0, 0, 32);
    } else if (!has_datasec && btf_is_datasec(t)) {
    >>>>>>> 72ef80b5ee131e96172f19e74b4f98fa3404efe8
    /* replace DATASEC with STRUCT */

    The conflict is between the two commits 1d4126c4e119 ("libbpf: sanitize VAR to
    conservative 1-byte INT") and b03bc6853c0e ("libbpf: convert libbpf code to
    use new btf helpers"), so the resolution needs to keep the sanitization fixup
    while also using the new btf_is_datasec() helper and the whitespace cleanup.
    It looks like the following:

    [...]
    if (!has_datasec && btf_is_var(t)) {
    /* replace VAR with INT */
    t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0);
    /*
    * using size = 1 is the safest choice, 4 will be too
    * big and cause kernel BTF validation failure if
    * original variable took less than 4 bytes
    */
    t->size = 1;
    *(int *)(t + 1) = BTF_INT_ENC(0, 0, 8);
    } else if (!has_datasec && btf_is_datasec(t)) {
    /* replace DATASEC with STRUCT */
    [...]
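    The resolved snippet rewrites a VAR type in place as a 1-byte INT. The
    sketch below spells out the bit layouts it relies on, following the BTF
    format documentation of that period (info: vlen in bits 0-15, kind in
    bits 24-27, kind_flag in bit 31; INT data word: encoding in bits 24-27,
    offset in bits 16-23, bit count in bits 0-7). The helpers and
    struct btf_type_model are simplified local stand-ins, not libbpf's
    macros or the real struct btf_type.

```c
#include <stdint.h>
#include <assert.h>

#define BTF_KIND_INT  1
#define BTF_KIND_VAR 14

/* info word: bits 0-15 vlen, bits 24-27 kind, bit 31 kind_flag. */
static uint32_t btf_info_enc(uint32_t kind, uint32_t kind_flag, uint32_t vlen)
{
	return (kind_flag ? 1u << 31 : 0) | (kind & 0x0f) << 24 | (vlen & 0xffff);
}

static uint32_t btf_info_kind(uint32_t info)
{
	return (info >> 24) & 0x0f;
}

/* INT data word: bits 24-27 encoding, bits 16-23 offset, bits 0-7 nr_bits. */
static uint32_t btf_int_enc(uint32_t encoding, uint32_t bits_offset,
			    uint32_t nr_bits)
{
	return (encoding & 0x0f) << 24 | (bits_offset & 0xff) << 16 |
	       (nr_bits & 0xff);
}

/* Simplified type header followed by its single 32-bit data word
 * (the word found at t + 1 in the real layout). */
struct btf_type_model {
	uint32_t name_off;
	uint32_t info;
	uint32_t size_or_type;  /* VAR: type id; INT: size in bytes */
	uint32_t data;          /* VAR: linkage; INT: encoding word */
};

static void replace_var_with_int(struct btf_type_model *t)
{
	assert(btf_info_kind(t->info) == BTF_KIND_VAR);
	t->info = btf_info_enc(BTF_KIND_INT, 0, 0);
	t->size_or_type = 1;            /* 1 byte is never too big */
	t->data = btf_int_enc(0, 0, 8); /* 8 bits, no encoding */
}
```

    The in-place rewrite works because both VAR and INT carry exactly one
    32-bit word after the common type header, so the type's size in the
    BTF blob does not change.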

    The main changes are:

    1) Addition of the core parts of the compile once - run everywhere (co-re)
    effort, that is, relocation of field offsets in libbpf as well as exposure
    of the kernel's own BTF via sysfs and loading it through libbpf, from Andrii.

    More info on co-re: http://vger.kernel.org/bpfconf2019.html#session-2
    and http://vger.kernel.org/lpc-bpf2018.html#session-2

    2) Enable passing input flags to the BPF flow dissector to customize parsing
    and allow it to stop early, similar to the C-based one, from Stanislav.

    3) Add a BPF helper function that allows generating SYN cookies from XDP and
    tc BPF, from Petar.

    4) Add devmap hash-based map type for more flexibility in device lookup for
    redirects, from Toke.

    5) Improvements to XDP forwarding sample code now utilizing recently enabled
    devmap lookups, from Jesper.

    6) Add support for reporting the effective cgroup progs in bpftool, from Jakub
    and Takshak.

    7) Fix reading kernel config from bpftool via /proc/config.gz, from Peter.

    8) Fix AF_XDP umem pages mapping for 32 bit architectures, from Ivan.

    9) Follow-up to add two more BPF loop tests for the selftest suite, from Alexei.

    10) Add the perf event output helper for other skb-based program types as well, from Allan.

    11) Fix a co-re related compilation error in selftests, from Yonghong.
    ====================

    Signed-off-by: Jakub Kicinski

    Jakub Kicinski
     
  • Fix sparse warning:

    drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c:3190:5:
    warning: symbol 'hclge_func_reset_sync_vf' was not declared. Should it be static?

    Reported-by: Hulk Robot
    Signed-off-by: YueHaibing
    Signed-off-by: Jakub Kicinski

    YueHaibing
     
  • Currently, notifications for deleted snapshots are sent only when the
    user deletes a snapshot manually. Send the notifications when a region
    is destroyed too.

    Signed-off-by: Jiri Pirko
    Signed-off-by: Jakub Kicinski

    Jiri Pirko
     
  • Andrii Nakryiko says:

    ====================
    Now that the kernel's BTF is exposed through sysfs at a well-known location,
    attempt to load it first as the target BTF for BPF CO-RE relocations.

    Patch #1 is a follow-up patch to rename /sys/kernel/btf/kernel into
    /sys/kernel/btf/vmlinux.

    Patch #2 adds ability to load raw BTF contents from sysfs and expands the list
    of locations libbpf attempts to load vmlinux BTF from.
    ====================

    Signed-off-by: Daniel Borkmann

    Daniel Borkmann
     
  • Add support for loading kernel BTF from sysfs (/sys/kernel/btf/vmlinux)
    as a target BTF. Also extend the list of on-disk search paths for the
    vmlinux ELF image with the entries that perf searches.

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Daniel Borkmann

    Andrii Nakryiko
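    A loader that accepts both raw BTF (as exposed in
    /sys/kernel/btf/vmlinux) and a vmlinux ELF image has to tell the two
    apart. One simple way, sketched below, is the header magic: raw BTF
    starts with the 16-bit value 0xEB9F in host byte order, whereas ELF
    starts with 0x7F 'E' 'L' 'F'. This is an illustration with hypothetical
    function names, not libbpf's implementation.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <stddef.h>
#include <assert.h>

#define BTF_MAGIC 0xeB9F

/* Raw BTF begins with a header whose first field is a 16-bit magic,
 * stored in the host's byte order. */
static int is_raw_btf(const unsigned char *buf, size_t len)
{
	uint16_t magic;

	if (len < sizeof(magic))
		return 0;
	memcpy(&magic, buf, sizeof(magic));
	return magic == BTF_MAGIC;
}

/* Returns 1 if the file looks like raw BTF, 0 if not, -1 if unreadable. */
static int file_is_raw_btf(const char *path)
{
	unsigned char hdr[2];
	size_t n;
	FILE *f = fopen(path, "rb");

	if (!f)
		return -1;
	n = fread(hdr, 1, sizeof(hdr), f);
	fclose(f);
	return is_raw_btf(hdr, n);
}
```

    For example, file_is_raw_btf("/sys/kernel/btf/vmlinux") should return 1
    on a kernel that exposes its BTF through sysfs, and -1 where the file
    does not exist.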
     
  • Expose kernel's BTF under the name vmlinux to be more uniform with using
    kernel module names as file names in the future.

    Fixes: 341dfcf8d78e ("btf: expose BTF info through sysfs")
    Suggested-by: Daniel Borkmann
    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Daniel Borkmann

    Andrii Nakryiko
     

13 Aug, 2019

17 commits

  • Since the "last_dissection" map holds only the flow keys for the most
    recent packet, there is a small race in the skb-less flow dissector
    tests if a new packet arrives between transmitting the test packet and
    reading its keys from the map. If this happens, the test packet keys
    will be overwritten and the test will fail.

    Changing the "last_dissection" map to a hash map, keyed on the
    source/dest port pair, resolves this issue. Additionally, let's clear the
    last test results from the map between tests to prevent previous test
    cases from interfering with the following test cases.

    Fixes: 0905beec9f52 ("selftests/bpf: run flow dissector tests in skb-less mode")
    Signed-off-by: Petar Penkov
    Signed-off-by: Daniel Borkmann

    Petar Penkov
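    Keying results by port pair means a stray packet can only overwrite its
    own entry, not the test packet's. The plain-C model below illustrates
    that keying with a small linear-probing table; in the selftest the map
    is a BPF hash map, and struct flow_keys here is a hypothetical, trimmed
    stand-in for the real flow key record.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* Hypothetical, trimmed flow-key record. */
struct flow_keys {
	uint16_t sport, dport;
	uint8_t ip_proto;
};

#define SLOTS 16

struct port_map {
	bool used[SLOTS];
	uint32_t key[SLOTS];          /* (sport << 16) | dport */
	struct flow_keys val[SLOTS];
};

static uint32_t port_key(uint16_t sport, uint16_t dport)
{
	return (uint32_t)sport << 16 | dport;
}

/* Clear results between test cases. */
static void map_clear(struct port_map *m)
{
	memset(m, 0, sizeof(*m));
}

/* Insert or overwrite the entry for this port pair (linear probing). */
static bool map_update(struct port_map *m, const struct flow_keys *v)
{
	uint32_t k = port_key(v->sport, v->dport);

	for (unsigned int n = 0; n < SLOTS; n++) {
		unsigned int i = (k + n) % SLOTS;

		if (!m->used[i] || m->key[i] == k) {
			m->used[i] = true;
			m->key[i] = k;
			m->val[i] = *v;
			return true;
		}
	}
	return false; /* table full */
}

static const struct flow_keys *map_lookup(const struct port_map *m,
					  uint16_t sport, uint16_t dport)
{
	uint32_t k = port_key(sport, dport);

	for (unsigned int n = 0; n < SLOTS; n++) {
		unsigned int i = (k + n) % SLOTS;

		if (!m->used[i])
			return NULL;
		if (m->key[i] == k)
			return &m->val[i];
	}
	return NULL;
}
```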
     
  • bpftool requires libelf, and zlib for decompressing /proc/config.gz.
    zlib is a transitive dependency via libelf, and has been mandatory since
    elfutils 0.165 (Jan 2016). The feature check of libelf is already done
    in the elfdep target of tools/lib/bpf/Makefile, pulled in by bpftool via
    a dependency on libbpf.a. Add a similar feature check for zlib.

    Suggested-by: Jakub Kicinski
    Signed-off-by: Peter Wu
    Acked-by: Jakub Kicinski
    Signed-off-by: Daniel Borkmann

    Peter Wu
     
  • Make .BTF section allocated and expose its contents through sysfs.

    The /sys/kernel/btf directory is created to contain all the BTFs present
    inside the kernel. Currently there is only the kernel's main BTF, represented
    as the /sys/kernel/btf/kernel file. Once kernel modules' BTFs are supported,
    each module will expose its BTF as /sys/kernel/btf/ file.

    Current approach relies on a few pieces coming together:
    1. pahole is used to take almost final vmlinux image (modulo .BTF and
    kallsyms) and generate .BTF section by converting DWARF info into
    BTF. This section is not allocated and not mapped to any segment,
    though, so is not yet accessible from inside kernel at runtime.
    2. objcopy dumps .BTF contents into a binary file and subsequently
    converts the binary file into a linkable object file with automatically
    generated symbols _binary__btf_kernel_bin_start and
    _binary__btf_kernel_bin_end, pointing to the start and end, respectively,
    of the BTF raw data.
    3. final vmlinux image is generated by linking this object file (and
    kallsyms, if necessary). sysfs_btf.c then creates the
    /sys/kernel/btf/kernel file and exposes the embedded BTF contents through
    it. This allows, e.g., libbpf and bpftool to access BTF info at a
    well-known location, without resorting to searching for the vmlinux image
    on disk (the location of which is not standardized, and the vmlinux image
    might not even be available in some scenarios, e.g., inside qemu
    during testing).

    An alternative approach using the .incbin assembler directive to embed BTF
    contents directly was attempted but didn't work, because sysfs_btf.o is
    not re-compiled during the link-vmlinux.sh stage. This is required, though,
    to update the embedded BTF data (initially empty data is embedded, then
    pahole generates BTF info and we need to regenerate sysfs_btf.o with
    updated contents, but it's too late at that point).

    If BTF couldn't be generated due to a missing or too old pahole,
    sysfs_btf.c handles that gracefully by detecting that
    _binary__btf_kernel_bin_start (weak symbol) is 0 and not creating
    /sys/kernel/btf at all.

    v2->v3:
    - added Documentation/ABI/testing/sysfs-kernel-btf (Greg K-H);
    - created proper kobject (btf_kobj) for btf directory (Greg K-H);
    - undo v2 change of reusing vmlinux, as it causes an extra kallsyms pass
    due to initially missing _binary__btf_kernel_bin_{start/end} symbols;

    v1->v2:
    - allow kallsyms stage to re-use vmlinux generated by gen_btf();

    Reviewed-by: Greg Kroah-Hartman
    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Daniel Borkmann

    Andrii Nakryiko
     
    Seen during boot:
    BUG: spinlock bad magic on CPU#2, swapper/0/1
    lock: nf_connlabels_lock+0x0/0x60, .magic: 00000000, .owner: /-1, .owner_cpu: 0
    Call Trace:
    do_raw_spin_lock+0x14e/0x1b0
    nf_connlabels_get+0x15/0x40
    ct_init_net+0xc4/0x270
    ops_init+0x56/0x1c0
    register_pernet_operations+0x1c8/0x350
    register_pernet_subsys+0x1f/0x40
    tcf_register_action+0x7c/0x1a0
    do_one_initcall+0x13d/0x2d9

    The problem is that the ct action init function can run before
    connlabels_init(), when the lock has not been initialised yet.

    Fix it by using a static initialiser.

    Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
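    A user-space analogue of the fix, swapping in POSIX threads for kernel
    spinlocks: a mutex initialised with PTHREAD_MUTEX_INITIALIZER is valid
    from program start, so it does not matter which initialisation path
    calls labels_get() first. In the kernel the corresponding tool is
    DEFINE_SPINLOCK(); labels_get() and labels_put() are hypothetical names
    loosely echoing nf_connlabels_get().

```c
#include <pthread.h>
#include <assert.h>

/* Statically initialised: usable before any init function has run,
 * like a lock declared with DEFINE_SPINLOCK() in the kernel. */
static pthread_mutex_t labels_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int labels_users;

/* Safe to call from any initialisation path, in any order. */
static void labels_get(void)
{
	pthread_mutex_lock(&labels_lock);
	labels_users++;
	pthread_mutex_unlock(&labels_lock);
}

static void labels_put(void)
{
	pthread_mutex_lock(&labels_lock);
	assert(labels_users > 0);
	labels_users--;
	pthread_mutex_unlock(&labels_lock);
}
```

    By contrast, a lock set up by an init function (the spin_lock_init()
    pattern) is only safe once that function has run, which is exactly the
    ordering the boot trace above shows being violated.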
     
  • Sparse warns about two tables not being declared.

    CHECK net/netfilter/nf_nat_proto.c
    net/netfilter/nf_nat_proto.c:725:26: warning: symbol 'nf_nat_ipv4_ops' was not declared. Should it be static?
    net/netfilter/nf_nat_proto.c:964:26: warning: symbol 'nf_nat_ipv6_ops' was not declared. Should it be static?

    And they can indeed be static.

    Signed-off-by: Valdis Kletnieks
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Valdis Klētnieks
     
  • Sparse rightly complains about undeclared symbols.

    CHECK net/netfilter/nft_set_hash.c
    net/netfilter/nft_set_hash.c:647:21: warning: symbol 'nft_set_rhash_type' was not declared. Should it be static?
    net/netfilter/nft_set_hash.c:670:21: warning: symbol 'nft_set_hash_type' was not declared. Should it be static?
    net/netfilter/nft_set_hash.c:690:21: warning: symbol 'nft_set_hash_fast_type' was not declared. Should it be static?
    CHECK net/netfilter/nft_set_bitmap.c
    net/netfilter/nft_set_bitmap.c:296:21: warning: symbol 'nft_set_bitmap_type' was not declared. Should it be static?
    CHECK net/netfilter/nft_set_rbtree.c
    net/netfilter/nft_set_rbtree.c:470:21: warning: symbol 'nft_set_rbtree_type' was not declared. Should it be static?

    Include nf_tables_core.h rather than nf_tables.h to pick up the additional definitions.

    Signed-off-by: Valdis Kletnieks
    Signed-off-by: Pablo Neira Ayuso

    Valdis Klētnieks
     
  • All the blacklisted NF headers can now be compiled stand-alone, so
    they have been removed from the blacklist.

    Cc: Masahiro Yamada
    Signed-off-by: Jeremy Sowden
    Signed-off-by: Pablo Neira Ayuso

    Jeremy Sowden
     
  • A number of non-UAPI Netfilter header-files contained superfluous
    "#ifdef __KERNEL__" guards. Removed them.

    Signed-off-by: Jeremy Sowden
    Signed-off-by: Pablo Neira Ayuso

    Jeremy Sowden
     
  • linux/netfilter.h defines a number of struct and inline function
    definitions which are only available if CONFIG_NETFILTER is enabled.
    These structs and functions are used in declarations and definitions in
    other header-files. Added preprocessor checks to make sure these
    headers will compile if CONFIG_NETFILTER is disabled.

    Signed-off-by: Jeremy Sowden
    Signed-off-by: Pablo Neira Ayuso

    Jeremy Sowden
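    The header-guard pattern used in this series can be sketched as follows.
    CONFIG_EXAMPLE_FEATURE, struct feature_state and feature_active() are
    hypothetical; kernel headers would normally test the option with
    IS_ENABLED(). The stub branch keeps every includer compiling when the
    option is off by leaving the struct incomplete and returning a constant.

```c
#include <stdbool.h>
#include <stddef.h>
#include <assert.h>

/* Define this to mimic a build with the (hypothetical) option enabled. */
/* #define CONFIG_EXAMPLE_FEATURE 1 */

#ifdef CONFIG_EXAMPLE_FEATURE
/* The full definition only exists when the feature is built in. */
struct feature_state {
	int hooks;
};

static inline bool feature_active(const struct feature_state *st)
{
	return st && st->hooks > 0;
}
#else
/* Stub: an incomplete type and a constant answer keep callers compiling. */
struct feature_state;

static inline bool feature_active(const struct feature_state *st)
{
	(void)st;
	return false;
}
#endif
```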
     
  • struct nf_conn contains a "struct nf_conntrack ct_general" member and
    struct net contains a "struct netns_ct ct" member which are both only
    defined if CONFIG_NF_CONNTRACK is enabled. These members are used in a
    number of inline functions defined in other header-files. Added
    preprocessor checks to make sure the headers will compile if
    CONFIG_NF_CONNTRACK is disabled.

    Signed-off-by: Jeremy Sowden
    Signed-off-by: Pablo Neira Ayuso

    Jeremy Sowden
     
  • nf_tables.h defines an API comprising several inline functions and
    macros that depend on the nft member of struct net. However, this is
    only defined if CONFIG_NF_TABLES is enabled. Added preprocessor checks
    to ensure that nf_tables.h will compile if CONFIG_NF_TABLES is disabled.

    Signed-off-by: Jeremy Sowden
    Signed-off-by: Pablo Neira Ayuso

    Jeremy Sowden
     
  • br_netfilter.h defines inline functions that use an enum constant and
    struct member that are only defined if CONFIG_BRIDGE_NETFILTER is
    enabled. Added preprocessor checks to ensure br_netfilter.h will
    compile if CONFIG_BRIDGE_NETFILTER is disabled.

    Signed-off-by: Jeremy Sowden
    Signed-off-by: Pablo Neira Ayuso

    Jeremy Sowden
     
  • A number of netfilter header-files used declarations and definitions
    from other headers without including them. Added include directives to
    make those declarations and definitions available.

    Signed-off-by: Jeremy Sowden
    Signed-off-by: Pablo Neira Ayuso

    Jeremy Sowden
     
  • linux/netfilter/ipset/ip_set.h included four other header files:

    include/linux/netfilter/ipset/ip_set_comment.h
    include/linux/netfilter/ipset/ip_set_counter.h
    include/linux/netfilter/ipset/ip_set_skbinfo.h
    include/linux/netfilter/ipset/ip_set_timeout.h

    Of these, the first three were not included anywhere else. The last,
    ip_set_timeout.h, was included in a couple of other places, but defined
    inline functions which call other inline functions defined in ip_set.h,
    so ip_set.h had to be included before it.

    Inlined all four into ip_set.h, and updated the other files that
    included ip_set_timeout.h.

    Signed-off-by: Jeremy Sowden
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Jeremy Sowden
     
  • Store immediate data into the offload context register. This allows
    follow-up instructions to take it from the corresponding source register.

    This patch is required to support payload mangling, although other
    instructions that take data from a source register will benefit from
    this too.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Extract mask from bitwise operation and store it into the corresponding
    context register so the cmp instruction can set the mask accordingly.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • This patch removes extra spaces.

    Signed-off-by: yangxingwu
    Signed-off-by: Pablo Neira Ayuso

    yangxingwu
     

12 Aug, 2019

3 commits

  • /proc/config has never existed as far as I can see, but /proc/config.gz
    is present on Arch Linux. Add support for decompressing config.gz using
    zlib which is a mandatory dependency of libelf anyway. Replace existing
    stdio functions with gzFile operations since the latter transparently
    handle uncompressed and gzip-compressed files.

    Cc: Quentin Monnet
    Signed-off-by: Peter Wu
    Reviewed-by: Quentin Monnet
    Signed-off-by: Daniel Borkmann

    Peter Wu
     
  • When calling debugfs functions, there is no need to ever check the
    return value. The function can work or not, but the code logic should
    never do something different based on this.

    Cc: Richard Fontana
    Cc: Steve Winslow
    Cc: netdev@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: David S. Miller

    Greg Kroah-Hartman
     
  • When calling debugfs functions, there is no need to ever check the
    return value. The function can work or not, but the code logic should
    never do something different based on this.

    Cc: Wei Liu
    Cc: Paul Durrant
    Cc: xen-devel@lists.xenproject.org
    Cc: netdev@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman
    Acked-by: Wei Liu
    Signed-off-by: David S. Miller

    Greg Kroah-Hartman