23 Dec, 2019

1 commit

  • Pull networking fixes from David Miller:

    1) Several nf_flow_table_offload fixes from Pablo Neira Ayuso,
    including adding a missing ipv6 match description.

    2) Several heap overflow fixes in mwifiex from qize wang and Ganapathi
    Bhat.

    3) Fix uninit value in bond_neigh_init(), from Eric Dumazet.

    4) Fix non-ACPI probing of nxp-nci, from Stephan Gerhold.

    5) Fix use after free in tipc_disc_rcv(), from Tuong Lien.

    6) Enforce limit of 33 tail calls in mips and riscv JIT, from Paul
    Chaignon.

    7) Multicast MAC limit test is off by one in qede, from Manish Chopra.

    8) Fix established socket lookup race when a socket goes from
    TCP_ESTABLISHED to TCP_LISTEN, because an intervening RCU grace
    period is missing. From Eric Dumazet.

    9) Don't send empty SKBs from tcp_write_xmit(), also from Eric Dumazet.

    10) Fix active backup transition after link failure in bonding, from
    Mahesh Bandewar.

    11) Avoid zero sized hash table in gtp driver, from Taehee Yoo.

    12) Fix wrong interface passed to ->mac_link_up(), from Russell King.

    13) Fix DSA egress flooding settings in b53, from Florian Fainelli.

    14) Memory leak in gmac_setup_txqs(), from Navid Emamdoost.

    15) Fix double free in dpaa2-ptp code, from Ioana Ciornei.

    16) Reject invalid MTU values in stmmac, from Jose Abreu.

    17) Fix refcount leak in error path of u32 classifier, from Davide
    Caratti.

    18) Fix regression causing iwlwifi firmware crashes on boot, from Anders
    Kaseorg.

    19) Fix inverted return value logic in llc2 code, from Chan Shu Tak.

    20) Disable hardware GRO when XDP is attached to qede, from Manish
    Chopra.

    21) Since we encode state in the low pointer bits, dst metrics must be
    at least 4-byte aligned, which is not necessarily true on m68k. Add
    annotations to fix this, from Geert Uytterhoeven. (A short sketch of
    the idea follows at the end of this entry.)

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (160 commits)
    sfc: Include XDP packet headroom in buffer step size.
    sfc: fix channel allocation with brute force
    net: dst: Force 4-byte alignment of dst_metrics
    selftests: pmtu: fix init mtu value in description
    hv_netvsc: Fix unwanted rx_table reset
    net: phy: ensure that phy IDs are correctly typed
    mod_devicetable: fix PHY module format
    qede: Disable hardware gro when xdp prog is installed
    net: ena: fix issues in setting interrupt moderation params in ethtool
    net: ena: fix default tx interrupt moderation interval
    net/smc: unregister ib devices in reboot_event
    net: stmmac: platform: Fix MDIO init for platforms without PHY
    llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c)
    net: hisilicon: Fix a BUG trigered by wrong bytes_compl
    net: dsa: ksz: use common define for tag len
    s390/qeth: don't return -ENOTSUPP to userspace
    s390/qeth: fix promiscuous mode after reset
    s390/qeth: handle error due to unsupported transport mode
    cxgb4: fix refcount init for TC-MQPRIO offload
    tc-testing: initial tdc selftests for cls_u32
    ...

    Linus Torvalds
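
    A minimal standalone sketch of the idea behind fix 21 (struct metrics,
    encode() and decode() are illustrative stand-ins, not the actual
    net/core code): once the structure is guaranteed to be at least 4-byte
    aligned, the two low bits of its address are always zero and can carry
    state.

    #include <stdint.h>

    /* Force 4-byte alignment so the low two bits of a pointer to this
     * structure are free to encode state, even on architectures such as
     * m68k where the natural alignment may be smaller.
     */
    struct metrics {
            uint32_t vals[4];
    } __attribute__((aligned(4)));

    #define STATE_MASK 0x3UL

    static inline unsigned long encode(struct metrics *m, unsigned long state)
    {
            return (unsigned long)m | (state & STATE_MASK);
    }

    static inline struct metrics *decode(unsigned long word)
    {
            return (struct metrics *)(word & ~STATE_MASK);
    }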
     

18 Dec, 2019

1 commit

  • Recently noticed that we're tracking programs related to local storage maps
    through their prog pointer. This is a wrong assumption since the prog pointer
    can still change throughout the verification process, for example, whenever
    bpf_patch_insn_single() is called.

    Therefore, the prog pointer that was assigned via bpf_cgroup_storage_assign()
    is not guaranteed to be the same as we pass in bpf_cgroup_storage_release()
    and the map would therefore remain in busy state forever. Fix this by using
    the prog's aux pointer which is stable throughout verification and beyond.

    Fixes: de9cbbaadba5 ("bpf: introduce cgroup storage maps")
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Cc: Roman Gushchin
    Cc: Martin KaFai Lau
    Link: https://lore.kernel.org/bpf/1471c69eca3022218666f909bc927a92388fd09e.1576580332.git.daniel@iogearbox.net

    Daniel Borkmann
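
    A hedged, self-contained sketch of the idea (struct prog, prog_aux,
    storage_map and the two helpers below are made-up names, not the
    kernel's): key the "in use" marker on the program's stable aux object
    instead of on the prog pointer, which bpf_patch_insn_single() can
    invalidate by reallocating the program.

    #include <errno.h>
    #include <stddef.h>

    struct prog_aux { int id; };

    struct prog {
            struct prog_aux *aux;     /* survives reallocation of struct prog */
    };

    struct storage_map {
            struct prog_aux *owner;   /* was: struct prog *owner */
    };

    static int storage_assign(struct storage_map *map, const struct prog *prog)
    {
            if (map->owner && map->owner != prog->aux)
                    return -EBUSY;    /* already owned by another program */
            map->owner = prog->aux;   /* stable key across verification */
            return 0;
    }

    static void storage_release(struct storage_map *map, const struct prog *prog)
    {
            if (map->owner == prog->aux)   /* matches even after reallocation */
                    map->owner = NULL;
    }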
     

10 Dec, 2019

1 commit

  • Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
    at places where these are defined. Later patches will remove the unused
    definition of FIELD_SIZEOF().

    This patch is generated using the following script:

    EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

    git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
    do
            if [[ "$file" =~ $EXCLUDE_FILES ]]; then
                    continue
            fi
            sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
    done

    Signed-off-by: Pankaj Bharadiya
    Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
    Co-developed-by: Kees Cook
    Signed-off-by: Kees Cook
    Acked-by: David Miller # for net

    Pankaj Bharadiya
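
    For reference, the two macros compute the same thing; a small standalone
    illustration of the rename (the macro body below mirrors the kernel's
    definition of sizeof_field() in include/linux/stddef.h):

    #include <stdio.h>

    #define sizeof_field(TYPE, MEMBER) sizeof((((TYPE *)0)->MEMBER))

    struct example {
            char name[16];
            unsigned long flags;
    };

    int main(void)
    {
            /* Before the rename this was FIELD_SIZEOF(struct example, name). */
            printf("%zu\n", sizeof_field(struct example, name));   /* 16 */
            printf("%zu\n", sizeof_field(struct example, flags));  /* 8 on LP64 */
            return 0;
    }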
     

13 Nov, 2019

2 commits

  • cgroup ID is currently allocated using a dedicated per-hierarchy idr
    and used internally and exposed through tracepoints and bpf. This is
    confusing because there are tracepoints and other interfaces which use
    the cgroupfs ino as IDs.

    The preceding changes exposed kn->id as the inode number: a full 64-bit
    ino on supported archs, or ino+gen (low 32 bits as ino, high 32 bits as
    gen) elsewhere. There's no
    reason for cgroup to use different IDs. The kernfs IDs are unique and
    userland can easily discover them and map them back to paths using
    standard file operations.

    This patch replaces cgroup IDs with kernfs IDs.

    * cgroup_id() is added and all cgroup ID users are converted to use it.

    * kernfs_node creation is moved to earlier during cgroup init so that
    cgroup_id() is available during init.

    * While at it, s/cgroup/cgrp/ in psi helpers for consistency.

    * Fallback ID value is changed to 1 to be consistent with root cgroup
    ID.

    Signed-off-by: Tejun Heo
    Reviewed-by: Greg Kroah-Hartman
    Cc: Namhyung Kim

    Tejun Heo
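
    A hedged user-space sketch of what "discover them and map them back to
    paths using standard file operations" means in practice (the path is
    just an example): on archs with 64-bit inos the cgroup ID is simply the
    inode number of the cgroup directory, so stat(2) is enough; elsewhere
    the inode number gives the low 32 bits of the ID.

    #include <stdio.h>
    #include <sys/stat.h>

    int main(int argc, char **argv)
    {
            const char *path = argc > 1 ? argv[1] : "/sys/fs/cgroup";
            struct stat st;

            if (stat(path, &st)) {
                    perror("stat");
                    return 1;
            }
            printf("%s: cgroup id %llu\n", path, (unsigned long long)st.st_ino);
            return 0;
    }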
     
  • kernfs_node->id is currently a union kernfs_node_id which represents
    either a 32bit (ino, gen) pair or u64 value. I can't see much value
    in the usage of the union - all that's needed is a 64bit ID which the
    current code is already limited to. Using a union makes the code
    unnecessarily complicated and prevents using 64bit ino without adding
    practical benefits.

    This patch drops union kernfs_node_id and makes kernfs_node->id a u64.
    ino is stored in the lower 32bits and gen upper. Accessors -
    kernfs[_id]_ino() and kernfs[_id]_gen() - are added to retrieve the
    ino and gen. This makes ID handling less cumbersome and will
    allow using 64bit inos on supported archs.

    This patch doesn't make any functional changes.

    Signed-off-by: Tejun Heo
    Reviewed-by: Greg Kroah-Hartman
    Cc: Namhyung Kim
    Cc: Jens Axboe
    Cc: Alexei Starovoitov

    Tejun Heo
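
    A standalone sketch of the packing described above (the kernel's
    accessors are the kernfs[_id]_ino()/kernfs[_id]_gen() helpers mentioned
    in the message; the helpers below only mirror the layout for
    configurations where ino is limited to 32 bits):

    #include <stdint.h>
    #include <stdio.h>

    static uint32_t id_ino(uint64_t id) { return (uint32_t)id; }
    static uint32_t id_gen(uint64_t id) { return (uint32_t)(id >> 32); }

    static uint64_t make_id(uint32_t ino, uint32_t gen)
    {
            return ((uint64_t)gen << 32) | ino;
    }

    int main(void)
    {
            uint64_t id = make_id(4026531835u, 7);

            printf("ino=%u gen=%u\n", id_ino(id), id_gen(id));
            return 0;
    }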
     

01 Jun, 2019

4 commits

  • Most bpf map types do similar checks and the same bytes-to-pages
    conversion during memory allocation and charging.

    Let's unify these checks by moving them into bpf_map_charge_init().

    Signed-off-by: Roman Gushchin
    Acked-by: Song Liu
    Signed-off-by: Alexei Starovoitov

    Roman Gushchin
     
  • In order to unify the existing memlock charging code with the
    memcg-based memory accounting, which will be added later, let's
    rework the current scheme.

    Currently the following design is used:
    1) .alloc() callback optionally checks if the allocation will likely
    succeed using bpf_map_precharge_memlock()
    2) .alloc() performs actual allocations
    3) .alloc() callback calculates map cost and sets map.memory.pages
    4) map_create() calls bpf_map_init_memlock() which sets map.memory.user
    and performs actual charging; in case of failure the map is
    destroyed

    On the free path:
    1) bpf_map_free_deferred() calls bpf_map_release_memlock(), which
    performs uncharge and releases the user
    2) .map_free() callback releases the memory

    The scheme can be simplified and made more robust:
    1) .alloc() calculates map cost and calls bpf_map_charge_init()
    2) bpf_map_charge_init() sets map.memory.user and performs actual
    charge
    3) .alloc() performs actual allocations

    On the free path:
    1) .map_free() callback releases the memory
    2) bpf_map_charge_finish() performs uncharge and releases the user

    The new scheme also allows reusing the bpf_map_charge_init()/finish()
    functions for memcg-based accounting. Because charges are performed
    before actual allocations and uncharges after freeing the memory,
    no bogus memory pressure can be created.

    In cases when the map structure is not available (e.g. it's not
    created yet, or is already destroyed), on-stack bpf_map_memory
    structure is used. The charge can be transferred with the
    bpf_map_charge_move() function.

    Signed-off-by: Roman Gushchin
    Acked-by: Song Liu
    Signed-off-by: Alexei Starovoitov

    Roman Gushchin
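
    A hedged sketch of the reworked ordering (the types and helpers below
    are simplified stand-ins, not the kernel's signatures): the charge
    happens before the allocation and the uncharge happens after the free,
    so the accounting never lags behind what is really allocated.

    #include <stdint.h>
    #include <stdlib.h>

    struct map_memory {
            uint32_t pages;
            void *user;            /* stand-in for the charged user_struct */
    };

    /* Stand-in for bpf_map_charge_init(): convert bytes to pages and
     * charge them against the caller's memlock limit; may fail.
     */
    static int charge_init(struct map_memory *mem, uint64_t size)
    {
            mem->pages = (uint32_t)((size + 4095) / 4096);
            return 0;
    }

    /* Stand-in for bpf_map_charge_finish(): uncharge, release the user. */
    static void charge_finish(struct map_memory *mem)
    {
            mem->pages = 0;
    }

    static void *map_alloc(struct map_memory *mem, uint64_t cost)
    {
            void *area;

            if (charge_init(mem, cost))       /* 1) charge up front */
                    return NULL;
            area = malloc(cost);              /* 2) only then allocate */
            if (!area)
                    charge_finish(mem);       /* roll back the charge on failure */
            return area;
    }

    static void map_free(struct map_memory *mem, void *area)
    {
            free(area);                       /* 1) release the memory */
            charge_finish(mem);               /* 2) uncharge afterwards */
    }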
     
  • Group "user" and "pages" fields of bpf_map into the bpf_map_memory
    structure. Later it can be extended with "memcg" and other related
    information.

    The main reason for such a change (besides cosmetics) is to pass the
    bpf_map_memory structure to charging functions before the actual
    allocation of bpf_map.

    Signed-off-by: Roman Gushchin
    Acked-by: Song Liu
    Signed-off-by: Alexei Starovoitov

    Roman Gushchin
     
  • Cgroup local storage maps lack the memlock precharge check,
    which is performed before the memory allocation for
    most other bpf map types.

    Let's add it in order to unify all map types.

    Signed-off-by: Roman Gushchin
    Acked-by: Song Liu
    Signed-off-by: Alexei Starovoitov

    Roman Gushchin
     

10 Apr, 2019

1 commit

  • This work adds two new map creation flags BPF_F_RDONLY_PROG
    and BPF_F_WRONLY_PROG in order to allow for read-only or
    write-only BPF maps from a BPF program side.

    Today we have BPF_F_RDONLY and BPF_F_WRONLY, but these only
    apply to the system call side, meaning the BPF program has full
    read/write access to the map as usual while bpf(2) calls with
    the map fd can either only read or only write into the map,
    depending on the flags. BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG
    allow for the exact opposite: the verifier will reject program
    loads if a write into a read-only map or a read from a
    write-only map is detected. In the read-only map case, helpers
    that would alter the map state, such as map deletion or update,
    are also forbidden. As opposed to the two BPF_F_RDONLY /
    BPF_F_WRONLY flags, BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG
    really do apply for the whole lifetime of the map.

    We've enabled this generic map extension for various non-special
    maps holding normal user data: array, hash, lru, lpm, local
    storage, queue and stack. Further generic map types can follow
    in the future depending on use cases. The main use case here is
    to forbid writes into .rodata map values from the verifier side.

    Signed-off-by: Daniel Borkmann
    Acked-by: Martin KaFai Lau
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
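
    A minimal user-space sketch using the raw bpf(2) syscall (libbpf
    wrappers would work just as well): the map stays fully writable through
    the fd, while BPF_F_RDONLY_PROG tells the verifier to reject any
    program that writes to it.

    #include <linux/bpf.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void)
    {
            union bpf_attr attr;
            int fd;

            memset(&attr, 0, sizeof(attr));
            attr.map_type = BPF_MAP_TYPE_ARRAY;
            attr.key_size = sizeof(__u32);
            attr.value_size = sizeof(__u64);
            attr.max_entries = 16;
            attr.map_flags = BPF_F_RDONLY_PROG;   /* read-only from programs */

            fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
            return fd < 0 ? 1 : 0;
    }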
     

02 Feb, 2019

2 commits

  • Introduce BPF_F_LOCK flag for map_lookup and map_update syscall commands
    and for map_update() helper function.
    In all these cases take the lock of the existing element (which was
    provided in the BTF description) before copying (in or out) the rest of
    the map value.

    Implementation details that are part of uapi:

    Array:
    The array map takes the element lock for lookup/update.

    Hash:
    The hash map also takes the lock for lookup/update and tries to avoid the
    bucket lock. If the old element exists, it takes the element lock and
    updates the element in place. If the element doesn't exist, it allocates
    a new one and inserts it into the hash table while holding the bucket
    lock. In the rare case the hashmap has to take both the bucket lock and
    the element lock to update the old value in place.

    Cgroup local storage:
    It is similar to the array map: update-in-place and lookup are done with
    the lock taken.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann

    Alexei Starovoitov
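
    A hedged user-space sketch of the uapi (it assumes the map's value
    type, which embeds a struct bpf_spin_lock, was described via BTF at map
    creation; BTF setup and error handling are omitted):

    #include <linux/bpf.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Value layout as the BPF program would also see it: the spin lock is
     * part of the value and must be described in the map's BTF.
     */
    struct map_value {
            struct bpf_spin_lock lock;
            long packets;
            long bytes;
    };

    /* Copy a value in under the element's lock: the kernel takes the lock,
     * copies everything except the lock field, and releases it.
     */
    static int update_locked(int map_fd, __u32 *key, struct map_value *val)
    {
            union bpf_attr attr;

            memset(&attr, 0, sizeof(attr));
            attr.map_fd = map_fd;
            attr.key = (__u64)(unsigned long)key;
            attr.value = (__u64)(unsigned long)val;
            attr.flags = BPF_F_LOCK;

            return syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
    }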
     
  • Allow 'struct bpf_spin_lock' to reside inside cgroup local storage.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann

    Alexei Starovoitov
     

18 Dec, 2018

1 commit

  • Commit 970289fc0a83 ("bpf: add bpffs pretty print for cgroup
    local storage maps") added bpffs pretty print for cgroup
    local storage maps. The commit worked for structs without kind_flag
    set.

    This patch refactors the code so that pretty printing also works for
    structs with kind_flag set.

    Acked-by: Martin KaFai Lau
    Signed-off-by: Yonghong Song
    Signed-off-by: Daniel Borkmann

    Yonghong Song
     

13 Dec, 2018

1 commit

  • Implement bpffs pretty printing for cgroup local storage maps
    (both shared and per-cpu).
    Output example (captured for tools/testing/selftests/bpf/netcnt_prog.c):

    Shared:
    $ cat /sys/fs/bpf/map_2
    # WARNING!! The output is for debug purpose only
    # WARNING!! The output format will change
    {4294968594,1}: {9999,1039896}

    Per-cpu:
    $ cat /sys/fs/bpf/map_1
    # WARNING!! The output is for debug purpose only
    # WARNING!! The output format will change
    {4294968594,1}: {
    cpu0: {0,0,0,0,0}
    cpu1: {0,0,0,0,0}
    cpu2: {1,104,0,0,0}
    cpu3: {0,0,0,0,0}
    }

    Signed-off-by: Roman Gushchin
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Acked-by: Martin KaFai Lau
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov

    Roman Gushchin
     

29 Nov, 2018

1 commit


26 Nov, 2018

1 commit

  • Building tags produces a warning:

    ctags: Warning: kernel/bpf/local_storage.c:10: null expansion of name pattern "\1"

    Let's use the same fix as in commit 25528213fe9f ("tags: Fix DEFINE_PER_CPU
    expansions"), even though it violates the usual code style.

    Signed-off-by: Rustam Kovhaev
    Signed-off-by: Daniel Borkmann

    Rustam Kovhaev
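
    A hedged sketch of the workaround pattern borrowed from 25528213fe9f
    (the macro and struct below are stand-ins for the kernel's
    DEFINE_PER_CPU and bpf_cgroup_storage): keeping the variable name at
    the start of its own line stops ctags' name pattern from producing a
    null expansion.

    #define DEFINE_PER_CPU(type, name) type name

    struct bpf_cgroup_storage;

    /* Written on one line this triggers the warning; split it does not. */
    DEFINE_PER_CPU(struct bpf_cgroup_storage *,
                   bpf_cgroup_storage[2]);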
     

17 Nov, 2018

1 commit

  • Naresh reported an issue with the non-atomic memory allocation of
    cgroup local storage buffers:

    [ 73.047526] BUG: sleeping function called from invalid context at
    /srv/oe/build/tmp-rpb-glibc/work-shared/intel-corei7-64/kernel-source/mm/slab.h:421
    [ 73.060915] in_atomic(): 1, irqs_disabled(): 0, pid: 3157, name: test_cgroup_sto
    [ 73.068342] INFO: lockdep is turned off.
    [ 73.072293] CPU: 2 PID: 3157 Comm: test_cgroup_sto Not tainted
    4.20.0-rc2-next-20181113 #1
    [ 73.080548] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
    2.0b 07/27/2017
    [ 73.088018] Call Trace:
    [ 73.090463] dump_stack+0x70/0xa5
    [ 73.093783] ___might_sleep+0x152/0x240
    [ 73.097619] __might_sleep+0x4a/0x80
    [ 73.101191] __kmalloc_node+0x1cf/0x2f0
    [ 73.105031] ? cgroup_storage_update_elem+0x46/0x90
    [ 73.109909] cgroup_storage_update_elem+0x46/0x90

    cgroup_storage_update_elem() (as well as other map update
    callbacks) is called with preemption disabled, so a GFP_ATOMIC
    allocation should be used, as e.g. alloc_htab_elem() does in hashtab.c.

    Reported-by: Naresh Kamboju
    Tested-by: Naresh Kamboju
    Signed-off-by: Roman Gushchin
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov

    Roman Gushchin
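
    A sketch of the shape of the fix (kernel context, simplified; not the
    exact diff): an element allocated from a map update callback, which may
    run with preemption disabled, must use an atomic allocation rather than
    a sleeping one.

    #include <linux/slab.h>

    static void *alloc_storage_buf(size_t value_size, int numa_node)
    {
            /* GFP_ATOMIC instead of a sleeping GFP_USER/GFP_KERNEL request */
            return kmalloc_node(sizeof(long) + value_size,
                                __GFP_ZERO | GFP_ATOMIC | __GFP_NOWARN,
                                numa_node);
    }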
     

09 Oct, 2018

1 commit

  • Alexei Starovoitov says:

    ====================
    pull-request: bpf-next 2018-10-08

    The following pull-request contains BPF updates for your *net-next* tree.

    The main changes are:

    1) sk_lookup_[tcp|udp] and sk_release helpers from Joe Stringer which allow
    BPF programs to perform lookups for sockets in a network namespace. This would
    allow programs to determine early on in processing whether the stack is
    expecting to receive the packet, and perform some action (eg drop,
    forward somewhere) based on this information.

    2) per-cpu cgroup local storage from Roman Gushchin.
    Per-cpu cgroup local storage is very similar to simple cgroup storage
    except all the data is per-cpu. The main goal of the per-cpu variant is
    to implement super fast counters (e.g. packet counters) which require
    neither lookups nor atomic operations in the fast path.
    An example of these hybrid counters is in selftests/bpf/netcnt_prog.c

    3) allow HW offload of programs with BPF-to-BPF function calls from Quentin Monnet

    4) support more than 64-byte key/value in HW offloaded BPF maps from Jakub Kicinski

    5) rename of libbpf interfaces from Andrey Ignatov.
    libbpf is maturing as a library and should follow good practices in
    library design and implementation to play well with other libraries.
    This patch set brings consistent naming convention to global symbols.

    6) relicense libbpf as LGPL-2.1 OR BSD-2-Clause from Alexei Starovoitov
    to let Apache2 projects use libbpf

    7) various AF_XDP fixes from Björn and Magnus
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

02 Oct, 2018

1 commit


01 Oct, 2018

3 commits

  • This commit introduces per-cpu cgroup local storage.

    Per-cpu cgroup local storage is very similar to simple cgroup storage
    (let's call it shared), except all the data is per-cpu.

    The main goal of the per-cpu variant is to implement super fast
    counters (e.g. packet counters) which require neither lookups
    nor atomic operations.

    From userspace's point of view, accessing a per-cpu cgroup storage
    is similar to other per-cpu map types (e.g. per-cpu hashmaps and
    arrays).

    Writing to a per-cpu cgroup storage is not atomic, but is performed
    by copying longs, so there is some minimal atomicity, exactly
    as with other per-cpu maps.

    Signed-off-by: Roman Gushchin
    Cc: Daniel Borkmann
    Cc: Alexei Starovoitov
    Acked-by: Song Liu
    Signed-off-by: Daniel Borkmann

    Roman Gushchin
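
    A hedged sketch of a cgroup/skb program using per-cpu cgroup storage as
    a fast counter, modeled loosely on selftests/bpf/netcnt_prog.c; the map
    definition style, helper declaration and section names follow the
    conventions of that era and are illustrative rather than copied from
    the tree.

    #include <linux/bpf.h>

    #define SEC(name) __attribute__((section(name), used))

    static void *(*bpf_get_local_storage)(void *map, __u64 flags) =
            (void *)BPF_FUNC_get_local_storage;

    struct percpu_net_cnt {
            __u64 packets;
            __u64 bytes;
    };

    struct bpf_map_def {
            unsigned int type;
            unsigned int key_size;
            unsigned int value_size;
            unsigned int max_entries;
            unsigned int map_flags;
    };

    /* Cgroup storage maps size themselves per attached cgroup, so
     * max_entries is left at 0.
     */
    struct bpf_map_def SEC("maps") percpu_netcnt = {
            .type = BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
            .key_size = sizeof(struct bpf_cgroup_storage_key),
            .value_size = sizeof(struct percpu_net_cnt),
    };

    SEC("cgroup/skb")
    int count_packets(struct __sk_buff *skb)
    {
            struct percpu_net_cnt *cnt = bpf_get_local_storage(&percpu_netcnt, 0);

            cnt->packets++;               /* our CPU's slot: no atomics needed */
            cnt->bytes += skb->len;
            return 1;                     /* allow the packet */
    }

    char _license[] SEC("license") = "GPL";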
     
  • To simplify the following introduction of per-cpu cgroup storage,
    let's rework a bit the mechanism of passing a pointer to a cgroup
    storage into bpf_get_local_storage(): save a pointer to the
    corresponding bpf_cgroup_storage structure, instead of a pointer to
    the actual buffer.

    It will help us handle per-cpu storage later, which has
    a different way of accessing the actual data.

    Signed-off-by: Roman Gushchin
    Acked-by: Song Liu
    Cc: Daniel Borkmann
    Cc: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann

    Roman Gushchin
     
  • In order to introduce per-cpu cgroup storage, let's generalize
    bpf cgroup core to support multiple cgroup storage types.
    Potentially, per-node cgroup storage can be added later.

    This commit is mostly a formal change that replaces the
    cgroup_storage pointer with an array of cgroup_storage pointers.
    It doesn't actually introduce a new storage type;
    that will be done later.

    Each bpf program is now able to have one cgroup storage of each type.

    Signed-off-by: Roman Gushchin
    Acked-by: Song Liu
    Cc: Daniel Borkmann
    Cc: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann

    Roman Gushchin
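
    A sketch of the shape of the change (simplified stand-ins, not the
    kernel's definitions): the single storage pointer becomes an array
    indexed by a storage-type enum, so the per-cpu type can be slotted in
    later without touching every user.

    struct cgroup_storage;

    enum cgroup_storage_type {
            CGROUP_STORAGE_SHARED,
            /* CGROUP_STORAGE_PERCPU is added by a later patch */
            MAX_CGROUP_STORAGE_TYPE
    };

    struct prog_context {
            /* was: struct cgroup_storage *storage; */
            struct cgroup_storage *storage[MAX_CGROUP_STORAGE_TYPE];
    };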
     

28 Sep, 2018

1 commit

  • cgroup_storage_update_elem() shouldn't accept any flags
    argument values except BPF_ANY and BPF_EXIST, to guarantee
    backward compatibility should a new flag value ever be added.

    Fixes: de9cbbaadba5 ("bpf: introduce cgroup storage maps")
    Signed-off-by: Roman Gushchin
    Reported-by: Daniel Borkmann
    Cc: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann

    Roman Gushchin
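
    A sketch of the check (simplified, kernel context): only the two
    documented flag values are accepted, so a flag introduced later cannot
    be silently ignored by older kernels' cgroup storage update path.

    #include <linux/bpf.h>
    #include <linux/errno.h>

    static int check_update_flags(__u64 flags)
    {
            if (flags != BPF_ANY && flags != BPF_EXIST)
                    return -EINVAL;
            return 0;
    }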
     

13 Aug, 2018

1 commit

  • Commit a26ca7c982cb ("bpf: btf: Add pretty print support to
    the basic arraymap") and 699c86d6ec21 ("bpf: btf: add pretty
    print for hash/lru_hash maps") enabled support for BTF and
    dumping via BPF fs for array and hash/lru map. However, both
    can be decoupled from each other such that regular BPF maps
    can be supported for attaching BTF key/value information,
    while not all maps necessarily need to dump via map_seq_show_elem()
    callback.

    The basic sanity check which is a prerequisite for all maps
    is that key/value size has to match in any case, and some maps
    can have extra checks via map_check_btf() callback, e.g.
    probing certain types or indicating no support in general. With
    that we can also enable retrieving BTF info for per-cpu map
    types and lpm.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Acked-by: Yonghong Song

    Daniel Borkmann
     

03 Aug, 2018

2 commits

  • This commit introduces the bpf_cgroup_storage_set() helper,
    which will be used to pass a pointer to a cgroup storage
    to the bpf helper.

    Signed-off-by: Roman Gushchin
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Acked-by: Martin KaFai Lau
    Signed-off-by: Daniel Borkmann

    Roman Gushchin
     
  • This commit introduces BPF_MAP_TYPE_CGROUP_STORAGE maps:
    a special type of maps which are implementing the cgroup storage.

    From the userspace point of view it's almost a generic
    hash map with the (cgroup inode id, attachment type) pair
    used as a key.

    The only difference is that some operations are restricted:
    1) a user can't create new entries,
    2) a user can't remove existing entries.

    The lookup from userspace is O(log(n)).

    Signed-off-by: Roman Gushchin
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Acked-by: Martin KaFai Lau
    Signed-off-by: Daniel Borkmann

    Roman Gushchin
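
    A hedged user-space sketch of a lookup against such a map (the key
    struct is the uapi bpf_cgroup_storage_key from linux/bpf.h; the attach
    type and error handling are illustrative):

    #include <linux/bpf.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static int lookup_storage(int map_fd, __u64 cgrp_ino, void *value)
    {
            struct bpf_cgroup_storage_key key = {
                    .cgroup_inode_id = cgrp_ino,
                    .attach_type     = BPF_CGROUP_INET_EGRESS,
            };
            union bpf_attr attr;

            memset(&attr, 0, sizeof(attr));
            attr.map_fd = map_fd;
            attr.key = (__u64)(unsigned long)&key;
            attr.value = (__u64)(unsigned long)value;

            return syscall(__NR_bpf, BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
    }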