31 Mar, 2020

12 commits

  • David S. Miller
     
  • Signed-off-by: David S. Miller

    David S. Miller
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS updates for net-next

    The following patchset contains Netfilter/IPVS updates for net-next:

    1) Add support to specify a stateful expression in set definitions.
    This allows users to specify, e.g., counters per set element.

    2) Flowtable software counter support.

    3) Flowtable hardware offload counter support, from wenxu.

    4) Parallelize flowtable hardware offload requests, from Paul Blakey.
    This includes a patch to add one work entry per offload command.

    5) Several patches to rework nf_queue refcount handling, from Florian
    Westphal.

    6) A few fixes for the flowtable tunnel offload: Fix crash if tunneling
    information is missing and set up indirect flow block as TC_SETUP_FT,
    patch from wenxu.

    7) Stricter netlink attribute sanity check on filters, from Romain Bellan
    and Florent Fourcot.

    8) Annotations to make sparse happy, from Jules Irenge.

    9) Improve ICMP errors in debugging information, from Haishuang Yan.

    10) Fix warning in IPVS ICMP error debugging, from Haishuang Yan.

    11) Fix endianness issue in TCP extension header, from Sergey Marinkevich.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The previous patch allowed device drivers to publish their default
    binding between packet trap policers and packet trap groups. However,
    some users might not be content with this binding and would like to
    change it.

    In case user space passed a packet trap policer identifier when setting
    a packet trap group, invoke the appropriate device driver callback and
    pass the new policer identifier.

    v2:
    * Check for presence of 'DEVLINK_ATTR_TRAP_POLICER_ID' in
    devlink_trap_group_set() and bail if not present
    * Add extack error message in case trap group was partially modified

    Signed-off-by: Ido Schimmel
    Reviewed-by: Jiri Pirko
    Acked-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • Packet trap groups are used to aggregate logically related packet traps.
    Currently, these groups allow user space to batch operations such as
    setting the trap action of all member traps.

    In order to prevent the CPU from being overwhelmed by too many trapped
    packets, it is desirable to bind a packet trap policer to these groups.
    For example, to limit all the packets that encountered an exception
    during routing to 10Kpps.

    Allow device drivers to bind default packet trap policers to packet trap
    groups when the latter are registered with devlink.

    The next patch will enable user space to change this default binding.

    Signed-off-by: Ido Schimmel
    Reviewed-by: Jiri Pirko
    Reviewed-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Ido Schimmel
     
    Devices capable of offloading the kernel's datapath and performing
    functions such as bridging and routing must also be able to send (trap)
    specific packets to the kernel (i.e., the CPU) for processing.

    For example, a device acting as a multicast-aware bridge must be able to
    trap IGMP membership reports to the kernel for processing by the bridge
    module.

    In most cases, the underlying device is capable of handling packet rates
    that are several orders of magnitude higher compared to those that can
    be handled by the CPU.

    Therefore, in order to prevent the underlying device from overwhelming
    the CPU, devices usually include packet trap policers that are able to
    police the trapped packets to rates that can be handled by the CPU.

    This patch allows capable device drivers to register their supported
    packet trap policers with devlink. User space can then tune the
    parameters of these policers (currently, rate and burst size) and read
    from the device the number of packets that were dropped by the policer,
    if supported.

    Subsequent patches in the series will allow device drivers to create a
    default binding between these policers and packet trap groups and allow
    user space to change the binding.

    v2:
    * Add 'strict_start_type' in devlink policy
    * Have device drivers provide max/min rate/burst size for each policer.
    Use them to check validity of user provided parameters
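    The policer parameters above (rate, burst size, driver-provided limits
    and a drop counter) can be illustrated with a classic token bucket. This
    is a minimal userspace sketch under stated assumptions: struct
    trap_policer, policer_set() and policer_admit() are hypothetical names,
    not devlink or driver APIs, real policers run in device hardware, and the
    caller supplies the current time in nanoseconds.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policer model (not the devlink API).
 * Rate is in packets per second, burst in packets. */
struct trap_policer {
	uint64_t rate, burst;          /* user-tunable parameters */
	uint64_t min_rate, max_rate;   /* limits provided by the driver */
	uint64_t min_burst, max_burst;
	uint64_t tokens;               /* current bucket fill, in packets */
	uint64_t last_ns;              /* time of the last refill */
	uint64_t drops;                /* packets dropped by the policer */
};

/* Reject user parameters outside the driver-provided limits. */
int policer_set(struct trap_policer *p, uint64_t rate, uint64_t burst)
{
	if (rate < p->min_rate || rate > p->max_rate)
		return -EINVAL;
	if (burst < p->min_burst || burst > p->max_burst)
		return -EINVAL;
	p->rate = rate;
	p->burst = burst;
	p->tokens = burst;             /* start with a full bucket */
	return 0;
}

/* Token bucket: returns true if a trapped packet may go to the CPU. */
bool policer_admit(struct trap_policer *p, uint64_t now_ns)
{
	uint64_t refill = (now_ns - p->last_ns) * p->rate / 1000000000ull;

	if (refill > 0) {
		/* Cap the bucket at the burst size. */
		if (refill >= p->burst - p->tokens)
			p->tokens = p->burst;
		else
			p->tokens += refill;
		p->last_ns = now_ns;
	}
	if (p->tokens == 0) {
		p->drops++;            /* readable by user space as a drop counter */
		return false;
	}
	p->tokens--;
	return true;
}
```

    Checking the per-policer minimum and maximum in policer_set() mirrors the
    v2 change: out-of-range user input is rejected before it would reach the
    device.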

    Signed-off-by: Ido Schimmel
    Reviewed-by: Jiri Pirko
    Reviewed-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • Avoid taking a reference on listen sockets by checking the socket type
    in the sk_assign and in the corresponding skb_steal_sock() code in the
    transport layer, and by ensuring that the prefetch free (sock_pfree)
    function uses the same logic to check whether the socket is refcounted.

    Suggested-by: Martin KaFai Lau
    Signed-off-by: Joe Stringer
    Signed-off-by: Alexei Starovoitov
    Acked-by: Martin KaFai Lau
    Link: https://lore.kernel.org/bpf/20200329225342.16317-4-joe@wand.net.nz

    Joe Stringer
     
  • Add support for TPROXY via a new bpf helper, bpf_sk_assign().

    This helper requires the BPF program to discover the socket via a call
    to bpf_sk*_lookup_*(), then pass this socket to the new helper. The
    helper takes its own reference to the socket in addition to any existing
    reference that may or may not currently be obtained for the duration of
    BPF processing. For the destination socket to receive the traffic, the
    traffic must be routed towards that socket via a local route. The
    simplest example route is below, but in practice you may want to route
    traffic more narrowly (e.g., by CIDR):

    $ ip route add local default dev lo

    This patch avoids trying to introduce an extra bit into the skb->sk, as
    that would require more invasive changes to all code interacting with
    the socket to ensure that the bit is handled correctly, such as all
    error-handling cases along the path from the helper in BPF through to
    the orphan path in the input. Instead, we opt to use the destructor
    field to indicate whether the socket was prefetched.
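    The following sketch models how a destructor pointer can double as a
    "prefetched socket" marker, so that no extra bit is needed in skb->sk.
    It is a hypothetical userspace model: struct sock, struct sk_buff,
    sk_assign(), skb_steal_sock(), sock_pfree() and skb_orphan() are
    simplified stand-ins that borrow kernel names for readability, not the
    kernel's definitions. It also assumes that sockets freed via RCU (such as
    listen sockets) are not refcounted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins that borrow kernel names. */
struct sock {
	int refcnt;
	bool rcu_free;   /* e.g. listen sockets: RCU-freed, no reference taken */
};

struct sk_buff {
	struct sock *sk;
	void (*destructor)(struct sk_buff *skb);
};

/* The same check is used on assign, steal and free. */
static bool sk_is_refcounted(const struct sock *sk)
{
	return !sk->rcu_free;
}

/* The destructor doubles as the "socket was prefetched" marker. */
static void sock_pfree(struct sk_buff *skb)
{
	if (sk_is_refcounted(skb->sk))
		skb->sk->refcnt--;
}

/* Like bpf_sk_assign(): take our own reference (if refcounted) and mark. */
int sk_assign(struct sk_buff *skb, struct sock *sk)
{
	if (skb->sk)
		return -1;       /* illustrative error: skb already owns a socket */
	if (sk_is_refcounted(sk))
		sk->refcnt++;
	skb->sk = sk;
	skb->destructor = sock_pfree;
	return 0;
}

/* Transport layer: take over skb->sk and report whether a reference
 * came with it. */
struct sock *skb_steal_sock(struct sk_buff *skb, bool *refcounted)
{
	struct sock *sk = skb->sk;

	if (!sk) {
		*refcounted = false;
		return NULL;
	}
	*refcounted = true;
	if (skb->destructor == sock_pfree)
		*refcounted = sk_is_refcounted(sk);
	skb->sk = NULL;
	skb->destructor = NULL;
	return sk;
}

/* Error/orphan path: the destructor drops whatever reference was taken. */
void skb_orphan(struct sk_buff *skb)
{
	if (skb->destructor)
		skb->destructor(skb);
	skb->sk = NULL;
	skb->destructor = NULL;
}
```

    Because every error path already runs the destructor, the reference
    taken at assignment time is released without any code needing to know
    about a special bit in the socket pointer.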

    Signed-off-by: Joe Stringer
    Signed-off-by: Alexei Starovoitov
    Acked-by: Martin KaFai Lau
    Link: https://lore.kernel.org/bpf/20200329225342.16317-2-joe@wand.net.nz

    Joe Stringer
     
    On low-memory systems, run-time dumps can consume too much memory. Add
    the ability for the administrator to disable auto dumps per reporter as
    part of the error flow handling routine.

    This attribute is not relevant while executing
    DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET.

    By default, auto dump is activated for any reporter that has a dump method,
    as part of the reporter registration to devlink.

    Signed-off-by: Eran Ben Elisha
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Eran Ben Elisha
     
    When a health reporter is registered with devlink, devlink will implicitly
    enable auto recover if and only if the reporter has a recover method. There
    is no reason to explicitly get the auto recover flag from the driver.

    Remove this flag from all drivers that called
    devlink_health_reporter_create.

    All existing health reporters set auto recovery to true if they have a
    recover method.

    However, as before this patch, the administrator can still unset auto
    recover via a netlink command.
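    A hypothetical sketch of the defaults described in these two entries: at
    registration, auto recover and auto dump are each enabled if and only if
    the reporter implements the matching op, and the administrator may change
    them later. The struct and function names are illustrative, not the
    devlink API, and rejecting an attempt to enable an action whose op is
    missing is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reporter model; names are illustrative, not the devlink API. */
struct reporter_ops {
	int (*recover)(void *priv);
	int (*dump)(void *priv);
};

struct reporter {
	const struct reporter_ops *ops;
	bool auto_recover;
	bool auto_dump;
};

/* Registration: no flag from the driver; each automatic action defaults
 * to enabled if and only if the matching op is implemented. */
void reporter_init(struct reporter *r, const struct reporter_ops *ops)
{
	r->ops = ops;
	r->auto_recover = ops->recover != NULL;
	r->auto_dump = ops->dump != NULL;
}

/* Administrator override. Assumption: enabling an action whose op is
 * missing is rejected; disabling is always allowed. */
int reporter_set(struct reporter *r, bool auto_recover, bool auto_dump)
{
	if ((auto_recover && !r->ops->recover) || (auto_dump && !r->ops->dump))
		return -1;
	r->auto_recover = auto_recover;
	r->auto_dump = auto_dump;
	return 0;
}

/* Example op, used for demonstration only. */
int example_op(void *priv)
{
	(void)priv;
	return 0;
}
```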

    Signed-off-by: Eran Ben Elisha
    Reviewed-by: Jiri Pirko
    Reviewed-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Eran Ben Elisha
     
  • The rest of the devlink code sets the extack message using
    NL_SET_ERR_MSG_MOD. Change the existing appearances of NL_SET_ERR_MSG
    to NL_SET_ERR_MSG_MOD.

    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
     
  • Xin Long says:
    On the UDP rx path, udp_rcv_segment() may segment the skb, in which case
    the frag skbs get their header copied from the head skb in
    skb_segment_list() by calling __copy_skb_header(). This could overwrite
    the frag skbs' extensions via __skb_ext_copy() and cause a leak.

    This issue was found after loading esp_offload where a sec path ext
    is set in the skb.

    Fix this by discarding head state of the fraglist skb before replacing
    its contents.
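    The leak pattern can be shown with a tiny refcounted model: copying a
    header pointer over the destination's existing extension loses the old
    reference. The sketch is hypothetical (struct ext, struct buf and both
    copy functions are not kernel APIs); the fixed variant releases the
    destination's old state when replacing it, which is the idea behind the
    fix above.

```c
#include <stdlib.h>

/* Hypothetical model: a refcounted extension (think sec path) attached to
 * a buffer. Not the kernel's skb extension API. */
struct ext {
	int refcnt;
};

struct buf {
	struct ext *ext;
};

static void ext_put(struct ext *e)
{
	if (e && --e->refcnt == 0)
		free(e);
}

/* Buggy: overwrites dst->ext, so dst's old reference is never dropped. */
void copy_header_leaky(struct buf *dst, const struct buf *src)
{
	dst->ext = src->ext;
	if (dst->ext)
		dst->ext->refcnt++;
}

/* Fixed: take the new reference first, then release the old head state
 * (this order is also safe when dst->ext == src->ext). */
void copy_header_fixed(struct buf *dst, const struct buf *src)
{
	struct ext *old = dst->ext;

	dst->ext = src->ext;
	if (dst->ext)
		dst->ext->refcnt++;
	ext_put(old);
}
```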

    Fixes: 3a1296a38d0cf62 ("net: Support GRO/GSO fraglist chaining.")
    Cc: Steffen Klassert
    Reported-by: Xiumei Mu
    Tested-by: Xin Long
    Signed-off-by: Florian Westphal
    Acked-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Florian Westphal
     

30 Mar, 2020

5 commits

  • Add three string sets related to timestamping information:

    ETH_SS_SOF_TIMESTAMPING: SOF_TIMESTAMPING_* flags
    ETH_SS_TS_TX_TYPES: timestamping Tx types
    ETH_SS_TS_RX_FILTERS: timestamping Rx filters

    These will be used for TIMESTAMP_GET request.

    v2: avoid compiler warning ("enumeration value not handled in switch")
    in net_hwtstamp_validate()

    v3: omit dash in Tx type names ("one-step-*" -> "onestep-*"), suggested by
    Richard Cochran

    Signed-off-by: Michal Kubecek
    Acked-by: Richard Cochran
    Signed-off-by: David S. Miller

    Michal Kubecek
     
    This patch adds functionality to configure routes for RPL source routing.
    No IPIP functionality is implemented yet; it can be added later, once the
    cases in which IPv6 encapsulation should be used become clearer.

    Signed-off-by: Alexander Aring
    Signed-off-by: David S. Miller

    Alexander Aring
     
    The build_state callback of lwtunnel doesn't receive the network namespace
    structure yet. This patch adds it so that specific address configuration
    can be checked when RPL source routes are created.

    Signed-off-by: Alexander Aring
    Signed-off-by: David S. Miller

    Alexander Aring
     
  • The SKB_SGO_CB_OFFSET should be SKB_GSO_CB_OFFSET which means the
    offset of the GSO in skb cb. This patch fixes the typo.

    Fixes: 9207f9d45b0a ("net: preserve IP control block during GSO segmentation")
    Signed-off-by: Cambda Zhu
    Signed-off-by: David S. Miller

    Cambda Zhu
     
    The page pool API can be useful for non-DMA cases like the
    xen-netfront driver, so allow passing zero as the page pool
    flags.

    v2: check DMA direction only if PP_FLAG_DMA_MAP is set
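    The v2 rule (validate the DMA direction only when PP_FLAG_DMA_MAP is set)
    might look like the following. The flag value, the direction enum and
    struct pp_params are redefined locally for a self-contained sketch and
    should be treated as assumptions, as should the set of accepted
    directions (from-device or bidirectional).

```c
#include <errno.h>

/* Assumed flag value, redefined here so the sketch is self-contained. */
#define PP_FLAG_DMA_MAP 0x1u

/* Hypothetical stand-ins for the DMA direction and pool parameters. */
enum dma_dir { DMA_BIDIRECTIONAL, DMA_TO_DEVICE, DMA_FROM_DEVICE, DMA_NONE };

struct pp_params {
	unsigned int flags;
	enum dma_dir dma_dir;
};

/* Only check the DMA direction when the pool will DMA-map pages, so
 * flags == 0 (a non-DMA user such as xen-netfront) is accepted. */
int pp_init_check(const struct pp_params *p)
{
	if (p->flags & PP_FLAG_DMA_MAP) {
		if (p->dma_dir != DMA_FROM_DEVICE &&
		    p->dma_dir != DMA_BIDIRECTIONAL)
			return -EINVAL;
	}
	return 0;
}
```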

    Signed-off-by: Denis Kirjanov
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Denis Kirjanov
     

29 Mar, 2020

1 commit

  • While it is currently possible for userspace to specify that an existing
    XDP program should not be replaced when attaching to an interface, there is
    no mechanism to safely replace a specific XDP program with another.

    This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
    set along with IFLA_XDP_FD. If set, the kernel will check that the program
    currently loaded on the interface matches the expected one, and fail the
    operation if it does not. This corresponds to a 'cmpxchg' memory operation.
    Setting the new attribute with a negative value means that no program is
    expected to be attached, which corresponds to setting the UPDATE_IF_NOEXIST
    flag.

    A new companion flag, XDP_FLAGS_REPLACE, is also added to explicitly
    request checking of the EXPECTED_FD attribute. This is needed for userspace
    to discover whether the kernel supports the new attribute.
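    The compare-and-exchange semantics can be sketched as follows. Programs
    are modelled as integer ids (0 meaning none), xdp_attach() and struct
    iface are hypothetical names, the XDP_FLAGS_* values are redefined
    locally and assumed to match the uapi header, and the specific error
    codes are assumptions chosen for illustration.

```c
#include <errno.h>

/* Flag values assumed to match the uapi header; redefined here so the
 * sketch is self-contained. */
#define XDP_FLAGS_UPDATE_IF_NOEXIST (1u << 0)
#define XDP_FLAGS_REPLACE           (1u << 4)

/* Hypothetical interface model: attached program id, 0 = none. */
struct iface {
	int prog;
};

/* expected < 0 means "no program expected" (cf. a negative
 * IFLA_XDP_EXPECTED_FD); otherwise it must match what is attached. */
int xdp_attach(struct iface *dev, int new_prog, int expected, unsigned int flags)
{
	if ((flags & XDP_FLAGS_UPDATE_IF_NOEXIST) && dev->prog)
		return -EBUSY;
	if (flags & XDP_FLAGS_REPLACE) {
		int want = expected < 0 ? 0 : expected;

		if (dev->prog != want)
			return -EEXIST;     /* compare failed: leave dev untouched */
	}
	dev->prog = new_prog;           /* the "exchange" */
	return 0;
}
```

    Requiring the explicit XDP_FLAGS_REPLACE flag means an older kernel that
    ignores the new attribute rejects the unknown flag instead of silently
    performing an unconditional replace.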

    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: Alexei Starovoitov
    Reviewed-by: Jakub Kicinski
    Link: https://lore.kernel.org/bpf/158515700640.92963.3551295145441017022.stgit@toke.dk

    Toke Høiland-Jørgensen
     

28 Mar, 2020

7 commits

  • We already have the bpf_get_current_uid_gid() helper enabled, and
    given we now have perf event RB output available for connect(),
    sendmsg(), recvmsg() and bind-related hooks, add a trivial change
    to enable bpf_get_current_pid_tgid() and bpf_get_current_comm()
    as well.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/18744744ed93c06343be8b41edcfd858706f39d7.1585323121.git.daniel@iogearbox.net

    Daniel Borkmann
     
  • Enable the bpf_get_current_cgroup_id() helper for connect(), sendmsg(),
    recvmsg() and bind-related hooks in order to retrieve the cgroup v2
    context which can then be used as part of the key for BPF map lookups,
    for example. Given these hooks operate in process context, 'current' is
    always valid and points to the app that is performing the mentioned
    syscalls if it's subject to a v2 cgroup. With the same motivation as
    commit 7723628101aa ("bpf: Introduce bpf_skb_ancestor_cgroup_id helper")
    enable retrieval of ancestor from current so the cgroup id can be used
    for policy lookups which can then forbid connect() / bind(), for example.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/d2a7ef42530ad299e3cbb245e6c12374b72145ef.1585323121.git.daniel@iogearbox.net

    Daniel Borkmann
     
    Today, Kubernetes still operates on cgroups v1. However, it is
    possible to retrieve the task's classid based on 'current' out of
    connect(), sendmsg(), recvmsg() and bind-related hooks for orchestrators
    which attach to the root cgroup v2 hook in a mixed env like in case
    of Cilium, for example, in order to then correlate certain pod traffic
    and use it as part of the key for BPF map lookups.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/555e1c69db7376c0947007b4951c260e1074efc3.1585323121.git.daniel@iogearbox.net

    Daniel Borkmann
     
  • In Cilium we're mainly using BPF cgroup hooks today in order to implement
    kube-proxy free Kubernetes service translation for ClusterIP, NodePort (*),
    ExternalIP, and LoadBalancer as well as HostPort mapping [0] for all traffic
    between Cilium managed nodes. While this works in its current shape and avoids
    packet-level NAT for inter Cilium managed node traffic, there is one major
    limitation we're facing today, that is, lack of netns awareness.

    In Kubernetes, the concept of Pods (which hold one or multiple containers)
    has been built around network namespaces, so while we can use the global scope
    of attaching to root BPF cgroup hooks also to our advantage (e.g. for exposing
    NodePort ports on loopback addresses), we also need to differentiate
    between the initial network namespace and non-initial ones. For example,
    ExternalIP services mandate that non-local service IPs are not to be
    translated from the host (initial) network namespace. Right now, we have an
    ugly work-around in place where non-local service IPs for ExternalIP services
    are not translated from connect() and friends BPF hooks but instead via less
    efficient packet-level NAT on the veth tc ingress hook for Pod traffic.

    On top of determining whether we're in initial or non-initial network namespace
    we also have a need for a socket-cookie like mechanism for network namespaces
    scope. Socket cookies have the nice property that they can be combined as part
    of the key structure e.g. for BPF LRU maps without having to worry that the
    cookie could be recycled. We are planning to use this for our sessionAffinity
    implementation for services. Therefore, add a new bpf_get_netns_cookie() helper
    which would resolve both use cases at once: bpf_get_netns_cookie(NULL) would
    provide the cookie for the initial network namespace while passing the context
    instead of NULL would provide the cookie from the application's network namespace.
    We're using a hole, so no size increase; the assignment happens only once.
    Therefore, this allows for a comparison against the initial namespace as well
    as regular cookie usage as we have today with socket cookies. We could later
    enable this helper for other program types as the need arises.

    (*) Both externalTrafficPolicy={Local|Cluster} types
    [0] https://github.com/cilium/cilium/blob/master/bpf/bpf_sock.c
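    A minimal sketch of a namespace cookie that is assigned once and never
    recycled, using a global monotonically increasing counter and a lazy
    compare-and-swap. struct netns, netns_cookie() and get_netns_cookie() are
    hypothetical names; only the NULL-means-initial-namespace convention
    comes from the description above.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical namespace model: 0 means "cookie not assigned yet". */
struct netns {
	_Atomic uint64_t cookie;
};

static _Atomic uint64_t cookie_gen;   /* global, only ever increments */
static struct netns init_netns;

uint64_t netns_cookie(struct netns *ns)
{
	uint64_t cur = atomic_load(&ns->cookie);

	if (cur)
		return cur;
	uint64_t fresh = atomic_fetch_add(&cookie_gen, 1) + 1;
	/* Assign at most once; if another thread won the race, cur now
	 * holds its value and we return that instead. */
	if (atomic_compare_exchange_strong(&ns->cookie, &cur, fresh))
		return fresh;
	return cur;
}

/* NULL selects the initial namespace, as with the helper's convention. */
uint64_t get_netns_cookie(struct netns *ctx_ns)
{
	return netns_cookie(ctx_ns ? ctx_ns : &init_netns);
}
```

    Since the counter never goes backwards, a cookie can safely be part of a
    long-lived map key (for example in an LRU map) without colliding with a
    later namespace.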

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/c47d2346982693a9cf9da0e12690453aded4c788.1585323121.git.daniel@iogearbox.net

    Daniel Borkmann
     
  • Currently, connect(), sendmsg(), recvmsg() and bind-related hooks
    are all lacking perf event rb output in order to push notifications
    or monitoring events up to user space. Back in commit a5a3a828cd00
    ("bpf: add perf event notificaton support for sock_ops"), I've worked
    with Sowmini to enable them for sock_ops where the context part is
    not used (as opposed to skbs for example where the packet data can
    be appended). Make the bpf_sockopt_event_output() helper generic and
    enable it for mentioned hooks.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/69c39daf87e076b31e52473c902e9bfd37559124.1585323121.git.daniel@iogearbox.net

    Daniel Borkmann
     
  • We currently make heavy use of the socket cookie in BPF's connect(),
    sendmsg() and recvmsg() hooks for load-balancing decisions. However,
    it is currently not enabled/implemented in BPF {post-}bind hooks
    where it can later be used in combination for correlation in the tc
    egress path, for example.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/e9d71f310715332f12d238cc650c1edc5be55119.1585323121.git.daniel@iogearbox.net

    Daniel Borkmann
     
  • The indirect block setup should use TC_SETUP_FT as the type instead of
    TC_SETUP_BLOCK. Adjust existing users of the indirect flow block
    infrastructure.

    Fixes: b5140a36da78 ("netfilter: flowtable: add indr block setup support")
    Signed-off-by: wenxu
    Signed-off-by: Pablo Neira Ayuso

    wenxu
     

27 Mar, 2020

10 commits

  • Implement support for the DEVLINK_CMD_REGION_NEW command for creating
    snapshots. This new command parallels the existing
    DEVLINK_CMD_REGION_DEL.

    In order for DEVLINK_CMD_REGION_NEW to work for a region, the new
    ".snapshot" operation must be implemented in the region's ops structure.

    The desired snapshot id must be provided. This helps avoid confusion on
    the purpose of DEVLINK_CMD_REGION_NEW, and keeps the API simpler.

    The requested id will be inserted into the xarray tracking the number of
    snapshots using each id. If this id is already used by another snapshot
    on any region, an error will be returned.

    Signed-off-by: Jacob Keller
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jacob Keller
     
  • Each snapshot created for a devlink region must have an id. These ids
    are supposed to be unique per "event" that caused the snapshot to be
    created. Drivers call devlink_region_snapshot_id_get to obtain a new id
    to use for a new event trigger. The id values are tracked per devlink,
    so that the same id number can be used if a triggering event creates
    multiple snapshots on different regions.

    There is no mechanism for snapshot ids to ever be reused. Introduce an
    xarray to store the count of how many snapshots are using a given id,
    replacing the snapshot_id field previously used for picking the next id.

    The devlink_region_snapshot_id_get() function will use xa_alloc to
    insert an initial value of 1 at an available slot between 0 and
    U32_MAX.

    The new __devlink_snapshot_id_increment() and
    __devlink_snapshot_id_decrement() functions will be used to track how
    many snapshots currently use an id.

    Drivers must now call devlink_snapshot_id_put() in order to release
    their reference of the snapshot id after adding region snapshots.

    By tracking the total number of snapshots using a given id, it is
    possible for the decrement() function to erase the id from the xarray
    when it is not in use.

    With this method, a snapshot id can be reused once all
    snapshots that referred to it have been deleted via
    DEVLINK_CMD_REGION_DEL, and the driver has finished adding snapshots.
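    The id lifecycle can be sketched with a small fixed table standing in for
    the xarray (a deliberate simplification; the description above uses
    xa_alloc over a much larger range). The function names mirror the
    description but are hypothetical userspace code, not devlink functions.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical: a small table stands in for the xarray. Entry i counts
 * the users of id i (snapshots plus the driver's own reference); 0 = free. */
#define MAX_IDS 8
static uint32_t id_users[MAX_IDS];

/* Pick a free id and record the driver's reference (count = 1). */
int snapshot_id_get(uint32_t *id)
{
	for (uint32_t i = 0; i < MAX_IDS; i++) {
		if (id_users[i] == 0) {
			id_users[i] = 1;
			*id = i;
			return 0;
		}
	}
	return -ENOSPC;
}

/* A new snapshot now refers to this id. */
int snapshot_id_increment(uint32_t id)
{
	if (id >= MAX_IDS || id_users[id] == 0)
		return -EINVAL;
	id_users[id]++;
	return 0;
}

/* A snapshot was deleted, or the driver put its reference; the id
 * becomes reusable once the count drops to zero. */
void snapshot_id_decrement(uint32_t id)
{
	if (id < MAX_IDS && id_users[id] > 0)
		id_users[id]--;
}
```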

    This work also paves the way to introduce a mechanism for userspace to
    request a snapshot.

    Signed-off-by: Jacob Keller
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jacob Keller
     
  • The devlink_snapshot_id_get() function returns a snapshot id. The
    snapshot id is a u32, so there is no way to indicate an error code.

    A future change may add additional cases where this
    function could fail. Refactor the function to return the snapshot id in
    an argument, so that it can return zero or an error value.

    This ensures that snapshot ids cannot be confused with error values, and
    aids in the future refactor of snapshot id allocation management.

    Because there is no current way to release previously used snapshot ids,
    add a simple check ensuring that an error is reported in case the
    snapshot_id would overflow.

    Signed-off-by: Jacob Keller
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jacob Keller
     
  • A future change is going to implement a new devlink command to request
    a snapshot on demand. As part of this, the logic for handling the
    snapshot ids will be refactored. To simplify the snapshot id allocation
    function, move it to a separate function prefixed by `__`. This helper
    function will assume the lock is held.

    While no other callers will exist, it simplifies refactoring the logic
    because there is no need to complicate the function with gotos to handle
    unlocking on failure.

    Signed-off-by: Jacob Keller
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jacob Keller
     
  • The devlink_region_snapshot_create function returns -ENOMEM when the
    maximum number of snapshots has been reached. This is confusing because
    it is not an issue of being out of memory. Change this to use -ENOSPC
    instead.

    Reported-by: Jiri Pirko
    Signed-off-by: Jacob Keller
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jacob Keller
     
  • A future change is going to add a new devlink command to request
    a snapshot on demand. This function will want to call the
    devlink_region_snapshot_create function while already holding the
    devlink instance lock.

    Extract the logic of this function into a static function prefixed by
    `__` to indicate that it is an internal helper function. Modify the
    original function to be implemented in terms of the new locked
    function.
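    Both this refactor and the snapshot id one above follow the common
    "double-underscore helper assumes the lock is held" pattern. A sketch with
    a pthread mutex standing in for the devlink instance lock; struct dl,
    snapshot_create(), __snapshot_create() and cmd_region_new() are
    hypothetical names. The -ENOSPC return follows the snapshot limit change
    described above.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a devlink instance. */
struct dl {
	pthread_mutex_t lock;
	int nsnapshots, max_snapshots;
};

/* Caller must hold d->lock, so no unlock-on-error paths are needed here. */
static int __snapshot_create(struct dl *d)
{
	if (d->nsnapshots >= d->max_snapshots)
		return -ENOSPC;   /* limit reached: not an out-of-memory condition */
	d->nsnapshots++;
	return 0;
}

/* Existing entry point: takes the lock and delegates. */
int snapshot_create(struct dl *d)
{
	pthread_mutex_lock(&d->lock);
	int err = __snapshot_create(d);
	pthread_mutex_unlock(&d->lock);
	return err;
}

/* A command handler that already holds the lock calls the helper directly. */
int cmd_region_new(struct dl *d)
{
	pthread_mutex_lock(&d->lock);
	/* ... other work done under the same lock ... */
	int err = __snapshot_create(d);
	pthread_mutex_unlock(&d->lock);
	return err;
}
```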

    Signed-off-by: Jacob Keller
    Reviewed-by: Jiri Pirko
    Reviewed-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Jacob Keller
     
  • The function documentation comment for devlink_region_snapshot_create
    included a literal tab character between 'future analyses' that was
    difficult to spot as it happened to only display as one space wide.

    Fix the comment to use a space here instead of a stray tab appearing in
    the middle of a sentence.

    Signed-off-by: Jacob Keller
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jacob Keller
     
    It does not make sense for two snapshots of a given region to use
    different destructors. Simplify snapshot creation by adding
    a .destructor op for regions.

    This operation will replace the data_destructor for the snapshot
    creation, and makes snapshot creation easier.

    Noticed-by: Jakub Kicinski
    Signed-off-by: Jacob Keller
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jacob Keller
     
  • Modify the devlink region code in preparation for adding new operations
    on regions.

    Create a devlink_region_ops structure, and move the name pointer from
    within the devlink_region structure into the ops structure (similar to
    the devlink_health_reporter_ops).

    This prepares the regions to enable support of additional operations in
    the future such as requesting snapshots, or accessing the region
    directly without a snapshot.

    In order to re-use the constant strings in the mlx4 driver their
    declaration must be changed to 'const char * const' to ensure the
    compiler realizes that both the data and the pointer cannot change.

    Signed-off-by: Jacob Keller
    Reviewed-by: Jakub Kicinski
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jacob Keller
     
  • Saeed Mahameed says:

    ====================
    mlx5-updates-2020-03-25

    1) Cleanups from Dan Carpenter and wenxu.

    2) Paul and Roi: some minor updates and fixes to E-Switch to address
    issues introduced in the previous reg_c0 updates series.

    3) Eli Cohen simplifies and improves flow steering matching group searches
    and flow table entries version management.

    4) Parav Pandit improves the thread safety of devlink eswitch mode changes
    by making devlink rely on the driver for thread safety and introducing
    mlx5 eswitch mode change protection.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

26 Mar, 2020

4 commits

    devlink_nl_cmd_eswitch_set_doit() doesn't hold the devlink->lock mutex
    while invoking the driver callback. This is likely because eswitch mode
    setting involves adding/removing devlink ports, health reporters or
    other devlink objects for a devlink device.

    So it is the driver's responsibility to ensure thread-safe eswitch state
    transitions happening via either sriov legacy enablement or the devlink
    eswitch set callback.

    Therefore, the get() callback should also be invoked without holding
    the devlink->lock mutex.
    Vendor drivers can use the same internal lock they use during the eswitch
    mode set() callback.
    This makes the get() and set() implementations symmetric in devlink core
    and in vendor drivers.

    Hence, remove holding devlink->lock mutex during eswitch get() callback.
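    A hypothetical sketch of the symmetric driver-side locking: both get()
    and set() serialize on the driver's own mode lock, and the core calls
    them without holding devlink->lock. struct esw, esw_mode_set() and
    esw_mode_get() are illustrative names, not mlx5 code, and a pthread mutex
    stands in for the kernel mutex.

```c
#include <pthread.h>

/* Hypothetical driver-side eswitch state. */
enum esw_mode { ESW_LEGACY, ESW_SWITCHDEV };

struct esw {
	pthread_mutex_t mode_lock;   /* driver-internal lock, not devlink->lock */
	enum esw_mode mode;
};

int esw_mode_set(struct esw *esw, enum esw_mode mode)
{
	pthread_mutex_lock(&esw->mode_lock);
	/* A real driver may add/remove devlink ports here, which takes core
	 * locks internally; the core therefore must not hold them on entry. */
	esw->mode = mode;
	pthread_mutex_unlock(&esw->mode_lock);
	return 0;
}

int esw_mode_get(struct esw *esw, enum esw_mode *mode)
{
	/* Same lock and same entry conditions as set(): symmetric locking. */
	pthread_mutex_lock(&esw->mode_lock);
	*mode = esw->mode;
	pthread_mutex_unlock(&esw->mode_lock);
	return 0;
}
```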

    Failing to do so results in the deadlock scenario below when the mlx5_core
    driver is improved, in a subsequent patch, to handle the eswitch mode set
    critical section invoked by devlink and the sriov sysfs interface.

    devlink_nl_cmd_eswitch_set_doit()
      mlx5_eswitch_mode_set()
        mutex_lock(esw->mode_lock)
        adding devlink ports or other devlink objects
          mutex_lock(&devlink->lock)

    devlink_nl_cmd_eswitch_get_doit()    (invoked with devlink->lock held)
      mutex_lock(esw->mode_lock)

    Reviewed-by: Mark Bloch
    Signed-off-by: Parav Pandit
    Signed-off-by: Saeed Mahameed

    Parav Pandit
     
  • Overlapping header include additions in macsec.c

    A bug fix in 'net' overlapping with the removal of 'version'
    string in ena_netdev.c

    Overlapping test additions in selftests Makefile

    Overlapping PCI ID table adjustments in iwlwifi driver.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • net/netfilter/nft_fwd_netdev.c: In function ‘nft_fwd_netdev_eval’:
    net/netfilter/nft_fwd_netdev.c:32:10: error: ‘struct sk_buff’ has no member named ‘tc_redirected’
    pkt->skb->tc_redirected = 1;
    ^~
    net/netfilter/nft_fwd_netdev.c:33:10: error: ‘struct sk_buff’ has no member named ‘tc_from_ingress’
    pkt->skb->tc_from_ingress = 1;
    ^~

    To avoid a direct dependency with tc actions from netfilter, wrap the
    redirect bits around CONFIG_NET_REDIRECT and move helpers to
    include/linux/skbuff.h. Turn on this toggle from the ifb driver, the
    only existing client of these bits in the tree.

    This patch adds skb_set_redirected(), which sets the redirected bit
    on the skbuff, records whether the packet was redirected from ingress,
    and resets the timestamp (the timestamp reset was originally missing in
    the netfilter bugfix).
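    A simplified model of such a helper, using a stand-in struct with only
    the relevant fields. The struct, its field names, the local
    CONFIG_NET_REDIRECT define and clearing the timestamp only for packets
    redirected from ingress are assumptions of this sketch rather than the
    exact in-tree definition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Normally selected via Kconfig (e.g. by the ifb driver); defined here
 * so the sketch compiles. */
#define CONFIG_NET_REDIRECT 1

/* Stand-in for struct sk_buff with only the relevant fields. */
struct skb {
	unsigned int redirected : 1;
	unsigned int from_ingress : 1;
	int64_t tstamp;
};

static inline void skb_set_redirected(struct skb *skb, bool from_ingress)
{
#ifdef CONFIG_NET_REDIRECT
	skb->redirected = 1;
	skb->from_ingress = from_ingress;
	if (skb->from_ingress)
		skb->tstamp = 0;     /* assumption: cleared for ingress redirects */
#endif
}
```

    Keeping the bits behind one config symbol lets netfilter and tc share the
    helper without netfilter depending on tc actions.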

    Fixes: bcfabee1afd99484 ("netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress")
    Reported-by: noreply@ellerman.id.au
    Reported-by: Geert Uytterhoeven
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     
  • TCP recvmsg() calls skb_copy_datagram_iter(), which
    calls an indirect function (cb pointing to simple_copy_to_iter())
    for every MSS (fragment) present in the skb.

    CONFIG_RETPOLINE=y forces a very expensive operation
    that we can avoid thanks to indirect call wrappers.

    This patch gives a 13% performance increase on
    a single flow when the bottleneck is the thread reading
    the TCP socket.
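    The indirect call wrapper idea is to compare the function pointer against
    the expected target and call that target directly on the common path, so
    no retpoline thunk is needed there. This is a portable re-statement in
    standard C (the kernel's macro additionally wraps the comparison in
    likely() inside a GNU statement expression); simple_copy() and
    copy_fragments() are hypothetical stand-ins for simple_copy_to_iter()
    and the per-fragment loop.

```c
#include <stddef.h>
#include <string.h>

/* If f is the expected target f1, call f1 directly (a direct call the
 * compiler can emit without a retpoline); otherwise fall back to f. */
#define INDIRECT_CALL_1(f, f1, ...) \
	((f) == (f1) ? (f1)(__VA_ARGS__) : (f)(__VA_ARGS__))

typedef size_t (*copy_cb_t)(const void *src, size_t len, void *dst);

/* Hypothetical stand-in for simple_copy_to_iter(). */
static size_t simple_copy(const void *src, size_t len, void *dst)
{
	memcpy(dst, src, len);
	return len;
}

/* One callback invocation per fragment, as in the per-MSS loop above. */
size_t copy_fragments(const char *data, size_t nfrags, size_t frag_len,
		      char *out, copy_cb_t cb)
{
	size_t copied = 0;

	for (size_t i = 0; i < nfrags; i++)
		copied += INDIRECT_CALL_1(cb, simple_copy, data + i * frag_len,
					  frag_len, out + i * frag_len);
	return copied;
}
```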

    Fixes: 950fcaecd5cc ("datagram: consolidate datagram copy to iter helpers")
    Signed-off-by: Eric Dumazet
    Acked-by: Paolo Abeni
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Eric Dumazet
     

24 Mar, 2020

1 commit

  • Packet trap groups are now explicitly registered by drivers and not
    implicitly registered when the packet traps are registered. Therefore,
    there is no need to encode the entire group structure the trap is
    associated with inside the trap structure.

    Instead, only pass the group identifier. Refer to it as initial group
    identifier, as future patches will allow user space to move traps
    between groups.

    Signed-off-by: Ido Schimmel
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel