27 Nov, 2017

2 commits

  • If we want to add a datapath flow, which has more than 500 vxlan outputs'
    action, we will get the following error reports:
    openvswitch: netlink: Flow action size 32832 bytes exceeds max
    openvswitch: netlink: Flow action size 32832 bytes exceeds max
    openvswitch: netlink: Actions may not be safe on all matching packets
    ... ...

    It seems that we can simply enlarge the MAX_ACTIONS_BUFSIZE to fix it, but
    this is not the root cause. For example, for a vxlan output action, we need
    about 60 bytes for the nlattr, but after it is converted to the flow
    action, it only occupies 24 bytes. This means that we can still support
    more than 1000 vxlan output actions for a single datapath flow under the
    the current 32k max limitation.

    So even if the nla_len(attr) is larger than MAX_ACTIONS_BUFSIZE, we
    shouldn't report EINVAL and keep it move on, as the judgement can be
    done by the reserve_sfa_size.

    Signed-off-by: zhangliping
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    zhangliping
     
  • gso_type is being used in binary AND operations together with SKB_GSO_UDP.
    The issue is that variable gso_type is of type unsigned short and
    SKB_GSO_UDP expands to more than 16 bits:

    SKB_GSO_UDP = 1 << 16

    this makes any binary AND operation between gso_type and SKB_GSO_UDP to
    be always zero, hence making some code unreachable and likely causing
    undesired behavior.

    Fix this by changing the data type of variable gso_type to unsigned int.

    Addresses-Coverity-ID: 1462223
    Fixes: 0c19f846d582 ("net: accept UFO datagrams from tuntap and packet")
    Signed-off-by: Gustavo A. R. Silva
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Gustavo A. R. Silva
     

24 Nov, 2017

1 commit

  • Tuntap and similar devices can inject GSO packets. Accept type
    VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively.

    Processes are expected to use feature negotiation such as TUNSETOFFLOAD
    to detect supported offload types and refrain from injecting other
    packets. This process breaks down with live migration: guest kernels
    do not renegotiate flags, so destination hosts need to expose all
    features that the source host does.

    Partially revert the UFO removal from 182e0b6b5846~1..d9d30adf5677.
    This patch introduces nearly(*) no new code to simplify verification.
    It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
    insertion and software UFO segmentation.

    It does not reinstate protocol stack support, hardware offload
    (NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
    of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.

    To support SKB_GSO_UDP reappearing in the stack, also reinstate
    logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
    by squashing in commit 939912216fa8 ("net: skb_needs_check() removes
    CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee643f1
    ("net: avoid skb_warn_bad_offload false positives on UFO").

    (*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id,
    ipv6_proxy_select_ident is changed to return a __be32 and this is
    assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
    at the end of the enum to minimize code churn.

    Tested
    Booted a v4.13 guest kernel with QEMU. On a host kernel before this
    patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
    enabled, same as on a v4.13 host kernel.

    A UFO packet sent from the guest appears on the tap device:
    host:
    nc -l -p -u 8000 &
    tcpdump -n -i tap0

    guest:
    dd if=/dev/zero of=payload.txt bs=1 count=2000
    nc -u 192.16.1.1 8000 < payload.txt

    Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds,
    packets arriving fragmented:

    ./with_tap_pair.sh ./tap_send_ufo tap0 tap1
    (from https://github.com/wdebruij/kerneltools/tree/master/tests)

    Changes
    v1 -> v2
    - simplified set_offload change (review comment)
    - documented test procedure

    Link: http://lkml.kernel.org/r/
    Fixes: fb652fdfe837 ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.")
    Reported-by: Michal Kubecek
    Signed-off-by: Willem de Bruijn
    Acked-by: Jason Wang
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

15 Nov, 2017

1 commit

  • It seems that the intention of the code is to null check the value
    returned by function genlmsg_put. But the current code is null
    checking the address of the pointer that holds the value returned
    by genlmsg_put.

    Fix this by properly null checking the value returned by function
    genlmsg_put in order to avoid a pontential null pointer dereference.

    Addresses-Coverity-ID: 1461561 ("Dereference before null check")
    Addresses-Coverity-ID: 1461562 ("Dereference null return value")
    Fixes: 96fbc13d7e77 ("openvswitch: Add meter infrastructure")
    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: David S. Miller

    Gustavo A. R. Silva
     

14 Nov, 2017

3 commits


13 Nov, 2017

3 commits


08 Nov, 2017

1 commit

  • v16->17
    - Fixed disputed check code: keep them in nsh_push and nsh_pop
    but also add them in __ovs_nla_copy_actions

    v15->v16
    - Add csum recalculation for nsh_push, nsh_pop and set_nsh
    pointed out by Pravin
    - Move nsh key into the union with ipv4 and ipv6 and add
    check for nsh key in match_validate pointed out by Pravin
    - Add nsh check in validate_set and __ovs_nla_copy_actions

    v14->v15
    - Check size in nsh_hdr_from_nlattr
    - Fixed four small issues pointed out By Jiri and Eric

    v13->v14
    - Rename skb_push_nsh to nsh_push per Dave's comment
    - Rename skb_pop_nsh to nsh_pop per Dave's comment

    v12->v13
    - Fix NSH header length check in set_nsh

    v11->v12
    - Fix missing changes old comments pointed out
    - Fix new comments for v11

    v10->v11
    - Fix the left three disputable comments for v9
    but not fixed in v10.

    v9->v10
    - Change struct ovs_key_nsh to
    struct ovs_nsh_key_base base;
    __be32 context[NSH_MD1_CONTEXT_SIZE];
    - Fix new comments for v9

    v8->v9
    - Fix build error reported by daily intel build
    because nsh module isn't selected by openvswitch

    v7->v8
    - Rework nested value and mask for OVS_KEY_ATTR_NSH
    - Change pop_nsh to adapt to nsh kernel module
    - Fix many issues per comments from Jiri Benc

    v6->v7
    - Remove NSH GSO patches in v6 because Jiri Benc
    reworked it as another patch series and they have
    been merged.
    - Change it to adapt to nsh kernel module added by NSH
    GSO patch series

    v5->v6
    - Fix the rest comments for v4.
    - Add NSH GSO support for VxLAN-gpe + NSH and
    Eth + NSH.

    v4->v5
    - Fix many comments by Jiri Benc and Eric Garver
    for v4.

    v3->v4
    - Add new NSH match field ttl
    - Update NSH header to the latest format
    which will be final format and won't change
    per its author's confirmation.
    - Fix comments for v3.

    v2->v3
    - Change OVS_KEY_ATTR_NSH to nested key to handle
    length-fixed attributes and length-variable
    attriubte more flexibly.
    - Remove struct ovs_action_push_nsh completely
    - Add code to handle nested attribute for SET_MASKED
    - Change PUSH_NSH to use the nested OVS_KEY_ATTR_NSH
    to transfer NSH header data.
    - Fix comments and coding style issues by Jiri and Eric

    v1->v2
    - Change encap_nsh and decap_nsh to push_nsh and pop_nsh
    - Dynamically allocate struct ovs_action_push_nsh for
    length-variable metadata.

    OVS master and 2.8 branch has merged NSH userspace
    patch series, this patch is to enable NSH support
    in kernel data path in order that OVS can support
    NSH in compat mode by porting this.

    Signed-off-by: Yi Yang
    Acked-by: Jiri Benc
    Acked-by: Eric Garver
    Acked-by: Pravin Shelar
    Signed-off-by: David S. Miller

    Yi Yang
     

05 Nov, 2017

1 commit

  • This patch allows reliable identification of netdevice interfaces connected
    to openvswitch bridges. In particular, user space queries the netdev
    interfaces belonging to the ports for statistics, up/down state, etc.
    Datapath dump needs to provide enough information for the user space to be
    able to do that.

    Currently, only interface names are returned. This is not sufficient, as
    openvswitch allows its ports to be in different name spaces and the
    interface name is valid only in its name space. What is needed and generally
    used in other netlink APIs, is the pair ifindex+netnsid.

    The solution is addition of the ifindex+netnsid pair (or only ifindex if in
    the same name space) to vport get/dump operation.

    On request side, ideally the ifindex+netnsid pair could be used to
    get/set/del the corresponding vport. This is not implemented by this patch
    and can be added later if needed.

    Signed-off-by: Jiri Benc
    Signed-off-by: David S. Miller

    Jiri Benc
     

04 Nov, 2017

1 commit


02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information it it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

22 Oct, 2017

1 commit


11 Oct, 2017

1 commit

  • This adds a ct_clear action for clearing conntrack state. ct_clear is
    currently implemented in OVS userspace, but is not backed by an action
    in the kernel datapath. This is useful for flows that may modify a
    packet tuple after a ct lookup has already occurred.

    Signed-off-by: Eric Garver
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Eric Garver
     

10 Oct, 2017

1 commit


05 Oct, 2017

1 commit


13 Sep, 2017

1 commit


04 Sep, 2017

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains Netfilter updates for your net-next
    tree. Basically, updates to the conntrack core, enhancements for
    nf_tables, conversion of netfilter hooks from linked list to array to
    improve memory locality and asorted improvements for the Netfilter
    codebase. More specifically, they are:

    1) Add expection to hashes after timer initialization to prevent
    access from another CPU that walks on the hashes and calls
    del_timer(), from Florian Westphal.

    2) Don't update nf_tables chain counters from hot path, this is only
    used by the x_tables compatibility layer.

    3) Get rid of nested rcu_read_lock() calls from netfilter hook path.
    Hooks are always guaranteed to run from rcu read side, so remove
    nested rcu_read_lock() where possible. Patch from Taehee Yoo.

    4) nf_tables new ruleset generation notifications include PID and name
    of the process that has updated the ruleset, from Phil Sutter.

    5) Use skb_header_pointer() from nft_fib, so we can reuse this code from
    the nf_family netdev family. Patch from Pablo M. Bermudo.

    6) Add support for nft_fib in nf_tables netdev family, also from Pablo.

    7) Use deferrable workqueue for conntrack garbage collection, to reduce
    power consumption, from Patch from Subash Abhinov Kasiviswanathan.

    8) Add nf_ct_expect_iterate_net() helper and use it. From Florian
    Westphal.

    9) Call nf_ct_unconfirmed_destroy only from cttimeout, from Florian.

    10) Drop references on conntrack removal path when skbuffs has escaped via
    nfqueue, from Florian.

    11) Don't queue packets to nfqueue with dying conntrack, from Florian.

    12) Constify nf_hook_ops structure, from Florian.

    13) Remove neededlessly branch in nf_tables trace code, from Phil Sutter.

    14) Add nla_strdup(), from Phil Sutter.

    15) Rise nf_tables objects name size up to 255 chars, people want to use
    DNS names, so increase this according to what RFC 1035 specifies.
    Patch series from Phil Sutter.

    16) Kill nf_conntrack_default_on, it's broken. Default on conntrack hook
    registration on demand, suggested by Eric Dumazet, patch from Florian.

    17) Remove unused variables in compat_copy_entry_from_user both in
    ip_tables and arp_tables code. Patch from Taehee Yoo.

    18) Constify struct nf_conntrack_l4proto, from Julia Lawall.

    19) Constify nf_loginfo structure, also from Julia.

    20) Use a single rb root in connlimit, from Taehee Yoo.

    21) Remove unused netfilter_queue_init() prototype, from Taehee Yoo.

    22) Use audit_log() instead of open-coding it, from Geliang Tang.

    23) Allow to mangle tcp options via nft_exthdr, from Florian.

    24) Allow to fetch TCP MSS from nft_rt, from Florian. This includes
    a fix for a miscalculation of the minimal length.

    25) Simplify branch logic in h323 helper, from Nick Desaulniers.

    26) Calculate netlink attribute size for conntrack tuple at compile
    time, from Florian.

    27) Remove protocol name field from nf_conntrack_{l3,l4}proto structure.
    From Florian.

    28) Remove holes in nf_conntrack_l4proto structure, so it becomes
    smaller. From Florian.

    29) Get rid of print_tuple() indirection for /proc conntrack listing.
    Place all the code in net/netfilter/nf_conntrack_standalone.c.
    Patch from Florian.

    30) Do not built in print_conntrack() if CONFIG_NF_CONNTRACK_PROCFS is
    off. From Florian.

    31) Constify most nf_conntrack_{l3,l4}proto helper functions, from
    Florian.

    32) Fix broken indentation in ebtables extensions, from Colin Ian King.

    33) Fix several harmless sparse warning, from Florian.

    34) Convert netfilter hook infrastructure to use array for better memory
    locality, joint work done by Florian and Aaron Conole. Moreover, add
    some instrumentation to debug this.

    35) Batch nf_unregister_net_hooks() calls, to call synchronize_net once
    per batch, from Florian.

    36) Get rid of noisy logging in ICMPv6 conntrack helper, from Florian.

    37) Get rid of obsolete NFDEBUG() instrumentation, from Varsha Rao.

    38) Remove unused code in the generic protocol tracker, from Davide
    Caratti.

    I think I will have material for a second Netfilter batch in my queue if
    time allow to make it fit in this merge window.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

25 Aug, 2017

1 commit


22 Aug, 2017

1 commit


17 Aug, 2017

1 commit

  • For sw_flow_actions, the actions_len only represents the kernel part's
    size, and when we dump the actions to the userspace, we will do the
    convertions, so it's true size may become bigger than the actions_len.

    But unfortunately, for OVS_PACKET_ATTR_ACTIONS, we use the actions_len
    to alloc the skbuff, so the user_skb's size may become insufficient and
    oops will happen like this:
    skbuff: skb_over_panic: text:ffffffff8148fabf len:1749 put:157 head:
    ffff881300f39000 data:ffff881300f39000 tail:0x6d5 end:0x6c0 dev:
    ------------[ cut here ]------------
    kernel BUG at net/core/skbuff.c:129!
    [...]
    Call Trace:

    [] skb_put+0x43/0x44
    [] skb_zerocopy+0x6c/0x1f4
    [] queue_userspace_packet+0x3a3/0x448 [openvswitch]
    [] ovs_dp_upcall+0x30/0x5c [openvswitch]
    [] output_userspace+0x132/0x158 [openvswitch]
    [] ? ip6_rcv_finish+0x74/0x77 [ipv6]
    [] do_execute_actions+0xcc1/0xdc8 [openvswitch]
    [] ovs_execute_actions+0x74/0x106 [openvswitch]
    [] ovs_dp_process_packet+0xe1/0xfd [openvswitch]
    [] ? key_extract+0x63c/0x8d5 [openvswitch]
    [] ovs_vport_receive+0xa1/0xc3 [openvswitch]
    [...]

    Also we can find that the actions_len is much little than the orig_len:
    crash> struct sw_flow_actions 0xffff8812f539d000
    struct sw_flow_actions {
    rcu = {
    next = 0xffff8812f5398800,
    func = 0xffffe3b00035db32
    },
    orig_len = 1384,
    actions_len = 592,
    actions = 0xffff8812f539d01c
    }

    So as a quick fix, use the orig_len instead of the actions_len to alloc
    the user_skb.

    Last, this oops happened on our system running a relative old kernel, but
    the same risk still exists on the mainline, since we use the wrong
    actions_len from the beginning.

    Fixes: ccea74457bbd ("openvswitch: include datapath actions with sampled-packet upcall to userspace")
    Cc: Neil McKee
    Signed-off-by: Liping Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Liping Zhang
     

12 Aug, 2017

1 commit


02 Aug, 2017

1 commit


25 Jul, 2017

1 commit


21 Jul, 2017

1 commit


20 Jul, 2017

2 commits

  • When calling the flow_free() to free the flow, we call many times
    (cpu_possible_mask, eg. 128 as default) cpumask_next(). That will
    take up our CPU usage if we call the flow_free() frequently.
    When we put all packets to userspace via upcall, and OvS will send
    them back via netlink to ovs_packet_cmd_execute(will call flow_free).

    The test topo is shown as below. VM01 sends TCP packets to VM02,
    and OvS forward packtets. When testing, we use perf to report the
    system performance.

    VM01 --- OvS-VM --- VM02

    Without this patch, perf-top show as below: The flow_free() is
    3.02% CPU usage.

    4.23% [kernel] [k] _raw_spin_unlock_irqrestore
    3.62% [kernel] [k] __do_softirq
    3.16% [kernel] [k] __memcpy
    3.02% [kernel] [k] flow_free
    2.42% libc-2.17.so [.] __memcpy_ssse3_back
    2.18% [kernel] [k] copy_user_generic_unrolled
    2.17% [kernel] [k] find_next_bit

    When applied this patch, perf-top show as below: Not shown on
    the list anymore.

    4.11% [kernel] [k] _raw_spin_unlock_irqrestore
    3.79% [kernel] [k] __do_softirq
    3.46% [kernel] [k] __memcpy
    2.73% libc-2.17.so [.] __memcpy_ssse3_back
    2.25% [kernel] [k] copy_user_generic_unrolled
    1.89% libc-2.17.so [.] _int_malloc
    1.53% ovs-vswitchd [.] xlate_actions

    With this patch, the TCP throughput(we dont use Megaflow Cache
    + Microflow Cache) between VMs is 1.18Gbs/sec up to 1.30Gbs/sec
    (maybe ~10% performance imporve).

    This patch adds cpumask struct, the cpu_used_mask stores the cpu_id
    that the flow used. And we only check the flow_stats on the cpu we
    used, and it is unncessary to check all possible cpu when getting,
    cleaning, and updating the flow_stats. Adding the cpu_used_mask to
    sw_flow struct does’t increase the cacheline number.

    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • In the ovs_flow_stats_update(), we only use the node
    var to alloc flow_stats struct. But this is not a
    common case, it is unnecessary to call the numa_node_id()
    everytime. This patch is not a bugfix, but there maybe
    a small increase.

    Signed-off-by: Tonghao Zhang
    Signed-off-by: David S. Miller

    Tonghao Zhang
     

18 Jul, 2017

1 commit


16 Jul, 2017

1 commit

  • When there is an established connection in direction A->B, it is
    possible to receive a packet on port B which then executes
    ct(commit,force) without first performing ct() - ie, a lookup.
    In this case, we would expect that this packet can delete the existing
    entry so that we can commit a connection with direction B->A. However,
    currently we only perform a check in skb_nfct_cached() for whether
    OVS_CS_F_TRACKED is set and OVS_CS_F_INVALID is not set, ie that a
    lookup previously occurred. In the above scenario, a lookup has not
    occurred but we should still be able to statelessly look up the
    existing entry and potentially delete the entry if it is in the
    opposite direction.

    This patch extends the check to also hint that if the action has the
    force flag set, then we will lookup the existing entry so that the
    force check at the end of skb_nfct_cached has the ability to delete
    the connection.

    Fixes: dd41d330b03 ("openvswitch: Add force commit.")
    CC: Pravin Shelar
    CC: dev@openvswitch.org
    Signed-off-by: Joe Stringer
    Signed-off-by: Greg Rose
    Signed-off-by: David S. Miller

    Greg Rose
     

03 Jul, 2017

1 commit


02 Jul, 2017

1 commit

  • When compiling OvS-master on 4.4.0-81 kernel,
    there is a warning:

    CC [M] /root/ovs/datapath/linux/datapath.o
    /root/ovs/datapath/linux/datapath.c: In function
    'ovs_flow_cmd_set':
    /root/ovs/datapath/linux/datapath.c:1221:1: warning:
    the frame size of 1040 bytes is larger than 1024 bytes
    [-Wframe-larger-than=]

    This patch factors out match-init and action-copy to avoid
    "Wframe-larger-than=1024" warning. Because mask is only
    used to get actions, we new a function to save some
    stack space.

    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     

25 Jun, 2017

1 commit

  • Switches and modern SR-IOV enabled NICs may multiplex traffic from Port
    representators and control messages over single set of hardware queues.
    Control messages and muxed traffic may need ordered delivery.

    Those requirements make it hard to comfortably use TC infrastructure today
    unless we have a way of attaching metadata to skbs at the upper device.
    Because single set of queues is used for many netdevs stopping TC/sched
    queues of all of them reliably is impossible and lower device has to
    retreat to returning NETDEV_TX_BUSY and usually has to take extra locks on
    the fastpath.

    This patch attempts to enable port/representative devs to attach metadata
    to skbs which carry port id. This way representatives can be queueless and
    all queuing can be performed at the lower netdev in the usual way.

    Traffic arriving on the port/representative interfaces will be have
    metadata attached and will subsequently be queued to the lower device for
    transmission. The lower device should recognize the metadata and translate
    it to HW specific format which is most likely either a special header
    inserted before the network headers or descriptor/metadata fields.

    Metadata is associated with the lower device by storing the netdev pointer
    along with port id so that if TC decides to redirect or mirror the new
    netdev will not try to interpret it.

    This is mostly for SR-IOV devices since switches don't have lower netdevs
    today.

    Signed-off-by: Jakub Kicinski
    Signed-off-by: Sridhar Samudrala
    Signed-off-by: Simon Horman
    Signed-off-by: David S. Miller

    Jakub Kicinski
     

21 Jun, 2017

1 commit


16 Jun, 2017

1 commit

  • There were many places that my previous spatch didn't find,
    as pointed out by yuan linyu in various patches.

    The following spatch found many more and also removes the
    now unnecessary casts:

    @@
    identifier p, p2;
    expression len;
    expression skb;
    type t, t2;
    @@
    (
    -p = skb_put(skb, len);
    +p = skb_put_zero(skb, len);
    |
    -p = (t)skb_put(skb, len);
    +p = skb_put_zero(skb, len);
    )
    ... when != p
    (
    p2 = (t2)p;
    -memset(p2, 0, len);
    |
    -memset(p, 0, len);
    )

    @@
    type t, t2;
    identifier p, p2;
    expression skb;
    @@
    t *p;
    ...
    (
    -p = skb_put(skb, sizeof(t));
    +p = skb_put_zero(skb, sizeof(t));
    |
    -p = (t *)skb_put(skb, sizeof(t));
    +p = skb_put_zero(skb, sizeof(t));
    )
    ... when != p
    (
    p2 = (t2)p;
    -memset(p2, 0, sizeof(*p));
    |
    -memset(p, 0, sizeof(*p));
    )

    @@
    expression skb, len;
    @@
    -memset(skb_put(skb, len), 0, len);
    +skb_put_zero(skb, len);

    Apply it to the tree (with one manual fixup to keep the
    comment in vxlan.c, which spatch removed.)

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

15 Jun, 2017

1 commit


08 Jun, 2017

1 commit

  • Network devices can allocate reasources and private memory using
    netdev_ops->ndo_init(). However, the release of these resources
    can occur in one of two different places.

    Either netdev_ops->ndo_uninit() or netdev->destructor().

    The decision of which operation frees the resources depends upon
    whether it is necessary for all netdev refs to be released before it
    is safe to perform the freeing.

    netdev_ops->ndo_uninit() presumably can occur right after the
    NETDEV_UNREGISTER notifier completes and the unicast and multicast
    address lists are flushed.

    netdev->destructor(), on the other hand, does not run until the
    netdev references all go away.

    Further complicating the situation is that netdev->destructor()
    almost universally does also a free_netdev().

    This creates a problem for the logic in register_netdevice().
    Because all callers of register_netdevice() manage the freeing
    of the netdev, and invoke free_netdev(dev) if register_netdevice()
    fails.

    If netdev_ops->ndo_init() succeeds, but something else fails inside
    of register_netdevice(), it does call ndo_ops->ndo_uninit(). But
    it is not able to invoke netdev->destructor().

    This is because netdev->destructor() will do a free_netdev() and
    then the caller of register_netdevice() will do the same.

    However, this means that the resources that would normally be released
    by netdev->destructor() will not be.

    Over the years drivers have added local hacks to deal with this, by
    invoking their destructor parts by hand when register_netdevice()
    fails.

    Many drivers do not try to deal with this, and instead we have leaks.

    Let's close this hole by formalizing the distinction between what
    private things need to be freed up by netdev->destructor() and whether
    the driver needs unregister_netdevice() to perform the free_netdev().

    netdev->priv_destructor() performs all actions to free up the private
    resources that used to be freed by netdev->destructor(), except for
    free_netdev().

    netdev->needs_free_netdev is a boolean that indicates whether
    free_netdev() should be done at the end of unregister_netdevice().

    Now, register_netdevice() can sanely release all resources after
    ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
    and netdev->priv_destructor().

    And at the end of unregister_netdevice(), we invoke
    netdev->priv_destructor() and optionally call free_netdev().

    Signed-off-by: David S. Miller

    David S. Miller
     

23 May, 2017

1 commit


20 May, 2017

1 commit