10 Dec, 2013

1 commit


03 Dec, 2013

1 commit

  • This reverts commit 018c5bba052b3a383d83cf0c756da0e7bc748397.

    It causes regressions for people using chips driven by the sungem
    driver. Suspicion is that the skb->csum value isn't being adjusted
    properly.

    The change also has a bug in that if __pskb_trim() fails, we'll leave
    a corrupted skb->csum value in there. We would really need to revert
    it to its original value in that case.

    Signed-off-by: David S. Miller

    David S. Miller
     

16 Nov, 2013

1 commit

  • Currently pskb_trim_rcsum() just balks on CHECKSUM_COMPLETE packets
    and remarks them as CHECKSUM_NONE, forcing a software checksum
    validation later.

    We have all of the mechanics available to fixup the skb->csum value,
    even for complicated fragmented packets, via the helpers
    skb_checksum() and csum_sub().

    So just use them.

    Based upon a suggestion by Herbert Xu.

    Signed-off-by: David S. Miller

    David S. Miller
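
The checksum fixup described in this commit can be illustrated in userspace. The following is only a minimal sketch of the arithmetic, not the kernel code: fold16(), oc_sum(), oc_sub() and trimmed_csum() are hypothetical stand-ins for the folding logic, skb_checksum() and csum_sub(), and the sketch assumes the trim point falls on an even byte offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold an accumulator into 16 bits using end-around carry. */
static uint16_t fold16(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* One's-complement sum over big-endian 16-bit words; an odd tail byte
 * is treated as if it were followed by a zero byte. */
static uint16_t oc_sum(const uint8_t *p, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (i < len)
        sum += (uint32_t)p[i] << 8;
    return fold16(sum);
}

/* One's-complement subtraction: a - b is computed as a + ~b. */
static uint16_t oc_sub(uint16_t a, uint16_t b)
{
    return fold16((uint64_t)a + (uint16_t)~b);
}

/*
 * Hypothetical helper: checksum of the first new_len bytes, derived from
 * the checksum of the whole buffer by subtracting the trimmed tail.
 * Assumption: new_len is even, so the tail starts on a word boundary.
 */
static uint16_t trimmed_csum(const uint8_t *buf, size_t len, size_t new_len,
                             uint16_t full_csum)
{
    assert(new_len <= len && (new_len % 2) == 0);
    return oc_sub(full_csum, oc_sum(buf + new_len, len - new_len));
}
```

A caller would store the adjusted value only after the trim itself has succeeded, so that a failed trim cannot leave a stale checksum behind.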
     

11 Nov, 2013

1 commit

  • Pushing original fragments through causes several problems. For example,
    when matching, frags may not be matched correctly. Take the following
    example:

    On HOSTA do:
    ip6tables -I INPUT -p icmpv6 -j DROP
    ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT

    and on HOSTB you do:
    ping6 HOSTA -s2000 (MTU is 1500)

    Incoming echo requests will be filtered out on HOSTA. This issue does
    not occur with smaller packets than MTU (where fragmentation does not happen)

    As was discussed previously, the only correct solution seems to be to use
    the reassembled skb instead of separate frags. Doing this has positive side
    effects in reducing sk_buff by one pointer (nfct_reasm) and also the reasm
    dances in ipvs and conntrack can be removed.

    Future plan is to remove net/ipv6/netfilter/nf_conntrack_reasm.c
    entirely and use code in net/ipv6/reassembly.c instead.

    Signed-off-by: Jiri Pirko
    Acked-by: Julian Anastasov
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Jiri Pirko
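
To see why the ping6 example above ends up fragmented, the packet count can be worked out from the header sizes. This is a rough sketch under stated assumptions: the constants are the standard IPv6, fragment-header and ICMPv6 echo header lengths, and ex_ipv6_frag_count() is a hypothetical helper, not a kernel function. Only the first fragment carries the ICMPv6 header, which is presumably why a rule matching on the ICMPv6 type cannot match the later fragments.

```c
#include <assert.h>
#include <stddef.h>

enum {
    EX_IPV6_HDR_LEN  = 40, /* fixed IPv6 header */
    EX_FRAG_HDR_LEN  = 8,  /* IPv6 fragment extension header */
    EX_ICMP6_HDR_LEN = 8   /* ICMPv6 echo request header */
};

/*
 * Number of packets needed to carry `payload` bytes of upper-layer data
 * (ICMPv6 header included) over a link with the given MTU.
 */
static size_t ex_ipv6_frag_count(size_t payload, size_t mtu)
{
    size_t per_frag;

    assert(mtu >= 1280); /* IPv6 minimum link MTU */
    if (payload + EX_IPV6_HDR_LEN <= mtu)
        return 1; /* fits without fragmentation */

    /* Every fragment except the last carries a multiple of 8 bytes. */
    per_frag = (mtu - EX_IPV6_HDR_LEN - EX_FRAG_HDR_LEN) & ~(size_t)7;
    return (payload + per_frag - 1) / per_frag;
}
```

With `-s2000` and an MTU of 1500, the upper-layer data is 2008 bytes and each fragment can hold at most 1448 of them, so the echo request is split into two fragments.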
     

08 Nov, 2013

2 commits

  • Use "@" to refer to parameters in the kernel-doc description. According
    to Documentation/kernel-doc-nano-HOWTO.txt "&" shall be used to refer to
    structures only.

    Signed-off-by: Mathias Krause
    Cc: "David S. Miller"
    Signed-off-by: David S. Miller

    Mathias Krause
     
  • This function has uses besides IPsec, so move it to the core skbuff code.
    While doing so, give it some documentation and change its return type to
    'unsigned char *' to be in line with skb_put().

    Signed-off-by: Mathias Krause
    Cc: Steffen Klassert
    Cc: "David S. Miller"
    Cc: Herbert Xu
    Signed-off-by: David S. Miller

    Mathias Krause
     

05 Nov, 2013

1 commit

  • Sometimes we need to coalesce the rx frags to avoid a frag list. One example is
    the virtio-net driver, which tries to use small frags for both MTU sized packets
    and GSO packets. So this patch introduces skb_coalesce_rx_frag() to do this.

    Cc: Rusty Russell
    Cc: Michael S. Tsirkin
    Cc: Michael Dalton
    Cc: Eric Dumazet
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Jason Wang
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Jason Wang
     

04 Nov, 2013

1 commit

  • Currently, skb_checksum walks over 1) linearized, 2) frags[], and
    3) frag_list data and calculates the one's complement, a 32 bit
    result suitable for feeding into itself or csum_tcpudp_magic(),
    but unsuitable for SCTP as we're calculating CRC32c there.

    Hence, in order to not re-implement the very same function in
    SCTP (and maybe other protocols) over and over again, use an
    update() + combine() callback internally to allow for walking
    over the skb with different algorithms.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
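
The update() + combine() idea from this commit can be sketched in userspace as a walker that takes the algorithm as a pair of callbacks. Everything below is an illustrative assumption: struct csum_ops, struct segment and walk_checksum() are hypothetical names, the segment array stands in for the linear, frags[] and frag_list areas, and a plain XOR of bytes is used as a deliberately simple second algorithm in place of CRC32c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Callback pair supplied by the checksum algorithm. */
struct csum_ops {
    uint32_t (*update)(const uint8_t *mem, size_t len, uint32_t csum);
    uint32_t (*combine)(uint32_t csum, uint32_t csum2, size_t offset);
};

struct segment {
    const uint8_t *data;
    size_t len;
};

/* Walk all segments once; the algorithm itself comes from ops. */
static uint32_t walk_checksum(const struct segment *segs, size_t n,
                              const struct csum_ops *ops)
{
    uint32_t csum = 0;
    size_t pos = 0, i;

    for (i = 0; i < n; i++) {
        uint32_t part = ops->update(segs[i].data, segs[i].len, 0);

        csum = ops->combine(csum, part, pos);
        pos += segs[i].len;
    }
    return csum;
}

static uint32_t fold16(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint32_t)sum;
}

/* Algorithm 1: 16-bit one's-complement sum over big-endian words. */
static uint32_t oc_update(const uint8_t *mem, size_t len, uint32_t csum)
{
    uint64_t sum = csum;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)mem[i] << 8) | mem[i + 1];
    if (i < len)
        sum += (uint32_t)mem[i] << 8;
    return fold16(sum);
}

static uint32_t oc_combine(uint32_t csum, uint32_t csum2, size_t offset)
{
    /* A block starting at an odd offset was summed with its bytes in the
     * opposite halves of each word, so swap the halves back first. */
    if (offset & 1)
        csum2 = ((csum2 & 0xff) << 8) | (csum2 >> 8);
    return fold16((uint64_t)csum + csum2);
}

/* Algorithm 2: XOR of all bytes, a simple stand-in for a second protocol. */
static uint32_t xor_update(const uint8_t *mem, size_t len, uint32_t csum)
{
    size_t i;

    for (i = 0; i < len; i++)
        csum ^= mem[i];
    return csum;
}

static uint32_t xor_combine(uint32_t csum, uint32_t csum2, size_t offset)
{
    (void)offset; /* XOR does not depend on the block position */
    return csum ^ csum2;
}

static const struct csum_ops oc_ops = { oc_update, oc_combine };
static const struct csum_ops xor_ops = { xor_update, xor_combine };
```

The point of the split is that walk_checksum() never needs to know which algorithm it is running; only combine() has to understand how a partial result at a given offset merges into the running value.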
     

22 Oct, 2013

1 commit

  • Now ipv6_gso_segment() is stackable, it's relatively easy to
    implement GSO/TSO support for SIT tunnels

    Performance results, when segmentation is done after tunnel
    device (as no NIC is yet enabled for TSO SIT support) :

    Before patch :

    lpq84:~# ./netperf -H 2002:af6:1153:: -Cc
    MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:1153:: () port 0 AF_INET6
    Recv   Send   Send                         Utilization      Service Demand
    Socket Socket Message Elapsed              Send    Recv     Send    Recv
    Size   Size   Size    Time    Throughput   local   remote   local   remote
    bytes  bytes  bytes   secs.   10^6bits/s   % S     % S      us/KB   us/KB

    87380  16384  16384   10.00   3168.31      4.81    4.64     2.988   2.877

    After patch :

    lpq84:~# ./netperf -H 2002:af6:1153:: -Cc
    MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:1153:: () port 0 AF_INET6
    Recv   Send   Send                         Utilization      Service Demand
    Socket Socket Message Elapsed              Send    Recv     Send    Recv
    Size   Size   Size    Time    Throughput   local   remote   local   remote
    bytes  bytes  bytes   secs.   10^6bits/s   % S     % S      us/KB   us/KB

    87380  16384  16384   10.00   5525.00      7.76    5.17     2.763   1.840

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

20 Oct, 2013

2 commits

  • Now inet_gso_segment() is stackable, it's relatively easy to
    implement GSO/TSO support for IPIP

    Performance results, when segmentation is done after tunnel
    device (as no NIC is yet enabled for TSO IPIP support) :

    Before patch :

    lpq83:~# ./netperf -H 7.7.9.84 -Cc
    MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.9.84 () port 0 AF_INET
    Recv   Send   Send                         Utilization      Service Demand
    Socket Socket Message Elapsed              Send    Recv     Send    Recv
    Size   Size   Size    Time    Throughput   local   remote   local   remote
    bytes  bytes  bytes   secs.   10^6bits/s   % S     % S      us/KB   us/KB

    87380  16384  16384   10.00   3357.88      5.09    3.70     2.983   2.167

    After patch :

    lpq83:~# ./netperf -H 7.7.9.84 -Cc
    MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.9.84 () port 0 AF_INET
    Recv   Send   Send                         Utilization      Service Demand
    Socket Socket Message Elapsed              Send    Recv     Send    Recv
    Size   Size   Size    Time    Throughput   local   remote   local   remote
    bytes  bytes  bytes   secs.   10^6bits/s   % S     % S      us/KB   us/KB

    87380  16384  16384   10.00   7710.19      4.52    6.62     1.152   1.687

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • In order to support GSO on IPIP, we need to make
    inet_gso_segment() stackable.

    It should not assume network header starts right after mac
    header.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

18 Oct, 2013

1 commit

  • While working on virtio_net new allocation strategy to increase
    payload/truesize ratio, we found that refactoring sk_page_frag_refill()
    was needed.

    This patch splits sk_page_frag_refill() into two parts, adding
    skb_page_frag_refill() which can be used without a socket.

    While we are at it, add a minimum frag size of 32 for
    sk_page_frag_refill()

    Michael will either use netdev_alloc_frag() from softirq context,
    or skb_page_frag_refill() from process context in refill_work()
    (GFP_KERNEL allocations)

    Signed-off-by: Eric Dumazet
    Cc: Michael Dalton
    Signed-off-by: David S. Miller

    Eric Dumazet
     

03 Oct, 2013

1 commit


02 Oct, 2013

1 commit

  • Conflicts:
    drivers/net/ethernet/emulex/benet/be.h
    drivers/net/usb/qmi_wwan.c
    drivers/net/wireless/brcm80211/brcmfmac/dhd_bus.h
    include/net/netfilter/nf_conntrack_synproxy.h
    include/net/secure_seq.h

    The conflicts are of two varieties:

    1) Conflicts with Joe Perches's 'extern' removal from header file
    function declarations. Usually it's an argument signature change
    or a function being added/removed. The resolutions are trivial.

    2) Some overlapping changes in qmi_wwan.c and be.h, one commit adds
    a new value, another changes an existing value. That sort of
    thing.

    Signed-off-by: David S. Miller

    David S. Miller
     

01 Oct, 2013

2 commits


27 Sep, 2013

1 commit

  • There is a mix of function prototypes with and without extern
    in the kernel sources. Standardize on not using extern for
    function prototypes.

    Function prototypes don't need to be written with extern.
    extern is assumed by the compiler. Its use is as unnecessary as
    using auto to declare automatic/local variables in a block.

    Signed-off-by: Joe Perches

    Joe Perches
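
A short illustration of the point made in this commit: the two declarations below are equivalent, because a function declaration at file scope has external linkage whether or not extern is written. The function name add_ints() is made up for the example.

```c
/* Both lines declare the same function with external linkage;
 * the extern keyword on the first one adds nothing. */
extern int add_ints(int a, int b);
int add_ints(int a, int b);

/* Single definition matching both declarations above. */
int add_ints(int a, int b)
{
    return a + b;
}
```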
     

04 Sep, 2013

2 commits


08 Aug, 2013

1 commit


04 Aug, 2013

1 commit


02 Aug, 2013

2 commits

  • Eliezer renames several *ll_poll to *busy_poll, but forgets
    CONFIG_NET_LL_RX_POLL, so in case of confusion, rename it too.

    Cc: Eliezer Tamir
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • David suggested to add a BUG_ON() to catch if some layer
    sets skb->sk pointer without a corresponding destructor.

    As skb can sit in a queue, it's mandatory to make sure the
    socket cannot disappear, and it's usually done by taking a
    reference on the socket, then releasing it from the skb
    destructor.

    This patch is a follow-up to commit c34a761231b5
    ("net: skb_orphan() changes") and will be reverted after
    catching all possible offenders if any.

    Suggested-by: David Miller
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
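
The ownership rule behind this check can be modelled in a few lines of userspace C. This is a toy sketch, not the kernel implementation: struct toy_sock, struct toy_skb and the toy_*() helpers are hypothetical, and assert() stands in for BUG_ON().

```c
#include <assert.h>
#include <stddef.h>

struct toy_sock {
    int refcnt;
};

struct toy_skb {
    struct toy_sock *sk;
    void (*destructor)(struct toy_skb *skb);
};

/* Destructor that drops the reference taken by toy_set_owner(). */
static void toy_sock_put(struct toy_skb *skb)
{
    skb->sk->refcnt--;
}

/* Attach a socket: take a reference and install the matching destructor,
 * so the socket cannot disappear while the buffer sits in a queue. */
static void toy_set_owner(struct toy_skb *skb, struct toy_sock *sk)
{
    sk->refcnt++;
    skb->sk = sk;
    skb->destructor = toy_sock_put;
}

/* Detach the buffer from its owner. */
static void toy_orphan(struct toy_skb *skb)
{
    if (skb->destructor) {
        skb->destructor(skb);
        skb->destructor = NULL;
        skb->sk = NULL;
    } else {
        /* Stand-in for BUG_ON(skb->sk): a socket pointer without a
         * destructor means some layer broke the ownership rule. */
        assert(skb->sk == NULL);
    }
}
```

Orphaning a buffer that was never owned is a no-op in this model, while a buffer with sk set but no destructor trips the assertion.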
     

01 Aug, 2013

1 commit

  • It is illegal to set skb->sk without corresponding destructor.

    It's therefore safe for skb_orphan() to not clear skb->sk if
    skb->destructor is not set.

    Also avoid clearing skb->destructor if already NULL.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

04 Jul, 2013

1 commit

  • Conflicts:
    drivers/net/ethernet/freescale/fec_main.c
    drivers/net/ethernet/renesas/sh_eth.c
    net/ipv4/gre.c

    The GRE conflict is between a bug fix (kfree_skb --> kfree_skb_list)
    and the splitting of the gre.c code into separate files.

    The FEC conflict was two sets of changes adding ethtool support code
    in an "!CONFIG_M5272" CPP protected block.

    Finally the sh_eth.c conflict was between one commit adding bits set
    in the .eesr_err_check mask whilst another commit removed the
    .tx_error_check member and assignments.

    Signed-off-by: David S. Miller

    David S. Miller
     

28 Jun, 2013

1 commit


26 Jun, 2013

1 commit

  • commit 68c331631143 ("v4 GRE: Add TCP segmentation offload for GRE")
    added a possible skb leak, because it frees only the head of segment
    list, in case a skb_linearize() call fails.

    This patch adds a kfree_skb_list() helper to fix the bug.

    Signed-off-by: Eric Dumazet
    Cc: Pravin B Shelar
    Cc: Daniel Borkmann
    Signed-off-by: David S. Miller

    Eric Dumazet
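
The leak described here comes from freeing only the head of a singly linked list of segments. A minimal userspace sketch of a list-freeing helper follows; struct toy_seg, toy_free_list() and the toy_freed counter are hypothetical names used only for illustration.

```c
#include <stdlib.h>

struct toy_seg {
    struct toy_seg *next;
};

static int toy_freed; /* number of segments released so far */

static void toy_free_one(struct toy_seg *seg)
{
    free(seg);
    toy_freed++;
}

/* Free every segment on the list, not just the head. */
static void toy_free_list(struct toy_seg *segs)
{
    while (segs) {
        struct toy_seg *next = segs->next; /* read before freeing */

        toy_free_one(segs);
        segs = next;
    }
}

/* Build a chain of n segments; returns NULL if n <= 0 or on allocation
 * failure (in which case the partial chain is released again). */
static struct toy_seg *toy_alloc_list(int n)
{
    struct toy_seg *head = NULL;

    while (n-- > 0) {
        struct toy_seg *seg = malloc(sizeof(*seg));

        if (!seg) {
            toy_free_list(head);
            return NULL;
        }
        seg->next = head;
        head = seg;
    }
    return head;
}
```

Calling toy_free_one() on the head alone would release one segment and leak the rest of the chain, which is the pattern the commit fixes.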
     

11 Jun, 2013

2 commits

  • Similar to the following commits:

    commit 00f97da17a0c8d656d0c9 (netpoll: fix position of network header)
    commit 525cebedb32a87fa48584 (pktgen: Fix position of ip and udp header)

    using skb_tail_offset() seems not correct since the offset
    is based on head pointer.

    With the last caller removed, skb_tail_offset() can be killed
    finally.

    Cc: Thomas Graf
    Cc: Daniel Borkmann
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • Adds an ndo_ll_poll method and the code that supports it.
    This method can be used by low latency applications to busy-poll
    Ethernet device queues directly from the socket code.
    sysctl_net_ll_poll controls how many microseconds to poll.
    Default is zero (disabled).
    Individual protocol support will be added by subsequent patches.

    Signed-off-by: Alexander Duyck
    Signed-off-by: Jesse Brandeburg
    Signed-off-by: Eliezer Tamir
    Acked-by: Eric Dumazet
    Tested-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Eliezer Tamir
     

06 Jun, 2013

1 commit

  • Merge 'net' bug fixes into 'net-next' as we have patches
    that will build on top of them.

    This merge commit includes a change from Emil Goode
    (emilgoode@gmail.com) that fixes a warning that would
    have been introduced by this merge. Specifically it
    fixes the pingv6_ops method ipv6_chk_addr() to add a
    "const" to the "struct net_device *dev" argument and
    likewise update the dummy_ipv6_chk_addr() declaration.

    Signed-off-by: David S. Miller

    David S. Miller
     

01 Jun, 2013

2 commits

  • udp6 over a GRE tunnel does not work after the GRE tso changes. The GRE
    tso handler passes the inner packet but keeps track of the outer header
    start in SKB_GSO_CB(skb)->mac_offset. The udp6 fragment code needs to
    take care of the outer header, which starts at the mac_offset, while
    adding the fragment header.
    This bug was introduced by commit 68c3316311 (GRE: Add TCP
    segmentation offload for GRE).

    Reported-by: Dmitry Kravkov
    Signed-off-by: Pravin B Shelar
    Tested-by: Dmitry Kravkov
    Signed-off-by: David S. Miller

    Pravin B Shelar
     
  • commit 1a37e412a0225fcba5587 (net: Use 16bits for *_headers
    fields of struct skbuff) converts skb->*_header to u16,
    some #if NET_SKBUFF_DATA_USES_OFFSET are now useless,
    and to be safe, we could just use "X = (typeof(X)) ~0U;"
    as suggested by David.

    Cc: David S. Miller
    Cc: Simon Horman
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
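
The "X = (typeof(X)) ~0U;" idiom mentioned above sets a field to all ones whatever its width, so the same line keeps working if the field type changes. A small sketch, assuming GCC or Clang (which provide __typeof__) and using made-up names struct toy_hdrs, SET_ALL_ONES() and toy_is_unset():

```c
#include <stdint.h>

struct toy_hdrs {
    uint16_t transport_header; /* 16-bit offset, as after the conversion */
    uint32_t wide_field;       /* hypothetical wider field for contrast */
};

/* Set a field to all ones regardless of its width.
 * __typeof__ is a GNU extension; ~0U is truncated to the field's type. */
#define SET_ALL_ONES(x) ((x) = (__typeof__(x))~0U)

/* Returns nonzero if the header offset holds the "unset" marker. */
static int toy_is_unset(const struct toy_hdrs *h)
{
    return h->transport_header == (__typeof__(h->transport_header))~0U;
}
```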
     

29 May, 2013

1 commit

  • This corrects a regression introduced by "net: Use 16bits for *_headers
    fields of struct skbuff" when NET_SKBUFF_DATA_USES_OFFSET is not set. In
    that case skb->tail will be a pointer however skb->network_header is now
    an offset.

    This patch corrects the problem by adding a wrapper to return skb tail as
    an offset regardless of the value of NET_SKBUFF_DATA_USES_OFFSET. It seems
    that this offset may be more than 64k and some care has been
    taken to treat such cases as an error.

    Signed-off-by: Simon Horman
    Signed-off-by: David S. Miller

    Simon Horman
     

28 May, 2013

2 commits

  • In the case where a non-MPLS packet is received and an MPLS stack is
    added it may well be the case that the original skb is GSO but the
    NIC used for transmit does not support GSO of MPLS packets.

    The aim of this code is to provide GSO in software for MPLS packets
    whose skbs are GSO.

    SKB Usage:

    When an implementation adds an MPLS stack to a non-MPLS packet it should do
    the following to skb metadata:

    * Set skb->inner_protocol to the old non-MPLS ethertype of the packet.
    skb->inner_protocol is added by this patch.

    * Set skb->protocol to the new MPLS ethertype of the packet.

    * Set skb->network_header to correspond to the
    end of the L3 header, including the MPLS label stack.

    I have posted a patch, "[PATCH v3.29] datapath: Add basic MPLS support to
    kernel" which adds MPLS support to the kernel datapath of Open vSwitch.
    That patch sets the above requirements in datapath/actions.c:push_mpls()
    and was used to exercise this code. The datapath patch is against the Open
    vSwitch tree but it is intended that it be added to the Open vSwitch code
    present in the mainline Linux kernel at some point.

    Features:

    I believe that the approach that I have taken is at least partially
    consistent with the handling of other protocols. Jesse, I understand that
    you have some ideas here. I am more than happy to change my implementation.

    This patch adds dev->mpls_features which may be used by devices
    to advertise features supported for MPLS packets.

    A new NETIF_F_MPLS_GSO feature is added for devices which support
    hardware MPLS GSO offload. Currently no devices support this
    and MPLS GSO always falls back to software.

    Alternate Implementation:

    One possible alternate implementation is to teach netif_skb_features()
    and skb_network_protocol() about MPLS, in a similar way to their
    understanding of VLANs. I believe this would avoid the need
    for net/mpls/mpls_gso.c and in particular the calls to
    __skb_pull() and __skb_push() in mpls_gso_segment().

    I have decided on the implementation in this patch as it should
    not introduce any overhead in the case where mpls_gso is not compiled
    into the kernel or inserted as a module.

    MPLS GSO suggested by Jesse Gross.
    Based in part on "v4 GRE: Add TCP segmentation offload for GRE"
    by Pravin B Shelar.

    Cc: Jesse Gross
    Cc: Pravin B Shelar
    Signed-off-by: Simon Horman
    Signed-off-by: David S. Miller

    Simon Horman
     
  • In order to mitigate the ongoing increase in the size of struct skbuff
    use 16 bit integer offsets rather than pointers for inner_*_headers.

    This appears to reduce the size of struct skbuff from 0xd0 to 0xc0
    bytes on x86_64 with the following all unset.

    CONFIG_XFRM
    CONFIG_NF_CONNTRACK
    CONFIG_NF_CONNTRACK_MODULE
    NET_SKBUFF_NF_DEFRAG_NEEDED
    CONFIG_BRIDGE_NETFILTER
    CONFIG_NET_SCHED
    CONFIG_IPV6_NDISC_NODETYPE
    CONFIG_NET_DMA
    CONFIG_NETWORK_SECMARK

    Signed-off-by: Simon Horman
    Signed-off-by: David S. Miller

    Simon Horman
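
The space saving from storing 16-bit offsets instead of pointers can be shown with two toy structures. This is a sketch under assumptions: struct hdrs_ptr, struct hdrs_off and the accessor names are hypothetical and far smaller than the real structure, so the sizes differ from the 0xd0 and 0xc0 figures quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header positions stored as full pointers. */
struct hdrs_ptr {
    unsigned char *head;
    unsigned char *inner_transport;
    unsigned char *inner_network;
    unsigned char *inner_mac;
};

/* Header positions stored as 16-bit offsets from head. */
struct hdrs_off {
    unsigned char *head;
    uint16_t inner_transport;
    uint16_t inner_network;
    uint16_t inner_mac;
};

/* Record a header position; offsets that do not fit in 16 bits are
 * treated as an error in this sketch. */
static void set_inner_network(struct hdrs_off *h, unsigned char *p)
{
    ptrdiff_t off = p - h->head;

    assert(off >= 0 && off <= UINT16_MAX);
    h->inner_network = (uint16_t)off;
}

/* Turn the stored offset back into a pointer. */
static unsigned char *inner_network(const struct hdrs_off *h)
{
    return h->head + h->inner_network;
}
```

The trade-off is that every access recomputes head + offset, and that offsets beyond 64k cannot be represented.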
     

20 Apr, 2013

2 commits

  • Add a function to allocate a sk_buff head without any data. This will
    be used by memory mapped netlink to attach data from the mmaped area
    to the skb.

    Additionally change skb_release_all() to check whether the skb has a
    data area to allow the skb destructor to clear the data pointer in case
    only a head has been allocated.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Add a protocol argument to the VLAN packet tagging functions. In case of HW
    tagging, we need that protocol available in the ndo_start_xmit functions,
    so it is stored in a new field in the skb. The new field fits into a hole
    (on 64 bit) and doesn't increase the skb's size.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

08 Apr, 2013

1 commit


06 Apr, 2013

1 commit

  • Commit 130549fe ("netfilter: reset nf_trace in nf_reset") added code
    to reset nf_trace in nf_reset(). This is wrong and unnecessary.

    nf_reset() is used in the following cases:

    - when passing packets up to the socket layer, at which point we want to
    release all netfilter references that might keep modules pinned while
    the packet is queued. nf_trace doesn't matter anymore at this point.

    - when encapsulating or decapsulating IPsec packets. We want to continue
    tracing these packets after IPsec processing.

    - when passing packets through virtual network devices. Only devices
    that encapsulate in IPv4/v6 matter since otherwise nf_trace is not
    used anymore. It's not entirely clear whether those packets should
    be traced after that, however we've always done that.

    - when passing packets through virtual network devices that make the
    packet cross network namespace boundaries. This is the only case
    where we clearly want to reset nf_trace and is also what the
    original patch intended to fix.

    Add a new function nf_reset_trace() and use it in dev_forward_skb() to
    fix this properly.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

02 Apr, 2013

1 commit

  • Rename skb_dst_set_noref to __skb_dst_set_noref and
    add force flag as suggested by David Miller. The new wrapper
    skb_dst_set_noref_force will force dst entries that are not
    cached to be attached as skb dst without taking reference
    as long as provided dst is reclaimed after RCU grace period.

    Signed-off-by: Julian Anastasov
    Signed-off-by: Hans Schillstrom
    Acked-by: David S. Miller
    Signed-off-by: Simon Horman

    Julian Anastasov