28 Dec, 2016

4 commits

  • Pull networking fixes from David Miller:

    1) Various ipvlan fixes from Eric Dumazet and Mahesh Bandewar.

    The most important is to not assume the packet is RX just because
    the destination address matches that of the device. Such an
    assumption causes problems when an interface is put into loopback
    mode.

    2) If we retry when creating a new tc entry (because we dropped the
    RTNL mutex in order to load a module, for example) we end up with
    -EAGAIN and then loop trying to replay the request. But we didn't
    reset some state when looping back to the top like this, and if
    another thread meanwhile inserted the same tc entry we were trying
    to, we re-link it creating an endless loop in the tc chain. Fix from
    Daniel Borkmann.

    3) There are two different WRITE bits in the MDIO address register for
    the stmmac chip, depending upon the chip variant. Due to a bug we
    could set them both, fix from Hock Leong Kweh.

    4) Fix mlx4 bug in XDP_TX handling, from Tariq Toukan.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
    net: stmmac: fix incorrect bit set in gmac4 mdio addr register
    r8169: add support for RTL8168 series add-on card.
    net: xdp: remove unused bfp_warn_invalid_xdp_buffer()
    openvswitch: upcall: Fix vlan handling.
    ipv4: Namespaceify tcp_tw_reuse knob
    net: korina: Fix NAPI versus resources freeing
    net, sched: fix soft lockup in tc_classify
    net/mlx4_en: Fix user prio field in XDP forward
    tipc: don't send FIN message from connectionless socket
    ipvlan: fix multicast processing
    ipvlan: fix various issues in ipvlan_process_multicast()

    Linus Torvalds
     
  • After commit 73b62bd085f4737679ea9afc7867fa5f99ba7d1b ("virtio-net:
    remove the warning before XDP linearizing"), there are no users of
    bpf_warn_invalid_xdp_buffer(), so remove it. This is a revert of
    commit f23bc46c30ca5ef58b8549434899fcbac41b2cfc.

    Cc: Daniel Borkmann
    Cc: John Fastabend
    Signed-off-by: Jason Wang
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Jason Wang
     
  • The networking stack accelerates vlan tag handling by
    keeping the topmost vlan header in the skb. This works as
    long as the packet remains in the OVS datapath. But during
    an OVS upcall the vlan header is pushed onto the packet.
    When such a packet is sent back to the OVS datapath, the core
    networking stack might not handle it correctly. This
    patch avoids the issue by accelerating the vlan tag
    during flow key extraction. This simplifies the datapath by
    making packet processing uniform for packets from
    all code paths.

    Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets").
    CC: Jarno Rajahalme
    CC: Jiri Benc
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    pravin shelar
     
  • Different namespaces might have different requirements for reusing
    TIME-WAIT sockets for new connections. This might be required in
    cases where applications running in different namespaces need the
    number of TIME-WAIT sockets to be reduced independently
    of the host.

    Signed-off-by: Haishuang Yan
    Signed-off-by: David S. Miller

    Haishuang Yan
     

27 Dec, 2016

1 commit

  • Shahar reported a soft lockup in tc_classify(), where we run into an
    endless loop when walking the classifier chain due to tp->next == tp
    which is a state we should never run into. The issue only seems to
    trigger under load in the tc control path.

    What happens is that in tc_ctl_tfilter(), thread A allocates a new
    tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
    with it. In that classifier callback we had to unlock/lock the rtnl
    mutex and returned with -EAGAIN. One reason why we need to drop there
    is, for example, that we need to request an action module to be loaded.

    This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
    after we loaded and found the requested action, we need to redo the
    whole request so we don't race against others. While we had to unlock
    rtnl in that time, thread B's request was processed next on that CPU.
    Thread B added a new tp instance successfully to the classifier chain.
    When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
    and destroying its tp instance which never got linked, we goto replay
    and redo A's request.

    This time when walking the classifier chain in tc_ctl_tfilter() for
    checking for existing tp instances we had a priority match and found
    the tp instance that was created and linked by thread B. Now calling
    again into tp->ops->change() with that tp was successful and returned
    without error.

    tp_created was never cleared in the second round, thus kernel thinks
    that we need to link it into the classifier chain (once again). tp and
    *back point to the same object due to the match we had earlier on. Thus
    for thread B's already public tp, we reset tp->next to tp itself and
    link it into the chain, which eventually causes the mentioned endless
    loop in tc_classify() once a packet hits the data path.

    Fix is to clear tp_created at the beginning of each request, also when
    we replay it. On the paths that can cause -EAGAIN we already destroy
    the original tp instance we had and on replay we really need to start
    from scratch. It seems that this issue was first introduced in commit
    12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
    and avoid kernel panic when we use cls_cgroup").

    Fixes: 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup")
    Reported-by: Shahar Klein
    Signed-off-by: Daniel Borkmann
    Cc: Cong Wang
    Acked-by: Eric Dumazet
    Tested-by: Shahar Klein
    Signed-off-by: David S. Miller

    Daniel Borkmann
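
The replay problem described in the tc_classify() commit above can be sketched in user space. This is only an illustration under stated assumptions: struct tp_sketch, link_if_created(), chain_terminates() and replay_request() are hypothetical names, and the real tc_ctl_tfilter() does far more. The sketch shows why a stale tp_created makes an already linked entry point at itself.

```c
#include <stddef.h>

/* Simplified stand-in for one classifier entry in a chain. */
struct tp_sketch {
    int prio;
    struct tp_sketch *next;
};

/*
 * Link step at the end of a request: if tp_created is set, tp is inserted
 * in front of *back. When tp was found in the chain (tp == *back), a stale
 * tp_created makes tp->next point at tp itself.
 */
static void link_if_created(struct tp_sketch **back, struct tp_sketch *tp,
                            int tp_created)
{
    if (tp_created) {
        tp->next = *back;
        *back = tp;
    }
}

/* Walk the chain with a step limit; 1 if it terminates, 0 if it loops. */
static int chain_terminates(const struct tp_sketch *head, int max_steps)
{
    while (head && max_steps-- > 0)
        head = head->next;
    return head == NULL;
}

/* Replay a request that matches an existing entry; reset_flag models the fix. */
static int replay_request(int reset_flag)
{
    struct tp_sketch b = { .prio = 1, .next = NULL }; /* linked by thread B */
    struct tp_sketch *chain = &b;
    int tp_created = 1;        /* left over from thread A's first round */

    if (reset_flag)
        tp_created = 0;        /* the fix: clear it at the top of each replay */

    link_if_created(&chain, &b, tp_created);
    return chain_terminates(chain, 16);
}
```

Calling replay_request(0) models the buggy replay and leaves a self-referencing chain; replay_request(1) models the fixed path.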
     

26 Dec, 2016

3 commits

  • No point in going through loops and hoops instead of just comparing the
    values.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra

    Thomas Gleixner
     
  • ktime_set(S,N) was required for the timespec storage type and is still
    useful for situations where a Seconds and Nanoseconds part of a time value
    needs to be converted. For anything where the Seconds argument is 0, this
    is pointless and can be replaced with a simple assignment.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra

    Thomas Gleixner
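
To illustrate the ktime_set() commit above: with ktime_t holding scalar nanoseconds, a call with a Seconds argument of 0 returns the Nanoseconds argument unchanged. The helper below is a simplified sketch named ktime_set_sketch() to make clear it is not the kernel function; overflow handling is left out on purpose.

```c
#include <stdint.h>

typedef int64_t s64;
typedef s64 ktime_t;               /* scalar nanoseconds */

#define NSEC_PER_SEC 1000000000LL

/* Combine a seconds part and a nanoseconds part into one scalar value. */
static ktime_t ktime_set_sketch(s64 secs, unsigned long nsecs)
{
    return secs * NSEC_PER_SEC + (s64)nsecs;
}
```

With secs == 0 the result equals nsecs, which is why a plain assignment is enough in those call sites.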
     
  • ktime is a union because the initial implementation stored the time in
    scalar nanoseconds on 64-bit machines and in an endianness-optimized timespec
    variant on 32-bit machines. The Y2038 cleanup removed the timespec variant
    and switched everything to scalar nanoseconds. The union remained, but
    became completely pointless.

    Get rid of the union and just keep ktime_t as a simple typedef of type s64.

    The conversion was done with coccinelle and some manual mopping up.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra

    Thomas Gleixner
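
A minimal sketch of what the union removal above means for callers. The names union ktime_old and ktime_new_t are assumptions chosen so both forms can live in one file; the kernel keeps the name ktime_t. The member name tv64 is also an assumption for this sketch.

```c
#include <stdint.h>

typedef int64_t s64;

/* Before: a union with a single remaining member. */
union ktime_old {
    s64 tv64;
};

/* After: a plain scalar type. */
typedef s64 ktime_new_t;

static int old_before(union ktime_old a, union ktime_old b)
{
    return a.tv64 < b.tv64;        /* member access needed */
}

static int new_before(ktime_new_t a, ktime_new_t b)
{
    return a < b;                  /* values compare directly */
}
```

Once the type is a scalar, comparisons and assignments need no accessor, which is also what the two later cleanups in this group rely on.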
     

25 Dec, 2016

1 commit


24 Dec, 2016

8 commits

  • In commit 6f00089c7372 ("tipc: remove SS_DISCONNECTING state") the
    check for socket type is in the wrong place, causing a closing socket
    to always send out a FIN message even when the socket was never
    connected. This is normally harmless, since the destination node for
    such messages most often is zero, and the message will be dropped, but
    it is still a wrong and confusing behavior.

    We fix this in this commit.

    Reviewed-by: Parthasarathy Bhuvaragan
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • Currently, if SCTP closes the receive window under window pressure, mostly
    caused by excessive skb overhead relative to the payload, it
    closes the window abruptly while saving the delta in rwnd_press. It
    starts recovering rwnd as the chunks are consumed by the application, and
    rwnd_press is recovered only after rwnd reaches the same value as
    rwnd_press, mostly to prevent silly window syndrome.

    Thing is, this is very inefficient with small data chunks, as with those
    it will never reach back that value, and thus it will never recover from
    such pressure. This means that we will not issue window updates when
    recovering from 0 window and will rely on a sender retransmit to notice
    it.

    The fix here is to remove such threshold, as no value is good enough: it
    depends on the (avg) chunk sizes being used.

    Test with netperf -t SCTP_STREAM -- -m 1, and trigger 0 window by
    sending SIGSTOP to netserver, sleep 1.2, and SIGCONT.
    Rate limited to 845kbps, for visibility. Capture done at netserver side.

    Previously:
    01.500751 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632372996] [a_rwnd 99153] [
    01.500752 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632372997] [SID: 0] [SS
    01.517471 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS
    01.517483 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
    01.517485 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373083] [SID: 0] [SS
    01.517488 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
    01.534168 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373096] [SID: 0] [SS
    01.534180 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
    01.534181 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373169] [SID: 0] [SS
    01.534185 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
    02.525978 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS
    02.526021 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
    (window update missed)
    04.573807 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS
    04.779370 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373082] [a_rwnd 859] [#g
    04.789162 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373083] [SID: 0] [SS
    04.789323 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373156] [SID: 0] [SS
    04.789372 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373228] [a_rwnd 786] [#g

    After:
    02.568957 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098728] [a_rwnd 99153]
    02.568961 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098729] [SID: 0] [S
    02.585631 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S
    02.585666 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
    02.585671 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098815] [SID: 0] [S
    02.585683 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
    02.602330 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098828] [SID: 0] [S
    02.602359 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
    02.602363 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098901] [SID: 0] [S
    02.602372 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
    03.600788 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S
    03.600830 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
    03.619455 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 13508]
    03.619479 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 27017]
    03.619497 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 40526]
    03.619516 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 54035]
    03.619533 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 67544]
    03.619552 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 81053]
    03.619570 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 94562]
    (following data transmission triggered by window updates above)
    03.633504 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S
    03.836445 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098814] [a_rwnd 100000]
    03.843125 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098815] [SID: 0] [S
    03.843285 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098888] [SID: 0] [S
    03.843345 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098960] [a_rwnd 99894]
    03.856546 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098961] [SID: 0] [S
    03.866450 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490099011] [SID: 0] [S

    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     
  • It's possible that we receive a packet that is larger than the current
    window. If it's the first such packet, it will
    increase rwnd_over. Then, if we receive another data chunk (especially as
    SCTP allows you to have one data chunk in flight even during 0 window),
    rwnd_over will be overwritten instead of added to.

    In the long run, this could cause the window to grow bigger than its
    initial size, as rwnd_over would be charged only for the last received
    data chunk while the code will try to open the window for all packets that
    were received and had their value in rwnd_over overwritten. This, then,
    can lead to a worse payload/buffer ratio and cause rwnd_press
    to kick in more often.

    The fix is to sum it too, same as is done for rwnd_press, so that if we
    receive 3 chunks after closing the window, we still have to release that
    same amount before re-opening it.

    Log snippet from sctp_test exhibiting the issue:
    [ 146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
    rwnd decreased by 1 to (0, 1, 114221)
    [ 146.209232] sctp: sctp_assoc_rwnd_decrease:
    association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
    [ 146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
    rwnd decreased by 1 to (0, 1, 114221)
    [ 146.209232] sctp: sctp_assoc_rwnd_decrease:
    association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
    [ 146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
    rwnd decreased by 1 to (0, 1, 114221)
    [ 146.209232] sctp: sctp_assoc_rwnd_decrease:
    association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
    [ 146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
    rwnd decreased by 1 to (0, 1, 114221)

    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
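
The rwnd_over accounting described in the commit above can be sketched as follows. struct rwnd_sketch and rwnd_decrease() are hypothetical, reduced stand-ins for the association state and for sctp_assoc_rwnd_decrease(); the accumulate flag exists only to show the old and the fixed behaviour side by side.

```c
#include <stdint.h>

/* Reduced stand-in for the association's receive window state. */
struct rwnd_sketch {
    uint32_t rwnd;       /* advertised receive window */
    uint32_t rwnd_over;  /* bytes accepted beyond the window */
};

/* Charge len bytes; accumulate != 0 models the fix, 0 the old behaviour. */
static void rwnd_decrease(struct rwnd_sketch *a, uint32_t len, int accumulate)
{
    if (a->rwnd >= len) {
        a->rwnd -= len;
        return;
    }
    if (accumulate)
        a->rwnd_over += len - a->rwnd;   /* fixed: add to the overrun */
    else
        a->rwnd_over = len - a->rwnd;    /* old: overwrite the overrun */
    a->rwnd = 0;
}
```

With a closed window and three 1-byte chunks, the old path leaves rwnd_over at 1 (as in the log snippet), while the summed version records 3.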
     
  • neigh_cleanup_and_release() is always called after marking a neighbour
    as dead, but it only notifies user space and not in-kernel listeners of
    the netevent notification chain.

    This can cause multiple problems. In my specific use case, it causes the
    listener (a switch driver capable of L3 offloads) to believe a neighbour
    entry is still valid, and is thus erroneously kept in the device's
    table.

    Fix that by sending a netevent after marking the neighbour as dead.

    Fixes: a6bf9e933daf ("mlxsw: spectrum_router: Offload neighbours based on NUD state change")
    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • By setting certain socket options on ipv6 raw sockets, we can confuse the
    length calculation in rawv6_push_pending_frames triggering a BUG_ON.

    RIP: 0010:[] [] rawv6_sendmsg+0xc30/0xc40
    RSP: 0018:ffff881f6c4a7c18 EFLAGS: 00010282
    RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002
    RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00
    RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009
    R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030
    R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80

    Call Trace:
    [] ? unmap_page_range+0x693/0x830
    [] inet_sendmsg+0x67/0xa0
    [] sock_sendmsg+0x38/0x50
    [] SYSC_sendto+0xef/0x170
    [] SyS_sendto+0xe/0x10
    [] do_syscall_64+0x50/0xa0
    [] entry_SYSCALL64_slow_path+0x25/0x25

    Handle by jumping to the failure path if skb_copy_bits gets an EFAULT.

    Reproducer:

    /* headers needed by the calls below */
    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    #define LEN 504

    int main(int argc, char* argv[])
    {
        int fd;
        int zero = 0;
        char buf[LEN];

        memset(buf, 0, LEN);

        fd = socket(AF_INET6, SOCK_RAW, 7);

        setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4);
        setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN);

        sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110);
    }

    Signed-off-by: Dave Jones
    Signed-off-by: David S. Miller

    Dave Jones
     
  • Socket cmsg IP(V6)_RECVORIGDSTADDR checks that port range lies within
    the packet. For sockets that have transport headers pulled, transport
    offset can be negative. Use signed comparison to avoid overflow.

    Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
    Reported-by: Nisar Jagabar
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
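
Why the signed comparison in the commit above matters can be shown with a small sketch. The function names are hypothetical and the check is reduced to its arithmetic: when an int offset is compared against an unsigned length, a negative offset is converted to a very large unsigned value.

```c
/* Returns 1 when 4 bytes at transport_offset are judged to fit in len. */
static int ports_fit_unsigned(int transport_offset, unsigned int len)
{
    /* explicit cast shows the conversion an int/unsigned compare performs */
    return !((unsigned int)(transport_offset + 4) > len);
}

static int ports_fit_signed(int transport_offset, unsigned int len)
{
    /* compare as signed values so a negative offset stays negative */
    return !(transport_offset + 4 > (int)len);
}
```

For an offset of -8 and a length of 100, the unsigned form rejects the packet while the signed form accepts it.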
     
  • When matching on flags, we should require the user to provide the
    mask and avoid using an all-ones mask. Not doing so causes matching
    on flags provided without a mask to hit only when every flag not given
    in the key is unset, which may not be what the user wanted to happen.

    Fixes: faa3ffce7829 ('net/sched: cls_flower: Add support for matching on flags')
    Signed-off-by: Or Gerlitz
    Reported-by: Paul Blakey
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Or Gerlitz
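
A masked match, as discussed in the cls_flower commit above, can be sketched like this. flags_match() and the bit values used in the usage note are assumptions for illustration, not the classifier's actual code.

```c
#include <stdint.h>

/* Only the bits set in mask take part in the comparison. */
static int flags_match(uint32_t pkt_flags, uint32_t key, uint32_t mask)
{
    return (pkt_flags & mask) == (key & mask);
}
```

With a key of 0x1 and an all-ones mask, a packet carrying flags 0x3 does not match, because the second bit is required to be unset. With a mask of 0x1 the same packet matches.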
     
  • The UDP dst port was passed to the helper function that sets the
    IPv6 IP tunnel meta-data in the wrong parameter order; fix that.

    Fixes: 75bfbca01e48 ('net/sched: act_tunnel_key: Add UDP dst port option')
    Signed-off-by: Or Gerlitz
    Reviewed-by: Hadar Hen Zion
    Signed-off-by: David S. Miller

    Or Gerlitz
     

23 Dec, 2016

1 commit

  • Commit e2d118a1cb5e ("net: inet: Support UID-based routing in IP
    protocols.") made ip_do_redirect call sock_net(sk) to determine
    the network namespace of the passed-in socket. This crashes if sk
    is NULL.

    Fix this by getting the network namespace from the skb instead.

    Fixes: e2d118a1cb5e ("net: inet: Support UID-based routing in IP protocols.")
    Signed-off-by: Lorenzo Colitti
    Signed-off-by: David S. Miller

    Lorenzo Colitti
     

22 Dec, 2016

1 commit

  • Madalin reported crashes happening in tcp_tasklet_func() on powerpc64.

    Before the TSQ_QUEUED bit is cleared, we must ensure the changes done
    by list_del(&tp->tsq_node); are committed to memory, otherwise
    corruption might happen, as another cpu could catch TSQ_QUEUED
    clearance too soon.

    We can notice that old kernels were immune to this bug, because
    TSQ_QUEUED was cleared after a bh_lock_sock(sk)/bh_unlock_sock(sk)
    section, but they could have missed a kick to write additional bytes,
    when NIC interrupts for a given flow are spread to multiple cpus.

    Affected TCP flows would need an incoming ACK or RTO timer to add more
    packets to the pipe. So overall situation should be better now.

    Fixes: b223feb9de2a ("tcp: tsq: add shortcut in tcp_tasklet_func()")
    Signed-off-by: Eric Dumazet
    Reported-by: Madalin Bucur
    Tested-by: Madalin Bucur
    Tested-by: Xing Lei
    Signed-off-by: David S. Miller

    Eric Dumazet
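
The ordering requirement in the commit above (unlink first, then clear the flag, with the unlink visible before the clear) can be sketched with C11 atomics. This is a stand-in technique, not the kernel primitive: the commit text does not name the barrier it uses, and struct sock_sketch, node_del() and the bit value are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

#define TSQ_QUEUED_BIT 0x1u   /* hypothetical bit value for this sketch */

struct node { struct node *prev, *next; };

struct sock_sketch {
    struct node tsq_node;
    atomic_uint flags;
};

/* Remove n from a circular doubly linked list. */
static void node_del(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = NULL;
}

/*
 * Unlink first, then clear the bit with release ordering, so that the
 * unlink is visible to a thread that observes the bit as clear through
 * an acquire operation.
 */
static void dequeue_and_clear(struct sock_sketch *sk)
{
    node_del(&sk->tsq_node);
    atomic_fetch_and_explicit(&sk->flags, ~TSQ_QUEUED_BIT,
                              memory_order_release);
}
```

A reader that tests the bit would need a matching acquire load for the pairing to hold; that side is omitted here.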
     

21 Dec, 2016

6 commits

  • To make the code clearer, use rb_entry() instead of container_of() to
    deal with rbtree.

    Signed-off-by: Geliang Tang
    Reviewed-by: Leon Romanovsky
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Geliang Tang
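
For the rb_entry() conversions above and in the two commits that follow: rb_entry() is a thin wrapper around container_of(), so the change is about readability rather than behaviour. The sketch below uses a simplified container_of() without the kernel's type checking, and struct rb_node_sketch and struct item are hypothetical.

```c
#include <stddef.h>

struct rb_node_sketch { struct rb_node_sketch *left, *right; };

/* Simplified container_of(): step back from a member to its container. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* rb_entry() names the intent when the member is an rbtree node. */
#define rb_entry(ptr, type, member) container_of(ptr, type, member)

struct item {
    int key;
    struct rb_node_sketch node;
};

static int key_of(struct rb_node_sketch *n)
{
    return rb_entry(n, struct item, node)->key;
}
```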
     
  • To make the code clearer, use rb_entry() instead of container_of() to
    deal with rbtree.

    Signed-off-by: Geliang Tang
    Signed-off-by: David S. Miller

    Geliang Tang
     
  • To make the code clearer, use rb_entry() instead of container_of() to
    deal with rbtree.

    Signed-off-by: Geliang Tang
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Geliang Tang
     
  • sctp.local_addr_list is a global address list that is supposed to include
    all the local addresses. sctp updates this list according to NETDEV_UP/
    NETDEV_DOWN notifications.

    However, if multiple NICs have the same address, the global list would
    have duplicate addresses. Even with a single NIC, promoting secondaries in
    __inet_del_ifa can also lead to accumulating duplicate addresses.

    When sctp binds address 'ANY' and creates a connection, it copies all
    the addresses from global list into asoc's bind addr list, which makes
    sctp pack the duplicate addresses into INIT/INIT_ACK packets.

    This patch is to filter the duplicate addresses when copying the addrs
    from global list in sctp_copy_local_addr_list and unpacking addr_param
    from cookie in sctp_raw_to_bind_addrs to asoc's bind addr list.

    Note that we can't filter the duplicate addrs when the global address list
    gets updated, as a NETDEV_DOWN event may remove an addr that still exists
    on another NIC.

    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
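
Filtering duplicates at copy time, as the commit above describes, can be sketched with plain arrays. copy_unique() is a hypothetical helper and addresses are reduced to 32-bit values; the real code walks address lists and compares full address structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy src into dst, skipping values already present in dst.
 * Returns the number of entries written (at most cap). */
static size_t copy_unique(const uint32_t *src, size_t n,
                          uint32_t *dst, size_t cap)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        int dup = 0;

        for (size_t j = 0; j < out; j++) {
            if (dst[j] == src[i]) {
                dup = 1;
                break;
            }
        }
        if (dup)
            continue;          /* already copied, skip it */
        if (out == cap)
            break;             /* destination is full */
        dst[out++] = src[i];
    }
    return out;
}
```

The source list is left untouched, which matches the note that the global list itself cannot be deduplicated.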
     
  • This patch is to reduce indent level by using continue when the addr
    is not allowed, and also drop end_copy by using break.

    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
     
  • Add a break statement to prevent fall-through from
    OVS_KEY_ATTR_ETHERNET to OVS_KEY_ATTR_TUNNEL. Without the break
    actions setting ethernet addresses fail to validate with log messages
    complaining about invalid tunnel attributes.

    Fixes: 0a6410fbde ("openvswitch: netlink: support L3 packets")
    Signed-off-by: Jarno Rajahalme
    Acked-by: Pravin B Shelar
    Acked-by: Jiri Benc
    Signed-off-by: David S. Miller

    Jarno Rajahalme
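
The fall-through described in the commit above can be sketched with a reduced switch. enum attr_sketch and validate() are hypothetical; the with_break parameter exists only to show both behaviours in one function.

```c
enum attr_sketch { ATTR_ETHERNET, ATTR_TUNNEL };

/* Returns 0 if the attribute validates, -1 otherwise. */
static int validate(enum attr_sketch type, int tunnel_ok, int with_break)
{
    switch (type) {
    case ATTR_ETHERNET:
        if (with_break)
            break;             /* the fix: stop after the ethernet case */
        /* fall through - the bug */
    case ATTR_TUNNEL:
        if (!tunnel_ok)
            return -1;         /* complaint about tunnel attributes */
        break;
    }
    return 0;
}
```

Without the break, an ethernet attribute is judged by the tunnel checks and fails even though it is valid.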
     

20 Dec, 2016

1 commit

  • …_data and ip_finish_output

    There is an inconsistent condition between the __ip_append_data and
    ip_finish_output functions: the variable length in __ip_append_data
    includes only the length of the application's payload and the udp header, not
    the length of the ip header, but ip_finish_output uses
    (skb->len > ip_skb_dst_mtu(skb)) as its condition, and skb->len includes the
    length of the ip header.

    As a result, an application's udp payload whose length is
    between (MTU - IP Header) and MTU is fragmented by ip_fragment even
    though the rst->dev supports the UFO feature.

    Add the length of the ip header to length in __ip_append_data to keep
    the condition consistent with ip_finish_output for ip fragmentation.

    Signed-off-by: Zheng Li <james.z.li@ericsson.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

    zheng li
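
A worked example of the mismatch described in the commit above, reduced to the two comparisons. The function names are hypothetical, and the numbers in the usage note (MTU 1500, 20-byte IP header) are assumptions; the real conditions also check the protocol and device features.

```c
/* Old append-path condition: the IP header is not counted. */
static int append_over_mtu_old(int length, int mtu)
{
    return length > mtu;
}

/* Fixed append-path condition: add the IP header length first. */
static int append_over_mtu_new(int length, int iphdr_len, int mtu)
{
    return length + iphdr_len > mtu;
}

/* Output-path condition: skb_len already includes the IP header. */
static int output_over_mtu(int skb_len, int mtu)
{
    return skb_len > mtu;
}
```

With length 1490, the old append check says the packet fits (1490 <= 1500), but the output path sees 1510 bytes and fragments it. The fixed check agrees with the output path.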
     

18 Dec, 2016

14 commits

  • Pull networking fixes and cleanups from David Miller:

    1) Revert bogus nla_ok() change, from Alexey Dobriyan.

    2) Various bpf validator fixes from Daniel Borkmann.

    3) Add some necessary SET_NETDEV_DEV() calls to hisi_femac and hip04
    drivers, from Dongpo Li.

    4) Several ethtool ksettings conversions from Philippe Reynes.

    5) Fix bugs in inet port management wrt. soreuseport, from Tom Herbert.

    6) XDP support for virtio_net, from John Fastabend.

    7) Fix NAT handling within a vrf, from David Ahern.

    8) Endianness fixes in dpaa_eth driver, from Claudiu Manoil.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (63 commits)
    net: mv643xx_eth: fix build failure
    isdn: Constify some function parameters
    mlxsw: spectrum: Mark split ports as such
    cgroup: Fix CGROUP_BPF config
    qed: fix old-style function definition
    net: ipv6: check route protocol when deleting routes
    r6040: move spinlock in r6040_close as SOFTIRQ-unsafe lock order detected
    irda: w83977af_ir: cleanup an indent issue
    net: sfc: use new api ethtool_{get|set}_link_ksettings
    net: davicom: dm9000: use new api ethtool_{get|set}_link_ksettings
    net: cirrus: ep93xx: use new api ethtool_{get|set}_link_ksettings
    net: chelsio: cxgb3: use new api ethtool_{get|set}_link_ksettings
    net: chelsio: cxgb2: use new api ethtool_{get|set}_link_ksettings
    bpf: fix mark_reg_unknown_value for spilled regs on map value marking
    bpf: fix overflow in prog accounting
    bpf: dynamically allocate digest scratch buffer
    gtp: Fix initialization of Flags octet in GTPv1 header
    gtp: gtp_check_src_ms_ipv4() always return success
    net/x25: use designated initializers
    isdn: use designated initializers
    ...

    Linus Torvalds
     
  • …kernel/git/jberg/mac80211

    Johannes Berg says:

    ====================
    Three fixes:
    * avoid a WARN_ON() when trying to use WEP with AP_VLANs
    * ensure enough headroom on mesh forwarding packets
    * don't report unknown/invalid rates to userspace
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     
  • The protocol field is checked when deleting IPv4 routes, but ignored for
    IPv6, which causes problems with routing daemons accidentally deleting
    externally set routes (observed by multiple bird6 users).

    This can be verified using `ip -6 route del proto something`.

    Signed-off-by: Mantas Mikulėnas
    Signed-off-by: David S. Miller

    Mantas M
     
  • Prepare to mark sensitive kernel structures for randomization by making
    sure they're using designated initializers. These were identified during
    allyesconfig builds of x86, arm, and arm64, with most initializer fixes
    extracted from grsecurity.

    Signed-off-by: Kees Cook
    Signed-off-by: David S. Miller

    Kees Cook
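
The reason designated initializers matter for structure randomization, as in the commit above and the two identical ones that follow, can be sketched as below. struct ops_sketch and its callbacks are hypothetical.

```c
struct ops_sketch {
    int (*open)(void);
    int (*close)(void);
};

static int do_open(void)  { return 1; }
static int do_close(void) { return 2; }

/* Positional: correct only while the member order stays as declared. */
static const struct ops_sketch positional = { do_open, do_close };

/* Designated: each value is bound to a member name, so reordering the
 * members of the struct does not change the meaning. */
static const struct ops_sketch designated = {
    .close = do_close,
    .open  = do_open,
};
```

If the layout of the struct were shuffled, the positional form would silently assign callbacks to the wrong members, while the designated form would still be correct.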
     
  • Prepare to mark sensitive kernel structures for randomization by making
    sure they're using designated initializers. These were identified during
    allyesconfig builds of x86, arm, and arm64, with most initializer fixes
    extracted from grsecurity.

    Signed-off-by: Kees Cook
    Signed-off-by: David S. Miller

    Kees Cook
     
  • Prepare to mark sensitive kernel structures for randomization by making
    sure they're using designated initializers. These were identified during
    allyesconfig builds of x86, arm, and arm64, with most initializer fixes
    extracted from grsecurity.

    Signed-off-by: Kees Cook
    Signed-off-by: David S. Miller

    Kees Cook
     
  • This adds a warning for drivers to use when encountering an invalid
    buffer for XDP. For normal cases this should not happen, but having a
    standard warning is useful to catch behavior I may not have expected
    from the emulation layer in virtual/qemu setups.

    Signed-off-by: John Fastabend
    Signed-off-by: David S. Miller

    John Fastabend
     
  • Prior to this patch, sctp_transport_lookup_process didn't rcu_read_unlock
    when it failed to find a transport by sctp_addrs_lookup_transport.

    This patch is to fix it by moving up rcu_read_unlock right before checking
    transport and also to remove the out path.

    Fixes: 1cceda784980 ("sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock")
    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
     
  • Since commit 7fda702f9315 ("sctp: use new rhlist interface on sctp transport
    rhashtable"), sctp has changed to use rhlist_lookup to look up transport, but
    rhlist_lookup doesn't call rcu_read_lock inside, unlike rhashtable_lookup_fast.

    It is called in sctp_epaddr_lookup_transport and sctp_addrs_lookup_transport.
    sctp_addrs_lookup_transport is always in the protection of rcu_read_lock(),
    as __sctp_lookup_association is called in rx path or sctp_lookup_association
    which are in the protection of rcu_read_lock() already.

    But sctp_epaddr_lookup_transport is called by sctp_endpoint_lookup_assoc, which
    doesn't call rcu_read_lock, and that may cause a "suspicious rcu_dereference_check
    usage" warning in __rhashtable_lookup.

    This patch is to fix it by adding rcu_read_lock in sctp_endpoint_lookup_assoc
    before calling sctp_epaddr_lookup_transport.

    Fixes: 7fda702f9315 ("sctp: use new rhlist interface on sctp transport rhashtable")
    Reported-by: Dmitry Vyukov
    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
     
  • Since struct miscdevice has many members, it is dangerous to initialize
    it without member names, relying only on member order.

    This patch adds member names to the initializer.

    Signed-off-by: Corentin Labbe
    Signed-off-by: David S. Miller

    LABBE Corentin
     
  • The IRNET_MAJOR define is not used, so this patch removes it.

    Signed-off-by: Corentin Labbe
    Signed-off-by: David S. Miller

    LABBE Corentin
     
  • This patch moves the define for IRNET_MINOR to include/linux/miscdevice.h.
    It is better that all minor number definitions are in the same place.

    Signed-off-by: Corentin Labbe
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: David S. Miller

    LABBE Corentin
     
  • The only user of miscdevice is irda_ppp, so there is no need to include
    linux/miscdevice.h for all irda files.
    This patch moves the linux/miscdevice.h include to irnet_ppp.h.

    Signed-off-by: Corentin Labbe
    Signed-off-by: David S. Miller

    LABBE Corentin
     
  • irproc.c does not use any miscdevice, so this patch removes this
    unnecessary inclusion.

    Signed-off-by: Corentin Labbe
    Signed-off-by: David S. Miller

    LABBE Corentin