09 Feb, 2013

3 commits

  • Pull networking fixes from David Miller:

    1) Revert iwlwifi reclaimed packet tracking, it causes problems for a
    bunch of folks. From Emmanuel Grumbach.

    2) Work limiting code in brcmsmac wifi driver can clear tx status
    without processing the event. From Arend van Spriel.

    3) rtlwifi USB driver processes wrong SKB, fix from Larry Finger.

    4) l2tp tunnel delete can race with close, fix from Tom Parkin.

    5) pktgen_add_device() failures are not checked at all, fix from Cong
    Wang.

    6) Fix unintentional removal of carrier off from tun_detach(),
    otherwise we confuse userspace, from Michael S. Tsirkin.

    7) Don't leak socket reference counts and ubufs in vhost-net driver,
    from Jason Wang.

    8) vmxnet3 driver gets its initial carrier state wrong, fix from Neil
    Horman.

    9) Protect against USB networking devices which spam the host with 0
    length frames, from Bjørn Mork.

    10) Prevent neighbour overflows in ipv6 for locally destined routes,
    from Marcelo Ricardo. This is the best short-term fix for this, a
    longer term fix has been implemented in net-next.

    11) L2TP uses ipv4 datagram routines in its ipv6 code, whoops. This
    mistake is largely because the ipv6 functions don't even have some
    kind of prefix in their names to suggest they are ipv6 specific.
    From Tom Parkin.

    12) Check SYN packet drops properly in tcp_rcv_fastopen_synack(), from
    Yuchung Cheng.

    13) Fix races and TX skb freeing bugs in via-rhine's NAPI support, from
    Francois Romieu and yours truly.

    14) Fix infinite loops and divides by zero in TCP congestion window
    handling, from Eric Dumazet, Neal Cardwell, and Ilpo Järvinen.

    15) AF_PACKET tx ring handling can leak kernel memory to userspace, fix
    from Phil Sutter.

    16) Fix error handling in ipv6 GRE tunnel transmit, from Tommi Rantala.

    17) Protect XEN netback driver against hostile frontend putting garbage
    into the rings, don't leak pages in TX GOP checking, and add proper
    resource releasing in error path of xen_netbk_get_requests(). From
    Ian Campbell.

    18) SCTP authentication keys should be cleared out and released with
    kzfree(), from Daniel Borkmann.

    19) L2TP is a bit too clever trying to maintain skb->truesize, and ends
    up corrupting socket memory accounting to the point where packet
    sending is halted indefinitely. Just remove the adjustments
    entirely, they aren't really needed. From Eric Dumazet.

    20) ATM Iphase driver uses a data type with the same name as the S390
    headers, rename to fix the build. From Heiko Carstens.

    21) Fix a typo in copying the inner network header offset from one SKB
    to another, from Pravin B Shelar.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (56 commits)
    net: sctp: sctp_endpoint_free: zero out secret key data
    net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree
    atm/iphase: rename fregt_t -> ffreg_t
    net: usb: fix regression from FLAG_NOARP code
    l2tp: dont play with skb->truesize
    net: sctp: sctp_auth_key_put: use kzfree instead of kfree
    netback: correct netbk_tx_err to handle wrap around.
    xen/netback: free already allocated memory on failure in xen_netbk_get_requests
    xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop.
    xen/netback: shutdown the ring if it contains garbage.
    net: qmi_wwan: add more Huawei devices, including E320
    net: cdc_ncm: add another Huawei vendor specific device
    ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit()
    tcp: fix for zero packets_in_flight was too broad
    brcmsmac: rework of mac80211 .flush() callback operation
    ssb: unregister gpios before unloading ssb
    bcma: unregister gpios before unloading bcma
    rtlwifi: Fix scheduling while atomic bug
    net: usbnet: fix tx_dropped statistics
    tcp: ipv6: Update MIB counters for drops
    ...

    Linus Torvalds
     
  • On sctp_endpoint_destroy, previously used sensitive keying material
    should be zeroed out before the memory is returned, as we already do
    with e.g. auth keys when released.

    Signed-off-by: Daniel Borkmann
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • In sctp_setsockopt_auth_key, we create a temporary copy of the user
    passed shared auth key for the endpoint or association and after
    internal setup, we free it right away. Since it's sensitive data, we
    should zero out the key before returning the memory back to the
    allocator. Thus, use kzfree instead of kfree, just as we do in
    sctp_auth_key_put().

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
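
The zero-before-free pattern behind these kzfree() changes can be sketched in user space. This is only an illustration under stated assumptions: `secure_zero` and `zero_and_free` are hypothetical helper names, and unlike kzfree(), which takes just the pointer, the user-space version needs the length passed in because malloc() does not portably expose the allocation size.

```c
#include <stddef.h>
#include <stdlib.h>

/* Overwrite n bytes at p with zeros through a volatile pointer, so the
 * compiler cannot discard the stores as dead writes ahead of free(). */
static void secure_zero(void *p, size_t n)
{
    volatile unsigned char *vp = p;

    while (n--)
        *vp++ = 0;
}

/* Hypothetical user-space analogue of kzfree(): wipe, then release. */
static void zero_and_free(void *p, size_t n)
{
    if (p == NULL)
        return;
    secure_zero(p, n);
    free(p);
}
```

A caller holding a temporary copy of a key would call `zero_and_free(key, key_len)` where it previously called `free(key)`.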
     

08 Feb, 2013

3 commits

  • Andrew Savchenko reported a DNS failure and we diagnosed that
    some UDP sockets were unable to send more packets because their
    sk_wmem_alloc was corrupted after a while (tx_queue column in
    following trace)

    $ cat /proc/net/udp
    sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode ref pointer drops
    ...
    459: 00000000:0270 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4507 2 ffff88003d612380 0
    466: 00000000:0277 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4802 2 ffff88003d613180 0
    470: 076A070A:007B 00000000:0000 07 FFFF4600:00000000 00:00000000 00000000 123 0 5552 2 ffff880039974380 0
    470: 010213AC:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4986 2 ffff88003dbd3180 0
    470: 010013AC:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4985 2 ffff88003dbd2e00 0
    470: 00FCA8C0:007B 00000000:0000 07 FFFFFB00:00000000 00:00000000 00000000 0 0 4984 2 ffff88003dbd2a80 0
    ...

    Playing with skb->truesize is tricky, especially when
    skb is attached to a socket, as we can fool memory charging.

    Just remove this code, it's not worth trying to be ultra
    precise in xmit path.

    Reported-by: Andrew Savchenko
    Tested-by: Andrew Savchenko
    Signed-off-by: Eric Dumazet
    Cc: James Chapman
    Signed-off-by: David S. Miller

    Eric Dumazet
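
To spot rows like the ones in the trace above, the tx_queue column can be pulled out of each /proc/net/udp line and read as a signed 32-bit value. The sketch below is a user-space helper, not part of the patch; `parse_udp_queues` and `as_signed32` are hypothetical names, a 32-bit `unsigned int` is assumed, and treating a large tx_queue as a wrapped (negative) counter is an interpretation rather than something the commit message states.

```c
#include <stdio.h>

/* Read a 32-bit counter as signed so wrapped values stand out. */
static long long as_signed32(unsigned int v)
{
    return v >= 0x80000000u ? (long long)v - 4294967296LL : (long long)v;
}

/* Extract tx_queue and rx_queue from one /proc/net/udp row.
 * Returns 0 on success, -1 if the row does not have the expected layout
 * (for example the header line). */
static int parse_udp_queues(const char *line, unsigned int *tx, unsigned int *rx)
{
    /* Fields: "sl: local rem st tx_queue:rx_queue ..." */
    int n = sscanf(line, " %*d: %*[^ ] %*[^ ] %*x %x:%x", tx, rx);

    return n == 2 ? 0 : -1;
}
```

For the row with tx_queue `FFFF4600`, `as_signed32()` yields -47616, while the healthy rows yield 0.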
     
  • For sensitive data like keying material, it is common practice to zero
    out keys before returning the memory back to the allocator. Thus, use
    kzfree instead of kfree.

    Signed-off-by: Daniel Borkmann
    Acked-by: Neil Horman
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • …vswitch into openvswitch

    Jesse Gross says:

    ====================
    One bug fix for net/3.8 for a long standing problem that was reported a few
    times recently.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     

07 Feb, 2013

3 commits


05 Feb, 2013

3 commits


04 Feb, 2013

4 commits

  • When releasing a packet socket, the routine packet_set_ring() is reused
    to free rings instead of allocating them. But when calling it for the
    first time, it fills req->tp_block_nr with the value of rb->pg_vec_len
    which in the second invocation makes it bail out since req->tp_block_nr
    is greater than zero but req->tp_block_size is zero.

    This patch solves the problem by passing a zeroed auto-variable to
    packet_set_ring() upon each invocation from packet_release().

    As far as I can tell, this issue has existed ever since 69e3c75 (net: TX_RING
    and packet mmap), i.e. the original inclusion of TX ring support into
    af_packet, but applies only to sockets with both RX and TX ring
    allocated, which is probably why it went unnoticed for so long.

    Signed-off-by: Phil Sutter
    Cc: Johann Baudy
    Cc: Daniel Borkmann
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Phil Sutter
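
The failure mode described in this commit can be reproduced with a toy model. Everything below is hypothetical and heavily simplified: `toy_set_ring` only mimics the two behaviours the message mentions (rejecting a request that has blocks but no block size, and writing the old ring length back into the caller's request), and the RX-then-TX call order is an assumption.

```c
#include <string.h>

struct toy_ring { unsigned int pg_vec_len; };
struct toy_req  { unsigned int block_nr; unsigned int block_size; };

/* Toy stand-in for the ring setup routine, used here only to free rings. */
static int toy_set_ring(struct toy_ring *rb, struct toy_req *req)
{
    if (req->block_nr) {
        if (req->block_size == 0)
            return -1;              /* inconsistent request: bail out */
        rb->pg_vec_len = req->block_nr;
        return 0;
    }
    req->block_nr = rb->pg_vec_len; /* side effect on the caller's struct */
    rb->pg_vec_len = 0;             /* ring freed */
    return 0;
}

/* Buggy release: one request struct shared by both calls. */
static int release_shared(struct toy_ring *rx, struct toy_ring *tx)
{
    struct toy_req req;
    int err;

    memset(&req, 0, sizeof(req));
    err = toy_set_ring(rx, &req);
    err |= toy_set_ring(tx, &req);  /* req.block_nr is stale here */
    return err;
}

/* Fixed release: a freshly zeroed request for every call. */
static int release_fresh(struct toy_ring *rx, struct toy_ring *tx)
{
    struct toy_req req;
    int err;

    memset(&req, 0, sizeof(req));
    err = toy_set_ring(rx, &req);
    memset(&req, 0, sizeof(req));
    err |= toy_set_ring(tx, &req);
    return err;
}
```

In this model the shared-request variant only misbehaves when both rings are non-empty, which matches the observation that the bug needs both RX and TX rings allocated.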
     
  • Use correct inner offset to set inner_network_offset.
    Found by inspection.

    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Pravin B Shelar
     
  • Commit 9dc274151a548 (tcp: fix ABC in tcp_slow_start())
    uncovered a bug in FRTO code :
    tcp_process_frto() is setting snd_cwnd to 0 if the number
    of in flight packets is 0.

    As Neal pointed out, if no packet is in flight we lost our
    chance to disambiguate whether a loss timeout was spurious.

    We should assume it was a proper loss.

    Reported-by: Pasi Kärkkäinen
    Signed-off-by: Neal Cardwell
    Signed-off-by: Eric Dumazet
    Cc: Ilpo Järvinen
    Cc: Yuchung Cheng
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Since commit 9dc274151a548 (tcp: fix ABC in tcp_slow_start()),
    a zero snd_cwnd triggers an infinite loop in tcp_slow_start().

    Avoid this infinite loop and log a one time error for further
    analysis. FRTO code is suspected to cause this bug.

    Reported-by: Pasi Kärkkäinen
    Signed-off-by: Eric Dumazet
    Cc: Neal Cardwell
    Cc: Yuchung Cheng
    Signed-off-by: David S. Miller

    Eric Dumazet
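
Why a zero snd_cwnd can hang a loop of this kind is easy to see in a simplified model. The function below is not the kernel's tcp_slow_start(); it is a hypothetical sketch of a repeated-subtraction loop over unsigned counters, and clamping the window to 1 after a one-time message is an assumption about how such a guard might look.

```c
#include <stdio.h>

/* Count how often snd_cwnd fits into snd_cwnd_cnt by repeated subtraction.
 * Without the guard, snd_cwnd == 0 would leave snd_cwnd_cnt unchanged on
 * every pass and the unsigned comparison would stay true forever. */
static unsigned int toy_slow_start_delta(unsigned int snd_cwnd_cnt,
                                         unsigned int snd_cwnd)
{
    static int warned;
    unsigned int delta = 0;

    if (snd_cwnd == 0) {
        if (!warned) {
            fprintf(stderr, "toy: snd_cwnd is zero\n");  /* log once */
            warned = 1;
        }
        snd_cwnd = 1;   /* assumed fallback so the loop makes progress */
    }

    while (snd_cwnd_cnt >= snd_cwnd) {
        snd_cwnd_cnt -= snd_cwnd;
        delta++;
    }
    return delta;
}
```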
     

02 Feb, 2013

1 commit


01 Feb, 2013

7 commits

  • Pull NFS client bugfixes from Trond Myklebust:

    - Error reporting in nfs_xdev_mount incorrectly maps all errors to
    ENOMEM

    - Fix an NFSv4 refcounting issue

    - Fix a mount failure when the server reboots during NFSv4 trunking
    discovery

    - NFSv4.1 mounts may need to run the lease recovery thread.

    - Don't silently fail setattr() requests on mountpoints

    - Fix a SUNRPC socket/transport livelock and priority queue issue

    - We must handle NFS4ERR_DELAY when resetting the NFSv4.1 session.

    * tag 'nfs-for-3.8-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    NFSv4.1: Handle NFS4ERR_DELAY when resetting the NFSv4.1 session
    SUNRPC: When changing the queue priority, ensure that we change the owner
    NFS: Don't silently fail setattr() requests on mountpoints
    NFSv4.1: Ensure that nfs41_walk_client_list() does start lease recovery
    NFSv4: Fix NFSv4 trunking discovery
    NFSv4: Fix NFSv4 reference counting for trunked sessions
    NFS: Fix error reporting in nfs_xdev_mount

    Linus Torvalds
     
  • On receiving the SYN-ACK, Fast Open checks icsk_retransmit for SYN
    retransmission to detect SYN/data drops. But if F-RTO is disabled,
    icsk_retransmit is reset at step D of tcp_fastretrans_alert()
    (under tcp_ack()) before tcp_rcv_fastopen_synack(). The fix is to use
    total_retrans instead, which accounts for SYN retransmission regardless
    of the use of F-RTO.

    Signed-off-by: Yuchung Cheng
    Signed-off-by: David S. Miller

    Yuchung Cheng
     
  • l2tp_ip6 is incorrectly using the IPv4-specific ip_cmsg_recv to handle
    ancillary data. This means that socket options such as IPV6_RECVPKTINFO are
    not honoured in userspace.

    Convert l2tp_ip6 to use the IPv6-specific handler.

    Ref: net/ipv6/udp.c

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: Chris Elston
    Signed-off-by: David S. Miller

    Tom Parkin
     
  • ip6_datagram_recv_ctl and ip6_datagram_send_ctl are used for handling IPv6
    ancillary data. Since ip6_datagram_send_ctl is already publicly exported for
    use in modules, ip6_datagram_recv_ctl should also be available to support
    ancillary data in the receive path.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     
  • The datagram_*_ctl functions in net/ipv6/datagram.c are IPv6-specific. Since
    datagram_send_ctl is publicly exported it should be appropriately named to
    reflect the fact that it's for IPv6 only.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     
  • If an LE or SCO hci_conn timeout occurs and the connection is already
    established (BT_CONNECTED state), the connection is not terminated as
    expected. This bug can be reproduced using l2test or scotest tool.
    Once the connection is established, kill l2test/scotest and the
    connection won't be terminated.

    This patch fixes hci_conn_disconnect helper so it is able to
    terminate LE and SCO connections, as well as ACL.

    Signed-off-by: Andre Guedes
    Signed-off-by: Gustavo Padovan

    Andre Guedes
     
  • The conn->smp_chan pointer can be NULL if SMP PDUs arrive at unexpected
    moments. To avoid NULL pointer dereferences the code should be checking
    for this and disconnect if an unexpected SMP PDU arrives. This patch
    fixes the issue by adding a check for conn->smp_chan for all other PDUs
    except pairing request and security request (which are the first
    PDUs to come to initialize the SMP context).

    Signed-off-by: Johan Hedberg
    CC: stable@vger.kernel.org
    Acked-by: Marcel Holtmann
    Signed-off-by: Gustavo Padovan

    Johan Hedberg
     

31 Jan, 2013

2 commits

  • They will be created at output, if ever needed. This avoids creating
    empty neighbor entries when TPROXYing/Forwarding packets for addresses
    that are not even directly reachable.

    Note that IPv4 already handles it this way. No neighbor entries are
    created for local input.

    Tested by myself and customer.

    Signed-off-by: Jiri Pirko
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     
  • This fixes a livelock in the xprt->sending queue where we end up never
    making progress on lower priority tasks because sleep_on_priority()
    keeps adding new tasks with the same owner to the head of the queue,
    and priority bumps mean that we keep resetting the queue->owner to
    whatever task is at the head of the queue.

    Regression introduced by commit c05eecf636101dd4347b2d8fa457626bf0088e0a
    (SUNRPC: Don't allow low priority tasks to pre-empt higher priority ones).

    Reported-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

30 Jan, 2013

5 commits

  • We drop a connection request if the accept backlog is full and there are
    sufficient packets in the syn queue to warrant starting drops. Increment the
    appropriate counters so this isn't silent, for accurate stats and help in
    debugging.

    This patch assumes LINUX_MIB_LISTENDROPS is a superset of/includes the
    counter LINUX_MIB_LISTENOVERFLOWS.

    Signed-off-by: Nivedita Singhvi
    Acked-by: Vijay Subramanian
    Signed-off-by: David S. Miller

    Nivedita Singhvi
     
  • The "Universal/Local" (U/L) bit must be complemented according to RFC4944
    and RFC2464.

    Signed-off-by: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    YOSHIFUJI Hideaki / 吉藤英明
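
Complementing the U/L bit amounts to flipping bit 0x02 of the first octet of the identifier. The sketch below shows this for the two common inputs, a 48-bit MAC address (the RFC 2464 case) and an existing EUI-64; the function names are hypothetical and this is a user-space illustration, not the patched kernel code.

```c
#include <stdint.h>
#include <string.h>

/* 48-bit MAC -> 64-bit interface identifier: insert ff:fe in the middle,
 * then complement the Universal/Local bit. */
static void mac_to_iid(const uint8_t mac[6], uint8_t iid[8])
{
    memcpy(iid, mac, 3);
    iid[3] = 0xff;
    iid[4] = 0xfe;
    memcpy(iid + 5, mac + 3, 3);
    iid[0] ^= 0x02;
}

/* EUI-64 -> interface identifier: copy and complement the U/L bit. */
static void eui64_to_iid(const uint8_t eui[8], uint8_t iid[8])
{
    memcpy(iid, eui, 8);
    iid[0] ^= 0x02;
}
```

With the RFC 2464 example address 34-56-78-9A-BC-DE, `mac_to_iid()` produces 36-56-78-FF-FE-9A-BC-DE.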
     
  • The return value of pktgen_add_device() is not checked, so
    even if we fail to add some device, for example a non-existent one,
    we still see "OK:...". This patch fixes it.

    After this patch, I got:

    # echo "add_device non-exist" > /proc/net/pktgen/kpktgend_0
    -bash: echo: write error: No such device
    # cat /proc/net/pktgen/kpktgend_0
    Running:
    Stopped:
    Result: ERROR: can not add device non-exist
    # echo "add_device eth0" > /proc/net/pktgen/kpktgend_0
    # cat /proc/net/pktgen/kpktgend_0
    Running:
    Stopped: eth0
    Result: OK: add_device=eth0

    (Candidate for -stable)

    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • The delay calculation with the rate extension introduced in v3.3 does
    not work properly if other packets are still queued for transmission.
    For the delay calculation to work, both delay types (latency and delay
    introduced by rate limitation) have to be handled differently. The
    latency delay for a packet can overlap with the delay of other packets.
    The delay introduced by the rate however is separate, and can only
    start, once all other rate-introduced delays finished.

    Latency delay is from same distribution for each packet, rate delay
    depends on the packet size.

    .: latency delay
    -: rate delay
    x: additional delay we have to wait since another packet is currently
    transmitted

    .....----                 Packet 1
      .....xx------           Packet 2
                 .....------  Packet 3
      ^^^^^
      latency stacks
           ^^
           rate delay doesn't stack
                 ^^
                 latency stacks

    -----> time

    When a packet is enqueued, we first consider the latency delay. If other
    packets are already queued, we can reduce the latency delay until the
    last packet in the queue is sent, however the latency delay cannot be

    Acked-by: Hagen Paul Pfeifer
    Signed-off-by: David S. Miller

    Johannes Naab
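
The rule described in this commit, that latency delays may overlap while rate delays must run one after another, can be modelled in a few lines. This is a hypothetical sketch in abstract time units, not netem's implementation; `toy_link` and `toy_schedule` are invented names.

```c
/* Time at which the link finishes the last scheduled rate delay. */
struct toy_link { long rate_busy_until; };

/* Schedule one packet and return the time its transmission completes.
 * now:        enqueue time
 * latency:    latency delay, free to overlap with other packets
 * rate_delay: transmission time at the configured rate (size / rate) */
static long toy_schedule(struct toy_link *link, long now,
                         long latency, long rate_delay)
{
    long start = now + latency;         /* latency runs in parallel */

    if (start < link->rate_busy_until)  /* rate delays never overlap */
        start = link->rate_busy_until;

    link->rate_busy_until = start + rate_delay;
    return link->rate_busy_until;
}
```

With made-up numbers: a packet enqueued at time 0 (latency 5, rate delay 4) finishes at 9; a second one enqueued at time 2 (latency 5, rate delay 6) is ready at 7 but has to wait until 9 and finishes at 15; a third enqueued at 13 is ready at 18, finds the link idle, and finishes at 24.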
     
  • If a tunnel socket is created by userspace, l2tp hooks the socket destructor
    in order to clean up resources if userspace closes the socket or crashes. It
    also caches a pointer to the struct sock for use in the data path and in the
    netlink interface.

    While it is safe to use the cached sock pointer in the data path, where the
    skb references keep the socket alive, it is not safe to use it elsewhere as
    such access introduces a race with userspace closing the socket. In
    particular, l2tp_tunnel_delete is prone to oopsing if a multithreaded
    userspace application closes a socket at the same time as sending a netlink
    delete command for the tunnel.

    This patch fixes this oops by forcing l2tp_tunnel_delete to explicitly look up
    a tunnel socket held by userspace using sockfd_lookup().

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     

29 Jan, 2013

1 commit


28 Jan, 2013

4 commits

  • Per-net sysctl table needs to be explicitly freed at
    net exit. Otherwise we see the following with kmemleak:

    unreferenced object 0xffff880402d08000 (size 2048):
    comm "chrome_sandbox", pid 18437, jiffies 4310887172 (age 9097.630s)
    hex dump (first 32 bytes):
    b2 68 89 81 ff ff ff ff 20 04 04 f8 01 88 ff ff .h...... .......
    04 00 00 00 a4 01 00 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    [] kmemleak_alloc+0x21/0x3e
    [] slab_post_alloc_hook+0x28/0x2a
    [] __kmalloc_track_caller+0xf1/0x104
    [] kmemdup+0x1b/0x30
    [] sctp_sysctl_net_register+0x1f/0x72
    [] sctp_net_init+0x100/0x39f
    [] ops_init+0xc6/0xf5
    [] setup_net+0x4c/0xd0
    [] copy_net_ns+0x6d/0xd6
    [] create_new_namespaces+0xd7/0x147
    [] copy_namespaces+0x63/0x99
    [] copy_process+0xa65/0x1233
    [] do_fork+0x10b/0x271
    [] sys_clone+0x23/0x25
    [] stub_clone+0x13/0x20
    [] 0xffffffffffffffff

    I fixed the spelling of sysctl_header so the code actually
    compiles. -- EWB.

    Reported-by: Martin Mokrejs
    Signed-off-by: Vlad Yasevich
    Acked-by: Neil Horman
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • Due to IP_GRE GSO support, GRE can receive non-linear skbs, which
    results in a panic in case of GRE_CSUM. The following patch fixes it by
    using the correct csum API.

    Bug introduced in commit 6b78f16e4bdde3936b (gre: add GSO support)

    Signed-off-by: Pravin B Shelar
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Pravin B Shelar
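
A checksum routine that walks a single contiguous buffer cannot be pointed at data that is spread over several fragments. The user-space sketch below computes the Internet checksum across a list of fragments as if they were one buffer, tracking byte parity across fragment boundaries. It illustrates the general idea only; `toy_frag` and `csum_frags` are hypothetical and unrelated to the kernel's skb checksum helpers.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_frag { const uint8_t *data; size_t len; };

/* Fold the accumulator to 16 bits and return the one's complement. */
static uint16_t csum_fold64(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Internet checksum over all fragments, treated as one byte stream.
 * 'off' counts bytes across fragments, so an odd-length fragment does
 * not shift the 16-bit word boundaries of the data that follows it. */
static uint16_t csum_frags(const struct toy_frag *frags, size_t nfrags)
{
    uint64_t sum = 0;
    size_t off = 0;

    for (size_t f = 0; f < nfrags; f++) {
        for (size_t i = 0; i < frags[f].len; i++, off++) {
            uint64_t byte = frags[f].data[i];

            sum += (off & 1) ? byte : byte << 8;
        }
    }
    return csum_fold64(sum);
}
```

Splitting the same bytes into fragments of any sizes gives the same result as checksumming them in one piece.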
     
  • When sctp handles a duplicate COOKIE-ECHO and the action is
    'Association restart', sctp_sf_do_dupcook_a() processes
    the unexpected COOKIE-ECHO for peer restart, but it does not set
    the association state to SCTP_STATE_ESTABLISHED, so the association
    could get stuck in SCTP_STATE_SHUTDOWN_PENDING state forever.
    This violates the sctp specification:
    RFC 4960 5.2.4. Handle a COOKIE ECHO when a TCB Exists
    Action
    A) In this case, the peer may have restarted. .....
    After this, the endpoint shall enter the ESTABLISHED state.

    To resolve this problem, add a SCTP_CMD_NEW_STATE cmd to the
    command list before the SCTP_CMD_REPLY cmd. This sets the restarted
    association to SCTP_STATE_ESTABLISHED state properly and also avoids
    the I-bit being set in the DATA chunk header when COOKIE_ACK is bundled
    with DATA chunks.

    Signed-off-by: Xufeng Zhang
    Acked-by: Neil Horman
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Xufeng Zhang
     
  • We did this for IPv4 in b49d3c1e1c "net: ipmr: limit MRT_TABLE
    identifiers" but we need to do it for IPv6 as well. On IPv6 the name
    is "pim6reg" instead of "pimreg" so there is one less digit allowed.

    The strcpy() is in ip6mr_reg_vif().

    Signed-off-by: Dan Carpenter
    Signed-off-by: David S. Miller

    Dan Carpenter
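
The "one less digit" remark follows from the fixed size of interface names. A small check makes the arithmetic concrete; `TOY_IFNAMSIZ` is a local constant assumed to mirror the kernel's 16-byte IFNAMSIZ, and `reg_name_fits` is a hypothetical helper, not code from the patch.

```c
#include <stdio.h>

#define TOY_IFNAMSIZ 16  /* assumption: same size as the kernel's IFNAMSIZ */

/* Return 1 if "<prefix><table_id>" plus the terminating NUL fits into an
 * interface name buffer, 0 if it would be truncated. */
static int reg_name_fits(const char *prefix, unsigned int table_id)
{
    char name[TOY_IFNAMSIZ];
    int n = snprintf(name, sizeof(name), "%s%u", prefix, table_id);

    return n >= 0 && (size_t)n < sizeof(name);
}
```

Under that assumption "pimreg" leaves room for nine digits and "pim6reg" for eight, so an unbounded strcpy() of a longer name would overrun the buffer.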
     

27 Jan, 2013

3 commits


24 Jan, 2013

1 commit