09 Oct, 2013

1 commit


07 Oct, 2013

1 commit


04 Oct, 2013

1 commit

  • Change all __wait_event*() implementations to match the corresponding
    wait_event*() signature for convenience.

    In particular this does away with the weird 'ret' logic. Since there
    are __wait_event*() users this requires we update them too.

    Reviewed-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20131002092529.042563462@infradead.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

02 Oct, 2013

9 commits

  • Pull networking changes from David Miller:

    1) Multiply in netfilter IPVS can overflow when calculating destination
    weight. From Simon Kirby.

    2) Use after free fixes in IPVS from Julian Anastasov.

    3) SFC driver bug fixes from Daniel Pieczko.

    4) Memory leak in pcan_usb_core failure paths, from Alexey Khoroshilov.

    5) Locking and encapsulation fixes to serial line CAN driver, from
    Andrew Naujoks.

    6) Duplex and VF handling fixes to bnx2x driver from Yaniv Rosner,
    Eilon Greenstein, and Ariel Elior.

    7) In lapb, if no other packets are outstanding, T1 timeouts actually
    stall things and no packet gets sent. Fix from Josselin Costanzi.

    8) ICMP redirects should not make it to the socket error queues, from
    Duan Jiong.

    9) Fix bugs in skge DMA mapping error handling, from Mikulas Patocka.

    10) Fix setting of VLAN priority field on via-rhine driver, from Roger
    Luethi.

    11) Fix TX stalls and VLAN promisc programming in be2net driver from
    Ajit Khaparde.

    12) Packet padding doesn't get handled correctly in new usbnet SG
    support code, from Ming Lei.

    13) Fix races in netdevice teardown wrt. network namespace closing.
    From Eric W. Biederman.

    14) Fix potential missed initialization of net_secret if no TCP
    connections are opened. From Eric Dumazet.

    15) Cinterion PLXX product ID in qmi_wwan driver is wrong, from
    Aleksander Morgado.

    16) skb_cow_head() can change skb->data and thus packet header pointers,
    don't use stale ip_hdr reference in ip_tunnel code.

    17) Backend state transition handling fixes in xen-netback, from Paul
    Durrant.

    18) Packet offset for AH protocol is handled wrong in flow dissector,
    from Eric Dumazet.

    19) Taking down an fq packet scheduler instance can leave stale packets
    in the queues, fix from Eric Dumazet.

    20) Fix performance regressions introduced by TCP Small Queues. From
    Eric Dumazet.

    21) IPV6 GRE tunneling code calculates max_headroom incorrectly, from
    Hannes Frederic Sowa.

    22) Multicast timer handlers in ipv4 and ipv6 can be the last and final
    reference to the ipv4/ipv6 specific network device state, so use the
    reference put that will check and release the object if the
    reference hits zero. From Salam Noureddine.

    23) Fix memory corruption in ip_tunnel driver, and use skb_push()
    instead of __skb_push() so that similar bugs are less hard to find.
    From Steffen Klassert.

    24) Add forgotten hookup of rtnl_ops in SIT and ip6tnl drivers, from
    Nicolas Dichtel.

    25) fq scheduler doesn't accurately rate limit in certain circumstances,
    from Eric Dumazet.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits)
    pkt_sched: fq: rate limiting improvements
    ip6tnl: allow to use rtnl ops on fb tunnel
    sit: allow to use rtnl ops on fb tunnel
    ip_tunnel: Remove double unregister of the fallback device
    ip_tunnel_core: Change __skb_push back to skb_push
    ip_tunnel: Add fallback tunnels to the hash lists
    ip_tunnel: Fix a memory corruption in ip_tunnel_xmit
    qlcnic: Fix SR-IOV configuration
    ll_temac: Reset dma descriptors indexes on ndo_open
    skbuff: size of hole is wrong in a comment
    ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_put
    ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_put
    ethernet: moxa: fix incorrect placement of __initdata tag
    ipv6: gre: correct calculation of max_headroom
    powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file
    Revert "powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file"
    bonding: Fix broken promiscuity reference counting issue
    tcp: TSQ can use a dynamic limit
    dm9601: fix IFF_ALLMULTI handling
    pkt_sched: fq: qdisc dismantle fixes
    ...

    Linus Torvalds
     
  • FQ rate limiting suffers from two problems, reported
    by Steinar:

    1) FQ enforces a delay when flow quantum is exhausted in order
    to reduce cpu overhead. But if packets are small, current
    delay computation is slightly wrong, and observed rates can
    be too high.

    Steinar had this problem because he disabled TSO and GSO,
    and default FQ quantum is 2*1514.

    (Of course, I hope the recent TSO auto-sizing changes will remove
    the need to disable TSO in the first place.)

    2) maxrate was not used for forwarded flows (skbs not attached
    to a socket)

    Tested:

    tc qdisc add dev eth0 root est 1sec 4sec fq maxrate 8Mbit
    netperf -H lpq84 -l 1000 &
    sleep 10 ; tc -s qdisc show dev eth0
    qdisc fq 8003: root refcnt 32 limit 10000p flow_limit 100p buckets 1024
    quantum 3028 initial_quantum 15140 maxrate 8000Kbit
    Sent 16819357 bytes 11258 pkt (dropped 0, overlimits 0 requeues 0)
    rate 7831Kbit 653pps backlog 7570b 5p requeues 0
    44 flows (43 inactive, 1 throttled), next packet delay 2977352 ns
    0 gc, 0 highprio, 5545 throttled

    lpq83:~# tcpdump -p -i eth0 host lpq84 -c 12
    09:02:52.079484 IP lpq83 > lpq84: . 1389536928:1389538376(1448) ack 3808678021 win 457
    09:02:52.079499 IP lpq83 > lpq84: . 1448:2896(1448) ack 1 win 457
    09:02:52.079906 IP lpq84 > lpq83: . ack 2896 win 16384
    09:02:52.082568 IP lpq83 > lpq84: . 2896:4344(1448) ack 1 win 457
    09:02:52.082581 IP lpq83 > lpq84: . 4344:5792(1448) ack 1 win 457
    09:02:52.083017 IP lpq84 > lpq83: . ack 5792 win 16384
    09:02:52.085678 IP lpq83 > lpq84: . 5792:7240(1448) ack 1 win 457
    09:02:52.085693 IP lpq83 > lpq84: . 7240:8688(1448) ack 1 win 457
    09:02:52.086117 IP lpq84 > lpq83: . ack 8688 win 16384
    09:02:52.088792 IP lpq83 > lpq84: . 8688:10136(1448) ack 1 win 457
    09:02:52.088806 IP lpq83 > lpq84: . 10136:11584(1448) ack 1 win 457
    09:02:52.089217 IP lpq84 > lpq83: . ack 11584 win 16384
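
    As a rough cross-check of the figures above (a sketch, not code from
    the patch): at maxrate 8Mbit, one quantum of 3028 bytes takes about
    3 ms to send, which is close to the gap between packet pairs in the
    tcpdump trace and to the reported "next packet delay". The helper
    name quantum_delay_ns is hypothetical.

    ```c
    #include <stdint.h>

    /* Hypothetical helper, not taken from sch_fq.c: time in nanoseconds
     * needed to send `bytes` at `rate_bps` bits per second. */
    uint64_t quantum_delay_ns(uint64_t bytes, uint64_t rate_bps)
    {
        return bytes * 8ULL * 1000000000ULL / rate_bps;
    }
    ```

    For example, quantum_delay_ns(3028, 8000000) evaluates to 3028000 ns.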

    Reported-by: Steinar H. Gunderson
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • rtnl ops were introduced by c075b13098b3 ("ip6tnl: advertise tunnel param via
    rtnl"), but I forgot to assign rtnl ops to fb tunnels.

    Now that it is done, we must remove the explicit call to
    unregister_netdevice_queue(), because the fallback tunnel is added to the queue
    in ip6_tnl_destroy_tunnels() when checking rtnl_link_ops of all netdevices (this
    is valid since commit 0bd8762824e7 ("ip6tnl: add x-netns support")).

    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     
  • rtnl ops were introduced by ba3e3f50a0e5 ("sit: advertise tunnel param via
    rtnl"), but I forgot to assign rtnl ops to fb tunnels.

    Now that it is done, we must remove the explicit call to
    unregister_netdevice_queue(), because the fallback tunnel is added to the queue
    in sit_destroy_tunnels() when checking rtnl_link_ops of all netdevices (this
    is valid since commit 5e6700b3bf98 ("sit: add support of x-netns")).

    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     
  • When queueing the netdevices for removal, we queue the
    fallback device twice in ip_tunnel_destroy(). The first
    time when we queue all netdevices in the namespace and
    then again explicitly. Fix this by removing the explicit
    queueing of the fallback device.

    Bug was introduced when network namespace support was added
    with commit 6c742e714d8 ("ipip: add x-netns support").

    Cc: Nicolas Dichtel
    Signed-off-by: Steffen Klassert
    Acked-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Steffen Klassert
     
  • Git commit 0e6fbc5b ("ip_tunnels: extend iptunnel_xmit()")
    moved the IP header installation to iptunnel_xmit() and
    changed skb_push() to __skb_push(). This makes possible
    bugs hard to track down, so change it back to skb_push().

    Cc: Pravin Shelar
    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     
  • Currently we can not update the tunnel parameters of
    the fallback tunnels because we don't find them in the
    hash lists. Fix this by adding them on initialization.

    Bug was introduced with commit c544193214
    ("GRE: Refactor GRE tunneling code.")

    Cc: Pravin Shelar
    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     
  • We might extend the used area of a skb beyond the total
    headroom when we install the ipip header. Fix this by
    calling skb_cow_head() unconditionally.

    Bug was introduced with commit c544193214
    ("GRE: Refactor GRE tunneling code.")

    Cc: Pravin Shelar
    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     
  • Pablo Neira Ayuso says:

    ====================
    The following patchset contains Netfilter/IPVS fixes for your net
    tree, they are:

    * Fix BUG_ON splat due to malformed TCP packets seen by synproxy, from
    Patrick McHardy.

    * Fix possible weight overflow in lblc and lblcr schedulers due to
    32-bits arithmetics, from Simon Kirby.

    * Fix possible memory access race in the lblc and lblcr schedulers,
    introduced when it was converted to use RCU, two patches from
    Julian Anastasov.

    * Fix hard dependency on CPU 0 when reading per-cpu stats in the
    rate estimator, from Julian Anastasov.

    * Fix race that may lead to object use after release, when invoking
    ipvsadm -C && ipvsadm -R, introduced when adding RCU, from Julian
    Anastasov.
    ====================
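
    To illustrate the 32-bit overflow mentioned for the lblc and lblcr
    schedulers, here is a minimal sketch. It is not the IPVS code: it
    assumes two destinations are compared by cross-multiplying connection
    counts and weights, and the function names are hypothetical.

    ```c
    #include <stdint.h>

    /* Hypothetical comparison: is destination A more loaded than B,
     * i.e. conns_a / weight_a > conns_b / weight_b, without dividing? */

    /* 32-bit products wrap around once conns * weight exceeds 2^32 - 1. */
    int more_loaded_32(uint32_t conns_a, uint32_t weight_a,
                       uint32_t conns_b, uint32_t weight_b)
    {
        return conns_a * weight_b > conns_b * weight_a;
    }

    /* Widening to 64 bits before multiplying keeps the full product. */
    int more_loaded_64(uint32_t conns_a, uint32_t weight_a,
                       uint32_t conns_b, uint32_t weight_b)
    {
        return (uint64_t)conns_a * weight_b > (uint64_t)conns_b * weight_a;
    }
    ```

    With conns_a = 65536 and weight_b = 65536 the 32-bit product wraps
    to 0, so the 32-bit version can rank a heavily loaded destination as
    idle while the 64-bit version does not.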

    Signed-off-by: David S. Miller

    David S. Miller
     

01 Oct, 2013

8 commits

  • It is possible for the timer handlers to run after the call to
    ipv6_mc_down so use in6_dev_put instead of __in6_dev_put in the
    handler function in order to do proper cleanup when the refcnt
    reaches 0. Otherwise, the refcnt can reach zero without the
    inet6_dev being destroyed and we end up leaking a reference to
    the net_device and see messages like the following,

    unregister_netdevice: waiting for eth0 to become free. Usage count = 1

    Tested on linux-3.4.43.
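
    The difference between the two put helpers can be modelled with a
    toy reference count. This is only a sketch: struct toy_obj and both
    function names are made up, and the real helpers use atomic
    operations.

    ```c
    #include <stdbool.h>

    /* Toy object; `destroyed` stands in for the real teardown. */
    struct toy_obj {
        int refcnt;
        bool destroyed;
    };

    /* Models __in6_dev_put(): drops a reference, never destroys. */
    void toy_put_nocheck(struct toy_obj *o)
    {
        o->refcnt--;
    }

    /* Models in6_dev_put(): drops a reference and destroys the object
     * when the last reference goes away. */
    void toy_put(struct toy_obj *o)
    {
        if (--o->refcnt == 0)
            o->destroyed = true;
    }
    ```

    If the timer handler holds the last reference, toy_put_nocheck()
    leaves an object with refcnt 0 that nobody destroys, which is the
    leak described above.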

    Signed-off-by: Salam Noureddine
    Signed-off-by: David S. Miller

    Salam Noureddine
     
  • It is possible for the timer handlers to run after the call to
    ip_mc_down so use in_dev_put instead of __in_dev_put in the handler
    function in order to do proper cleanup when the refcnt reaches 0.
    Otherwise, the refcnt can reach zero without the in_device being
    destroyed and we end up leaking a reference to the net_device and
    see messages like the following,

    unregister_netdevice: waiting for eth0 to become free. Usage count = 1

    Tested on linux-3.4.43.

    Signed-off-by: Salam Noureddine
    Signed-off-by: David S. Miller

    Salam Noureddine
     
  • gre_hlen already accounts for sizeof(struct ipv6_hdr) + gre header,
    so initialize max_headroom to zero. Otherwise the

        if (encap_limit >= 0) {
                max_headroom += 8;
                mtu -= 8;
        }

    increments an uninitialized variable before max_headroom was reset.

    Found with coverity: 728539

    Cc: Dmitry Kozlov
    Signed-off-by: Hannes Frederic Sowa
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     
  • When TCP Small Queues was added, we used a sysctl to limit the amount of
    packets queued on Qdisc/device queues for a given TCP flow.

    The problem is that this limit is either too big for low rates, or too
    small for high rates.

    Now that the TCP stack has rate estimation in sk->sk_pacing_rate, and TSO
    auto sizing, it can better control the number of packets in Qdisc/device
    queues.

    New limit is two packets or at least 1 to 2 ms worth of packets.
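
    A minimal sketch of such a limit, under stated assumptions: it takes
    the larger of two packets' worth of bytes and the bytes sent in
    roughly 1 ms at the pacing rate (rate >> 10). The function name, the
    parameters and the exact formula are illustrative and may differ from
    the kernel code.

    ```c
    #include <stdint.h>

    /* Hypothetical helper. pkt_bytes is the size of one queued packet,
     * pacing_rate is in bytes per second. */
    uint64_t toy_tsq_limit(uint64_t pkt_bytes, uint64_t pacing_rate)
    {
        uint64_t two_pkts = 2 * pkt_bytes;
        uint64_t one_ms = pacing_rate >> 10; /* about 1/1024 second */

        return two_pkts > one_ms ? two_pkts : one_ms;
    }
    ```

    At low rates the two-packet floor applies; at high rates the
    time-based term dominates and lets more data sit in the queues.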

    Low-rate flows benefit from this patch by having an even smaller
    number of packets in queues, allowing for faster recovery and
    better RTT estimations.

    High-rate flows benefit from this patch by allowing more than 2 packets
    in flight, as we had reports this was a limiting factor to reach line
    rate. [ In particular if TX completion is delayed because of coalescing
    parameters ]

    Example for a single flow on a 10Gbps link controlled by FQ/pacing:

    14 packets in flight instead of 2

    $ tc -s -d qd
    qdisc fq 8001: dev eth0 root refcnt 32 limit 10000p flow_limit 100p
    buckets 1024 quantum 3028 initial_quantum 15140
    Sent 1168459366606 bytes 771822841 pkt (dropped 0, overlimits 0
    requeues 6822476)
    rate 9346Mbit 771713pps backlog 953820b 14p requeues 6822476
    2047 flow, 2046 inactive, 1 throttled, delay 15673 ns
    2372 gc, 0 highprio, 0 retrans, 9739249 throttled, 0 flows_plimit

    Note that sk_pacing_rate is currently set to twice the actual rate, but
    this might be refined in the future when a flow is in congestion
    avoidance.

    Additional change : skb->destructor should be set to tcp_wfree().

    A future patch (for linux 3.13+) might remove tcp_limit_output_bytes

    Signed-off-by: Eric Dumazet
    Cc: Wei Liu
    Cc: Cong Wang
    Cc: Yuchung Cheng
    Cc: Neal Cardwell
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • fq_reset() should drop all packets in queue, including
    throttled flows.

    This patch moves code from fq_destroy() to fq_reset()
    to do the cleaning.

    fq_change() must stop calling fq_dequeue() if all remaining
    packets are from throttled flows.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • In commit 8ed781668dd49 ("flow_keys: include thoff into flow_keys for
    later usage"), we missed that existing code was using nhoff as a
    temporary variable that might not always contain the transport header
    offset.

    This is not a problem for TCP/UDP because port offset (@poff)
    is 0 for these protocols.

    Signed-off-by: Eric Dumazet
    Cc: Daniel Borkmann
    Cc: Nikolay Aleksandrov
    Acked-by: Nikolay Aleksandrov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Consider the scenario where an IPv6 router is advertising a fixed
    preferred_lft of 1800 seconds, while the valid_lft begins at 3600
    seconds and counts down in realtime.

    A client should reset its preferred_lft to 1800 every time the RA is
    received, but a bug is causing Linux to ignore the update.

    The core problem is here:
    if (prefered_lft != ifp->prefered_lft) {

    Note that ifp->prefered_lft is an offset, so it doesn't decrease over
    time. Thus, the comparison is always (1800 != 1800), which fails to
    trigger an update.

    The most direct solution would be to compute a "stored_prefered_lft",
    and use that value in the comparison. But I think that trying to filter
    out unnecessary updates here is a premature optimization. In order for
    the filter to apply, both of these would need to hold:

    - The advertised valid_lft and preferred_lft are both declining in
    real time.
    - No clock skew exists between the router & client.

    So in this patch, I've set "update_lft = 1" unconditionally, which
    allows the surrounding code to be greatly simplified.
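
    The effect of the old comparison can be reproduced with a toy model.
    This is a sketch under simplifying assumptions, not the addrconf
    code: struct toy_addr and both functions are hypothetical, and only
    the preferred lifetime is modelled.

    ```c
    #include <stdint.h>

    /* The stored lifetime is an offset from `tstamp`, so the stored
     * value itself stays constant as time passes. */
    struct toy_addr {
        uint32_t prefered_lft; /* seconds, relative to tstamp */
        uint32_t tstamp;       /* time the lifetime was last set */
    };

    /* Old-style check: refresh only when the advertised value differs
     * from the stored offset. Returns 1 if an update happened. */
    int toy_update_if_changed(struct toy_addr *a, uint32_t advertised,
                              uint32_t now)
    {
        if (advertised != a->prefered_lft) {
            a->prefered_lft = advertised;
            a->tstamp = now;
            return 1;
        }
        return 0; /* tstamp is not refreshed */
    }

    /* Seconds of preferred lifetime left at time `now`. */
    uint32_t toy_remaining(const struct toy_addr *a, uint32_t now)
    {
        uint32_t age = now - a->tstamp;

        return age >= a->prefered_lft ? 0 : a->prefered_lft - age;
    }
    ```

    With a stored offset of 1800 set at time 0, an RA advertising 1800
    at time 900 is ignored and only 900 seconds remain, even though the
    router has just advertised 1800.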

    Signed-off-by: Paul Marks
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Paul Marks
     
  • While sending a packet, skb_cow_head() can change the skb header, which
    invalidates the inner_iph pointer into that header. This patch
    avoids using the stale pointer. Found by code inspection.

    This bug was introduced by commit 0e6fbc5b6c6218 (ip_tunnels: extend
    iptunnel_xmit()).
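
    The hazard is the usual one with any buffer that may be reallocated.
    A userspace sketch, assuming a toy buffer in place of an skb (all
    names here are hypothetical):

    ```c
    #include <stdlib.h>
    #include <string.h>

    /* Toy buffer standing in for an skb; not a kernel structure. */
    struct toy_buf {
        unsigned char *data;
        size_t len;
    };

    /* Stand-in for skb_cow_head(): moves the bytes to a new, larger
     * allocation, so every old pointer into b->data becomes invalid. */
    int toy_grow(struct toy_buf *b, size_t extra)
    {
        unsigned char *p = calloc(b->len + extra, 1);

        if (!p)
            return -1;
        memcpy(p, b->data, b->len);
        free(b->data);
        b->data = p;
        b->len += extra;
        return 0;
    }

    /* Safe pattern: copy the needed header field by value before the
     * buffer can move, instead of keeping a pointer across the call. */
    int toy_read_tos_then_grow(struct toy_buf *b, unsigned char *tos)
    {
        *tos = b->data[1];
        return toy_grow(b, 20);
    }
    ```

    Reading the field through a pointer saved before toy_grow() would
    touch freed memory; re-deriving the pointer from b->data after the
    call is the other safe option.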

    Signed-off-by: Pravin B Shelar
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Pravin B Shelar
     

30 Sep, 2013

1 commit

  • TCP packets hitting the SYN proxy through the SYNPROXY target are not
    validated by TCP conntrack. When th->doff is below 5, an underflow happens
    when calculating the options length, causing skb_header_pointer() to
    return NULL and triggering the BUG_ON().

    Handle this case gracefully by checking for NULL instead of using BUG_ON().
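
    The underflow itself is easy to reproduce in isolation. The sketch
    below is illustrative only: the patch checks the result of
    skb_header_pointer() for NULL, whereas this example uses an explicit
    doff check as a different guard. Names and the constant are
    hypothetical.

    ```c
    #define TOY_TCPHDR_LEN 20u /* TCP header size without options */

    /* What the unchecked unsigned subtraction yields; doff is the TCP
     * data offset in 32-bit words. */
    unsigned int toy_optlen_unchecked(unsigned int doff)
    {
        return doff * 4 - TOY_TCPHDR_LEN;
    }

    /* Returns the options length in bytes, or -1 when doff is too
     * small to cover the fixed header. */
    int toy_optlen(unsigned int doff)
    {
        if (doff * 4 < TOY_TCPHDR_LEN)
            return -1;
        return (int)(doff * 4 - TOY_TCPHDR_LEN);
    }
    ```

    For doff = 4 the unchecked version wraps to a value close to
    UINT_MAX, which is the kind of length that makes the header lookup
    fail.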

    Reported-by: Martin Topholm
    Tested-by: Martin Topholm
    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
     

29 Sep, 2013

3 commits

  • A host might need net_secret[] and never open a single socket.

    Problem added in commit aebda156a570782
    ("net: defer net_secret[] initialization")

    Based on prior patch from Hannes Frederic Sowa.

    Reported-by: Hannes Frederic Sowa
    Signed-off-by: Eric Dumazet
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • There is currently no serialization between network namespaces exiting
    and network devices exiting, as the final part of netdev_run_todo does
    not happen under the rtnl_lock. This is compounded by the fact that the
    only list of devices unregistering in netdev_run_todo is local to
    netdev_run_todo.

    This lack of serialization in extreme cases results in network devices
    unregistering in netdev_run_todo after the loopback device of their
    network namespace has been freed (making dst_ifdown unsafe), and after
    their network namespace has exited (making the NETDEV_UNREGISTER,
    and NETDEV_UNREGISTER_FINAL callbacks unsafe).

    Add the missing serialization by a per network namespace count of how
    many network devices are unregistering and having a wait queue that is
    woken up whenever the count is decreased. The count and wait queue
    allow default_device_exit_batch to wait until all of the unregistration
    activity for a network namespace has finished before proceeding to
    unregister the loopback device and then allowing the network namespace
    to exit.

    Only a single global wait queue is used because there is a single global
    lock and a single waiter; per network namespace wait queues
    would be a waste of resources.

    The per network namespace count of unregistering devices gives a
    progress guarantee because the number of network devices unregistering
    in an exiting network namespace must ultimately drop to zero (assuming
    network device unregistration completes).

    The basic logic remains the same as in v1. This patch is now half
    comment and half rtnl_lock_unregistering, an expanded version of
    wait_event that performs no extra work in the common case where no
    network devices are unregistering when we get to
    default_device_exit_batch.

    Reported-by: Francesco Ruggeri
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • When a router is doing DNAT for 6to4/6rd packets the latest
    anti-spoofing commit 218774dc ("ipv6: add anti-spoofing checks for
    6to4 and 6rd") will drop them because the IPv6 address embedded does
    not match the IPv4 destination. This patch will allow them to pass by
    testing if we have an address that matches on 6to4/6rd interface. I
    have been hit by this problem using Fedora and IPV6TO4_IPV4ADDR.
    Also, log the dropped packets (with rate limit).

    Signed-off-by: Catalin(ux) M. BOIE
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Catalin(ux) M. BOIE
     

28 Sep, 2013

1 commit


27 Sep, 2013

1 commit


24 Sep, 2013

5 commits

  • In the following scenario the socket is corked:
    If the first UDP packet is larger than the mtu we try to append it to the
    write queue via ip6_ufo_append_data. A following packet, which is smaller
    than the mtu would be appended to the already queued up gso-skb via
    plain ip6_append_data. This causes random memory corruptions.

    In ip6_ufo_append_data we also have to be careful to not queue up the
    same skb multiple times. So setup the gso frame only when no first skb
    is available.

    This also fixes a shortcoming where we add the current packet's length to
    cork->length but return early because of a packet > mtu with dontfrag set
    (instead of subtracting it again).

    Found with trinity.

    Cc: YOSHIFUJI Hideaki
    Signed-off-by: Hannes Frederic Sowa
    Reported-by: Dmitry Vyukov
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     
  • Redirect isn't an error condition, it should leave
    the error handler without touching the socket.

    Signed-off-by: Duan Jiong
    Signed-off-by: David S. Miller

    Duan Jiong
     
  • Redirect isn't an error condition, it should leave
    the error handler without touching the socket.

    Signed-off-by: Duan Jiong
    Signed-off-by: David S. Miller

    Duan Jiong
     
  • MRP doesn't implement the periodictimer in 802.1Q, so it never retries
    if packets get lost. I ran into this problem when MRP sent a MVRP
    JoinIn before the interface was fully up. The JoinIn was lost, MRP
    didn't retry, and MVRP registration failed.

    Tested against Juniper QFabric switches

    Signed-off-by: Noel Burton-Krahn
    Acked-by: David Ward
    Signed-off-by: David S. Miller

    Noel Burton-Krahn
     
  • Actually re-send packets when the T1 timer runs out. This fixes a bug
    where packets are waiting on the write queue until disconnection when
    no other traffic is outstanding.

    Signed-off-by: Josselin Costanzi
    Signed-off-by: Maxime Jayat
    Signed-off-by: David S. Miller

    josselin.costanzi@mobile-devices.fr
     

22 Sep, 2013

1 commit


21 Sep, 2013

1 commit

  • When the dlc is closed, rfcomm_dev_state_change() tries to release the
    port in the case it cannot get a reference to the tty. However this is
    racy and not even needed.

    In fact, as Peter Hurley points out:

    1. Only consider dlcs that are 'stolen' from a connected socket, ie.
    reused. Allocated dlcs cannot have been closed prior to port
    activate and so for these dlcs a tty reference will always be avail
    in rfcomm_dev_state_change() -- except for the conditions covered by
    #2b below.
    2. If a tty was at some point previously created for this rfcomm, then
    either
    (a) the tty reference is still avail, so rfcomm_dev_state_change()
    will perform a hangup. So nothing to do, or,
    (b) the tty reference is no longer avail, and the tty_port will be
    destroyed by the last tty_port_put() in rfcomm_tty_cleanup.
    Again, no action required.
    3. Prior to obtaining the dlc lock in rfcomm_dev_add(),
    rfcomm_dev_state_change() will not 'see' a rfcomm_dev so nothing to
    do here.
    4. After releasing the dlc lock in rfcomm_dev_add(),
    rfcomm_dev_state_change() will 'see' an incomplete rfcomm_dev if a
    tty reference could not be obtained. Again, the best thing to do here
    is nothing. Any future attempted open() will block on
    rfcomm_dev_carrier_raised(). The unconnected device will exist until
    released by ioctl(RFCOMMRELEASEDEV).

    The patch removes the aforementioned code and uses the
    tty_port_tty_hangup() helper to hangup the tty.

    Signed-off-by: Gianluca Anzolin
    Reviewed-by: Peter Hurley
    Signed-off-by: Gustavo Padovan

    Gianluca Anzolin
     

20 Sep, 2013

5 commits

  • Pull networking fixes from David Miller:

    1) If the local_df boolean is set on an SKB we have to allocate a
    unique ID even if IP_DF is set in the ipv4 headers, from Ansis
    Atteka.

    2) Some fixups for the new chipset support that went into the sfc
    driver, from Ben Hutchings.

    3) Because SCTP bypasses a good chunk of, and actually duplicates, the
    logic of the ipv6 output path, some IPSEC things don't get done
    properly. Integrate SCTP better into the ipv6 output path so that
    these problems are fixed and such issues don't get missed in the
    future either. From Daniel Borkmann.

    4) Fix skge regressions added by the DMA mapping error return checking
    added in v3.10, from Mikulas Patocka.

    5) Kill some more IRQF_DISABLED references, from Michael Opdenacker.

    6) Fix races and deadlocks in the bridging code, from Hong Zhiguo.

    7) Fix error handling in tun_set_iff(), in particular don't leak
    resources. From Jason Wang.

    8) Prevent format-string injection into xen-netback driver, from Kees
    Cook.

    9) Fix regression added to netpoll ARP packet handling, in particular
    check for the right ETH_P_ARP protocol code. From Sonic Zhang.

    10) Try to deal with AMD IOMMU errors when using r8169 chips, from
    Francois Romieu.

    11) Cure freezes due to recent changes in the rt2x00 wireless driver,
    from Stanislaw Gruszka.

    12) Don't do SPI transfers (which can sleep) in interrupt context in
    cw1200 driver, from Solomon Peachy.

    13) Fix LEDs handling bug in 5720 tg3 chips already handled for 5719.
    From Nithin Sujir.

    14) Make xen_netbk_count_skb_slots() count the actual number of slots
    that will be used, taking into consideration packing and other
    issues that the transmit path will run into. From David Vrabel.

    15) Use the correct maximum age when calculating the bridge
    message_age_timer, from Chris Healy.

    16) Get rid of memory leaks in mcs7780 IRDA driver, from Alexey
    Khoroshilov.

    17) Netfilter conntrack extensions were converted to RCU but are not
    always freed properly using kfree_rcu(). Fix from Michal Kubecek.

    18) VF reset recovery not being done correctly in qlcnic driver, from
    Manish Chopra.

    19) Fix inverted test in ATM nicstar driver, from Andy Shevchenko.

    20) Missing workqueue destroy in cxgb4 error handling, from Wei Yang.

    21) Internal switch not initialized properly in bgmac driver, from Rafał
    Miłecki.

    22) Netlink messages report wrong local and remote addresses in IPv6
    tunneling, from Ding Zhi.

    23) ICMP redirects should not generate socket errors in DCCP and SCTP.
    We're still working out how this should be handled for RAW and UDP
    sockets. From Daniel Borkmann and Duan Jiong.

    24) We've had several bugs wherein the network namespace's loopback
    device gets accessed after it is free'd, NULL it out so that we can
    catch these problems more readily. From Eric W Biederman.

    25) Fix regression in TCP RTO calculations, from Neal Cardwell.

    26) Fix too early free of xen-netback network device when VIFs still
    exist. From Paul Durrant.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
    netconsole: fix a deadlock with rtnl and netconsole's mutex
    netpoll: fix NULL pointer dereference in netpoll_cleanup
    skge: fix broken driver
    ip: generate unique IP identificator if local fragmentation is allowed
    ip: use ip_hdr() in __ip_make_skb() to retrieve IP header
    xen-netback: Don't destroy the netdev until the vif is shut down
    net:dccp: do not report ICMP redirects to user space
    cnic: Fix crash in cnic_bnx2x_service_kcq()
    bnx2x, cnic, bnx2i, bnx2fc: Fix bnx2i and bnx2fc regressions.
    vxlan: Avoid creating fdb entry with NULL destination
    tcp: fix RTO calculated from cached RTT
    drivers: net: phy: cicada.c: clears warning Use #include instead of
    net loopback: Set loopback_dev to NULL when freed
    batman-adv: set the TAG flag for the vid passed to BLA
    netfilter: nfnetlink_queue: use network skb for sequence adjustment
    net: sctp: rfc4443: do not report ICMP redirects to user space
    net: usb: cdc_ether: use usb.h macros whenever possible
    net: usb: cdc_ether: fix checkpatch errors and warnings
    net: usb: cdc_ether: Use wwan interface for Telit modules
    ip6_tunnels: raddr and laddr are inverted in nl msg
    ...

    Linus Torvalds
     
  • I've been hitting a NULL ptr deref while using netconsole because the
    np->dev check and the pointer manipulation in netpoll_cleanup are done
    without rtnl and the following sequence happens when having a netconsole
    over a vlan and we remove the vlan while disabling the netconsole:

    CPU 1                                   CPU 2
                                            removes vlan and calls the notifier
    enters store_enabled(), calls
    netpoll_cleanup which checks np->dev
    and then waits for rtnl
                                            executes the netconsole netdev
                                            release notifier making np->dev
                                            == NULL and releases rtnl
    continues to dereference a member of
    np->dev which at this point is == NULL

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • If local fragmentation is allowed, then ip_select_ident() and
    ip_select_ident_more() need to generate unique IDs to ensure
    correct defragmentation on the peer.

    For example, if IPsec (tunnel mode) has to encrypt large skbs
    that have local_df bit set, then all IP fragments that belonged
    to different ESP datagrams would have used the same identifier.
    If one of these IP fragments were lost or reordered, then the
    peer could possibly stitch together wrong IP fragments that did
    not belong to the same datagram. This would lead to packet loss
    or data corruption.

    Signed-off-by: Ansis Atteka
    Signed-off-by: David S. Miller

    Ansis Atteka
     
  • skb->data already points to IP header, but for the sake of
    consistency we can also use ip_hdr() to retrieve it.

    Signed-off-by: Ansis Atteka
    Signed-off-by: David S. Miller

    Ansis Atteka
     
  • Pull ceph fixes from Sage Weil:
    "These fix several bugs with RBD from 3.11 that didn't get tested in
    time for the merge window: some error handling, a use-after-free, and
    a sequencing issue when unmapping an image races with a notify
    operation.

    There is also a patch fixing a problem with the new ceph + fscache
    code that just went in"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    fscache: check consistency does not decrement refcount
    rbd: fix error handling from rbd_snap_name()
    rbd: ignore unmapped snapshots that no longer exist
    rbd: fix use-after free of rbd_dev->disk
    rbd: make rbd_obj_notify_ack() synchronous
    rbd: complete notifies before cleaning up osd_client and rbd_dev
    libceph: add function to ensure notifies are complete

    Linus Torvalds
     

19 Sep, 2013

2 commits

  • When reading percpu stats we need to properly reset
    the sum when CPU 0 is not present in the possible mask.
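
    A sketch of the pattern, assuming the sum is an output structure
    that may hold stale values from a previous read. The names and the
    fixed CPU count are hypothetical, not the IPVS code.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    #define TOY_NR_CPUS 4

    struct toy_stats {
        uint64_t pkts;
    };

    /* Problematic pattern: *sum is reset only when CPU index 0 is
     * visited, so stale contents survive if CPU 0 is not possible. */
    void toy_sum_reset_on_cpu0(struct toy_stats *sum,
                               const uint64_t counters[TOY_NR_CPUS],
                               const bool possible[TOY_NR_CPUS])
    {
        for (int i = 0; i < TOY_NR_CPUS; i++) {
            if (!possible[i])
                continue;
            if (i)
                sum->pkts += counters[i];
            else
                sum->pkts = counters[i];
        }
    }

    /* Fixed pattern: reset on the first CPU actually visited. */
    void toy_sum_reset_on_first(struct toy_stats *sum,
                                const uint64_t counters[TOY_NR_CPUS],
                                const bool possible[TOY_NR_CPUS])
    {
        bool add = false;

        for (int i = 0; i < TOY_NR_CPUS; i++) {
            if (!possible[i])
                continue;
            if (add) {
                sum->pkts += counters[i];
            } else {
                sum->pkts = counters[i];
                add = true;
            }
        }
    }
    ```

    With only CPU 1 possible and a stale sum of 100, the first version
    adds on top of the stale value while the second starts from the
    first visited counter.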

    Signed-off-by: Julian Anastasov
    Signed-off-by: Simon Horman

    Julian Anastasov
     
  • commit c5549571f975ab ("ipvs: convert lblcr scheduler to rcu")
    allows RCU readers to use dest after calling ip_vs_dest_put().
    In the corner case it can race with ip_vs_dest_trash_expire()
    which can release the dest while it is being returned to the
    RCU readers as scheduling result.

    To fix the problem do not allow e->dest to be replaced and
    defer the ip_vs_dest_put() call by using RCU callback. Now
    e->dest does not need to be RCU pointer.

    Signed-off-by: Julian Anastasov
    Signed-off-by: Simon Horman

    Julian Anastasov