13 Jun, 2015

1 commit

  • Currently, we can ask to authenticate DATA chunks and we can send DATA
    chunks on the same packet as COOKIE_ECHO, but if you try to combine
    both, the DATA chunk will be sent unauthenticated and peer won't accept
    it, leading to a communication failure.

    This happens because even though the data was queued after it was
    requested to authenticate DATA chunks, it was also queued before we
    could know that remote peer can handle authenticating, so
    sctp_auth_send_cid() returns false.

    The fix is to re-check the send queue, whenever we set up an active key,
    for chunks that should now be authenticated. As a result, such a packet
    will now contain COOKIE_ECHO + AUTH + DATA chunks, in that order.

    Reported-by: Liu Wei
    Signed-off-by: Marcelo Ricardo Leitner
    Acked-by: Neil Horman
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

12 Jun, 2015

2 commits

  • We saw excessive direct memory compaction triggered by skb_page_frag_refill.
    This causes performance issues and adds latency. Commit 5640f7685831e0
    introduced the order-3 allocation. According to its changelog, the order-3
    allocation isn't a must-have; it is only there to improve performance. But
    direct memory compaction has high overhead, and the benefit of the order-3
    allocation can't compensate for it.

    This patch makes the order-3 page allocation atomic. If there is no memory
    pressure and memory isn't fragmented, the allocation will still succeed, so we
    don't sacrifice the order-3 benefit here. If the atomic allocation fails,
    direct memory compaction will not be triggered and skb_page_frag_refill will
    fall back to order-0 immediately, hence the direct memory compaction overhead is
    avoided. In the allocation failure case, kswapd is woken up to do
    compaction, so chances are the allocation will succeed next time.

    alloc_skb_with_frags gets the same treatment.

    The mellanox driver does a similar thing; if this is accepted, we must fix
    the driver too.

    V3: fix the same issue in alloc_skb_with_frags as pointed out by Eric
    V2: make the changelog clearer

    Cc: Eric Dumazet
    Cc: Chris Mason
    Cc: Debabrata Banerjee
    Signed-off-by: Shaohua Li
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Shaohua Li
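
The fallback described in the skb_page_frag_refill entry above can be sketched in userspace C. This is only a model of the idea, not the kernel code: `page_alloc_fn`, `frag_refill` and `fragmented_alloc` are hypothetical names, and the `may_block` flag stands in for whether the allocator may reclaim or compact memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define HIGH_ORDER 3   /* order-3 = 8 contiguous pages */

/* Hypothetical allocator hook: returns NULL on failure. may_block says
 * whether the allocator is allowed to reclaim or compact memory. */
typedef void *(*page_alloc_fn)(unsigned int order, bool may_block);

struct frag {
    void *page;
    unsigned int order;
};

/* Try the large allocation opportunistically (never blocking), then fall
 * back to a single page, which is the only attempt allowed to block. */
static bool frag_refill(struct frag *f, page_alloc_fn alloc)
{
    f->page = alloc(HIGH_ORDER, false);
    if (f->page) {
        f->order = HIGH_ORDER;
        return true;
    }
    f->page = alloc(0, true);
    f->order = 0;
    return f->page != NULL;
}

/* Demo allocator simulating fragmented memory: every high-order request
 * fails, and blocking high-order requests are counted. */
static int blocking_high_order_calls;

static void *fragmented_alloc(unsigned int order, bool may_block)
{
    if (order > 0) {
        if (may_block)
            blocking_high_order_calls++;
        return NULL;
    }
    return malloc(4096);
}
```

With the demo allocator, the refill still succeeds at order 0 and the expensive blocking path is never taken for the high-order attempt.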
     
  • If a device is renamed and the original name is subsequently reused
    for a new device, the following warning is generated:

    sysctl duplicate entry: /net/mpls/conf/veth0//input
    CPU: 3 PID: 1379 Comm: ip Not tainted 4.1.0-rc4+ #20
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
    0000000000000000 0000000000000000 ffffffff81566aaf 0000000000000000
    ffffffff81236279 ffff88002f7d7f00 0000000000000000 ffff88000db336d8
    ffff88000db33698 0000000000000005 ffff88002e046000 ffff8800168c9280
    Call Trace:
    [] ? dump_stack+0x40/0x50
    [] ? __register_sysctl_table+0x289/0x5a0
    [] ? mpls_dev_notify+0x1ff/0x300 [mpls_router]
    [] ? notifier_call_chain+0x4f/0x70
    [] ? register_netdevice+0x2b2/0x480
    [] ? veth_newlink+0x178/0x2d3 [veth]
    [] ? rtnl_newlink+0x73c/0x8e0
    [] ? rtnl_newlink+0x16a/0x8e0
    [] ? __kmalloc_reserve.isra.30+0x32/0x90
    [] ? rtnetlink_rcv_msg+0x8d/0x250
    [] ? __alloc_skb+0x47/0x1f0
    [] ? __netlink_lookup+0xab/0xe0
    [] ? rtnetlink_rcv+0x30/0x30
    [] ? netlink_rcv_skb+0xb0/0xd0
    [] ? rtnetlink_rcv+0x24/0x30
    [] ? netlink_unicast+0x107/0x1a0
    [] ? netlink_sendmsg+0x50e/0x630
    [] ? sock_sendmsg+0x3c/0x50
    [] ? ___sys_sendmsg+0x27b/0x290
    [] ? mem_cgroup_try_charge+0x88/0x110
    [] ? mem_cgroup_commit_charge+0x56/0xa0
    [] ? do_filp_open+0x30/0xa0
    [] ? __sys_sendmsg+0x3e/0x80
    [] ? system_call_fastpath+0x16/0x75

    Fix this by unregistering the previous sysctl table (registered for
    the path containing the original device name) and re-registering the
    table for the path containing the new device name.

    Fixes: 37bde79979c3 ("mpls: Per-device enabling of packet input")
    Reported-by: Scott Feldman
    Signed-off-by: Robert Shearman
    Signed-off-by: David S. Miller

    Robert Shearman
     

11 Jun, 2015

4 commits

  • Jeff Layton reported the following:

    [ 74.232485] ------------[ cut here ]------------
    [ 74.233354] WARNING: CPU: 2 PID: 754 at net/core/sock.c:364 sk_clear_memalloc+0x51/0x80()
    [ 74.234790] Modules linked in: cts rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache xfs libcrc32c snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device nfsd snd_pcm snd_timer snd e1000 ppdev parport_pc joydev parport pvpanic soundcore floppy serio_raw i2c_piix4 pcspkr nfs_acl lockd virtio_balloon acpi_cpufreq auth_rpcgss grace sunrpc qxl drm_kms_helper ttm drm virtio_console virtio_blk virtio_pci ata_generic virtio_ring pata_acpi virtio
    [ 74.243599] CPU: 2 PID: 754 Comm: swapoff Not tainted 4.1.0-rc6+ #5
    [ 74.244635] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    [ 74.245546] 0000000000000000 0000000079e69e31 ffff8800d066bde8 ffffffff8179263d
    [ 74.246786] 0000000000000000 0000000000000000 ffff8800d066be28 ffffffff8109e6fa
    [ 74.248175] 0000000000000000 ffff880118d48000 ffff8800d58f5c08 ffff880036e380a8
    [ 74.249483] Call Trace:
    [ 74.249872] [] dump_stack+0x45/0x57
    [ 74.250703] [] warn_slowpath_common+0x8a/0xc0
    [ 74.251655] [] warn_slowpath_null+0x1a/0x20
    [ 74.252585] [] sk_clear_memalloc+0x51/0x80
    [ 74.253519] [] xs_disable_swap+0x42/0x80 [sunrpc]
    [ 74.254537] [] rpc_clnt_swap_deactivate+0x7e/0xc0 [sunrpc]
    [ 74.255610] [] nfs_swap_deactivate+0x27/0x30 [nfs]
    [ 74.256582] [] destroy_swap_extents+0x74/0x80
    [ 74.257496] [] SyS_swapoff+0x222/0x5c0
    [ 74.258318] [] ? syscall_trace_leave+0xc7/0x140
    [ 74.259253] [] system_call_fastpath+0x12/0x71
    [ 74.260158] ---[ end trace 2530722966429f10 ]---

    The warning in question was unnecessary but with Jeff's series the rules
    are also clearer. This patch removes the warning and updates the comment
    to explain why sk_mem_reclaim() may still be called.

    [jlayton: remove if (sk->sk_forward_alloc) conditional, as Leon
    points out that it's not needed.]

    Cc: Leon Romanovsky
    Signed-off-by: Mel Gorman
    Signed-off-by: Jeff Layton
    Signed-off-by: David S. Miller

    Mel Gorman
     
  • Since the addition of sysfs multicast router support, if one sets
    multicast_router to "2" more than once, the port is added to the hlist
    every time and can end up linking to itself, causing an endless loop
    for rlist walkers.
    To reproduce, just do:
    echo 2 > multicast_router; echo 2 > multicast_router;
    on a bridge port and let some IGMP traffic flow; for me it hangs
    in br_multicast_flood().
    Fix this by adding a check in br_multicast_add_router() for whether the
    port is already linked.
    The reason this didn't happen before the addition of the multicast_router
    sysfs entries is that there's a !hlist_unhashed check that prevents
    it.

    Signed-off-by: Nikolay Aleksandrov
    Fixes: 0909e11758bd ("bridge: Add multicast_router sysfs entries")
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
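
To illustrate why a double insert corrupts the list in the multicast_router entry above, here is a small userspace model of an hlist-style list. It is a sketch under assumptions: the types and helpers (`hnode`, `hhead`, `add_head`, `add_router_port`, `count_nodes`) are hypothetical stand-ins, and the real br_multicast_add_router() does more than a plain head insert.

```c
#include <stddef.h>

struct hnode {
    struct hnode *next;
    struct hnode **pprev;   /* NULL while the node is not on any list */
};

struct hhead {
    struct hnode *first;
};

static int node_unhashed(const struct hnode *n)
{
    return n->pprev == NULL;
}

/* Plain head insert with no protection against re-adding a linked node. */
static void add_head(struct hnode *n, struct hhead *h)
{
    n->next = h->first;
    if (h->first)
        h->first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Guarded insert: a node that is already linked is left alone. */
static void add_router_port(struct hnode *n, struct hhead *h)
{
    if (!node_unhashed(n))
        return;
    add_head(n, h);
}

/* Walk at most `limit` nodes so that a cyclic list still terminates. */
static int count_nodes(const struct hhead *h, int limit)
{
    int count = 0;

    for (const struct hnode *n = h->first; n && count < limit; n = n->next)
        count++;
    return count;
}
```

Adding the same node twice through `add_head` makes its `next` pointer refer to itself, so an unbounded walker never reaches the end; the guarded variant keeps the list at one node.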
     
  • If the TIPC connection timer expires in a probing state, a
    self abort message is supposed to be generated and delivered
    to the local socket. This is currently broken, and the abort
    message is actually sent out to the peer node with invalid
    addressing information. This will cause the link to enter
    a constant retransmission state and eventually reset.
    We fix this by removing the self-abort message creation and
    tearing down the connection immediately instead.

    Signed-off-by: Erik Hugne
    Reviewed-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Erik Hugne
     
  • This reverts commit 0243508edd317ff1fa63b495643a7c192fbfcd92.

    It introduces new regressions.

    Signed-off-by: David S. Miller

    David S. Miller
     

10 Jun, 2015

1 commit

  • Until recently, mac80211 overwrote all the statistics it could
    provide when getting called, but it now relies on the struct
    having been zeroed by the caller. This was always the case in
    nl80211, but wext used a static struct which could even cause
    values from one device to leak to another.

    Using a static struct is OK (as even documented in a comment)
    since the whole usage of this function and its return value is
    always locked under RTNL. Not clearing the struct for calling
    the driver has always been wrong though, since drivers were
    free to only fill values they could report, so calling this
    for one device and then for another would always have leaked
    values from one to the other.

    Fix this by initializing the structure in question before the
    driver method call.

    This fixes https://bugzilla.kernel.org/show_bug.cgi?id=99691

    Cc: stable@vger.kernel.org
    Reported-by: Gerrit Renker
    Reported-by: Alexander Kaltsas
    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
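
The leak described in the wext entry above, and the effect of clearing the shared structure before each driver call, can be modelled in a few lines of userspace C. The names here (`link_stats`, `query_stats`, the two demo drivers) are hypothetical and only illustrate the pattern.

```c
#include <string.h>

struct link_stats {
    int signal;   /* every driver in this sketch reports it */
    int noise;    /* optional: only some drivers report it */
};

typedef void (*get_stats_fn)(struct link_stats *out);

/* A driver that fills in everything. */
static void full_driver(struct link_stats *out)
{
    out->signal = -40;
    out->noise = -95;
}

/* A driver that only fills in what it can report. */
static void partial_driver(struct link_stats *out)
{
    out->signal = -60;
}

/* The static struct is shared across devices, so it must be cleared
 * before every driver call or stale fields survive from the last one. */
static const struct link_stats *query_stats(get_stats_fn driver)
{
    static struct link_stats shared;

    memset(&shared, 0, sizeof(shared));
    driver(&shared);
    return &shared;
}
```

Without the `memset`, querying `full_driver` and then `partial_driver` would report the first device's noise value for the second device.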
     

09 Jun, 2015

3 commits

  • Commit 70008aa50e92 ("skbuff: convert to skb_orphan_frags") replaced
    open coded tests of SKBTX_DEV_ZEROCOPY and skb_copy_ubufs with calls
    to helper function skb_orphan_frags. Apply that to the last remaining
    open coded site.

    Signed-off-by: Willem de Bruijn
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Willem de Bruijn
     
  • UDP encapsulation is broken on IPv6. This is because the logic to resubmit
    the nexthdr is inverted, checking for a ret value > 0 instead of < 0. Also,
    the resubmit label is in the wrong position since we already get the
    nexthdr value when performing decapsulation. In addition the skb pull is no
    longer necessary either.

    This changes the return value check to look for < 0, using it for the
    nexthdr on the next iteration, and moves the resubmit label to the proper
    location.

    With these changes the v6 code now matches what we do in the v4 ip input
    code wrt resubmitting when decapsulating.

    Signed-off-by: Josh Hunt
    Acked-by: "Tom Herbert"
    Signed-off-by: David S. Miller

    Josh Hunt
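
The resubmit convention described in the IPv6 UDP encapsulation entry above can be sketched as a dispatch loop. This is a simplified userspace model, not the ip6 input code: the handler table, the protocol numbers chosen for the demo, and the `deliver` function are assumptions made for illustration.

```c
#define PROTO_UDP 17
#define PROTO_GRE 47    /* any inner protocol would do for this sketch */
#define MAX_PROTO 256

/* Handler convention assumed here: 0 means the packet was consumed, a
 * negative value -N asks the caller to resubmit it to protocol N. */
typedef int (*proto_handler)(void);

static int deliver(proto_handler handlers[MAX_PROTO], int nexthdr, int *hops)
{
    for (;;) {
        proto_handler h = handlers[nexthdr];

        if (!h)
            return -1;          /* no handler registered: drop */
        (*hops)++;

        int ret = h();
        if (ret < 0) {          /* decapsulated: use -ret as next header */
            nexthdr = -ret;
            continue;
        }
        return 0;
    }
}

/* Demo handlers: the outer one decapsulates, the inner one consumes. */
static int udp_encap_handler(void) { return -PROTO_GRE; }
static int inner_handler(void) { return 0; }
```

Checking for `ret > 0` instead of `ret < 0` in this loop would never follow the resubmit request, which is the inversion the entry describes.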
     
  • The memory pointed to by idev->stats.icmpv6msgdev,
    idev->stats.icmpv6dev and idev->stats.ipv6 can each be used in an RCU
    read context without taking a reference on idev. For example, through
    IP6_*_STATS_* calls in ip6_rcv. These memory blocks are freed without
    waiting for an RCU grace period to elapse. This could lead to the
    memory being written to after it has been freed.

    Fix this by using call_rcu to free the memory used for stats, as well
    as idev after an RCU grace period has elapsed.

    Signed-off-by: Robert Shearman
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Robert Shearman
     

08 Jun, 2015

4 commits

  • br_fdb_update() can be called in process context in the following way:
    br_fdb_add() -> __br_fdb_add() -> br_fdb_update() (if NTF_USE flag is set)
    so we need to disable softirqs because there are softirq users of the
    hash_lock. One easy way to reproduce this is to modify the bridge utility
    to set NTF_USE, enable stp and then set maxageing to a low value so
    br_fdb_cleanup() is called frequently and then just add new entries in
    a loop. This happens because br_fdb_cleanup() is called from timer/softirq
    context. The spin locks in br_fdb_update were _bh before commit f8ae737deea1
    ("[BRIDGE]: forwarding remove unneeded preempt and bh diasables")
    and at the time that commit was correct because br_fdb_update() couldn't be
    called from process context, but that changed after commit:
    292d1398983f ("bridge: add NTF_USE support")
    Using local_bh_disable/enable around br_fdb_update() allows us to keep
    using the spin_lock/unlock in br_fdb_update for the fast-path.

    Signed-off-by: Nikolay Aleksandrov
    Fixes: 292d1398983f ("bridge: add NTF_USE support")
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • This reverts commit 1d7c49037b12016e7056b9f2c990380e2187e766.

    Nikolay Aleksandrov has a better version of this fix.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The mpls device is used in an RCU read context without a lock being
    held. As the memory is freed without waiting for the RCU grace period
    to elapse, the freed memory could still be in use.

    Address this by using kfree_rcu to free the memory for the mpls device
    after the RCU grace period has elapsed.

    Fixes: 03c57747a702 ("mpls: Per-device MPLS state")
    Signed-off-by: Robert Shearman
    Acked-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Robert Shearman
     
  • br_fdb_update() can be called in process context in the following way:
    br_fdb_add() -> __br_fdb_add() -> br_fdb_update() (if NTF_USE flag is set)
    so we need to use spin_lock_bh because there are softirq users of the
    hash_lock. One easy way to reproduce this is to modify the bridge utility
    to set NTF_USE, enable stp and then set maxageing to a low value so
    br_fdb_cleanup() is called frequently and then just add new entries in
    a loop. This happens because br_fdb_cleanup() is called from timer/softirq
    context. These locks were _bh before commit f8ae737deea1
    ("[BRIDGE]: forwarding remove unneeded preempt and bh diasables")
    and at the time that commit was correct because br_fdb_update() couldn't be
    called from process context, but that changed after commit:
    292d1398983f ("bridge: add NTF_USE support")

    Signed-off-by: Wilson Kok
    Signed-off-by: Nikolay Aleksandrov
    Fixes: 292d1398983f ("bridge: add NTF_USE support")
    Signed-off-by: David S. Miller

    Wilson Kok
     

04 Jun, 2015

2 commits

  • 421b3885bf6d56391297844f43fb7154a6396e12 "udp: ipv4: Add udp early
    demux" introduced a regression that allowed sockets bound to INADDR_ANY
    to receive packets from multicast groups that the socket had not joined.
    For example a socket that had joined 224.168.2.9 could also receive
    packets from 225.168.2.9 despite not having joined that group if
    ip_early_demux is enabled.

    Fix this by calling ip_check_mc_rcu() in udp_v4_early_demux() to verify
    that the multicast packet is indeed ours.

    Signed-off-by: Shawn Bohrer
    Reported-by: Yurij M. Plotnikov
    Signed-off-by: David S. Miller

    Shawn Bohrer
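
The membership check added to the early demux path above can be modelled as follows. This is a rough userspace sketch: `mc_memberships`, `has_joined` and `early_demux_accepts` are hypothetical names, and the real ip_check_mc_rcu() works on kernel device state rather than a flat array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical record of the multicast groups that have been joined. */
struct mc_memberships {
    const uint32_t *groups;
    size_t ngroups;
};

static bool is_multicast(uint32_t addr)
{
    return (addr >> 28) == 0xE;   /* 224.0.0.0/4 */
}

static bool has_joined(const struct mc_memberships *m, uint32_t group)
{
    for (size_t i = 0; i < m->ngroups; i++)
        if (m->groups[i] == group)
            return true;
    return false;
}

/* Early-demux decision: a multicast packet is only accepted when its
 * destination group has actually been joined. */
static bool early_demux_accepts(const struct mc_memberships *m, uint32_t daddr)
{
    if (!is_multicast(daddr))
        return true;              /* unicast path is not modelled here */
    return has_joined(m, daddr);
}
```

Using the addresses from the entry, a receiver that joined 224.168.2.9 accepts that group and rejects 225.168.2.9.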
     
  • Currently, openvswitch tries to disable LRO from the user space. This does
    not work correctly when the device added is a vlan interface, though.
    Instead of dealing with possibly complex stacked cross name space relations
    in the user space, do the same as bridging does and call dev_disable_lro in
    the kernel.

    Signed-off-by: Jiri Benc
    Acked-by: Flavio Leitner
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jiri Benc
     

02 Jun, 2015

4 commits

  • Pablo Neira Ayuso says:

    ====================
    Netfilter fix for net

    The following patch reverts the ebtables chunk that enforces counters that was
    introduced in the recently applied d26e2c9ffa38 ('Revert "netfilter: ensure
    number of counters is >0 in do_replace()"') since this breaks ebtables.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • We currently rely on the PMTU discovery of xfrm.
    However, if a packet is sent locally, the PMTU mechanism
    of xfrm tries to do local socket notification, which
    might not work for applications like ping that don't
    check for this. So add PMTU handling to vti6_xmit to
    report MTU changes immediately.

    Signed-off-by: Steffen Klassert
    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Steffen Klassert
     
  • This reverts commit f96dee13b8e10f00840124255bed1d8b4c6afd6f.

    It isn't right, ethtool is meant to manage one PHY instance
    per netdevice at a time, and this is selected by the SET
    command. Therefore by definition the GET command must only
    return the settings for the configured and selected PHY.

    Reported-by: Ben Hutchings
    Signed-off-by: David S. Miller

    David S. Miller
     
  • This partially reverts commit 1086bbe97a07 ("netfilter: ensure number of
    counters is >0 in do_replace()") in net/bridge/netfilter/ebtables.c.

    Setting rules with ebtables does not work any more with 1086bbe97a07 in place.

    There is an error message and no rules set in the end.

    e.g.

    ~# ebtables -t nat -A POSTROUTING --src 12:34:56:78:9a:bc -j DROP
    Unable to update the kernel. Two possible causes:
    1. Multiple ebtables programs were executing simultaneously. The ebtables
    userspace tool doesn't by default support multiple ebtables programs
    running

    Reverting the ebtables part of 1086bbe97a07 makes this work again.

    Signed-off-by: Bernhard Thaler
    Signed-off-by: Pablo Neira Ayuso

    Bernhard Thaler
     

01 Jun, 2015

3 commits

  • While shuffling some code around, dsa_switch_setup_one() was introduced,
    and it was modified to return either an error code using ERR_PTR() or a
    NULL pointer when running out of memory or failing to setup a switch.

    This is a problem for its caller, dsa_switch_setup(), which uses IS_ERR()
    and expects to find an error code, not a NULL pointer, so we still try
    to proceed with dsa_switch_setup() and operate on invalid memory
    addresses. This can be easily reproduced by having e.g. the bcm_sf2
    driver built-in, but having no such switch, such that drv->setup will
    fail.

    Fix this by using ERR_PTR() consistently, which is both more informative
    and spares the caller from having to use IS_ERR_OR_NULL().

    Fixes: df197195a5248 ("net: dsa: split dsa_switch_setup into two functions")
    Reported-by: Andrew Lunn
    Signed-off-by: Florian Fainelli
    Tested-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Florian Fainelli
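
The mismatch between NULL returns and IS_ERR() described in the DSA entry above is easy to see with a userspace re-creation of the error-pointer helpers. The helpers below mirror the well-known kernel convention (error codes encoded in the top 4095 pointer values) but are written here for illustration; `struct sw` and `setup_one` are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins for the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct sw {
    int id;
};

/* Hypothetical setup routine: every failure is reported via ERR_PTR(),
 * never via NULL, so a caller that only tests IS_ERR() is safe. */
static struct sw *setup_one(int fail_with, struct sw *storage)
{
    if (fail_with)
        return ERR_PTR(-fail_with);
    return storage;
}
```

Note that `IS_ERR(NULL)` is false, so a routine that returns NULL on failure slips straight past a caller that only checks `IS_ERR()`.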
     
  • Linux 3.17 and earlier are explicitly engineered so that if the app
    doesn't specifically request a CC module on a listener before the SYN
    arrives, then the child gets the system default CC when the connection
    is established. See tcp_init_congestion_control() in 3.17 or earlier,
    which says "if no choice made yet assign the current value set as
    default". The change ("net: tcp: assign tcp cong_ops when tcp sk is
    created") altered these semantics, so that children got their parent
    listener's congestion control even if the system default had changed
    after the listener was created.

    This commit returns to those original semantics from 3.17 and earlier,
    since they are the original semantics from 2007 in 4d4d3d1e8 ("[TCP]:
    Congestion control initialization."), and some Linux congestion
    control workflows depend on that.

    In summary, if a listener socket specifically sets TCP_CONGESTION to
    "x", or the route locks the CC module to "x", then the child gets
    "x". Otherwise the child gets current system default from
    net.ipv4.tcp_congestion_control. That's the behavior in 3.17 and
    earlier, and this commit restores that.

    Fixes: 55d8694fa82c ("net: tcp: assign tcp cong_ops when tcp sk is created")
    Cc: Florian Westphal
    Cc: Daniel Borkmann
    Cc: Glenn Judd
    Cc: Stephen Hemminger
    Signed-off-by: Neal Cardwell
    Signed-off-by: Eric Dumazet
    Signed-off-by: Yuchung Cheng
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Neal Cardwell
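
The selection rule summarised in the congestion control entry above fits in one small function. This is a sketch of the decision only: `struct listener`, its `pinned_cc` field and `child_cc` are hypothetical names, and "cubic" and "reno" are just example defaults.

```c
#include <stddef.h>
#include <string.h>

struct listener {
    /* Set when the application used TCP_CONGESTION on the listener or the
     * route locks the CC module; NULL when no explicit choice was made. */
    const char *pinned_cc;
};

/* The child inherits an explicit choice, otherwise it takes whatever the
 * system default is at the time the connection is established. */
static const char *child_cc(const struct listener *l, const char *system_default)
{
    if (l->pinned_cc)
        return l->pinned_cc;
    return system_default;
}
```

A listener with no explicit choice therefore follows later changes to the system default, while a pinned listener does not.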
     
  • We have two problems in UDP stack related to bogus checksums :

    1) We return -EAGAIN to application even if receive queue is not empty.
    This breaks applications using edge trigger epoll()

    2) Under UDP flood, we can loop forever without yielding to other
    processes, potentially hanging the host, especially on non SMP.

    This patch is an attempt to make things better.

    We might in the future add extra support for rt applications
    wanting to better control time spent doing a recv() in a hostile
    environment. For example we could validate checksums before queuing
    packets in socket receive queue.

    Signed-off-by: Eric Dumazet
    Cc: Willem de Bruijn
    Signed-off-by: David S. Miller

    Eric Dumazet
     

31 May, 2015

1 commit

  • br_multicast_query_expired() querier argument is a pointer to
    a struct bridge_mcast_querier :

    struct bridge_mcast_querier {
            struct br_ip addr;
            struct net_bridge_port __rcu *port;
    };

    The intent of the code was to clear the port field, not the pointer to the querier.

    Fixes: 2cd4143192e8 ("bridge: memorize and export selected IGMP/MLD querier port")
    Signed-off-by: Eric Dumazet
    Acked-by: Thadeu Lima de Souza Cascardo
    Acked-by: Linus Lüssing
    Cc: Linus Lüssing
    Cc: Steinar H. Gunderson
    Signed-off-by: David S. Miller

    Eric Dumazet
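
The difference between clearing the pointer to the querier and clearing its port field, as described in the entry above, can be shown with plain C pointers. This sketch deliberately leaves out RCU; the struct layout and function names are simplified stand-ins.

```c
#include <stddef.h>

struct port {
    int id;
};

struct querier {
    unsigned int addr;    /* stand-in for struct br_ip */
    struct port *port;
};

/* Buggy shape: only the local copy of the pointer is cleared, so the
 * caller's structure is left untouched. */
static void expire_buggy(struct querier *q)
{
    q = NULL;
    (void)q;
}

/* Intended shape: clear the port field inside the structure. */
static void expire_fixed(struct querier *q)
{
    q->port = NULL;
}
```

After `expire_buggy()` the structure still refers to the old port; only `expire_fixed()` actually forgets it.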
     

29 May, 2015

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net): ipsec 2015-05-28

    1) Fix a race in xfrm_state_lookup_byspi, we need to take
    the refcount before we release xfrm_state_lock.
    From Li RongQing.

    2) Fix IV generation on ESN state. We used just the
    low order sequence numbers for IV generation on
    ESN, as a result the IV can repeat on the same
    state. Fix this by using the high order sequence
    number bits too and make sure to always initialize
    the high order bits with zero. These patches are
    serious stable candidates. Fixes from Herbert Xu.

    3) Fix the skb->mark handling on vti. We don't
    reset skb->mark in skb_scrub_packet anymore,
    so vti must care to restore the original
    value back after it was used to lookup the
    vti policy and state. Fixes from Alexander Duyck.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
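
Item 2 of the pull request above (IV generation on ESN state) comes down to how many sequence number bits feed the IV. The sketch below only models that input; the real IV derivation is more involved, and `iv_low_only` and `iv_full` are hypothetical names.

```c
#include <stdint.h>

/* Old behaviour as described: only the low-order 32 bits are used, so
 * the value repeats once the low half wraps around. */
static uint64_t iv_low_only(uint32_t seq_hi, uint32_t seq_lo)
{
    (void)seq_hi;
    return seq_lo;
}

/* Fixed behaviour as described: combine the high- and low-order halves
 * of the extended sequence number into one 64-bit value. */
static uint64_t iv_full(uint32_t seq_hi, uint32_t seq_lo)
{
    return ((uint64_t)seq_hi << 32) | seq_lo;
}
```

Two packets with the same low-order bits but different high-order bits collide under `iv_low_only` and stay distinct under `iv_full`.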
     

28 May, 2015

5 commits

  • The vti6_rcv_cb and vti_rcv_cb calls were leaving the skb->mark modified
    after completing the function. This resulted in the original skb->mark
    value being lost. Since we only need skb->mark to be set for
    xfrm_policy_check we can pull the assignment into the rcv_cb calls and then
    just restore the original mark after xfrm_policy_check has been completed.

    Signed-off-by: Alexander Duyck
    Signed-off-by: Steffen Klassert

    Alexander Duyck
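
The save-and-restore pattern described in the vti entry above looks roughly like this in a userspace model. `struct pkt`, `policy_check_fn` and `rcv_with_tunnel_mark` are hypothetical names standing in for the skb, xfrm_policy_check and the rcv_cb functions.

```c
#include <stdbool.h>
#include <stdint.h>

struct pkt {
    uint32_t mark;
};

typedef bool (*policy_check_fn)(const struct pkt *p);

/* Set the tunnel's mark only for the duration of the policy check, then
 * put the packet's original mark back before returning. */
static bool rcv_with_tunnel_mark(struct pkt *p, uint32_t tunnel_mark,
                                 policy_check_fn check)
{
    uint32_t orig_mark = p->mark;
    bool ok;

    p->mark = tunnel_mark;
    ok = check(p);
    p->mark = orig_mark;
    return ok;
}

/* Demo policy: accepts only packets carrying mark 7. */
static bool expect_mark_7(const struct pkt *p)
{
    return p->mark == 7;
}
```

Whether or not the check passes, the packet leaves the function with the mark it arrived with.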
     
  • This change makes it so that if a tunnel is defined we just use the mark
    from the tunnel instead of the mark from the skb header. By doing this we
    can avoid the need to set skb->mark inside of the tunnel receive functions.

    Signed-off-by: Alexander Duyck
    Signed-off-by: Steffen Klassert

    Alexander Duyck
     
  • Instead of modifying skb->mark we can simply modify the flowi_mark that is
    generated as a result of the xfrm_decode_session. By doing this we don't
    need to actually touch the skb->mark and it can be preserved as it passes
    out through the tunnel.

    Signed-off-by: Alexander Duyck
    Signed-off-by: Steffen Klassert

    Alexander Duyck
     
  • Pull networking fixes from David Miller:

    1) Don't use MMIO on certain iwlwifi devices otherwise we get a
    firmware crash.

    2) Don't corrupt the GRO lists of mac80211 contexts by doing sends via
    timer interrupt, from Johannes Berg.

    3) SKB tailroom is miscalculated in AP_VLAN crypto code, from Michal
    Kazior.

    4) Fix fw_status memory leak in iwlwifi, from Haim Dreyfuss.

    5) Fix use after free in iwl_mvm_d0i3_enable_tx(), from Eliad Peller.

    6) JIT'ing of large BPF programs is broken on x86, from Alexei
    Starovoitov.

    7) EMAC driver ethtool register dump size is miscalculated, from Ivan
    Mikhaylov.

    8) Fix PHY initial link mode when autonegotiation is disabled in
    amd-xgbe, from Tom Lendacky.

    9) Fix NULL deref on SOCK_DEAD socket in AF_UNIX and CAIF protocols,
    from Mark Salyzyn.

    10) credit_bytes not initialized properly in xen-netback, from Ross
    Lagerwall.

    11) Fallback from MSI-X to INTx interrupts not handled properly in mlx4
    driver, fix from Benjamin Poirier.

    12) Perform ->attach() after binding dev->qdisc in packet scheduler,
    otherwise we can crash. From Cong WANG.

    13) Don't clobber data in sctp_v4_map_v6(). From Jason Gunthorpe.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits)
    sctp: Fix mangled IPv4 addresses on a IPv6 listening socket
    net_sched: invoke ->attach() after setting dev->qdisc
    xen-netfront: properly destroy queues when removing device
    mlx4_core: Fix fallback from MSI-X to INTx
    xen/netback: Properly initialize credit_bytes
    net: netxen: correct sysfs bin attribute return code
    tools: bpf_jit_disasm: fix segfault on disabled debugging log output
    unix/caif: sk_socket can disappear when state is unlocked
    amd-xgbe-phy: Fix initial mode when autoneg is disabled
    net: dp83640: fix improper double spin locking.
    net: dp83640: reinforce locking rules.
    net: dp83640: fix broken calibration routine.
    net: stmmac: create one debugfs dir per net-device
    net/ibm/emac: fix size of emac dump memory areas
    x86: bpf_jit: fix compilation of large bpf programs
    net: phy: bcm7xxx: Fix 7425 PHY ID and flags
    iwlwifi: mvm: avoid use-after-free on iwl_mvm_d0i3_enable_tx()
    iwlwifi: mvm: clean net-detect info if device was reset during suspend
    iwlwifi: mvm: take the UCODE_DOWN reference when resuming
    iwlwifi: mvm: BT Coex - duplicate the command if sent ASYNC
    ...

    Linus Torvalds
     
  • For the mq qdisc, we add the per-tx-queue qdiscs to the root qdisc
    for display purposes. However, that happens too early,
    before the new dev->qdisc is finally set, so q->list
    points to the old root qdisc, which is going to be
    freed right before the new one is assigned.

    Fix this by moving ->attach() after setting dev->qdisc.

    For the record, this fixes the following crash:

    ------------[ cut here ]------------
    WARNING: CPU: 1 PID: 975 at lib/list_debug.c:59 __list_del_entry+0x5a/0x98()
    list_del corruption. prev->next should be ffff8800d1998ae8, but was 6b6b6b6b6b6b6b6b
    CPU: 1 PID: 975 Comm: tc Not tainted 4.1.0-rc4+ #1019
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    0000000000000009 ffff8800d73fb928 ffffffff81a44e7f 0000000047574756
    ffff8800d73fb978 ffff8800d73fb968 ffffffff810790da ffff8800cfc4cd20
    ffffffff814e725b ffff8800d1998ae8 ffffffff82381250 0000000000000000
    Call Trace:
    [] dump_stack+0x4c/0x65
    [] warn_slowpath_common+0x9c/0xb6
    [] ? __list_del_entry+0x5a/0x98
    [] warn_slowpath_fmt+0x46/0x48
    [] ? dev_graft_qdisc+0x5e/0x6a
    [] __list_del_entry+0x5a/0x98
    [] list_del+0xe/0x2d
    [] qdisc_list_del+0x1e/0x20
    [] qdisc_destroy+0x30/0xd6
    [] qdisc_graft+0x11d/0x243
    [] tc_get_qdisc+0x1a6/0x1d4
    [] ? mark_lock+0x2e/0x226
    [] rtnetlink_rcv_msg+0x181/0x194
    [] ? rtnl_lock+0x17/0x19
    [] ? rtnl_lock+0x17/0x19
    [] ? __rtnl_unlock+0x17/0x17
    [] netlink_rcv_skb+0x4d/0x93
    [] rtnetlink_rcv+0x26/0x2d
    [] netlink_unicast+0xcb/0x150
    [] ? might_fault+0x59/0xa9
    [] netlink_sendmsg+0x4fa/0x51c
    [] sock_sendmsg_nosec+0x12/0x1d
    [] sock_sendmsg+0x29/0x2e
    [] ___sys_sendmsg+0x1b4/0x23a
    [] ? native_sched_clock+0x35/0x37
    [] ? sched_clock_local+0x12/0x72
    [] ? sched_clock_cpu+0x9e/0xb7
    [] ? current_kernel_time+0xe/0x32
    [] ? lock_release_holdtime.part.29+0x71/0x7f
    [] ? read_seqcount_begin.constprop.27+0x5f/0x76
    [] ? trace_hardirqs_on_caller+0x17d/0x199
    [] ? __fget_light+0x50/0x78
    [] __sys_sendmsg+0x42/0x60
    [] SyS_sendmsg+0x12/0x1c
    [] system_call_fastpath+0x12/0x6f
    ---[ end trace ef29d3fb28e97ae7 ]---

    In the long term, we probably need to clean up the qdisc_graft() code
    in case it hides other bugs like this.

    Fixes: 95dc19299f74 ("pkt_sched: give visibility to mq slave qdiscs")
    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    WANG Cong
     

27 May, 2015

2 commits

  • got a rare NULL pointer dereference in clear_bit

    Signed-off-by: Mark Salyzyn
    Acked-by: Hannes Frederic Sowa
    ----
    v2: switch to sock_flag(sk, SOCK_DEAD) and added net/caif/caif_socket.c
    v3: return -ECONNRESET in upstream caller of wait function for SOCK_DEAD
    Signed-off-by: David S. Miller

    Mark Salyzyn
     
  • …kernel/git/jberg/mac80211

    Johannes Berg says:

    ====================
    We have three more fixes:
    * AP_VLAN tailroom calculation fix, the bug leads to warnings
    along with dropped packets
    * NAPI context issue, calling napi_gro_receive() from a timer
    (obviously) can lead to crashes
    * remain-on-channel combining leads to dropped requests and not
    being able to finish certain operations, so remove it
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     

24 May, 2015

1 commit


23 May, 2015

6 commits

  • Following lockdep splat was reported :

    [ 29.382286] ===============================
    [ 29.382315] [ INFO: suspicious RCU usage. ]
    [ 29.382344] 4.1.0-0.rc0.git11.1.fc23.x86_64 #1 Not tainted
    [ 29.382380] -------------------------------
    [ 29.382409] net/bridge/br_private.h:626 suspicious
    rcu_dereference_check() usage!
    [ 29.382455]
    other info that might help us debug this:

    [ 29.382507]
    rcu_scheduler_active = 1, debug_locks = 0
    [ 29.382549] 2 locks held by swapper/0/0:
    [ 29.382576] #0: (((&p->forward_delay_timer))){+.-...}, at:
    [] call_timer_fn+0x5/0x4f0
    [ 29.382660] #1: (&(&br->lock)->rlock){+.-...}, at:
    [] br_forward_delay_timer_expired+0x31/0x140
    [bridge]
    [ 29.382754]
    stack backtrace:
    [ 29.382787] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
    4.1.0-0.rc0.git11.1.fc23.x86_64 #1
    [ 29.382838] Hardware name: LENOVO 422916G/LENOVO, BIOS A1KT53AUS 04/07/2015
    [ 29.382882] 0000000000000000 3ebfc20364115825 ffff880666603c48
    ffffffff81892d4b
    [ 29.382943] 0000000000000000 ffffffff81e124e0 ffff880666603c78
    ffffffff8110bcd7
    [ 29.383004] ffff8800785c9d00 ffff88065485ac58 ffff880c62002800
    ffff880c5fc88ac0
    [ 29.383065] Call Trace:
    [ 29.383084] [] dump_stack+0x4c/0x65
    [ 29.383130] [] lockdep_rcu_suspicious+0xe7/0x120
    [ 29.383178] [] br_fill_ifinfo+0x4a9/0x6a0 [bridge]
    [ 29.383225] [] br_ifinfo_notify+0x11b/0x4b0 [bridge]
    [ 29.383271] [] ? br_hold_timer_expired+0x70/0x70 [bridge]
    [ 29.383320] []
    br_forward_delay_timer_expired+0x58/0x140 [bridge]
    [ 29.383371] [] ? br_hold_timer_expired+0x70/0x70 [bridge]
    [ 29.383416] [] call_timer_fn+0xc3/0x4f0
    [ 29.383454] [] ? call_timer_fn+0x5/0x4f0
    [ 29.383493] [] ? lock_release_holdtime.part.29+0xf/0x200
    [ 29.383541] [] ? br_hold_timer_expired+0x70/0x70 [bridge]
    [ 29.383587] [] run_timer_softirq+0x244/0x490
    [ 29.383629] [] __do_softirq+0xec/0x670
    [ 29.383666] [] irq_exit+0x145/0x150
    [ 29.383703] [] smp_apic_timer_interrupt+0x46/0x60
    [ 29.383744] [] apic_timer_interrupt+0x73/0x80
    [ 29.383782] [] ? cpuidle_enter_state+0x5f/0x2f0
    [ 29.383832] [] ? cpuidle_enter_state+0x5b/0x2f0

    Problem here is that br_forward_delay_timer_expired() is a timer
    handler, calling br_ifinfo_notify() which assumes either rcu_read_lock()
    or RTNL are held.

    The simplest fix seems to be adding an RCU read lock section.

    Signed-off-by: Eric Dumazet
    Reported-by: Josh Boyer
    Reported-by: Dominick Grift
    Cc: Vlad Yasevich
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • When configuring the settings for PHY1 with a command such as
    'ethtool -s eth0 phyad 1 speed 100', 'ethtool' modifies other
    settings of PHY1 in addition to the requested speed.

    'ethtool' ends up querying the settings of PHY0 and using them as
    the base on which the new settings for PHY1 are applied. As a
    result, the other settings of PHY1 are configured incorrectly.

    The cause is that '__ethtool_get_settings()', which is called to
    handle the 'ETHTOOL_GSET' command, clears the 'cmd' structure (of
    type 'struct ethtool_cmd') with memset. This wipes any parameters
    passed in with the 'ETHTOOL_GSET' command, so the driver's callback
    is always invoked with 'cmd->phy_address' set to '0'.

    '__ethtool_get_settings()' is also called from other files in
    'net/core', so the fix is applied to 'ethtool_get_settings()',
    which is only called in the context of 'ethtool'.
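
    The effect of the memset can be sketched with a small userspace
    model. This is an assumption-laden illustration, not the kernel code:
    'struct model_cmd' is a hypothetical, trimmed-down stand-in for
    'struct ethtool_cmd', and the model_* functions and the speeds they
    report are invented for the example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct ethtool_cmd. */
struct model_cmd {
    unsigned int speed;
    unsigned char phy_address;
};

/* Records which PHY the model driver callback was asked about. */
static unsigned char model_last_phy_queried;

/* Stand-in driver callback: PHY1 runs at 100, every other PHY at 1000. */
static void model_driver_get_settings(struct model_cmd *cmd)
{
    model_last_phy_queried = cmd->phy_address;
    cmd->speed = (cmd->phy_address == 1) ? 100 : 1000;
}

/* Problematic shape: the request is wiped before the driver sees it. */
static void model_get_settings_clearing(struct model_cmd *cmd)
{
    assert(cmd != NULL);
    memset(cmd, 0, sizeof(*cmd));
    model_driver_get_settings(cmd);
}

/* Shape after the fix: the caller's phy_address reaches the driver. */
static void model_get_settings_preserving(struct model_cmd *cmd)
{
    assert(cmd != NULL);
    model_driver_get_settings(cmd);
}
```

    With phy_address set to 1, the clearing variant makes the model
    driver answer for PHY0, while the preserving variant answers for
    PHY1.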

    Signed-off-by: Arun Parameswaran
    Reviewed-by: Ray Jui
    Reviewed-by: Scott Branden
    Signed-off-by: David S. Miller

    Arun Parameswaran
     
  • When more than one multicast address is present in an MLDv2 report, all
    but the first address are ignored, because the code breaks out of the
    loop if there has not been an error adding that address.

    This has caused failures when two guests connected through the bridge
    tried to communicate using IPv6. Neighbor discoveries would not be
    transmitted to the other guest when both used a link-local address and a
    static address.

    This only happens when there is an MLDv2 querier in the network.

    The fix breaks out of the loop only when adding a multicast address
    fails.
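
    A minimal userspace sketch of the inverted loop condition follows.
    The names (struct model_mdb, model_add_group(), model_report_old(),
    model_report_fixed()) and the integer group ids are hypothetical; only
    the "break on success" versus "break on failure" logic is taken from
    the description above.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MDB_CAP 8

/* Hypothetical stand-in for the bridge multicast database. */
struct model_mdb {
    int groups[MODEL_MDB_CAP];
    size_t count;
};

/* Adds one group; returns 0 on success, -1 when the table is full. */
static int model_add_group(struct model_mdb *mdb, int group)
{
    if (mdb->count >= MODEL_MDB_CAP)
        return -1;
    mdb->groups[mdb->count++] = group;
    return 0;
}

/* Old behaviour: stops after the first address that was added fine. */
static int model_report_old(struct model_mdb *mdb, const int *grp, size_t n)
{
    int err = 0;

    assert(mdb != NULL && grp != NULL);
    for (size_t i = 0; i < n; i++) {
        err = model_add_group(mdb, grp[i]);
        if (!err)
            break;
    }
    return err;
}

/* Fixed behaviour: stops only when adding an address fails. */
static int model_report_fixed(struct model_mdb *mdb, const int *grp, size_t n)
{
    int err = 0;

    assert(mdb != NULL && grp != NULL);
    for (size_t i = 0; i < n; i++) {
        err = model_add_group(mdb, grp[i]);
        if (err)
            break;
    }
    return err;
}
```

    Feeding a three-address report to both variants leaves one entry in
    the table with the old loop and three with the fixed one.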

    The mdb before the patch:

    dev ovirtmgmt port vnet0 grp ff02::1:ff7d:6603 temp
    dev ovirtmgmt port vnet1 grp ff02::1:ff7d:6604 temp
    dev ovirtmgmt port bond0.86 grp ff02::2 temp

    After the patch:

    dev ovirtmgmt port vnet0 grp ff02::1:ff7d:6603 temp
    dev ovirtmgmt port vnet1 grp ff02::1:ff7d:6604 temp
    dev ovirtmgmt port bond0.86 grp ff02::fb temp
    dev ovirtmgmt port bond0.86 grp ff02::2 temp
    dev ovirtmgmt port bond0.86 grp ff02::d temp
    dev ovirtmgmt port vnet0 grp ff02::1:ff00:76 temp
    dev ovirtmgmt port bond0.86 grp ff02::16 temp
    dev ovirtmgmt port vnet1 grp ff02::1:ff00:77 temp
    dev ovirtmgmt port bond0.86 grp ff02::1:ff00:def temp
    dev ovirtmgmt port bond0.86 grp ff02::1:ffa1:40bf temp

    Fixes: 08b202b67264 ("bridge br_multicast: IPv6 MLD support.")
    Reported-by: Rik Theys
    Signed-off-by: Thadeu Lima de Souza Cascardo
    Tested-by: Rik Theys
    Signed-off-by: David S. Miller

    Thadeu Lima de Souza Cascardo
     
  • When replacing an IPv4 route, the tb_id member of the new fib_alias
    structure is not set in the replace code path, so the new route is
    ignored.

    Fixes: 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse")
    Signed-off-by: Michal Kubecek
    Acked-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Michal Kubeček
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains Netfilter fixes for your net tree. They are:

    1) Fix a race in nfnetlink_log and nfnetlink_queue that can lead to a crash.
    This problem is due to the wrong order of the per-net registration and
    netlink socket events. Patch from Francesco Ruggeri.

    2) Make sure that the counters that userspace passes us are higher than 0 in
    all the x_tables frontends. Discovered via Trinity, patch from Dave Jones.

    3) Revert a patch for br_netfilter to rely on the conntrack status bits. This
    breaks stateless IPv6 NAT transformations. Patch from Florian Westphal.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • ip_error does not check if in_dev is NULL before dereferencing it.

    The following sequence of calls is possible:
    CPU A                          CPU B
    ip_rcv_finish
        ip_route_input_noref()
            ip_route_input_slow()
                                   inetdev_destroy()
        dst_input()

    With the result that a network device can be destroyed while processing
    an input packet.

    A crash was triggered with only unicast packets in flight, and
    forwarding enabled on the only network device. The error condition
    was created by the removal of the network device.

    As such it is likely that the error code was -EHOSTUNREACH, and the
    action taken by ip_error (if in_dev had been accessible) would have
    been to not increment any counters and to have tried and likely failed
    to send an ICMP error as the network device is going away.

    Therefore handle this weird case by just dropping the packet if
    !in_dev. It will result in dropping the packet sooner, and will not
    result in an actual change of behavior.
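
    The guard can be sketched in userspace C as follows. The types and
    function here (struct model_in_dev, struct model_packet,
    model_ip_error()) are hypothetical stand-ins chosen for the example
    and do not mirror the real ip_error() body; the sketch only shows the
    "drop early if !in_dev" decision described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the per-device state and the packet. */
struct model_in_dev {
    int forwarding;
};

struct model_packet {
    struct model_in_dev *in_dev; /* NULL once the device is destroyed */
    int freed;
};

enum model_verdict {
    MODEL_DROPPED_EARLY, /* device gone: packet dropped, no dereference */
    MODEL_HANDLED        /* device present: normal error handling ran */
};

static enum model_verdict model_ip_error(struct model_packet *pkt)
{
    assert(pkt != NULL);

    /* The device may have been destroyed on another CPU. */
    if (!pkt->in_dev) {
        pkt->freed = 1;
        return MODEL_DROPPED_EARLY;
    }

    /* Only now is it safe to look at per-device state. */
    (void)pkt->in_dev->forwarding;
    pkt->freed = 1;
    return MODEL_HANDLED;
}
```

    In both branches the packet ends up freed; the early branch simply
    gets there without touching the missing device.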

    Fixes: 251da4130115b ("ipv4: Cache ip_error() routes even when not forwarding.")
    Reported-by: Vittorio Gambaletta
    Tested-by: Vittorio Gambaletta
    Signed-off-by: Vittorio Gambaletta
    Signed-off-by: "Eric W. Biederman"
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric W. Biederman