06 Jul, 2016

2 commits

  • It was first reported and reproduced by Petr (thanks!) in
    https://bugzilla.kernel.org/show_bug.cgi?id=119581

    free_percpu(rt->rt6i_pcpu) used to always happen in ip6_dst_destroy().

    However, after fixing a deadlock bug in
    commit 9c7370a166b4 ("ipv6: Fix a potential deadlock when creating pcpu rt"),
    free_percpu() is not called before setting non_pcpu_rt->rt6i_pcpu to NULL.

    It is worth noting that rt6i_pcpu is protected by table->tb6_lock.

    kmemleak somehow did not report it. We nailed it down by
    observing the pcpu entries in /proc/vmallocinfo (first suggested
    by Hannes, thanks!).

    Signed-off-by: Martin KaFai Lau
    Fixes: 9c7370a166b4 ("ipv6: Fix a potential deadlock when creating pcpu rt")
    Reported-by: Petr Novopashenniy
    Tested-by: Petr Novopashenniy
    Acked-by: Hannes Frederic Sowa
    Cc: Hannes Frederic Sowa
    Cc: Petr Novopashenniy
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     
  • dn_fib_count_nhs() could enter an infinite loop if nhp->rtnh_len == 0
    (i.e. if userspace passes a malformed netlink message).

    Let's use the helpers from net/nexthop.h which take care of all this
    stuff. We can do exactly the same as e.g. fib_count_nexthops() and
    fib_get_nhs() from net/ipv4/fib_semantics.c.

    This fixes the softlockup for me.

    Cc: Thomas Graf
    Signed-off-by: Vegard Nossum
    Signed-off-by: David S. Miller

    Vegard Nossum
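
The infinite loop described above comes from advancing through nexthop records by a length field that is never validated. A minimal user-space sketch of the same kind of checks that the helpers in net/nexthop.h perform is shown below. It assumes the 8-byte struct rtnexthop layout (rtnh_len, rtnh_flags, rtnh_hops, rtnh_ifindex) and 4-byte alignment; count_nexthops() and align4() are hypothetical names, and returning None for malformed input is a choice made for this illustration, not kernel behaviour.

```python
import struct

# Assumed layout of struct rtnexthop: u16 rtnh_len, u8 rtnh_flags,
# u8 rtnh_hops, s32 rtnh_ifindex (native byte order, no padding).
RTNH_FMT = "=HBBi"
RTNH_SIZE = struct.calcsize(RTNH_FMT)  # 8 bytes


def align4(n):
    """Round n up to a multiple of 4, as netlink attribute alignment does."""
    return (n + 3) & ~3


def count_nexthops(buf):
    """Count nexthop records in buf; return None if the buffer is malformed.

    Every record must be at least RTNH_SIZE long and must fit in the
    remaining bytes, so a record with rtnh_len == 0 is rejected instead of
    being walked forever.
    """
    offset, count = 0, 0
    remaining = len(buf)
    while remaining > 0:
        if remaining < RTNH_SIZE:
            return None  # trailing garbage, too short for a header
        rtnh_len = struct.unpack_from(RTNH_FMT, buf, offset)[0]
        if rtnh_len < RTNH_SIZE or rtnh_len > remaining:
            return None  # zero, short, or overlong length field
        step = align4(rtnh_len)
        offset += step
        remaining -= step
        count += 1
    return count
```

Because each iteration either returns or advances by at least RTNH_SIZE bytes, the loop always terminates.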
     

05 Jul, 2016

1 commit

  • If register_pernet_subsys() fails, we shouldn't try to call
    unregister_pernet_subsys().

    Fixes: 467fa15356 ("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.")
    Cc: stable@vger.kernel.org
    Cc: Sowmini Varadhan
    Cc: David S. Miller
    Signed-off-by: Vegard Nossum
    Acked-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Vegard Nossum
     

02 Jul, 2016

3 commits

    Fix incorrect use of nla_strlcpy() where the first NLA_HDRLEN bytes
    of the link name were left out, making the output of tipc-config -ls
    look something like:
    Link statistics:
    dcast-link
    1:data0-1.1.2:data0
    1:data0-1.1.3:data0

    Also, for the record, the patch that introduced this regression
    claims "Sending the whole object out can cause a leak". That isn't
    very likely, as this is a compat layer where the data we are parsing
    is generated by us and we know the string to be NULL terminated. But
    you can of course never be too secure.

    Fixes: 5d2be1422e02 (tipc: fix an infoleak in tipc_nl_compat_link_dump)
    Signed-off-by: Richard Alpe
    Signed-off-by: David S. Miller

    Richard Alpe
     
  • Similar to commit 9b368814b336 ("net: fix bridge multicast packet checksum validation")
    we need to fixup the checksum for CHECKSUM_COMPLETE when
    pushing skb on RX path. Otherwise we get similar splats.

    Cc: Jamal Hadi Salim
    Cc: Tom Herbert
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     
  • People who use PACKET_FANOUT_HASH want a symmetric hash, meaning that
    they want packets going in both directions on a flow to hash to the
    same bucket.

    The core kernel SKB hash became non-symmetric when the ipv6 flow label
    and other entities were incorporated into the standard flow hash order
    to increase entropy.

    But there are no users of PACKET_FANOUT_HASH who want an asymmetric
    hash; they all want a symmetric one.

    Therefore, use the flow dissector to compute a flat symmetric hash
    over only the protocol, addresses and ports. This hash does not get
    installed into and override the normal skb hash, so this change has
    no effect whatsoever on the rest of the stack.

    Reported-by: Eric Leblond
    Tested-by: Eric Leblond
    Signed-off-by: David S. Miller

    David S. Miller
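
To illustrate what "symmetric" means here, the sketch below hashes a flow so that both directions land in the same bucket. It is a conceptual example only: the kernel's flow dissector uses its own hash function and key layout, while this sketch simply orders the two endpoints before hashing with CRC32. The names symmetric_flow_hash() and fanout_bucket() are hypothetical.

```python
import zlib


def symmetric_flow_hash(proto, src_addr, src_port, dst_addr, dst_port):
    """Hash only protocol, addresses and ports, ignoring direction.

    The two endpoints are put into a canonical order first, so swapping
    source and destination yields the same key and therefore the same hash.
    """
    a = (src_addr, src_port)
    b = (dst_addr, dst_port)
    lo, hi = (a, b) if a <= b else (b, a)
    key = repr((proto, lo, hi)).encode()
    return zlib.crc32(key)


def fanout_bucket(flow_hash, num_sockets):
    """Pick one of num_sockets fanout members from the hash."""
    return flow_hash % num_sockets
```

With this property, a capture process that receives one direction of a TCP connection also receives the replies, which is what users of PACKET_FANOUT_HASH rely on.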
     

30 Jun, 2016

2 commits

  • ip_skb_dst_mtu uses skb->sk, assuming it is an AF_INET socket (e.g. it
    calls ip_sk_use_pmtu which casts sk as an inet_sk).

    However, in the case of UDP tunneling, the skb->sk is not necessarily an
    inet socket (could be AF_PACKET socket, or AF_UNSPEC if arriving from
    tun/tap).

    OTOH, the sk passed as an argument throughout IP stack's output path is
    the one which is of PMTU interest:
    - In case of local sockets, sk is same as skb->sk;
    - In case of a udp tunnel, sk is the tunneling socket.

    Fix by passing ip_finish_output's sk to ip_skb_dst_mtu.
    This augments 7026b1ddb6 'netfilter: Pass socket pointer down through okfn().'

    Signed-off-by: Shmulik Ladkani
    Reviewed-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Shmulik Ladkani
     
  • Pull networking fixes from David Miller:
    "I've been traveling so this accumulates more than a week or so of bug
    fixing. It perhaps looks a little worse than it really is.

    1) Fix deadlock in ath10k driver, from Ben Greear.

    2) Increase scan timeout in iwlwifi, from Luca Coelho.

    3) Unbreak STP by properly reinjecting STP packets back into the
    stack. Regression fix from Ido Schimmel.

    4) Mediatek driver fixes (missing malloc failure checks, leaking of
    scratch memory, wrong indexing when mapping TX buffers, etc.) from
    John Crispin.

    5) Fix endianness bug in icmpv6_err() handler, from Hannes Frederic
    Sowa.

    6) Fix hashing of flows in UDP in the reuseport case, from Xuemin Su.

    7) Fix netlink notifications in ovs for tunnels, delete link messages
    are never emitted because of how the device registry state is
    handled. From Nicolas Dichtel.

    8) Conntrack module leaks kmemcache on unload, from Florian Westphal.

    9) Prevent endless jump loops in nft rules, from Liping Zhang and
    Pablo Neira Ayuso.

    10) Not early enough spinlock initialization in mlx4, from Eric
    Dumazet.

    11) Bind refcount leak in act_ipt, from Cong WANG.

    12) Missing RCU locking in HTB scheduler, from Florian Westphal.

    13) Several small MACSEC bug fixes from Sabrina Dubroca (missing RCU
    barrier, using heap for SG and IV, and erroneous use of async flag
    when allocating AEAD context.)

    14) RCU handling fix in TIPC, from Ying Xue.

    15) Pass correct protocol down into ipv4_{update_pmtu,redirect}() in
    SIT driver, from Simon Horman.

    16) Socket timer deadlock fix in TIPC from Jon Paul Maloy.

    17) Fix potential deadlock in team enslave, from Ido Schimmel.

    18) Memory leak in KCM procfs handling, from Jiri Slaby.

    19) ESN generation fix in ipv4 ESP, from Herbert Xu.

    20) Fix GFP_KERNEL allocations with locks held in act_ife, from Cong
    WANG.

    21) Use after free in netem, from Eric Dumazet.

    22) Uninitialized last assert time in multicast router code, from Tom
    Goff.

    23) Skip raw sockets in sock_diag destruction broadcast, from Willem
    de Bruijn.

    24) Fix link status reporting in thunderx, from Sunil Goutham.

    25) Limit resegmentation of retransmit queue so that we do not
    retransmit too large GSO frames. From Eric Dumazet.

    26) Delay bpf program release after grace period, from Daniel
    Borkmann"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (141 commits)
    openvswitch: fix conntrack netlink event delivery
    qed: Protect the doorbell BAR with the write barriers.
    neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit()
    e1000e: keep VLAN interfaces functional after rxvlan off
    cfg80211: fix proto in ieee80211_data_to_8023 for frames without LLC header
    qlcnic: use the correct ring in qlcnic_83xx_process_rcv_ring_diag()
    bpf, perf: delay release of BPF prog after grace period
    net: bridge: fix vlan stats continue counter
    tcp: do not send too big packets at retransmit time
    ibmvnic: fix to use list_for_each_safe() when delete items
    net: thunderx: Fix TL4 configuration for secondary Qsets
    net: thunderx: Fix link status reporting
    net/mlx5e: Reorganize ethtool statistics
    net/mlx5e: Fix number of PFC counters reported to ethtool
    net/mlx5e: Prevent adding the same vxlan port
    net/mlx5e: Check for BlueFlame capability before allocating SQ uar
    net/mlx5e: Change enum to better reflect usage
    net/mlx5: Add ConnectX-5 PCIe 4.0 to list of supported devices
    net/mlx5: Update command strings
    net: marvell: Add separate config ANEG function for Marvell 88E1111
    ...

    Linus Torvalds
     

29 Jun, 2016

11 commits

  • …ux/kernel/git/jberg/mac80211

    Johannes Berg says:

    ====================
    Just two small fixes
    * fix mesh peer link counter, decrement wasn't always done at all
    * fix ethertype (length) for packets without RFC 1042 or bridge
    tunnel header
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     
  • Only the first and last netlink message for a particular conntrack are
    actually sent. The first message is sent through nf_conntrack_confirm when
    the conntrack is committed. The last one is sent when the conntrack is
    destroyed on timeout. The other conntrack state change messages are not
    advertised.

    When the conntrack subsystem is used from netfilter, nf_conntrack_confirm
    is called for each packet, from the postrouting hook, which in turn calls
    nf_ct_deliver_cached_events to send the state change netlink messages.

    This commit fixes the problem by calling nf_ct_deliver_cached_events in the
    non-commit case as well.

    Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
    CC: Joe Stringer
    CC: Justin Pettit
    CC: Andy Zhou
    CC: Thomas Graf
    Signed-off-by: Samuel Gauthier
    Acked-by: Joe Stringer
    Signed-off-by: David S. Miller

    Samuel Gauthier
     
  • neigh_xmit() expects to be called inside an RCU-bh read side critical
    section, and while one of its two current callers gets this right, the
    other one doesn't.

    More specifically, neigh_xmit() has two callers, mpls_forward() and
    mpls_output(), and while both callers call neigh_xmit() under
    rcu_read_lock(), this provides sufficient protection for neigh_xmit()
    only in the case of mpls_forward(), as that is always called from
    softirq context and therefore doesn't need explicit BH protection,
    while mpls_output() can be called from process context with softirqs
    enabled.

    When mpls_output() is called from process context, with softirqs
    enabled, we can be preempted by a softirq at any time, and RCU-bh
    considers the completion of a softirq as signaling the end of any
    pending read-side critical sections, so if we do get a softirq
    while we are in the part of neigh_xmit() that expects to be run inside
    an RCU-bh read side critical section, we can end up with an unexpected
    RCU grace period running right in the middle of that critical section,
    making things go boom.

    This patch fixes this impedance mismatch in the callee, by making
    neigh_xmit() always take rcu_read_{,un}lock_bh() around the code that
    expects to be treated as an RCU-bh read side critical section, as this
    seems a safer option than fixing it in the callers.

    Fixes: 4fd3d7d9e868f ("neigh: Add helper function neigh_xmit")
    Signed-off-by: David Barroso
    Signed-off-by: Lennert Buytenhek
    Acked-by: David Ahern
    Acked-by: Robert Shearman
    Signed-off-by: David S. Miller

    David Barroso
     
  • The PDU length of incoming LLC frames is set to the total skb payload size
    in __ieee80211_data_to_8023() of net/wireless/util.c which incorrectly
    includes the length of the IEEE 802.11 header.

    The resulting LLC frame header has a too large PDU length, causing the
    llc_fixup_skb() function of net/llc/llc_input.c to reject the incoming
    skb, effectively breaking STP.

    Solve the problem by properly subtracting the IEEE 802.11 frame header size
    from the PDU length, allowing the LLC processor to pick up the incoming
    control messages.

    Special thanks to Gerry Rozema for tracking down the regression and proposing
    a suitable patch.

    Fixes: 2d1c304cb2d5 ("cfg80211: add function for 802.3 conversion with separate output buffer")
    Cc: stable@vger.kernel.org
    Reported-by: Gerry Rozema
    Signed-off-by: Felix Fietkau
    Signed-off-by: Johannes Berg

    Felix Fietkau
     
    I made a dumb off-by-one mistake when I added the vlan stats counter
    dumping code. The increment should happen before the check, not after;
    otherwise we miss one entry when we continue dumping.

    Fixes: a60c090361ea ("bridge: netlink: export per-vlan stats")
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Arjun reported a bug in TCP stack and bisected it to a recent commit.

    In the case where we process SACK, we can coalesce multiple skbs
    into fat ones (tcp_shift_skb_data()), to lower write queue
    overhead, because we do not expect to retransmit these packets.

    However, SACK reneging can happen, forcing the sender to retransmit
    all these packets. If skb->len is above 64KB, we then send buggy
    IP packets that could hang TSO engine on cxgb4.

    Neal suggested using tcp_tso_autosize() instead of tp->gso_segs
    so that we cook packets of optimal size vs TCP/pacing.

    Thanks to Arjun for reporting the bug and running the tests!

    Fixes: 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time")
    Signed-off-by: Eric Dumazet
    Reported-by: Arjun V
    Tested-by: Arjun V
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • The untagged vlan object is only destroyed when the interface is removed
    via the legacy sysfs interface. But it also has to be destroyed when the
    standard rtnl-link interface is used.

    Fixes: 5d2c05b21337 ("batman-adv: add per VLAN interface attribute framework")
    Signed-off-by: Sven Eckelmann
    Acked-by: Antonio Quartulli
    Signed-off-by: Marek Lindner
    Signed-off-by: David S. Miller

    Sven Eckelmann
     
    skb_linearize() may reallocate the skb, which makes the calculated pointer
    for ethhdr invalid. But the pointer is used later to fill in the RR
    field of the batadv_icmp_packet_rr packet.

    Instead re-evaluate eth_hdr after the skb_linearize+skb_cow to fix the
    pointer and avoid the invalid read.

    Fixes: da6b8c20a5b8 ("batman-adv: generalize batman-adv icmp packet handling")
    Signed-off-by: Sven Eckelmann
    Signed-off-by: Marek Lindner
    Signed-off-by: David S. Miller

    Sven Eckelmann
     
    Each batadv_tt_local_entry holds a single reference to a
    batadv_softif_vlan. In case a new entry cannot be added to the hash
    table, the error path puts the reference, but the reference will also
    now be dropped by batadv_tt_local_entry_release().

    Fixes: a33d970d0b54 ("batman-adv: Fix reference counting of vlan object for tt_local_entry")
    Signed-off-by: Ben Hutchings
    Signed-off-by: Marek Lindner
    Signed-off-by: Sven Eckelmann
    Signed-off-by: David S. Miller

    Ben Hutchings
     
    The tt_req_node is added to and removed from a list under a spinlock. But the
    lock is sometimes released even when the object is still referenced and
    will be used later via this reference. For example, batadv_send_tt_request
    can create a new tt_req_node (including adding it to a list) and later
    re-acquire the lock to remove it from the list and free it. But by that
    time another context could have already removed this tt_req_node from the
    list and freed it.

    CPU#0

    batadv_batman_skb_recv from net_device 0
    -> batadv_iv_ogm_receive
    -> batadv_iv_ogm_process
    -> batadv_iv_ogm_process_per_outif
    -> batadv_tvlv_ogm_receive
    -> batadv_tvlv_ogm_receive
    -> batadv_tvlv_containers_process
    -> batadv_tvlv_call_handler
    -> batadv_tt_tvlv_ogm_handler_v1
    -> batadv_tt_update_orig
    -> batadv_send_tt_request
    -> batadv_tt_req_node_new
    spin_lock(...)
    allocates new tt_req_node and adds it to list
    spin_unlock(...)
    return tt_req_node

    CPU#1

    batadv_batman_skb_recv from net_device 1
    -> batadv_recv_unicast_tvlv
    -> batadv_tvlv_containers_process
    -> batadv_tvlv_call_handler
    -> batadv_tt_tvlv_unicast_handler_v1
    -> batadv_handle_tt_response
    spin_lock(...)
    tt_req_node gets removed from list and is freed
    spin_unlock(...)

    CPU#0


    Tested-by: Martin Weinelt
    Tested-by: Amadeus Alfa
    Signed-off-by: Marek Lindner
    Signed-off-by: David S. Miller

    Sven Eckelmann
     
  • If a VLAN tagged frame is received and the corresponding VLAN is not
    configured on the soft interface, it will splat a WARN on every packet
    received. This is a quite annoying behaviour for some scenarios, e.g. if
    bat0 is bridged with eth0, and there are arbitrary VLAN tagged frames
    from Ethernet coming in without having any VLAN configuration on bat0.

    The code should probably create vlan objects on the fly and
    transparently transport these VLAN-tagged Ethernet frames, but until
    this is done, at least the WARN splat should be replaced by a rate
    limited output.

    Fixes: 354136bcc3c4 ("batman-adv: fix kernel crash due to missing NULL checks")
    Signed-off-by: Simon Wunderlich
    Signed-off-by: Marek Lindner
    Signed-off-by: Sven Eckelmann
    Signed-off-by: David S. Miller

    Simon Wunderlich
     

28 Jun, 2016

3 commits

    The bridge is falsely dropping ipv6 multicast packets if there is:
    1. No ipv6 address assigned on the bridge.
    2. No external mld querier present.
    3. The internal querier enabled.

    When the bridge fails to build mld queries, because it has no
    ipv6 address, it silently returns, but keeps the local querier enabled.
    This specific case causes confusing packet loss.

    Ipv6 multicast snooping can only work if:
    a) An external querier is present
    OR
    b) The bridge has an ipv6 address and is capable of sending its own queries

    Otherwise it has to forward/flood the ipv6 multicast traffic,
    because snooping cannot work.

    This patch fixes the issue by adding a flag to the bridge struct that
    indicates that there is currently no ipv6 address assigned to the bridge
    and returns a false state for the local querier in
    __br_multicast_querier_exists().

    Special thanks to Linus Lüssing.

    Fixes: d1d81d4c3dd8 ("bridge: check return value of ipv6_dev_get_saddr()")
    Signed-off-by: Daniel Danzberger
    Acked-by: Linus Lüssing
    Signed-off-by: David S. Miller

    Daniel
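
The condition under which snooping can work, as stated in the message above, can be written as a small predicate. This is a sketch of the decision logic only, not the code in __br_multicast_querier_exists(); the function and parameter names are hypothetical.

```python
def querier_exists(other_querier_present, own_querier_enabled, has_ipv6_addr):
    """Return True if MLD snooping can rely on some querier.

    Either an external querier is present, or the bridge's own querier is
    enabled AND the bridge has an ipv6 address to send queries from.
    """
    if other_querier_present:
        return True
    return own_querier_enabled and has_ipv6_addr


def should_flood(other_querier_present, own_querier_enabled, has_ipv6_addr):
    """Without a working querier, snooping cannot work, so flood instead."""
    return not querier_exists(
        other_querier_present, own_querier_enabled, has_ipv6_addr
    )
```

The failing case from the report is should_flood(False, True, False): the local querier is enabled but has no address, so traffic has to be flooded rather than dropped.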
     
  • If a user space program (e.g., wpa_supplicant) deletes a STA entry that
    is currently in NL80211_PLINK_ESTAB state, the number of established
    plinks counter was not decremented and this could result in rejecting
    new plink establishment before really hitting the real maximum plink
    limit. For !user_mpm case, this decrementation is handled by
    mesh_plink_deactive().

    Fix this by decrementing estab_plinks on STA deletion
    (mesh_sta_cleanup() gets called from there) so that the counter has a
    correct value and the Beacon frame advertisement in Mesh Configuration
    element shows the proper value for capability to accept additional
    peers.

    Cc: stable@vger.kernel.org
    Signed-off-by: Jouni Malinen
    Signed-off-by: Johannes Berg

    Jouni Malinen
     
  • This fixes wrong-interface signaling on 32-bit platforms for entries
    created when jiffies > 2^31 + MFC_ASSERT_THRESH.

    Signed-off-by: Tom Goff
    Signed-off-by: David S. Miller

    Tom Goff
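
The threshold in this message follows from how wrap-safe time comparison works on a 32-bit counter. The sketch below emulates the kernel's time_after() idiom (a signed interpretation of the difference) in Python. It assumes the entry's last-assert timestamp starts at zero, and the values of HZ and MFC_ASSERT_THRESH are assumptions chosen for illustration; may_signal() is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF
HZ = 1000                    # assumed tick rate, for illustration only
MFC_ASSERT_THRESH = 3 * HZ   # assumed threshold value


def time_after(a, b):
    """True if a is after b, emulating ((long)(b - a) < 0) on 32 bits."""
    diff = (b - a) & MASK32
    return diff >= 0x80000000  # top bit set means negative as signed 32-bit


def may_signal(now, last_assert):
    """Rate limit: allow signaling once the threshold since last_assert passed."""
    return time_after(now, (last_assert + MFC_ASSERT_THRESH) & MASK32)
```

With last_assert == 0, may_signal() turns false as soon as now exceeds 2^31 + MFC_ASSERT_THRESH, because the difference no longer looks negative. Starting the timestamp at now - MFC_ASSERT_THRESH - 1 (modulo 2^32) instead makes the first check pass regardless of the counter's current value.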
     

27 Jun, 2016

2 commits

  • There are several places where the listener and pending or accept queue
    child sockets are accessed at the same time. Lockdep is unhappy that
    two locks from the same class are held.

    Tell lockdep that it is safe and document the lock ordering.

    Originally Claudio Imbrenda sent a similar
    patch asking whether this is safe. I have audited the code and also
    covered the vsock_pending_work() function.

    Suggested-by: Claudio Imbrenda
    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: David S. Miller

    Stefan Hajnoczi
     
    With commit 8c14586fc320 ("net: ipv6: Use passed in table for
    nexthop lookups"), the nexthop lookup is first performed on route creation
    in the passed-in table.
    However, device match is not enforced in the table lookup, so the found
    route can later be discarded due to egress device mismatch, and no
    global lookup will be performed.
    This causes the following to fail:

    ip link add dummy1 type dummy
    ip link add dummy2 type dummy
    ip link set dummy1 up
    ip link set dummy2 up
    ip route add 2001:db8:8086::/48 dev dummy1 metric 20
    ip route add 2001:db8:d34d::/64 via 2001:db8:8086::2 dev dummy1 metric 20
    ip route add 2001:db8:8086::/48 dev dummy2 metric 21
    ip route add 2001:db8:d34d::/64 via 2001:db8:8086::2 dev dummy2 metric 21
    RTNETLINK answers: No route to host

    This change fixes the issue by enforcing device lookup in
    ip6_nh_lookup_table().

    v1->v2: updated commit message title

    Fixes: 8c14586fc320 ("net: ipv6: Use passed in table for nexthop lookups")
    Reported-and-tested-by: Beniamino Galvani
    Signed-off-by: Paolo Abeni
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Paolo Abeni
     

24 Jun, 2016

3 commits

  • If the packet was dropped by lower qdisc, then we must not
    access it later.

    Save qdisc_pkt_len(skb) in a temp variable.

    Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too")
    Signed-off-by: Eric Dumazet
    Cc: WANG Cong
    Cc: Jamal Hadi Salim
    Cc: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     
  • Alexey reported that we have GFP_KERNEL allocation when
    holding the spinlock tcf_lock. Actually we don't have
    to take that spinlock for all the cases, especially
    for the new one we just create. To modify the existing
    actions, we still need this spinlock to make sure
    the whole update is atomic.

    For net-next, we can get rid of this spinlock because
    we already hold the RTNL lock on slow path, and on fast
    path we can use RCU to protect the metalist.

    Joint work with Jamal.

    Reported-by: Alexey Khoroshilov
    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     

23 Jun, 2016

3 commits

  • Blair Steven noticed that ESN in conjunction with UDP encapsulation
    is broken because we set the temporary ESP header to the wrong spot.

    This patch fixes this by first of all using the right spot, i.e.,
    4 bytes off the real ESP header, and then saving this information
    so that after encryption we can restore it properly.

    Fixes: 7021b2e1cddd ("esp4: Switch to new AEAD interface")
    Reported-by: Blair Steven
    Signed-off-by: Herbert Xu
    Acked-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • When extracting an individual message from a received "bundle" buffer,
    we just create a clone of the base buffer, and adjust it to point into
    the right position of the linearized data area of the latter. This works
    well for regular message reception, but during periods of extremely high
    load it may happen that an extracted buffer, e.g., a connection probe, is
    reversed and forwarded through an external interface while the preceding
    extracted message is still unhandled. When this happens, the header or
    data area of the preceding message will be partially overwritten by a
    MAC header, leading to unpredictable consequences, such as a link
    reset.

    We now fix this by ensuring that the msg_reverse() function never
    returns a cloned buffer, and that the returned buffer always contains
    sufficient valid head and tail room to be forwarded.

    Reported-by: Erik Hugne
    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • Every open of /proc/net/kcm leaks 16 bytes of memory as is reported by
    kmemleak:
    unreferenced object 0xffff88059c0e3458 (size 192):
    comm "cat", pid 1401, jiffies 4294935742 (age 310.720s)
    hex dump (first 32 bytes):
    28 45 71 96 05 88 ff ff 00 10 00 00 00 00 00 00 (Eq.............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    [] kmem_cache_alloc_trace+0x16e/0x230
    [] seq_open+0x79/0x1d0
    [] kcm_seq_open+0x0/0x30 [kcm]
    [] seq_open+0x79/0x1d0
    [] __seq_open_private+0x2f/0xa0
    [] seq_open_net+0x38/0xa0
    ...

    It is caused by a missing free in the ->release path. So fix it by
    providing seq_release_net as the ->release method.

    Signed-off-by: Jiri Slaby
    Fixes: cd6e111bf5 (kcm: Add statistics and proc interfaces)
    Cc: "David S. Miller"
    Cc: Tom Herbert
    Cc: netdev@vger.kernel.org
    Signed-off-by: David S. Miller

    Jiri Slaby
     

19 Jun, 2016

2 commits

  • Fix coding style issues in the following files:

    ib_cm.c: add space
    loop.c: convert spaces to tabs
    sysctl.c: add space
    tcp.h: convert spaces to tabs
    tcp_connect.c: remove extra indentation in switch statement
    tcp_recv.c: convert spaces to tabs
    tcp_send.c: convert spaces to tabs
    transport.c: move brace up one line on for statement

    Signed-off-by: Joshua Houghton
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Joshua Houghton
     
    A socket connection made in ax.25 is not closed when the session is
    completed. The heartbeat timer is stopped prematurely, and this is
    where the socket gets closed. Allow the heartbeat timer to run to close
    the socket. The symptom occurs in kernels >= 4.2.0.

    Originally sent 6/15/2016. Resend with distribution list matching
    scripts/maintainer.pl output.

    Signed-off-by: Basil Gunn
    Signed-off-by: David S. Miller

    Basil Gunn
     

18 Jun, 2016

3 commits

  • The state of the rds_connection after rds_tcp_reset_callbacks() would
    be RDS_CONN_RESETTING and this is the value that should be passed
    by rds_tcp_accept_one() to rds_connect_path_complete() to transition
    the socket to RDS_CONN_UP.

    Fixes: b5c21c0947c1 ("RDS: TCP: fix race windows in send-path quiescence
    by rds_tcp_accept_one()")
    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     
  • We sometimes observe a 'deadly embrace' type deadlock occurring
    between mutually connected sockets on the same node. This happens
    when the one-hour peer supervision timers happen to expire
    simultaneously in both sockets.

    The scenario is as follows:

    CPU 1:                      CPU 2:
    --------                    --------
    tipc_sk_timeout(sk1)        tipc_sk_timeout(sk2)
    lock(sk1.slock)             lock(sk2.slock)
    msg_create(probe)           msg_create(probe)
    unlock(sk1.slock)           unlock(sk2.slock)
    tipc_node_xmit_skb()        tipc_node_xmit_skb()
    tipc_node_xmit()            tipc_node_xmit()
    tipc_sk_rcv(sk2)            tipc_sk_rcv(sk1)
    lock(sk2.slock)             lock(sk1.slock)
    filter_rcv()                filter_rcv()
    tipc_sk_proto_rcv()         tipc_sk_proto_rcv()
    msg_create(probe_rsp)       msg_create(probe_rsp)
    tipc_sk_respond()           tipc_sk_respond()
    tipc_node_xmit_skb()        tipc_node_xmit_skb()
    tipc_node_xmit()            tipc_node_xmit()
    tipc_sk_rcv(sk1)            tipc_sk_rcv(sk2)
    lock(sk1.slock)             lock(sk2.slock)
    ===> DEADLOCK               ===> DEADLOCK

    Further analysis reveals that there are three different locations in the
    socket code where tipc_sk_respond() is called within the context of the
    socket lock, with ensuing risk of similar deadlocks.

    We now solve this by passing a buffer queue along with all upcalls where
    sk_lock.slock may potentially be held. Response or rejected message
    buffers are accumulated into this queue instead of being sent out
    directly, and only sent once we know we are safely outside the slock
    context.

    Reported-by: GUNA
    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
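
The approach described above, queueing responses while a socket lock is held and sending them only after it is released, can be sketched in user space as follows. This is an illustration of the pattern, not the TIPC code: Sock, rcv(), deliver() and timeout() are hypothetical names, and threading.Lock stands in for the socket spinlock.

```python
import threading
from collections import deque


class Sock:
    """Stand-in for a socket with its own lock and a connected peer."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.peer = None
        self.received = []


def rcv(sk, msg, xmitq):
    """Handle msg under sk's lock; queue any response instead of sending it."""
    with sk.lock:
        sk.received.append(msg)
        if msg == "probe":
            xmitq.append((sk.peer, "probe_rsp"))


def deliver(sk, msg):
    """Deliver msg, then send queued responses once no lock is held."""
    pending = deque([(sk, msg)])
    while pending:
        dst, m = pending.popleft()
        rcv(dst, m, pending)  # dst.lock is released before the next send


def timeout(sk):
    """Supervision timer: build the probe under the lock, send it afterwards."""
    with sk.lock:
        probe = "probe"
    deliver(sk.peer, probe)
```

Since no code path ever takes a second socket lock while holding the first, two timers firing at once on mutually connected sockets cannot end up waiting on each other.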
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains Netfilter fixes for your net tree,
    they are rather small patches but fixing several outstanding bugs in
    nf_conntrack and nf_tables, as well as minor problems with missing
    SYNPROXY header uapi installation:

    1) Oneliner not to leak conntrack kmemcache on module removal, this
    problem was introduced in the previous merge window, patch from
    Florian Westphal.

    2) Two fixes for insufficient ruleset loop validation, one due to
    incorrect flag check in nf_tables_bind_set() and another related to
    silly wrong generation mask logic from the walk path, from Liping
    Zhang.

    3) Fix double-free of anonymous sets on error, this fix simplifies the
    code to let the abort path take care of releasing the set object,
    also from Liping Zhang.

    4) The introduction of helper function for transactions broke the skip
    inactive rules logic from the nft_do_chain(), again from Liping
    Zhang.

    5) Two patches to install uapi xt_SYNPROXY.h header and calm down
    kbuild robot due to missing #include .
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

17 Jun, 2016

3 commits

  • Pull nfsd bugfixes from Bruce Fields:
    "Oleg Drokin found and fixed races in the nfsd4 state code that go back
    to the big nfs4_lock_state removal around 3.17 (but that were also
    probably hard to reproduce before client changes in 3.20 allowed the
    client to perform parallel opens).

    Also fix a 4.1 backchannel crash due to rpc multipath changes in 4.6.
    Trond acked the client-side rpc fixes going through my tree"

    * tag 'nfsd-4.7-1' of git://linux-nfs.org/~bfields/linux:
    nfsd: Make init_open_stateid() a bit more whole
    nfsd: Extend the mutex holding region around in nfsd4_process_open2()
    nfsd: Always lock state exclusively.
    rpc: share one xps between all backchannels
    nfsd4/rpc: move backchannel create logic into rpc code
    SUNRPC: fix xprt leak on xps allocation failure
    nfsd: Fix NFSD_MDS_PR_KEY on 32-bit by adding ULL postfix

    Linus Torvalds
     
  • Pull overlayfs fixes from Miklos Szeredi:
    "This contains two regression fixes: one for the xattr API update and
    one for using the mounter's creds in file creation in overlayfs.

    There's also a fix for a bug in handling hard linked AF_UNIX sockets
    that's been there from day one. This fix is overlayfs only despite
    the fact that it touches code outside the overlay filesystem: d_real()
    is an identity function for all except overlay dentries"

    * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    ovl: fix uid/gid when creating over whiteout
    ovl: xattr filter fix
    af_unix: fix hard linked sockets on overlay
    vfs: add d_real_inode() helper

    Linus Torvalds
     
  • Since 32b8a8e59c9c ("sit: add IPv4 over IPv4 support")
    ipip6_err() may be called for packets whose IP protocol is
    IPPROTO_IPIP as well as those whose IP protocol is IPPROTO_IPV6.

    In the case of IPPROTO_IPIP packets the correct protocol value is not
    passed to ipv4_update_pmtu() or ipv4_redirect().

    This patch resolves this problem by using the IP protocol of the packet
    rather than a hard-coded value. This appears to be consistent
    with the usage of the protocol of a packet by icmp_socket_deliver()
    the caller of ipip6_err().

    I was able to exercise the redirect case by using a setup where an ICMP
    redirect was received for the destination of the encapsulated packet.
    However, it appears that although incorrect the protocol field is not used
    in this case and thus no problem manifests. On inspection it does not
    appear that a problem will manifest in the fragmentation needed/update pmtu
    case either.

    In short I believe this is a cosmetic fix. Nonetheless, the use of
    IPPROTO_IPV6 seems wrong and confusing.

    Reviewed-by: Dinan Gunawardena
    Signed-off-by: Simon Horman
    Acked-by: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    Simon Horman
     

16 Jun, 2016

2 commits

  • The ctx structure passed into bpf programs is different depending on bpf
    program type. The verifier incorrectly marked ctx->data and ctx->data_end
    access based on ctx offset only. That caused loads in tracing programs
    int bpf_prog(struct pt_regs *ctx) { .. ctx->ax .. }
    to be incorrectly marked as PTR_TO_PACKET which later caused verifier
    to reject the program that was actually valid in tracing context.
    Fix this by doing program type specific matching of ctx offsets.

    Fixes: 969bf05eb3ce ("bpf: direct packet access")
    Reported-by: Sasha Goldshtein
    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • 1) gre_parse_header() can be called from gre_err()

    At this point transport header points to ICMP header, not the inner
    header.

    2) We can not really change transport header as ipgre_err() will later
    assume transport header still points to ICMP header (using icmp_hdr())

    3) pskb_may_pull() logic in gre_parse_header() only works
    if we are interested in the zone pointed to by skb->data

    4) As Jiri explained in commit b7f8fe251e46 ("gre: do not pull header in
    ICMP error processing") we should not pull headers in error handler.

    So this fix:

    A) Changes gre_parse_header() to use skb->data instead of
    skb_transport_header()

    B) Adds a nhs parameter to gre_parse_header() so that we can skip the
    not pulled IP header from the error path.
    This offset is 0 for the normal receive path.

    C) Removes obsolete IPV6 includes

    Signed-off-by: Eric Dumazet
    Cc: Tom Herbert
    Cc: Maciej Żenczykowski
    Cc: Jiri Benc
    Signed-off-by: David S. Miller

    Eric Dumazet