Eric Lee / smarc-fsl-linux-kernel

06 Jul, 2016

2 commits

903ce4abd ipv6: Fix mem leak in rt6i_pcpu ... Browse Code »

It was first reported and reproduced by Petr (thanks!) in
https://bugzilla.kernel.org/show_bug.cgi?id=119581

free_percpu(rt->rt6i_pcpu) used to always happen in ip6_dst_destroy().

However, after fixing a deadlock bug in
commit 9c7370a166b4 ("ipv6: Fix a potential deadlock when creating pcpu rt"),
free_percpu() is not called before setting non_pcpu_rt->rt6i_pcpu to NULL.

It is worth to note that rt6i_pcpu is protected by table->tb6_lock.

kmemleak somehow did not report it. We nailed it down by
observing the pcpu entries in /proc/vmallocinfo (first suggested
by Hannes, thanks!).

Signed-off-by: Martin KaFai Lau
Fixes: 9c7370a166b4 ("ipv6: Fix a potential deadlock when creating pcpu rt")
Reported-by: Petr Novopashenniy
Tested-by: Petr Novopashenniy
Acked-by: Hannes Frederic Sowa
Cc: Hannes Frederic Sowa
Cc: Petr Novopashenniy
Signed-off-by: David S. Miller

Martin KaFai Lau
2016-07-06 05:09:23 +0800
ab58298cf net: fix decnet rtnexthop parsing ... Browse Code »

dn_fib_count_nhs() could enter an infinite loop if nhp->rtnh_len == 0
(i.e. if userspace passes a malformed netlink message).

Let's use the helpers from net/nexthop.h which take care of all this
stuff. We can do exactly the same as e.g. fib_count_nexthops() and
fib_get_nhs() from net/ipv4/fib_semantics.c.

This fixes the softlockup for me.

Cc: Thomas Graf
Signed-off-by: Vegard Nossum
Signed-off-by: David S. Miller

Vegard Nossum
2016-07-06 05:08:47 +0800

05 Jul, 2016

1 commit

3dad5424a RDS: fix rds_tcp_init() error path ... Browse Code »

If register_pernet_subsys() fails, we shouldn't try to call
unregister_pernet_subsys().

Fixes: 467fa15356 ("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.")
Cc: stable@vger.kernel.org
Cc: Sowmini Varadhan
Cc: David S. Miller
Signed-off-by: Vegard Nossum
Acked-by: Sowmini Varadhan
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller

Vegard Nossum
2016-07-05 07:09:49 +0800

02 Jul, 2016

3 commits

55e77a3e8 tipc: fix nl compat regression for link statistics ... Browse Code »

Fix incorrect use of nla_strlcpy() where the first NLA_HDRLEN bytes
of the link name where left out.

Making the output of tipc-config -ls look something like:
Link statistics:
dcast-link
1:data0-1.1.2:data0
1:data0-1.1.3:data0

Also, for the record, the patch that introduce this regression
claims "Sending the whole object out can cause a leak". Which isn't
very likely as this is a compat layer, where the data we are parsing
is generated by us and we know the string to be NULL terminated. But
you can of course never be to secure.

Fixes: 5d2be1422e02 (tipc: fix an infoleak in tipc_nl_compat_link_dump)
Signed-off-by: Richard Alpe
Signed-off-by: David S. Miller

Richard Alpe
2016-07-02 04:47:38 +0800
82a31b923 net_sched: fix mirrored packets checksum ... Browse Code »

Similar to commit 9b368814b336 ("net: fix bridge multicast packet checksum validation")
we need to fixup the checksum for CHECKSUM_COMPLETE when
pushing skb on RX path. Otherwise we get similar splats.

Cc: Jamal Hadi Salim
Cc: Tom Herbert
Signed-off-by: Cong Wang
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

WANG Cong
2016-07-02 04:19:34 +0800
eb70db875 packet: Use symmetric hash for PACKET_FANOUT_HASH. ... Browse Code »

People who use PACKET_FANOUT_HASH want a symmetric hash, meaning that
they want packets going in both directions on a flow to hash to the
same bucket.

The core kernel SKB hash became non-symmetric when the ipv6 flow label
and other entities were incorporated into the standard flow hash order
to increase entropy.

But there are no users of PACKET_FANOUT_HASH who want an assymetric
hash, they all want a symmetric one.

Therefore, use the flow dissector to compute a flat symmetric hash
over only the protocol, addresses and ports. This hash does not get
installed into and override the normal skb hash, so this change has
no effect whatsoever on the rest of the stack.

Reported-by: Eric Leblond
Tested-by: Eric Leblond
Signed-off-by: David S. Miller

David S. Miller
2016-07-02 04:07:50 +0800

30 Jun, 2016

2 commits

fedbb6b4f ipv4: Fix ip_skb_dst_mtu to use the sk passed by ip_finish_output ... Browse Code »

ip_skb_dst_mtu uses skb->sk, assuming it is an AF_INET socket (e.g. it
calls ip_sk_use_pmtu which casts sk as an inet_sk).

However, in the case of UDP tunneling, the skb->sk is not necessarily an
inet socket (could be AF_PACKET socket, or AF_UNSPEC if arriving from
tun/tap).

OTOH, the sk passed as an argument throughout IP stack's output path is
the one which is of PMTU interest:
- In case of local sockets, sk is same as skb->sk;
- In case of a udp tunnel, sk is the tunneling socket.

Fix, by passing ip_finish_output's sk to ip_skb_dst_mtu.
This augments 7026b1ddb6 'netfilter: Pass socket pointer down through okfn().'

Signed-off-by: Shmulik Ladkani
Reviewed-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Shmulik Ladkani
2016-06-30 21:02:48 +0800
32826ac41 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:
"I've been traveling so this accumulates more than week or so of bug
fixing. It perhaps looks a little worse than it really is.

1) Fix deadlock in ath10k driver, from Ben Greear.

2) Increase scan timeout in iwlwifi, from Luca Coelho.

3) Unbreak STP by properly reinjecting STP packets back into the
stack. Regression fix from Ido Schimmel.

4) Mediatek driver fixes (missing malloc failure checks, leaking of
scratch memory, wrong indexing when mapping TX buffers, etc.) from
John Crispin.

5) Fix endianness bug in icmpv6_err() handler, from Hannes Frederic
Sowa.

6) Fix hashing of flows in UDP in the ruseport case, from Xuemin Su.

7) Fix netlink notifications in ovs for tunnels, delete link messages
are never emitted because of how the device registry state is
handled. From Nicolas Dichtel.

8) Conntrack module leaks kmemcache on unload, from Florian Westphal.

9) Prevent endless jump loops in nft rules, from Liping Zhang and
Pablo Neira Ayuso.

10) Not early enough spinlock initialization in mlx4, from Eric
Dumazet.

11) Bind refcount leak in act_ipt, from Cong WANG.

12) Missing RCU locking in HTB scheduler, from Florian Westphal.

13) Several small MACSEC bug fixes from Sabrina Dubroca (missing RCU
barrier, using heap for SG and IV, and erroneous use of async flag
when allocating AEAD conext.)

14) RCU handling fix in TIPC, from Ying Xue.

15) Pass correct protocol down into ipv4_{update_pmtu,redirect}() in
SIT driver, from Simon Horman.

16) Socket timer deadlock fix in TIPC from Jon Paul Maloy.

17) Fix potential deadlock in team enslave, from Ido Schimmel.

18) Memory leak in KCM procfs handling, from Jiri Slaby.

19) ESN generation fix in ipv4 ESP, from Herbert Xu.

20) Fix GFP_KERNEL allocations with locks held in act_ife, from Cong
WANG.

21) Use after free in netem, from Eric Dumazet.

22) Uninitialized last assert time in multicast router code, from Tom
Goff.

23) Skip raw sockets in sock_diag destruction broadcast, from Willem
de Bruijn.

24) Fix link status reporting in thunderx, from Sunil Goutham.

25) Limit resegmentation of retransmit queue so that we do not
retransmit too large GSO frames. From Eric Dumazet.

26) Delay bpf program release after grace period, from Daniel
Borkmann"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (141 commits)
openvswitch: fix conntrack netlink event delivery
qed: Protect the doorbell BAR with the write barriers.
neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit()
e1000e: keep VLAN interfaces functional after rxvlan off
cfg80211: fix proto in ieee80211_data_to_8023 for frames without LLC header
qlcnic: use the correct ring in qlcnic_83xx_process_rcv_ring_diag()
bpf, perf: delay release of BPF prog after grace period
net: bridge: fix vlan stats continue counter
tcp: do not send too big packets at retransmit time
ibmvnic: fix to use list_for_each_safe() when delete items
net: thunderx: Fix TL4 configuration for secondary Qsets
net: thunderx: Fix link status reporting
net/mlx5e: Reorganize ethtool statistics
net/mlx5e: Fix number of PFC counters reported to ethtool
net/mlx5e: Prevent adding the same vxlan port
net/mlx5e: Check for BlueFlame capability before allocating SQ uar
net/mlx5e: Change enum to better reflect usage
net/mlx5: Add ConnectX-5 PCIe 4.0 to list of supported devices
net/mlx5: Update command strings
net: marvell: Add separate config ANEG function for Marvell 88E1111
...

Linus Torvalds
2016-06-30 02:50:42 +0800

29 Jun, 2016

11 commits

751ad819b Merge tag 'mac80211-for-davem-2016-06-29-v2' of git://git.kernel.org/pub/scm/lin… ... Browse Code »

…ux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Just two small fixes
* fix mesh peer link counter, decrement wasn't always done at all
* fix ethertype (length) for packets without RFC 1042 or bridge
tunnel header
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

David S. Miller
2016-06-29 20:33:46 +0800
d913d3a76 openvswitch: fix conntrack netlink event delivery ... Browse Code »

Only the first and last netlink message for a particular conntrack are
actually sent. The first message is sent through nf_conntrack_confirm when
the conntrack is committed. The last one is sent when the conntrack is
destroyed on timeout. The other conntrack state change messages are not
advertised.

When the conntrack subsystem is used from netfilter, nf_conntrack_confirm
is called for each packet, from the postrouting hook, which in turn calls
nf_ct_deliver_cached_events to send the state change netlink messages.

This commit fixes the problem by calling nf_ct_deliver_cached_events in the
non-commit case as well.

Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
CC: Joe Stringer
CC: Justin Pettit
CC: Andy Zhou
CC: Thomas Graf
Signed-off-by: Samuel Gauthier
Acked-by: Joe Stringer
Signed-off-by: David S. Miller

Samuel Gauthier
2016-06-29 20:13:59 +0800
b560f03dd neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit() ... Browse Code »

neigh_xmit() expects to be called inside an RCU-bh read side critical
section, and while one of its two current callers gets this right, the
other one doesn't.

More specifically, neigh_xmit() has two callers, mpls_forward() and
mpls_output(), and while both callers call neigh_xmit() under
rcu_read_lock(), this provides sufficient protection for neigh_xmit()
only in the case of mpls_forward(), as that is always called from
softirq context and therefore doesn't need explicit BH protection,
while mpls_output() can be called from process context with softirqs
enabled.

When mpls_output() is called from process context, with softirqs
enabled, we can be preempted by a softirq at any time, and RCU-bh
considers the completion of a softirq as signaling the end of any
pending read-side critical sections, so if we do get a softirq
while we are in the part of neigh_xmit() that expects to be run inside
an RCU-bh read side critical section, we can end up with an unexpected
RCU grace period running right in the middle of that critical section,
making things go boom.

This patch fixes this impedance mismatch in the callee, by making
neigh_xmit() always take rcu_read_{,un}lock_bh() around the code that
expects to be treated as an RCU-bh read side critical section, as this
seems a safer option than fixing it in the callers.

Fixes: 4fd3d7d9e868f ("neigh: Add helper function neigh_xmit")
Signed-off-by: David Barroso
Signed-off-by: Lennert Buytenhek
Acked-by: David Ahern
Acked-by: Robert Shearman
Signed-off-by: David S. Miller

David Barroso
2016-06-29 19:58:28 +0800
c041778c9 cfg80211: fix proto in ieee80211_data_to_8023 for frames without LLC header ... Browse Code »

The PDU length of incoming LLC frames is set to the total skb payload size
in __ieee80211_data_to_8023() of net/wireless/util.c which incorrectly
includes the length of the IEEE 802.11 header.

The resulting LLC frame header has a too large PDU length, causing the
llc_fixup_skb() function of net/llc/llc_input.c to reject the incoming
skb, effectively breaking STP.

Solve the problem by properly substracting the IEEE 802.11 frame header size
from the PDU length, allowing the LLC processor to pick up the incoming
control messages.

Special thanks to Gerry Rozema for tracking down the regression and proposing
a suitable patch.

Fixes: 2d1c304cb2d5 ("cfg80211: add function for 802.3 conversion with separate output buffer")
Cc: stable@vger.kernel.org
Reported-by: Gerry Rozema
Signed-off-by: Felix Fietkau
Signed-off-by: Johannes Berg

Felix Fietkau
2016-06-29 17:50:33 +0800
565ce8f32 net: bridge: fix vlan stats continue counter ... Browse Code »

I made a dumb off-by-one mistake when I added the vlan stats counter
dumping code. The increment should happen before the check, not after
otherwise we miss one entry when we continue dumping.

Fixes: a60c090361ea ("bridge: netlink: export per-vlan stats")
Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Nikolay Aleksandrov
2016-06-29 17:33:35 +0800
a3d2e9f8e tcp: do not send too big packets at retransmit time ... Browse Code »

Arjun reported a bug in TCP stack and bisected it to a recent commit.

In case where we process SACK, we can coalesce multiple skbs
into fat ones (tcp_shift_skb_data()), to lower write queue
overhead, because we do not expect to retransmit these packets.

However, SACK reneging can happen, forcing the sender to retransmit
all these packets. If skb->len is above 64KB, we then send buggy
IP packets that could hang TSO engine on cxgb4.

Neal suggested to use tcp_tso_autosize() instead of tp->gso_segs
so that we cook packets of optimal size vs TCP/pacing.

Thanks to Arjun for reporting the bug and running the tests !

Fixes: 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time")
Signed-off-by: Eric Dumazet
Reported-by: Arjun V
Tested-by: Arjun V
Acked-by: Neal Cardwell
Signed-off-by: David S. Miller

Eric Dumazet
2016-06-29 17:25:11 +0800
420cb1b76 batman-adv: Clean up untagged vlan when destroying via rtnl-link ... Browse Code »

The untagged vlan object is only destroyed when the interface is removed
via the legacy sysfs interface. But it also has to be destroyed when the
standard rtnl-link interface is used.

Fixes: 5d2c05b21337 ("batman-adv: add per VLAN interface attribute framework")
Signed-off-by: Sven Eckelmann
Acked-by: Antonio Quartulli
Signed-off-by: Marek Lindner
Signed-off-by: David S. Miller

Sven Eckelmann
2016-06-29 16:01:48 +0800
3b55e4422 batman-adv: Fix ICMP RR ethernet access after skb_linearize ... Browse Code »

The skb_linearize may reallocate the skb. This makes the calculated pointer
for ethhdr invalid. But it the pointer is used later to fill in the RR
field of the batadv_icmp_packet_rr packet.

Instead re-evaluate eth_hdr after the skb_linearize+skb_cow to fix the
pointer and avoid the invalid read.

Fixes: da6b8c20a5b8 ("batman-adv: generalize batman-adv icmp packet handling")
Signed-off-by: Sven Eckelmann
Signed-off-by: Marek Lindner
Signed-off-by: David S. Miller

Sven Eckelmann
2016-06-29 16:01:48 +0800
baceced93 batman-adv: Fix double-put of vlan object ... Browse Code »

Each batadv_tt_local_entry hold a single reference to a
batadv_softif_vlan. In case a new entry cannot be added to the hash
table, the error path puts the reference, but the reference will also
now be dropped by batadv_tt_local_entry_release().

Fixes: a33d970d0b54 ("batman-adv: Fix reference counting of vlan object for tt_local_entry")
Signed-off-by: Ben Hutchings
Signed-off-by: Marek Lindner
Signed-off-by: Sven Eckelmann
Signed-off-by: David S. Miller

Ben Hutchings
2016-06-29 16:01:47 +0800
9c4604a29 batman-adv: Fix use-after-free/double-free of tt_req_node ... Browse Code »

The tt_req_node is added and removed from a list inside a spinlock. But the
locking is sometimes removed even when the object is still referenced and
will be used later via this reference. For example batadv_send_tt_request
can create a new tt_req_node (including add to a list) and later
re-acquires the lock to remove it from the list and to free it. But at this
time another context could have already removed this tt_req_node from the
list and freed it.

CPU#0

batadv_batman_skb_recv from net_device 0
-> batadv_iv_ogm_receive
-> batadv_iv_ogm_process
-> batadv_iv_ogm_process_per_outif
-> batadv_tvlv_ogm_receive
-> batadv_tvlv_ogm_receive
-> batadv_tvlv_containers_process
-> batadv_tvlv_call_handler
-> batadv_tt_tvlv_ogm_handler_v1
-> batadv_tt_update_orig
-> batadv_send_tt_request
-> batadv_tt_req_node_new
spin_lock(...)
allocates new tt_req_node and adds it to list
spin_unlock(...)
return tt_req_node

CPU#1

batadv_batman_skb_recv from net_device 1
-> batadv_recv_unicast_tvlv
-> batadv_tvlv_containers_process
-> batadv_tvlv_call_handler
-> batadv_tt_tvlv_unicast_handler_v1
-> batadv_handle_tt_response
spin_lock(...)
tt_req_node gets removed from list and is freed
spin_unlock(...)

CPU#0

Tested-by: Martin Weinelt
Tested-by: Amadeus Alfa
Signed-off-by: Marek Lindner
Signed-off-by: David S. Miller

Sven Eckelmann
2016-06-29 16:01:47 +0800
0b3dd7dfb batman-adv: replace WARN with rate limited output on non-existing VLAN ... Browse Code »

If a VLAN tagged frame is received and the corresponding VLAN is not
configured on the soft interface, it will splat a WARN on every packet
received. This is a quite annoying behaviour for some scenarios, e.g. if
bat0 is bridged with eth0, and there are arbitrary VLAN tagged frames
from Ethernet coming in without having any VLAN configuration on bat0.

The code should probably create vlan objects on the fly and
transparently transport these VLAN-tagged Ethernet frames, but until
this is done, at least the WARN splat should be replaced by a rate
limited output.

Fixes: 354136bcc3c4 ("batman-adv: fix kernel crash due to missing NULL checks")
Signed-off-by: Simon Wunderlich
Signed-off-by: Marek Lindner
Signed-off-by: Sven Eckelmann
Signed-off-by: David S. Miller

Simon Wunderlich
2016-06-29 16:01:47 +0800

28 Jun, 2016

3 commits

0888d5f3c Bridge: Fix ipv6 mc snooping if bridge has no ipv6 address ... Browse Code »

The bridge is falsly dropping ipv6 mulitcast packets if there is:
1. No ipv6 address assigned on the brigde.
2. No external mld querier present.
3. The internal querier enabled.

When the bridge fails to build mld queries, because it has no
ipv6 address, it slilently returns, but keeps the local querier enabled.
This specific case causes confusing packet loss.

Ipv6 multicast snooping can only work if:
a) An external querier is present
OR
b) The bridge has an ipv6 address an is capable of sending own queries

Otherwise it has to forward/flood the ipv6 multicast traffic,
because snooping cannot work.

This patch fixes the issue by adding a flag to the bridge struct that
indicates that there is currently no ipv6 address assinged to the bridge
and returns a false state for the local querier in
__br_multicast_querier_exists().

Special thanks to Linus Lüssing.

Fixes: d1d81d4c3dd8 ("bridge: check return value of ipv6_dev_get_saddr()")
Signed-off-by: Daniel Danzberger
Acked-by: Linus Lüssing
Signed-off-by: David S. Miller

Daniel
2016-06-28 20:03:04 +0800
126e75573 mac80211: Fix mesh estab_plinks counting in STA removal case ... Browse Code »

If a user space program (e.g., wpa_supplicant) deletes a STA entry that
is currently in NL80211_PLINK_ESTAB state, the number of established
plinks counter was not decremented and this could result in rejecting
new plink establishment before really hitting the real maximum plink
limit. For !user_mpm case, this decrementation is handled by
mesh_plink_deactive().

Fix this by decrementing estab_plinks on STA deletion
(mesh_sta_cleanup() gets called from there) so that the counter has a
correct value and the Beacon frame advertisement in Mesh Configuration
element shows the proper value for capability to accept additional
peers.

Cc: stable@vger.kernel.org
Signed-off-by: Jouni Malinen
Signed-off-by: Johannes Berg

Jouni Malinen
2016-06-28 18:39:50 +0800
70a0dec45 ipmr/ip6mr: Initialize the last assert time of mfc entries. ... Browse Code »

This fixes wrong-interface signaling on 32-bit platforms for entries
created when jiffies > 2^31 + MFC_ASSERT_THRESH.

Signed-off-by: Tom Goff
Signed-off-by: David S. Miller

Tom Goff
2016-06-28 16:14:09 +0800

27 Jun, 2016

2 commits

4192f672f vsock: make listener child lock ordering explicit ... Browse Code »

There are several places where the listener and pending or accept queue
child sockets are accessed at the same time. Lockdep is unhappy that
two locks from the same class are held.

Tell lockdep that it is safe and document the lock ordering.

Originally Claudio Imbrenda sent a similar
patch asking whether this is safe. I have audited the code and also
covered the vsock_pending_work() function.

Suggested-by: Claudio Imbrenda
Signed-off-by: Stefan Hajnoczi
Signed-off-by: David S. Miller

Stefan Hajnoczi
2016-06-27 22:44:46 +0800
48f1dcb55 ipv6: enforce egress device match in per table nexthop lookups ... Browse Code »

with the commit 8c14586fc320 ("net: ipv6: Use passed in table for
nexthop lookups"), net hop lookup is first performed on route creation
in the passed-in table.
However device match is not enforced in table lookup, so the found
route can be later discarded due to egress device mismatch and no
global lookup will be performed.
This cause the following to fail:

ip link add dummy1 type dummy
ip link add dummy2 type dummy
ip link set dummy1 up
ip link set dummy2 up
ip route add 2001:db8:8086::/48 dev dummy1 metric 20
ip route add 2001:db8:d34d::/64 via 2001:db8:8086::2 dev dummy1 metric 20
ip route add 2001:db8:8086::/48 dev dummy2 metric 21
ip route add 2001:db8:d34d::/64 via 2001:db8:8086::2 dev dummy2 metric 21
RTNETLINK answers: No route to host

This change fixes the issue enforcing device lookup in
ip6_nh_lookup_table()

v1->v2: updated commit message title

Fixes: 8c14586fc320 ("net: ipv6: Use passed in table for nexthop lookups")
Reported-and-tested-by: Beniamino Galvani
Signed-off-by: Paolo Abeni
Acked-by: David Ahern
Signed-off-by: David S. Miller

Paolo Abeni
2016-06-27 22:37:20 +0800

24 Jun, 2016

3 commits

21de12ee5 netem: fix a use after free ... Browse Code »

If the packet was dropped by lower qdisc, then we must not
access it later.

Save qdisc_pkt_len(skb) in a temp variable.

Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too")
Signed-off-by: Eric Dumazet
Cc: WANG Cong
Cc: Jamal Hadi Salim
Cc: Stephen Hemminger
Signed-off-by: David S. Miller

Eric Dumazet
2016-06-24 03:07:44 +0800
817e9f2c5 act_ife: acquire ife_mod_lock before reading ifeoplist ... Browse Code »

Cc: Jamal Hadi Salim
Signed-off-by: Cong Wang
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

WANG Cong
2016-06-24 00:02:36 +0800
067a7cd06 act_ife: only acquire tcf_lock for existing actions ... Browse Code »

Alexey reported that we have GFP_KERNEL allocation when
holding the spinlock tcf_lock. Actually we don't have
to take that spinlock for all the cases, especially
for the new one we just create. To modify the existing
actions, we still need this spinlock to make sure
the whole update is atomic.

For net-next, we can get rid of this spinlock because
we already hold the RTNL lock on slow path, and on fast
path we can use RCU to protect the metalist.

Joint work with Jamal.

Reported-by: Alexey Khoroshilov
Cc: Jamal Hadi Salim
Signed-off-by: Cong Wang
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

WANG Cong
2016-06-24 00:02:36 +0800

23 Jun, 2016

3 commits

962fcef33 esp: Fix ESN generation under UDP encapsulation ... Browse Code »

Blair Steven noticed that ESN in conjunction with UDP encapsulation
is broken because we set the temporary ESP header to the wrong spot.

This patch fixes this by first of all using the right spot, i.e.,
4 bytes off the real ESP header, and then saving this information
so that after encryption we can restore it properly.

Fixes: 7021b2e1cddd ("esp4: Switch to new AEAD interface")
Reported-by: Blair Steven
Signed-off-by: Herbert Xu
Acked-by: Steffen Klassert
Signed-off-by: David S. Miller

Herbert Xu
2016-06-23 23:52:00 +0800
27777daa8 tipc: unclone unbundled buffers before forwarding ... Browse Code »

When extracting an individual message from a received "bundle" buffer,
we just create a clone of the base buffer, and adjust it to point into
the right position of the linearized data area of the latter. This works
well for regular message reception, but during periods of extremely high
load it may happen that an extracted buffer, e.g, a connection probe, is
reversed and forwarded through an external interface while the preceding
extracted message is still unhandled. When this happens, the header or
data area of the preceding message will be partially overwritten by a
MAC header, leading to unpredicatable consequences, such as a link
reset.

We now fix this by ensuring that the msg_reverse() function never
returns a cloned buffer, and that the returned buffer always contains
sufficient valid head and tail room to be forwarded.

Reported-by: Erik Hugne
Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2016-06-23 04:33:35 +0800
d19af0a76 kcm: fix /proc memory leak ... Browse Code »

Every open of /proc/net/kcm leaks 16 bytes of memory as is reported by
kmemleak:
unreferenced object 0xffff88059c0e3458 (size 192):
comm "cat", pid 1401, jiffies 4294935742 (age 310.720s)
hex dump (first 32 bytes):
28 45 71 96 05 88 ff ff 00 10 00 00 00 00 00 00 (Eq.............
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[] kmem_cache_alloc_trace+0x16e/0x230
[] seq_open+0x79/0x1d0
[] kcm_seq_open+0x0/0x30 [kcm]
[] seq_open+0x79/0x1d0
[] __seq_open_private+0x2f/0xa0
[] seq_open_net+0x38/0xa0
...

It is caused by a missing free in the ->release path. So fix it by
providing seq_release_net as the ->release method.

Signed-off-by: Jiri Slaby
Fixes: cd6e111bf5 (kcm: Add statistics and proc interfaces)
Cc: "David S. Miller"
Cc: Tom Herbert
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller

Jiri Slaby
2016-06-23 04:32:23 +0800

19 Jun, 2016

2 commits

5c3da57d7 net: rds: fix coding style issues ... Browse Code »

Fix coding style issues in the following files:

ib_cm.c: add space
loop.c: convert spaces to tabs
sysctl.c: add space
tcp.h: convert spaces to tabs
tcp_connect.c:remove extra indentation in switch statement
tcp_recv.c: convert spaces to tabs
tcp_send.c: convert spaces to tabs
transport.c: move brace up one line on for statement

Signed-off-by: Joshua Houghton
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller

Joshua Houghton
2016-06-19 12:34:09 +0800
4a7d99ea1 AX.25: Close socket connection on session completion ... Browse Code »

A socket connection made in ax.25 is not closed when session is
completed. The heartbeat timer is stopped prematurely and this is
where the socket gets closed. Allow heatbeat timer to run to close
socket. Symptom occurs in kernels >= 4.2.0

Originally sent 6/15/2016. Resend with distribution list matching
scripts/maintainer.pl output.

Signed-off-by: Basil Gunn
Signed-off-by: David S. Miller

Basil Gunn
2016-06-19 11:55:34 +0800

18 Jun, 2016

3 commits

3bb549ae4 RDS: TCP: rds_tcp_accept_one() should transition socket from RESETTING to UP ... Browse Code »

The state of the rds_connection after rds_tcp_reset_callbacks() would
be RDS_CONN_RESETTING and this is the value that should be passed
by rds_tcp_accept_one() to rds_connect_path_complete() to transition
the socket to RDS_CONN_UP.

Fixes: b5c21c0947c1 ("RDS: TCP: fix race windows in send-path quiescence
by rds_tcp_accept_one()")
Signed-off-by: Sowmini Varadhan
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller

Sowmini Varadhan
2016-06-18 13:29:54 +0800
f1d048f24 tipc: fix socket timer deadlock ... Browse Code »

We sometimes observe a 'deadly embrace' type deadlock occurring
between mutually connected sockets on the same node. This happens
when the one-hour peer supervision timers happen to expire
simultaneously in both sockets.

The scenario is as follows:

CPU 1: CPU 2:
-------- --------
tipc_sk_timeout(sk1) tipc_sk_timeout(sk2)
lock(sk1.slock) lock(sk2.slock)
msg_create(probe) msg_create(probe)
unlock(sk1.slock) unlock(sk2.slock)
tipc_node_xmit_skb() tipc_node_xmit_skb()
tipc_node_xmit() tipc_node_xmit()
tipc_sk_rcv(sk2) tipc_sk_rcv(sk1)
lock(sk2.slock) lock((sk1.slock)
filter_rcv() filter_rcv()
tipc_sk_proto_rcv() tipc_sk_proto_rcv()
msg_create(probe_rsp) msg_create(probe_rsp)
tipc_sk_respond() tipc_sk_respond()
tipc_node_xmit_skb() tipc_node_xmit_skb()
tipc_node_xmit() tipc_node_xmit()
tipc_sk_rcv(sk1) tipc_sk_rcv(sk2)
lock((sk1.slock) lock((sk2.slock)
===> DEADLOCK ===> DEADLOCK

Further analysis reveals that there are three different locations in the
socket code where tipc_sk_respond() is called within the context of the
socket lock, with ensuing risk of similar deadlocks.

We now solve this by passing a buffer queue along with all upcalls where
sk_lock.slock may potentially be held. Response or rejected message
buffers are accumulated into this queue instead of being sent out
directly, and only sent once we know we are safely outside the slock
context.

Reported-by: GUNA
Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2016-06-18 12:38:10 +0800
695ef16cd Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf ... Browse Code »

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree,
they are rather small patches but fixing several outstanding bugs in
nf_conntrack and nf_tables, as well as minor problems with missing
SYNPROXY header uapi installation:

1) Oneliner not to leak conntrack kmemcache on module removal, this
problem was introduced in the previous merge window, patch from
Florian Westphal.

2) Two fixes for insufficient ruleset loop validation, one due to
incorrect flag check in nf_tables_bind_set() and another related to
silly wrong generation mask logic from the walk path, from Liping
Zhang.

3) Fix double-free of anonymous sets on error, this fix simplifies the
code to let the abort path take care of releasing the set object,
also from Liping Zhang.

4) The introduction of helper function for transactions broke the skip
inactive rules logic from the nft_do_chain(), again from Liping
Zhang.

5) Two patches to install uapi xt_SYNPROXY.h header and calm down
kbuild robot due to missing #include .
====================

Signed-off-by: David S. Miller

David S. Miller
2016-06-18 10:50:04 +0800

17 Jun, 2016

3 commits

41ef72181 Merge tag 'nfsd-4.7-1' of git://linux-nfs.org/~bfields/linux ... Browse Code »

Pull nfsd bugfixes from Bruce Fields:
"Oleg Drokin found and fixed races in the nfsd4 state code that go back
to the big nfs4_lock_state removal around 3.17 (but that were also
probably hard to reproduce before client changes in 3.20 allowed the
client to perform parallel opens).

Also fix a 4.1 backchannel crash due to rpc multipath changes in 4.6.
Trond acked the client-side rpc fixes going through my tree"

* tag 'nfsd-4.7-1' of git://linux-nfs.org/~bfields/linux:
nfsd: Make init_open_stateid() a bit more whole
nfsd: Extend the mutex holding region around in nfsd4_process_open2()
nfsd: Always lock state exclusively.
rpc: share one xps between all backchannels
nfsd4/rpc: move backchannel create logic into rpc code
SUNRPC: fix xprt leak on xps allocation failure
nfsd: Fix NFSD_MDS_PR_KEY on 32-bit by adding ULL postfix

Linus Torvalds
2016-06-17 11:25:52 +0800
9c514bedb Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs ... Browse Code »

Pull overlayfs fixes from Miklos Szeredi:
"This contains two regression fixes: one for the xattr API update and
one for using the mounter's creds in file creation in overlayfs.

There's also a fix for a bug in handling hard linked AF_UNIX sockets
that's been there from day one. This fix is overlayfs only despite
the fact that it touches code outside the overlay filesystem: d_real()
is an identity function for all except overlay dentries"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: fix uid/gid when creating over whiteout
ovl: xattr filter fix
af_unix: fix hard linked sockets on overlay
vfs: add d_real_inode() helper

Linus Torvalds
2016-06-17 11:16:56 +0800
d5d8760b7 sit: correct IP protocol used in ipip6_err ... Browse Code »

Since 32b8a8e59c9c ("sit: add IPv4 over IPv4 support")
ipip6_err() may be called for packets whose IP protocol is
IPPROTO_IPIP as well as those whose IP protocol is IPPROTO_IPV6.

In the case of IPPROTO_IPIP packets the correct protocol value is not
passed to ipv4_update_pmtu() or ipv4_redirect().

This patch resolves this problem by using the IP protocol of the packet
rather than a hard-coded value. This appears to be consistent
with the usage of the protocol of a packet by icmp_socket_deliver()
the caller of ipip6_err().

I was able to exercise the redirect case by using a setup where an ICMP
redirect was received for the destination of the encapsulated packet.
However, it appears that although incorrect the protocol field is not used
in this case and thus no problem manifests. On inspection it does not
appear that a problem will manifest in the fragmentation needed/update pmtu
case either.

In short I believe this is a cosmetic fix. None the less, the use of
IPPROTO_IPV6 seems wrong and confusing.

Reviewed-by: Dinan Gunawardena
Signed-off-by: Simon Horman
Acked-by: YOSHIFUJI Hideaki
Signed-off-by: David S. Miller

Simon Horman
2016-06-17 08:10:30 +0800

16 Jun, 2016

2 commits

19de99f70 bpf: fix matching of data/data_end in verifier ... Browse Code »

The ctx structure passed into bpf programs is different depending on bpf
program type. The verifier incorrectly marked ctx->data and ctx->data_end
access based on ctx offset only. That caused loads in tracing programs
int bpf_prog(struct pt_regs *ctx) { .. ctx->ax .. }
to be incorrectly marked as PTR_TO_PACKET which later caused verifier
to reject the program that was actually valid in tracing context.
Fix this by doing program type specific matching of ctx offsets.

Fixes: 969bf05eb3ce ("bpf: direct packet access")
Reported-by: Sasha Goldshtein
Signed-off-by: Alexei Starovoitov
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller

Alexei Starovoitov
2016-06-16 14:37:54 +0800
e582615ad gre: fix error handler ... Browse Code »

1) gre_parse_header() can be called from gre_err()

At this point transport header points to ICMP header, not the inner
header.

2) We can not really change transport header as ipgre_err() will later
assume transport header still points to ICMP header (using icmp_hdr())

3) pskb_may_pull() logic in gre_parse_header() really works
if we are interested at zone pointed by skb->data

4) As Jiri explained in commit b7f8fe251e46 ("gre: do not pull header in
ICMP error processing") we should not pull headers in error handler.

So this fix :

A) changes gre_parse_header() to use skb->data instead of
skb_transport_header()

B) Adds a nhs parameter to gre_parse_header() so that we can skip the
not pulled IP header from error path.
This offset is 0 for normal receive path.

C) remove obsolete IPV6 includes

Signed-off-by: Eric Dumazet
Cc: Tom Herbert
Cc: Maciej Żenczykowski
Cc: Jiri Benc
Signed-off-by: David S. Miller

Eric Dumazet
2016-06-16 13:15:21 +0800