Eric Lee / smarc-fsl-linux-kernel

28 Sep, 2016

1 commit

078cd8279 fs: Replace CURRENT_TIME with current_time() for inode timestamps

CURRENT_TIME macro is not appropriate for filesystems as it
doesn't use the right granularity for filesystem timestamps.
Use current_time() instead.

CURRENT_TIME is also not y2038 safe.

This is also in preparation for the patch that transitions
vfs timestamps to use 64 bit time and hence make them
y2038 safe. As part of the effort current_time() will be
extended to do range checks. Hence, it is necessary for all
file system timestamps to use current_time(). Also,
current_time() will be transitioned along with vfs to be
y2038 safe.

Note that whenever a single call to current_time() is used
to change timestamps in different inodes, it is because they
share the same time granularity.
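
To make the granularity point concrete, here is a minimal user-space sketch of rounding a timestamp down to a filesystem's timestamp granularity. It is not the kernel's current_time(); the helper name trunc_to_granularity and its gran_ns parameter are hypothetical and exist only for illustration.

```c
#include <time.h>

/*
 * Hypothetical helper: round ts down to a multiple of gran_ns nanoseconds.
 * gran_ns is assumed to lie in the range 1..1000000000.
 */
struct timespec trunc_to_granularity(struct timespec ts, long gran_ns)
{
    if (gran_ns >= 1000000000L)
        ts.tv_nsec = 0;                      /* one-second granularity */
    else if (gran_ns > 1)
        ts.tv_nsec -= ts.tv_nsec % gran_ns;  /* drop the sub-granule part */
    return ts;
}
```

With a granularity of 1000 ns, a tv_nsec of 123456789 becomes 123456000; with a granularity of one second it becomes 0.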

Signed-off-by: Deepa Dinamani
Reviewed-by: Arnd Bergmann
Acked-by: Felipe Balbi
Acked-by: Steven Whitehouse
Acked-by: Ryusuke Konishi
Acked-by: David Sterba
Signed-off-by: Al Viro

Deepa Dinamani
2016-09-28 09:06:21 +0800

31 Aug, 2016

1 commit

0cf21c660 Merge tag 'nfs-for-4.8-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:

Stable patches:
- Fix a refcount leak in nfs_callback_up_net
- Fix an Oopsable condition when the flexfile pNFS driver connection
to the DS fails
- Fix an Oopsable condition in NFSv4.1 server callback races
- Ensure pNFS clients stop doing I/O to the DS if their lease has
expired, as required by the NFSv4.1 protocol

Bugfixes:
- Fix potential looping in the NFSv4.x migration code
- Patch series to close callback races for OPEN, LAYOUTGET and
LAYOUTRETURN
- Silence WARN_ON when NFSv4.1 over RDMA is in use
- Fix a LAYOUTCOMMIT race in the pNFS/blocks client
- Fix pNFS timeout issues when the DS fails"

* tag 'nfs-for-4.8-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4.x: Fix a refcount leak in nfs_callback_up_net
NFS4: Avoid migration loops
pNFS/flexfiles: Fix an Oopsable condition when connection to the DS fails
NFSv4.1: Remove obsolete and incorrect assignment in nfs4_callback_sequence
NFSv4.1: Close callback races for OPEN, LAYOUTGET and LAYOUTRETURN
NFSv4.1: Defer bumping the slot sequence number until we free the slot
NFSv4.1: Delay callback processing when there are referring triples
NFSv4.1: Fix Oopsable condition in server callback races
SUNRPC: Silence WARN_ON when NFSv4.1 over RDMA is in use
pnfs/blocklayout: update last_write_offset atomically with extents
pNFS: The client must not do I/O to the DS if its lease has expired
pNFS: Handle NFS4ERR_OLD_STATEID correctly in LAYOUTSTAT calls
pNFS/flexfiles: Set reasonable default retrans values for the data channel
NFS: Allow the mount option retrans=0
pNFS/flexfiles: Fix layoutstat periodic reporting

Linus Torvalds
2016-08-31 02:14:02 +0800

27 Aug, 2016

1 commit

5c1f5b457 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Johan Hedberg says:

====================
pull request: bluetooth 2016-08-25

Here are a couple of important Bluetooth fixes for the 4.8 kernel:

- Memory leak fix for HCI requests
- Fix sk_filter handling with L2CAP
- Fix sock_recvmsg behavior when MSG_TRUNC is not set

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller

David S. Miller
2016-08-27 12:09:17 +0800

26 Aug, 2016

4 commits

166ee5b87 qdisc: fix a module refcount leak in qdisc_create_dflt()

Should qdisc_alloc() fail, we must release the module refcount
we got right before.
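
The error-path pattern described above can be sketched in user space as follows. This is not the kernel code: struct module_ref, try_get(), put() and create_with_ref() are hypothetical stand-ins, and the simulate_failure flag exists only to exercise the failing branch.

```c
#include <stdlib.h>

struct module_ref { int refcount; };   /* hypothetical stand-in for a module */

static int try_get(struct module_ref *m) { m->refcount++; return 1; }
static void put(struct module_ref *m)    { m->refcount--; }

/*
 * Take a reference, then allocate. If the allocation fails, drop the
 * reference taken just before, so the count is unchanged on failure.
 */
void *create_with_ref(struct module_ref *m, size_t size, int simulate_failure)
{
    void *obj;

    if (!try_get(m))
        return NULL;

    obj = simulate_failure ? NULL : malloc(size);
    if (!obj) {
        put(m);          /* the release that a leaking version omits */
        return NULL;
    }
    return obj;
}
```

On success the caller owns both the object and the reference; on failure it owns neither.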

Fixes: 6da7c8fcbcbd ("qdisc: allow setting default queuing discipline")
Signed-off-by: Eric Dumazet
Acked-by: John Fastabend
Signed-off-by: David S. Miller

Eric Dumazet
2016-08-26 07:44:20 +0800
a5de125dd tipc: fix the error handling in tipc_udp_enable()

Fix to return a negative error code in enable_mcast() error handling
case, and release udp socket when necessary.

Fixes: d0f91938bede ("tipc: add ip/udp media type")
Signed-off-by: Wei Yongjun
Signed-off-by: David S. Miller

Wei Yongjun
2016-08-26 07:32:34 +0800
4f34228b6 Bluetooth: Fix hci_sock_recvmsg when MSG_TRUNC is not set

Similar to bt_sock_recvmsg, MSG_TRUNC shall be checked using the original
flags, not msg_flags.

Signed-off-by: Luiz Augusto von Dentz
Signed-off-by: Marcel Holtmann

Luiz Augusto von Dentz
2016-08-26 02:58:47 +0800
90a56f72e Bluetooth: Fix bt_sock_recvmsg when MSG_TRUNC is not set

Commit b5f34f9420b50c9b5876b9a2b68e96be6d629054 attempted to introduce
proper handling for MSG_TRUNC, but recv and its variants should still work
like read if no flag is passed. Because the code may set MSG_TRUNC in
msg->msg_flags, that field must not be used for the check, as it may cause
the call to behave as if MSG_TRUNC were always set. Instead, this changes
the code to use the flags parameter, which contains the original flags.
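
A small user-space model may help show the distinction between the caller's flags and the flags reported back. It is only a sketch, not the Bluetooth socket code: recv_model() and its parameters are hypothetical, and out_flags plays the role that msg->msg_flags plays in the description above.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>   /* MSG_TRUNC */

/*
 * Hypothetical model of a datagram receive: copy at most buf_len bytes and
 * record truncation in *out_flags. The return value is the full datagram
 * length only when the caller passed MSG_TRUNC in flags; otherwise it is
 * the number of bytes copied, as read() would report.
 */
size_t recv_model(const char *dgram, size_t dgram_len,
                  char *buf, size_t buf_len,
                  int flags, int *out_flags)
{
    size_t copied = dgram_len < buf_len ? dgram_len : buf_len;

    memcpy(buf, dgram, copied);
    *out_flags = 0;
    if (copied < dgram_len)
        *out_flags |= MSG_TRUNC;     /* report truncation to the caller */

    /* Test the caller's flags, not *out_flags. */
    if (flags & MSG_TRUNC)
        return dgram_len;
    return copied;
}
```

Testing *out_flags in the last step instead would return the full length for every truncated datagram, whether or not the caller asked for it.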

Signed-off-by: Luiz Augusto von Dentz
Signed-off-by: Marcel Holtmann

Luiz Augusto von Dentz
2016-08-26 02:58:47 +0800

25 Aug, 2016

1 commit

16590a228 SUNRPC: Silence WARN_ON when NFSv4.1 over RDMA is in use

Using NFSv4.1 on RDMA should be safe, so broaden the new checks in
rpc_create().

WARN_ON_ONCE is used, matching most other WARN call sites in clnt.c.

Fixes: 39a9beab5acb ("rpc: share one xps between all backchannels")
Fixes: d50039ea5ee6 ("nfsd4/rpc: move backchannel create logic...")
Signed-off-by: Chuck Lever
Reviewed-by: J. Bruce Fields
Signed-off-by: Trond Myklebust

Chuck Lever
2016-08-25 10:32:55 +0800

24 Aug, 2016

7 commits

dbb50887c Bluetooth: split sk_filter in l2cap_sock_recv_cb

During an audit for sk_filter(), we found that rx_busy_skb handling
in l2cap_sock_recv_cb() and l2cap_sock_recvmsg() does not look quite as
intended.

The assumption from commit e328140fdacb ("Bluetooth: Use event-driven
approach for handling ERTM receive buffer") is that errors returned
from sock_queue_rcv_skb() are due to receive buffer shortage. However,
nothing should prevent doing a setsockopt() with SO_ATTACH_FILTER on
the socket, that could drop some of the incoming skbs when handled in
sock_queue_rcv_skb().

In that case sock_queue_rcv_skb() returns -EPERM, propagated from
sk_filter(), and in L2CAP_MODE_ERTM mode the wrong assumption was made
that we failed due to the receive buffer being full. From that point onwards,
because the to-be-dropped skb is held in rx_busy_skb, we cannot make
any forward progress: rx_busy_skb is never cleared from l2cap_sock_recvmsg(),
since the filter drop verdict keeps coming from sk_filter() over and over.
Meanwhile, in l2cap_sock_recv_cb() all new incoming skbs are being
dropped due to rx_busy_skb being occupied.

Instead, just use __sock_queue_rcv_skb() where an error really tells that
there's a receive buffer issue. Split the sk_filter() and enable it for
non-segmented modes at queuing time since at this point in time the skb has
already been through the ERTM state machine and it has been acked, so dropping
is not allowed. Instead, for ERTM and streaming mode, call sk_filter() in
l2cap_data_rcv() so the packet can be dropped before the state machine sees it.

Fixes: e328140fdacb ("Bluetooth: Use event-driven approach for handling ERTM receive buffer")
Signed-off-by: Daniel Borkmann
Signed-off-by: Mat Martineau
Acked-by: Willem de Bruijn
Signed-off-by: Marcel Holtmann

Daniel Borkmann
2016-08-24 22:55:04 +0800
9afee9493 Bluetooth: Fix memory leak at end of hci requests

In hci_req_sync_complete the event skb is referenced in hdev->req_skb.
It is used (via hci_req_run_skb) from either __hci_cmd_sync_ev which will
pass the skb to the caller, or __hci_req_sync which leaks.

unreferenced object 0xffff880005339a00 (size 256):
comm "kworker/u3:1", pid 1011, jiffies 4294671976 (age 107.389s)
backtrace:
[] kmemleak_alloc+0x49/0xa0
[] kmem_cache_alloc+0x128/0x180
[] skb_clone+0x4f/0xa0
[] hci_event_packet+0xc1/0x3290
[] hci_rx_work+0x18b/0x360
[] process_one_work+0x14a/0x440
[] worker_thread+0x43/0x4d0
[] kthread+0xc4/0xe0
[] ret_from_fork+0x1f/0x40
[] 0xffffffffffffffff

Signed-off-by: Frédéric Dalleau
Signed-off-by: Marcel Holtmann

Frederic Dalleau
2016-08-24 22:49:29 +0800
d7226c7a4 net: diag: Fix refcnt leak in error path destroying socket

inet_diag_find_one_icsk takes a reference to a socket that is not
released if sock_diag_destroy returns an error. Fix by changing
tcp_diag_destroy to manage the refcnt for all cases and remove
the sock_put calls from tcp_abort.

Fixes: c1e64e298b8ca ("net: diag: Support destroying TCP sockets")
Reported-by: Lorenzo Colitti
Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2016-08-24 14:11:36 +0800
75d855a5e udp: get rid of SLAB_DESTROY_BY_RCU allocations

After commit ca065d0cf80f ("udp: no longer use SLAB_DESTROY_BY_RCU")
we do not need this special allocation mode anymore, even if it is
harmless.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2016-08-24 08:46:17 +0800
232cb53a4 sctp: fix overrun in sctp_diag_dump_one()

The function sctp_diag_dump_one() currently performs a memcpy()
of 64 bytes from a 16 byte field into another 16 byte field. Fix
by using sizeof to obtain the correct size instead of a
hard-coded constant.
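
As a minimal illustration of the sizeof idiom, the sketch below copies between two 16 byte fields using the destination's size. The struct and function names are hypothetical and are not taken from sctp_diag.c.

```c
#include <string.h>

struct addr_record {
    unsigned char src[16];   /* hypothetical 16-byte address fields */
    unsigned char dst[16];
};

/* Copy src into dst using the destination's size rather than a literal. */
void copy_addr(struct addr_record *r)
{
    memcpy(r->dst, r->src, sizeof(r->dst));
}
```

If the field size ever changes, the copy length follows it automatically, which a hard-coded constant would not.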

Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file")
Signed-off-by: Lance Richardson
Reviewed-by: Xin Long
Acked-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Lance Richardson
2016-08-24 08:22:53 +0800
20a2b49fc tcp: properly scale window in tcp_v[46]_reqsk_send_ack()

When sending an ack in SYN_RECV state, we must scale the offered
window if wscale option was negotiated and accepted.

Tested:
The following packetdrill test demonstrates the issue:

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0

+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

// Establish a connection.
+0 < S 0:0(0) win 20000
+0 > S. 0:0(0) ack 1 win 28960

+0 < . 1:11(10) ack 1 win 156
// check that window is properly scaled !
+0 > . 1:1(0) ack 1 win 226
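
The arithmetic behind the expected window value can be sketched as follows. The TCP options are not visible in the listing above, so the scale shift of 7 used in the note below is an assumption; it is chosen because 28960 >> 7 = 226, which matches the two window values in the test. The function name is hypothetical.

```c
#include <stdint.h>

/*
 * Value to place in the 16-bit TCP window field for a receive window of
 * window_bytes once a scale shift of wscale has been negotiated.
 * wscale is assumed to be at most 14, the largest shift TCP allows.
 */
uint16_t scaled_window_field(uint32_t window_bytes, unsigned int wscale)
{
    uint32_t field = window_bytes >> wscale;

    return field > UINT16_MAX ? UINT16_MAX : (uint16_t)field;
}
```

Under that assumption, scaled_window_field(28960, 7) gives 226, while an unscaled ack would advertise 28960.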

Signed-off-by: Eric Dumazet
Cc: Yuchung Cheng
Cc: Neal Cardwell
Acked-by: Yuchung Cheng
Acked-by: Neal Cardwell
Signed-off-by: David S. Miller

Eric Dumazet
2016-08-24 07:55:49 +0800
e83c6744e udp: fix poll() issue with zero sized packets

Laura tracked down a poll() [and friends] regression caused by commit
e6afc8ace6dd ("udp: remove headers from UDP packets before queueing").

udp_poll() needs to know if there is a valid packet in receive queue,
even if its payload length is 0.

Change first_packet_length() to return a signed int, and use -1
as the indication of an empty queue.
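
The reason for the signed return value can be shown with a small model. This is a sketch only: struct pkt, first_len() and is_readable() are hypothetical and do not reproduce the UDP receive queue.

```c
#include <stddef.h>

struct pkt {
    struct pkt *next;
    int payload_len;          /* may legitimately be 0 */
};

/*
 * Hypothetical model: return the payload length of the first queued packet,
 * or -1 if the queue is empty, so a zero-length packet stays distinguishable.
 */
int first_len(const struct pkt *head)
{
    return head ? head->payload_len : -1;
}

/* A poll-style readiness check built on top of it. */
int is_readable(const struct pkt *head)
{
    return first_len(head) != -1;
}
```

With an unsigned return value and 0 meaning "empty", a queued packet with an empty payload would be reported as not readable.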

Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
Reported-by: Laura Abbott
Signed-off-by: Eric Dumazet
Tested-by: Laura Abbott
Signed-off-by: David S. Miller

Eric Dumazet
2016-08-24 07:39:14 +0800

23 Aug, 2016

3 commits

28a10c426 net sched: fix encoding to use real length

Encoding of the metadata was using the padded length as opposed to
the real length of the data, which is a bug per the specification.
This has not been an issue to date because all metadata specified
so far have been 32 bit, where aligned and data length are the same width.
This also includes a bug fix for validating the length of a u16 field.
But since there is no metadata of size u16 yet, we are fine to include it
here.
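
The difference between the real and the padded length can be illustrated with a generic TLV encoder. This is a sketch under assumptions, not the IFE code: the 4-byte header layout, the 4-byte alignment and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TLV_HDR_LEN 4u   /* assumed layout: 2-byte type + 2-byte length */

static size_t tlv_align(size_t n) { return (n + 3) & ~(size_t)3; }

/*
 * Hypothetical TLV encoder: the length field carries header plus real data
 * length, while the return value (bytes consumed in out) includes padding.
 * out must have room for tlv_align(TLV_HDR_LEN + dlen) bytes, and dlen is
 * assumed small enough that header plus data fits in 16 bits.
 */
size_t tlv_encode(uint8_t *out, uint16_t type, const void *data, uint16_t dlen)
{
    uint16_t real_len = (uint16_t)(TLV_HDR_LEN + dlen);
    size_t total = tlv_align(real_len);

    memset(out, 0, total);                 /* zero the padding bytes */
    out[0] = (uint8_t)(type >> 8);         /* big-endian type */
    out[1] = (uint8_t)(type & 0xff);
    out[2] = (uint8_t)(real_len >> 8);     /* big-endian real length */
    out[3] = (uint8_t)(real_len & 0xff);
    memcpy(out + TLV_HDR_LEN, data, dlen);
    return total;
}
```

For a 32 bit value the real and padded lengths are both 8, which is why the two cannot be told apart; for a 16 bit value the length field holds 6 while 8 bytes are consumed.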

While at it get rid of magic numbers.

Fixes: ef6980b6becb ("net sched: introduce IFE action")
Signed-off-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

Jamal Hadi Salim
2016-08-23 12:01:57 +0800
c0451fe1f net: ip_finish_output_gso: Allow fragmenting segments of tunneled skbs if their DF is unset

In b8247f095e,

"net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs"

gso skbs arriving from an ingress interface that go through UDP
tunneling, are allowed to be fragmented if the resulting encapsulated
segments exceed the dst mtu of the egress interface.

This aligned the behavior of gso skbs to non-gso skbs going through udp
encapsulation path.

However the non-gso vs gso anomaly is present also in the following
cases of a GRE tunnel:
- ip_gre in collect_md mode, where TUNNEL_DONT_FRAGMENT is not set
(e.g. OvS vport-gre with df_default=false)
- ip_gre in nopmtudisc mode, where IFLA_GRE_IGNORE_DF is set

In both of the above cases, the non-gso skbs get fragmented, whereas the
gso skbs (having skb_gso_network_seglen that exceeds dst mtu) get dropped,
as they don't go through the segment+fragment code path.

Fix: Set IPSKB_FRAG_SEGS if the tunnel-specified IP_DF bit is NOT set.

Tunnels that do set IP_DF will not go to fragmentation of segments.
This preserves behavior of ip_gre in (the default) pmtudisc mode.

Fixes: b8247f095e ("net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs")
Reported-by: wenxu
Cc: Hannes Frederic Sowa
Signed-off-by: Shmulik Ladkani
Tested-by: wenxu
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Shmulik Ladkani
2016-08-23 08:11:01 +0800
85b51b121 net: ipv6: Remove addresses for failures with strict DAD

If DAD fails with accept_dad set to 2, global addresses and host routes
are incorrectly left in place. Even though disable_ipv6 is set,
contrary to documentation, the addresses are not dynamically deleted
from the interface. It is only on a subsequent link down/up that these
are removed. The fix is not only to set the disable_ipv6 flag, but
also to call addrconf_ifdown(), which is the action to carry out when
disabling IPv6. This results in the addresses and routes being deleted
immediately. The DAD failure for the LL addr is determined as before
via netlink, or by the absence of the LL addr (which also previously
would have had to be checked for in case of an intervening link down
and up). As the call to addrconf_ifdown() requires an rtnl lock, the
logic to disable IPv6 when DAD fails is moved to addrconf_dad_work().

Previous behavior:

root@vm1:/# sysctl net.ipv6.conf.eth3.accept_dad=2
net.ipv6.conf.eth3.accept_dad = 2
root@vm1:/# ip -6 addr add 2000::10/64 dev eth3
root@vm1:/# ip link set up eth3
root@vm1:/# ip -6 addr show dev eth3
5: eth3: mtu 1500 qlen 1000
inet6 2000::10/64 scope global
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe43:dd5a/64 scope link tentative dadfailed
valid_lft forever preferred_lft forever
root@vm1:/# ip -6 route show dev eth3
2000::/64 proto kernel metric 256
fe80::/64 proto kernel metric 256
root@vm1:/# ip link set down eth3
root@vm1:/# ip link set up eth3
root@vm1:/# ip -6 addr show dev eth3
root@vm1:/# ip -6 route show dev eth3
root@vm1:/#

New behavior:

root@vm1:/# sysctl net.ipv6.conf.eth3.accept_dad=2
net.ipv6.conf.eth3.accept_dad = 2
root@vm1:/# ip -6 addr add 2000::10/64 dev eth3
root@vm1:/# ip link set up eth3
root@vm1:/# ip -6 addr show dev eth3
root@vm1:/# ip -6 route show dev eth3
root@vm1:/#

Signed-off-by: Mike Manning
Signed-off-by: David S. Miller

Mike Manning
2016-08-23 07:59:37 +0800

20 Aug, 2016

2 commits

56cff471d l2tp: Fix the connect status check in pppol2tp_getname

sk->sk_state is a set of bit flags, so the check needs a bitwise
operation instead of a value comparison.
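
The difference between the two kinds of check can be sketched as follows. The flag names and values are hypothetical and are not the kernel's socket state constants.

```c
enum {
    ST_CONNECTED = 1 << 0,    /* hypothetical flag values for illustration */
    ST_BOUND     = 1 << 1,
};

/* Equality test: fails as soon as any other bit is also set. */
int is_connected_by_value(unsigned int state)
{
    return state == ST_CONNECTED;
}

/* Bit test: true whenever the connected bit is set. */
int is_connected_by_bit(unsigned int state)
{
    return (state & ST_CONNECTED) != 0;
}
```

For a state of ST_CONNECTED | ST_BOUND the equality test reports "not connected" while the bit test reports "connected".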

Signed-off-by: Gao Feng
Tested-by: Guillaume Nault
Signed-off-by: David S. Miller

Gao Feng
2016-08-20 08:55:43 +0800
4c2f24549 sctp: linearize early if it's not GSO

Otherwise, when CRC computation is still needed, it is far more
expensive than on a linear buffer, to the point that it affects
performance.

It is so expensive that a netperf test gives the perf output below:

Overhead Command Shared Object Symbol
18,62% netserver [kernel.vmlinux] [k] crc32_generic_shift
2,57% netserver [kernel.vmlinux] [k] __pskb_pull_tail
1,94% netserver [kernel.vmlinux] [k] fib_table_lookup
1,90% netserver [kernel.vmlinux] [k] copy_user_enhanced_fast_string
1,66% swapper [kernel.vmlinux] [k] intel_idle
1,63% netserver [kernel.vmlinux] [k] _raw_spin_lock
1,59% netserver [sctp] [k] sctp_packet_transmit
1,55% netserver [kernel.vmlinux] [k] memcpy_erms
1,42% netserver [sctp] [k] sctp_rcv

# netperf -H 192.168.10.1 -l 10 -t SCTP_STREAM -cC -- -m 12000
SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.10.1 () port 0 AF_INET
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB

212992 212992 12000 10.00 3016.42 2.88 3.78 1.874 2.462

After patch:
Overhead Command Shared Object Symbol
2,75% netserver [kernel.vmlinux] [k] memcpy_erms
2,63% netserver [kernel.vmlinux] [k] copy_user_enhanced_fast_string
2,39% netserver [kernel.vmlinux] [k] fib_table_lookup
2,04% netserver [kernel.vmlinux] [k] __pskb_pull_tail
1,91% netserver [kernel.vmlinux] [k] _raw_spin_lock
1,91% netserver [sctp] [k] sctp_packet_transmit
1,72% netserver [mlx4_en] [k] mlx4_en_process_rx_cq
1,68% netserver [sctp] [k] sctp_rcv

# netperf -H 192.168.10.1 -l 10 -t SCTP_STREAM -cC -- -m 12000
SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.10.1 () port 0 AF_INET
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB

212992 212992 12000 10.00 3681.77 3.83 3.46 2.045 1.849

Fixes: 3acb50c18d8d ("sctp: delay as much as possible skb_linearize")
Signed-off-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Marcelo Ricardo Leitner
2016-08-20 08:09:42 +0800

19 Aug, 2016

2 commits

98a384eca fib_trie: Fix the description of pos and bits

1) Fix one typo: s/tn/tp/
2) Fix the description about the "u" bits.

Signed-off-by: Xunlei Pang
Acked-by: Alexander Duyck
Signed-off-by: David S. Miller

Xunlei Pang
2016-08-19 14:51:23 +0800
53409afd3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter updates for your net tree,
they are:

1) Dump only conntrack that belong to this namespace via /proc file.
This is some fallout from the conversion to single conntrack table
for all netns, patch from Liping Zhang.

2) Missing MODULE_ALIAS_NF_LOGGER() for the ARP family that prevents
module autoloading, also from Liping Zhang.

3) Report overquota event to the right netnamespace, again from Liping.

4) Fix tproxy listener sk refcount that leads to crash, from
Eric Dumazet.

5) Fix racy refcounting on object deletion from nfnetlink and rule
removal both for nfacct and cttimeout, from Liping Zhang.
====================

Signed-off-by: David S. Miller

David S. Miller
2016-08-19 09:45:34 +0800

18 Aug, 2016

10 commits

b75911b66 netfilter: cttimeout: fix use after free error when delete netns

In general, when we want to delete a netns, cttimeout_net_exit will
be called before ipt_unregister_table, i.e. before ctnl_timeout_put.

But after calling kfree_rcu in cttimeout_net_exit, we still decrease
the timeout object's refcnt in ctnl_timeout_put. This is incorrect
and causes a use-after-free error.

It is easy to reproduce this problem:
# while : ; do
ip netns add xxx
ip netns exec xxx nfct add timeout testx inet icmp timeout 200
ip netns exec xxx iptables -t raw -p icmp -I OUTPUT -j CT --timeout testx
ip netns del xxx
done

=======================================================================
BUG kmalloc-96 (Tainted: G B E ): Poison overwritten
-----------------------------------------------------------------------
INFO: 0xffff88002b5161e8-0xffff88002b5161e8. First byte 0x6a instead of
0x6b
INFO: Allocated in cttimeout_new_timeout+0xd4/0x240 [nfnetlink_cttimeout]
age=104 cpu=0 pid=3330
___slab_alloc+0x4da/0x540
__slab_alloc+0x20/0x40
__kmalloc+0x1c8/0x240
cttimeout_new_timeout+0xd4/0x240 [nfnetlink_cttimeout]
nfnetlink_rcv_msg+0x21a/0x230 [nfnetlink]
[ ... ]

So call kfree_rcu to free the timeout object only when the refcnt has
dropped to 0. And, as nfnetlink_acct does, use atomic_cmpxchg to
avoid a race between ctnl_timeout_try_del and ctnl_timeout_put.

Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-08-18 21:17:00 +0800
12be15dd5 netfilter: nfnetlink_acct: fix race between nfacct del and xt_nfacct destroy

Suppose that we input the following commands at first:
# nfacct add test
# iptables -A INPUT -m nfacct --nfacct-name test

Now the "test" acct's refcnt is 2, but later, when we try to delete the
"test" nfacct and the related iptables rule at the same time, a race may
happen:
CPU0                                   CPU1
nfnl_acct_try_del                      nfnl_acct_put
atomic_dec_and_test //ref=1,testfail   -
-                                      atomic_dec_and_test //ref=0,testok
-                                      kfree_rcu
atomic_inc //ref=1                     -

So after the RCU grace period, nf_acct will be freed while it is still linked
in the nfnl_acct_list, and a later access to it will cause an oops.

Convert the atomic_dec_and_test and atomic_inc combination into one atomic
operation, atomic_cmpxchg, to fix this problem.
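
The single-step idea can be sketched in user space with C11 atomics. Here atomic_compare_exchange_strong stands in for the kernel's atomic_cmpxchg, and struct counted and try_del() are hypothetical names, so this is an illustration of the technique rather than the nfnetlink_acct code.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct counted {
    atomic_int refcnt;
};

/*
 * Succeeds only if the caller holds the last reference: the count moves
 * from 1 to 0 in a single atomic step, so there is no window in which a
 * concurrent put can observe an intermediate value.
 */
bool try_del(struct counted *obj)
{
    int expected = 1;

    return atomic_compare_exchange_strong(&obj->refcnt, &expected, 0);
}
```

When another user still holds a reference the count is left untouched and the deletion is refused, instead of being decremented and then incremented again.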

Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-08-18 21:16:36 +0800
184ca8234 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Buffers powersave frame test is reversed in cfg80211, fix from Felix
Fietkau.

2) Remove bogus WARN_ON in openvswitch, from Jarno Rajahalme.

3) Fix some tg3 ethtool logic bugs, and one that would cause no
interrupts to be generated when rx-coalescing is set to 0. From
Satish Baddipadige and Siva Reddy Kallam.

4) QLCNIC mailbox corruption and napi budget handling fix from Manish
Chopra.

5) Fix fib_trie logic when walking the trie during /proc/net/route
output that can access a stale node pointer. From David Forster.

6) Several sctp_diag fixes from Phil Sutter.

7) PAUSE frame handling fixes in mlxsw driver from Ido Schimmel.

8) Checksum fixup fixes in bpf from Daniel Borkmann.

9) Memory leaks in nfnetlink, from Liping Zhang.

10) Use after free in rxrpc, from David Howells.

11) Use after free in new skb_array code of macvtap driver, from Jason
Wang.

12) Calipso resource leak, from Colin Ian King.

13) mediatek bug fixes (missing stats sync init, etc.) from Sean Wang.

14) Fix bpf non-linear packet write helpers, from Daniel Borkmann.

15) Fix lockdep splats in macsec, from Sabrina Dubroca.

16) hv_netvsc bug fixes from Vitaly Kuznetsov, mostly to do with VF
handling.

17) Various tc-action bug fixes, from CONG Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
net_sched: allow flushing tc police actions
net_sched: unify the init logic for act_police
net_sched: convert tcf_exts from list to pointer array
net_sched: move tc offload macros to pkt_cls.h
net_sched: fix a typo in tc_for_each_action()
net_sched: remove an unnecessary list_del()
net_sched: remove the leftover cleanup_a()
mlxsw: spectrum: Allow packets to be trapped from any PG
mlxsw: spectrum: Unmap 802.1Q FID before destroying it
mlxsw: spectrum: Add missing rollbacks in error path
mlxsw: reg: Fix missing op field fill-up
mlxsw: spectrum: Trap loop-backed packets
mlxsw: spectrum: Add missing packet traps
mlxsw: spectrum: Mark port as active before registering it
mlxsw: spectrum: Create PVID vPort before registering netdevice
mlxsw: spectrum: Remove redundant errors from the code
mlxsw: spectrum: Don't return upon error in removal path
i40e: check for and deal with non-contiguous TCs
ixgbe: Re-enable ability to toggle VLAN filtering
ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths
...

Linus Torvalds
2016-08-18 08:26:58 +0800
b5ac85188 net_sched: allow flushing tc police actions

act_police uses its own code to walk the
action hashtable, so we could not flush
standalone tc police actions. Just switch
to tcf_generic_walker() like other actions.

(Joint work from Roman and Cong.)

Signed-off-by: Roman Mashak
Signed-off-by: Cong Wang
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

Roman Mashak
2016-08-18 07:27:51 +0800
0852e4552 net_sched: unify the init logic for act_police

Jamal reported a crash when we create a police action
with a specific index. This is because the init logic
is not correct; we should always create one for this
case. Just unify the logic with other tc actions.

Fixes: a03e6fe56971 ("act_police: fix a crash during removal")
Reported-by: Jamal Hadi Salim
Signed-off-by: Cong Wang
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

WANG Cong
2016-08-18 07:27:51 +0800
22dc13c83 net_sched: convert tcf_exts from list to pointer array

As pointed out by Jamal, an action could be shared by
multiple filters, so we can't use a list to chain them
any more after we get rid of the original tc_action.
Instead, we can just save pointers to these actions
in tcf_exts, since they are refcount'ed, so convert
the list to an array of pointers.

The "ugly" part is that the action API still accepts a list
as a parameter, so I just introduce a helper function to
convert the array of pointers to a list, instead of
relying on the C99 feature to iterate the array.
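
A helper of the kind described above might look like the following user-space sketch. It uses a plain singly linked list rather than the kernel's list_head, and struct action and array_to_list() are hypothetical names, so it shows the conversion idea only.

```c
#include <stddef.h>

struct action {
    struct action *next;     /* hypothetical singly linked list link */
    int id;
};

/*
 * Chain the first n entries of arr into a list in array order and return
 * the head. NULL entries are skipped.
 */
struct action *array_to_list(struct action **arr, size_t n)
{
    struct action *head = NULL;
    struct action **tail = &head;

    for (size_t i = 0; i < n; i++) {
        if (!arr[i])
            continue;
        arr[i]->next = NULL;
        *tail = arr[i];
        tail = &arr[i]->next;
    }
    return head;
}
```

The array stays the long-lived owner of the pointers; the list is built on demand for callers that expect one.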

Fixes: a85a970af265 ("net_sched: move tc_action into tcf_common")
Reported-by: Jamal Hadi Salim
Cc: Jamal Hadi Salim
Signed-off-by: Cong Wang
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

WANG Cong
2016-08-18 07:27:51 +0800
824a7e886 net_sched: remove an unnecessary list_del()

This list_del() for tc action is not actually needed,
because we only use this list to chain bulk operations,
so it should not be carried over to later operations.

Fixes: ec0595cc4495 ("net_sched: get rid of struct tcf_common")
Cc: Jamal Hadi Salim
Signed-off-by: Cong Wang
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

WANG Cong
2016-08-18 07:27:51 +0800
f07fed82a net_sched: remove the leftover cleanup_a()

After refactoring tc_action into tcf_common, we no
longer need to clean up temporary "actions" in a list;
they are permanently stored in the hashtable.

Fixes: a85a970af265 ("net_sched: move tc_action into tcf_common")
Reported-by: Jamal Hadi Salim
Cc: Jamal Hadi Salim
Signed-off-by: Cong Wang
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

WANG Cong
2016-08-18 07:27:51 +0800
dcbe35909 netfilter: tproxy: properly refcount tcp listeners

inet_lookup_listener() and inet6_lookup_listener() no longer
take a reference on the found listener.

This minimal patch adds back the refcounting, but we might do
this differently in net-next later.

Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
Reported-and-tested-by: Denys Fedoryshchenko
Signed-off-by: Eric Dumazet
Signed-off-by: Pablo Neira Ayuso

Eric Dumazet
2016-08-18 06:51:13 +0800
aca300183 netfilter: nfnetlink_acct: report overquota to the right netns

We should report the over quota message to the right net namespace
instead of the init netns.

Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-08-18 06:38:23 +0800

17 Aug, 2016

3 commits

2497b8462 netfilter: nfnetlink_log: add "nf-logger-3-1" module alias name

Otherwise, if nfnetlink_log.ko is not loaded, we cannot add rules
to log packets to the userspace when we specify it with arp family,
such as:

# nft add rule arp filter input log group 0
:1:1-37: Error: Could not process rule: No such file or
directory
add rule arp filter input log group 0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-08-17 23:44:53 +0800
e77e6ff50 netfilter: conntrack: do not dump other netns's conntrack entries via proc

We should skip the conntracks that belong to a different namespace,
otherwise other unrelated netns's conntrack entries will be dumped via
/proc/net/nf_conntrack.

Fixes: 56d52d4892d0 ("netfilter: conntrack: use a single hashtable for all namespaces")
Signed-off-by: Liping Zhang
Reviewed-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-08-17 23:41:58 +0800
3ec60b92d Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio/vhost fixes from Michael Tsirkin:
- test fixes
- a vsock fix

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
tools/virtio: add dma stubs
vhost/test: fix after swiotlb changes
vhost/vsock: drop space available check for TX vq
ringtest: test build fix

Linus Torvalds
2016-08-17 06:51:57 +0800

16 Aug, 2016

3 commits

d2fbdf76b tipc: fix NULL pointer dereference in shutdown()

tipc_msg_create() can return a NULL skb and if so, we shouldn't try to
call tipc_node_xmit_skb() on it.

general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 3 PID: 30298 Comm: trinity-c0 Not tainted 4.7.0-rc7+ #19
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
task: ffff8800baf09980 ti: ffff8800595b8000 task.ti: ffff8800595b8000
RIP: 0010:[] [] tipc_node_xmit_skb+0x6b/0x140
RSP: 0018:ffff8800595bfce8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000003023b0e0
RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffffffff83d12580
RBP: ffff8800595bfd78 R08: ffffed000b2b7f32 R09: 0000000000000000
R10: fffffbfff0759725 R11: 0000000000000000 R12: 1ffff1000b2b7f9f
R13: ffff8800595bfd58 R14: ffffffff83d12580 R15: dffffc0000000000
FS: 00007fcdde242700(0000) GS:ffff88011af80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fcddde1db10 CR3: 000000006874b000 CR4: 00000000000006e0
DR0: 00007fcdde248000 DR1: 00007fcddd73d000 DR2: 00007fcdde248000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000090602
Stack:
0000000000000018 0000000000000018 0000000041b58ab3 ffffffff83954208
ffffffff830bb400 ffff8800595bfd30 ffffffff8309d767 0000000000000018
0000000000000018 ffff8800595bfd78 ffffffff8309da1a 00000000810ee611
Call Trace:
[] tipc_shutdown+0x553/0x880
[] SyS_shutdown+0x14b/0x170
[] do_syscall_64+0x19c/0x410
[] entry_SYSCALL64_slow_path+0x25/0x25
Code: 90 00 b4 0b 83 c7 00 f1 f1 f1 f1 4c 8d 6d e0 c7 40 04 00 00 00 f4 c7 40 08 f3 f3 f3 f3 48 89 d8 48 c1 e8 03 c7 45 b4 00 00 00 00 3c 30 00 75 78 48 8d 7b 08 49 8d 75 c0 48 b8 00 00 00 00 00
RIP [] tipc_node_xmit_skb+0x6b/0x140
RSP
---[ end trace 57b0484e351e71f1 ]---

I feel like we should maybe return -ENOMEM or -ENOBUFS, but I'm not sure
userspace is equipped to handle that. Anyway, this is better than a GPF
and looks somewhat consistent with other tipc_msg_create() callers.

Signed-off-by: Vegard Nossum
Acked-by: Ying Xue
Acked-by: Jon Maloy
Signed-off-by: David S. Miller

Vegard Nossum
2016-08-16 04:55:36 +0800
3d7b33209 gre: set inner_protocol on xmit

Ensure that the inner_protocol is set on transmit so that GSO segmentation,
which relies on that field, works correctly.

This is achieved by setting the inner_protocol in gre_build_header rather
than each caller of that function. It ensures that the inner_protocol is
set when gre_fb_xmit() is used to transmit GRE which was not previously the
case.

I have observed this is not the case when OvS transmits GRE using
lwtunnel metadata (which it always does).

Fixes: 38720352412a ("gre: Use inner_proto to obtain inner header protocol")
Cc: Pravin Shelar
Acked-by: Alexander Duyck
Signed-off-by: Simon Horman
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Simon Horman
2016-08-16 04:37:12 +0800
5e4578969 net: ipv6: Fix ping to link-local addresses.

ping_v6_sendmsg does not set flowi6_oif in response to
sin6_scope_id or sk_bound_dev_if, so it is not possible to use
these APIs to ping an IPv6 address on a different interface.
Instead, it sets flowi6_iif, which is incorrect but harmless.

Stop setting flowi6_iif, and support various ways of setting oif
in the same priority order used by udpv6_sendmsg.

Tested: https://android-review.googlesource.com/#/c/254470/
Signed-off-by: Lorenzo Colitti
Signed-off-by: David S. Miller

Lorenzo Colitti
2016-08-16 03:19:09 +0800

15 Aug, 2016

1 commit

21bc54fc0 vhost/vsock: drop space available check for TX vq

Remove unnecessary use of enable/disable callback notifications
and the incorrect more space available check.

virtio_transport_tx_work handles the case where the TX virtqueue
has more buffers available.

Signed-off-by: Gerard Garcia
Acked-by: Stefan Hajnoczi
Signed-off-by: Michael S. Tsirkin

Gerard Garcia
2016-08-15 10:05:21 +0800

14 Aug, 2016

1 commit

952fcfd08 net: remove type_check from dev_get_nest_level()

The idea for type_check in dev_get_nest_level() was to count the number
of nested devices of the same type (currently, only macvlan or vlan
devices).
This prevented the false positive lockdep warning on configurations such
as:

eth0
Signed-off-by: David S. Miller

Sabrina Dubroca
2016-08-14 06:15:54 +0800