Eric Lee / smarc-fsl-linux-kernel

29 Aug, 2016

5 commits

cc8e9ebf9 net/mlx5e: Fix ethtool -g/G rx ring parameter report with striding RQ

The driver RQ has two possible configurations: striding RQ and
non-striding RQ. Until this patch, the driver always reported the
number of hardware WQEs (ring descriptors). For the non-striding RQ
configuration, this was OK since we have one WQE per pending packet.
For striding RQ, multiple packets can fit into one WQE. For a better
user experience, we normalize the rx_pending parameter (size of WQE / MTU)
as the average ring size in the striding RQ case.
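
The normalization described above can be sketched roughly as follows. This is an illustration only, not the driver's code: the helper names, the integer rounding and the "at least one packet per WQE" floor are assumptions.

```c
#include <stdint.h>

/* Hypothetical helper: average number of MTU-sized packets that fit in one
 * striding WQE. mtu_bytes must be non-zero. */
static uint32_t packets_per_wqe(uint32_t wqe_size_bytes, uint32_t mtu_bytes)
{
    uint32_t n = wqe_size_bytes / mtu_bytes;

    return n ? n : 1; /* assumption: never report less than one packet */
}

/* ethtool -g direction: hardware WQE count -> reported rx_pending. */
static uint32_t wqes_to_rx_pending(uint32_t num_wqes, uint32_t wqe_size_bytes,
                                   uint32_t mtu_bytes)
{
    return num_wqes * packets_per_wqe(wqe_size_bytes, mtu_bytes);
}

/* ethtool -G direction: requested rx_pending -> WQE count, rounded up. */
static uint32_t rx_pending_to_wqes(uint32_t rx_pending, uint32_t wqe_size_bytes,
                                   uint32_t mtu_bytes)
{
    uint32_t ppw = packets_per_wqe(wqe_size_bytes, mtu_bytes);

    return (rx_pending + ppw - 1) / ppw;
}
```

With a 64 KiB WQE and a 1500 byte MTU, 43 packets fit per WQE, so 16 WQEs would be reported as 688 pending packets.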

Fixes: 461017cb006a ('net/mlx5e: Support RX multi-packet WQE ...')
Signed-off-by: Eran Ben Elisha
Signed-off-by: Saeed Mahameed
Signed-off-by: David S. Miller

Eran Ben Elisha
2016-08-29 11:24:15 +0800
6e8dd6d6f net/mlx5e: Don't wait for SQ completions on close

Instead of asking the firmware to flush the SQ (Send Queue) via
asynchronous completions when moved to error, we handle the SQ flush
manually (mlx5e_free_tx_descs), the same way we did when the SQ flush
timed out or on tx_timeout.

This reduces SQ flush time and speeds up the interface down procedure.

Moved mlx5e_free_tx_descs to the end of en_tx.c for tx
critical code locality.

Fixes: 29429f3300a3 ('net/mlx5e: Timeout if SQ doesn't flush during close')
Signed-off-by: Saeed Mahameed
Signed-off-by: David S. Miller

Saeed Mahameed
2016-08-29 11:24:15 +0800
8484f9ed1 net/mlx5e: Don't post fragmented MPWQE when RQ is disabled

ICO (Internal control operations) SQ (Send Queue) is closed/disabled
after RQ (Receive Queue). After RQ is closed an ICO SQ completion
might post a fragmented MPWQE (Multi Packet Work Queue Element) into
that RQ.

As on regular RQ post, check if we are allowed to post to that
RQ (RQ is enabled). Cleanup in-progress UMR MPWQE on mlx5e_free_rx_descs
if needed.

Fixes: bc77b240b3c5 ('net/mlx5e: Add fragmented memory support for RX multi packet WQE')
Signed-off-by: Saeed Mahameed
Signed-off-by: David S. Miller

Saeed Mahameed
2016-08-29 11:24:15 +0800
f2fde18c5 net/mlx5e: Don't wait for RQ completions on close

This will significantly reduce receive queue flush time on interface
down.

Instead of asking the firmware to flush the RQ (Receive Queue) via
asynchronous completions when moved to error, we handle the RQ flush
manually (mlx5e_free_rx_descs), the same way we did when the RQ flush
timed out.

This reduces RQ flush time and speeds up the interface down procedure
(ifconfig down) from 6 sec to 0.3 sec on a 48-core system.

Moved mlx5e_free_rx_descs to en_main.c, where it is needed, to keep en_rx.c
free from non-critical data path code for better code locality.

Fixes: 6cd392a082de ('net/mlx5e: Handle RQ flush in error cases')
Signed-off-by: Saeed Mahameed
Signed-off-by: David S. Miller

Saeed Mahameed
2016-08-29 11:24:15 +0800
fe4c988bd net/mlx5e: Limit UMR length to the device's limitation

The ConnectX-4 UMR (User Memory Region) MTT translation table offset in a WQE
is limited to U16_MAX. Before this patch we ignored that limitation and
requested the maximum possible UMR translation length that the netdev
might need (MAX channels * MAX pages per channel).
On a system with #cores > 32, when linear WQE allocation fails,
falling back to UMR WQEs will cause the RQ (Receive Queue) to get
stuck.

Here we limit the UMR length to min(U16_MAX, max required pages) (while
considering the required alignments) on driver load. By default U16_MAX is
sufficient, since the default RX rings value guarantees that we are in
range. Dynamically (on set_ringparam/set_channels) we check whether the
new required UMR length (num mtts) is still in range and, if not, fail the
request.
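
A minimal sketch of the two checks described above, with the alignment handling left out. The function names and the way the required length is computed (channels * pages per channel) are assumptions for illustration, not the driver's actual code.

```c
#include <stdint.h>
#include <stdbool.h>

#define UMR_MTT_DEVICE_MAX 0xFFFFu /* the U16_MAX device limit named above */

/* Limit picked at driver load: min(U16_MAX, max required pages). */
static uint32_t umr_mtt_limit(uint32_t max_channels, uint32_t max_pages_per_channel)
{
    uint64_t required = (uint64_t)max_channels * max_pages_per_channel;

    return required < UMR_MTT_DEVICE_MAX ? (uint32_t)required : UMR_MTT_DEVICE_MAX;
}

/* Check for set_ringparam/set_channels: false means "fail the request". */
static bool umr_request_in_range(uint32_t channels, uint32_t pages_per_channel,
                                 uint32_t limit)
{
    return (uint64_t)channels * pages_per_channel <= limit;
}
```

For example, 32 channels with 2048 pages each would need 65536 MTTs, one more than the limit, so such a request would be rejected.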

Fixes: bc77b240b3c5 ('net/mlx5e: Add fragmented memory support for RX multi packet WQE')
Signed-off-by: Saeed Mahameed
Signed-off-by: David S. Miller

Saeed Mahameed
2016-08-29 11:24:15 +0800

27 Aug, 2016

5 commits

9dbeea7f0 rhashtable: fix a memory leak in alloc_bucket_locks()

If vmalloc() was successful, do not attempt a kmalloc_array().
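
The pattern can be sketched in user space as follows. malloc() and calloc() stand in for vmalloc() and kmalloc_array(); all names here are hypothetical and the counter exists only to make the leak visible.

```c
#include <stdlib.h>

static int live_allocs; /* outstanding allocations, for illustration only */

/* Stand-in for vmalloc(). */
static void *big_alloc(size_t bytes)
{
    void *p = malloc(bytes);

    if (p)
        live_allocs++;
    return p;
}

/* Stand-in for kmalloc_array(). */
static void *small_alloc_array(size_t n, size_t size)
{
    void *p = calloc(n, size);

    if (p)
        live_allocs++;
    return p;
}

static void release_locks(void *p)
{
    if (p) {
        free(p);
        live_allocs--;
    }
}

static void *alloc_locks(size_t count, size_t lock_size, int prefer_big)
{
    void *locks = NULL;

    if (prefer_big)
        locks = big_alloc(count * lock_size);

    /* Fall back only if the first attempt did not succeed. Assigning
     * unconditionally would overwrite, and so leak, a valid pointer. */
    if (!locks)
        locks = small_alloc_array(count, lock_size);

    return locks;
}
```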

Fixes: 4cf0b354d92e ("rhashtable: avoid large lock-array allocations")
Reported-by: CAI Qian
Signed-off-by: Eric Dumazet
Cc: Florian Westphal
Acked-by: Herbert Xu
Tested-by: CAI Qian
Signed-off-by: David S. Miller

Eric Dumazet
2016-08-27 12:59:53 +0800
e70c70c38 sfc: fix potential stack corruption from running past stat bitmask

On 32-bit systems, mask is only an array of 3 longs, not 4, so don't try
to write to mask[3].
Also include build-time checks in case the size of the bitmask changes.

Fixes: 3c36a2aded8c ("sfc: display vadaptor statistics for all interfaces")
Signed-off-by: Edward Cree
Signed-off-by: David S. Miller

Andrew Rybchenko
2016-08-27 12:40:44 +0800
5c1f5b457 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Johan Hedberg says:

====================
pull request: bluetooth 2016-08-25

Here are a couple of important Bluetooth fixes for the 4.8 kernel:

- Memory leak fix for HCI requests
- Fix sk_filter handling with L2CAP
- Fix sock_recvmsg behavior when MSG_TRUNC is not set

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller

David S. Miller
2016-08-27 12:09:17 +0800
c15e07b02 team: loadbalance: push lacpdus to exact delivery

When team is in a bridge and LACP is utilized, LACPDU packets are pushed
to userspace using a raw socket and there they are processed. However,
since 8626c56c8279b, LACPDU skbs are dropped by the bridge rx_handler so
they never reach packet handlers in the rx path. Fix this by explicitly
marking LACPDUs to be pushed to exact delivery in the team rx_handler.

Reported-by: Ido Schimmel
Fixes: 8626c56c8279b ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict")
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller

Jiri Pirko
2016-08-27 04:08:59 +0800
c234af587 net: hns: dereference ppe_cb->ppe_common_cb if it is non-null

ppe_cb->ppe_common_cb is being dereferenced before a null check is
made on it. If ppe_cb->ppe_common_cb is null then we end up
with a null pointer dereference when assigning dsaf_dev. Fix this
by moving the initialisation of dsaf_dev to after we know
ppe_cb->ppe_common_cb is OK to dereference.
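
The reordering can be illustrated with simplified stand-in structures. The struct layouts and the function below are hypothetical and only mirror the names used in the message.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the driver structures. */
struct dsaf_device { int id; };
struct ppe_common  { struct dsaf_device *dsaf_dev; };
struct ppe_cb      { struct ppe_common *ppe_common_cb; };

/* Returns the device id, or -1 when the common block is missing. */
static int ppe_device_id(const struct ppe_cb *cb)
{
    const struct dsaf_device *dsaf_dev;

    if (!cb->ppe_common_cb)
        return -1;

    /* Initialise dsaf_dev only after the null check above. */
    dsaf_dev = cb->ppe_common_cb->dsaf_dev;

    return dsaf_dev ? dsaf_dev->id : -1;
}
```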

Signed-off-by: Colin Ian King
Acked-by: Yisen Zhuang
Signed-off-by: David S. Miller

Colin Ian King
2016-08-27 02:44:56 +0800

26 Aug, 2016

8 commits

b628d611a 8139cp: Fix one possible deadloop in cp_rx_poll

When cp_rx_poll does not get enough packets, it checks the rx
interrupt status again. If an interrupt is pending, it jumps to
rx_status_loop again. But the goto jump also resets the rx variable to zero.

As a result, an endless loop is possible. Assume that each pass through
rx_status_loop only gets a packet count which is less than budget,
and the (cpr16(IntrStatus) & cp_rx_intr_mask) condition is always true.
Then the loop never terminates and the system is blocked.
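
A toy model of the loop, assuming a device that hands over a fixed number of packets per pass and keeps its RX interrupt asserted. The function and its parameters are hypothetical. The point is only that the packet counter is initialised once, outside the retry loop, so it can reach the budget.

```c
/* Returns the number of packets processed in one poll call. */
static int poll_rx(int budget, int per_pass)
{
    int rx = 0; /* initialised once, before the retry loop */

    while (per_pass > 0) { /* models "RX interrupt still pending" */
        int got = 0;

        while (got < per_pass && rx < budget) {
            got++;
            rx++;
        }

        if (rx >= budget)
            break; /* budget used up: return and get polled again */
    }

    return rx;
}
```

If rx were reset to zero at the top of every pass, a device delivering 3 packets per pass against a budget of 10 would never leave the loop.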

Signed-off-by: Gao Feng
Signed-off-by: David S. Miller

Gao Feng
2016-08-26 08:02:48 +0800
f38ff2ee7 i40e: Change some init flow for the client

This change makes a common flow for Client instance open during init
and reset path. The Client subtask can handle both the cases instead of
making a separate notify_client_of_open call.
Also it may fix a bug during reset where the service task was leaking
some memory and causing issues.

Change-Id: I7232a32fd52b82e863abb54266fa83122f80a0cd
Signed-off-by: Anjali Singhai Jain
Tested-by: Andrew Bowers
Signed-off-by: Jeff Kirsher
Signed-off-by: David S. Miller

Anjali Singhai Jain
2016-08-26 07:59:30 +0800
c3e70edd7 Revert "phy: IRQ cannot be shared"

This reverts:
commit 33c133cc7598 ("phy: IRQ cannot be shared")

On hardware with multiple PHY devices hooked up to the same IRQ line, allow
them to share it.

Sergei Shtylyov says:
"I'm not sure now what was the reason I concluded that the IRQ sharing
was impossible... most probably I thought that the kernel IRQ handling
code exited the loop over the IRQ actions once IRQ_HANDLED was returned
-- which is obviously not so in reality..."

Signed-off-by: Xander Huff
Signed-off-by: Nathan Sullivan
Signed-off-by: David S. Miller

Xander Huff
2016-08-26 07:53:47 +0800
4f101c477 net: dsa: bcm_sf2: Fix race condition while unmasking interrupts

We kept shadow copies of which interrupt sources we have enabled and
disabled, but due to an ordering bug in how intrl2_mask_clear was defined,
we could run into the following scenario:

CPU0                                    CPU1
intrl2_1_mask_clear(..)
sets INTRL2_CPU_MASK_CLEAR
                                        bcm_sf2_switch_1_isr
                                        read INTRL2_CPU_STATUS and masks with stale
                                        irq1_mask value
updates irq1_mask value

This would make us loop again and again trying to process an interrupt
we are not clearing, since our copy of whether it was enabled before
still indicates it was not. Fix this by updating the shadow copy first,
and then unmasking at the HW level.
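
The ordering can be sketched with a plain struct standing in for the register and its shadow copy. The names are hypothetical, and a real driver would also need the appropriate register accessors and ordering guarantees, which this single-threaded sketch does not model.

```c
#include <stdint.h>

struct intr_ctl {
    uint32_t hw_mask;     /* stand-in for the hardware mask: 1 = masked */
    uint32_t shadow_mask; /* software copy consulted by the ISR */
};

/* Unmask sources: update the shadow copy first, then the hardware. */
static void intr_unmask(struct intr_ctl *c, uint32_t bits)
{
    c->shadow_mask &= ~bits;
    c->hw_mask &= ~bits;
}

/* ISR side: handle only sources that the shadow copy shows as unmasked. */
static uint32_t intr_pending(const struct intr_ctl *c, uint32_t status)
{
    return status & ~c->shadow_mask;
}
```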

Fixes: 246d7f773c13 ("net: dsa: add Broadcom SF2 switch driver")
Signed-off-by: Florian Fainelli
Signed-off-by: David S. Miller

Florian Fainelli
2016-08-26 07:49:25 +0800
166ee5b87 qdisc: fix a module refcount leak in qdisc_create_dflt()

Should qdisc_alloc() fail, we must release the module refcount
we got right before.

Fixes: 6da7c8fcbcbd ("qdisc: allow setting default queuing discipline")
Signed-off-by: Eric Dumazet
Acked-by: John Fastabend
Signed-off-by: David S. Miller

Eric Dumazet
2016-08-26 07:44:20 +0800
a5de125dd tipc: fix the error handling in tipc_udp_enable()

Fix to return a negative error code in enable_mcast() error handling
case, and release udp socket when necessary.

Fixes: d0f91938bede ("tipc: add ip/udp media type")
Signed-off-by: Wei Yongjun
Signed-off-by: David S. Miller

Wei Yongjun
2016-08-26 07:32:34 +0800
4f34228b6 Bluetooth: Fix hci_sock_recvmsg when MSG_TRUNC is not set

Similar to bt_sock_recvmsg MSG_TRUNC shall be checked using the original
flags not msg_flags.

Signed-off-by: Luiz Augusto von Dentz
Signed-off-by: Marcel Holtmann

Luiz Augusto von Dentz
2016-08-26 02:58:47 +0800
90a56f72e Bluetooth: Fix bt_sock_recvmsg when MSG_TRUNC is not set

Commit b5f34f9420b50c9b5876b9a2b68e96be6d629054 attempted to introduce
proper handling for MSG_TRUNC, but recv and its variants should still work
like read if no flag is passed. Because the code may set MSG_TRUNC in
msg->msg_flags, that field shall not be used, as it may cause the call to
behave as if MSG_TRUNC is always set. Instead, this changes the code to
use the flags parameter, which shall contain the original flags.
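
A user-space sketch of the distinction, assuming a hypothetical helper that computes what a recvmsg-style call returns. It is not the Bluetooth socket code; only MSG_TRUNC from <sys/socket.h> is real.

```c
#include <stddef.h>
#include <sys/socket.h> /* MSG_TRUNC */

/* skb_len: size of the queued message; buf_len: size of the user buffer;
 * flags: flags passed by the caller; msg_flags: flags reported back. */
static size_t recv_result(size_t skb_len, size_t buf_len, int flags, int *msg_flags)
{
    size_t copied = skb_len < buf_len ? skb_len : buf_len;

    if (copied < skb_len)
        *msg_flags |= MSG_TRUNC; /* tell the caller the data was cut short */

    /* Test the caller's original flags. *msg_flags may just have gained
     * MSG_TRUNC above, which says nothing about what the caller asked for. */
    return (flags & MSG_TRUNC) ? skb_len : copied;
}
```

Without MSG_TRUNC in the caller's flags the call returns the number of bytes copied, like read; with it, the full message length.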

Signed-off-by: Luiz Augusto von Dentz
Signed-off-by: Marcel Holtmann

Luiz Augusto von Dentz
2016-08-26 02:58:47 +0800

25 Aug, 2016

2 commits

51af96b53 mlxsw: router: Enable neighbors to be created on stacked devices

Make the function mlxsw_router_neigh_construct search the rif according
to the neighbour dev rather than the dev that was passed to the ndo, thus
allowing neighbours to be created upon stacked devices.

Fixes: 6cf3c971dc84 ("mlxsw: spectrum_router: Add private neigh table")
Signed-off-by: Yotam Gigi
Reviewed-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller

Yotam Gigi
2016-08-25 00:39:04 +0800
f888f5879 mlxsw: spectrum: Add missing flood to router port

In case we have a layer 3 interface on top of a bridge (VLAN / FID RIF),
then we should flood the following packet types to the router:

* Broadcast: If DIP is the broadcast address of the interface, then we
need to be able to get it to CPU by trapping it following route lookup.

* Reserved IP multicast (224.0.0.X): Some control packets (e.g. OSPF)
use this range and are trapped in the router block.

Fixes: 99f44bb3527b ("mlxsw: spectrum: Enable L3 interfaces on top of bridge devices")
Signed-off-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller

Ido Schimmel
2016-08-25 00:39:03 +0800

24 Aug, 2016

14 commits

dbb50887c Bluetooth: split sk_filter in l2cap_sock_recv_cb

During an audit for sk_filter(), we found that rx_busy_skb handling
in l2cap_sock_recv_cb() and l2cap_sock_recvmsg() looks not quite as
intended.

The assumption from commit e328140fdacb ("Bluetooth: Use event-driven
approach for handling ERTM receive buffer") is that errors returned
from sock_queue_rcv_skb() are due to receive buffer shortage. However,
nothing should prevent doing a setsockopt() with SO_ATTACH_FILTER on
the socket, that could drop some of the incoming skbs when handled in
sock_queue_rcv_skb().

In that case sock_queue_rcv_skb() will return with -EPERM, propagated
from sk_filter() and if in L2CAP_MODE_ERTM mode, wrong assumption was
that we failed due to receive buffer being full. From that point onwards,
due to the to-be-dropped skb being held in rx_busy_skb, we cannot make
any forward progress as rx_busy_skb is never cleared from l2cap_sock_recvmsg(),
due to the filter drop verdict over and over coming from sk_filter().
Meanwhile, in l2cap_sock_recv_cb() all new incoming skbs are being
dropped due to rx_busy_skb being occupied.

Instead, just use __sock_queue_rcv_skb() where an error really tells that
there's a receive buffer issue. Split the sk_filter() and enable it for
non-segmented modes at queuing time since at this point in time the skb has
already been through the ERTM state machine and it has been acked, so dropping
is not allowed. Instead, for ERTM and streaming mode, call sk_filter() in
l2cap_data_rcv() so the packet can be dropped before the state machine sees it.

Fixes: e328140fdacb ("Bluetooth: Use event-driven approach for handling ERTM receive buffer")
Signed-off-by: Daniel Borkmann
Signed-off-by: Mat Martineau
Acked-by: Willem de Bruijn
Signed-off-by: Marcel Holtmann

Daniel Borkmann
2016-08-24 22:55:04 +0800
9afee9493 Bluetooth: Fix memory leak at end of hci requests

In hci_req_sync_complete the event skb is referenced in hdev->req_skb.
It is used (via hci_req_run_skb) from either __hci_cmd_sync_ev which will
pass the skb to the caller, or __hci_req_sync which leaks.

unreferenced object 0xffff880005339a00 (size 256):
comm "kworker/u3:1", pid 1011, jiffies 4294671976 (age 107.389s)
backtrace:
[] kmemleak_alloc+0x49/0xa0
[] kmem_cache_alloc+0x128/0x180
[] skb_clone+0x4f/0xa0
[] hci_event_packet+0xc1/0x3290
[] hci_rx_work+0x18b/0x360
[] process_one_work+0x14a/0x440
[] worker_thread+0x43/0x4d0
[] kthread+0xc4/0xe0
[] ret_from_fork+0x1f/0x40
[] 0xffffffffffffffff

Signed-off-by: Frédéric Dalleau
Signed-off-by: Marcel Holtmann

Frederic Dalleau
2016-08-24 22:49:29 +0800
d7226c7a4 net: diag: Fix refcnt leak in error path destroying socket

inet_diag_find_one_icsk takes a reference to a socket that is not
released if sock_diag_destroy returns an error. Fix by changing
tcp_diag_destroy to manage the refcnt for all cases and remove
the sock_put calls from tcp_abort.

Fixes: c1e64e298b8ca ("net: diag: Support destroying TCP sockets")
Reported-by: Lorenzo Colitti
Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2016-08-24 14:11:36 +0800
7b996243f tun: fix transmit timestamp support

Instead of using sock_tx_timestamp, use skb_tx_timestamp to record
software transmit timestamp of a packet.

sock_tx_timestamp resets and overrides the tx_flags of the skb.
The function is intended to be called from within the protocol
layer when creating the skb, not from a device driver. This is
inconsistent with other drivers and will cause issues for TCP.

In TCP, we intend to sample the timestamps for the last byte
for each sendmsg/sendpage. For that reason, tcp_sendmsg calls
tcp_tx_timestamp only with the last skb that it generates.
For example, if a 128KB message is split into two 64KB packets
we want to sample the SND timestamp of the last packet. The current
code in the tun driver, however, will result in sampling the SND
timestamp for both packets.

Also, when the last packet is split into smaller packets for
retransmission (see tcp_fragment), the tun driver will record
timestamps for all of the retransmitted packets and not only the
last packet.

Fixes: eda297729171 (tun: Support software transmit time stamping.)
Signed-off-by: Soheil Hassas Yeganeh
Signed-off-by: Francis Yan
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Soheil Hassas Yeganeh
2016-08-24 14:09:27 +0800
75d855a5e udp: get rid of SLAB_DESTROY_BY_RCU allocations

After commit ca065d0cf80f ("udp: no longer use SLAB_DESTROY_BY_RCU")
we do not need this special allocation mode anymore, even if it is
harmless.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2016-08-24 08:46:17 +0800
232cb53a4 sctp: fix overrun in sctp_diag_dump_one()

The function sctp_diag_dump_one() currently performs a memcpy()
of 64 bytes from a 16 byte field into another 16 byte field. Fix
by using correct size, use sizeof to obtain correct size instead
of using a hard-coded constant.
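
The idea can be shown with two hypothetical structs that each carry a 16 byte field. The struct and function names are made up for illustration. The guard byte makes an overrun observable.

```c
#include <string.h>
#include <stdint.h>

struct diag_req { uint8_t addr[16]; };
struct diag_msg { uint8_t addr[16]; uint8_t guard; };

static void copy_addr(struct diag_msg *dst, const struct diag_req *src)
{
    /* sizeof follows the field's real size, so the copy cannot run past
     * it the way a hard-coded 64 would. */
    memcpy(dst->addr, src->addr, sizeof(dst->addr));
}
```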

Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file")
Signed-off-by: Lance Richardson
Reviewed-by: Xin Long
Acked-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Lance Richardson
2016-08-24 08:22:53 +0800
a8184003c dwc_eth_qos: fix interrupt enable race

We currently enable interrupts before we enable NAPI. If an RX interrupt
hits before we have enabled NAPI then the NAPI callback is never called and
we leave the hardware with RX interrupts disabled, which of course means
we never handle received packets. Fix this by moving the interrupt
enable to after we've enabled NAPI and the reclaim tasklet.

Fixes: cd5e41234729 ("dwc_eth_qos: do phy_start before resetting hardware")
Signed-off-by: Rabin Vincent
Signed-off-by: Lars Persson
Signed-off-by: David S. Miller

Rabin Vincent
2016-08-24 08:11:05 +0800
53080fe9c net: lpc_eth: Check clk_prepare_enable() error

clk_prepare_enable() may fail, so we should check its return
value and propagate it in the case of failure.

While at it, replace __lpc_eth_clock_enable() with a plain
clk_prepare_enable/clk_disable_unprepare() call in order to
simplify the code.

Signed-off-by: Fabio Estevam
Acked-by: Vladimir Zapolskiy
Signed-off-by: David S. Miller

Fabio Estevam
2016-08-24 08:10:16 +0800
1bc261fab net: mv88e6xxx: Fix ingress rate removal for mv6131 chips

The PORT_RATE_CONTROL register works differently on 88e6095/6095f/6131
in comparison to 6123/61/65, and 0x0 disables. The distinction was lost
between Linux 4.1 and 4.2.

Signed-off-by: Jamie Lentin
Reviewed-by: Andrew Lunn
Signed-off-by: David S. Miller

Jamie Lentin
2016-08-24 07:57:33 +0800
f64f14820 phy: micrel: Reenable interrupts during resume for ksz9031

Like the ksz8081, the ksz9031 has the behavior where it will clear the
interrupt enable bits when leaving power down. This takes advantage of the
solution provided by f5aba91.

Signed-off-by: Xander Huff
Signed-off-by: Nathan Sullivan
Reviewed-by: Florian Fainelli
Signed-off-by: David S. Miller

Xander Huff
2016-08-24 07:56:54 +0800
20a2b49fc tcp: properly scale window in tcp_v[46]_reqsk_send_ack()

When sending an ack in SYN_RECV state, we must scale the offered
window if wscale option was negotiated and accepted.

Tested:
Following packetdrill test demonstrates the issue :

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0

+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

// Establish a connection.
+0 < S 0:0(0) win 20000
+0 > S. 0:0(0) ack 1 win 28960

+0 < . 1:11(10) ack 1 win 156
// check that window is properly scaled !
+0 > . 1:1(0) ack 1 win 226
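
The expected value in the last line of the test can be reproduced with a small sketch. The window scale of 7 is an assumption (the TCP options are not shown in the test as reproduced here), and the helper name is hypothetical.

```c
#include <stdint.h>

/* Window value to put in the TCP header: shifted right by the receive
 * window scale when the option was negotiated, then clamped to 16 bits. */
static uint16_t scaled_window(uint32_t rcv_wnd, uint8_t rcv_wscale, int wscale_ok)
{
    uint32_t win = wscale_ok ? rcv_wnd >> rcv_wscale : rcv_wnd;

    return win > 0xFFFF ? 0xFFFF : (uint16_t)win;
}
```

With a receive window of 28960 and an assumed scale of 7, 28960 >> 7 gives 226, the value the test expects in the ACK.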

Signed-off-by: Eric Dumazet
Cc: Yuchung Cheng
Cc: Neal Cardwell
Acked-by: Yuchung Cheng
Acked-by: Neal Cardwell
Signed-off-by: David S. Miller

Eric Dumazet
2016-08-24 07:55:49 +0800
6c389fc93 gianfar: fix size of scatter-gathered frames

The current scatter-gather logic in gianfar is flawed, since
it does not consider that the eTSEC's RxBD 'Data Length' field is
context dependent: for the last fragment it contains the full
frame size, while for the other fragments it contains the fragment
size, which equals the value written to register MRBLR.

This causes data corruption as soon as the hardware starts
to fragment received frames. As a result, the size of
fragmented frames is increased by
(nr_frags - 1) * MRBLR

We first noticed this issue working with DSA, where an ICMP
request sized 1472 bytes causes the scatter-gather logic to
kick in. The full Ethernet frame (1518) gets increased by
DSA (4), GMAC_FCB_LEN (8), and FSL_GIANFAR_DEV_HAS_TIMER
(priv->padding=8) to a total of 1538 octets, which is
fragmented by the hardware and reconstructed by the driver
to a 3074 octet frame.

This patch fixes the problem by adjusting the size of
the last fragment.
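
The arithmetic can be checked with a small sketch. Both functions are hypothetical and only model the length bookkeeping, not the driver's buffer handling.

```c
#include <stdint.h>

/* len[i] is the 'Data Length' reported for fragment i: MRBLR for every
 * fragment except the last, and the full frame size for the last one. */
static uint32_t frame_size_naive(const uint32_t *len, unsigned int nr_frags)
{
    uint32_t total = 0;
    unsigned int i;

    for (i = 0; i < nr_frags; i++)
        total += len[i]; /* counts the earlier fragments twice */

    return total;
}

static uint32_t frame_size_fixed(const uint32_t *len, unsigned int nr_frags,
                                 uint32_t mrblr)
{
    uint32_t total = 0;
    unsigned int i;

    for (i = 0; i < nr_frags; i++) {
        uint32_t l = len[i];

        /* Last fragment: turn the full frame size into a fragment size. */
        if (nr_frags > 1 && i == nr_frags - 1)
            l -= (nr_frags - 1) * mrblr;

        total += l;
    }

    return total;
}
```

For the 1538 octet frame above with MRBLR = 1536, the descriptors report 1536 and 1538: the naive sum is 3074, the adjusted sum is 1538.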

It was tested by setting MRBLR to different multiples of
64, proving correct scatter-gather operation on frames
with up to 9000 octets in size.

Signed-off-by: Zefir Kurtisi
Signed-off-by: David S. Miller

Zefir Kurtisi
2016-08-24 07:49:00 +0800
b323431bc gianfar: prevent fragmentation in DSA environments

The eTSEC register MRBLR defines the maximum space in
the RX buffers and is set to 1536 by gianfar. This
reasonably covers the common use case where the MTU
is kept at default 1500. In that case, the largest
Ethernet frame size of 1518 plus an optional
GMAC_FCB_LEN of 8, and an additional padding of 8
to handle FSL_GIANFAR_DEV_HAS_TIMER totals to 1534
and nicely fit within the chosen MRBLR.

Alas, if the eTSEC is attached to a DSA enabled switch,
the (E)DSA header extension (4 or 8 bytes) causes every
maximum sized frame to be fragmented by the hardware.

This patch increases the maximum RX buffer size by 8
and rounds up to the next multiple of 64, which the
hardware defines as the RX buffer granularity.
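
The numbers above can be worked through with a short sketch. The helper names are hypothetical, and the resulting buffer size is what the description implies when taken literally, not a value quoted from the patch.

```c
#define RX_BUF_GRANULARITY 64u /* RX buffer granularity named above */

static unsigned int round_up_to(unsigned int value, unsigned int multiple)
{
    return (value + multiple - 1) / multiple * multiple;
}

/* Largest frame: 1518 (Ethernet) + 8 (GMAC_FCB_LEN) + 8 (timer padding)
 * plus an optional (E)DSA header of 4 or 8 bytes. */
static unsigned int max_rx_frame_len(unsigned int dsa_hdr_len)
{
    return 1518u + 8u + 8u + dsa_hdr_len;
}

/* New buffer size: old size plus the extra bytes, rounded up to 64. */
static unsigned int rx_buf_size(unsigned int old_size, unsigned int extra)
{
    return round_up_to(old_size + extra, RX_BUF_GRANULARITY);
}
```

Without DSA the largest frame is 1534 octets and fits in 1536. With a 4 or 8 byte header it grows to 1538 or 1542 and no longer fits, while 1536 + 8 rounded up to a multiple of 64 gives 1600, which covers both.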

Signed-off-by: Zefir Kurtisi
Signed-off-by: David S. Miller

Zefir Kurtisi
2016-08-24 07:48:59 +0800
e83c6744e udp: fix poll() issue with zero sized packets

Laura tracked down a poll() [and friends] regression caused by commit
e6afc8ace6dd ("udp: remove headers from UDP packets before queueing").

udp_poll() needs to know if there is a valid packet in receive queue,
even if its payload length is 0.

Change first_packet_length() to return a signed int, and use -1
as the indication of an empty queue.
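
Why the sentinel matters can be seen in a toy queue. The types and function names are hypothetical stand-ins, not the UDP receive path.

```c
#include <stddef.h>

struct pkt {
    int len;          /* payload length, may be 0 */
    struct pkt *next;
};

/* Payload length of the first queued packet, or -1 for an empty queue. */
static int first_len_or_neg(const struct pkt *head)
{
    return head ? head->len : -1;
}

/* Readable as soon as any packet is queued, even a zero-length one. */
static int queue_readable(const struct pkt *head)
{
    return first_len_or_neg(head) != -1;
}
```

With an unsigned return value where 0 meant "empty", a queued zero-length datagram could not be told apart from an empty queue.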

Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
Reported-by: Laura Abbott
Signed-off-by: Eric Dumazet
Tested-by: Laura Abbott
Signed-off-by: David S. Miller

Eric Dumazet
2016-08-24 07:39:14 +0800

23 Aug, 2016

6 commits

28a10c426 net sched: fix encoding to use real length

Encoding of the metadata was using the padded length as opposed to
the real length of the data, which is a bug per the specification.
This has not been an issue to date because every metadatum specified
so far has been 32 bits wide, where the aligned and data lengths are the same.
This also includes a bug fix for validating the length of a u16 field.
But since there is no metadata of size u16 yet, we are fine to include it
here.
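
The difference between real and padded length only shows up for data that is not a multiple of 4 bytes. The TLV layout below (2 byte type, 2 byte length, 4 byte alignment) and all names are assumptions made for illustration, not the IFE implementation.

```c
#include <stdint.h>
#include <string.h>

#define TLV_HDR_LEN  4u
#define TLV_ALIGN(n) (((n) + 3u) & ~3u)

/* Writes one TLV into buf and returns the bytes consumed (padded).
 * The length field carries header + real data length, not the padded one. */
static size_t tlv_encode(uint8_t *buf, uint16_t type, const void *data, uint16_t dlen)
{
    uint16_t len_field = (uint16_t)(TLV_HDR_LEN + dlen);
    size_t total = TLV_ALIGN(TLV_HDR_LEN + dlen);

    memset(buf, 0, total); /* zero the padding */
    buf[0] = (uint8_t)(type >> 8);
    buf[1] = (uint8_t)(type & 0xFF);
    buf[2] = (uint8_t)(len_field >> 8);
    buf[3] = (uint8_t)(len_field & 0xFF);
    memcpy(buf + TLV_HDR_LEN, data, dlen);

    return total;
}
```

A 2 byte value occupies 8 bytes in the buffer but is encoded with length 6. For a 4 byte value both numbers are 8, which matches the observation that 32 bit metadata hid the problem.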

While at it get rid of magic numbers.

Fixes: ef6980b6becb ("net sched: introduce IFE action")
Signed-off-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

Jamal Hadi Salim
2016-08-23 12:01:57 +0800
4870e704d qed: FLR of active VFs might lead to FW assert

Driver never bothered marking the VF's vport with the VF's sw_fid.
As a result, FLR flows are not going to clean those vports.

If the vport was active when FLRed, re-activating it would lead
to a FW assertion.

Fixes: dacd88d6f6851 ("qed: IOV l2 functionality")
Signed-off-by: Yuval Mintz
Signed-off-by: David S. Miller

Yuval Mintz
2016-08-23 09:11:38 +0800
c0451fe1f net: ip_finish_output_gso: Allow fragmenting segments of tunneled skbs if their DF is unset

In b8247f095e,

"net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs"

gso skbs arriving from an ingress interface that go through UDP
tunneling are allowed to be fragmented if the resulting encapsulated
segments exceed the dst mtu of the egress interface.

This aligned the behavior of gso skbs to non-gso skbs going through udp
encapsulation path.

However the non-gso vs gso anomaly is present also in the following
cases of a GRE tunnel:
- ip_gre in collect_md mode, where TUNNEL_DONT_FRAGMENT is not set
(e.g. OvS vport-gre with df_default=false)
- ip_gre in nopmtudisc mode, where IFLA_GRE_IGNORE_DF is set

In both of the above cases, the non-gso skbs get fragmented, whereas the
gso skbs (having skb_gso_network_seglen that exceeds dst mtu) get dropped,
as they don't go through the segment+fragment code path.

Fix: Setting IPSKB_FRAG_SEGS if the tunnel specified IP_DF bit is NOT set.

Tunnels that do set IP_DF, will not go to fragmentation of segments.
This preserves behavior of ip_gre in (the default) pmtudisc mode.

Fixes: b8247f095e ("net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs")
Reported-by: wenxu
Cc: Hannes Frederic Sowa
Signed-off-by: Shmulik Ladkani
Tested-by: wenxu
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Shmulik Ladkani
2016-08-23 08:11:01 +0800
85b51b121 net: ipv6: Remove addresses for failures with strict DAD

If DAD fails with accept_dad set to 2, global addresses and host routes
are incorrectly left in place. Even though disable_ipv6 is set,
contrary to documentation, the addresses are not dynamically deleted
from the interface. It is only on a subsequent link down/up that these
are removed. The fix is not only to set the disable_ipv6 flag, but
also to call addrconf_ifdown(), which is the action to carry out when
disabling IPv6. This results in the addresses and routes being deleted
immediately. The DAD failure for the LL addr is determined as before
via netlink, or by the absence of the LL addr (which also previously
would have had to be checked for in case of an intervening link down
and up). As the call to addrconf_ifdown() requires an rtnl lock, the
logic to disable IPv6 when DAD fails is moved to addrconf_dad_work().

Previous behavior:

root@vm1:/# sysctl net.ipv6.conf.eth3.accept_dad=2
net.ipv6.conf.eth3.accept_dad = 2
root@vm1:/# ip -6 addr add 2000::10/64 dev eth3
root@vm1:/# ip link set up eth3
root@vm1:/# ip -6 addr show dev eth3
5: eth3: mtu 1500 qlen 1000
inet6 2000::10/64 scope global
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe43:dd5a/64 scope link tentative dadfailed
valid_lft forever preferred_lft forever
root@vm1:/# ip -6 route show dev eth3
2000::/64 proto kernel metric 256
fe80::/64 proto kernel metric 256
root@vm1:/# ip link set down eth3
root@vm1:/# ip link set up eth3
root@vm1:/# ip -6 addr show dev eth3
root@vm1:/# ip -6 route show dev eth3
root@vm1:/#

New behavior:

root@vm1:/# sysctl net.ipv6.conf.eth3.accept_dad=2
net.ipv6.conf.eth3.accept_dad = 2
root@vm1:/# ip -6 addr add 2000::10/64 dev eth3
root@vm1:/# ip link set up eth3
root@vm1:/# ip -6 addr show dev eth3
root@vm1:/# ip -6 route show dev eth3
root@vm1:/#

Signed-off-by: Mike Manning
Signed-off-by: David S. Miller

Mike Manning
2016-08-23 07:59:37 +0800
53dc65d4d include/uapi/linux/ipx.h: fix conflicting definitions with glibc netipx/ipx.h

Fixes these compiler warnings via libc-compat.h when glibc netipx/ipx.h is
included before linux/ipx.h:

./linux/ipx.h:9:8: error: redefinition of ‘struct sockaddr_ipx’
./linux/ipx.h:26:8: error: redefinition of ‘struct ipx_route_definition’
./linux/ipx.h:32:8: error: redefinition of ‘struct ipx_interface_definition’
./linux/ipx.h:49:8: error: redefinition of ‘struct ipx_config_data’
./linux/ipx.h:58:8: error: redefinition of ‘struct ipx_route_def’

Signed-off-by: Mikko Rapeli
Signed-off-by: David S. Miller

Mikko Rapeli
2016-08-23 07:25:15 +0800
a1d1f65ff include/uapi/linux/openvswitch.h: use __u32 from linux/types.h

Kernel uapi headers are supposed to use them. Fixes userspace compile error:

linux/openvswitch.h:583:2: error: unknown type name ‘uint32_t’

Signed-off-by: Mikko Rapeli
Signed-off-by: David S. Miller

Mikko Rapeli
2016-08-23 07:25:15 +0800