Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

09 Oct, 2015

1 commit

ddd01b0ef Merge tag 'v4.1.10' into linux-linaro-lsk-v4.1 ... Browse Code »

This is the 4.1.10 stable release

Alex Shi
2015-10-09 09:45:12 +0800

03 Oct, 2015

30 commits

27f1b7fed Linux 4.1.10 Browse Code »

Greg Kroah-Hartman
2015-10-03 19:49:38 +0800
85695fdcb hp-wmi: limit hotkey enable ... Browse Code »

commit 8a1513b49321e503fd6c8b6793e3b1f9a8a3285b upstream.

Do not write initialize magic on systems that do not have
feature query 0xb. Fixes Bug #82451.

Redefine FEATURE_QUERY to align with 0xb and FEATURE2 with 0xd
for code clearity.

Add a new test function, hp_wmi_bios_2008_later() & simplify
hp_wmi_bios_2009_later(), which fixes a bug in cases where
an improper value is returned. Probably also fixes Bug #69131.

Add missing __init tag.

Signed-off-by: Kyle Evans
Signed-off-by: Darren Hart
Signed-off-by: Greg Kroah-Hartman

Kyle Evans
2015-10-03 19:49:18 +0800
6e3105d5b zram: fix possible use after free in zcomp_create() ... Browse Code »

commit 3aaf14da807a4e9931a37f21e4251abb8a67021b upstream.

zcomp_create() verifies the success of zcomp_strm_{multi,single}_create()
through comp->stream, which can potentially be pointing to memory that
was freed if these functions returned an error.

While at it, replace a 'ERR_PTR(-ENOMEM)' by a more generic
'ERR_PTR(error)' as in the future zcomp_strm_{multi,siggle}_create()
could return other error codes. Function documentation updated
accordingly.

Fixes: beca3ec71fe5 ("zram: add multi stream functionality")
Signed-off-by: Luis Henriques
Acked-by: Sergey Senozhatsky
Acked-by: Minchan Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Luis Henriques
2015-10-03 19:49:18 +0800
d48623677 netlink: Replace rhash_portid with bound ... Browse Code »

[ Upstream commit da314c9923fed553a007785a901fd395b7eb6c19 ]

On Mon, Sep 21, 2015 at 02:20:22PM -0400, Tejun Heo wrote:
>
> store_release and load_acquire are different from the usual memory
> barriers and can't be paired this way. You have to pair store_release
> and load_acquire. Besides, it isn't a particularly good idea to

OK I've decided to drop the acquire/release helpers as they don't
help us at all and simply pessimises the code by using full memory
barriers (on some architectures) where only a write or read barrier
is needed.

> depend on memory barriers embedded in other data structures like the
> above. Here, especially, rhashtable_insert() would have write barrier
> *before* the entry is hashed not necessarily *after*, which means that
> in the above case, a socket which appears to have set bound to a
> reader might not visible when the reader tries to look up the socket
> on the hashtable.

But you are right we do need an explicit write barrier here to
ensure that the hashing is visible.

> There's no reason to be overly smart here. This isn't a crazy hot
> path, write barriers tend to be very cheap, store_release more so.
> Please just do smp_store_release() and note what it's paired with.

It's not about being overly smart. It's about actually understanding
what's going on with the code. I've seen too many instances of
people simply sprinkling synchronisation primitives around without
any knowledge of what is happening underneath, which is just a recipe
for creating hard-to-debug races.

> > @@ -1539,7 +1546,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
> > }
> > }
> >
> > - if (!nlk->portid) {
> > + if (!nlk->bound) {
>
> I don't think you can skip load_acquire here just because this is the
> second deref of the variable. That doesn't change anything. Race
> condition could still happen between the first and second tests and
> skipping the second would lead to the same kind of bug.

The reason this one is OK is because we do not use nlk->portid or
try to get nlk from the hash table before we return to user-space.

However, there is a real bug here that none of these acquire/release
helpers discovered. The two bound tests here used to be a single
one. Now that they are separate it is entirely possible for another
thread to come in the middle and bind the socket. So we need to
repeat the portid check in order to maintain consistency.

> > @@ -1587,7 +1594,7 @@ static int netlink_connect(struct socket *sock, struct sockaddr *addr,
> > !netlink_allowed(sock, NL_CFG_F_NONROOT_SEND))
> > return -EPERM;
> >
> > - if (!nlk->portid)
> > + if (!nlk->bound)
>
> Don't we need load_acquire here too? Is this path holding a lock
> which makes that unnecessary?

Ditto.

---8bound once in netlink_bind fixes
a race where two threads that bind the socket at the same time
with different port IDs may both succeed.

Fixes: 1f770c0a09da ("netlink: Fix autobind race condition that leads to zero port ID")
Reported-by: Tejun Heo
Reported-by: Linus Torvalds
Signed-off-by: Herbert Xu
Nacked-by: Tejun Heo
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Herbert Xu
2015-10-03 19:49:18 +0800
4e2776241 netlink: Fix autobind race condition that leads to zero port ID ... Browse Code »

[ Upstream commit 1f770c0a09da855a2b51af6d19de97fb955eca85 ]

The commit c0bb07df7d981e4091432754e30c9c720e2c0c78 ("netlink:
Reset portid after netlink_insert failure") introduced a race
condition where if two threads try to autobind the same socket
one of them may end up with a zero port ID. This led to kernel
deadlocks that were observed by multiple people.

This patch reverts that commit and instead fixes it by introducing
a separte rhash_portid variable so that the real portid is only set
after the socket has been successfully hashed.

Fixes: c0bb07df7d98 ("netlink: Reset portid after netlink_insert failure")
Reported-by: Tejun Heo
Reported-by: Linus Torvalds
Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Herbert Xu
2015-10-03 19:49:18 +0800
d60017646 mvneta: use inband status only when explicitly enabled ... Browse Code »

[ Upstream commit f8af8e6eb95093d5ce5ebcc52bd1929b0433e172 in net-next tree,
will be pushed to Linus very soon. ]

The commit 898b2970e2c9 ("mvneta: implement SGMII-based in-band link state
signaling") implemented the link parameters auto-negotiation unconditionally.
Unfortunately it appears that some HW that implements SGMII protocol,
doesn't generate the inband status, so it is not possible to auto-negotiate
anything with such HW.

This patch enables the auto-negotiation only if explicitly requested with
the 'managed' DT property.

This patch fixes the following regression:
https://lkml.org/lkml/2015/7/8/865

Signed-off-by: Stas Sergeev

CC: Thomas Petazzoni
CC: netdev@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Stas Sergeev
2015-10-03 19:49:17 +0800
ebfd3e10b of_mdio: add new DT property 'managed' to specify the PHY management type ... Browse Code »

[ Upstream commit 4cba5c2103657d43d0886e4cff8004d95a3d0def in net-next tree,
will be pushed to Linus very soon. ]

Currently the PHY management type is selected by the MAC driver arbitrary.
The decision is based on the presence of the "fixed-link" node and on a
will of the driver's authors.
This caused a regression recently, when mvneta driver suddenly started
to use the in-band status for auto-negotiation on fixed links.
It appears the auto-negotiation may not work when expected by the MAC driver.
Sebastien Rannou explains:
<< Yes, I confirm that my HW does not generate an in-band status. AFAIK, it's
a PHY that aggregates 4xSGMIIs to 1xQSGMII ; the MAC side of the PHY (with
inband status) is connected to the switch through QSGMII, and in this context
we are on the media side of the PHY. >>
https://lkml.org/lkml/2015/7/10/206

This patch introduces the new string property 'managed' that allows
the user to set the management type explicitly.
The supported values are:
"auto" - default. Uses either MDIO or nothing, depending on the presence
of the fixed-link node
"in-band-status" - use in-band status

Signed-off-by: Stas Sergeev

CC: Rob Herring
CC: Pawel Moll
CC: Mark Rutland
CC: Ian Campbell
CC: Kumar Gala
CC: Florian Fainelli
CC: Grant Likely
CC: devicetree@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: netdev@vger.kernel.org
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Stas Sergeev
2015-10-03 19:49:17 +0800
282117acd net: phy: fixed_phy: handle link-down case ... Browse Code »

[ Upstream 868a4215be9a6d80548ccb74763b883dc99d32a2 in net-next tree,
will be pushed to Linus very soon. ]

fixed_phy_register() currently hardcodes the fixed PHY link to 1, and
expects to find a "speed" parameter to provide correct information
towards the fixed PHY consumer.

In a subsequent change, where we allow "managed" (e.g: (RS)GMII in-band
status auto-negotiation) fixed PHYs, none of these parameters can be
provided since they will be auto-negotiated, hence, we just provide a
zero-initialized fixed_phy_status to fixed_phy_register() which makes it
fail when we call fixed_phy_update_regs() since status.speed = 0 which
makes us hit the "default" label and error out.

Without this change, we would also see potentially inconsistent
speed/duplex parameters for fixed PHYs when the link is DOWN.

CC: netdev@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Stas Sergeev
[florian: add more background to why this is correct and desirable]
Signed-off-by: Florian Fainelli
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Stas Sergeev
2015-10-03 19:49:17 +0800
90eb52c94 net: dsa: bcm_sf2: Do not override speed settings ... Browse Code »

[ Upstream d2eac98f7d1b950b762a7eca05a9ce0ea1d878d2 in net-next tree,
will be pushed to Linus very soon. ]

The SF2 driver currently overrides speed settings for its port
configured using a fixed PHY, this is both unnecessary and incorrect,
because we keep feedback to the hardware parameters that we read from
the PHY device, which in the case of a fixed PHY cannot possibly change
speed.

This is a required change to allow the fixed PHY code to allow
registering a PHY with a link configured as DOWN by default and avoid
some sort of circular dependency where we require the link_update
callback to run to program the hardware, and we then utilize the fixed
PHY parameters to program the hardware with the same settings.

Fixes: 246d7f773c13 ("net: dsa: add Broadcom SF2 switch driver")
Signed-off-by: Florian Fainelli
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Florian Fainelli
2015-10-03 19:49:17 +0800
fce134644 fib_rules: fix fib rule dumps across multiple skbs ... Browse Code »

[ Upstream commit 41fc014332d91ee90c32840bf161f9685b7fbf2b ]

dump_rules returns skb length and not error.
But when family == AF_UNSPEC, the caller of dump_rules
assumes that it returns an error. Hence, when family == AF_UNSPEC,
we continue trying to dump on -EMSGSIZE errors resulting in
incorrect dump idx carried between skbs belonging to the same dump.
This results in fib rule dump always only dumping rules that fit
into the first skb.

This patch fixes dump_rules to return error so that we exit correctly
and idx is correctly maintained between skbs that are part of the
same dump.

Signed-off-by: Wilson Kok
Signed-off-by: Roopa Prabhu
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Wilson Kok
2015-10-03 19:49:17 +0800
74bff4a07 net: revert "net_sched: move tp->root allocation into fw_init()" ... Browse Code »

[ Upstream commit d8aecb10115497f6cdf841df8c88ebb3ba25fa28 ]

fw filter uses tp->root==NULL to check if it is the old method,
so it doesn't need allocation at all in this case. This patch
reverts the offending commit and adds some comments for old
method to make it obvious.

Fixes: 33f8b9ecdb15 ("net_sched: move tp->root allocation into fw_init()")
Reported-by: Akshat Kakkar
Cc: Jamal Hadi Salim
Signed-off-by: Cong Wang
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

WANG Cong
2015-10-03 19:49:17 +0800
c3647e60c tcp: add proper TS val into RST packets ... Browse Code »

[ Upstream commit 675ee231d960af2af3606b4480324e26797eb010 ]

RST packets sent on behalf of TCP connections with TS option (RFC 7323
TCP timestamps) have incorrect TS val (set to 0), but correct TS ecr.

A > B: Flags [S], seq 0, win 65535, options [mss 1000,nop,nop,TS val 100
ecr 0], length 0
B > A: Flags [S.], seq 2444755794, ack 1, win 28960, options [mss
1460,nop,nop,TS val 7264344 ecr 100], length 0
A > B: Flags [.], ack 1, win 65535, options [nop,nop,TS val 110 ecr
7264344], length 0

B > A: Flags [R.], seq 1, ack 1, win 28960, options [nop,nop,TS val 0
ecr 110], length 0

We need to call skb_mstamp_get() to get proper TS val,
derived from skb->skb_mstamp

Note that RFC 1323 was advocating to not send TS option in RST segment,
but RFC 7323 recommends the opposite :

Once TSopt has been successfully negotiated, that is both and
contain TSopt, the TSopt MUST be sent in every non-
segment for the duration of the connection, and SHOULD be sent in an
segment (see Section 5.2 for details)

Note this RFC recommends to send TS val = 0, but we believe it is
premature : We do not know if all TCP stacks are properly
handling the receive side :

When an segment is
received, it MUST NOT be subjected to the PAWS check by verifying an
acceptable value in SEG.TSval, and information from the Timestamps
option MUST NOT be used to update connection state information.
SEG.TSecr MAY be used to provide stricter acceptance checks.

In 5 years, if/when all TCP stack are RFC 7323 ready, we might consider
to decide to send TS val = 0, if it buys something.

Fixes: 7faee5c0d514 ("tcp: remove TCP_SKB_CB(skb)->when")
Signed-off-by: Eric Dumazet
Acked-by: Yuchung Cheng
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2015-10-03 19:49:17 +0800
6d80e3507 openvswitch: Zero flows on allocation. ... Browse Code »

[ Upstream commit ae5f2fb1d51fa128a460bcfbe3c56d7ab8bf6a43 ]

When support for megaflows was introduced, OVS needed to start
installing flows with a mask applied to them. Since masking is an
expensive operation, OVS also had an optimization that would only
take the parts of the flow keys that were covered by a non-zero
mask. The values stored in the remaining pieces should not matter
because they are masked out.

While this works fine for the purposes of matching (which must always
look at the mask), serialization to netlink can be problematic. Since
the flow and the mask are serialized separately, the uninitialized
portions of the flow can be encoded with whatever values happen to be
present.

In terms of functionality, this has little effect since these fields
will be masked out by definition. However, it leaks kernel memory to
userspace, which is a potential security vulnerability. It is also
possible that other code paths could look at the masked key and get
uninitialized data, although this does not currently appear to be an
issue in practice.

This removes the mask optimization for flows that are being installed.
This was always intended to be the case as the mask optimizations were
really targetting per-packet flow operations.

Fixes: 03f0d916 ("openvswitch: Mega flow implementation")
Signed-off-by: Jesse Gross
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Jesse Gross
2015-10-03 19:49:16 +0800
cf9cf6bc2 macvtap: fix TUNSETSNDBUF values > 64k ... Browse Code »

[ Upstream commit 3ea79249e81e5ed051f2e6480cbde896d99046e8 ]

Upon TUNSETSNDBUF, macvtap reads the requested sndbuf size into
a local variable u.
commit 39ec7de7092b ("macvtap: fix uninitialized access on
TUNSETIFF") changed its type to u16 (which is the right thing to
do for all other macvtap ioctls), breaking all values > 64k.

The value of TUNSETSNDBUF is actually a signed 32 bit integer, so
the right thing to do is to read it into an int.

Cc: David S. Miller
Fixes: 39ec7de7092b ("macvtap: fix uninitialized access on TUNSETIFF")
Reported-by: Mark A. Peloquin
Bisected-by: Matthew Rosato
Reported-by: Christian Borntraeger
Signed-off-by: Michael S. Tsirkin
Tested-by: Matthew Rosato
Acked-by: Christian Borntraeger
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Michael S. Tsirkin
2015-10-03 19:49:16 +0800
fd0a1a9da net/mlx4_en: really allow to change RSS key ... Browse Code »

[ Upsteam commit 4671fc6d47e0a0108fe24a4d830347d6a6ef4aa7 ]

When changing rss key, we do not want to overwrite user provided key
by the one provided by netdev_rss_key_fill(), which is the host random
key generated at boot time.

Fixes: 947cbb0ac242 ("net/mlx4_en: Support for configurable RSS hash function")
Signed-off-by: Eric Dumazet
Cc: Eyal Perry
CC: Amir Vadai
Acked-by: Or Gerlitz
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2015-10-03 19:49:16 +0800
022ce825b bridge: fix igmpv3 / mldv2 report parsing ... Browse Code »

[ Upstream commit c2d4fbd2163e607915cc05798ce7fb7f31117cc1 ]

With the newly introduced helper functions the skb pulling is hidden in
the checksumming function - and undone before returning to the caller.

The IGMPv3 and MLDv2 report parsing functions in the bridge still
assumed that the skb is pointing to the beginning of the IGMP/MLD
message while it is now kept at the beginning of the IPv4/6 header,
breaking the message parsing and creating packet loss.

Fixing this by taking the offset between IP and IGMP/MLD header into
account, too.

Fixes: 9afd85c9e455 ("net: Export IGMP/MLD message validation code")
Reported-by: Tobias Powalowski
Tested-by: Tobias Powalowski
Signed-off-by: Linus Lüssing
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Linus Lüssing
2015-10-03 19:49:15 +0800
5cadd6bac sctp: fix race on protocol/netns initialization ... Browse Code »

[ Upstream commit 8e2d61e0aed2b7c4ecb35844fe07e0b2b762dee4 ]

Consider sctp module is unloaded and is being requested because an user
is creating a sctp socket.

During initialization, sctp will add the new protocol type and then
initialize pernet subsys:

status = sctp_v4_protosw_init();
if (status)
goto err_protosw_init;

status = sctp_v6_protosw_init();
if (status)
goto err_v6_protosw_init;

status = register_pernet_subsys(&sctp_net_ops);

The problem is that after those calls to sctp_v{4,6}_protosw_init(), it
is possible for userspace to create SCTP sockets like if the module is
already fully loaded. If that happens, one of the possible effects is
that we will have readers for net->sctp.local_addr_list list earlier
than expected and sctp_net_init() does not take precautions while
dealing with that list, leading to a potential panic but not limited to
that, as sctp_sock_init() will copy a bunch of blank/partially
initialized values from net->sctp.

The race happens like this:

CPU 0 | CPU 1
socket() |
__sock_create | socket()
inet_create | __sock_create
list_for_each_entry_rcu( |
answer, &inetsw[sock->type], |
list) { | inet_create
/* no hits */ |
if (unlikely(err)) { |
... |
request_module() |
/* socket creation is blocked |
* the module is fully loaded |
*/ |
sctp_init |
sctp_v4_protosw_init |
inet_register_protosw |
list_add_rcu(&p->list, |
last_perm); |
| list_for_each_entry_rcu(
| answer, &inetsw[sock->type],
sctp_v6_protosw_init | list) {
| /* hit, so assumes protocol
| * is already loaded
| */
| /* socket creation continues
| * before netns is initialized
| */
register_pernet_subsys |

Simply inverting the initialization order between
register_pernet_subsys() and sctp_v4_protosw_init() is not possible
because register_pernet_subsys() will create a control sctp socket, so
the protocol must be already visible by then. Deferring the socket
creation to a work-queue is not good specially because we loose the
ability to handle its errors.

So, as suggested by Vlad, the fix is to split netns initialization in
two moments: defaults and control socket, so that the defaults are
already loaded by when we register the protocol, while control socket
initialization is kept at the same moment it is today.

Fixes: 4db67e808640 ("sctp: Make the address lists per network namespace")
Signed-off-by: Vlad Yasevich
Signed-off-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Marcelo Ricardo Leitner
2015-10-03 19:49:15 +0800
65d48c630 netlink, mmap: transform mmap skb into full skb on taps ... Browse Code »

[ Upstream commit 1853c949646005b5959c483becde86608f548f24 ]

Ken-ichirou reported that running netlink in mmap mode for receive in
combination with nlmon will throw a NULL pointer dereference in
__kfree_skb() on nlmon_xmit(), in my case I can also trigger an "unable
to handle kernel paging request". The problem is the skb_clone() in
__netlink_deliver_tap_skb() for skbs that are mmaped.

I.e. the cloned skb doesn't have a destructor, whereas the mmap netlink
skb has it pointed to netlink_skb_destructor(), set in the handler
netlink_ring_setup_skb(). There, skb->head is being set to NULL, so
that in such cases, __kfree_skb() doesn't perform a skb_release_data()
via skb_release_all(), where skb->head is possibly being freed through
kfree(head) into slab allocator, although netlink mmap skb->head points
to the mmap buffer. Similarly, the same has to be done also for large
netlink skbs where the data area is vmalloced. Therefore, as discussed,
make a copy for these rather rare cases for now. This fixes the issue
on my and Ken-ichirou's test-cases.

Reference: http://thread.gmane.org/gmane.linux.network/371129
Fixes: bcbde0d449ed ("net: netlink: virtual tap device management")
Reported-by: Ken-ichirou MATSUZAWA
Signed-off-by: Daniel Borkmann
Tested-by: Ken-ichirou MATSUZAWA
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Daniel Borkmann
2015-10-03 19:49:15 +0800
85b722085 net: dsa: bcm_sf2: Fix 64-bits register writes ... Browse Code »

[ Upstream commit 03679a14739a0d4c14b52ba65a69ff553bfba73b ]

The macro to write 64-bits quantities to the 32-bits register swapped
the value and offsets arguments, we want to preserve the ordering of the
arguments with respect to how writel() is implemented for instance:
value first, offset/base second.

Fixes: 246d7f773c13 ("net: dsa: add Broadcom SF2 switch driver")
Signed-off-by: Florian Fainelli
Reviewed-by: Vivien Didelot
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Florian Fainelli
2015-10-03 19:49:15 +0800
c59231c87 ipv6: fix multipath route replace error recovery ... Browse Code »

[ Upstream commit 6b9ea5a64ed5eeb3f68f2e6fcce0ed1179801d1e ]

Problem:
The ecmp route replace support for ipv6 in the kernel, deletes the
existing ecmp route too early, ie when it installs the first nexthop.
If there is an error in installing the subsequent nexthops, its too late
to recover the already deleted existing route leaving the fib
in an inconsistent state.

This patch reduces the possibility of this by doing the following:
a) Changes the existing multipath route add code to a two stage process:
build rt6_infos + insert them
ip6_route_add rt6_info creation code is moved into
ip6_route_info_create.
b) This ensures that most errors are caught during building rt6_infos
and we fail early
c) Separates multipath add and del code. Because add needs the special
two stage mode in a) and delete essentially does not care.
d) In any event if the code fails during inserting a route again, a
warning is printed (This should be unlikely)

Before the patch:
$ip -6 route show
3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024

/* Try replacing the route with a duplicate nexthop */
$ip -6 route change 3000:1000:1000:1000::2/128 nexthop via
fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev
swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1
RTNETLINK answers: File exists

$ip -6 route show
/* previously added ecmp route 3000:1000:1000:1000::2 dissappears from
* kernel */

After the patch:
$ip -6 route show
3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024

/* Try replacing the route with a duplicate nexthop */
$ip -6 route change 3000:1000:1000:1000::2/128 nexthop via
fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev
swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1
RTNETLINK answers: File exists

$ip -6 route show
3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024

Fixes: 27596472473a ("ipv6: fix ECMP route replacement")
Signed-off-by: Roopa Prabhu
Reviewed-by: Nikolay Aleksandrov
Acked-by: Nicolas Dichtel
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Roopa Prabhu
2015-10-03 19:49:14 +0800
0f5262e8d net: dsa: bcm_sf2: Fix ageing conditions and operation ... Browse Code »

[ Upstream commit 39797a279d62972cd914ef580fdfacb13e508bf8 ]

The comparison check between cur_hw_state and hw_state is currently
invalid because cur_hw_state is right shifted by G_MISTP_SHIFT, while
hw_state is not, so we end-up comparing bits 2:0 with bits 7:5, which is
going to cause an additional aging to occur. Fix this by not shifting
cur_hw_state while reading it, but instead, mask the value with the
appropriately shitfted bitmask.

The other problem with the fast-ageing process is that we did not set
the EN_AGE_DYNAMIC bit to request the ageing to occur for dynamically
learned MAC addresses. Finally, write back 0 to the FAST_AGE_CTRL
register to avoid leaving spurious bits sets from one operation to the
other.

Fixes: 12f460f23423 ("net: dsa: bcm_sf2: add HW bridging support")
Signed-off-by: Florian Fainelli
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Florian Fainelli
2015-10-03 19:49:14 +0800
5008d77ef net/ipv6: Correct PIM6 mrt_lock handling ... Browse Code »

[ Upstream commit 25b4a44c19c83d98e8c0807a7ede07c1f28eab8b ]

In the IPv6 multicast routing code the mrt_lock was not being released
correctly in the MFC iterator, as a result adding or deleting a MIF would
cause a hang because the mrt_lock could not be acquired.

This fix is a copy of the code for the IPv4 case and ensures that the lock
is released correctly.

Signed-off-by: Richard Laing
Acked-by: Cong Wang
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Richard Laing
2015-10-03 19:49:14 +0800
8bb9225a7 net: eth: altera: fix napi poll_list corruption ... Browse Code »

[ Upstream commit 4548a697e4969d695047cebd6d9af5e2f6cc728e ]

tse_poll() calls __napi_complete() with irq enabled. This leads napi
poll_list corruption and may stop all napi drivers working.
Use napi_complete() instead of __napi_complete().

Signed-off-by: Atsushi Nemoto
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Atsushi Nemoto
2015-10-03 19:49:14 +0800
5e0674808 net: fec: clear receive interrupts before processing a packet ... Browse Code »

[ Upstream commit ed63f1dcd5788d36f942fbcce350742385e3e18c ]

The patch just to re-submit the patch "db3421c114cfa6326" because the
patch "4d494cdc92b3b9a0" remove the change.

Clear any pending receive interrupt before we process a pending packet.
This helps to avoid any spurious interrupts being raised after we have
fully cleaned the receive ring, while still allowing an interrupt to be
raised if we receive another packet.

The position of this is critical: we must do this prior to reading the
next packet status to avoid potentially dropping an interrupt when a
packet is still pending.

Acked-by: Fugang Duan
Signed-off-by: Russell King
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Russell King
2015-10-03 19:49:14 +0800
8d14167cc ipv6: fix exthdrs offload registration in out_rt path ... Browse Code »

[ Upstream commit e41b0bedba0293b9e1e8d1e8ed553104b9693656 ]

We previously register IPPROTO_ROUTING offload under inet6_add_offload(),
but in error path, we try to unregister it with inet_del_offload(). This
doesn't seem correct, it should actually be inet6_del_offload(), also
ipv6_exthdrs_offload_exit() from that commit seems rather incorrect (it
also uses rthdr_offload twice), but it got removed entirely later on.

Fixes: 3336288a9fea ("ipv6: Switch to using new offload infrastructure.")
Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Daniel Borkmann
2015-10-03 19:49:14 +0800
6a7ca4e15 sock, diag: fix panic in sock_diag_put_filterinfo ... Browse Code »

[ Upstream commit b382c08656000c12a146723a153b85b13a855b49 ]

diag socket's sock_diag_put_filterinfo() dumps classic BPF programs
upon request to user space (ss -0 -b). However, native eBPF programs
attached to sockets (SO_ATTACH_BPF) cannot be dumped with this method:

Their orig_prog is always NULL. However, sock_diag_put_filterinfo()
unconditionally tries to access its filter length resp. wants to copy
the filter insns from there. Internal cBPF to eBPF transformations
attached to sockets don't have this issue, as orig_prog state is kept.

It's currently only used by packet sockets. If we would want to add
native eBPF support in the future, this needs to be done through
a different attribute than PACKET_DIAG_FILTER to not confuse possible
user space disassemblers that work on diag data.

Fixes: 89aa075832b0 ("net: sock: allow eBPF programs to be attached to sockets")
Signed-off-by: Daniel Borkmann
Acked-by: Nicolas Dichtel
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Daniel Borkmann
2015-10-03 19:49:13 +0800
26b732f03 usbnet: Get EVENT_NO_RUNTIME_PM bit before it is cleared ... Browse Code »

[ Upstream commit f50791ac1aca1ac1b0370d62397b43e9f831421a ]

It is needed to check EVENT_NO_RUNTIME_PM bit of dev->flags in
usbnet_stop(), but its value should be read before it is cleared
when dev->flags is set to 0.

The problem was spotted and the fix was provided by
Oliver Neukum .

Signed-off-by: Eugene Shatokhin
Acked-by: Oliver Neukum
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eugene Shatokhin
2015-10-03 19:49:13 +0800
5184fc66f cls_u32: complete the check for non-forced case in u32_destroy() ... Browse Code »

[ Upstream commit a6c1aea044e490da3e59124ec55991fe316818d5 ]

In commit 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
I added a check in u32_destroy() to see if all real filters are gone
for each tp, however, that is only done for root_ht, same is needed
for others.

This can be reproduced by the following tc commands:

tc filter add dev eth0 parent 1:0 prio 5 handle 15: protocol ip u32 divisor 256
tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:2 u32
ht 15:2: match ip src 10.0.0.2 flowid 1:10
tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:3 u32
ht 15:2: match ip src 10.0.0.3 flowid 1:10

Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
Reported-by: Akshat Kakkar
Cc: Jamal Hadi Salim
Signed-off-by: Cong Wang
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

WANG Cong
2015-10-03 19:49:12 +0800
0fade09c7 vxlan: re-ignore EADDRINUSE from igmp_join ... Browse Code »

[ Upstream commit bef0057b7ba881d5ae67eec876df7a26fe672a59 ]

Before 56ef9c909b40[1] it used to ignore all errors from igmp_join().
That commit enhanced that and made it error out whatever error happened
with igmp_join(), but that's not good because when using multicast
groups vxlan will try to join it multiple times if the socket is reused
and then the 2nd and further attempts will fail with EADDRINUSE.

As we don't track to which groups the socket is already subscribed, it's
okay to just ignore that error.

Fixes: 56ef9c909b40 ("vxlan: Move socket initialization to within rtnl scope")
Reported-by: John Nielsen
Signed-off-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Marcelo Ricardo Leitner
2015-10-03 19:49:12 +0800
5c02ee314 ip6_gre: release cached dst on tunnel removal ... Browse Code »

[ Upstream commit d4257295ba1b389c693b79de857a96e4b7cd8ac0 ]

When a tunnel is deleted, the cached dst entry should be released.

This problem may prevent the removal of a netns (seen with a x-netns IPv6
gre tunnel):
unregister_netdevice: waiting for lo to become free. Usage count = 3

CC: Dmitry Kozlov
Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Signed-off-by: huaibin Wang
Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

huaibin Wang
2015-10-03 19:49:12 +0800

30 Sep, 2015

9 commits

cbc890891 Linux 4.1.9 Browse Code »

Greg Kroah-Hartman
2015-09-30 01:26:41 +0800
c3a0355bd cxl: Don't remove AFUs/vPHBs in cxl_reset ... Browse Code »

commit 4e1efb403c1c016ae831bd9988a7d2e5e0af41a0 upstream.

If the driver doesn't participate in EEH, the AFUs will be removed
by cxl_remove, which will be invoked by EEH.

If the driver does particpate in EEH, the vPHB needs to stick around
so that the it can particpate.

In both cases, we shouldn't remove the AFU/vPHB.

Reviewed-by: Cyril Bur
Signed-off-by: Daniel Axtens
Signed-off-by: Michael Ellerman
Reported-by: Guenter Roeck
Signed-off-by: Sudip Mukherjee
Signed-off-by: Greg Kroah-Hartman

Daniel Axtens
2015-09-30 01:26:26 +0800
e55ffaf45 ipv4: off-by-one in continuation handling in /proc/net/route ... Browse Code »

[ Upstream commit 25b97c016b26039982daaa2c11d83979f93b71ab ]

When generating /proc/net/route we emit a header followed by a line for
each route. When a short read is performed we will restart this process
based on the open file descriptor. When calculating the start point we
fail to take into account that the 0th entry is the header. This leads
us to skip the first entry when doing a continuation read.

This can be easily seen with the comparison below:

while read l; do echo "$l"; done A
cat /proc/net/route >B
diff -bu A B | grep '^[+-]'

On my example machine I have approximatly 10KB of route output. There we
see the very first non-title element is lost in the while read case,
and an entry around the 8K mark in the cat case:

+wlan0 00000000 02021EAC 0003 0 0 400 00000000 0 0 0
-tun1 00C0AC0A 00000000 0001 0 0 950 00C0FFFF 0 0 0

Fix up the off-by-one when reaquiring position on continuation.

Fixes: 8be33e955cb9 ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf")
BugLink: http://bugs.launchpad.net/bugs/1483440
Acked-by: Alexander Duyck
Signed-off-by: Andy Whitcroft
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Andy Whitcroft
2015-09-30 01:26:26 +0800
b21ee3425 net: dsa: Do not override PHY interface if already configured ... Browse Code »

[ Upstream commit 211c504a444710b1d8ce3431ac19f2578602ca27 ]

In case we need to divert reads/writes using the slave MII bus, we may have
already fetched a valid PHY interface property from Device Tree, and that
mode is used by the PHY driver to make configuration decisions.

If we could not fetch the "phy-mode" property, we will assign p->phy_interface
to PHY_INTERFACE_MODE_NA, such that we can actually check for that condition as
to whether or not we should override the interface value.

Fixes: 19334920eaf7 ("net: dsa: Set valid phy interface type")
Signed-off-by: Florian Fainelli
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Florian Fainelli
2015-09-30 01:26:26 +0800
0c1122ae6 inet: fix races with reqsk timers ... Browse Code »

[ Upstream commit 2235f2ac75fd2501c251b0b699a9632e80239a6d ]

reqsk_queue_destroy() and reqsk_queue_unlink() should use
del_timer_sync() instead of del_timer() before calling reqsk_put(),
otherwise we could free a req still used by another cpu.

But before doing so, reqsk_queue_destroy() must release syn_wait_lock
spinlock or risk a dead lock, as reqsk_timer_handler() might
need to take this same spinlock from reqsk_queue_unlink() (called from
inet_csk_reqsk_queue_drop())

Fixes: fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer")
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2015-09-30 01:26:26 +0800
d36f8434d inet: fix possible request socket leak ... Browse Code »

[ Upstream commit 3257d8b12f954c462d29de6201664a846328a522 ]

In commit b357a364c57c9 ("inet: fix possible panic in
reqsk_queue_unlink()"), I missed fact that tcp_check_req()
can return the listener socket in one case, and that we must
release the request socket refcount or we leak it.

Tested:

Following packetdrill test template shows the issue

0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

+0 < S 0:0(0) win 2920
+0 > S. 0:0(0) ack 1
+.002 < . 1:1(0) ack 21 win 2920
+0 > R 21:21(0)

Fixes: b357a364c57c9 ("inet: fix possible panic in reqsk_queue_unlink()")
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2015-09-30 01:26:26 +0800
d397617f7 netlink: make sure -EBUSY won't escape from netlink_insert ... Browse Code »

[ Upstream commit 4e7c1330689e27556de407d3fdadc65ffff5eb12 ]

Linus reports the following deadlock on rtnl_mutex; triggered only
once so far (extract):

[12236.694209] NetworkManager D 0000000000013b80 0 1047 1 0x00000000
[12236.694218] ffff88003f902640 0000000000000000 ffffffff815d15a9 0000000000000018
[12236.694224] ffff880119538000 ffff88003f902640 ffffffff81a8ff84 00000000ffffffff
[12236.694230] ffffffff81a8ff88 ffff880119c47f00 ffffffff815d133a ffffffff81a8ff80
[12236.694235] Call Trace:
[12236.694250] [] ? schedule_preempt_disabled+0x9/0x10
[12236.694257] [] ? schedule+0x2a/0x70
[12236.694263] [] ? schedule_preempt_disabled+0x9/0x10
[12236.694271] [] ? __mutex_lock_slowpath+0x7f/0xf0
[12236.694280] [] ? mutex_lock+0x16/0x30
[12236.694291] [] ? rtnetlink_rcv+0x10/0x30
[12236.694299] [] ? netlink_unicast+0xfb/0x180
[12236.694309] [] ? rtnl_getlink+0x113/0x190
[12236.694319] [] ? rtnetlink_rcv_msg+0x7a/0x210
[12236.694331] [] ? sock_has_perm+0x5c/0x70
[12236.694339] [] ? rtnetlink_rcv+0x30/0x30
[12236.694346] [] ? netlink_rcv_skb+0x9c/0xc0
[12236.694354] [] ? rtnetlink_rcv+0x1f/0x30
[12236.694360] [] ? netlink_unicast+0xfb/0x180
[12236.694367] [] ? netlink_sendmsg+0x484/0x5d0
[12236.694376] [] ? __wake_up+0x2f/0x50
[12236.694387] [] ? sock_sendmsg+0x33/0x40
[12236.694396] [] ? ___sys_sendmsg+0x22e/0x240
[12236.694405] [] ? ___sys_recvmsg+0x135/0x1a0
[12236.694415] [] ? eventfd_write+0x82/0x210
[12236.694423] [] ? fsnotify+0x32e/0x4c0
[12236.694429] [] ? wake_up_q+0x60/0x60
[12236.694434] [] ? __sys_sendmsg+0x39/0x70
[12236.694440] [] ? entry_SYSCALL_64_fastpath+0x12/0x6a

It seems so far plausible that the recursive call into rtnetlink_rcv()
looks suspicious. One way, where this could trigger is that the senders
NETLINK_CB(skb).portid was wrongly 0 (which is rtnetlink socket), so
the rtnl_getlink() request's answer would be sent to the kernel instead
to the actual user process, thus grabbing rtnl_mutex() twice.

One theory would be that netlink_autobind() triggered via netlink_sendmsg()
internally overwrites the -EBUSY error to 0, but where it is wrongly
originating from __netlink_insert() instead. That would reset the
socket's portid to 0, which is then filled into NETLINK_CB(skb).portid
later on. As commit d470e3b483dc ("[NETLINK]: Fix two socket hashing bugs.")
also puts it, -EBUSY should not be propagated from netlink_insert().

It looks like it's very unlikely to reproduce. We need to trigger the
rhashtable_insert_rehash() handler under a situation where rehashing
currently occurs (one /rare/ way would be to hit ht->elasticity limits
while not filled enough to expand the hashtable, but that would rather
require a specifically crafted bind() sequence with knowledge about
destination slots, seems unlikely). It probably makes sense to guard
__netlink_insert() in any case and remap that error. It was suggested
that EOVERFLOW might be better than an already overloaded ENOMEM.

Reference: http://thread.gmane.org/gmane.linux.network/372676
Reported-by: Linus Torvalds
Signed-off-by: Daniel Borkmann
Acked-by: Herbert Xu
Acked-by: Thomas Graf
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Daniel Borkmann
2015-09-30 01:26:25 +0800
1d79bc603 bna: fix interrupts storm caused by erroneous packets ... Browse Code »

[ Upstream commit ade4dc3e616e33c80d7e62855fe1b6f9895bc7c3 ]

The commit "e29aa33 bna: Enable Multi Buffer RX" moved packets counter
increment from the beginning of the NAPI processing loop after the check
for erroneous packets so they are never accounted. This counter is used
to inform firmware about number of processed completions (packets).
As these packets are never acked the firmware fires IRQs for them again
and again.

Fixes: e29aa33 ("bna: Enable Multi Buffer RX")
Signed-off-by: Ivan Vecera
Acked-by: Rasesh Mody
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Ivan Vecera
2015-09-30 01:26:25 +0800
b291dba31 bridge: netlink: account for the IFLA_BRPORT_PROXYARP_WIFI attribute size and policy ... Browse Code »

[ Upstream commit 786c2077ec8e9eab37a88fc14aac4309a8061e18 ]

The attribute size wasn't accounted for in the get_slave_size() callback
(br_port_get_slave_size) when it was introduced, so fix it now. Also add
a policy entry for it in br_port_policy.

Signed-off-by: Nikolay Aleksandrov
Fixes: 842a9ae08a25 ("bridge: Extend Proxy ARP design to allow optional rules for Wi-Fi")
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Nikolay Aleksandrov
2015-09-30 01:26:25 +0800