15 Jan, 2017

37 commits

  • [ Upstream commit 24c63bbc18e25d5d8439422aa5fd2d66390b88eb ]

    Frank reported that vrf devices can be created with a table id of 0.
    This breaks many of the run time table id checks and should not be
    allowed. Detect this condition at create time and fail with EINVAL.

    Fixes: 193125dbd8eb ("net: Introduce VRF device driver")
    Reported-by: Frank Kellermann
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     
  • [ Upstream commit 7a18c5b9fb31a999afc62b0e60978aa896fc89e9 ]

    fib_select_path does not call fib_select_multipath if oif is set in the
    flow struct. For VRF use cases oif is always set, so multipath route
    selection is bypassed. Use the FLOWI_FLAG_SKIP_NH_OIF to skip the oif
    check similar to what is done in fib_table_lookup.

    Add saddr and proto to the flow struct for the fib lookup done by the
    VRF driver to better match hash computation for a flow.

    Fixes: 613d09b30f8b ("net: Use VRF device index for lookups on TX")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     
  • [ Upstream commit 0bbcc0a8fc394d01988fe0263ccf7fddb77a12c3 ]

    When trying to do interface down or changing interface configuration
    under heavy traffic, some of the adaptive moderation corner cases can
    occur and leave a WARN_ONCE call trace in the kernel log.

    Those WARN_ONCE are meant for debug only, and should have been inserted
    only under debug. We avoid such call traces by removing those WARN_ONCE.

    Fixes: cb3c7fd4f839 ("net/mlx5e: Support adaptive RX coalescing")
    Signed-off-by: Gil Rockah
    Signed-off-by: Saeed Mahameed
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Gil Rockah
     
  • [ Upstream commit 57ea52a865144aedbcd619ee0081155e658b6f7d ]

    The GRO fast path caches the frag0 address. This address becomes
    invalid if frag0 is modified by pskb_may_pull or its variants.
    So whenever that happens we must disable the frag0 optimization.

    This is usually done through the combination of gro_header_hard
    and gro_header_slow, however, the IPv6 extension header path did
    the pulling directly and would continue to use the GRO fast path
    incorrectly.

    This patch fixes it by disabling the fast path when we enter the
    IPv6 extension header path.

    Fixes: 78a478d0efd9 ("gro: Inline skb_gro_header and cache frag0 virtual address")
    Reported-by: Slava Shwartsman
    Signed-off-by: Herbert Xu
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Herbert Xu
     
  • [ Upstream commit 7cfd5fd5a9813f1430290d20c0fead9b4582a307 ]

    On 32bit arches, (skb->end - skb->data) is not 'unsigned int',
    so we shall use min_t() instead of min() to avoid a compiler error.

    Fixes: 1272ce87fa01 ("gro: Enter slow-path if there is no tailroom")
    Reported-by: kernel test robot
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
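
A minimal userspace sketch of why the cast matters, assuming a 32-bit build where the end of the buffer is stored as a pointer. A pointer difference has type ptrdiff_t rather than unsigned int, so a type-checked min() refuses to compare it with an unsigned length. The macro below is a simplified stand-in for the kernel's min_t(), and cap_frag0_len() and its parameters are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's min_t(): cast both operands to
 * one explicitly named type before comparing them. */
#define min_t(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* Hypothetical helper: cap a fragment length by the room left in a buffer.
 * (end - data) is a pointer difference (ptrdiff_t), not unsigned int, so
 * both sides are cast to unsigned int before the comparison. */
static unsigned int cap_frag0_len(unsigned int frag_size,
                                  const unsigned char *data,
                                  const unsigned char *end)
{
    return min_t(unsigned int, frag_size, end - data);
}
```

With a 64-byte buffer, a requested length of 100 is capped to 64, while a requested length of 10 is returned unchanged.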
     
  • [ Upstream commit 1272ce87fa017ca4cf32920764d879656b7a005a ]

    The GRO path has a fast-path where we avoid calling pskb_may_pull
    and pskb_expand by directly accessing frag0. However, this should
    only be done if we have enough tailroom in the skb as otherwise
    we'll have to expand it later anyway.

    This patch adds the check by capping frag0_len with the skb tailroom.

    Fixes: cb18978cbf45 ("gro: Open-code final pskb_may_pull")
    Reported-by: Slava Shwartsman
    Signed-off-by: Herbert Xu
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Herbert Xu
     
  • [ Upstream commit 5d722b3024f6762addb8642ffddc9f275b5107ae ]

    Commit bdabad3e363d ("net: Add Qualcomm IPC router") introduced a
    new address family. Update the family name tables accordingly so
    that the lockdep initialization can use the proper names for this
    family.

    Cc: Courtney Cavin
    Cc: Bjorn Andersson
    Signed-off-by: Suman Anna
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Anna, Suman
     
  • [ Upstream commit faf3a932fbeb77860226a8323eacb835edc98648 ]

    It is perfectly possible to have non-zero indexed switches present
    in a DSA switch tree. In such a case, we will be dereferencing a NULL
    pointer in dsa_cpu_port_ethtool_{setup,restore}. Be more defensive
    and ensure that dst->ds[0] is valid before doing anything with it.

    Fixes: 0c73c523cf73 ("net: dsa: Initialize CPU port ethtool ops per tree")
    Signed-off-by: Florian Fainelli
    Reviewed-by: Vivien Didelot
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Florian Fainelli
     
  • [ Upstream commit 75dc692eda114cb234a46cb11893a9c3ea520934 ]

    Pause the rx and make sure the rx fifo is empty when the autosuspend
    occurs.

    If rx data arrives while the driver is canceling the rx urb, the host
    controller stops getting the data from the device and continues
    after the next rx urb is submitted. That is, one continuous run of data is
    split across two different urb buffers. The driver then takes the
    data as an rx descriptor, and unexpected behavior happens.

    Signed-off-by: Hayes Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    hayeswang
     
  • [ Upstream commit 8fb280616878b81c0790a0c33acbeec59c5711f4 ]

    Split rtl8152_suspend() into rtl8152_system_suspend() and
    rtl8152_rumtime_suspend().

    Signed-off-by: Hayes Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    hayeswang
     
  • [ Upstream commit 2cfe8f8290bd28cf1ee67db914a6e76cf8e6437b ]

    We are implementing a MDIO bus which is behind another one, so use the
    nested version of the accessors to get lockdep annotations correct.

    Fixes: 461cd1b03e32 ("net: dsa: bcm_sf2: Register our slave MDIO bus")
    Signed-off-by: Florian Fainelli
    Reviewed-by: Andrew Lunn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Florian Fainelli
     
  • [ Upstream commit a4c61b92b3a4cbda35bb0251a5063a68f0861b2c ]

    We make the bcm_sf2 driver override ds->ops which points to
    b53_switch_ops since b53_switch_alloc() did the assignment. This is all
    well and good until a second b53 switch comes in, and ends up using the
    bcm_sf2 operations. Make a proper local copy, substitute the ds->ops
    pointer and then override the operations.

    Fixes: f458995b9ad8 ("net: dsa: bcm_sf2: Utilize core B53 driver when possible")
    Signed-off-by: Florian Fainelli
    Reviewed-by: Andrew Lunn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Florian Fainelli
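
The copy-then-override pattern described in this message can be sketched in plain C as follows. This is an illustration under stated assumptions, not the driver's code: struct switch_ops, struct dsa_switch_sketch, and every function name here are hypothetical, much-reduced stand-ins.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much-reduced operations table. */
struct switch_ops {
    int (*get_tag_protocol)(void);
    int (*port_enable)(int port);
};

/* Hypothetical switch object that points at an operations table. */
struct dsa_switch_sketch {
    const struct switch_ops *ops;
};

static int core_tag(void) { return 1; }
static int core_port_enable(int port) { (void)port; return 0; }
static int sf2_port_enable(int port) { return port + 100; }

/* Shared table: every switch allocated by the core points here. */
static const struct switch_ops core_ops = {
    .get_tag_protocol = core_tag,
    .port_enable = core_port_enable,
};

/* Give one switch a private copy of the shared table and override a
 * single callback in that copy. Returns the copy (the caller frees it),
 * or NULL on allocation failure. The shared table is never written. */
static struct switch_ops *override_ops(struct dsa_switch_sketch *ds)
{
    struct switch_ops *ops = malloc(sizeof(*ops));

    if (!ops)
        return NULL;
    memcpy(ops, ds->ops, sizeof(*ops));
    ops->port_enable = sf2_port_enable;
    ds->ops = ops;
    return ops;
}
```

A second switch that still points at core_ops keeps the core behavior, which is the property the fix is after.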
     
  • [ Upstream commit 9d5ecb09d525469abd1a10c096cb5a17206523f2 ]

    If after too many passes still no image could be emitted, then
    swap back to the original program as we do in all other cases
    and don't use the one with blinding.

    Fixes: 959a75791603 ("bpf, x86: add support for constant blinding")
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     
  • [ Upstream commit 926d93a33e59b2729afdbad357233c17184de9d2 ]

    The move from rx-handler to L3 receive handler inadvertently dropped the
    rx counters. Restore them.

    Fixes: 74b20582ac38 ("net: l3mdev: Add hook in ip and ipv6")
    Reported-by: Dinesh Dutt
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     
  • [ Upstream commit 5350d54f6cd12eaff623e890744c79b700bd3f17 ]

    In the case of custom rules being present we need to handle the case of the
    LOCAL table being initialized after the new rule has been added. To address
    that I am adding a new check so that we can make certain we don't use an
    alias of MAIN for LOCAL when allocating a new table.

    Fixes: 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse")
    Reported-by: Oliver Brunel
    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Alexander Duyck
     
  • [ Upstream commit 7ababb782690e03b78657e27bd051e20163af2d6 ]

    5.2. Action on Reception of a Query

    When a system receives a Query, it does not respond immediately.
    Instead, it delays its response by a random amount of time, bounded
    by the Max Resp Time value derived from the Max Resp Code in the
    received Query message. A system may receive a variety of Queries on
    different interfaces and of different kinds (e.g., General Queries,
    Group-Specific Queries, and Group-and-Source-Specific Queries), each
    of which may require its own delayed response.

    Before scheduling a response to a Query, the system must first
    consider previously scheduled pending responses and in many cases
    schedule a combined response. Therefore, the system must be able to
    maintain the following state:

    o A timer per interface for scheduling responses to General Queries.

    o A per-group and interface timer for scheduling responses to Group-
    Specific and Group-and-Source-Specific Queries.

    o A per-group and interface list of sources to be reported in the
    response to a Group-and-Source-Specific Query.

    When a new Query with the Router-Alert option arrives on an
    interface, provided the system has state to report, a delay for a
    response is randomly selected in the range (0, [Max Resp Time]) where
    Max Resp Time is derived from Max Resp Code in the received Query
    message. The following rules are then used to determine if a Report
    needs to be scheduled and the type of Report to schedule. The rules
    are considered in order and only the first matching rule is applied.

    1. If there is a pending response to a previous General Query
    scheduled sooner than the selected delay, no additional response
    needs to be scheduled.

    2. If the received Query is a General Query, the interface timer is
    used to schedule a response to the General Query after the
    selected delay. Any previously pending response to a General
    Query is canceled.
    --8
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Michal Tesar
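
The first two rules quoted above can be sketched as a small decision function. This is only an illustration of the quoted text, not the kernel's IGMP code: struct gq_state, enum gq_action, and on_query() are hypothetical names, the randomly selected delay is passed in by the caller, and the later rules for group-specific queries are not modeled.

```c
#include <stdbool.h>

/* Hypothetical per-interface state for General Query responses. */
struct gq_state {
    bool pending;          /* a General Query response is scheduled */
    unsigned int expires;  /* time units until that response fires */
};

enum gq_action { GQ_KEEP_PENDING, GQ_RESCHEDULED, GQ_NOT_GENERAL };

/* Apply rules 1 and 2 in order; only the first matching rule is used. */
static enum gq_action on_query(struct gq_state *st, bool is_general,
                               unsigned int delay)
{
    /* Rule 1: a pending General Query response fires sooner than the
     * selected delay, so nothing new is scheduled. */
    if (st->pending && st->expires < delay)
        return GQ_KEEP_PENDING;

    /* Rule 2: for a General Query, (re)arm the interface timer with the
     * selected delay; this replaces any previously pending response. */
    if (is_general) {
        st->pending = true;
        st->expires = delay;
        return GQ_RESCHEDULED;
    }

    /* Remaining rules are outside this sketch. */
    return GQ_NOT_GENERAL;
}
```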
     
  • [ Upstream commit d0af683407a26a4437d8fa6e283ea201f2ae8146 ]

    __skb_flow_dissect can be called with a skb or a data packet, either
    can be NULL. All calls seem to have been moved to __skb_header_pointer
    except the pptp handling which is still calling skb_header_pointer.

    skb_header_pointer will use skb->data and thus:
    [ 109.556866] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
    [ 109.557102] IP: [] __skb_flow_dissect+0xa88/0xce0
    [ 109.557263] PGD 0
    [ 109.557338]
    [ 109.557484] Oops: 0000 [#1] SMP
    [ 109.557562] Modules linked in: chaoskey
    [ 109.557783] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.9.0 #79
    [ 109.557867] Hardware name: Supermicro A1SRM-LN7F/LN5F/A1SRM-LN7F-2758, BIOS 1.0c 11/04/2015
    [ 109.557957] task: ffff94085c27bc00 task.stack: ffffb745c0068000
    [ 109.558041] RIP: 0010:[] [] __skb_flow_dissect+0xa88/0xce0
    [ 109.558203] RSP: 0018:ffff94087fc83d40 EFLAGS: 00010206
    [ 109.558286] RAX: 0000000000000130 RBX: ffffffff8975bf80 RCX: ffff94084fab6800
    [ 109.558373] RDX: 0000000000000010 RSI: 000000000000000c RDI: 0000000000000000
    [ 109.558460] RBP: 0000000000000b88 R08: 0000000000000000 R09: 0000000000000022
    [ 109.558547] R10: 0000000000000008 R11: ffff94087fc83e04 R12: 0000000000000000
    [ 109.558763] R13: ffff94084fab6800 R14: ffff94087fc83e04 R15: 000000000000002f
    [ 109.558979] FS: 0000000000000000(0000) GS:ffff94087fc80000(0000) knlGS:0000000000000000
    [ 109.559326] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 109.559539] CR2: 0000000000000080 CR3: 0000000281809000 CR4: 00000000001026e0
    [ 109.559753] Stack:
    [ 109.559957] 000000000000000c ffff94084fab6822 0000000000000001 ffff94085c2b5fc0
    [ 109.560578] 0000000000000001 0000000000002000 0000000000000000 0000000000000000
    [ 109.561200] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    [ 109.561820] Call Trace:
    [ 109.562027]
    [ 109.562108] [] ? eth_get_headlen+0x7a/0xf0
    [ 109.562522] [] ? igb_poll+0x96a/0xe80
    [ 109.562737] [] ? net_rx_action+0x20b/0x350
    [ 109.562953] [] ? __do_softirq+0xe8/0x280
    [ 109.563169] [] ? irq_exit+0xaa/0xb0
    [ 109.563382] [] ? do_IRQ+0x4b/0xc0
    [ 109.563597] [] ? common_interrupt+0x7f/0x7f
    [ 109.563810]
    [ 109.563890] [] ? cpuidle_enter_state+0x130/0x2c0
    [ 109.564304] [] ? cpuidle_enter_state+0x120/0x2c0
    [ 109.564520] [] ? cpu_startup_entry+0x19f/0x1f0
    [ 109.564737] [] ? start_secondary+0x12a/0x140
    [ 109.564950] Code: 83 e2 20 a8 80 0f 84 60 01 00 00 c7 04 24 08 00
    00 00 66 85 d2 0f 84 be fe ff ff e9 69 fe ff ff 8b 34 24 89 f2 83 c2
    04 66 85 c0 8b 84 24 80 00 00 00 0f 49 d6 41 8d 31 01 d6 41 2b 84
    24 84
    [ 109.569959] RIP [] __skb_flow_dissect+0xa88/0xce0
    [ 109.570245] RSP
    [ 109.570453] CR2: 0000000000000080

    Fixes: ab10dccb1160 ("rps: Inspect PPTP encapsulated by GRE to get flow hash")
    Signed-off-by: Ian Kumlien
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Ian Kumlien
     
  • [ Upstream commit 3b48ab2248e61408910e792fe84d6ec466084c1a ]

    Final nlmsg_len field update must reflect inserted net_dm_drop_point
    data.

    This patch depends on previous patch:
    "drop_monitor: add missing call to genlmsg_end"

    Signed-off-by: Reiter Wolfgang
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Reiter Wolfgang
     
  • [ Upstream commit 4200462d88f47f3759bdf4705f87e207b0f5b2e4 ]

    Update nlmsg_len field with genlmsg_end to enable userspace processing
    using nlmsg_next helper. Also adds error handling.

    Signed-off-by: Reiter Wolfgang
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Reiter Wolfgang
     
  • [ Upstream commit f5a0aab84b74de68523599817569c057c7ac1622 ]

    IPv4 output routes already use l3mdev device instead of loopback for dst's
    if it is applicable. Change local input routes to do the same.

    This fixes icmp responses for unreachable UDP ports which are directed
    to the wrong table after commit 9d1a6c4ea43e4 because local_input
    routes use the loopback device. Moving from ingress device to loopback
    loses the L3 domain causing responses based on the dst to get lost.

    Fixes: 9d1a6c4ea43e4 ("net: icmp_route_lookup should use rt dev to
    determine L3 domain")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     
  • [ Upstream commit f0c16ba8933ed217c2688b277410b2a37ba81591 ]

    When we send a packet for our own local address on a non-loopback
    interface (e.g. eth0), due to the change introduced by
    commit 0b922b7a829c ("net: original ingress device index in PKTINFO"), the
    original ingress device index is set to the loopback interface.
    However, the packet should be treated as if it arrived via the
    sending interface (eth0); otherwise it breaks the expectations of
    userspace applications (e.g. the DHCPRELEASE message from the dhcp_release
    binary is ignored by the dnsmasq daemon, since it comes from lo, which
    is not the interface dnsmasq is bound to).

    Fixes: 0b922b7a829c ("net: original ingress device index in PKTINFO")
    Acked-by: David Ahern
    Signed-off-by: Wei Zhang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Wei Zhang
     
  • [ Upstream commit 4775cc1f2d5abca894ac32774eefc22c45347d1c ]

    We fail to check whether the netlink message is actually big enough to contain
    a struct if_stats_msg.

    Add a check to prevent userland from sending us short messages that would
    make us access memory beyond the end of the message.

    Fixes: 10c9ead9f3c6 ("rtnetlink: add new RTM_GETSTATS message to dump...")
    Signed-off-by: Mathias Krause
    Cc: Roopa Prabhu
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Mathias Krause
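
A minimal sketch of the kind of length check this message describes, written for userspace. The two structs are hypothetical stand-ins for the netlink header and the stats request payload, and stats_payload() is an illustrative name, not a kernel function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a netlink-style message header. */
struct msg_hdr {
    uint32_t len;   /* total message length, header included */
    uint16_t type;
    uint16_t flags;
};

/* Hypothetical stand-in for the fixed-size request payload. */
struct stats_req {
    uint8_t  family;
    uint8_t  pad1;
    uint16_t pad2;
    uint32_t ifindex;
    uint32_t filter_mask;
};

/* Return a pointer to the payload, or NULL when the length claimed by
 * the sender is too short to hold a complete struct stats_req. Reading
 * the payload without this check would run past the end of the message. */
static const struct stats_req *stats_payload(const struct msg_hdr *h)
{
    if (h->len < sizeof(*h) + sizeof(struct stats_req))
        return NULL;
    return (const struct stats_req *)(h + 1);
}
```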
     
  • [ Upstream commit 37f304d10030bb425c19099e7b955d9c3ec4cba3 ]

    Disabling the netdev should come after it has been closed. There is no harm in
    doing it before (hence the MLX5E_STATE_DESTROYING bit), but it is more natural
    this way.

    Fixes: 26e59d8077a3 ("net/mlx5e: Implement mlx5e interface attach/detach callbacks")
    Signed-off-by: Saeed Mahameed
    Reviewed-by: Mohamad Haj Yahia
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Saeed Mahameed
     
  • [ Upstream commit 610e89e05c3f28a7394935aa6b91f99548c4fd3c ]

    Skip setting netdev vxlan ports and netdev rx_mode on driver load
    when netdev is not yet registered.

    Synchronizing with netdev state is needed only on reset flow where the
    netdev remains registered for the whole reset period.

    This also fixes an access before initialization of net_device.addr_list_lock
    (which for some reason is initialized on register_netdev), where we queued
    set_rx_mode work on driver load before netdev registration.

    Fixes: 26e59d8077a3 ("net/mlx5e: Implement mlx5e interface attach/detach callbacks")
    Signed-off-by: Saeed Mahameed
    Reported-by: Sebastian Ott
    Reviewed-by: Mohamad Haj Yahia
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Saeed Mahameed
     
  • [ Upstream commit ccce1700263d8b5b219359d04180492a726cea16 ]

    Need to check that VF mac address entered by the admin user is either
    zero or unicast mac.
    Multicast mac addresses are prohibited.

    Fixes: 77256579c6b4 ('net/mlx5: E-Switch, Introduce Vport administration functions')
    Signed-off-by: Mohamad Haj Yahia
    Signed-off-by: Saeed Mahameed
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Mohamad Haj Yahia
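
The rule stated above (zero or unicast is accepted, multicast is rejected) can be sketched as follows. The function names are hypothetical; the only fact relied on is that a multicast (or broadcast) Ethernet address has the least significant bit of its first octet set.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define ETH_ALEN 6

/* Multicast and broadcast addresses have the low bit of octet 0 set. */
static bool mac_is_multicast(const uint8_t mac[ETH_ALEN])
{
    return (mac[0] & 0x01) != 0;
}

/* Hypothetical validation helper: accept the all-zero address and any
 * unicast address, reject multicast with -EINVAL. The all-zero address
 * has the low bit clear, so it passes the same test as unicast. */
static int validate_vf_mac(const uint8_t mac[ETH_ALEN])
{
    if (mac_is_multicast(mac))
        return -EINVAL;
    return 0;
}
```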
     
  • [ Upstream commit 077b1e8069b9b74477b01d28f6b83774dc19a142 ]

    We need to mask the destination mac value with the destination mac
    mask when adding steering rule via ethtool.

    Fixes: 1174fce8d1410 ('net/mlx5e: Support l3/l4 flow type specs in ethtool flow steering')
    Signed-off-by: Maor Gottlieb
    Signed-off-by: Saeed Mahameed
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Maor Gottlieb
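
As a rough illustration of masking a match value before installing a rule, the sketch below ANDs each octet of a destination MAC with the corresponding mask octet. mask_dmac() is a hypothetical helper, not the driver's function.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN 6

/* Hypothetical helper: keep only the destination MAC bits selected by
 * the mask, so bits outside the mask cannot affect the match value. */
static void mask_dmac(uint8_t out[ETH_ALEN],
                      const uint8_t val[ETH_ALEN],
                      const uint8_t mask[ETH_ALEN])
{
    size_t i;

    for (i = 0; i < ETH_ALEN; i++)
        out[i] = val[i] & mask[i];
}
```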
     
  • [ Upstream commit d151d73dcc99de87c63bdefebcc4cb69de1cdc40 ]

    Avoid using a local variable named numa_node to avoid shadowing a public
    one.

    Fixes: db058a186f98 ('net/mlx5_core: Set irq affinity hints')
    Signed-off-by: Eli Cohen
    Signed-off-by: Saeed Mahameed
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eli Cohen
     
  • [ Upstream commit 689a248df83b6032edc57e86267b4e5cc8d7174e ]

    If there is pending delayed work for health recovery it must be canceled
    if the device is being unloaded.

    Fixes: 05ac2c0b7438 ("net/mlx5: Fix race between PCI error handlers and health work")
    Signed-off-by: Daniel Jurgens
    Signed-off-by: Saeed Mahameed
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Daniel Jurgens
     
  • [ Upstream commit 883371c453b937f9eb581fb4915210865982736f ]

    When setting HCA capabilities, set log_max_qp to be the minimum
    between the selected profile's value and the HCA limitation.

    Fixes: 938fe83c8dcb ('net/mlx5_core: New device capabilities...')
    Signed-off-by: Noa Osherovich
    Signed-off-by: Saeed Mahameed
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Noa Osherovich
     
  • [ Upstream commit 0df0f207aab4f42e5c96a807adf9a6845b69e984 ]

    Since we now use a non zero mask on addr_type, we are matching on its
    value (IPV4/IPV6). So before this fix, matching on enc_src_ip/enc_dst_ip
    failed in SW/classify path since its value was zero.
    This patch sets the proper value of addr_type for encapsulated packets.

    Fixes: 970bfcd09791 ('net/sched: cls_flower: Use mask for addr_type')
    Signed-off-by: Paul Blakey
    Reviewed-by: Hadar Hen Zion
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Paul Blakey
     
  • [ Upstream commit 5701659004d68085182d2fd4199c79172165fa65 ]

    There is currently a small window during which the network device registered by
    stmmac can be made visible before all resources, including the clock and MDIO bus,
    have had a chance to be set up. This can lead to the following error:

    [ 473.919358] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized):
    stmmac_dvr_probe: warning: cannot get CSR clock
    [ 473.919382] stmmaceth 0000:01:00.0: no reset control found
    [ 473.919412] stmmac - user ID: 0x10, Synopsys ID: 0x42
    [ 473.919429] stmmaceth 0000:01:00.0: DMA HW capability register supported
    [ 473.919436] stmmaceth 0000:01:00.0: RX Checksum Offload Engine supported
    [ 473.919443] stmmaceth 0000:01:00.0: TX Checksum insertion supported
    [ 473.919451] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized):
    Enable RX Mitigation via HW Watchdog Timer
    [ 473.921395] libphy: PHY stmmac-1:00 not found
    [ 473.921417] stmmaceth 0000:01:00.0 eth0: Could not attach to PHY
    [ 473.921427] stmmaceth 0000:01:00.0 eth0: stmmac_open: Cannot attach to
    PHY (error: -19)
    [ 473.959710] libphy: stmmac: probed
    [ 473.959724] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 0 IRQ POLL
    (stmmac-1:00) active
    [ 473.959728] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 1 IRQ POLL
    (stmmac-1:01)
    [ 473.959731] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 2 IRQ POLL
    (stmmac-1:02)
    [ 473.959734] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 3 IRQ POLL
    (stmmac-1:03)

    Fix this by making sure that register_netdev() is the last thing being done,
    which guarantees that the clock and the MDIO bus are available.

    Fixes: 4bfcbd7abce2 ("stmmac: Move the mdio_register/_unregister in probe/remove")
    Reported-by: Kweh, Hock Leong
    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Florian Fainelli
     
  • [ Upstream commit 628185cfddf1dfb701c4efe2cfd72cf5b09f5702 ]

    Shahar reported a soft lockup in tc_classify(), where we run into an
    endless loop when walking the classifier chain due to tp->next == tp
    which is a state we should never run into. The issue only seems to
    trigger under load in the tc control path.

    What happens is that in tc_ctl_tfilter(), thread A allocates a new
    tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
    with it. In that classifier callback we had to unlock/lock the rtnl
    mutex and returned with -EAGAIN. One reason why we need to drop there
    is, for example, that we need to request an action module to be loaded.

    This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
    after we loaded and found the requested action, we need to redo the
    whole request so we don't race against others. While we had to unlock
    rtnl in that time, thread B's request was processed next on that CPU.
    Thread B added a new tp instance successfully to the classifier chain.
    When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
    and destroying its tp instance which never got linked, we goto replay
    and redo A's request.

    This time when walking the classifier chain in tc_ctl_tfilter() for
    checking for existing tp instances we had a priority match and found
    the tp instance that was created and linked by thread B. Now calling
    again into tp->ops->change() with that tp was successful and returned
    without error.

    tp_created was never cleared in the second round, thus kernel thinks
    that we need to link it into the classifier chain (once again). tp and
    *back point to the same object due to the match we had earlier on. Thus
    for thread B's already public tp, we reset tp->next to tp itself and
    link it into the chain, which eventually causes the mentioned endless
    loop in tc_classify() once a packet hits the data path.

    Fix is to clear tp_created at the beginning of each request, also when
    we replay it. On the paths that can cause -EAGAIN we already destroy
    the original tp instance we had and on replay we really need to start
    from scratch. It seems that this issue was first introduced in commit
    12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
    and avoid kernel panic when we use cls_cgroup").

    Fixes: 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup")
    Reported-by: Shahar Klein
    Signed-off-by: Daniel Borkmann
    Cc: Cong Wang
    Acked-by: Eric Dumazet
    Tested-by: Shahar Klein
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
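
The stale-flag problem described in this message can be reduced to a toy replay loop. This is a sketch under simplifying assumptions, not the tc code: handle_request() is hypothetical, the first pass always "creates" an object and hits -EAGAIN, and the replay always finds an existing object. The clear_on_replay parameter exists only to contrast the fixed and unfixed behavior.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model of a request that is replayed after -EAGAIN. times_linked
 * counts how often the code decides it must link a "new" object. */
static int handle_request(bool clear_on_replay, int *times_linked)
{
    bool created = false;
    int attempt = 0;
    int err;

replay:
    if (clear_on_replay)
        created = false;    /* the fix: start every pass from scratch */
    attempt++;

    if (attempt == 1) {
        created = true;     /* first pass: a new object is allocated */
        err = -EAGAIN;      /* but the change callback asks for a retry */
    } else {
        err = 0;            /* replay: an existing, linked object matched */
    }

    if (err == -EAGAIN)
        goto replay;        /* the new object is destroyed before replay */

    if (created)
        (*times_linked)++;  /* link the object into the chain */
    return err;
}
```

With the flag cleared on replay, nothing is linked a second time. Without it, the stale flag makes the code link an object that is already in the chain, which corresponds to the self-referencing tp->next described above.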
     
  • [ Upstream commit a98f91758995cb59611e61318dddd8a6956b52c3 ]

    By setting certain socket options on ipv6 raw sockets, we can confuse the
    length calculation in rawv6_push_pending_frames triggering a BUG_ON.

    RIP: 0010:[] [] rawv6_sendmsg+0xc30/0xc40
    RSP: 0018:ffff881f6c4a7c18 EFLAGS: 00010282
    RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002
    RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00
    RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009
    R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030
    R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80

    Call Trace:
    [] ? unmap_page_range+0x693/0x830
    [] inet_sendmsg+0x67/0xa0
    [] sock_sendmsg+0x38/0x50
    [] SYSC_sendto+0xef/0x170
    [] SyS_sendto+0xe/0x10
    [] do_syscall_64+0x50/0xa0
    [] entry_SYSCALL64_slow_path+0x25/0x25

    Handle by jumping to the failure path if skb_copy_bits gets an EFAULT.

    Reproducer:

    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    #define LEN 504

    int main(int argc, char* argv[])
    {
            int fd;
            int zero = 0;
            char buf[LEN];

            memset(buf, 0, LEN);

            fd = socket(AF_INET6, SOCK_RAW, 7);

            setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4);
            setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN);

            sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110);
    }

    Signed-off-by: Dave Jones
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Dave Jones
     
  • [ Upstream commit 39b2dd765e0711e1efd1d1df089473a8dd93ad48 ]

    Socket cmsg IP(V6)_RECVORIGDSTADDR checks that port range lies within
    the packet. For sockets that have transport headers pulled, transport
    offset can be negative. Use signed comparison to avoid overflow.

    Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
    Reported-by: Nisar Jagabar
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
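
A small sketch of the signed versus unsigned comparison issue, assuming a 32-bit unsigned int. Both functions are hypothetical; the first makes explicit the conversion that C applies implicitly when an int is compared with an unsigned int, and the second performs the comparison in signed arithmetic.

```c
#include <stdbool.h>

/* Unsigned form: a negative offset wraps to a huge value, so the
 * "4 bytes of ports fit in the packet" check wrongly fails. The cast
 * spells out what the implicit conversion would do. */
static bool ports_fit_unsigned(int transport_offset, unsigned int len)
{
    return !((unsigned int)(transport_offset + 4) > len);
}

/* Signed form: the length is converted to int, so a negative offset
 * stays negative and the check behaves as intended. */
static bool ports_fit_signed(int transport_offset, unsigned int len)
{
    return !(transport_offset + 4 > (int)len);
}
```

For an offset of -8 and a length of 100, the unsigned form reports that the ports do not fit, while the signed form accepts the packet.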
     
  • [ Upstream commit 08abb79542c9e8c367d1d8e44fe1026868d3f0a7 ]

    Prior to this patch, sctp_transport_lookup_process didn't rcu_read_unlock
    when it failed to find a transport by sctp_addrs_lookup_transport.

    This patch is to fix it by moving up rcu_read_unlock right before checking
    transport and also to remove the out path.

    Fixes: 1cceda784980 ("sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock")
    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
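
The shape of the fix (release the read-side lock before the NULL check so that every path drops it) can be sketched in userspace. Everything here is a hypothetical stand-in: a counter models the lock nesting, and reference counting on the transport is not modeled.

```c
#include <errno.h>
#include <stddef.h>

static int read_lock_depth;   /* stand-in for read-side lock nesting */

static void read_lock(void)   { read_lock_depth++; }
static void read_unlock(void) { read_lock_depth--; }

struct transport { int id; };

/* Hypothetical lookup: linear search by id, NULL when not found. */
static struct transport *lookup_transport(struct transport *tbl, size_t n,
                                          int id)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (tbl[i].id == id)
            return &tbl[i];
    return NULL;
}

/* Example callback used by callers of lookup_process(). */
static int transport_id(struct transport *t) { return t->id; }

static int lookup_process(struct transport *tbl, size_t n, int id,
                          int (*cb)(struct transport *))
{
    struct transport *t;

    read_lock();
    t = lookup_transport(tbl, n, id);
    read_unlock();            /* dropped before the check, on every path */

    if (!t)
        return -ENOENT;
    return cb(t);
}
```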
     
  • [ Upstream commit eb63ecc1706b3e094d0f57438b6c2067cfc299f2 ]

    Locally originated traffic in a VRF fails in the presence of a POSTROUTING
    rule. For example,

    $ iptables -t nat -A POSTROUTING -s 11.1.1.0/24 -j MASQUERADE
    $ ping -I red -c1 11.1.1.3
    ping: Warning: source address might be selected on device other than red.
    PING 11.1.1.3 (11.1.1.3) from 11.1.1.2 red: 56(84) bytes of data.
    ping: sendmsg: Operation not permitted

    Worse, the above causes random corruption resulting in a panic in random
    places (I have not seen a consistent backtrace).

    Call nf_reset to drop the conntrack info following the pass through the
    VRF device. The nf_reset is needed on Tx but not Rx because of the order
    in which NF_HOOK's are hit: on Rx the VRF device is after the real ingress
    device and on Tx it is before the real egress device. Connection
    tracking should be tied to the real egress device and not the VRF device.

    Fixes: 8f58336d3f78a ("net: Add ethernet header for pass through VRF device")
    Fixes: 35402e3136634 ("net: Add IPv6 support to VRF device")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     
  • [ Upstream commit a0f37efa82253994b99623dbf41eea8dd0ba169b ]

    Connection tracking with VRF is broken because the pass through the VRF
    device drops the connection tracking info. Removing the call to nf_reset
    allows DNAT and MASQUERADE to work across interfaces within a VRF.

    Fixes: 73e20b761acf ("net: vrf: Add support for PREROUTING rules on vrf device")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     

12 Jan, 2017

3 commits