18 Oct, 2014

15 commits

  • Commit b4d2394d01bc ("dsa: Replace mii_bus with a generic host device")
    replaces mii_bus with a generic host_dev, and introduces
    dsa_host_dev_to_mii_bus() to support conversion from host_dev to mii_bus.
    However, in some cases it still uses to_mii_bus() to perform that conversion.
    Since host_dev is not the phy bus device but typically a platform device,
    this fails and results in a crash with the affected drivers.

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [] __mutex_lock_slowpath+0x75/0x100
    PGD 406783067 PUD 406784067 PMD 0
    Oops: 0002 [#1] SMP
    ...
    Call Trace:
    [] ? pick_next_task_fair+0x61b/0x880
    [] mutex_lock+0x23/0x37
    [] mdiobus_read+0x34/0x60
    [] __mv88e6xxx_reg_read+0x8a/0xa0
    [] mv88e6xxx_reg_read+0x4c/0xa0

    Fixes: b4d2394d01bc ("dsa: Replace mii_bus with a generic host device")
    Cc: Alexander Duyck
    Signed-off-by: Guenter Roeck
    Acked-by: Alexander Duyck
    Acked-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Guenter Roeck
     
  • In commit ec8a2e5621db2da24badb3969eda7fd359e1869f ("tipc: same receive
    code path for connection protocol and data messages") we omitted the
    possibility that an arriving message extracted from a bundle buffer
    may be a multicast message. Such messages need to be delivered to
    the socket via a separate function, tipc_sk_mcast_rcv(). As a result,
    small multicast messages arriving as members of a bundle buffer are
    silently dropped.

    This commit corrects the error by considering this case in the function
    tipc_link_bundle_rcv().

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • Commit 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line
    misses") added a regression for SO_BINDTODEVICE on IPv6.

    This is because we still use inet6_iif(), which expects the IP6 control
    block to still be at the beginning of skb->cb[].

    This patch adds tcp_v6_iif() helper and uses it where necessary.

    Because __inet6_lookup_skb() is used by TCP and DCCP, we add an iif
    parameter to it.

    Signed-off-by: Eric Dumazet
    Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Don't ring the doorbell, and don't do PIO. This will also prevent
    TX Push, because there will be more than one buffer waiting when
    the doorbell is rung.

    Signed-off-by: Edward Cree
    Signed-off-by: David S. Miller

    Edward Cree
     
  • Remove the call to cancel_delayed_work_sync() during runtime suspend,
    because it would cause a deadlock. Instead, return -EBUSY to
    prevent the device from entering suspend if the net is running and
    the delayed work is pending or running. The delayed work would
    try to wake up the device later anyway, so suspending is not
    necessary.

    Signed-off-by: Hayes Wang
    Signed-off-by: David S. Miller

    hayeswang
     
  • pskb_may_pull() may change skb->data and make the uh pointer obsolete,
    so reload uh and guehdr.

    Fixes: 37dd0247 ("gue: Receive side for Generic UDP Encapsulation")
    Cc: Tom Herbert
    Signed-off-by: Li RongQing
    Signed-off-by: David S. Miller

    Li RongQing
     
  • pskb_may_pull() may change skb->data and make the eth pointer obsolete,
    so set eth after pskb_may_pull().

    Fixes: 3d7b46cd ("ip_tunnel: push generic protocol handling to ip_tunnel module")
    Cc: Pravin B Shelar
    Signed-off-by: Li RongQing
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Li RongQing
     
  • If the IP header has optional fields at the end, this patch gets the
    port numbers after those fields and computes the hash. The general
    parser skb_flow_dissect() is used here.

    Signed-off-by: Haiyang Zhang
    Reviewed-by: K. Y. Srinivasan
    Signed-off-by: David S. Miller

    Haiyang Zhang
     
  • If megaflows are disabled, the userspace does not send the netlink attribute
    OVS_FLOW_ATTR_MASK, and the kernel must create an exact match mask.

    sw_flow_mask_set() sets every byte (in 'range') of the mask to 0xff, even the
    bytes that represent padding for struct sw_flow, or the bytes that represent
    fields that may not be set during ovs_flow_extract().
    This is a problem, because when we extract a flow from a packet,
    we no longer memset() struct sw_flow to 0.

    This commit gets rid of sw_flow_mask_set() and introduces mask_set_nlattr(),
    which operates on the netlink attributes rather than on the mask key. Using
    this approach we are sure that only the bytes that the user provided in the
    flow are matched.

    Also, if the parse_flow_mask_nlattrs() for the mask ENCAP attribute fails, we
    now return with an error.

    This bug was introduced by commit 0714812134d7dcadeb7ecfbfeb18788aa7e1eaac
    ("openvswitch: Eliminate memset() from flow_extract").

    Reported-by: Alex Wang
    Signed-off-by: Daniele Di Proietto
    Signed-off-by: Andy Zhou
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Pravin B Shelar
     
  • pskb_may_pull() may change skb->data and make the eth pointer obsolete,
    so eth needs to be reloaded.

    Fixes: 91269e390d062 ("vxlan: using pskb_may_pull as early as possible")
    Cc: Eric Dumazet
    Signed-off-by: Li RongQing
    Signed-off-by: David S. Miller

    Li RongQing
     
  • pskb_may_pull(), called by arphdr_ok(), can change skb->data, so set arp
    after arphdr_ok() to avoid using freed memory.

    Fixes: 0714812134d7d ("openvswitch: Eliminate memset() from flow_extract.")
    Cc: Jesse Gross
    Cc: Eric Dumazet
    Signed-off-by: Li RongQing
    Acked-by: Jesse Gross
    Signed-off-by: David S. Miller

    Li RongQing
     
  • ip_setup_cork(), called inside ip_append_data(), steals the dst entry from rt
    into the cork, and if __ip_append_data() fails, nobody frees the stolen dst entry.

    Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
    Signed-off-by: Vasily Averin
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Vasily Averin
     
  • We can retrieve opt from skb, so there is no need to pass it as a parameter.
    And opt should always be non-NULL, so there is no need to check it.

    Cc: Krzysztof Kolasa
    Cc: Eric Dumazet
    Tested-by: Krzysztof Kolasa
    Signed-off-by: Cong Wang
    Signed-off-by: Cong Wang
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Cong Wang
     
  • cookie_v4_check() allocates ip_options_rcu in the same way
    as tcp_v4_save_options(), so we can just make it a helper function.

    Cc: Krzysztof Kolasa
    Cc: Eric Dumazet
    Signed-off-by: Cong Wang
    Signed-off-by: Cong Wang
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Cong Wang
     
  • commit 971f10eca186cab238c49da ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
    missed that cookie_v4_check() still calls ip_options_echo(), which uses
    IPCB(). It should use TCPCB() at the TCP layer, so call __ip_options_echo()
    instead.

    Fixes: commit 971f10eca186cab238c49da ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
    Cc: Krzysztof Kolasa
    Cc: Eric Dumazet
    Reported-by: Krzysztof Kolasa
    Tested-by: Krzysztof Kolasa
    Signed-off-by: Cong Wang
    Signed-off-by: Cong Wang
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Cong Wang
     

17 Oct, 2014

3 commits

  • This simplifies the lanai.c driver by using
    the module_pci_driver() macro, at the expense
    of losing only debugging messages.

    Signed-off-by: Michael Opdenacker
    Signed-off-by: David S. Miller

    Michael Opdenacker
     
  • Avoid confusion between pid and portid.

    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     
  • Jeff Kirsher says:

    ====================
    Intel Wired LAN Driver Updates 2014-10-16

    This series contains updates to fm10k and ixgbe.

    Matthew provides two fixes for fm10k: the first sets the flag to fetch the
    host state before kicking off the service task that reads the host
    state when bringing the interface up; the second makes sure that we
    release the mailbox lock after detecting an error and before we return
    the error code.

    Andy Zhou provides a compile fix for fm10k, when the driver is compiled
    into the kernel and the VXLAN driver is compiled as a module.

    Emil provides a fix for ixgbe to protect against a panic caused by
    dereferencing a NULL pointer in ixgbe_ndo_set_vf_spoofchk().
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

16 Oct, 2014

13 commits

    The check for vfinfo is not sufficient because it does not protect
    against specifying a VF that is outside the sriov_num_vfs range.
    All of the ndo functions have a check for it except for
    ixgbevf_ndo_set_spoofcheck().

    The following patch is all we need to protect against this panic:

    ip link set p96p1 vf 0 spoofchk off
    BUG: unable to handle kernel NULL pointer dereference at 0000000000000052
    IP: []
    ixgbe_ndo_set_vf_spoofchk+0x51/0x150 [ixgbe]

    Reported-by: Thierry Herbelot
    Signed-off-by: Emil Tantilov
    Acked-by: Thierry Herbelot
    Signed-off-by: Jeff Kirsher

    Emil Tantilov
     
  • Compiling with CONFIG_FM10K=y and VXLAN=m results in a linking error:

    drivers/built-in.o: In function `fm10k_open':
    (.text+0x1f9d7a): undefined reference to `vxlan_get_rx_port'
    make: *** [vmlinux] Error 1

    The fix follows the same strategy as I40E.

    Signed-off-by: Andy Zhou
    Acked-by: Alexander Duyck
    Tested-by: Aaron Brown
    Signed-off-by: Jeff Kirsher

    Andy Zhou
     
  • After grabbing the mailbox lock and detecting an error, the lock must be
    released before the error code can be returned.

    Signed-off-by: Matthew Vick
    Tested-by: Aaron Brown
    Signed-off-by: Jeff Kirsher

    Matthew Vick
     
  • Set the flag to fetch the host state before kicking off the service task
    that reads the host state when bringing the interface back up.

    Signed-off-by: Matthew Vick
    Tested-by: Aaron Brown
    Signed-off-by: Jeff Kirsher

    Matthew Vick
     
  • pskb_may_pull() should be used to check whether skb->data has enough space;
    skb->len cannot ensure that.

    Cc: Cong Wang
    Signed-off-by: Li RongQing
    Signed-off-by: David S. Miller

    Li RongQing
     
  • When netif_rx() returns, the skb it handled may already have been freed,
    so it must not be used.

    Signed-off-by: Li RongQing
    Signed-off-by: David S. Miller

    Li RongQing
     
  • All functions use struct vport *vport except
    ovs_vport_find_upcall_portid().

    This fixes one kerneldoc warning.

    Signed-off-by: Fabian Frederick
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Fabian Frederick
     
  • s/sock/gs

    Signed-off-by: Fabian Frederick
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Fabian Frederick
     
  • For each Rx frame the eTSEC writes its FCS (Frame Check Sequence)
    to the Rx buffer.

    The eTSEC h/w manual states in the "Receive Buffer Descriptor Field
    Descriptions" table:
    "Data length is the number of octets written by the eTSEC into this BD's
    data buffer if L is cleared (the value is equal to MRBLR), or, if L is
    set, the length of the frame including *CRC*, FCB (if RCTRL[PRSDEP > 00),
    preamble (if MACCFG2[PreAmRxEn]=1), time stamp (if RCTRL[TS] = 1) and
    any padding (RCTRL[PAL])."

    Though the FCS bytes are removed by the driver before passing the skb
    to the net stack, the Rx buffer size computation does not currently
    take into account the FCS bytes (4 bytes).
    Because the Rx buffer size is a multiple of 512 bytes, leaving out the
    FCS is not a problem for the default MTU of 1500, as the Rx buffer size
    is 1536 in this case. However, for custom MTUs, where the difference
    between the MTU size and the Rx buffer size is smaller, this can be a
    problem, as the computed Rx buffer size won't be enough to accommodate
    the FCS for a received frame that is big enough (close to MTU size).
    In such a case the received frame is considered to be incomplete (L flag
    not set in the RxBD status) and is silently dropped.

    Note that the driver does not currently support S/G on Rx, so it has to
    compute its Rx buffer size based on the MTU of the device.

    Reported-by: Kristian Otnes
    Signed-off-by: Claudiu Manoil
    Signed-off-by: David S. Miller

    Claudiu Manoil
     
  • commit 0b725a2ca61bedc33a2a63d0451d528b268cf975
    net: Remove ndo_xmit_flush netdev operation, use signalling instead.

    added code that looks at skb->xmit_more after the skb has
    been put in TX VQ. Since some paths process the ring and free the skb
    immediately, this can cause use after free.

    Fix by storing xmit_more in a local variable.

    Cc: David S. Miller
    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
     
  • The iMX6SX IEEE 1588 module has a hardware issue when capturing the ATVR register.
    The current SW flow is:
    ENET0->ATCR |= ENET_ATCR_CAPTURE_MASK;
    ts_counter_ns = ENET0->ATVR;
    The ATVR value is not the expected value, which prevents the LinuxPTP stack from converging.

    The ENET Block Guide chapter for the iMX6SX (PELE) addresses the issue:
    after setting ENET_ATCR[Capture], some clock cycles are needed before the
    counter value is captured in the register clock domain. The wait time is at
    least 6 clock cycles of the slower of the register clock and the 1588 clock.
    So something like this is needed:
    ENET0->ATCR |= ENET_ATCR_CAPTURE_MASK;
    wait();
    ts_counter_ns = ENET0->ATVR;

    For iMX6SX, the 1588 ts_clk is fixed at 25 MHz and the register clock is 66 MHz, so
    the wait time must be greater than 240 ns (40 ns * 6). The patch adds a 1 us delay
    before the CPU reads the ATVR register.

    Changes V2:
    Modify the commit/comments log to describe the issue clearly.

    Signed-off-by: Fugang Duan
    Acked-by: Richard Cochran
    Signed-off-by: David S. Miller

    Nimrod Andy
     
  • Identified by the kbuild test robot. The csk family is always set to AF_INET or
    AF_INET6, so skb will always be initialized, but there is no harm
    in silencing the warning anyway.

    Signed-off-by: Anish Bhatt
    Fixes : f42bb57c61fd ('cxgb4i : Fix -Wunused-function warning')
    Signed-off-by: David S. Miller

    Anish Bhatt
     
  • Add ndo_gso_check, which a device can define to indicate whether it
    is capable of doing GSO on a packet. This function would be called from
    the stack to determine whether software GSO needs to be done. A
    driver should populate this function if it advertises GSO types for
    which there are combinations that it wouldn't be able to handle. For
    instance a device that performs UDP tunneling might only implement
    support for transparent Ethernet bridging type of inner packets
    or might have limitations on lengths of inner headers.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

15 Oct, 2014

9 commits

  • This patch fixes the stmmac data compatibilities for
    all the SoCs inside the platform file.

    Reported-by: Stephen Rothwell
    Signed-off-by: Giuseppe Cavallaro
    Signed-off-by: David S. Miller

    Giuseppe CAVALLARO
     
  • Anish Bhatt says:

    ====================
    ipv6 and related cleanup for cxgb4/cxgb4i

    This patch set removes some duplicated/extraneous code from cxgb4i, guards
    cxgb4 against compilation failure based on the ipv6 tristate, makes ipv6-related
    code no longer enabled by default irrespective of the ipv6 tristate, and fixes
    a refcnt issue.
    -Anish

    v2 : Provide more detailed commit messages, make subject more concise as
    recommended by Dave Miller.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • There is an extra call to dst_neigh_lookup() leftover in cxgb4i that can cause
    an unreleased refcnt issue. Remove extraneous call.

    Signed-off-by: Anish Bhatt

    Fixes : 759a0cc5a3e1b ('cxgb4i: Add ipv6 code to driver, call into libcxgbi ipv6 api')
    Signed-off-by: David S. Miller

    Anish Bhatt
     
  • A bunch of ipv6 related code is left on by default. While this causes no
    compilation issues, there is no need to have this enabled by default. Guard
    with an ipv6 check, which also takes care of a -Wunused-function warning.

    Signed-off-by: Anish Bhatt
    Signed-off-by: David S. Miller

    Anish Bhatt
     
  • cxgb4 ipv6 does not guard against ipv6 being disabled, or the standard
    ipv6 module vs inbuilt tri-state issue. This was fixed for cxgb4i & iw_cxgb4
    but missed for cxgb4.

    Signed-off-by: Anish Bhatt
    Signed-off-by: David S. Miller

    Anish Bhatt
     
  • cxgb4 already handles CLIP updates from a previous changeset for iw_cxgb4,
    so there is no need to have this functionality in cxgb4i. Remove the duplicated code.

    Signed-off-by: Anish Bhatt
    Signed-off-by: David S. Miller

    Anish Bhatt
     
  • TCP Small Queues tries to keep the number of packets in the qdisc
    as small as possible, and depends on a tasklet to feed the following
    packets at TX completion time.
    The choice of a tasklet was driven by latency requirements.

    Then, the TCP stack tries to avoid reorders by locking flows with
    outstanding packets in the qdisc to a given TX queue.

    What can happen is that many flows get attracted by a low-performing
    TX queue, and the cpu servicing TX completion has to feed packets for all of
    them, making this cpu 100% busy in softirq mode.

    This became particularly visible with the latest skb->xmit_more support.

    The strategy adopted in this patch is to detect when tcp_wfree() is called
    from ksoftirqd and let the outstanding queue for this flow be drained
    before feeding additional packets, so that skb->ooo_okay can be set
    to allow select_queue() to select the optimal queue.

    Incoming ACKs are normally handled by different cpus, so this patch
    gives these cpus more chance to take over the burden of feeding the
    qdisc with future packets.

    Tested:

    lpaa23:~# ./super_netperf 1400 --google-pacing-rate 3028000 -H lpaa24 -l 3600 &

    lpaa23:~# sar -n DEV 1 10 | grep eth1
    06:16:18 AM eth1 595448.00 1190564.00 38381.09 1760253.12 0.00 0.00 1.00
    06:16:19 AM eth1 594858.00 1189686.00 38340.76 1758952.72 0.00 0.00 0.00
    06:16:20 AM eth1 597017.00 1194019.00 38480.79 1765370.29 0.00 0.00 1.00
    06:16:21 AM eth1 595450.00 1190936.00 38380.19 1760805.05 0.00 0.00 0.00
    06:16:22 AM eth1 596385.00 1193096.00 38442.56 1763976.29 0.00 0.00 1.00
    06:16:23 AM eth1 598155.00 1195978.00 38552.97 1768264.60 0.00 0.00 0.00
    06:16:24 AM eth1 594405.00 1188643.00 38312.57 1757414.89 0.00 0.00 1.00
    06:16:25 AM eth1 593366.00 1187154.00 38252.16 1755195.83 0.00 0.00 0.00
    06:16:26 AM eth1 593188.00 1186118.00 38232.88 1753682.57 0.00 0.00 1.00
    06:16:27 AM eth1 596301.00 1192241.00 38440.94 1762733.09 0.00 0.00 0.00
    Average: eth1 595457.30 1190843.50 38381.69 1760664.84 0.00 0.00 0.50
    lpaa23:~# ./tc -s -d qd sh dev eth1 | grep backlog
    backlog 7606336b 2513p requeues 167982
    backlog 224072b 74p requeues 566
    backlog 581376b 192p requeues 5598
    backlog 181680b 60p requeues 1070
    backlog 5305056b 1753p requeues 110166 // Here, this TX queue is attracting flows
    backlog 157456b 52p requeues 1758
    backlog 672216b 222p requeues 3025
    backlog 60560b 20p requeues 24541
    backlog 448144b 148p requeues 21258

    lpaa23:~# echo 1 >/proc/sys/net/ipv4/tcp_tsq_enable_tcp_wfree_ksoftirqd_detect

    Immediate jump to full bandwidth, and traffic is properly
    sharded across all TX queues.

    lpaa23:~# sar -n DEV 1 10 | grep eth1
    06:16:46 AM eth1 1397632.00 2795397.00 90081.87 4133031.26 0.00 0.00 1.00
    06:16:47 AM eth1 1396874.00 2793614.00 90032.99 4130385.46 0.00 0.00 0.00
    06:16:48 AM eth1 1395842.00 2791600.00 89966.46 4127409.67 0.00 0.00 1.00
    06:16:49 AM eth1 1395528.00 2791017.00 89946.17 4126551.24 0.00 0.00 0.00
    06:16:50 AM eth1 1397891.00 2795716.00 90098.74 4133497.39 0.00 0.00 1.00
    06:16:51 AM eth1 1394951.00 2789984.00 89908.96 4125022.51 0.00 0.00 0.00
    06:16:52 AM eth1 1394608.00 2789190.00 89886.90 4123851.36 0.00 0.00 1.00
    06:16:53 AM eth1 1395314.00 2790653.00 89934.33 4125983.09 0.00 0.00 0.00
    06:16:54 AM eth1 1396115.00 2792276.00 89984.25 4128411.21 0.00 0.00 1.00
    06:16:55 AM eth1 1396829.00 2793523.00 90030.19 4130250.28 0.00 0.00 0.00
    Average: eth1 1396158.40 2792297.00 89987.09 4128439.35 0.00 0.00 0.50

    lpaa23:~# tc -s -d qd sh dev eth1 | grep backlog
    backlog 7900052b 2609p requeues 173287
    backlog 878120b 290p requeues 589
    backlog 1068884b 354p requeues 5621
    backlog 996212b 329p requeues 1088
    backlog 984100b 325p requeues 115316
    backlog 956848b 316p requeues 1781
    backlog 1080996b 357p requeues 3047
    backlog 975016b 322p requeues 24571
    backlog 990156b 327p requeues 21274

    (All 8 TX queues get a fair share of the traffic)

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Rajesh Borundia says:

    ====================
    qlcnic: Bug fixes

    This series fixes following issues.

    * We were programming the maximum number of arguments supported by the
    adapter instead of the number required by a command.
    * The destroy tx command requires three arguments instead of two.

    Please apply these patches to net.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • o The destroy tx command takes three arguments
    instead of two.

    Signed-off-by: Rajesh Borundia
    Signed-off-by: David S. Miller

    Rajesh Borundia