04 Mar, 2016

11 commits

  • Currently, the Rx packet header is copied into the sk_buff private data
    so that we can advance the pointer into the buffer, potentially
    discarding the original. At the moment, this copy is held in network
    byte order, but this means we're doing a lot of unnecessary
    translations.

    The reasons it was done this way are that we need the values in network
    byte order occasionally and we can use the copy, slightly modified, as part
    of an iov array when sending an ack or an abort packet.

    However, on review it seems better to keep the copy in host byte order
    and to make up a new header when we want to send another packet.

    To this end, rename the original header struct to rxrpc_wire_header (with
    BE fields) and institute a variant called rxrpc_host_header that has host
    order fields. Change the struct in the sk_buff private data into an
    rxrpc_host_header and translate the values when filling it in.

    This further allows us to keep the values held in various structures in
    host byte order rather than network byte order and allows removal of some
    fields that are byteswapped duplicates.

    Signed-off-by: David Howells

    David Howells
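
    The split between a wire-order and a host-order header described above
    can be sketched in user space as follows. This is only an illustration:
    the demo_* structs and functions are hypothetical, the field list is
    cut down and does not reproduce the real rxrpc header layout, and the
    standard ntohl()/ntohs() calls stand in for the kernel's byte-order
    helpers.

```c
#include <arpa/inet.h> /* htonl(), htons(), ntohl(), ntohs() */
#include <assert.h>
#include <stdint.h>

/* Hypothetical, cut-down header as it appears on the wire (big-endian). */
struct demo_wire_header {
    uint32_t epoch;        /* network byte order */
    uint32_t call_number;  /* network byte order */
    uint16_t service_id;   /* network byte order */
    uint8_t  type;         /* single bytes need no swapping */
    uint8_t  flags;
};

/* Host-order twin that would be kept in the per-packet private data. */
struct demo_host_header {
    uint32_t epoch;
    uint32_t call_number;
    uint16_t service_id;
    uint8_t  type;
    uint8_t  flags;
};

/* Translate once on receive so later comparisons need no byte swapping. */
static void demo_wire_to_host(const struct demo_wire_header *w,
                              struct demo_host_header *h)
{
    assert(w && h);
    h->epoch       = ntohl(w->epoch);
    h->call_number = ntohl(w->call_number);
    h->service_id  = ntohs(w->service_id);
    h->type        = w->type;
    h->flags       = w->flags;
}

/* Make up a fresh wire header when a packet such as an ack is sent. */
static void demo_host_to_wire(const struct demo_host_header *h,
                              struct demo_wire_header *w)
{
    assert(h && w);
    w->epoch       = htonl(h->epoch);
    w->call_number = htonl(h->call_number);
    w->service_id  = htons(h->service_id);
    w->type        = h->type;
    w->flags       = h->flags;
}
```

    With this shape, code that only inspects a received packet works on
    demo_host_header values directly, and the swap cost is paid once per
    packet in each direction.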
     
  • Rename call event names to begin RXRPC_CALL_EV_ to distinguish them from the
    flags.

    Signed-off-by: David Howells

    David Howells
     
  • Convert call flag and event numbers into enums and move their definitions
    outside of the struct.

    Also move the call state enum outside of the struct and add an extra
    element to count the number of states.

    Signed-off-by: David Howells

    David Howells
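
    A minimal sketch of the pattern described above, with states, events and
    flags as separate enums defined outside any struct and a trailing
    element that counts the states. All demo_* and DEMO_* names are
    hypothetical and much shorter than the real rxrpc lists; the EV_ infix
    on the event names mirrors the renaming mentioned in the previous
    commit.

```c
#include <assert.h>

/* Hypothetical call states; the last element only counts the others. */
enum demo_call_state {
    DEMO_CALL_SEND_REQUEST,
    DEMO_CALL_AWAIT_REPLY,
    DEMO_CALL_COMPLETE,
    NR__DEMO_CALL_STATES
};

/* Events carry an EV_ infix so they cannot be mistaken for flags by name. */
enum demo_call_event {
    DEMO_CALL_EV_ACK,
    DEMO_CALL_EV_RELEASE
};

/* Flags are bit numbers within a flags word. */
enum demo_call_flag {
    DEMO_CALL_RELEASED,
    DEMO_CALL_HAS_USERID
};

/* The counting element sizes tables that are indexed by state. */
static const char *const demo_call_state_names[NR__DEMO_CALL_STATES] = {
    [DEMO_CALL_SEND_REQUEST] = "SendRequest",
    [DEMO_CALL_AWAIT_REPLY]  = "AwaitReply",
    [DEMO_CALL_COMPLETE]     = "Complete",
};

static const char *demo_call_state_name(enum demo_call_state s)
{
    assert((unsigned int)s < NR__DEMO_CALL_STATES);
    return demo_call_state_names[s];
}

static void demo_set_flag(unsigned long *flags, enum demo_call_flag f)
{
    *flags |= 1UL << f;
}

static int demo_test_flag(unsigned long flags, enum demo_call_flag f)
{
    return (flags >> f) & 1UL;
}
```

    C does not stop an event constant from being passed where a flag is
    expected, so the distinct prefixes mainly make such a mix-up easier to
    spot when reading the code.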
     
  • Fix a case where RXRPC_CALL_RELEASE (an event) is being used to specify a
    flag bit. RXRPC_CALL_RELEASED should be used instead.

    Signed-off-by: David Howells

    David Howells
     
  • Some devices declare a high number of TX queues, then set a much
    lower real_num_tx_queues.

    This causes setups using fq_codel, sfq or fq as the default qdisc to
    consume more memory than really needed.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Andrew and Ying Huang's test robot both reported usage count problems that
    trace back to the 'keep address on ifdown' patch.

    From Andrew:
    We run the CRIU tests on linux-next. On the current linux-next kernel
    they hang when creating a network namespace.

    The kernel log contains many messages like this:
    [ 1036.122108] unregister_netdevice: waiting for lo to become free.
    Usage count = 2
    [ 1046.165156] unregister_netdevice: waiting for lo to become free.
    Usage count = 2
    [ 1056.210287] unregister_netdevice: waiting for lo to become free.
    Usage count = 2

    I tried to revert this patch and the bug disappeared.

    Here is a set of commands to reproduce this bug:

    [root@linux-next-test linux-next]# uname -a
    Linux linux-next-test 4.5.0-rc6-next-20160301+ #3 SMP Wed Mar 2
    17:32:18 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

    [root@linux-next-test ~]# unshare -n
    [root@linux-next-test ~]# ip link set up dev lo
    [root@linux-next-test ~]# ip a
    1: lo: mtu 65536 qdisc noqueue state UNKNOWN
    group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
    valid_lft forever preferred_lft forever
    [root@linux-next-test ~]# logout
    [root@linux-next-test ~]# unshare -n

    -----

    The problem is a change made to the RTM_DELADDR case in
    __ipv6_ifa_notify that was added in an early version of the offending
    patch and is no longer needed.

    Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
    Cc: Andrey Wagin
    Cc: Ying Huang
    Signed-off-by: David Ahern
    Tested-by: Jeremiah Mahler
    Signed-off-by: David S. Miller

    David Ahern
     
  • Since the ezchip network driver was written for the big-endian EZChip
    platform, support for little-endian architectures has to be added.

    The first issue is that the order of the bits in a bit field is
    implementation specific. So all the bit fields are removed, and
    named constants are used to access the necessary fields.

    The second issue is that network byte order is big endian. For example,
    data on Ethernet is transmitted with the most significant octet (byte)
    first. So on a little-endian architecture it is important to swap the
    data byte order when we read it from a register. For unaligned access
    we can use "get_unaligned_be32", and otherwise we can use the function
    "ioread32_rep", which reads all the data from the register and works on
    both little-endian and big-endian architectures.

    Then, when we write data to a register, we need to restore the byte
    order using the function "put_unaligned_be32" for unaligned access and
    "iowrite32_rep" otherwise.

    The last little fix adds a space between type and pointer to comply
    with the coding style.

    Signed-off-by: Lada Trimasova
    Cc: Alexey Brodkin
    Cc: Noam Camus
    Cc: Tal Zilcer
    Cc: Arnd Bergmann
    Acked-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Lada Trimasova
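
    The two techniques named above can be illustrated in portable user-space
    C. The sketch below is not the driver code: demo_get_be32() and
    demo_put_be32() are hypothetical stand-ins that behave like a
    big-endian 32-bit load and store on a possibly unaligned address, and
    the DEMO_RX_CTL_* register layout is invented to show shift/mask
    constants replacing a bit-field struct.

```c
#include <assert.h>
#include <stdint.h>

/* Big-endian 32-bit load that works on any host and any alignment. */
static uint32_t demo_get_be32(const uint8_t *p)
{
    assert(p);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Big-endian 32-bit store, the inverse of demo_get_be32(). */
static void demo_put_be32(uint32_t v, uint8_t *p)
{
    assert(p);
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/*
 * Invented register layout. Shift and mask constants give the same result
 * with every compiler, unlike bit-fields, whose bit order is
 * implementation specific.
 */
#define DEMO_RX_CTL_LEN_SHIFT 0
#define DEMO_RX_CTL_LEN_MASK  0x000007FFu /* bits 10:0, frame length */
#define DEMO_RX_CTL_ERR_SHIFT 16
#define DEMO_RX_CTL_ERR_MASK  0x00010000u /* bit 16, error flag */

/* Extract one field from a register value that is in host order. */
static uint32_t demo_field(uint32_t reg, uint32_t mask, unsigned int shift)
{
    return (reg & mask) >> shift;
}
```

    Because the load and store are written in terms of byte positions, the
    same source gives the same values on little-endian and big-endian hosts.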
     
  • Make use of ARCH_RENESAS in place of ARCH_SHMOBILE.

    This is part of an ongoing process to migrate from ARCH_SHMOBILE to
    ARCH_RENESAS, the motivation being that RENESAS seems to be a more
    appropriate name than SHMOBILE for the majority of Renesas ARM-based SoCs.

    Signed-off-by: Simon Horman
    Acked-by: Geert Uytterhoeven
    Signed-off-by: David S. Miller

    Simon Horman
     
  • The new NET_DEVLINK infrastructure can be a loadable module, but the drivers
    using it might be built-in, which causes link errors like:

    drivers/net/built-in.o: In function `mlx4_load_one':
    :(.text+0x2fbfda): undefined reference to `devlink_port_register'
    :(.text+0x2fc084): undefined reference to `devlink_port_unregister'
    drivers/net/built-in.o: In function `mlxsw_sx_port_remove':
    :(.text+0x33a03a): undefined reference to `devlink_port_type_clear'
    :(.text+0x33a04e): undefined reference to `devlink_port_unregister'

    There are multiple ways to avoid this:

    a) add 'depends on NET_DEVLINK || !NET_DEVLINK' dependencies
       for each user
    b) use 'select NET_DEVLINK' from each driver that uses it
       and hide the symbol in Kconfig.
    c) make NET_DEVLINK a 'bool' option so we don't have to
       list it as a dependency, and rely on the APIs to be
       stubbed out when it is disabled
    d) use IS_REACHABLE() rather than IS_ENABLED() to check for
       NET_DEVLINK in include/net/devlink.h

    This implements a variation of approach a) by adding an
    intermediate symbol that drivers can depend on, and changes
    the three drivers using it.

    Signed-off-by: Arnd Bergmann
    Fixes: 09d4d087cd48 ("mlx4: Implement devlink interface")
    Fixes: c4745500e988 ("mlxsw: Implement devlink interface")
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Arnd Bergmann
     
  • …etooth/bluetooth-next

    Johan Hedberg says:

    ====================
    pull request: bluetooth-next 2016-03-01

    Here's our main set of Bluetooth & 802.15.4 patches for the 4.6 kernel.

    - New Bluetooth HCI driver for Intel/AG6xx controllers
    - New Broadcom ACPI IDs
    - LED trigger support for indicating Bluetooth powered state
    - Various fixes in mac802154, 6lowpan and related drivers
    - New USB IDs for AR3012 Bluetooth controllers

    Please let me know if there are any issues pulling. Thanks.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     
  • I added this check in setup_tc to multiple drivers,

    if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)

    Unfortunately, restricting to TC_H_ROOT like this breaks the old
    instantiation of mqprio to set up a hardware qdisc. This patch
    relaxes the test to only check the type, making it equivalent
    to the check before I broke it. With this, the old instantiation
    continues to work.

    A good smoke test is to set up mqprio with,

    # tc qdisc add dev eth4 root mqprio num_tc 8 \
          map 0 1 2 3 4 5 6 7 \
          queues 0@0 1@1 2@2 3@3 4@4 5@5 6@6 7@7

    Fixes: e4c6734eaab9 ("net: rework ndo tc op to consume additional qdisc handle paramete")
    Reported-by: Singh Krishneil
    Reported-by: Jake Keller
    CC: Murali Karicheri
    CC: Shradha Shah
    CC: Or Gerlitz
    CC: Ariel Elior
    CC: Jeff Kirsher
    CC: Bruce Allan
    CC: Jesse Brandeburg
    CC: Don Skidmore
    Signed-off-by: John Fastabend
    Signed-off-by: David S. Miller

    John Fastabend
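
    The difference between the strict and the relaxed check can be sketched
    as below. This is a stand-alone illustration, not driver code: the
    DEMO_* constants and demo_* types are local stand-ins for the kernel
    definitions, and the two functions are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Local stand-ins so the sketch compiles on its own. */
#define DEMO_TC_H_ROOT 0xFFFFFFFFu

enum demo_tc_setup_type {
    DEMO_TC_SETUP_MQPRIO,
    DEMO_TC_SETUP_OTHER
};

struct demo_tc_to_netdev {
    enum demo_tc_setup_type type;
};

/* Strict check: rejects anything that is not mqprio on the root handle. */
static int demo_setup_tc_strict(uint32_t handle,
                                const struct demo_tc_to_netdev *tc)
{
    assert(tc);
    if (handle != DEMO_TC_H_ROOT || tc->type != DEMO_TC_SETUP_MQPRIO)
        return -EINVAL;
    return 0;
}

/* Relaxed check: only the offload type is validated. */
static int demo_setup_tc_relaxed(uint32_t handle,
                                 const struct demo_tc_to_netdev *tc)
{
    (void)handle; /* the handle no longer takes part in the decision */
    assert(tc);
    if (tc->type != DEMO_TC_SETUP_MQPRIO)
        return -EINVAL;
    return 0;
}
```

    A request of type mqprio that arrives with a handle other than the root
    handle is rejected by the strict variant and accepted by the relaxed
    one, which is the behavioural change the message describes.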
     

03 Mar, 2016

29 commits

  • Left over from c24588afc536a35c924d014f13b669b20ccf8553
    ("atl1c: using fixed TXQ configuration for l2cb and l1c")

    Signed-off-by: Eric Engestrom
    Signed-off-by: David S. Miller

    Eric Engestrom
     
  • 8cc785f6f429c2a3fb81745dc142cbd72a462c4a ("net: ipv4: make the ping
    /proc code AF-independent") removed the code using it, but renamed this
    variable instead of removing it.

    Signed-off-by: Eric Engestrom
    Signed-off-by: David S. Miller

    Eric Engestrom
     
  • 3b766cd832328fcb87db3507e7b98cf42f21689d ("net/core: Add reading VF
    statistics through the PF netdevice") added that variable but it's never
    been used.

    Signed-off-by: Eric Engestrom
    Signed-off-by: David S. Miller

    Eric Engestrom
     
  • Hariprasad Shenai says:

    ====================
    cxgb4/cxgb4vf: Cleanup and minor fixes

    This series sets FBMIN to 64 bytes for Chelsio's T6 series of adapters,
    revises the check for replenishing the freelist, cleans up the cxgb4vf
    SGE initialization code and removes dead code.

    This patch series has been created against the net-next tree and includes
    patches for the cxgb4 and cxgb4vf drivers.

    We have included all the maintainers of the respective drivers. Kindly
    review the changes and let us know of any review comments.
    ====================

    David S. Miller
     
  • Signed-off-by: Hariprasad Shenai
    Signed-off-by: David S. Miller

    Hariprasad Shenai
     
  • Function t4vf_wait_dev_ready() is already called in t4vf_prep_adapter(),
    no need to call it again in adap_init0().

    Signed-off-by: Hariprasad Shenai
    Signed-off-by: David S. Miller

    Hariprasad Shenai
     
  • Add a new function t4vf_fl_pkt_align() and use it in the SGE
    initialization code to find out the freelist packet alignment.

    Signed-off-by: Hariprasad Shenai
    Signed-off-by: David S. Miller

    Hariprasad Shenai
     
  • T4 and T5 hardware will not coalesce Free List PCI-E Fetch Requests if
    the Host Driver provides more Free List Pointers than the Fetch Burst
    Minimum value. So if we set FBMIN to 64 bytes and the Host Driver
    supplies 128 bytes of Free List Pointer data, the hardware will issue two
    64-byte PCI-E Fetch Requests rather than a single coalesced 128-byte
    Fetch Request. T6 fixes this. So, for T4/T5 we set the FBMIN value to 128.

    Signed-off-by: Hariprasad Shenai
    Signed-off-by: David S. Miller

    Hariprasad Shenai
     
  • Use the freelist capacity instead of the freelist size when checking
    whether the freelist needs to be refilled.

    Signed-off-by: Hariprasad Shenai
    Signed-off-by: David S. Miller

    Hariprasad Shenai
     
  • Alexandre TORGUE says:

    ====================
    stmmac: enhance driver performances and update the version

    As agreed with Giuseppe, I am sending the v3 series.

    This is a subset of patches to rework the driver in order to improve its
    performance and make it more robust under stress conditions.

    All patches have been ported on STi mainstream kernel branch and
    tested on ARM STiH4xx platforms and newer ones.

    This series also updates the driver version and prepares it
    to include further development to support new chips.

    In detail, these patches:

    o rework and improve the internal DMA bus settings.

      Fine tuning is mandatory on some platforms for both performance and
      stability.

    o rework and optimize the descriptor management.

      This helps a lot on the performance side and prepares for the
      inclusion of the GMAC4.x.

    o add a set of optimizations for both the xmit and rx functions.

      These help a lot on the performance side and make the driver more
      robust under low memory conditions and under some stress tests,
      performed for example on IP-STB.

    Below are some throughput figures obtained on some boxes before and
    after the patches.

                nuttcp (Mbps)                iperf (Mbps)
    -----------------------------------------------------------
              tcp           udp           tcp           udp
            tx     rx     tx     rx     tx     rx     tx     rx
    -----------------------------------------------------------
    old    680    800    480    506    760    800    600    700
    new    830    880    540    630    840    880    700    800

    V2: - rx_copybreak is now managed by using ethtool.
    V3: - improve comments on PCIe detailing that there are no regressions
        - rework some APIs to properly define some params as bool as expected
        - rework the formula to get the element inside the ring. Compared to
          V2, patches 4 and 13 have been merged because the same formula has
          been used. After this rework, no evident benefit has been noticed
          in terms of performance, so the table above is still valid.
          Disassembling the code for SH4 and ARM, the new formula saves just
          one instruction (depending on compiler flags), which gives no
          relevant gain, for example, on SH4 where some instructions are
          executed in the same pipeline stage.
          Ring sizes are now fixed and maybe they can be reworked to be tuned
          w/o using the stmmaceth= cmdline option. Indeed, nobody changes
          these sizes, and the numbers selected by default respect the budget
          and avoid passing an invalid setup. These are the best driver
          default sizes for ring and chain.

    ====================

    David S. Miller
     
  • This patch just updates the driver to the version fully
    tested on STi platforms. This version is Oct_2015.

    Signed-off-by: Giuseppe Cavallaro
    Signed-off-by: Alexandre TORGUE
    Signed-off-by: David S. Miller

    Giuseppe Cavallaro
     
  • A threshold is now also used to limit skb allocation when zero-copy
    is used. This avoids incoherence in the ring caused by skb allocation
    failures under very aggressive testing and under low memory conditions.

    Signed-off-by: Giuseppe Cavallaro
    Signed-off-by: Alexandre TORGUE
    Signed-off-by: David S. Miller

    Giuseppe Cavallaro
     
  • This patch allows the driver to copy tiny frames during reception.
    This gives more stability while stressing the driver on STi embedded
    systems.

    Signed-off-by: Giuseppe Cavallaro
    Signed-off-by: Alexandre TORGUE
    Signed-off-by: David S. Miller

    Giuseppe Cavallaro
     
  • phy_bus_name can be NULL when "fixed-link" property isn't used.
    Then, since "stmmac: do not poll phy handler when attach a switch",
    phy_bus_name ptr needs to be checked before strcmp is called.

    Signed-off-by: Fabrice Gasnier
    Signed-off-by: Giuseppe Cavallaro
    Signed-off-by: Alexandre TORGUE
    Signed-off-by: David S. Miller

    Fabrice Gasnier
     
  • This patch avoids calling stmmac_adjust_link when the driver is
    connected to a switch through the FIXED_PHY support. Prior to this
    patch, phydev->irq was set to PHY_POLL, so the phy handler was invoked
    periodically, wasting time because the link cannot actually change.
    Note that stmmac_adjust_link will be called just once, which
    guarantees that the ST glue logic is set up according to the fixed
    mode and speed.

    Signed-off-by: Giuseppe Cavallaro
    Signed-off-by: Alexandre TORGUE
    Signed-off-by: David S. Miller

    Giuseppe Cavallaro
     
  • This patch fills the first descriptor just before granting the DMA
    engine, so at the end of the xmit.
    The patch takes care of the algorithm adopted to mitigate the
    interrupts, then it fixes the last segment in case of no fragments.
    Moreover, this new implementation does not pass any "ter" field when
    preparing the descriptors because this is not necessary.
    The patch also details the memory barrier in the xmit.

    As a final result, this patch guarantees the same performance while
    fixing the case where small datagrams are sent. In fact, this kind of
    test is impacted if no coalescing is done.

    Signed-off-by: Fabrice Gasnier
    Signed-off-by: Giuseppe Cavallaro
    Signed-off-by: Alexandre TORGUE
    Signed-off-by: David S. Miller

    Giuseppe Cavallaro
     
  • The dirty index can be updated outside the loop where all the
    tx resources are claimed. This helps performance too.
    Also, a useless debug printk has been removed from the main loop.

    Signed-off-by: Giuseppe Cavallaro
    Signed-off-by: Alexandre TORGUE
    Signed-off-by: David S. Miller

    Giuseppe Cavallaro
     
  • This patch inlines the get_tx_owner and get_ls routines. It results in a
    single read of tdes0, instead of three, to check the TX_OWN and LS bits
    and other status bits.

    It helps improve the driver TX path by removing two uncached read/writes
    inside the TX clean loop for enhanced descriptors, but not for normal
    ones, because des1 must be read in any case.

    Signed-off-by: Fabrice Gasnier
    Acked-by: Giuseppe Cavallaro
    Signed-off-by: Alexandre TORGUE
    Signed-off-by: David S. Miller

    Fabrice Gasnier
     
  • This patch optimizes how the TDES is managed inside the xmit
    function. When preparing the frame, some settings (e.g. the OWN
    bit) can be merged. This has been reworked to improve tx
    performance.

    Signed-off-by: Fabrice Gasnier
    Signed-off-by: Giuseppe Cavallaro
    Signed-off-by: Alexandre TORGUE
    Signed-off-by: David S. Miller

    Giuseppe Cavallaro
     
  • The RDES0 register can be read several times while doing RX of a
    packet.
    This patch slightly improves RX path performance by reading rdes0
    once for two operations: checking the rx owner and getting the rx
    status bits.

    Signed-off-by: Fabrice Gasnier
    Acked-by: Giuseppe Cavallaro
    Signed-off-by: Alexandre TORGUE
    Signed-off-by: David S. Miller

    Fabrice Gasnier
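
    The single-read idea can be sketched as follows: the descriptor word is
    loaded once into a local variable, and both the ownership test and the
    status decoding use that local copy. The demo_* names are hypothetical,
    and the bit positions are assumptions chosen for illustration; the
    hardware databook is the authority for the real descriptor layout.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define DEMO_RDES0_OWN           (1u << 31)
#define DEMO_RDES0_ERROR_SUMMARY (1u << 15)
#define DEMO_RDES0_LAST_DESC     (1u << 8)

/* The descriptor lives in DMA memory, so each access is a real load. */
struct demo_dma_desc {
    volatile uint32_t des0;
};

enum demo_rx_result {
    DEMO_RX_DMA_OWN,  /* still owned by the DMA engine, nothing to do */
    DEMO_RX_DISCARD,  /* error reported or frame not complete */
    DEMO_RX_GOOD
};

/* One load of des0 serves both the owner check and the status check. */
static enum demo_rx_result demo_rx_status(const struct demo_dma_desc *p)
{
    uint32_t rdes0;

    assert(p);
    rdes0 = p->des0; /* the only read of the descriptor word */

    if (rdes0 & DEMO_RDES0_OWN)
        return DEMO_RX_DMA_OWN;
    if (!(rdes0 & DEMO_RDES0_LAST_DESC) || (rdes0 & DEMO_RDES0_ERROR_SUMMARY))
        return DEMO_RX_DISCARD;
    return DEMO_RX_GOOD;
}
```

    Folding the owner check into the status function is what removes the
    second read; a separate owner helper would have to load des0 again.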
     
  • Optimize tx_clean by avoiding a des3 read in stmmac_clean_desc3().

    In ring mode, on TX, des3 seems to be used only when transmitting a
    jumbo frame.
    In case of normal descriptors, it may also be used for time
    stamping.
    Clean it in the above two cases, without reading it.

    Signed-off-by: Fabrice Gasnier
    Signed-off-by: Giuseppe Cavallaro
    Signed-off-by: Alexandre TORGUE
    Signed-off-by: David S. Miller

    Giuseppe Cavallaro
     
  • last_segment field is read twice from dma descriptors in stmmac_clean().
    Add last_segment to dma data so that this flag is read from the priv
    structure in cache instead of from memory.
    This avoids reading twice from memory on each loop iteration in
    stmmac_clean().

    Signed-off-by: Fabrice Gasnier
    Signed-off-by: Giuseppe Cavallaro
    Signed-off-by: Alexandre TORGUE
    Signed-off-by: David S. Miller

    Giuseppe Cavallaro
     
  • Currently, the code pulls out the length field when
    unmapping a buffer directly from the descriptor. This will result
    in an uncached read to a dma_alloc_coherent() region. There is no
    need to do this, so this patch simply puts the value directly into
    a data structure which will hit the cache.

    Signed-off-by: Fabrice Gasnier
    Signed-off-by: Giuseppe Cavallaro
    Signed-off-by: Alexandre TORGUE
    Signed-off-by: David S. Miller

    Giuseppe Cavallaro
     
  • This patch reworks the ring management, which is now optimized.
    Previously, the indexes into the ring buffer were always incremented,
    and the entry was accessed via a modulo to find the "real" position
    in the ring.
    This is inefficient because modulo is an expensive operation.

    The formula [(entry + 1) & (size - 1)] is now adopted on
    a ring that is a power of 2 in size.
    As a consequence, the number of elements can no longer be set from
    the command line; it is fixed.

    Signed-off-by: Giuseppe Cavallaro
    Signed-off-by: Alexandre TORGUE
    Signed-off-by: David S. Miller

    Giuseppe Cavallaro
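
    The two indexing schemes can be compared with a small stand-alone
    sketch. DEMO_RING_SIZE and the demo_* helpers are hypothetical, and 256
    is an assumed size picked only because it is a power of two.

```c
#include <assert.h>

#define DEMO_RING_SIZE 256u /* assumed size, must be a power of two */

/* The mask trick is only valid when exactly one bit is set in the size. */
_Static_assert((DEMO_RING_SIZE & (DEMO_RING_SIZE - 1u)) == 0u,
               "ring size must be a power of two");

/* Old scheme: a free-running index reduced with a modulo on each access. */
static unsigned int demo_slot_modulo(unsigned int entry)
{
    return entry % DEMO_RING_SIZE;
}

/* New scheme: the index stays inside the ring and wraps with a mask. */
static unsigned int demo_next_entry(unsigned int entry)
{
    assert(entry < DEMO_RING_SIZE);
    return (entry + 1u) & (DEMO_RING_SIZE - 1u);
}
```

    For a power-of-two size both schemes visit the same slots in the same
    order; the mask form avoids a division when the size is not a
    compile-time constant, and it is the reason the size has to stay a
    power of two.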
     
  • This patch completely changes the descriptor layout to improve
    overall performance through single-read usage of the
    descriptors in critical paths.

    Signed-off-by: Giuseppe Cavallaro
    Signed-off-by: Alexandre TORGUE
    Signed-off-by: David S. Miller

    Giuseppe Cavallaro
     
  • This patch restructures the DMA bus settings by introducing a new
    platform structure used for programming the AXI Bus Mode Register
    inside the DMA module.
    This structure can be populated from the device tree as documented in
    the binding txt file.

    After initializing the DMA, the AXI register can optionally be tuned
    by platform drivers.
    This patch also reworks some parameters to make the DMA configuration
    coherent now that the AXI register is introduced.
    For example, the burst_len is managed through the AXI support
    mentioned above, so the snps,burst-len parameter has been removed.
    It makes sense to provide the AAL parameter from DT to set
    Address-Aligned Beats inside Register0 and to review the PBL settings
    when initializing the engine.

    For the PCI glue, looking back at the history of this setting, it was
    added to align a configuration, not to fix a known problem. No issue
    has been raised after this patch.
    It is safe to use the default burst length instead of
    tuning it to the maximum value.

    Signed-off-by: Giuseppe Cavallaro
    Signed-off-by: Alexandre TORGUE
    Signed-off-by: David S. Miller

    Giuseppe Cavallaro
     
  • This patch shares the same reset procedure between the dwmac100 and
    dwmac1000 chips.
    This will also help with enhancing the driver and supporting new chips.

    Signed-off-by: Giuseppe Cavallaro
    Signed-off-by: Alexandre TORGUE
    Signed-off-by: David S. Miller

    Giuseppe Cavallaro
     
  • Santosh Shilimkar says:

    ====================
    RDS: Major clean-up with couple of new features for 4.6

    v3:
    Re-generated the same series by omitting the "-D" option from the git
    format-patch command. Since the first patch has file removals, git
    apply/am can't deal with it when formatted with the '-D' option.

    v2:
    Dropped module parameter from [PATCH 11/13] as suggested by David Miller

    Series is generated against net-next but also applies against Linus's tip
    cleanly. Entire patchset is available at below git tree:

    git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux.git for_4.6/net-next/rds_v2

    The diff-stat looks a bit scary since almost 4K lines of code are
    getting removed. Brief summary of the series:

    - Drop the stale iWARP support:
      RDS iWARP support code has been stale and untestable for some time.
      As discussed and agreed earlier on the list, I am dropping its
      support for good. If new iWARP user(s) show up in the future,
      the plan is to adapt the existing IB RDMA with a special sink case.
    - RDS gets SO_TIMESTAMP support
    - The long overdue RDS maintainer entry gets updated
    - Some RDS IB code refactoring towards the new FastReg Memory
      Registration (FRMR)
    - Lastly, the initial support for FRMR

    RDS IB RDMA performance with FRMR is not yet as good as with FMR, and I
    have some patches in progress to address that. But they are not ready
    for 4.6, so I left them out of this series.

    I am also keeping an eye on the new CQ API adaptations that other ULPs
    are doing and will try to adapt RDS likewise, most likely in the 4.7+
    timeframe.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Fastreg MR (FRMR) is another method with which one can
    register memory with the HCA. Some of the newer HCAs support only
    fastreg MR mode, so we need to add support for it to keep RDS
    functional on them.

    Signed-off-by: Santosh Shilimkar
    Signed-off-by: Avinash Repaka
    Signed-off-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Avinash Repaka