20 Jul, 2012

25 commits

  • This patch adds support for a second clock to the flexcan driver. On
    modern freescale ARM cores like the imx53 and imx6q two clocks ("ipg"
    and "per") must be enabled in order to access the CAN core.

    In the original driver, the clock was requested without specifying a
    connection id. Furthermore, all mainline ARM archs with flexcan support
    (imx28, imx25, imx35) register their flexcan clock without a
    connection id as well.

    This patch first renames the existing clk variable to clk_ipg and
    converts it to devm for easier error handling. The connection id "ipg"
    is added to the devm_clk_get() call. Then a second clock "per" is
    requested. As none of the archs specify a connection id, both clk_get()
    calls return the same clock. This preserves compatibility with existing
    flexcan support and adds support for imx53 at the same time.

    After this patch hits mainline, the archs may give their existing
    flexcan clock the "ipg" connection id and implement a dummy "per"
    clock.

    This patch has been tested on imx28 (unmodified clk tree) and on imx53
    with separate "ipg" and "per" clocks.

    Cc: Sascha Hauer
    Cc: Shawn Guo
    Signed-off-by: Steffen Trumtrar
    Acked-by: Hui Wang
    Signed-off-by: Marc Kleine-Budde

    Steffen Trumtrar
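
The fallback described above (a clock registered without a connection id satisfies both the "ipg" and the "per" request) can be illustrated with a small Python model of a clock lookup table. This is only a sketch of the matching idea, not the kernel's clkdev code; the device name `flexcan.0` and the clock names are hypothetical.

```python
from typing import Optional

def clk_find(lookups, dev_id: Optional[str], con_id: Optional[str]):
    """Return the best-matching clock name, or None.

    Simplified model: a lookup entry whose dev_id or con_id is None acts as
    a wildcard for that field. A dev_id match outweighs a con_id match.
    """
    best, best_score = None, 0
    for entry_dev, entry_con, clk in lookups:
        score = 0
        if entry_dev is not None:
            if entry_dev != dev_id:
                continue
            score += 2
        if entry_con is not None:
            if entry_con != con_id:
                continue
            score += 1
        if score > best_score:
            best, best_score = clk, score
    return best

# Legacy platform: a single flexcan clock registered without a connection id.
legacy = [("flexcan.0", None, "can_clk")]

# Platform with separate clocks: one entry per connection id.
split = [
    ("flexcan.0", "ipg", "can_ipg_clk"),
    ("flexcan.0", "per", "can_per_clk"),
]
```

With the `legacy` table, requests for "ipg" and "per" both resolve to the same clock, which is why the driver change stays compatible with unmodified clock trees.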
     
  • This patch marks the bittiming_const pointer in the struct can_priv as
    "const". This allows us to mark the struct can_bittiming_const in the CAN
    drivers as "const", too.

    Signed-off-by: Marc Kleine-Budde

    Marc Kleine-Budde
     
  • David S. Miller
     
  • Fix the diff value in rt_bind_exception again after a collision
    between the two latest patches; my original commit had
    actually fixed the same problem.

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • Conflicts:
    drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c

    David S. Miller
     
  • In trusted networks, e.g., intranet, data-center, the client does not
    need to use a Fast Open cookie to mitigate DoS attacks. In cookie-less
    mode, sendmsg() with the MSG_FASTOPEN flag will send SYN-data regardless
    of cookie availability.

    Signed-off-by: Yuchung Cheng
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Yuchung Cheng
     
  • On paths with firewalls dropping SYN with data or experimental TCP options,
    Fast Open connections will experience SYN timeouts and bad performance.
    The solution is to track such incidents in the cookie cache and disable
    Fast Open temporarily.

    Since only the original SYN includes data and/or Fast Open option, the
    SYN-ACK has some tell-tale sign (tcp_rcv_fastopen_synack()) to detect
    such drops. If a path has recurring Fast Open SYN drops, Fast Open is
    disabled for 2^(recurring_losses) minutes, starting from four minutes up to
    roughly one and a half days. sendmsg() with the MSG_FASTOPEN flag will
    still succeed, but it behaves like connect() followed by write().

    Signed-off-by: Yuchung Cheng
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Yuchung Cheng
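
The back-off arithmetic above can be checked with a short Python sketch. Two values are assumptions inferred from the text rather than taken from the patch: back-off starts at two recurring losses (2^2 = four minutes), and the exponent is capped at 11 (2^11 = 2048 minutes, roughly 34 hours, matching "roughly one and a half days"). The function name is hypothetical.

```python
def fastopen_disabled_minutes(recurring_losses: int, max_exponent: int = 11) -> int:
    """Minutes for which Fast Open stays disabled on a path.

    Assumptions (not confirmed by the commit text): fewer than two recurring
    losses cause no back-off, and the exponent is capped at max_exponent.
    """
    if recurring_losses < 2:
        return 0
    return 2 ** min(recurring_losses, max_exponent)
```

For example, two recurring losses give 4 minutes, three give 8, and eleven or more give 2048 minutes under the assumed cap.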
     
  • sendmsg() (or sendto()) with MSG_FASTOPEN is a combo of connect(2)
    and write(2). The application should replace connect() with it to
    send data in the opening SYN packet.

    For a blocking socket, sendmsg() blocks until all the data is buffered
    locally and the handshake is completed, like a connect() call. It
    returns an errno similar to connect() if the TCP handshake fails.

    For a non-blocking socket, it returns the number of bytes queued (and
    transmitted in the SYN-data packet) if a cookie is available. If a cookie
    is not available, it transmits a data-less SYN packet with the Fast Open
    cookie request option and returns -EINPROGRESS like connect().

    Using MSG_FASTOPEN on a connecting or connected socket will result in
    an errno similar to that of repeated connect() calls. Therefore the
    application should only use this flag on new sockets.

    The buffer size of sendmsg() is independent of the MSS of the connection.

    Signed-off-by: Yuchung Cheng
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Yuchung Cheng
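
From user space, the "replace connect() with sendto()" pattern might look like the following Python sketch for a blocking socket. It assumes a platform where Python exposes `socket.MSG_FASTOPEN`; the helper name `send_with_fastopen` and the fallback policy (plain connect() if the flag is missing or the call fails) are choices made for this example, not part of the patch.

```python
import socket

def send_with_fastopen(host: str, port: int, payload: bytes) -> int:
    """Open a TCP connection and send payload, using MSG_FASTOPEN if possible.

    Returns the number of payload bytes handed to the kernel.
    """
    flag = getattr(socket, "MSG_FASTOPEN", None)
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sent = 0
        if flag is not None:
            try:
                # A single call on a new socket replaces connect() + write().
                sent = sock.sendto(payload, flag, (host, port))
            except OSError:
                # For example Fast Open disabled on this host: use connect().
                flag = None
        if flag is None:
            sock.connect((host, port))
        # Send whatever the first call did not queue (possibly nothing).
        sock.sendall(payload[sent:])
        return len(payload)
```

The flag is used only on a freshly created socket, in line with the advice above about connecting or connected sockets.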
     
  • On receiving the SYN-ACK after SYN-data, the client needs to
    a) update the cached MSS and cookie (if included in SYN-ACK)
    b) retransmit the data not yet acknowledged by the SYN-ACK in the final ACK of
    the handshake.

    Signed-off-by: Yuchung Cheng
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Yuchung Cheng
     
  • This patch implements sending SYN-data in tcp_connect(). The data is
    from tcp_sendmsg() with flag MSG_FASTOPEN (implemented in a later patch).

    The length of the cookie in tcp_fastopen_req, init'd to 0, controls the
    type of the SYN. If the cookie is not cached (len == 0), the host sends a
    data-less SYN with the Fast Open cookie request option to solicit a cookie
    from the remote. If the cookie is available (len > 0), the host sends
    a SYN-data with the Fast Open cookie option. If the cookie length is
    negative, the SYN will not include any Fast Open option (for fallback
    operations).

    To deal with middleboxes that may drop SYN with data or experimental TCP
    option, the SYN-data is only sent once. SYN retransmits do not include
    data or Fast Open options. The connection will fall back to regular TCP
    handshake.

    Signed-off-by: Yuchung Cheng
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Yuchung Cheng
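
The way the cookie length selects the SYN type can be summarized in a few lines of Python. This is a model of the decision only; the enum and function names are hypothetical and do not appear in the patch.

```python
from enum import Enum

class SynKind(Enum):
    PLAIN = "SYN without any Fast Open option"
    COOKIE_REQUEST = "data-less SYN with Fast Open cookie request option"
    SYN_DATA = "SYN-data with Fast Open cookie option"

def classify_syn(cookie_len: int, is_retransmit: bool = False) -> SynKind:
    """Pick the SYN type from the cached cookie length.

    Retransmitted SYNs never carry data or Fast Open options, so the
    connection falls back to a regular handshake.
    """
    if is_retransmit or cookie_len < 0:
        return SynKind.PLAIN
    if cookie_len == 0:
        # No cookie cached yet: ask the remote for one.
        return SynKind.COOKIE_REQUEST
    # A cookie is cached: send data in the SYN.
    return SynKind.SYN_DATA
```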
     
  • With help from Eric Dumazet, add Fast Open metrics to the tcp metrics
    cache. The basic ones are MSS and the cookies. A later patch will cache
    more to handle unfriendly middleboxes.

    Signed-off-by: Yuchung Cheng
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Yuchung Cheng
     
  • This patch implements the common code for both the client and server.

    1. TCP Fast Open option processing. Since Fast Open does not have an
    option number assigned by IANA yet, it shares the experimental option
    code 254 by implementing draft-ietf-tcpm-experimental-options
    with a 16-bit magic number 0xF989. This enables global experiments
    without clashing over the scarce (2) experimental options available for
    TCP.

    When the draft status becomes standard (maybe), the client should
    switch to the newly assigned option number while the server supports
    both numbers for the transition.

    2. The new sysctl tcp_fastopen

    3. A placeholder init function

    Signed-off-by: Yuchung Cheng
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Yuchung Cheng
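
As an illustration of the shared experimental option, the sketch below builds and parses an option with kind 254 and magic 0xF989 in Python. The byte layout (kind, total length, 16-bit magic in network order, then the cookie) is an assumption based on the experimental-options draft named above; the function names are hypothetical.

```python
import struct
from typing import Optional

TCPOPT_EXP = 254      # shared experimental option kind
TFO_MAGIC = 0xF989    # 16-bit magic number identifying Fast Open

def build_fastopen_option(cookie: bytes = b"") -> bytes:
    """Encode the option; an empty cookie stands for a cookie request."""
    length = 4 + len(cookie)  # kind + length + magic + cookie
    return struct.pack("!BBH", TCPOPT_EXP, length, TFO_MAGIC) + cookie

def parse_fastopen_option(option: bytes) -> Optional[bytes]:
    """Return the cookie, or None if this is not a Fast Open option."""
    if len(option) < 4:
        return None
    kind, length, magic = struct.unpack("!BBH", option[:4])
    if kind != TCPOPT_EXP or magic != TFO_MAGIC or length != len(option):
        return None
    return option[4:]
```

Checking the magic number is what lets several experiments share kind 254 without misreading each other's options.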
     
  • In its receive path, mlx4_en driver maps each page chunk that it pushes
    to the hardware and unmaps it when pushing it up the stack. This limits
    throughput to about 3Gbps on a Power7 8-core machine.

    One solution is to map the entire allocated page at once. However, this
    requires that we keep track of every page fragment we give to a
    descriptor. We also need to work with the discipline that all fragments
    will be released (in the sense that they will not be reused by the driver
    anymore) in the order they are allocated to the driver.

    This requires that we don't reuse any fragments; every single one of
    them must be reallocated. We do that by releasing all the fragments that
    are processed, and only after we have finished processing the descriptors
    do we start the refill.

    We also must somehow guarantee that we either refill all fragments in a
    descriptor or none at all, without resorting to giving up a page
    fragment that we would have already given. Otherwise, we would break the
    discipline of only releasing the fragments in the order they were
    allocated.

    This has passed page allocation fault injections (restricted to the
    driver by using required-start and required-end) and device hotplug
    while 16 TCP streams were able to deliver more than 9Gbps.

    Signed-off-by: Thadeu Lima de Souza Cascardo
    Signed-off-by: David S. Miller

    Thadeu Lima de Souza Cascardo
     
  • Dynamically allocated sysfs attributes must be initialized using
    sysfs_attr_init(), otherwise lockdep complains:
    BUG: key [...] not in .data!

    Signed-off-by: Michal Schmidt
    Acked-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Michal Schmidt
     
  • Update the references to bridge utilities and web pages
    to current locations

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
     
  • commit 9ac32e1b firmware: convert e100 driver to request_firmware()

    did a straight conversion of the in-driver ucode to external
    files. This introduced the possibility of the driver failing
    to enable an interface due to missing ucode. There was no
    evaluation of the importance of the ucode at the time.

    Based on comments in earlier versions of this driver, and in
    the source code for the FreeBSD fxp driver, we can assume that
    the ucode implements the "CPU Cycle Saver" feature on supported
    adapters. Although generally wanted, this is an optional
    feature. The ucode source is not available, preventing it from
    being included in free distributions. This creates unnecessary
    problems for the end users. Doing a network install based on a
    free distribution installer requires the user to download and
    insert the ucode into the installer.

    Making the ucode optional when possible improves the user
    experience and driver usability.

    The ucode for some adapters includes a bugfix, making it
    essential. We continue to fail for these adapters unless the
    ucode is available.

    Signed-off-by: Bjørn Mork
    Signed-off-by: David S. Miller

    Bjørn Mork
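
The resulting policy (ucode is optional unless it carries a bugfix for the adapter) reduces to a small decision, sketched here in Python. The function name, parameters and return strings are hypothetical and only model the behaviour described in the commit message.

```python
def load_ucode(firmware_available: bool, ucode_contains_bugfix: bool) -> str:
    """Model of the optional-ucode policy.

    Returns "loaded" or "skipped"; raises if essential ucode is missing.
    """
    if firmware_available:
        return "loaded"
    if ucode_contains_bugfix:
        # The ucode is essential for this adapter: keep failing as before.
        raise FileNotFoundError("required ucode is not available")
    # Optional feature only (e.g. "CPU Cycle Saver"): continue without it.
    return "skipped"
```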
     
  • Since commit 16626b0cc3d5afe250850f96759b241f8a403b52 the asix
    driver depends on the phylib. Select phylib when the asix driver is
    selected.

    Reported-by: Fengguang Wu
    Cc: kernel-janitors@vger.kernel.org
    Signed-off-by: Christian Riesch
    Tested-by: Fengguang Wu
    Signed-off-by: David S. Miller

    Christian Riesch
     
  • This patch adds the asix_set_eeprom() function to provide support for
    programming the configuration EEPROM via ethtool.

    Signed-off-by: Christian Riesch
    Signed-off-by: David S. Miller

    Christian Riesch
     
  • The current code for reading the EEPROM via ethtool in the asix
    driver has a few issues. It cannot handle odd-length values
    (accesses must be aligned at 16-bit boundaries) and interprets the
    offset provided by ethtool as a 16-bit word offset instead of a byte
    offset.

    The new code for asix_get_eeprom() introduced by this patch is
    modeled after the code in
    drivers/net/ethernet/atheros/atl1e/atl1e_ethtool.c
    and provides read access to the entire EEPROM with arbitrary
    offsets and lengths.

    Signed-off-by: Christian Riesch
    Signed-off-by: David S. Miller

    Christian Riesch
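
Serving arbitrary byte offsets and lengths from a device that only supports 16-bit word reads can be sketched in Python as follows. This models the general approach (read every covering word, then slice out the requested bytes) and is not the driver code; `read_word` is a hypothetical stand-in for the device access.

```python
def read_word(eeprom: bytes, word_index: int) -> bytes:
    """Stand-in for one 16-bit device read: the two bytes of that word."""
    return eeprom[2 * word_index : 2 * word_index + 2]

def get_eeprom(eeprom: bytes, offset: int, length: int) -> bytes:
    """Return `length` bytes starting at byte `offset`."""
    if length <= 0:
        return b""
    first_word = offset >> 1
    last_word = (offset + length - 1) >> 1
    # Read every 16-bit word that overlaps the requested byte range.
    buf = b"".join(read_word(eeprom, w) for w in range(first_word, last_word + 1))
    # Skip the leading byte when the offset is odd, then trim to length.
    start = offset & 1
    return buf[start : start + length]
```

An odd offset or odd length simply means the first or last word is read in full and partially discarded.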
     
  • Because there are multiple variants of the stmmac/dwmac driver, the
    dts bindings should be updated to include the version of the IP used.

    Signed-off-by: Dinh Nguyen
    Acked-by: Stefan Roese
    Signed-off-by: David S. Miller

    Dinh Nguyen
     
  • The cxgb3 interface performs poorly when VLAN is set. On my current
    setup, a PowerLinux 7R2, I am able to get around 7 Gbps on a TCP_STREAM
    test (8 instances, 4k message).
    With this patch, I am able to reach 9.5 Gbps.

    Signed-off-by: Breno Leitao
    Signed-off-by: David S. Miller

    brenohl@br.ibm.com
     
  • The Ethernet II wrapper is only used by the IPX protocol; it may once
    have been used by Appletalk, but is not currently. Therefore it makes
    sense to move it to the IPX dust bin and drop the exports.

    Build tested only.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
     
  • include/net/dst_ops.h:28:20: warning: ‘struct sock’ declared inside parameter list

    Signed-off-by: David S. Miller

    David S. Miller
     
  • tcp_v4_send_reset() and tcp_v4_send_ack() use a single socket
    per network namespace.

    This leads to bad behavior on multiqueue NICs, because many cpus
    contend for the socket lock, and once the socket lock is acquired, extra
    false sharing on various socket fields slows down the operations.

    To better resist attacks, we use a percpu socket. Each cpu can
    run without contention, using appropriate memory (local node).

    Additional features :

    1) We also mirror the queue_mapping of the incoming skb, so that
    answers use the same queue if possible.

    2) Setting the SOCK_USE_WRITE_QUEUE socket flag speeds up sock_wfree()

    3) We now limit the number of in-flight RST/ACK [1] packets
    per cpu, instead of per namespace, and we honor the sysctl_wmem_default
    limit dynamically. (Prior to this patch, sysctl_wmem_default value was
    copied at boot time, so any further change would not affect tcp_sock
    limit)

    [1] These packets are only generated when no socket was matched for
    the incoming packet.

    Reported-by: Bill Sommerfeld
    Signed-off-by: Eric Dumazet
    Cc: Tom Herbert
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Use global seqlock for the nh_exceptions. Call
    fnhe_oldest with the right hash chain. Correct the diff
    value for dst_set_expires.

    v2: after suggestions from Eric Dumazet:
    * get rid of spin lock fnhe_lock, rearrange update_or_create_fnhe
    * continue daddr search in rt_bind_exception

    v3:
    * remove the daddr check before seqlock in rt_bind_exception
    * restart lookup in rt_bind_exception on detected seqlock change,
    as suggested by David Miller

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
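
The "restart lookup on detected seqlock change" idea can be shown with a single-threaded Python model of a sequence lock. It is a teaching sketch only: real seqlocks need memory barriers and writer exclusion, and the class and function names here are hypothetical.

```python
class SeqLock:
    """Toy sequence counter: odd while a write is in progress."""

    def __init__(self) -> None:
        self.sequence = 0

    def write_begin(self) -> None:
        self.sequence += 1  # becomes odd

    def write_end(self) -> None:
        self.sequence += 1  # becomes even again

    def read_begin(self) -> int:
        return self.sequence

    def read_retry(self, start: int) -> bool:
        # Retry if a write was in progress or completed during the read.
        return bool(start & 1) or self.sequence != start


def seq_read(lock: SeqLock, read_fn, max_attempts: int = 100):
    """Run read_fn until it completes without an intervening write."""
    for _ in range(max_attempts):
        start = lock.read_begin()
        value = read_fn()
        if not lock.read_retry(start):
            return value
    raise RuntimeError("gave up: the sequence kept changing")
```

Readers never block writers in this scheme; they pay for that by repeating the lookup whenever the sequence moved.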
     

19 Jul, 2012

15 commits

  • Reported-by: Steffen Klassert
    Signed-off-by: David S. Miller

    David S. Miller
     
  • Use RFS infrastructure and flow steering in HW to keep CPU
    affinity of rx interrupts and application per TCP stream.

    A flow steering filter is added to the HW whenever the RFS
    ndo callback is invoked by core networking code.

    Because the invocation takes place in interrupt context, the
    actual setup of the HW is done using a workqueue. Whenever a new filter
    is added, the driver checks for expiry of existing filters.

    Since there is a window of time between the point where the core
    RFS code invokes the ndo callback and the point where the HW
    is configured from the workqueue context, the 2nd, 3rd etc.
    packets from that stream will cause the net core to invoke
    the callback again and again.

    To prevent inefficient/double configuration of the HW, the filters
    are kept in a database which is indexed using a hash function to enable
    fast access.

    Signed-off-by: Amir Vadai
    Signed-off-by: Or Gerlitz
    Signed-off-by: David S. Miller

    Amir Vadai
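
The deduplication described above can be modelled in Python with a dictionary keyed by the flow tuple and a list standing in for the workqueue. All names are hypothetical, and filter expiry is left out; the sketch only shows why repeated callbacks for the same stream do not queue repeated hardware work.

```python
class FilterTable:
    """Toy model: hashed filter database plus deferred HW configuration."""

    def __init__(self) -> None:
        self.filters = {}   # flow tuple -> {"rxq": int, "state": str}
        self.pending = []   # stands in for the workqueue

    def steer(self, flow: tuple, rxq: int) -> bool:
        """Called per packet (models the RFS callback).

        Returns True if the filter was created or changed, False if the
        request duplicates an existing one.
        """
        entry = self.filters.get(flow)
        if entry is None:
            self.filters[flow] = {"rxq": rxq, "state": "pending"}
            self.pending.append(flow)
            return True
        if entry["rxq"] == rxq:
            return False  # already known: no duplicate HW work
        entry["rxq"] = rxq
        if entry["state"] != "pending":
            entry["state"] = "pending"
            self.pending.append(flow)
        return True

    def run_work(self) -> int:
        """Models the workqueue handler; returns filters configured."""
        configured = 0
        while self.pending:
            flow = self.pending.pop(0)
            self.filters[flow]["state"] = "active"
            configured += 1
        return configured
```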
     
  • Enable callers of mlx4_assign_eq to supply a pointer to cpu_rmap.
    If supplied, the assigned IRQ is tracked using rmap infrastructure.

    Signed-off-by: Amir Vadai
    Signed-off-by: Or Gerlitz
    Signed-off-by: David S. Miller

    Amir Vadai
     
  • Signed-off-by: Amir Vadai
    Signed-off-by: Or Gerlitz
    Acked-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Amir Vadai
     
  • Define this macro in one common place instead of duplicating it across the code

    Signed-off-by: Amir Vadai
    Signed-off-by: Or Gerlitz
    Signed-off-by: David S. Miller

    Amir Vadai
     
  • ip_options_compile can be called for forwarded packets;
    make sure the specific-destination address is a local one, as
    specified in RFC 1812, 4.2.2.2 Addresses in Options

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • Move fib_compute_spec_dst to the only place where it
    is needed.

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • Pull three md bugfixes from NeilBrown:
    "One of the bugs was introduced in 3.5-rc1. Others have been there for
    longer."

    * tag 'md-3.5-fixes' of git://neil.brown.name/md:
    md/raid1: close some possible races on write errors during resync
    md: avoid crash when stopping md array races with closing other open fds.
    md: fix bug in handling of new_data_offset

    Linus Torvalds
     
  • Pull networking changes from David Miller:
    "Ok, we should be good to go now"

    1) We have to statically initialize the init_net device list head rather
    than do so in an initcall, otherwise netprio_cgroup crashes if it's
    built statically rather than modular (Mark D. Rustad)

    2) Fix SKB null oopser in CIPSO ipv4 option processing (Paul Moore)

    3) Qlogic maintainers update (Anirban Chakraborty)

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
    net: Statically initialize init_net.dev_base_head
    MAINTAINERS: Changes in qlcnic and qlge maintainers list
    cipso: don't follow a NULL pointer when setsockopt() is called

    Linus Torvalds
     
  • Pull HID update from Jiri Kosina:
    "A final round of changes for HID for 3.5: just device ID additions."

    * 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
    HID: hid-multitouch: add support for Zytronic panels
    HID: add Sennheiser BTD500USB device support
    HID: add battery quirk for Apple Wireless ANSI

    Linus Torvalds
     
  • The strcpy was being used to set the name of the board. Since the
    destination char* was read-only and the name is set statically at
    compile time; this was both wrong and redundant.

    The type of char* is changed to const char* to prevent future errors.

    Reported-by: Radek Masin
    Signed-off-by: Ezequiel Garcia
    [ Taking directly due to vacations - Linus ]
    Signed-off-by: Linus Torvalds

    Ezequiel Garcia
     
  • Signed-off-by: Benjamin Tissoires
    Signed-off-by: Jiri Kosina

    Benjamin Tissoires
     
  • commit 4367af556133723d0f443e14ca8170d9447317cb
    md/raid1: clear bad-block record when write succeeds.

    Added a 'reschedule_retry' call possibility at the end of
    end_sync_write, but didn't add matching code at the end of
    sync_request_write. So if the writes complete very quickly, or
    scheduling makes it seem that way, then we can miss rescheduling
    the request and the resync could hang.

    Also commit 73d5c38a9536142e062c35997b044e89166e063b
    md: avoid races when stopping resync.

    Fixed a race condition in this same code in end_sync_write but didn't
    make the change in sync_request_write.

    This patch updates sync_request_write to fix both of those.
    Patch is suitable for 3.1 and later kernels.

    Reported-by: Alexander Lyakas
    Original-version-by: Alexander Lyakas
    Cc: stable@vger.kernel.org
    Signed-off-by: NeilBrown

    NeilBrown
     
  • md will refuse to stop an array if any other fd (or mounted fs) is
    using it.
    When any fs is unmounted or when the last open fd is closed, all
    pending IO will be flushed (e.g. sync_blockdev call in __blkdev_put)
    so there will be no pending IO to worry about when the array is
    stopped.

    However, in order to send the STOP_ARRAY ioctl to stop the array, one
    must first get an open fd on the block device.
    If some fd is being used to write to the block device and it is closed
    after mdadm opens the block device, but before mdadm issues the
    STOP_ARRAY ioctl, then there will be no last-close on the md device so
    __blkdev_put will not call sync_blockdev.

    If this happens, then IO can still be in-flight while md tears down
    the array and bad things can happen (use-after-free and subsequent
    havoc).

    So in the case where do_md_stop is being called from an open file
    descriptor, call sync_blockdev after taking the mutex to ensure there
    will be no new openers.

    This is needed when setting a read-write device to read-only too.

    Cc: stable@vger.kernel.org
    Reported-by: majianpeng
    Signed-off-by: NeilBrown

    NeilBrown
     
  • commit c6563a8c38fde3c1c7fc925a10bde3ca20799301
    md: add possibility to change data-offset for devices.

    introduced a 'new_data_offset' attribute which should normally
    be the same as 'data_offset', but can be explicitly set to a different
    value to allow a reshape operation to move the data.

    Unfortunately when the 'data_offset' is explicitly set through
    sysfs, the new_data_offset is not also set, so the two would become
    out-of-sync incorrectly.

    One result of this is that trying to set the 'size' after the
    'data_offset' would fail because it is not permitted to set the size
    when the 'data_offset' and 'new_data_offset' are different - as that
    can be confusing.
    Consequently when mdadm tried to do this while assembling an IMSM
    array it would fail.

    This bug was introduced in 3.5-rc1.

    Reported-by: Brian Downing
    Bisected-by: Brian Downing
    Tested-by: Brian Downing
    Signed-off-by: NeilBrown

    NeilBrown
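
The interaction between the two attributes can be modelled in Python. The sketch assumes the fix is to update 'new_data_offset' whenever 'data_offset' is set explicitly, which is what the description implies; the class and method names are hypothetical.

```python
class Rdev:
    """Toy model of the per-device offset and size attributes."""

    def __init__(self, data_offset: int = 0) -> None:
        self.data_offset = data_offset
        self.new_data_offset = data_offset
        self.size = 0

    def set_data_offset(self, value: int) -> None:
        self.data_offset = value
        # Assumed fix: keep both offsets in sync on an explicit set.
        self.new_data_offset = value

    def set_size(self, value: int) -> None:
        # Setting the size is refused while a data move is pending.
        if self.data_offset != self.new_data_offset:
            raise ValueError("data_offset and new_data_offset differ")
        self.size = value
```

With the offsets kept in sync, setting 'data_offset' and then 'size' succeeds, which is the sequence the text says mdadm uses when assembling an IMSM array.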