16 Mar, 2009

2 commits

  • While looking for a possible cause of a bugzilla report on an HTB oops:
    http://bugzilla.kernel.org/show_bug.cgi?id=12858
    I found that the code in htb_delete calling htb_destroy_class on a zero
    refcount is very misleading: it suggests that this is a common path and
    that destroy is called under sch_tree_lock. Actually, this can never
    happen, because cops->get() is done before deletion, and after deletion
    the class is still used by tclass_notify. The class destroy is always
    called from cops->put(), so without sch_tree_lock.

    This doesn't mean much now (since 2.6.27) because all vulnerable calls
    were moved from htb_destroy_class to htb_delete, but there was a bug
    in older kernels. The same change is made for the other classful
    scheds, which apparently didn't have similar locking problems here.

    Reported-by: m0sia
    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
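    The reference-counting rule described above can be sketched in plain C.
    The struct and the demo_get/demo_delete/demo_put helpers are hypothetical
    stand-ins for the qdisc class operations, not the kernel's HTB code. The
    model assumes that the class table holds one reference. The point it
    illustrates is that delete only drops a reference, so the final destroy
    always happens in put.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified class object (not struct htb_class). */
struct demo_class {
	int refcnt;		/* assumed: starts at 1, owned by the class table */
	bool destroyed;
};

/* Analogue of cops->get(): pin the class before operating on it. */
static void demo_get(struct demo_class *cl)
{
	cl->refcnt++;
}

/* Analogue of cops->delete(): drop the table's reference. The caller
 * still holds the reference from demo_get(), so this never hits zero. */
static void demo_delete(struct demo_class *cl)
{
	cl->refcnt--;
	assert(cl->refcnt > 0);
}

/* Analogue of cops->put(): the only place the class is destroyed. */
static void demo_put(struct demo_class *cl)
{
	if (--cl->refcnt == 0)
		cl->destroyed = true;	/* stands in for htb_destroy_class() */
}
```

    Start a class with refcnt 1 and call get, delete, then put. The class is
    still alive after delete, so a notifier such as tclass_notify can still
    use it. It is destroyed only in put.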
     
  • On x86_64, it's rather unfortunate that the "wait_queue_head_t wait"
    field of "struct socket" spans two cache lines (assuming the 64-byte
    cache lines of current CPUs):

    offsetof(struct socket, wait)=0x30
    sizeof(wait_queue_head_t)=0x18

    This might explain why Kenny Chang noticed that his multicast workload
    performed badly with 64-bit kernels, since more cache-line ping-pongs
    were involved.

    This little patch moves the "wait" field next to "fasync_list" so that
    both fields share a single cache line, to speed up sock_def_readable().

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
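    A small helper can check the offsets quoted above. Like the commit, this
    sketch assumes a 64-byte cache line. spans_cache_lines() is a
    hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_CACHE_LINE 64	/* assumed cache-line size in bytes */

/* True if the byte range [off, off + size) touches two or more cache
 * lines. size must be non-zero. */
static bool spans_cache_lines(size_t off, size_t size)
{
	return off / DEMO_CACHE_LINE != (off + size - 1) / DEMO_CACHE_LINE;
}
```

    With off = 0x30 (48) and size = 0x18 (24), the field occupies bytes 48
    to 71. It therefore starts in cache line 0 and ends in cache line 1. The
    same field placed at offset 0x40 would fit in a single line.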
     

15 Mar, 2009

25 commits

  • To mark all features and bugfixes submitted since 4.0.11.

    Signed-off-by: Dhananjay Phadke
    Signed-off-by: David S. Miller

    Dhananjay Phadke
     
  • This patch enables the load balancing capability of firmware
    and hardware to spray traffic into different cpus through
    separate rx msix interrupts.

    The feature is being enabled for NX3031; NX2031 (old) will be
    enabled later. This depends on MSI-X; compatibility with MSI and
    legacy interrupts is maintained by enabling a single rx ring.

    Signed-off-by: Dhananjay Phadke
    Signed-off-by: David S. Miller

    Dhananjay Phadke
     
  • Signed-off-by: Dhananjay Phadke
    Signed-off-by: David S. Miller

    Dhananjay Phadke
     
  • o remove max_ prefix from ring sizes, since they don't really
    represent max possible sizes.
    o cleanup naming of rx ring types (normal, jumbo, lro).
    o simplify logic to choose rx ring size; gigabit ports get half the
    rx ring size of 10-gigabit ports.

    Signed-off-by: Dhananjay Phadke
    Signed-off-by: David S. Miller

    Dhananjay Phadke
     
  • Detach the network interface on PCI suspend and recreate the hardware
    context after resume.

    Signed-off-by: Dhananjay Phadke
    Signed-off-by: David S. Miller

    Dhananjay Phadke
     
  • Signed-off-by: Dhananjay Phadke
    Signed-off-by: David S. Miller

    Dhananjay Phadke
     
  • Documentation for the ixgbe driver in the kernel docs area is missing.
    This adds that documentation.

    Signed-off-by: Peter P Waskiewicz Jr
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    PJ Waskiewicz
     
  • Clean up a bit of whitespace, add some function header comments, and fix a
    few comments around the driver.

    Signed-off-by: Jesse Brandeburg
    Signed-off-by: Peter P Waskiewicz Jr
    Acked-by: Mallikarjuna R Chilakala
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Jesse Brandeburg
     
  • The Tx DMA unit should be disabled when bringing the device down. Also,
    the KX4 device with 82599 supports WoL, so we should clear the Wake Up
    Status (WUS) after a PCIe slot reset.

    Signed-off-by: Peter P Waskiewicz Jr
    Signed-off-by: Jesse Brandeburg
    Acked-by: Mallikarjuna R Chilakala
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    PJ Waskiewicz
     
  • There are times when a driver may fail to completely initialize,
    due to a buggy platform or a buggy kernel. In those cases, we'd rather
    fail gracefully than panic. Add a few safety checks to some critical
    paths to try to prevent a panic in these corner-case situations.

    Signed-off-by: Jesse Brandeburg
    Signed-off-by: Peter P Waskiewicz Jr
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Jesse Brandeburg
     
  • This cleans up the following pieces of the Rx initialization path:

    - Enable the ECC memory fault interrupt in OTHER causes.

    - Fix the 82598 initialization of RDRXCTL, which depended on RSS and
    VMDq being enabled. We don't need these features enabled to safely set
    the MVMEN bit to allow multiple SRRCTL register mappings into the
    RXDCTL registers.

    - Fix the RSS initialization path to not stomp on DCB accidentally. When
    configuring the MRQC (multiple Rx queue control) register, we want to make
    sure we only OR in features as necessary, instead of full assignment.

    Signed-off-by: Jesse Brandeburg
    Signed-off-by: Peter P Waskiewicz Jr
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Jesse Brandeburg
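    The MRQC fix above is a read-modify-write issue. The sketch below uses
    made-up bit values (DEMO_MRQC_*), not the real register layout. It shows
    that a plain assignment discards bits DCB set earlier, while OR-ing in
    the RSS bits keeps them.

```c
#include <assert.h>
#include <stdint.h>

/* Made-up bit values for illustration; not the real MRQC layout. */
#define DEMO_MRQC_DCB_BITS 0x00000004u
#define DEMO_MRQC_RSS_BITS 0x00330001u

/* Buggy: plain assignment throws away bits configured earlier by DCB. */
static uint32_t rss_config_assign(uint32_t mrqc)
{
	mrqc = DEMO_MRQC_RSS_BITS;
	return mrqc;
}

/* Fixed: OR in only the RSS bits, leaving other features intact. */
static uint32_t rss_config_or(uint32_t mrqc)
{
	mrqc |= DEMO_MRQC_RSS_BITS;
	return mrqc;
}
```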
     
  • The Tx accounting when cleaning during NAPI was not completely correct.
    We should use the work_limit to determine when to finish cleaning, and
    use the same to return the cleaned status. Running the old way could get
    the NAPI clean for this Tx ring stuck in a scheduling loop, which can
    leave Tx uncleaned and end in a Tx hang and device reset.

    Signed-off-by: Jesse Brandeburg
    Signed-off-by: Peter P Waskiewicz Jr
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Jesse Brandeburg
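    Here is a simplified model of the Tx cleanup accounting described above.
    The ring structure and demo_clean_tx() are hypothetical. In this sketch
    the same work_limit bounds the loop and also decides the return value.
    As a result, the poll routine reports the ring as clean only when
    cleanup finished under the limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical Tx ring: only a count of completed descriptors. */
struct demo_tx_ring {
	int pending;
};

/* Clean at most work_limit descriptors. Returns true when the loop
 * stopped before reaching the limit, i.e. the ring was drained and the
 * poll may complete; false means more work may remain. */
static bool demo_clean_tx(struct demo_tx_ring *ring, int work_limit)
{
	int count = 0;

	while (ring->pending > 0 && count < work_limit) {
		ring->pending--;	/* stands in for freeing one Tx buffer */
		count++;
	}
	return count < work_limit;
}
```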
     
  • Occasionally if the driver was loaded in a system that
    didn't support MSI-X or MSI and was on a shared interrupt,
    the driver would then panic in NAPI on the first shared
    interrupt because we hadn't called napi_add yet.

    Solution: call napi_add before calling request_irq

    Signed-off-by: Jesse Brandeburg
    Signed-off-by: Peter P Waskiewicz Jr
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Jesse Brandeburg
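    The ordering problem can be modelled in user space. All names in this
    sketch are hypothetical; in the kernel the real calls are
    netif_napi_add() and request_irq(). On a shared line, the handler may
    run as soon as it is registered. The NAPI context therefore has to be
    set up before that happens.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a NAPI context. */
struct demo_napi {
	bool added;		/* set by demo_napi_add() */
	int scheduled;		/* how many times the handler scheduled a poll */
};

static void demo_napi_add(struct demo_napi *n)
{
	n->added = true;
}

/* Interrupt handler: scheduling an uninitialized context is the crash
 * case, reported here as a failure instead. */
static bool demo_irq_handler(struct demo_napi *n)
{
	if (!n->added)
		return false;
	n->scheduled++;
	return true;
}

/* On a shared line another device may raise the interrupt as soon as
 * the handler is registered; modelled by invoking it right away. */
static bool demo_request_irq(struct demo_napi *n)
{
	return demo_irq_handler(n);
}

static bool demo_open_buggy(struct demo_napi *n)
{
	bool ok = demo_request_irq(n);	/* handler may run before napi add */

	demo_napi_add(n);
	return ok;
}

static bool demo_open_fixed(struct demo_napi *n)
{
	demo_napi_add(n);		/* set up NAPI first ... */
	return demo_request_irq(n);	/* ... then expose the handler */
}
```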
     
  • The interrupt models using EITR have changed in 82599. The way the register
    is laid out, the change is transparent to some of the existing code.
    However, some of it isn't. This patch fixes all the cases where EITR
    handling differs from 82598.

    Signed-off-by: Jesse Brandeburg
    Signed-off-by: Peter P Waskiewicz Jr
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Jesse Brandeburg
     
  • 82599 mistakenly enabled drop on Rx queues in the packet buffer. The
    default mode should be store-and-forward from the FIFO.

    Signed-off-by: Peter P Waskiewicz Jr
    Acked-by: Mallikarjuna R Chilakala
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    PJ Waskiewicz
     
  • The rx_no_dma_resources counter reported by ethtool -S ethX is not
    counting correctly. In 82599, the queue mappings for the counters need
    to be mapped properly, and accounted for properly.

    Signed-off-by: Peter P Waskiewicz Jr
    Acked-by: Mallikarjuna R Chilakala
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    PJ Waskiewicz
     
  • A purely cosmetic change: report which physical layer is present, instead
    of "PHY unknown". 82599 added new PHY types for the SFP+ devices, and this
    code was not updated for them.

    Signed-off-by: Peter P Waskiewicz Jr
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    PJ Waskiewicz
     
  • Add support for the 82576 copper adapter, and the necessary code to restrict
    WoL on the quad port adapter to the first port.

    Signed-off-by: Alexander Duyck
    Acked-by: Jesse Brandeburg
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • Add a device ID to support the 82576NS dual port copper
    NIC.

    Signed-off-by: Alexander Duyck
    Acked-by: Jesse Brandeburg
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • This patch corrects a typo that was doing a less than comparison instead of
    a left shift due to the fact that I didn't get enough
    Acked-by: Jesse Brandeburg
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Alexander Duyck
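    The message above is cut off, but the typo it names is easy to
    reproduce. Writing "<" where "<<" was meant still compiles, because
    1 < n is a comparison that yields 0 or 1. The message doesn't say which
    driver expression was affected, so the helper names below are
    hypothetical.

```c
#include <assert.h>

/* The typo: a comparison, which evaluates to 0 or 1. */
static unsigned int bit_typo(unsigned int n)
{
	return 1u < n;
}

/* The intent: a single-bit mask with bit n set (n < 32). */
static unsigned int bit_fixed(unsigned int n)
{
	return 1u << n;
}
```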
     
  • Add the PF to the pool if adding a VLVF register value and the VFTA bit is
    already set.

    This patch addresses the unlikely situation where the PF adds a VLAN
    entry when the VLVF is full, and a VF later adds the VLAN to the VLVF.

    Signed-off-by: Alexander Duyck
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • We need to support wol on the second port for situations such as when the
    lan ports are on the motherboard itself.

    Signed-off-by: Alexander Duyck
    Acked-by: Jesse Brandeburg
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • If DCA is undefined, the local adapter variable becomes unnecessary. To
    resolve this, the DCA calls can simply reach the adapter struct
    through the rx_ring's adapter member.

    Signed-off-by: Alexander Duyck
    Acked-by: Jesse Brandeburg
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • The netif_running check in igb poll is a holdover from the use of fake
    netdevs to use multiple queues with NAPI prior to 2.6.24. It is no longer
    necessary to have the call there, and it can currently cause errors if
    work_done == budget.

    Signed-off-by: Alexander Duyck
    Acked-by: Jesse Brandeburg
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Alexander Duyck
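    The rule underneath this fix is the NAPI poll contract. A poll routine
    that uses its whole budget must return the budget without completing. It
    completes (napi_complete()) only when work_done is below the budget. The
    sketch below models that rule with a hypothetical demo_poll(). An extra
    condition such as !netif_running() in the completion test can break the
    contract, because it may complete even when work_done == budget.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical NAPI state: true while the poll is scheduled. */
struct demo_napi_state {
	bool scheduled;
};

/* Process up to budget of the pending packets and return the count.
 * Only complete (unschedule) when the budget was not exhausted. */
static int demo_poll(struct demo_napi_state *n, int pending, int budget)
{
	int work_done = pending < budget ? pending : budget;

	if (work_done < budget)
		n->scheduled = false;	/* stands in for napi_complete() */
	return work_done;
}
```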
     
  • With the new DCA API, the driver should use dca3_get_tag() instead of
    the obsolete dca_get_tag().

    Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Maciej Sosnowski
     

14 Mar, 2009

13 commits

  • I found that the PPP subsystem does not work properly when connecting
    channels with different speeds to the same bundle.

    Problem Description:

    As the "ppp_mp_explode" function fragments the sk_buff buffer evenly
    among the PPP channels that are connected to a certain PPP unit to
    make up a bundle, if we are transmitting using an upper layer protocol
    that requires an ACK before sending the next packet (like TCP/IP, for
    example), we will have a bandwidth bottleneck on the slowest channel
    of the bundle.

    Let's clarify with an example. Consider a scenario where we have
    two PPP links making up a bundle: a slow link (10KB/sec) and a fast
    link (1000KB/sec), both working at full bandwidth. On top we
    have a TCP/IP stack sending a 1000-byte sk_buff buffer down to the
    PPP subsystem. The "ppp_mp_explode" function will divide the buffer into
    two fragments of 500B each (we are neglecting all the headers, CRC,
    flags, etc.). Before the TCP/IP stack sends out the next buffer, it
    will have to wait for the ACK response from the remote peer, so it
    will have to wait for both fragments to have been sent over the two
    PPP links, received by the remote peer and reconstructed. The
    resulting behaviour is that, rather than having a bundle working
    at 1010KB/sec (the sum of the channels' bandwidths), we'll have a bundle
    working at 20KB/sec (twice the slowest channel's bandwidth).

    Problem Solution:

    The problem has been solved by redesigning the "ppp_mp_explode"
    function so that it splits the sk_buff buffer according
    to the speeds of the underlying PPP channels (the speeds of the serial
    interfaces respectively attached to the PPP channels). Referring to
    the above example, the redesigned "ppp_mp_explode" function will now
    divide the 1000-byte buffer into two fragments whose sizes are set
    according to the speeds of the channels they are going to be
    sent on (e.g. 10 bytes on the 10KB/sec channel and 990 bytes on the
    1000KB/sec channel). The reworked function delivers the same
    performance as the original one in optimal working conditions (i.e. a
    bundle made up of PPP links all working at the same speed), while
    greatly improving performance on bundles made up of channels
    working at different speeds.

    Signed-off-by: Gabriele Paoloni
    Signed-off-by: David S. Miller

    Gabriele Paoloni
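    The proportional split can be sketched as follows. This is a simplified
    illustration, not the actual ppp_mp_explode() code. split_by_speed() is
    hypothetical. Each share is truncated, and the last channel takes the
    remainder, so the fragments always add up to the buffer length. The
    sketch assumes at least one channel and a non-zero total speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split len bytes over n channels in proportion to speed[i].
 * Truncating each share keeps the running total <= len, so the
 * remainder given to the last channel can never be negative. */
static void split_by_speed(size_t len, const unsigned long *speed,
			   size_t n, size_t *frag)
{
	uint64_t total = 0;
	size_t assigned = 0;

	for (size_t i = 0; i < n; i++)
		total += speed[i];

	for (size_t i = 0; i + 1 < n; i++) {
		frag[i] = (size_t)((uint64_t)len * speed[i] / total);
		assigned += frag[i];
	}
	frag[n - 1] = len - assigned;
}
```

    For the 1000-byte example with speeds of 10 and 1000, this gives 9 and
    991 bytes, close to the 10/990 split quoted above. With equal speeds it
    falls back to an even 500/500 split.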
     
  • promote 'cnt' to size_t, to match 'len'.

    Signed-off-by: Roel Kluin
    Signed-off-by: David S. Miller

    Roel Kluin
     
  • skb->len is an unsigned int, so the test in x25_rx_call_request() always
    evaluates to true.

    len in x25_sendmsg() is unsigned as well, so negative error codes
    returned by x25_output() are not noticed.

    Signed-off-by: Roel Kluin
    Signed-off-by: David S. Miller

    Roel Kluin
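    Both problems come from the same C rule. Once a value is stored in an
    unsigned type, a "< 0" test is always false and a ">= 0" test is always
    true. Here is a minimal sketch, with a hypothetical demo_output()
    standing in for x25_output().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for x25_output(): bytes sent, or -errno. */
static int demo_output(bool fail)
{
	return fail ? -EINVAL : 100;
}

/* Buggy: an unsigned variable can never be negative, so the error
 * check is dead code (compilers may warn about it). */
static bool error_seen_unsigned(bool fail)
{
	unsigned int len = demo_output(fail);

	return len < 0;
}

/* Fixed: keep the return value signed until it has been checked. */
static bool error_seen_signed(bool fail)
{
	int rc = demo_output(fail);

	return rc < 0;
}
```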
     
  • Windows hosts (XP at least) with a configured static IP perform
    address conflict detection on boot, as defined in RFC3927.
    Here is a quote of the relevant part:

    "
    An ARP announcement is identical to the ARP Probe described above,
    except that now the sender and target IP addresses are both set
    to the host's newly selected IPv4 address.
    "

    But at the same time this differs from RFC5227:
    "
    The 'sender IP address' field MUST be set to all zeroes; this is to avoid
    polluting ARP caches in other hosts on the same link in the case
    where the address turns out to be already in use by another host.
    "

    When ARP proxy is configured, it must not answer in either case,
    because both are address conflict verification. For Windows, answering
    just causes a false "IP conflict" to be detected. There is already code
    for RFC5227, so we trivially also check whether source IP == target IP.

    Signed-off-by: Denys Fedoryshchenko
    Signed-off-by: David S. Miller

    Denys Fedoryshchenko
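    The condition fits in a one-line predicate. This is an illustrative
    sketch, not the actual arp_process() code, and is_conflict_check() is a
    hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate: true for ARP requests that are part of address
 * conflict detection and must not get a proxy reply. A zero sender IP is
 * an RFC5227 probe; sender IP == target IP is the announcement-style
 * check Windows sends at boot. Byte order does not matter for either test. */
static bool is_conflict_check(uint32_t sip, uint32_t tip)
{
	return sip == 0 || sip == tip;
}
```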
     
  • The original patch was submitted last year but wasn't discussed or applied
    because the maintainers weren't CCed. I only fixed some formatting errors,
    but as far as I can see tulip is very badly formatted and needs further work.

    Original description:
    This patch fixes an MTU problem that occurs when using 802.1q VLANs. We
    should allow receiving frames of up to 1518 bytes in length, instead of
    1514.

    Based on patch written by Ben McKeegan for 2.4.x kernels. It is archived
    at http://www.candelatech.com/~greear/vlan/howto.html#tulip
    I've adjusted a few things to make it apply on 2.6.x kernels.

    Tested on D-Link DFE-570TX quad-fastethernet card.

    Signed-off-by: Tomasz Lemiech
    Signed-off-by: Ivan Vecera
    Signed-off-by: Ben McKeegan
    Acked-by: Grant Grundler
    Signed-off-by: David S. Miller

    Tomasz Lemiech
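    The 1514 and 1518 figures come from simple header arithmetic, with frame
    lengths counted without the 4-byte FCS. Kernel code has these constants
    as ETH_HLEN (14), VLAN_HLEN (4) and ETH_DATA_LEN (1500). The sketch
    below defines its own names so that it compiles in user space.

```c
#include <assert.h>

enum {
	DEMO_ETH_HDR  = 14,	/* dest MAC (6) + source MAC (6) + EtherType (2) */
	DEMO_VLAN_TAG = 4,	/* 802.1Q tag: TPID (2) + TCI (2) */
	DEMO_ETH_MTU  = 1500,	/* maximum Ethernet payload */
};

/* Largest frame the receiver must accept, excluding the 4-byte FCS. */
static int max_rx_frame_len(int vlan_tagged)
{
	return DEMO_ETH_HDR + DEMO_ETH_MTU + (vlan_tagged ? DEMO_VLAN_TAG : 0);
}
```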
     
  • This closes a race in phy_stop_machine when reprogramming of phy_timer
    (from phy_state_machine) happens between del_timer_sync and cancel_work_sync.

    Without this change, a crash could occur if the phy_device were freed after
    phy_stop_machine (the timer would fire and schedule the freed work).

    Signed-off-by: Marcin Slusarz
    Acked-by: Jean Delvare
    Signed-off-by: David S. Miller

    Marcin Slusarz
     
  • From: Pavel Roskin

    Signed-off-by: David S. Miller

    Pavel Roskin
     
  • This patch fixes the circular locking problem by changing the locking strategy
    concerning the logging of firmware handles.

    Signed-off-by: Jan-Bernd Themann
    Signed-off-by: David S. Miller

    Jan-Bernd Themann
     
  • Changing the MAC address while a macvlan device is up will leave the
    device on the wrong hash chain, making it impossible to receive
    packets.

    There is no checking of the MAC address set on the macvlan, allowing
    a misconfiguration to grab packets from the underlying device or
    another macvlan.

    To resolve these problems, I update the hash table of macvlans when the
    MAC address of a macvlan changes, and when updating the hash table
    I verify that the new MAC address is usable.

    The result is well-defined and predictable, if not perfect, handling of
    macvlan MAC addresses.

    To keep the code clear, I have created a set of hash table maintenance
    functions in macvlan so I am not open coding the hash function and the
    logic needed to update the hash table all over the place.

    Signed-off-by: Eric Biederman
    Acked-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Eric Biederman
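    Here is a minimal sketch of the idea. First check that the new address
    is free. Then unlink the device from the chain for its old address and
    relink it on the chain for the new one. The table, the hash function and
    the demo_* helpers are hypothetical and much simpler than the real
    macvlan code. Per the message above, the real check also guards against
    the underlying device's traffic; this sketch only checks the other
    entries in the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_HASH_SIZE 16

struct demo_vlan {
	unsigned char addr[6];
	struct demo_vlan *next;		/* hash chain */
};

static struct demo_vlan *table[DEMO_HASH_SIZE];

/* Hypothetical hash: last octet of the MAC address. */
static unsigned int demo_hash(const unsigned char *addr)
{
	return addr[5] % DEMO_HASH_SIZE;
}

static struct demo_vlan *demo_lookup(const unsigned char *addr)
{
	for (struct demo_vlan *v = table[demo_hash(addr)]; v; v = v->next)
		if (memcmp(v->addr, addr, 6) == 0)
			return v;
	return NULL;
}

static void demo_hash_add(struct demo_vlan *v)
{
	unsigned int h = demo_hash(v->addr);

	v->next = table[h];
	table[h] = v;
}

static void demo_hash_del(struct demo_vlan *v)
{
	struct demo_vlan **pp = &table[demo_hash(v->addr)];

	while (*pp && *pp != v)
		pp = &(*pp)->next;
	if (*pp)
		*pp = v->next;
}

/* Change the address: refuse duplicates, then move to the new chain. */
static bool demo_set_addr(struct demo_vlan *v, const unsigned char *addr)
{
	if (demo_lookup(addr))
		return false;
	demo_hash_del(v);		/* unlink using the old address */
	memcpy(v->addr, addr, 6);
	demo_hash_add(v);		/* relink on the new address's chain */
	return true;
}
```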
     
  • When running in a network namespace whose only link to
    the outside world is a macvlan device, not being
    able to create another macvlan is a real pain.

    So modify macvlan creation to automatically turn the creation
    of a macvlan on a macvlan into the creation of a macvlan on the
    underlying network device.

    Signed-off-by: Eric Biederman
    Acked-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Eric Biederman
     
  • This patch from Juha Leppanen suppresses a false warning if the eeprom
    load succeeds on the very last attempt.

    Juha> In function smsc911x_open, smsc911x_reg_read+udelay can be run 50
    Juha> times with timeout reaching -1, and the following if statement
    Juha> does not catch the timeout and no warning is issued. Also, if the
    Juha> 50th smsc911x_reg_read is GOOD, the loop is exited with timeout as 0
    Juha> and a bogus warning is issued. Swap the testing order and use
    Juha> --timeout instead of timeout--; now at most 50 smsc911x_reg_reads
    Juha> are done, with at most 49 udelays.

    Signed-off-by: Steve Glendinning
    Signed-off-by: David S. Miller

    Steve Glendinning
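    The ordering issue Juha describes can be reproduced with a mock
    register. In this sketch the busy bit stays set for a chosen number of
    reads. wait_old() mirrors the "timeout-- first" loop, and wait_new()
    mirrors the "read first, --timeout" loop. Both return whether a timeout
    warning would be issued. The helpers are hypothetical, and the loop
    shapes are reconstructed from the description, not copied from the
    driver.

```c
#include <assert.h>
#include <stdbool.h>

static int busy_reads;	/* mock: busy bit reads as set this many times */
static int reads, delays;

static bool demo_reg_busy(void)
{
	return reads++ < busy_reads;
}

static void demo_udelay(void)
{
	delays++;
}

/* Old shape: the counter is tested (and decremented) before the read. */
static bool wait_old(void)
{
	int timeout = 50;

	while (timeout-- && demo_reg_busy())
		demo_udelay();
	return timeout == 0;		/* misses the -1 case */
}

/* New shape: read first, then pre-decrement the counter. */
static bool wait_new(void)
{
	int timeout = 50;

	while (demo_reg_busy() && --timeout)
		demo_udelay();
	return timeout == 0;
}

/* Reset the mock and run one wait loop; returns true if it would warn. */
static bool run(bool (*wait)(void), int busy)
{
	busy_reads = busy;
	reads = delays = 0;
	return wait();
}
```

    Suppose the busy bit is set for 49 reads. The old loop then warns even
    though the 50th read succeeded. If the bit is stuck, the old loop exits
    with timeout == -1 and stays silent. The new loop gets both cases right
    and does at most 50 reads with 49 delays.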
     
  • This patch from Juha Leppanen suppresses a false warning if a fast
    forward operation succeeds on the very last attempt.

    Juha> If the smsc911x_reg_read loop is executed 500 times, timeout reaches 0
    Juha> and the 500th smsc911x_reg_read result in val is ignored. If the
    Juha> testing order is changed, then val is checked first. The 500th
    Juha> reg_read might be GOOD, so why ignore it?

    Signed-off-by: Steve Glendinning
    Signed-off-by: David S. Miller

    Steve Glendinning
     
  • Network Drop Monitor: Add build changes to enable the drop monitor

    Signed-off-by: Neil Horman

    include/linux/Kbuild | 1 +
    net/Kconfig | 11 +++++++++++
    net/core/Makefile | 1 +
    3 files changed, 13 insertions(+)
    Signed-off-by: David S. Miller

    Neil Horman