26 Jun, 2010

23 commits

  • Update FW to 7.10

    Signed-off-by: Divy Le Ray
    Signed-off-by: David S. Miller

    Divy Le Ray
     
  • Add pr_fmt(fmt) KBUILD_MODNAME ": " fmt
    Remove "pktgen: " from formats
    Convert printks to pr_
    Added func_enter() for debugging
    Moved version to end of string at module_init
    Coalesced long formats

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
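
The pr_fmt() conversion described above can be sketched in userspace as follows. This is a minimal illustration, not the pktgen code itself: `pr_info_buf()` and `report_queue_ready()` are hypothetical helpers, and KBUILD_MODNAME is defined by hand because no Kbuild is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* In a kernel build Kbuild supplies KBUILD_MODNAME; defined by hand here. */
#define KBUILD_MODNAME "pktgen"

/* Prefix every format once instead of repeating "pktgen: " at each call site. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/*
 * Hypothetical userspace stand-in for pr_info(): formats into a buffer
 * rather than the kernel log. ##__VA_ARGS__ is a GNU extension (gcc, clang).
 */
#define pr_info_buf(buf, len, fmt, ...) \
	snprintf((buf), (len), pr_fmt(fmt), ##__VA_ARGS__)

/* Example call site: the format string no longer carries the module name. */
int report_queue_ready(char *buf, size_t len, int queue)
{
	assert(buf != NULL && len > 0);
	return pr_info_buf(buf, len, "queue %d ready\n", queue);
}
```

Because the prefix is pasted in by the preprocessor, pr_fmt() has to be defined before the logging macros are expanded.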
     
  • Gcc is currently unable to optimize the switch statement in
    sk_run_filter() because the OR'd case labels are not contiguous. This
    patch replaces the OR'd labels with ordered, sequential case labels. The
    sk_chk_filter() function is modified to rewrite the original opcodes into
    an ordered but equivalent form. gcc is now able to transform the
    switch statement in sk_run_filter() into a jump table of complexity O(1).

    Before this patch, gcc generated a sequence of conditional branches (O(n)),
    with a .text segment size of 567 bytes (arch x86_64):

    7ff: 8b 06 mov (%rsi),%eax
    801: 66 83 f8 35 cmp $0x35,%ax
    805: 0f 84 d0 02 00 00 je adb
    80b: 0f 87 07 01 00 00 ja 918
    811: 66 83 f8 15 cmp $0x15,%ax
    815: 0f 84 c5 02 00 00 je ae0
    81b: 77 73 ja 890
    81d: 66 83 f8 04 cmp $0x4,%ax
    821: 0f 84 17 02 00 00 je a3e
    827: 77 29 ja 852
    829: 66 83 f8 01 cmp $0x1,%ax
    [...]

    With the modification, the compiler translates the switch statement into
    the following jump table fragment:

    7ff: 66 83 3e 2c cmpw $0x2c,(%rsi)
    803: 0f 87 1f 02 00 00 ja a28
    809: 0f b7 06 movzwl (%rsi),%eax
    80c: ff 24 c5 00 00 00 00 jmpq *0x0(,%rax,8)
    813: 44 89 e3 mov %r12d,%ebx
    816: e9 43 03 00 00 jmpq b5e
    81b: 41 89 dc mov %ebx,%r12d
    81e: e9 3b 03 00 00 jmpq b5e

    Furthermore, I reordered the instructions to reduce cache line misses by
    placing the most common instructions first.

    Signed-off-by: Hagen Paul Pfeifer
    Signed-off-by: David S. Miller

    Hagen Paul Pfeifer
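
The remapping idea can be sketched with a toy interpreter. This is an illustration under assumptions, not the sk_run_filter() code: the RAW_* values follow classic BPF encodings for a handful of instructions, while the OP_* numbering, `check_and_remap()` and `run()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct insn {
	uint16_t code;
	uint32_t k;
};

/* Sparse opcodes, as produced by OR-ing class, operation and source bits. */
enum { RAW_LD_IMM = 0x00, RAW_ADD_K = 0x04, RAW_RET_A = 0x16, RAW_MUL_K = 0x24 };

/* Dense internal opcodes: consecutive values, so a jump table is possible. */
enum { OP_INVALID = 0, OP_LD_IMM, OP_ADD_K, OP_MUL_K, OP_RET_A };

/* Check pass: validate each instruction and rewrite its opcode in place. */
int check_and_remap(struct insn *prog, size_t len)
{
	assert(prog != NULL);
	for (size_t i = 0; i < len; i++) {
		switch (prog[i].code) {
		case RAW_LD_IMM: prog[i].code = OP_LD_IMM; break;
		case RAW_ADD_K:  prog[i].code = OP_ADD_K;  break;
		case RAW_MUL_K:  prog[i].code = OP_MUL_K;  break;
		case RAW_RET_A:  prog[i].code = OP_RET_A;  break;
		default:         return -1; /* unknown opcode */
		}
	}
	return 0;
}

/* Hot path: switches only on the dense opcodes written by the check pass. */
uint32_t run(const struct insn *prog, size_t len)
{
	uint32_t acc = 0;

	assert(prog != NULL);
	for (size_t i = 0; i < len; i++) {
		switch (prog[i].code) {
		case OP_LD_IMM: acc = prog[i].k;  break;
		case OP_ADD_K:  acc += prog[i].k; break;
		case OP_MUL_K:  acc *= prog[i].k; break;
		case OP_RET_A:  return acc;
		default:        return 0;
		}
	}
	return 0;
}
```

The cost of the translation is paid once when the program is checked, while the per-packet loop sees only small consecutive case values. Whether a given compiler actually emits a jump table still depends on its version and options.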
     
  • Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Ben Hutchings
     
  • Insertion of the Falcon hash is unreliable.

    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Ben Hutchings
     
  • We will use this hash key for Toeplitz IPv4 hashing too.

    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Ben Hutchings
     
  • The hash appears immediately before the packet data, not at the
    beginning of the buffer. This means we can easily use negative offsets
    from the start of packet data, so adjust the data and length at the
    top of __efx_rx_packet() instead of wherever we consume the hash.

    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Ben Hutchings
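
A minimal sketch of the negative-offset approach, with assumptions stated: the 4-byte hash size, the big-endian byte order, `struct rx_buf`, and both helper names are hypothetical and are not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_HASH_SIZE 4 /* assumed size of the hash field */

struct rx_buf {
	const uint8_t *data; /* after adjustment: start of packet data */
	size_t len;          /* after adjustment: packet length */
};

/* Done once at the top of the receive path: step over the whole prefix. */
void rx_skip_prefix(struct rx_buf *b, size_t prefix_len)
{
	assert(b != NULL && prefix_len >= RX_HASH_SIZE && b->len >= prefix_len);
	b->data += prefix_len;
	b->len -= prefix_len;
}

/* The hash sits immediately before the packet data: read it at a negative offset. */
uint32_t rx_hash(const struct rx_buf *b)
{
	const uint8_t *h = b->data - RX_HASH_SIZE;

	return ((uint32_t)h[0] << 24) | ((uint32_t)h[1] << 16) |
	       ((uint32_t)h[2] << 8) | (uint32_t)h[3];
}
```

With the adjustment made up front, later consumers can treat `data` and `len` as describing the packet only, and the hash lookup no longer needs to know the prefix length.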
     
  • 1) Update copyright
    2) Fix hardware queue descriptor field size CQ_ENET_RQ_DESC_FCOE_SOF_BITS
    3) Include rtnetlink.h instead of if_link.h
    4) Selectively flush writes to interrupt mask register
    5) Use pci_enable_device_mem
    6) Remove unused variables and header files
    7) Fix size mismatch between memory alloc and free operations of a variable
    8) Check for non null arguments to vic_provinfo_alloc

    Signed-off-by: Scott Feldman
    Signed-off-by: Vasanthy Kolluri
    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Vasanthy Kolluri
     
  • Handle surprise hardware removals gracefully during devcmd issue and init,
    cleanup of queues.

    Signed-off-by: Scott Feldman
    Signed-off-by: Vasanthy Kolluri
    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Vasanthy Kolluri
     
  • Hardware has the loopback capability to queue the packets transmitted from
    a device to the receive queue of the same device. enic now supports the
    loopback capability.

    Signed-off-by: Scott Feldman
    Signed-off-by: Vasanthy Kolluri
    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Vasanthy Kolluri
     
  • Change the receive queue buffer allocations into blocks of 32 entries when
    ring size is less than 64, otherwise use 64 entries per block.

    Signed-off-by: Scott Feldman
    Signed-off-by: Vasanthy Kolluri
    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Vasanthy Kolluri
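
The block-size rule stated above fits in a few lines. Both function names are hypothetical, and the block-count helper is an added illustration of how the rule might be used rather than something the commit describes.

```c
#include <assert.h>

/* Entries per allocation block: 32 for rings smaller than 64, otherwise 64. */
unsigned int rq_block_entries(unsigned int ring_size)
{
	assert(ring_size > 0);
	return ring_size < 64 ? 32 : 64;
}

/* Number of blocks needed to cover the ring, rounding up. */
unsigned int rq_block_count(unsigned int ring_size)
{
	unsigned int entries = rq_block_entries(ring_size);

	return (ring_size + entries - 1) / entries;
}
```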
     
  • Add new firmware devcmds - CMD_PROXY_BY_BDF, CMD_PACKET_FILTER_ALL,
    CMD_ENABLE_WAIT.

    Signed-off-by: Scott Feldman
    Signed-off-by: Vasanthy Kolluri
    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Vasanthy Kolluri
     
  • Replace all printk routines with the (netdev|dev|pr)_ macros that
    provide verbose logs.

    Signed-off-by: Joe Perches
    Signed-off-by: Scott Feldman
    Signed-off-by: Vasanthy Kolluri
    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Vasanthy Kolluri
     
  • Add wrapper routines that issue devcmds to firmware and ensure that a
    devcmd lock is held for each devcmd call.

    Signed-off-by: Scott Feldman
    Signed-off-by: Vasanthy Kolluri
    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Vasanthy Kolluri
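
The wrapper pattern can be sketched as follows. Everything here is a hypothetical stand-in: the lock is a single-threaded flag rather than a kernel spinlock, and `struct dev`, `raw_devcmd()` and `dev_devcmd()` are invented names, not the enic API.

```c
#include <assert.h>
#include <stdbool.h>

struct dev {
	bool devcmd_lock_held; /* stand-in for a real lock */
	int cmds_issued;
};

static void devcmd_lock(struct dev *d)
{
	assert(!d->devcmd_lock_held);
	d->devcmd_lock_held = true;
}

static void devcmd_unlock(struct dev *d)
{
	assert(d->devcmd_lock_held);
	d->devcmd_lock_held = false;
}

/* Low-level issue routine: must only be called with the lock held. */
int raw_devcmd(struct dev *d, int cmd)
{
	assert(d->devcmd_lock_held);
	(void)cmd; /* a real driver would write cmd to the device here */
	d->cmds_issued++;
	return 0;
}

/* Wrapper: the only entry point callers use, so locking cannot be forgotten. */
int dev_devcmd(struct dev *d, int cmd)
{
	int err;

	devcmd_lock(d);
	err = raw_devcmd(d, cmd);
	devcmd_unlock(d);
	return err;
}
```

Routing every command through one wrapper keeps the locking rule in a single place instead of relying on each call site to remember it.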
     
  • The port profile information for a dynamic enic device is set by the upper
    layers, which are oblivious to the device reset operation. We do not want a
    reset operation to erase the network state of a dynamic enic device, as
    there is no way to set up the port profile information again. Hence a
    lighter reset operation called hang reset is used. Hang reset, unlike soft
    reset, does not reset the network state; it resets the host side state only.

    Signed-off-by: Scott Feldman
    Signed-off-by: Vasanthy Kolluri
    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Vasanthy Kolluri
     
  • The current ingress vlan rewrite mode setting lets the hardware strip off
    the tag control information of a packet received on the native vlan. As a
    result, the priority bits are also lost. The fix is to change the ingress
    vlan rewrite mode setting such that the complete tag control information is
    retained for packets that belong to the native vlan.

    Signed-off-by: Scott Feldman
    Signed-off-by: Vasanthy Kolluri
    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Vasanthy Kolluri
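
To see why stripping the tag control information loses the priority, recall the 802.1Q layout: the 16-bit TCI carries a 3-bit priority in the top bits, one drop-eligible bit, and a 12-bit VLAN ID. The helpers below are a hypothetical sketch of that layout, not enic code.

```c
#include <assert.h>
#include <stdint.h>

#define TCI_PRIO_SHIFT 13
#define TCI_VID_MASK   0x0fffu

/* Build a TCI from priority (0..7) and VLAN ID (0..4095), drop-eligible bit clear. */
uint16_t tci_make(unsigned int prio, unsigned int vid)
{
	assert(prio <= 7 && vid <= TCI_VID_MASK);
	return (uint16_t)((prio << TCI_PRIO_SHIFT) | vid);
}

unsigned int tci_priority(uint16_t tci)
{
	return tci >> TCI_PRIO_SHIFT;
}

unsigned int tci_vid(uint16_t tci)
{
	return tci & TCI_VID_MASK;
}
```

If the hardware hands up a packet without its TCI, the receiver has nothing to extract the priority from, which is the loss the commit describes.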
     
  • enic now uses the GRO mechanism instead of LRO to pass skbs to upper
    layers.

    Signed-off-by: Scott Feldman
    Signed-off-by: Vasanthy Kolluri
    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Vasanthy Kolluri
     
  • Signed-off-by: David S. Miller

    Michael Chan
     
  • This eliminates some of the duplicate code for the various devices
    that require the same basic kcq handling.

    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Michael Chan
     
  • By doing more work in the common function cnic_get_kcqes(), and
    making full use of the kcq_info structure.

    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Michael Chan
     
  • By creating a common data structure kcq_info for all devices, the kcq
    (kernel completion queue) for all devices can be allocated by common
    code.

    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Michael Chan
     
  • By creating a common cnic_doirq().

    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Michael Chan
     
  • The current code makes assumptions about the CID (context ID) memory
    space and starting CID that may not be always correct when firmware
    changes. In particular, BNX2_ISCSI_START_CID may not always be fixed.
    We now calculate cp->max_cid_space and cp->iscsi_start_cid dynamically
    instead of using fixed constants. The unused cp->max_iscsi_conn is also
    eliminated.

    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Michael Chan
     

25 Jun, 2010

10 commits

  • Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Ben Hutchings
     
  • Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Ben Hutchings
     
  • Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Ben Hutchings
     
  • Replace EFX_ERR() with netif_err(), EFX_INFO() with netif_info(),
    EFX_LOG() with netif_dbg() and EFX_TRACE() and EFX_REGDUMP() with
    netif_vdbg().

    Replace EFX_ERR_RL(), EFX_INFO_RL() and EFX_LOG_RL() using explicit
    calls to net_ratelimit().

    Implement the ethtool operations to get and set message level flags,
    and add a 'debug' module parameter for the initial value.

    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Ben Hutchings
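
The message-level gating behind the netif_*() helpers can be sketched in userspace as a bitmask test. The MSG_* names, `struct nic` and the three functions are hypothetical stand-ins; the kernel's own flags and ethtool hooks are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One bit per message category (hypothetical names). */
enum {
	MSG_DRV   = 1u << 0,
	MSG_PROBE = 1u << 1,
	MSG_LINK  = 1u << 2,
};

struct nic {
	uint32_t msg_enable; /* initial value would come from a 'debug' parameter */
};

/* Counterparts of the "get" and "set" message level operations. */
uint32_t nic_get_msglevel(const struct nic *n)
{
	assert(n != NULL);
	return n->msg_enable;
}

void nic_set_msglevel(struct nic *n, uint32_t level)
{
	assert(n != NULL);
	n->msg_enable = level;
}

/* A message of a given category is emitted only if its bit is enabled. */
bool nic_msg_enabled(const struct nic *n, uint32_t type)
{
	assert(n != NULL);
	return (n->msg_enable & type) != 0;
}
```

Because the mask lives in the per-device structure, verbosity can be changed at run time for one interface without rebuilding the driver.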
     
  • Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Ben Hutchings
     
  • Signed-off-by: Ben Hutchings
    Acked-by: Jeff Garzik
    Signed-off-by: David S. Miller

    Ben Hutchings
     
  • I've found that tcp_close() can be called for an already closed
    socket, but it still sends a reset in this case (tcp_send_active_reset()),
    which seems to be incorrect. Moreover, the reset packet is sent
    with a different source port, as the original port number has already been
    cleared on the socket. Besides that, incrementing the stat counter
    LINUX_MIB_TCPABORTONCLOSE also does not look correct in this case.

    Initially this issue was found on a 2.6.18-x RHEL5 kernel, but the same
    seems to be true for the current mainline kernel (checked on
    2.6.35-rc3). Please correct me if I missed something.

    How that happens:

    1) the server receives a packet for socket in TCP_CLOSE_WAIT state
    that triggers a tcp_reset():

    Call Trace:
    [] tcp_reset+0x12f/0x1e8
    [] tcp_rcv_state_process+0x1c0/0xa08
    [] tcp_v4_do_rcv+0x310/0x37a
    [] tcp_v4_rcv+0x74d/0xb43
    [] ip_local_deliver_finish+0x0/0x259
    [] ip_local_deliver+0x200/0x2f4
    [] ip_rcv+0x64c/0x69f
    [] netif_receive_skb+0x4c4/0x4fa
    [] process_backlog+0x90/0xec
    [] net_rx_action+0xbb/0x1f1
    [] __do_softirq+0xf5/0x1ce
    [] handle_IRQ_event+0x56/0xb0
    [] call_softirq+0x1c/0x28
    [] do_softirq+0x2c/0x85
    [] do_IRQ+0x149/0x152
    [] ret_from_intr+0x0/0xa
    [] __handle_mm_fault+0x6cd/0x1303
    [] __handle_mm_fault+0x5a2/0x1303
    [] cache_free_debugcheck+0x21f/0x22e
    [] do_page_fault+0x49a/0x7dc
    [] thread_return+0x89/0x174
    [] audit_syscall_exit+0x341/0x35c
    [] error_exit+0x0/0x84

    tcp_rcv_state_process()
        ... // (sk_state == TCP_CLOSE_WAIT here)
        ...
        /* step 2: check RST bit */
        if(th->rst) {
            tcp_reset(sk);
            goto discard;
        }
        ...
    ---------------------------------
    tcp_rcv_state_process
        tcp_reset
            tcp_done
                tcp_set_state(sk, TCP_CLOSE);
                    inet_put_port
                        __inet_put_port
                            inet_sk(sk)->num = 0;

                sk->sk_shutdown = SHUTDOWN_MASK;

    2) After that the process (socket owner) tries to write something to
    that socket and "inet_autobind" sets a _new_ (which differs from
    the original!) port number for the socket:

    Call Trace:
    [] inet_bind_hash+0x33/0x5f
    [] inet_csk_get_port+0x216/0x268
    [] inet_autobind+0x22/0x8f
    [] inet_sendmsg+0x27/0x57
    [] do_sock_write+0xae/0xea
    [] sock_writev+0xdc/0xf6
    [] _spin_lock_irqsave+0x9/0xe
    [] __pollwait+0x0/0xdd
    [] default_wake_function+0x0/0xe
    [] autoremove_wake_function+0x0/0x2e
    [] do_readv_writev+0x163/0x274
    [] thread_return+0x13a/0x174
    [] tcp_poll+0x0/0x1c9
    [] audit_syscall_entry+0x180/0x1b3
    [] sys_writev+0x49/0xe4
    [] tracesys+0xd5/0xe0

    3) sendmsg fails at last with -EPIPE (=> 'write' returns -EPIPE in userspace):

    F: tcp_sendmsg1 -EPIPE: sk=ffff81000bda00d0, sport=49847, old_state=7, new_state=7, sk_err=0, sk_shutdown=3

    Call Trace:
    [] tcp_sendmsg+0xcb/0xe87
    [] release_sock+0x10/0xae
    [] vgacon_cursor+0x0/0x1a7
    [] inet_autobind+0x8b/0x8f
    [] do_sock_write+0xae/0xea
    [] sock_writev+0xdc/0xf6
    [] _spin_lock_irqsave+0x9/0xe
    [] __pollwait+0x0/0xdd
    [] default_wake_function+0x0/0xe
    [] autoremove_wake_function+0x0/0x2e
    [] do_readv_writev+0x163/0x274
    [] thread_return+0x13a/0x174
    [] tcp_poll+0x0/0x1c9
    [] audit_syscall_entry+0x180/0x1b3
    [] sys_writev+0x49/0xe4
    [] tracesys+0xd5/0xe0

    tcp_sendmsg()
        ...
        /* Wait for a connection to finish. */
        if ((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) {
            int old_state = sk->sk_state;
            if ((err = sk_stream_wait_connect(sk, &timeo)) != 0) {
                if (f_d && (err == -EPIPE)) {
                    printk("F: tcp_sendmsg1 -EPIPE: sk=%p, sport=%u, old_state=%d, new_state=%d, "
                           "sk_err=%d, sk_shutdown=%d\n",
                           sk, ntohs(inet_sk(sk)->sport), old_state, sk->sk_state,
                           sk->sk_err, sk->sk_shutdown);
                    dump_stack();
                }
                goto out_err;
            }
        }
        ...

    4) Then the process (socket owner) understands that it's time to close
    that socket and does that (and thus triggers sending reset packet):

    Call Trace:
    ...
    [] dev_queue_xmit+0x343/0x3d6
    [] ip_output+0x351/0x384
    [] dst_output+0x0/0xe
    [] ip_queue_xmit+0x567/0x5d2
    [] vprintk+0x21/0x33
    [] check_poison_obj+0x2e/0x206
    [] poison_obj+0x36/0x45
    [] tcp_send_active_reset+0x15/0x14d
    [] dbg_redzone1+0x1c/0x25
    [] tcp_send_active_reset+0x15/0x14d
    [] cache_alloc_debugcheck_after+0x189/0x1c8
    [] tcp_transmit_skb+0x764/0x786
    [] tcp_send_active_reset+0xf9/0x14d
    [] tcp_close+0x39a/0x960
    [] inet_release+0x69/0x80
    [] sock_release+0x4f/0xcf
    [] sock_close+0x2c/0x30
    [] __fput+0xac/0x197
    [] filp_close+0x59/0x61
    [] sys_close+0x85/0xc7
    [] tracesys+0xd5/0xe0

    So, in brief:

    * a packet received for a socket in TCP_CLOSE_WAIT state triggers
    tcp_reset(), which clears inet_sk(sk)->num and puts the socket into
    TCP_CLOSE state

    * an attempt to write to that socket forces inet_autobind() to get a
    new port (but the write itself fails with -EPIPE)

    * tcp_close() called for a socket in TCP_CLOSE state sends an active
    reset via the socket with the newly allocated port

    This adds an additional check in tcp_close() for already closed
    sockets. We do not want to send anything to closed sockets.

    Signed-off-by: Konstantin Khorenko
    Signed-off-by: David S. Miller

    Konstantin Khorenko
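
The effect of the added check can be sketched as a small decision function. This is a simplified model under assumptions, not the tcp_close() code: the state and action enums and `close_action()` are hypothetical, and linger handling and the other close paths are left out.

```c
#include <assert.h>
#include <stdbool.h>

enum sock_state { ST_ESTABLISHED = 1, ST_CLOSE_WAIT, ST_CLOSE };

enum close_action { CLOSE_SILENT, CLOSE_SEND_RST, CLOSE_SEND_FIN };

/* Decide what, if anything, to transmit when the owner closes the socket. */
enum close_action close_action(enum sock_state state, bool data_was_unread)
{
	assert(state >= ST_ESTABLISHED && state <= ST_CLOSE);

	/* New check: an already closed socket (e.g. after tcp_reset()) sends nothing. */
	if (state == ST_CLOSE)
		return CLOSE_SILENT;

	/* Unread receive data at close time is signalled to the peer with a reset. */
	if (data_was_unread)
		return CLOSE_SEND_RST;

	return CLOSE_SEND_FIN;
}
```

In the scenario traced above, the socket reaches close in the closed state with unread data; without the first test the model would return CLOSE_SEND_RST, which corresponds to the stray reset from the newly bound port.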
     
  • This patch adds the 5241 PHY ID to the broadcom module.

    Signed-off-by: Dmitry Eremin-Solenikov
    Signed-off-by: David S. Miller

    Dmitry Baryshkov
     
  • Move all PHY IDs to brcmphy.h header for completeness and unification of code.

    Signed-off-by: Dmitry Eremin-Solenikov
    Signed-off-by: David S. Miller

    Dmitry Baryshkov
     
  • Remove rtnl_unlock() which had no corresponding rtnl_lock().

    Signed-off-by: Andrew Morton
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Andrew Morton
     

24 Jun, 2010

7 commits