23 Oct, 2009

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
    move virtrng_remove to .devexit.text
    move virtballoon_remove to .devexit.text
    virtio_blk: Revert serial number support
    virtio: let header files include virtio_ids.h
    virtio_blk: revert QUEUE_FLAG_VIRT addition

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)
    niu: VLAN_ETH_HLEN should be used to make sure that the whole MAC header was copied to the head buffer in the Vlan packets case
    KS8851: Fix ks8851_set_rx_mode() for IFF_MULTICAST
    KS8851: Fix MAC address write order
    KS8851: Add soft reset at probe time
    net: fix section mismatch in fec.c
    net: Fix struct inet_timewait_sock bitfield annotation
    tcp: Try to catch MSG_PEEK bug
    net: Fix IP_MULTICAST_IF
    bluetooth: static lock key fix
    bluetooth: scheduling while atomic bug fix
    tcp: fix TCP_DEFER_ACCEPT retrans calculation
    tcp: reduce SYN-ACK retrans for TCP_DEFER_ACCEPT
    tcp: accept socket after TCP_DEFER_ACCEPT period
    Revert "tcp: fix tcp_defer_accept to consider the timeout"
    AF_UNIX: Fix deadlock on connecting to shutdown socket
    ethoc: clear only pending irqs
    ethoc: inline regs access
    vmxnet3: use dev_dbg, fix build for CONFIG_BLOCK=n
    virtio_net: use dev_kfree_skb_any() in free_old_xmit_skbs()
    be2net: fix support for PCI hot plug
    ...

    Linus Torvalds
     

22 Oct, 2009

1 commit

  • Rusty,

    commit 3ca4f5ca73057a617f9444a91022d7127041970a
    virtio: add virtio IDs file
    moved all device IDs into a single file. While the change itself is
    a very good one, it can break userspace applications. For example
    if a userspace tool wanted to get the ID of virtio_net it used to
    include virtio_net.h. This no longer works, since virtio_net.h
    does not include virtio_ids.h.
    This patch moves all "#include <linux/virtio_ids.h>" from the C
    files into the header files, making the header files compatible with
    the old ones.

    In addition, this patch exports virtio_ids.h to userspace.

    CC: Fernando Luis Vazquez Cao
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Rusty Russell

    Christian Borntraeger
     

20 Oct, 2009

8 commits

  • This patch tries to print out more information when we hit the
    MSG_PEEK bug in tcp_recvmsg. It's been around since at least
    2005 and it's about time that we finally fix it.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • ipv4/ipv6 setsockopt(IP_MULTICAST_IF) have dubious __dev_get_by_index() calls.

    This function should be called only with RTNL or dev_base_lock held, or reader
    could see a corrupt hash chain and eventually enter an endless loop.

    Fix is to call dev_get_by_index()/dev_put().

    If this happens to be performance critical, we could define a new dev_exist_by_index()
    function to avoid touching dev refcount.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • When shutting down a ppp connection, a lockdep warning about a non-static
    key is triggered because the lock is not properly initialized at that
    time.

    Fix by adjusting the lock/skb_queue_head init order.

    [ 94.339261] INFO: trying to register non-static key.
    [ 94.342509] the code is fine but needs lockdep annotation.
    [ 94.342509] turning off the locking correctness validator.
    [ 94.342509] Pid: 0, comm: swapper Not tainted 2.6.31-mm1 #2
    [ 94.342509] Call Trace:
    [ 94.342509] [] register_lock_class+0x58/0x241
    [ 94.342509] [] ? __lock_acquire+0xb57/0xb73
    [ 94.342509] [] __lock_acquire+0xac/0xb73
    [ 94.342509] [] ? lock_release_non_nested+0x17b/0x1de
    [ 94.342509] [] lock_acquire+0x67/0x84
    [ 94.342509] [] ? skb_dequeue+0x15/0x41
    [ 94.342509] [] _spin_lock_irqsave+0x2f/0x3f
    [ 94.342509] [] ? skb_dequeue+0x15/0x41
    [ 94.342509] [] skb_dequeue+0x15/0x41
    [ 94.342509] [] ? _read_unlock+0x1d/0x20
    [ 94.342509] [] skb_queue_purge+0x14/0x1b
    [ 94.342509] [] l2cap_recv_frame+0xea1/0x115a [l2cap]
    [ 94.342509] [] ? __lock_acquire+0xb57/0xb73
    [ 94.342509] [] ? mark_lock+0x1e/0x1c7
    [ 94.342509] [] ? hci_rx_task+0xd2/0x1bc [bluetooth]
    [ 94.342509] [] l2cap_recv_acldata+0xb1/0x1c6 [l2cap]
    [ 94.342509] [] hci_rx_task+0x106/0x1bc [bluetooth]
    [ 94.342509] [] ? l2cap_recv_acldata+0x0/0x1c6 [l2cap]
    [ 94.342509] [] tasklet_action+0x69/0xc1
    [ 94.342509] [] __do_softirq+0x94/0x11e
    [ 94.342509] [] do_softirq+0x36/0x5a
    [ 94.342509] [] irq_exit+0x35/0x68
    [ 94.342509] [] do_IRQ+0x72/0x89
    [ 94.342509] [] common_interrupt+0x2e/0x34
    [ 94.342509] [] ? pm_qos_add_requirement+0x63/0x9d
    [ 94.342509] [] ? acpi_idle_enter_bm+0x209/0x238
    [ 94.342509] [] cpuidle_idle_call+0x5c/0x94
    [ 94.342509] [] cpu_idle+0x4e/0x6f
    [ 94.342509] [] rest_init+0x53/0x55
    [ 94.342509] [] start_kernel+0x2f0/0x2f5
    [ 94.342509] [] i386_start_kernel+0x91/0x96

    Reported-by: Oliver Hartkopp
    Signed-off-by: Dave Young
    Tested-by: Oliver Hartkopp
    Signed-off-by: David S. Miller

    Dave Young
     
  • Due to driver core changes, dev_set_drvdata() will call kzalloc(), which
    must run in a context that is allowed to sleep, but hci_conn_add() is
    called in atomic context.

    As with dev_set_name(), move dev_set_drvdata() to the work queue function.

    The oops is as follows:

    Oct 2 17:41:59 darkstar kernel: [ 438.001341] BUG: sleeping function called from invalid context at mm/slqb.c:1546
    Oct 2 17:41:59 darkstar kernel: [ 438.001345] in_atomic(): 1, irqs_disabled(): 0, pid: 2133, name: sdptool
    Oct 2 17:41:59 darkstar kernel: [ 438.001348] 2 locks held by sdptool/2133:
    Oct 2 17:41:59 darkstar kernel: [ 438.001350] #0: (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.+.}, at: [] lock_sock+0xa/0xc [l2cap]
    Oct 2 17:41:59 darkstar kernel: [ 438.001360] #1: (&hdev->lock){+.-.+.}, at: [] l2cap_sock_connect+0x103/0x26b [l2cap]
    Oct 2 17:41:59 darkstar kernel: [ 438.001371] Pid: 2133, comm: sdptool Not tainted 2.6.31-mm1 #2
    Oct 2 17:41:59 darkstar kernel: [ 438.001373] Call Trace:
    Oct 2 17:41:59 darkstar kernel: [ 438.001381] [] __might_sleep+0xde/0xe5
    Oct 2 17:41:59 darkstar kernel: [ 438.001386] [] __kmalloc+0x4a/0x15a
    Oct 2 17:41:59 darkstar kernel: [ 438.001392] [] ? kzalloc+0xb/0xd
    Oct 2 17:41:59 darkstar kernel: [ 438.001396] [] kzalloc+0xb/0xd
    Oct 2 17:41:59 darkstar kernel: [ 438.001400] [] device_private_init+0x15/0x3d
    Oct 2 17:41:59 darkstar kernel: [ 438.001405] [] dev_set_drvdata+0x18/0x26
    Oct 2 17:41:59 darkstar kernel: [ 438.001414] [] hci_conn_init_sysfs+0x40/0xd9 [bluetooth]
    Oct 2 17:41:59 darkstar kernel: [ 438.001422] [] ? hci_conn_add+0x128/0x186 [bluetooth]
    Oct 2 17:41:59 darkstar kernel: [ 438.001429] [] hci_conn_add+0x177/0x186 [bluetooth]
    Oct 2 17:41:59 darkstar kernel: [ 438.001437] [] hci_connect+0x3c/0xfb [bluetooth]
    Oct 2 17:41:59 darkstar kernel: [ 438.001442] [] l2cap_sock_connect+0x174/0x26b [l2cap]
    Oct 2 17:41:59 darkstar kernel: [ 438.001448] [] sys_connect+0x60/0x7a
    Oct 2 17:41:59 darkstar kernel: [ 438.001453] [] ? lock_release_non_nested+0x84/0x1de
    Oct 2 17:41:59 darkstar kernel: [ 438.001458] [] ? might_fault+0x47/0x81
    Oct 2 17:41:59 darkstar kernel: [ 438.001462] [] ? might_fault+0x47/0x81
    Oct 2 17:41:59 darkstar kernel: [ 438.001468] [] ? __copy_from_user_ll+0x11/0xce
    Oct 2 17:41:59 darkstar kernel: [ 438.001472] [] sys_socketcall+0x82/0x17b
    Oct 2 17:41:59 darkstar kernel: [ 438.001477] [] syscall_call+0x7/0xb

    Signed-off-by: Dave Young
    Signed-off-by: David S. Miller

    Dave Young
     
  • Fix TCP_DEFER_ACCEPT conversion between seconds and
    retransmission to match the TCP SYN-ACK retransmission periods
    because the time is converted to such retransmissions. The old
    algorithm selects one more retransmission in some cases. Allow
    up to 255 retransmissions.

    Signed-off-by: Julian Anastasov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Julian Anastasov
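The seconds-to-retransmissions conversion described in the commit above can be sketched as a small standalone function. This is an illustrative sketch, not the patch itself: the function name, the 3-second initial timeout and the 120-second cap are assumptions chosen to mirror typical SYN-ACK timer values of that era. It accumulates the doubling retransmission periods and counts how many are needed to cover the requested time, stopping at 255.

```c
#include <stdint.h>

/* Assumed values, in seconds: initial SYN-ACK timeout and its upper bound. */
#define SKETCH_TIMEOUT_INIT 3
#define SKETCH_RTO_MAX      120

/*
 * Convert a TCP_DEFER_ACCEPT value in seconds into a number of SYN-ACK
 * retransmissions. Each retransmission period doubles (capped at rto_max),
 * and the periods are summed until they cover the requested time.
 */
uint8_t defer_secs_to_retrans(int seconds, int timeout, int rto_max)
{
    uint8_t res = 0;

    if (seconds > 0) {
        int period = timeout;   /* total time covered so far */

        res = 1;
        while (seconds > period && res < 255) {
            res++;
            timeout <<= 1;
            if (timeout > rto_max)
                timeout = rto_max;
            period += timeout;
        }
    }
    return res;
}
```

With the assumed 3-second initial timeout, the cumulative periods are 3, 9, 21, 45, ... seconds, so a request of 9 seconds maps to two retransmissions and a request of 10 seconds maps to three.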
     
  • Change SYN-ACK retransmitting code for the TCP_DEFER_ACCEPT
    users to not retransmit SYN-ACKs during the deferring period if
    ACK from client was received. The goal is to reduce traffic
    during the deferring period. When the period is finished
    we continue with sending SYN-ACKs (at least one) but this time
    any traffic from client will change the request to established
    socket allowing application to terminate it properly.
    Also, do not drop acked request if sending of SYN-ACK fails.

    Signed-off-by: Julian Anastasov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • Willy Tarreau and many other folks in recent years
    were concerned about what happens when the TCP_DEFER_ACCEPT period
    expires for clients which sent an ACK packet. They prefer clients
    that actively resend ACK on our SYN-ACK retransmissions to be
    converted from open requests to sockets and queued to the
    listener for accepting after the deferring period is finished.
    Then application server can decide to wait longer for data
    or to properly terminate the connection with FIN if read()
    returns EAGAIN which is an indication for accepting after
    the deferring period. This change still can have side effects
    for applications that expect always to see data on the accepted
    socket. Others can be prepared to work in both modes (with or
    without TCP_DEFER_ACCEPT period) and their data processing can
    ignore the read=EAGAIN notification and allocate resources for
    clients which proved to have no data to send during the deferring
    period. OTOH, servers that use TCP_DEFER_ACCEPT=1 as flag (not
    as a timeout) to wait for data will notice clients that didn't
    send data for 3 seconds but that still resend ACKs.
    Thanks to Willy Tarreau for the initial idea and to
    Eric Dumazet for the review and testing the change.

    Signed-off-by: Julian Anastasov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • This reverts commit 6d01a026b7d3009a418326bdcf313503a314f1ea.

    Julian Anastasov, Willy Tarreau and Eric Dumazet have come up
    with a more correct way to deal with this.

    Signed-off-by: David S. Miller

    David S. Miller
     

19 Oct, 2009

1 commit

  • I found a deadlock bug in UNIX domain sockets that lets non-root users
    mount a DoS attack against the local machine.

    How to reproduce:
    1. Make a listening AF_UNIX/SOCK_STREAM socket with an abstract
    namespace(*), and shutdown(2) it.
    2. Repeat connect(2)ing to the listening socket from the other sockets
    until the connection backlog is full.
    3. connect(2) takes the CPU forever. If every core is taken, the
    system hangs.

    PoC code: (Run as many times as cores on SMP machines.)

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    int main(void)
    {
        int ret;
        int csd;
        int lsd;
        struct sockaddr_un sun;

        /* make an abstract name address (*) */
        memset(&sun, 0, sizeof(sun));
        sun.sun_family = PF_UNIX;
        sprintf(&sun.sun_path[1], "%d", getpid());

        /* create the listening socket and shutdown */
        lsd = socket(AF_UNIX, SOCK_STREAM, 0);
        bind(lsd, (struct sockaddr *)&sun, sizeof(sun));
        listen(lsd, 1);
        shutdown(lsd, SHUT_RDWR);

        /* connect loop */
        alarm(15); /* forcibly exit the loop after 15 sec */
        for (;;) {
            csd = socket(AF_UNIX, SOCK_STREAM, 0);
            ret = connect(csd, (struct sockaddr *)&sun, sizeof(sun));
            if (-1 == ret) {
                perror("connect()");
                break;
            }
            puts("Connection OK");
        }
        return 0;
    }

    (*) Make sun_path[0] = 0 to use the abstract namespace.
    If a file-based socket is used, the system doesn't deadlock because
    of context switches in the file system layer.

    Why this happens:
    Error checks between unix_socket_connect() and unix_wait_for_peer() are
    inconsistent. The former calls the latter to wait until the backlog is
    processed. Although the latter returns without doing anything when the
    socket is shut down, the former doesn't check the shutdown state and
    just retries calling the latter forever.

    Patch:
    The patch below adds a shutdown check to unix_socket_connect(), so
    connect(2) to the shutdown socket will return -ECONNREFUSED.

    Signed-off-by: Tomoki Sekiyama
    Signed-off-by: Masanori Yoshida
    Signed-off-by: David S. Miller

    Tomoki Sekiyama
     

16 Oct, 2009

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (53 commits)
    vmxnet: fix 2 build problems
    net: add support for STMicroelectronics Ethernet controllers.
    net: ks8851_mll uses mii interfaces
    net/fec_mpc52xx: Fix kernel panic on FEC error
    net: Fix OF platform drivers coldplug/hotplug when compiled as modules
    TI DaVinci EMAC: Clear statistics register properly.
    r8169: partial support and phy init for the 8168d
    irda/sa1100_ir: check return value of startup hook
    udp: Fix udp_poll() and ioctl()
    WAN: fix Cisco HDLC handshaking.
    tcp: fix tcp_defer_accept to consider the timeout
    3c574_cs: spin_lock the set_multicast_list function
    net: Teach pegasus driver to ignore bluetoother adapters with clashing Vendor:Product IDs
    netxen: fix pci bar mapping
    ethoc: fix warning from 32bit build
    libertas: fix build
    net: VMware virtual Ethernet NIC driver: vmxnet3
    net: Fix IXP 2000 network driver building.
    libertas: fix build
    mac80211: document ieee80211_rx() context requirement
    ...

    Linus Torvalds
     

14 Oct, 2009

1 commit


13 Oct, 2009

5 commits

  • udp_poll() can in some circumstances drop frames with incorrect checksums.

    Problem is we now have to lock the socket while dropping frames, or risk
    sk_forward corruption.

    This bug is present since commit 95766fff6b9a78d1
    ([UDP]: Add memory accounting.)

    While we are at it, we can correct ioctl(SIOCINQ) to also drop bad frames.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • I was trying to use TCP_DEFER_ACCEPT and noticed that if the
    client does not talk, the connection is never accepted and
    remains in SYN_RECV state until the retransmits expire, where
    it finally is deleted. This is bad when some firewall such as
    netfilter sits between the client and the server because the
    firewall sees the connection in ESTABLISHED state while the
    server will finally silently drop it without sending an RST.

    This behaviour contradicts the man page which says it should
    wait only for some time :

    TCP_DEFER_ACCEPT (since Linux 2.4)
    Allows a listener to be awakened only when data arrives
    on the socket. Takes an integer value (seconds), this
    can bound the maximum number of attempts TCP will
    make to complete the connection. This option should not
    be used in code intended to be portable.

    Also, looking at ipv4/tcp.c, a retransmit counter is correctly
    computed :

    case TCP_DEFER_ACCEPT:
        icsk->icsk_accept_queue.rskq_defer_accept = 0;
        if (val > 0) {
            /* Translate value in seconds to number of
             * retransmits */
            while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
                   val > ((TCP_TIMEOUT_INIT / HZ) <<
                          icsk->icsk_accept_queue.rskq_defer_accept))
                icsk->icsk_accept_queue.rskq_defer_accept++;
            icsk->icsk_accept_queue.rskq_defer_accept++;
        }
        break;

    ==> rskq_defer_accept is used as a counter of retransmits.

    But in tcp_minisocks.c, this counter is only checked. And in
    fact, I have found no location which updates it. So I think
    that what was intended was to decrease it in tcp_minisocks
    whenever it is checked, which the trivial patch below does.

    Signed-off-by: Willy Tarreau
    Signed-off-by: David S. Miller

    Willy Tarreau
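To make the counter computation quoted in the commit above easier to follow, here is the same loop pulled out into a standalone function. This is a sketch under stated assumptions: the function name is hypothetical, TCP_TIMEOUT_INIT / HZ is assumed to evaluate to 3 seconds, and the shift is widened to 64 bits so the sketch stays well defined for large counts.

```c
/* Assumed: TCP_TIMEOUT_INIT / HZ evaluates to 3 (seconds). */
#define SKETCH_TIMEOUT_INIT_SECS 3

/*
 * Translate a TCP_DEFER_ACCEPT value in seconds into a retransmit counter,
 * following the loop quoted from ipv4/tcp.c: keep incrementing while the
 * value exceeds the timeout doubled `count` times, then add one more.
 */
unsigned int old_defer_secs_to_retrans(int val)
{
    unsigned int count = 0;

    if (val > 0) {
        while (count < 32 &&
               (unsigned long long)val >
               ((unsigned long long)SKETCH_TIMEOUT_INIT_SECS << count))
            count++;
        count++;
    }
    return count;
}
```

Under the 3-second assumption, a value of 3 yields a counter of 1, a value of 4 yields 2, and a value of 7 yields 3, because each step compares against a single doubled timeout (3, 6, 12, ...) rather than the total time elapsed.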
     
  • ieee80211_rx() must be called with softirqs disabled
    since the networking stack requires this for netif_rx()
    and some code in mac80211 can assume that it can not
    be processing its own tasklet and this call at the same
    time.

    It may be possible to remove this requirement after a
    careful audit of mac80211 and doing any needed locking
    improvements in it along with disabling softirqs around
    netif_rx(). An alternative might be to push all packet
    processing to process context in mac80211, instead of
    to the tasklet, and add other synchronisation.

    Signed-off-by: Johannes Berg
    Signed-off-by: John W. Linville

    Johannes Berg
     
  • When a scan completes, we call ieee80211_sta_find_ibss(),
    which is also called from other places. When the scan was
    done in software, there's no problem as both run from the
    single-threaded mac80211 workqueue and are thus serialised
    against each other, but with hardware scan the completion
    can be in a different context and race against callers of
    this function from the workqueue (e.g. due to beacon RX).
    So instead of calling ieee80211_sta_find_ibss() directly,
    just arm the timer and have it fire, scheduling the work,
    which will invoke ieee80211_sta_find_ibss() (if that is
    appropriate in the current state).

    Signed-off-by: Johannes Berg
    Signed-off-by: John W. Linville

    Johannes Berg
     
  • Signed-off-by: Felix Fietkau
    Acked-by: Johannes Berg
    Signed-off-by: John W. Linville

    Felix Fietkau
     

12 Oct, 2009

2 commits


09 Oct, 2009

3 commits

  • David S. Miller
     
  • The error unwinding code in set_netns has a bug
    that will make it run into a BUG_ON if passed a
    bad wiphy index, fix by not trying to unlock a
    wiphy that doesn't exist.

    Signed-off-by: Johannes Berg
    Signed-off-by: John W. Linville

    Johannes Berg
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (40 commits)
    ethoc: limit the number of buffers to 128
    ethoc: use system memory as buffer
    ethoc: align received packet to make IP header at word boundary
    ethoc: fix buffer address mapping
    ethoc: fix typo to compute number of tx descriptors
    au1000_eth: Duplicate test of RX_OVERLEN bit in update_rx_stats()
    netxen: Fix Unlikely(x) > y
    pasemi_mac: ethtool get settings fix
    add maintainer for network drop monitor kernel service
    tg3: Fix phylib locking strategy
    rndis_host: support ETHTOOL_GPERMADDR
    ipv4: arp_notify address list bug
    gigaset: add kerneldoc comments
    gigaset: correct debugging output selection
    gigaset: improve error recovery
    gigaset: fix device ERROR response handling
    gigaset: announce if built with debugging
    gigaset: handle isoc frame errors more gracefully
    gigaset: linearize skb
    gigaset: fix reject/hangup handling
    ...

    Linus Torvalds
     

08 Oct, 2009

3 commits

  • Commit 9ef1d4c7c7aca1cd436612b6ca785b726ffb8ed8 ("[NETLINK]: Missing
    initializations in dumped data") introduced a typo in
    initialization. This patch fixes this.

    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
     
  • kfree_skb() should be used to free struct sk_buff pointers.

    Signed-off-by: Roel Kluin
    Acked-by: Johannes Berg
    Cc: stable@kernel.org
    Signed-off-by: John W. Linville

    Roel Kluin
     
  • When receiving data frames, we can send them only to
    the interface they belong to based on transmitting
    station (this doesn't work for probe requests). Also,
    don't try to handle other frames for AP_VLAN at all
    since those interface should only receive data.

    Additionally, the transmit side must check that the
    station we're sending a frame to is actually on the
    interface we're transmitting on, and not transmit
    packets to functions that live on other interfaces,
    so validate that as well.

    Another bug fix is needed in sta_info.c where in the
    VLAN case when adding/removing stations we overwrite
    the sdata variable we still need.

    Signed-off-by: Johannes Berg
    Cc: stable@kernel.org
    Signed-off-by: John W. Linville

    Johannes Berg
     

07 Oct, 2009

1 commit

  • This fixes a bug with arp_notify.

    If arp_notify is enabled, kernel will crash if address is changed
    and no IP address is assigned.
    http://bugzilla.kernel.org/show_bug.cgi?id=14330

    Reported-by: Hannes Frederic Sowa
    Signed-off-by: Stephen Hemminger
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

05 Oct, 2009

4 commits

  • A number of drivers (recently including cfg80211-based ones)
    assume that all wireless handlers, including statistics, can
    sleep and they often also implicitly assume that the rtnl is
    held around their invocation. This is almost always true now
    except when reading from sysfs:

    BUG: sleeping function called from invalid context at kernel/mutex.c:280
    in_atomic(): 1, irqs_disabled(): 0, pid: 10450, name: head
    2 locks held by head/10450:
    #0: (&buffer->mutex){+.+.+.}, at: [] sysfs_read_file+0x24/0xf4
    #1: (dev_base_lock){++.?..}, at: [] wireless_show+0x1a/0x4c
    Pid: 10450, comm: head Not tainted 2.6.32-rc3 #1
    Call Trace:
    [] __might_sleep+0xf0/0xf7
    [] mutex_lock_nested+0x1a/0x33
    [] wdev_lock+0xd/0xf [cfg80211]
    [] cfg80211_wireless_stats+0x45/0x12d [cfg80211]
    [] get_wireless_stats+0x16/0x1c
    [] wireless_show+0x2a/0x4c

    Fix this by using the rtnl instead of dev_base_lock.

    Reported-by: Miles Lane
    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • Commit fd29cf72 (pktgen: convert to use ktime_t)
    inadvertently converted the "delay" parameter from nanoseconds to
    microseconds.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • It is not currently possible to instruct pktgen to use one selected tx queue.

    When Robert added multiqueue support in commit 45b270f8, he added
    an interval (queue_map_min, queue_map_max), and his code doesn't take
    into account the case of min = max, to select one tx queue exactly.

    I suspect a high performance setup on an eight-txqueue device wants
    to use exactly eight cpus, and assign one tx queue to each sender.

    This patch makes pktgen select the right tx queue, not the first one.

    Also updates Documentation to reflect Robert's changes.

    Signed-off-by: Eric Dumazet
    Signed-off-by: Robert Olsson
    Signed-off-by: David S. Miller

    Eric Dumazet
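The min = max case from the commit above can be illustrated with a toy model of round-robin queue selection over an interval. This is a hypothetical sketch, not pktgen's code: the function name and parameters are invented for illustration, and num_queues is assumed to be greater than zero. The point it shows is that treating the interval as valid when min equals max pins the sender to exactly that queue instead of leaving it on the first one.

```c
/*
 * Hypothetical model: pick the next tx queue in [qmin, qmax], wrapping
 * back to qmin. The "<=" comparison means qmin == qmax selects exactly
 * one queue. The result is reduced modulo num_queues so it never points
 * past the queues the device really has (num_queues must be > 0).
 */
unsigned int next_tx_queue(unsigned int cur, unsigned int qmin,
                           unsigned int qmax, unsigned int num_queues)
{
    if (qmin <= qmax) {
        unsigned int t = cur + 1;

        if (t < qmin || t > qmax)
            t = qmin;
        cur = t;
    }
    return cur % num_queues;
}
```

In this model, a sender configured with qmin = qmax = 5 on an eight-queue device always gets queue 5, while an interval of 2..4 cycles through 2, 3, 4 and back to 2.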
     
  • Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

03 Oct, 2009

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (46 commits)
    cnic: Fix NETDEV_UP event processing.
    uvesafb/connector: Disallow unpliviged users to send netlink packets
    pohmelfs/connector: Disallow unpliviged users to configure pohmelfs
    dst/connector: Disallow unpliviged users to configure dst
    dm/connector: Only process connector packages from privileged processes
    connector: Removed the destruct_data callback since it is always kfree_skb()
    connector/dm: Fixed a compilation warning
    connector: Provide the sender's credentials to the callback
    connector: Keep the skb in cn_callback_data
    e1000e/igb/ixgbe: Don't report an error if devices don't support AER
    net: Fix wrong sizeof
    net: splice() from tcp to pipe should take into account O_NONBLOCK
    net: Use sk_mark for routing lookup in more places
    sky2: irqname based on pci address
    skge: use unique IRQ name
    IPv4 TCP fails to send window scale option when window scale is zero
    net/ipv4/tcp.c: fix min() type mismatch warning
    Kconfig: STRIP: Remove stale bits of STRIP help text
    NET: mkiss: Fix typo
    tg3: Remove prev_vlan_tag from struct tx_ring_info
    ...

    Linus Torvalds
     
  • tcp_splice_read() doesn't take into account the socket's O_NONBLOCK flag.

    Before this patch :

    splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE);
    causes a random endless block (if pipe is full) and
    splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
    will return 0 immediately if the TCP buffer is empty.

    A user application has no way to instruct splice() that the socket should be in
    blocking mode but the pipe in nonblocking mode.

    Many projects cannot use splice(tcp -> pipe) because of this flaw.

    http://git.samba.org/?p=samba.git;a=history;f=source3/lib/recvfile.c;h=ea0159642137390a0f7e57a123684e6e63e47581;hb=HEAD
    http://lkml.indiana.edu/hypermail/linux/kernel/0807.2/0687.html

    Linus introduced SPLICE_F_NONBLOCK in commit 29e350944fdc2dfca102500790d8ad6d6ff4f69d
    (splice: add SPLICE_F_NONBLOCK flag )

    It doesn't make the splice itself necessarily nonblocking (because the
    actual file descriptors that are spliced from/to may block unless they
    have the O_NONBLOCK flag set), but it makes the splice pipe operations
    nonblocking.

    Linus's intention was clear: let SPLICE_F_NONBLOCK control the splice pipe mode only.

    This patch instructs tcp_splice_read() to use the underlying file O_NONBLOCK
    flag, as other socket operations do.

    Users will then call :

    splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE | SPLICE_F_NONBLOCK );

    to block on data coming from socket (if file is in blocking mode),
    and not block on pipe output (to avoid deadlock)

    First version of this patch was submitted by Octavian Purdila

    Reported-by: Volker Lendecke
    Reported-by: Jason Gunthorpe
    Signed-off-by: Eric Dumazet
    Signed-off-by: Octavian Purdila
    Acked-by: Linus Torvalds
    Acked-by: Jens Axboe
    Signed-off-by: David S. Miller

    Eric Dumazet
     

02 Oct, 2009

5 commits

  • This patch against v2.6.31 adds support for route lookup using sk_mark in some
    more places. The benefits from this patch are the following.
    First, SO_MARK option now has effect on UDP sockets too.
    Second, ip_queue_xmit() and inet_sk_rebuild_header() could fail to do routing
    lookup correctly if TCP sockets with SO_MARK were used.

    Signed-off-by: Atis Elsts
    Acked-by: Eric Dumazet

    Atis Elsts
     
  • Acknowledge TCP window scale support by inserting the proper option in SYN/ACK
    and SYN headers even if our window scale is zero.

    This fixes the following observed behavior:

    1. Client sends a SYN with TCP window scaling option and non zero window scale
    value to a Linux box.
    2. Linux box notes large receive window from client.
    3. Linux decides on a zero value of window scale for its part.
    4. Because it compares against the requested window scale size option, Linux
    does not send the window scale TCP option header on SYN/ACK at all.

    With the following result:

    Client box thinks TCP window scaling is not supported, since SYN/ACK had no
    TCP window scale option, while Linux thinks that TCP window scaling is
    supported (and scale might be non zero), since SYN had TCP window scale
    option and we have a mismatched idea between the client and server
    regarding window sizes.

    Probably it also fixes up the following bug (not observed in practice):

    1. Linux box opens TCP connection to some server.
    2. Linux decides on zero value of window scale.
    3. Because it compares against the computed window scale size option, Linux
    does not set the window scale TCP option header on SYN.

    With the expected result that the server OS does not use window scale option
    due to not receiving such an option in the SYN headers, leading to suboptimal
    performance.

    Signed-off-by: Gilad Ben-Yossef
    Signed-off-by: Ori Finkelman
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Ori Finkelman
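The behavior described in the commit above, sending the window scale option whenever scaling is in use even if the shift is zero, can be sketched with a small option writer. The function and macro names are hypothetical and this is not the kernel's option-building code; the wire format (kind 3, length 3, one shift byte, here preceded by a one-byte NOP for alignment) follows the TCP window scale option definition.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TCPOPT_NOP     1   /* one-byte padding option */
#define SKETCH_TCPOPT_WSCALE  3   /* window scale option kind */
#define SKETCH_TCPOLEN_WSCALE 3   /* kind + length + shift count */

/*
 * Write NOP + window scale option into buf when window scaling is in use.
 * The decision depends only on wscale_ok, not on the shift value, so a
 * shift of zero still produces the option. Returns the bytes written.
 */
size_t put_wscale_option(uint8_t *buf, size_t buflen,
                         int wscale_ok, uint8_t shift)
{
    if (!wscale_ok || buflen < 4)
        return 0;

    buf[0] = SKETCH_TCPOPT_NOP;
    buf[1] = SKETCH_TCPOPT_WSCALE;
    buf[2] = SKETCH_TCPOLEN_WSCALE;
    buf[3] = shift;
    return 4;
}
```

Keying the decision on a separate "scaling in use" flag rather than on a non-zero shift is what keeps both peers in agreement about whether scaling was negotiated.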
     
  • net/ipv4/tcp.c: In function 'do_tcp_setsockopt':
    net/ipv4/tcp.c:2050: warning: comparison of distinct pointer types lacks a cast

    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Andrew Morton
     
  • David S. Miller
     
  • After last pktgen changes, delay handling is wrong.

    pktgen actually sends packets at full line speed.

    Fix is to update pkt_dev->next_tx even if spin() returns early,
    so that next spin() calls have a chance to see a positive delay.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

01 Oct, 2009

1 commit