19 Dec, 2008

1 commit

  • kernel_accept() does not take a reference on the module that owns
    newsock->ops, so callers have to call __module_get(newsock->ops->owner)
    by hand after calling kernel_accept().
    sunrpc does not take this reference, which causes a kernel panic.

    The following script reproduces the problem:

    while [ 1 ];
    do
    mount -t nfs4 192.168.0.19:/ /mnt
    touch /mnt/file
    umount /mnt
    lsmod | grep ipv6
    done

    This patch fixes the problem by adding __module_get(newsock->ops->owner) to
    kernel_accept(), so callers no longer need to call
    __module_get(newsock->ops->owner) themselves after every kernel_accept().

    Signed-off-by: Wei Yongjun
    Signed-off-by: David S. Miller

    Wei Yongjun
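
The reference-counting imbalance described above can be illustrated with a small user-space sketch. It is not the kernel patch: every `fake_*` type and function is a hypothetical stand-in, and the only point it makes is that if the accept helper takes the module reference itself, each accept/release cycle leaves the count where it started.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's module, proto_ops and socket. */
struct fake_module { int refcnt; };
struct fake_ops    { struct fake_module *owner; };
struct fake_sock   { const struct fake_ops *ops; };

static void fake_module_get(struct fake_module *m)
{
    if (m)
        m->refcnt++;
}

static void fake_module_put(struct fake_module *m)
{
    if (m) {
        assert(m->refcnt > 0);   /* a put without a matching get is the bug */
        m->refcnt--;
    }
}

/* Accept helper in the spirit of the fix: the new socket inherits the
 * listener's ops and the helper takes the module reference itself. */
static void fake_accept(const struct fake_sock *listener,
                        struct fake_sock *newsock)
{
    newsock->ops = listener->ops;
    fake_module_get(newsock->ops->owner);
}

/* Releasing a socket always drops one module reference. */
static void fake_release(struct fake_sock *sock)
{
    fake_module_put(sock->ops->owner);
    sock->ops = NULL;
}

/* Run n accept/release cycles and return the final module refcount. */
static int fake_accept_release_cycles(int n)
{
    struct fake_module mod = { .refcnt = 1 };   /* held by the listener */
    struct fake_ops ops = { .owner = &mod };
    struct fake_sock listener = { .ops = &ops };

    for (int i = 0; i < n; i++) {
        struct fake_sock newsock = { NULL };
        fake_accept(&listener, &newsock);
        fake_release(&newsock);
    }
    return mod.refcnt;
}
```

If `fake_accept()` skipped the `fake_module_get()` call, the first release would already drop the listener's reference, which mirrors how the module could be unloaded while still in use.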
     

16 Dec, 2008

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
    Phonet: keep TX queue disabled when the device is off
    SCHED: netem: Correct documentation comment in code.
    netfilter: update rwlock initialization for nat_table
    netlabel: Compiler warning and NULL pointer dereference fix
    e1000e: fix double release of mutex
    IA64: HP_SIMETH needs to depend upon NET
    netpoll: fix race on poll_list resulting in garbage entry
    ipv6: silence log messages for locally generated multicast
    sungem: improve ethtool output with internal pcs and serdes
    tcp: tcp_vegas cong avoid fix
    sungem: Make PCS PHY support partially work again.

    Linus Torvalds
     

15 Dec, 2008

3 commits


12 Dec, 2008

1 commit

  • Fix the two compiler warnings shown below. Thanks to Geert Uytterhoeven for
    finding and reporting the problem.

    net/netlabel/netlabel_unlabeled.c:567: warning: 'entry' may be used
    uninitialized in this function
    net/netlabel/netlabel_unlabeled.c:629: warning: 'entry' may be used
    uninitialized in this function

    Signed-off-by: Paul Moore
    Signed-off-by: David S. Miller

    Paul Moore
     

10 Dec, 2008

2 commits

  • A few months back a race was discussed between the netpoll napi service
    path, and the fast path through net_rx_action:
    http://kerneltrap.org/mailarchive/linux-netdev/2007/10/16/345470

    A patch was submitted for that bug, but I think we missed a case.

    Consider the following scenario:

    INITIAL STATE
    CPU0 has one napi_struct A on its poll_list
    CPU1 is calling netpoll_send_skb and needs to call poll_napi on the same
    napi_struct A that CPU0 has on its list

    CPU0                            CPU1
    net_rx_action                   poll_napi
    !list_empty (returns true)      locks poll_lock for A
                                    poll_one_napi
                                    napi->poll
                                    netif_rx_complete
                                    __napi_complete
                                    (removes A from poll_list)
    list_entry(list->next)

    In the above scenario, net_rx_action assumes that the per-cpu poll_list is
    exclusive to that cpu. netpoll of course violates that, and because the netpoll
    path can dequeue from the poll list, it's possible for CPU0 to detect a non-empty
    list at the top of the while loop in net_rx_action, but have it become empty by
    the time it calls list_entry. Since the poll_list isn't surrounded by any other
    structure, the returned data from that list_entry call in this situation is
    garbage, and any number of crashes can result based on what exactly that garbage
    is.

    Given that it's not feasible for performance reasons to place exclusive locks
    around each cpu's poll list to provide that mutual exclusion, I think the best
    solution is to modify the netpoll path in such a way that we continue to guarantee
    that the poll_list for a cpu is in fact exclusive to that cpu. To do this I've
    implemented the patch below. It adds an additional bit to the state field in
    the napi_struct. When executing napi->poll from the netpoll_path, this bit will
    be set. When a driver calls netif_rx_complete, if that bit is set, it will not
    remove the napi_struct from the poll_list. That work will be saved for the next
    iteration of net_rx_action.

    I've tested this and it seems to work well. About the biggest drawback I can
    see to it is the fact that it might result in an extra loop through
    net_rx_action in the event that the device is actually contended for (i.e. the
    netpoll path actually performs all the needed work on the device, and the call
    to net_rx_action winds up doing nothing, except removing the napi_struct from
    the poll_list). However I think this is probably a small price to pay, given
    that the alternative is a crash.

    Signed-off-by: Neil Horman
    Signed-off-by: David S. Miller

    Neil Horman
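
The deferred-removal idea described above can be sketched in plain C. This is a single-threaded illustration only, with no atomics or real lists; the `sketch_*` names and bit values are hypothetical and do not come from the patch. It shows the rule the message states: while the "serviced by netpoll" bit is set, completion leaves the entry on the poll list, and the owning CPU removes it on its next pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state bits; the real kernel names and values may differ. */
enum {
    SKETCH_STATE_SCHED = 1u << 0,   /* entry is scheduled on a poll list */
    SKETCH_STATE_NPSVC = 1u << 1,   /* entry is being serviced by netpoll */
};

struct sketch_napi {
    unsigned int state;
    bool on_poll_list;
};

/* What a driver calls when it has finished its polling work. */
static void sketch_napi_complete(struct sketch_napi *n)
{
    assert(n != NULL);
    if (n->state & SKETCH_STATE_NPSVC)
        return;                     /* leave removal to the owning CPU */
    n->on_poll_list = false;
    n->state &= ~SKETCH_STATE_SCHED;
}

/* Netpoll path: mark the entry, run the poll, then clear the mark. */
static void sketch_netpoll_poll(struct sketch_napi *n)
{
    n->state |= SKETCH_STATE_NPSVC;
    sketch_napi_complete(n);        /* stands in for napi->poll() completing */
    n->state &= ~SKETCH_STATE_NPSVC;
}

/* One pass of the owning CPU's receive loop over this entry. */
static void sketch_net_rx_action_once(struct sketch_napi *n)
{
    if (n->on_poll_list)
        sketch_napi_complete(n);    /* nothing left to do except dequeue */
}
```

The extra pass through `sketch_net_rx_action_once()` corresponds to the "extra loop" drawback the message mentions.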
     
  • This patch fixes a minor annoyance during transmission of unsolicited
    neighbor advertisements from userspace to multicast addresses (as
    far as I can see in the RFC, this is allowed, and similar functionality
    for IPv4 has been in arping for a long time).

    Outgoing multicast packets get reinserted into local processing as if they
    are received from the network. The machine thus sees its own NA and fills
    the logs with error messages. This patch removes the message if the NA was
    generated locally.

    Signed-off-by: Jan Sembera
    Signed-off-by: David S. Miller

    Jan Sembera
     

09 Dec, 2008

2 commits

  • This patch addresses a book-keeping issue in tcp_vegas.c. At present
    tcp_vegas does separate book-keeping of cwnd based on packet sequence
    numbers. A mismatch can develop between this book-keeping and
    tp->snd_cwnd due, for example, to delayed acks acking multiple
    packets. When vegas transitions to reno operation (e.g. following
    loss), then this mismatch leads to incorrect behaviour (akin to a cwnd
    backoff). This seems mostly to affect operation at low cwnds where
    delayed acking can lead to a significant fraction of cwnd being
    covered by a single ack, leading to the book-keeping mismatch. This
    patch modifies the congestion avoidance update to avoid the need for
    separate book-keeping while leaving vegas congestion avoidance
    functionally unchanged. A secondary advantage of this modification is
    that the use of fixed-point (via V_PARAM_SHIFT) and 64 bit arithmetic
    is no longer necessary, simplifying the code.

    Some example test measurements with the patched code (confirming no functional
    change in the congestion avoidance algorithm) can be seen at:

    http://www.hamilton.ie/doug/vegaspatch/

    Signed-off-by: Doug Leith
    Signed-off-by: David S. Miller

    Doug Leith
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
    tproxy: fixe a possible read from an invalid location in the socket match
    zd1211rw: use unaligned safe memcmp() in-place of compare_ether_addr()
    mac80211: use unaligned safe memcmp() in-place of compare_ether_addr()
    ipw2200: fix netif_*_queue() removal regression
    iwlwifi: clean key table in iwl_clear_stations_table function
    tcp: tcp_vegas ssthresh bug fix
    can: omit received RTR frames for single ID filter lists
    ATM: CVE-2008-5079: duplicate listen() on socket corrupts the vcc table
    netx-eth: initialize per device spinlock
    tcp: make urg+gso work for real this time
    enc28j60: Fix sporadic packet loss (corrected again)
    hysdn: fix writing outside the field on 64 bits
    b1isa: fix b1isa_exit() to really remove registered capi controllers
    can: Fix CAN_(EFF|RTR)_FLAG handling in can_filter
    Phonet: do not dump addresses from other namespaces
    netlabel: Fix a potential NULL pointer dereference
    bnx2: Add workaround to handle missed MSI.
    xfrm: Fix kernel panic when flush and dump SPD entries

    Linus Torvalds
     

08 Dec, 2008

1 commit


06 Dec, 2008

1 commit


05 Dec, 2008

4 commits

  • After fixing zd1211rw: use unaligned safe memcmp() in-place of
    compare_ether_addr(), I started to see kernel log messages detailing
    unaligned access:

    Kernel unaligned access at TPC[100f7f44] sta_info_get+0x24/0x68 [mac80211]

    As with the aforementioned patch, the unaligned access was emanating
    from a compare_ether_addr() call. I was concerned that, whilst it was safe
    to assume that unalignment was the norm for the zd1211rw and to take
    preventative measures there, it might not be the case here, or acceptable
    to use the easy fix of changing the call to memcmp().

    My research however indicated that it was OK to do this, as there are
    a few instances where memcmp() is the preferred mechanism for doing
    mac address comparisons throughout the module.

    Signed-off-by: Shaddy Baddah
    Signed-off-by: John W. Linville

    Shaddy Baddah
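
As a rough illustration of why memcmp() is the alignment-safe choice: a comparison that reads the addresses as 16-bit words assumes 2-byte alignment, whereas memcmp() works byte-wise on any address. The helper below is a hypothetical user-space sketch (the `sketch_` names are not kernel API), not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_ETH_ALEN 6   /* length of an Ethernet MAC address in bytes */

/* Compare two MAC addresses without assuming any alignment of a or b. */
static bool sketch_ether_addr_equal(const unsigned char *a,
                                    const unsigned char *b)
{
    assert(a != NULL && b != NULL);
    return memcmp(a, b, SKETCH_ETH_ALEN) == 0;
}
```

The helper can be called on an address stored at an odd offset inside a larger buffer, which is the situation that triggered the unaligned access messages.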
     
  • This patch fixes a bug in tcp_vegas.c. At the moment this code leaves
    ssthresh untouched. However, this means that the vegas congestion
    control algorithm is effectively unable to reduce cwnd below the
    ssthresh value (if the vegas update lowers the cwnd below ssthresh,
    then slow start is activated to raise it back up). One example where
    this matters is when during slow start cwnd overshoots the link
    capacity and a flow then exits slow start with ssthresh set to a value
    above where congestion avoidance would like to adjust it.

    Signed-off-by: Doug Leith
    Signed-off-by: David S. Miller

    Doug Leith
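
One way to picture the problem and a possible remedy: if slow start is active whenever cwnd is at or below ssthresh, then a congestion-avoidance decrease only sticks if ssthresh is pulled down with it. The sketch below assumes that slow-start condition and uses hypothetical `sketch_*` names; it is not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal stand-in for the relevant TCP state. */
struct sketch_tcp {
    unsigned int snd_cwnd;
    unsigned int snd_ssthresh;
};

/* Assumption: slow start runs while cwnd <= ssthresh. */
static int sketch_in_slow_start(const struct sketch_tcp *tp)
{
    return tp->snd_cwnd <= tp->snd_ssthresh;
}

/* Lower cwnd and keep ssthresh strictly below it, so the decrease is not
 * immediately undone by slow start. ssthresh is never raised here. */
static void sketch_vegas_lower_cwnd(struct sketch_tcp *tp,
                                    unsigned int new_cwnd)
{
    assert(tp != NULL);
    if (new_cwnd < 2)
        new_cwnd = 2;               /* keep cwnd - 1 meaningful */
    tp->snd_cwnd = new_cwnd;
    if (tp->snd_ssthresh > new_cwnd - 1)
        tp->snd_ssthresh = new_cwnd - 1;
}
```

Without the ssthresh adjustment, a flow that left slow start with ssthresh at 100 and then wanted a cwnd of 20 would be back in slow start on the next update.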
     
  • Since commit d253eee20195b25e298bf162a6e72f14bf4803e5 the single CAN
    identifier filter lists handle only non-RTR CAN frames.

    So we need to omit the check of these filter lists when receiving RTR
    CAN frames.

    Signed-off-by: Oliver Hartkopp
    Signed-off-by: David S. Miller

    Oliver Hartkopp
     
  • As reported by Hugo Dias, it is possible to cause a local denial
    of service by calling the svc_listen function twice on the same
    socket and reading /proc/net/atm/*vc

    Signed-off-by: Chas Williams
    Signed-off-by: David S. Miller

    Chas Williams
     

04 Dec, 2008

4 commits

  • I should have noticed this earlier... :-) The previous solution
    to URG+GSO/TSO will cause SACK block tcp_fragment to do zig-zag
    patterns, or even worse, a steep downward slope into packet
    counting because each skb pcount would be truncated to pcount
    of 2 and then the following fragments of the later portion would
    restore the window again.

    Basically this reverts "tcp: Do not use TSO/GSO when there is
    urgent data" (33cf71cee1). It also removes some unnecessary code
    from tcp_current_mss that didn't work as intended either (could
    be that something was changed down the road, or it might have
    been broken since the dawn of time) because it only works once
    urg is already written while this bug shows up starting from
    ~64k before the urg point.

    The retransmissions already are split to mss sized chunks, so
    only new data sending paths need splitting in case they have
    a segment otherwise suitable for gso/tso. The actual check
    can be improved to be more narrow but since this is late -rc
    already, I'll postpone thinking about the more fine-grained things.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     
  • * 'for-2.6.28' of git://linux-nfs.org/~bfields/linux:
    NLM: client-side nlm_lookup_host() should avoid matching on srcaddr
    nfsd: use of unitialized list head on error exit in nfs4recover.c
    Add a reference to sunrpc in svc_addsock
    nfsd: clean up grace period on early exit

    Linus Torvalds
     
  • Due to a wrong safety check in af_can.c it was not possible to filter
    for SFF frames with a specific CAN identifier without getting the
    same selected CAN identifier from a received EFF frame also.

    This fix has a minimum (but user visible) impact on the CAN filter
    API and therefore the CAN version is set to a new date.

    Indeed the 'old' API is still working as-is. But when now setting
    CAN_(EFF|RTR)_FLAG in can_filter.can_mask you might get less traffic
    than before - but still the stuff that you expected to get for your
    defined filter ...

    Thanks to Kurt Van Dijck for pointing at this issue and for the review.

    Signed-off-by: Oliver Hartkopp
    Acked-by: Kurt Van Dijck
    Signed-off-by: David S. Miller

    Oliver Hartkopp
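
The effect of setting the flag bits in can_filter.can_mask can be illustrated with the usual mask-and-compare rule for CAN filters: a frame matches when the received identifier and the filter identifier agree on every bit selected by the mask. The sketch below is user-space C with hypothetical `sketch_*` names; the constant values are meant to mirror CAN_EFF_FLAG, CAN_RTR_FLAG and CAN_SFF_MASK from linux/can.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values intended to mirror linux/can.h. */
#define SKETCH_CAN_EFF_FLAG 0x80000000U   /* extended (29-bit) frame format */
#define SKETCH_CAN_RTR_FLAG 0x40000000U   /* remote transmission request */
#define SKETCH_CAN_SFF_MASK 0x000007FFU   /* standard (11-bit) identifier */

struct sketch_can_filter {
    uint32_t can_id;
    uint32_t can_mask;
};

/* A frame matches when id and filter agree on all bits selected by mask. */
static bool sketch_can_filter_match(const struct sketch_can_filter *f,
                                    uint32_t rx_id)
{
    assert(f != NULL);
    return (rx_id & f->can_mask) == (f->can_id & f->can_mask);
}
```

With a mask of only `SKETCH_CAN_SFF_MASK`, a filter for SFF identifier 0x123 also accepts an EFF frame whose low 11 bits are 0x123. Adding the flag bits to the mask restricts the filter to plain SFF data frames, which is the "less traffic than before" case the message describes.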
     
  • Signed-off-by: Rémi Denis-Courmont
    Signed-off-by: David S. Miller

    remi.denis-courmont@nokia
     

03 Dec, 2008

2 commits

  • Fix a potential NULL pointer dereference seen when trying to remove a
    static label configuration with an invalid address/mask combination.

    Signed-off-by: Paul Moore
    Signed-off-by: David S. Miller

    Paul Moore
     
  • After flushing the SPD entries, dumping the SPD entries causes a kernel panic.

    The following commands reproduce the problem:

    - echo 'spdflush;' | setkey -c
    - echo 'spdadd 3ffe:501:ffff:ff01::/64 3ffe:501:ffff:ff04::/64 any -P out ipsec \
    ah/tunnel/3ffe:501:ffff:ff00:200:ff:fe00:b0b0-3ffe:501:ffff:ff02:200:ff:fe00:a1a1/require;\
    spddump;' | setkey -c
    - echo 'spdflush; spddump;' | setkey -c
    - echo 'spdadd 3ffe:501:ffff:ff01::/64 3ffe:501:ffff:ff04::/64 any -P out ipsec \
    ah/tunnel/3ffe:501:ffff:ff00:200:ff:fe00:b0b0-3ffe:501:ffff:ff02:200:ff:fe00:a1a1/require;\
    spddump;' | setkey -c

    This is because, when the SPD entries are flushed, the SPD entry is not removed
    from the list.

    This patch fixes the problem by removing the SPD entry from the list.

    Signed-off-by: Wei Yongjun
    Signed-off-by: David S. Miller

    Wei Yongjun
     

27 Nov, 2008

2 commits

  • This is an implementation of David Miller's suggested fix in:
    https://bugzilla.redhat.com/show_bug.cgi?id=470201

    It has been updated to use wait_event() instead of
    wait_event_interruptible().

    Paraphrasing the description from the above report, it makes sendmsg()
    block while UNIX garbage collection is in progress. This avoids a
    situation where child processes continue to queue new FDs over a
    AF_UNIX socket to a parent which is in the exit path and running
    garbage collection on these FDs. This contention can result in soft
    lockups and oom-killing of unrelated processes.

    Signed-off-by: dann frazier
    Signed-off-by: David S. Miller

    dann frazier
     
  • A NULL dereference would occur when trying to delete an address from a
    network device that does not have any Phonet address.

    Signed-off-by: Rémi Denis-Courmont
    Signed-off-by: David S. Miller

    Rémi Denis-Courmont
     

26 Nov, 2008

5 commits


25 Nov, 2008

4 commits

  • Since changeset e79ad711a0108475c1b3a03815527e7237020b08 from mainline,
    from David S. Miller,
    empty packets can be transmitted on connected sockets for datagram protocols.

    However, this patch broke a high level application using ROSE network protocol with connected datagram.

    Bulletin Board Stations perform bulletins forwarding between BBS stations via ROSE network using a forward protocol.
    Now, if for some reason, a buffer in the application software happens to be empty at a specific moment,
    ROSE sends an empty packet via unfiltered packet socket.
    When received, this ROSE packet introduces perturbations of data exchange of BBS forwarding,
    for the application message forwarding protocol is waiting for something else.
    We agree that a more careful programming of the application protocol would avoid this situation and we are
    willing to debug it.
    But, as an empty frame is no use and does not have any meaning for ROSE protocol,
    we may consider filtering zero length data both when sending and receiving socket data.

    The proposed patch repairs BBS data exchange through the ROSE network, which has been broken since the 2.6.22.11 kernel.

    Signed-off-by: Bernard Pidoux
    Signed-off-by: David S. Miller

    Bernard Pidoux
     
  • As GRE tries to call the update_pmtu function on skb->dst and
    bridge supplies an skb->dst that has a NULL ops field, all is
    not well.

    This patch fixes this by giving the bridge device an ops field
    with an update_pmtu function. For the moment I've left all
    other fields blank but we can fill them in later should the
    need arise.

    Based on report and patch by Philip Craig.

    Signed-off-by: Herbert Xu
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Conntrack creation through ctnetlink has two races:

    - the timer may expire and free the conntrack concurrently, causing an
    invalid memory access when attempting to put it in the hash tables

    - an identical conntrack entry may be created in the packet processing
    path in the time between the lookup and hash insertion

    Hold the conntrack lock between the lookup and insertion to avoid this.

    Reported-by: Zoltan Borbely
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • The svc_addsock function adds transport instances without taking a
    reference on the sunrpc.ko module, however, the generic transport
    destruction code drops a reference when a transport instance
    is destroyed.

    Add a try_module_get call to the svc_addsock function for transport
    instances added by this function.

    Signed-off-by: Tom Tucker
    Signed-off-by: J. Bruce Fields
    Tested-by: Jeff Moyer

    Tom Tucker
     

22 Nov, 2008

2 commits

  • If the slub allocator is used, kmem_cache_create() may merge two or more
    kmem_cache's into one but the cache name pointer is not updated and
    kmem_cache_name() is no longer guaranteed to return the pointer passed
    to the former function. This patch stores the kmalloc'ed pointers in the
    corresponding request_sock_ops and timewait_sock_ops structures.

    Signed-off-by: Catalin Marinas
    Acked-by: Arnaldo Carvalho de Melo
    Reviewed-by: Christoph Lameter
    Signed-off-by: David S. Miller

    Catalin Marinas
     
  • This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=12014

    Since most (if not all) implementations of TSO and even the in-kernel
    software GSO do not update the urgent pointer when splitting a large
    segment, it is necessary to turn off TSO/GSO for all outgoing traffic
    with the URG pointer set.

    Looking at tcp_current_mss (and the preceding comment) I even think
    this was the original intention. However, this approach is insufficient,
    because TSO/GSO is turned off only for newly created frames, not for
    frames which were already pending at the arrival of a message with
    MSG_OOB set. These frames were created when TSO/GSO was enabled,
    so they may be large, and they will have the urgent pointer set
    in tcp_transmit_skb().

    With this patch, such large packets will be fragmented again before
    going to the transmit routine.

    As a side note, at least the following NICs are known to screw up
    the urgent pointer in the TCP header when doing TSO:

    Intel 82566MM (PCI ID 8086:1049)
    Intel 82566DC (PCI ID 8086:104b)
    Intel 82541GI (PCI ID 8086:1076)
    Broadcom NetXtreme II BCM5708 (PCI ID 14e4:164c)

    Signed-off-by: Petr Tesarik
    Signed-off-by: David S. Miller

    Petr Tesarik
     

21 Nov, 2008

2 commits

  • Fix a regression reported by Max Kellermann whereby kernel profiling
    showed that his clients were spending 45% of their time in
    rpcauth_lookup_credcache.

    It turns out that although his processes had identical uid/gid/groups,
    generic_match() was failing to detect this, because the task->group_info
    pointers were not shared. This again led to the creation of a huge number
    of identical credentials at the RPC layer.

    The regression is fixed by comparing the contents of task->group_info
    if the actual pointers are not identical.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Linus Torvalds

    Trond Myklebust
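
The comparison strategy described above (fast path on pointer identity, fall back to comparing contents) can be sketched as follows. The structure is a simplified, hypothetical stand-in for the kernel's group_info, which is laid out differently, and the function name is made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-in for a supplementary group list. */
struct sketch_group_info {
    size_t ngroups;
    const unsigned int *gids;
};

/* Two group lists match if they are the same object, or if they hold the
 * same number of groups with identical ids in the same order. */
static bool sketch_groups_match(const struct sketch_group_info *a,
                                const struct sketch_group_info *b)
{
    if (a == b)
        return true;                /* shared pointer: cheap fast path */
    if (a == NULL || b == NULL)
        return false;
    if (a->ngroups != b->ngroups)
        return false;
    if (a->ngroups == 0)
        return true;                /* both empty; nothing to compare */
    return memcmp(a->gids, b->gids, a->ngroups * sizeof(*a->gids)) == 0;
}
```

Comparing only the pointers would treat two processes with identical but separately allocated group lists as different, which is the behaviour that produced the flood of duplicate credentials.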
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (23 commits)
    net: fix tiny output corruption of /proc/net/snmp6
    atl2: don't request irq on resume if netif running
    ipv6: use seq_release_private for ip6mr.c /proc entries
    pkt_sched: fix missing check for packet overrun in qdisc_dump_stab()
    smc911x: Fix printf format typo in smc911x driver.
    asix: Fix asix-based cards connecting to 10/100Mbs LAN.
    mv643xx_eth: fix recycle check bound
    mv643xx_eth: fix the order of mdiobus_{unregister, free}() calls
    sh: sh_eth: Update to change of mii_bus
    TPROXY: supply a struct flowi->flags argument in inet_sk_rebuild_header()
    TPROXY: fill struct flowi->flags in udp_sendmsg()
    net: ipg.c fix bracing on endian swapping
    phylib: Fix auto-negotiation restart avoidance
    net: jme.c rxdesc.flags is __le16, other missing endian swaps
    phylib: fix phy name example in documentation
    net: Do not fire linkwatch events until the device is registered.
    phonet: fix compilation with gcc-3.4
    ixgbe: fix compilation with gcc-3.4
    pktgen: fix multiple queue warning
    net: fix ip_mr_init() error path
    ...

    Linus Torvalds
     

20 Nov, 2008

3 commits