27 Jun, 2006

1 commit


26 Jun, 2006

3 commits

  • * git://git.linux-nfs.org/pub/linux/nfs-2.6: (51 commits)
    nfs: remove nfs_put_link()
    nfs-build-fix-99
    git-nfs-build-fixes
    Merge branch 'odirect'
    NFS: alloc nfs_read/write_data as direct I/O is scheduled
    NFS: Eliminate nfs_get_user_pages()
    NFS: refactor nfs_direct_free_user_pages
    NFS: remove user_addr, user_count, and pos from nfs_direct_req
    NFS: "open code" the NFS direct write rescheduler
    NFS: Separate functions for counting outstanding NFS direct I/Os
    NLM: Fix reclaim races
    NLM: sem to mutex conversion
    locks.c: add the fl_owner to nlm_compare_locks
    NFS: Display the chosen RPCSEC_GSS security flavour in /proc/mounts
    NFS: Split fs/nfs/inode.c
    NFS: Fix typo in nfs_do_clone_mount()
    NFS: Fix compile errors introduced by referrals patches
    NFSv4: Ensure that referral mounts bind to a reserved port
    NFSv4: A root pathname is sent as a zero component4
    NFSv4: Follow a referral
    ...

    Linus Torvalds
     
  • There are several instances of per_cpu(foo, raw_smp_processor_id()), which
    is semantically equivalent to __get_cpu_var(foo) but without the warning
    that smp_processor_id() can give if CONFIG_DEBUG_PREEMPT is enabled. For
    those architectures with optimized per-cpu implementations, namely ia64,
    powerpc, s390, sparc64 and x86_64, per_cpu() expands to more, and slower,
    code than __get_cpu_var(), so it would be preferable to use __get_cpu_var()
    on those platforms.

    This defines a __raw_get_cpu_var(x) macro which expands to per_cpu(x,
    raw_smp_processor_id()) on architectures that use the generic per-cpu
    implementation, and to __get_cpu_var(x) on architectures that have an
    optimized per-cpu implementation (see the sketch below this entry).

    Signed-off-by: Paul Mackerras
    Acked-by: David S. Miller
    Acked-by: Ingo Molnar
    Acked-by: Martin Schwidefsky
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Mackerras
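
    The shape of the new macro, as described above, is roughly the following (a
    sketch of the behaviour the commit describes, not the exact upstream diff;
    file placement and architecture guards are omitted):

        /* In the generic implementation, indexing per_cpu() with the raw CPU id
         * avoids the CONFIG_DEBUG_PREEMPT warning that smp_processor_id()
         * would trigger: */
        #define __raw_get_cpu_var(var)  per_cpu(var, raw_smp_processor_id())

        /* On architectures with an optimized per-cpu implementation (ia64,
         * powerpc, s390, sparc64, x86_64), it is simply the fast accessor: */
        #define __raw_get_cpu_var(var)  __get_cpu_var(var)

    Callers then write __raw_get_cpu_var(foo) in place of
    per_cpu(foo, raw_smp_processor_id()).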
     
  • Convert a few stragglers over to for_each_possible_cpu(), remove
    for_each_cpu().

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

25 Jun, 2006

1 commit


23 Jun, 2006

15 commits

  • * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
    [NET]: Require CAP_NET_ADMIN to create tuntap devices.
    [NET]: fix net-core kernel-doc
    [TCP]: Move inclusion of <linux/dma-mapping.h> to correct place in <linux/tcp.h>
    [IPSEC]: Handle GSO packets
    [NET]: Added GSO toggle
    [NET]: Add software TSOv4
    [NET]: Add generic segmentation offload
    [NET]: Merge TSO/UFO fields in sk_buff
    [NET]: Prevent transmission after dev_deactivate
    [IPV6] ADDRCONF: Fix default source address selection without CONFIG_IPV6_PRIVACY
    [IPV6]: Fix source address selection.
    [NET]: Avoid allocating skb in skb_pad

    Linus Torvalds
     
  • list_splice_init(list, head) does unneeded work when it is known that
    list_empty(head) == 1. We can use list_replace_init() instead (see the
    sketch below this entry).

    Signed-off-by: Oleg Nesterov
    Acked-by: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
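
    A before/after sketch of the pattern this enables (the queue and list names
    are illustrative, not taken from the patch):

        LIST_HEAD(local_list);

        /* Before: splice the pending entries onto a head that is known to be
         * empty; list_splice_init() does merge work on the destination that is
         * unnecessary in that case. */
        spin_lock_irq(&queue->lock);
        list_splice_init(&queue->pending, &local_list);
        spin_unlock_irq(&queue->lock);

        /* After: list_replace_init() simply overwrites the destination head and
         * reinitializes the old one, skipping the redundant splice work. */
        spin_lock_irq(&queue->lock);
        list_replace_init(&queue->pending, &local_list);
        spin_unlock_irq(&queue->lock);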
     
  • Default values for boolean and tristate options can only be 'y', 'm' or 'n'.
    This patch removes the wrong default for IP_DCCP_ACKVEC.

    Signed-off-by: Jean-Luc Leger
    Cc: Arnaldo Carvalho de Melo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jean-Luc Leger
     
  • Extend the get_sb() filesystem operation to take an extra argument that
    permits the VFS to pass in the target vfsmount that defines the mountpoint.

    The filesystem is then required to manually set the superblock and root dentry
    pointers. For most filesystems, this should be done with simple_set_mnt()
    which will set the superblock pointer and then set the root dentry to the
    superblock's s_root (as per the old default behaviour).

    The get_sb() op now returns an integer as there's now no need to return the
    superblock pointer.

    This patch permits a superblock to be implicitly shared amongst several mount
    points, such as can be done with NFS to avoid potential inode aliasing. In
    such a case, simple_set_mnt() would not be called, and instead the mnt_root
    and mnt_sb would be set directly.

    The patch also makes the following changes:

    (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
    pointer argument and return an integer, so most filesystems have to change
    very little.

    (*) If one of the convenience functions is not used, then get_sb() should
    normally call simple_set_mnt() to instantiate the vfsmount. This will
    always return 0, and so can be tail-called from get_sb() (see the sketch
    below this entry).

    (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
    dcache upon superblock destruction rather than shrink_dcache_anon().

    This is required because the superblock may now have multiple trees that
    aren't actually bound to s_root, but that still need to be cleaned up. The
    currently called functions assume that the whole tree is rooted at s_root,
    and that anonymous dentries are not the roots of trees, which results in
    dentries being left unculled.

    However, with the way NFS superblock sharing is currently set to be
    implemented, these assumptions are violated: the root of the filesystem is
    simply a dummy dentry and inode (the real inode for '/' may well be
    inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
    with child trees.

    [*] Anonymous until discovered from another tree.

    (*) The documentation has been adjusted, including the additional change of
    ext2_* into foo_*.

    [akpm@osdl.org: convert ipath_fs, do other stuff]
    Signed-off-by: David Howells
    Acked-by: Al Viro
    Cc: Nathan Scott
    Cc: Roland Dreier
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
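
    For a typical filesystem the conversion looks roughly like this (a sketch
    only; foo_*/bar_* are placeholder names, and the exact argument order of
    the converted convenience helpers should be checked against the tree):

        /* Convenience-helper case: the get_sb_*() helpers now also take 'mnt'
         * and return an int, so the filesystem changes very little. */
        static int foo_get_sb(struct file_system_type *fs_type, int flags,
                              const char *dev_name, void *data,
                              struct vfsmount *mnt)
        {
                return get_sb_nodev(fs_type, flags, data, foo_fill_super, mnt);
        }

        /* Hand-rolled case: build the superblock, then let simple_set_mnt()
         * point mnt->mnt_sb at it and mnt->mnt_root at sb->s_root.  It always
         * returns 0, so it can be tail-called from get_sb(). */
        static int bar_get_sb(struct file_system_type *fs_type, int flags,
                              const char *dev_name, void *data,
                              struct vfsmount *mnt)
        {
                struct super_block *sb = bar_build_super(data);  /* placeholder */

                if (IS_ERR(sb))
                        return PTR_ERR(sb);
                return simple_set_mnt(mnt, sb);
        }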
     
  • Warning(/var/linsrc/linux-2617-g4//include/linux/skbuff.h:304): No description found for parameter 'dma_cookie'
    Warning(/var/linsrc/linux-2617-g4//include/net/sock.h:1274): No description found for parameter 'copied_early'
    Warning(/var/linsrc/linux-2617-g4//net/core/dev.c:3309): No description found for parameter 'chan'
    Warning(/var/linsrc/linux-2617-g4//net/core/dev.c:3309): No description found for parameter 'event'

    Signed-off-by: Randy Dunlap
    Signed-off-by: David S. Miller

    Randy Dunlap
     
  • This patch segments GSO packets received by the IPsec stack. This can
    happen when a NIC driver injects GSO packets into the stack which are
    then forwarded to another host.

    The primary application of this is going to be Xen where its backend
    driver may inject GSO packets into dom0.

    Of course this also can be used by other virtualisation schemes such as
    VMware or UML, since the tap device could be modified to inject GSO packets
    received through splice.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch adds a generic segmentation offload toggle that can be turned
    on/off for each net device. For now it is only supported for TCPv4.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch adds the GSO implementation for IPv4 TCP.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch adds the infrastructure for generic segmentation offload.
    The idea is to tap into the potential savings of TSO without hardware
    support by postponing the allocation of segmented skb's until just
    before the entry point into the NIC driver.

    The same structure can be used to support software IPv6 TSO, as well as
    UFO and segmentation offload for other relevant protocols, e.g., DCCP.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Having separate fields in sk_buff for TSO/UFO (tso_size/ufo_size) is not
    going to scale if we add any more segmentation methods (e.g., DCCP). So
    let's merge them.

    They were used to indicate the protocol of a packet. That role has been
    subsumed by the new gso_type field. This is essentially a set of netdev
    feature bits (shifted by 16 bits) that are required to process a specific
    skb. As such it's easy to tell whether a given device can process a GSO
    skb: you just AND the gso_type field with the netdev's features field
    (see the sketch below this entry).

    I've made gso_type a conjunction. The idea is that you have a base type
    (e.g., SKB_GSO_TCPV4) that can be modified further to support new features.
    For example, if we add a hardware TSO type that supports ECN, they would
    declare NETIF_F_TSO | NETIF_F_TSO_ECN. All TSO packets with CWR set would
    have a gso_type of SKB_GSO_TCPV4 | SKB_GSO_TCPV4_ECN while all other TSO
    packets would be SKB_GSO_TCPV4. This means that only the CWR packets need
    to be emulated in software.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
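
    The "AND the gso_type with the features" test amounts to something like the
    following (a sketch of the idea described above, not necessarily the exact
    helper added by the patch):

        /* Can this device handle the skb's segmentation type in hardware?
         * gso_type is a set of NETIF_F_* bits shifted down by 16
         * (NETIF_F_GSO_SHIFT), so shift it back up and mask. */
        static inline int dev_gso_ok(struct net_device *dev, struct sk_buff *skb)
        {
                int feature = skb_shinfo(skb)->gso_type << NETIF_F_GSO_SHIFT;

                return (dev->features & feature) == feature;
        }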
     
  • The dev_deactivate function has bit-rotted since the introduction of
    lockless drivers. In particular, the spin_unlock_wait call at the end
    has no effect on the xmit routine of lockless drivers.

    With a little bit of work, we can make it much more useful by providing
    the guarantee that when it returns, no more calls to the xmit routine
    of the underlying driver will be made.

    The idea is simple. There are two entry points in to the xmit routine.
    The first comes from dev_queue_xmit. That one is easily stopped by
    using synchronize_rcu. This works because we set the qdisc to noop_qdisc
    before the synchronize_rcu call. That in turn causes all subsequent
    packets sent to dev_queue_xmit to be dropped. The synchronize_rcu call
    also ensures all outstanding calls leave their critical section.

    The other entry point is from qdisc_run. Since we now have a bit that
    indicates whether it is running, all we have to do is wait until the
    bit is off (see the sketch below this entry).

    I've removed the loop to wait for __LINK_STATE_SCHED to clear. This is
    useless because netif_wake_queue can cause it to be set again. It is
    also harmless because we've disarmed qdisc_run.

    I've also removed the spin_unlock_wait on xmit_lock because its only
    purpose, making sure that all outstanding xmit_lock holders have
    exited, is also served by dev_watchdog_down.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
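
    A sketch of the second half of that scheme, waiting for the running bit to
    drop (the wrapper name is made up; the bit follows the qdisc_run change
    described elsewhere in this log):

        /* After the qdisc has been switched to noop_qdisc and synchronize_rcu()
         * has flushed the dev_queue_xmit() path, just wait for any in-flight
         * qdisc_run() to finish. */
        static void wait_for_qdisc_idle(struct net_device *dev)
        {
                while (test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state))
                        yield();        /* let the owning CPU clear the bit */
        }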
     
  • We need to update hiscore.rule even if we don't enable CONFIG_IPV6_PRIVACY,
    because there is a less significant rule that still applies: longest match.

    Signed-off-by: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    YOSHIFUJI Hideaki
     
  • Two additional labels (RFC 3484, sec. 10.3) for IPv6 addresses are defined
    to distinguish global unicast addresses from Unique Local Addresses
    (fc00::/7, RFC 4193) and Teredo (2001::/32, RFC 4380). This is necessary to
    avoid connection attempts that would either fail (e.g. fec0:: to
    2001:feed::) or be sub-optimal (2001:0:: to 2001:feed::).

    Signed-off-by: Łukasz Stelmach
    Signed-off-by: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    Łukasz Stelmach
     
  • First of all it is unnecessary to allocate a new skb in skb_pad since
    the existing one is not shared. More importantly, our hard_start_xmit
    interface does not allow a new skb to be allocated since that breaks
    requeueing.

    This patch uses pskb_expand_head to expand the existing skb and linearize
    it if needed (see the sketch below this entry). Actually, someone should
    sift through every instance of skb_pad on a non-linear skb, as they do not
    fit the reasons why this function was originally created.

    Incidentally, this fixes a minor bug when the skb is cloned (tcpdump,
    TCP, etc.). As it is, skb_pad will simply write over a cloned skb. Because
    of the position of the write it is unlikely to cause problems, but it's
    still best if we don't do it.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
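
    A rough sketch of the in-place approach for the simple case (the helper
    name is made up, and it assumes a linear skb; the real fix also has to cope
    with paged data):

        /* Pad the frame in place instead of allocating a replacement skb, so
         * the caller -- and any later requeueing -- keeps the same skb pointer. */
        static int pad_skb_in_place(struct sk_buff *skb, int pad)
        {
                /* Fast path: private skb with enough tailroom already. */
                if (!skb_cloned(skb) && skb_tailroom(skb) >= pad) {
                        memset(skb->data + skb->len, 0, pad);
                        return 0;
                }

                /* Otherwise grow the tail; this also un-shares a cloned head. */
                if (pskb_expand_head(skb, 0, pad, GFP_ATOMIC))
                        return -ENOMEM;

                memset(skb->data + skb->len, 0, pad);
                return 0;
        }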
     
  • Jeff Garzik
     

20 Jun, 2006

8 commits

  • Trond Myklebust
     
  • NIPQUAD expects an l-value of type __be32, _NOT_ a pointer to __be32.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Al Viro
     
  • sizeof(pointer) != sizeof(array)...

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Al Viro
     
  • …e/wireless-2.6 into upstream

    Jeff Garzik
     
  • Having two or more qdisc_run's contend against each other is bad because
    it can induce packet reordering if the packets have to be requeued. It
    appears that this is an unintended consequence of relinquishing the queue
    lock while transmitting. That in turn is needed for devices that spend a
    lot of time in their transmit routine.

    There are no advantages to be had as devices with queues are inherently
    single-threaded (the loopback device is not but then it doesn't have a
    queue).

    Even if you were to add a queue to a parallel virtual device (e.g., bolt
    a tbf filter in front of an ipip tunnel device), you would still want to
    process the queue in sequence to ensure that the packets are ordered
    correctly.

    The solution here is to steal a bit from net_device to prevent this (see
    the sketch below this entry).

    BTW, as qdisc_restart is no longer used by anyone as a module inside the
    kernel (IIRC it used to be, via netif_wake_queue), I have not exported the
    new __qdisc_run function.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
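
    The bit-stealing boils down to roughly this (a sketch; the real
    __qdisc_run() also has to clear the bit and re-check the queue when it
    finishes):

        /* Only the CPU that wins test_and_set_bit() processes the queue; any
         * other CPU just leaves its packet on the qdisc for the owner to send,
         * which keeps the packets in order. */
        static inline void qdisc_run(struct net_device *dev)
        {
                if (!netif_queue_stopped(dev) &&
                    !test_and_set_bit(__LINK_STATE_QDISC_RUNNING, &dev->state))
                        __qdisc_run(dev);
        }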
     
  • Fix endless loop in the SCTP match similar to those already fixed in
    the SCTP conntrack helper (was CVE-2006-1527).

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (46 commits)
    IB/uverbs: Don't serialize with ib_uverbs_idr_mutex
    IB/mthca: Make all device methods truly reentrant
    IB/mthca: Fix memory leak on modify_qp error paths
    IB/uverbs: Factor out common idr code
    IB/uverbs: Don't decrement usecnt on error paths
    IB/uverbs: Release lock on error path
    IB/cm: Use address handle helpers
    IB/sa: Add ib_init_ah_from_path()
    IB: Add ib_init_ah_from_wc()
    IB/ucm: Get rid of duplicate P_Key parameter
    IB/srp: Factor out common request reset code
    IB/srp: Support SRP rev. 10 targets
    [SCSI] srp.h: Add I/O Class values
    IB/fmr: Use device's max_map_per_fmr attribute in FMR pool.
    IB/mthca: Fill in max_map_per_fmr device attribute
    IB/ipath: Add client reregister event generation
    IB/mthca: Add client reregister event generation
    IB: Move struct port_info from ipath to
    IPoIB: Handle client reregister events
    IB: Add client reregister event type
    ...

    Linus Torvalds
     
  • * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (109 commits)
    [ETHTOOL]: Fix UFO typo
    [SCTP]: Fix persistent slowdown in sctp when a gap ack consumes rx buffer.
    [SCTP]: Send only 1 window update SACK per message.
    [SCTP]: Don't do CRC32C checksum over loopback.
    [SCTP] Reset rtt_in_progress for the chunk when processing its sack.
    [SCTP]: Reject sctp packets with broadcast addresses.
    [SCTP]: Limit association max_retrans setting in setsockopt.
    [PFKEYV2]: Fix inconsistent typing in struct sadb_x_kmprivate.
    [IPV6]: Sum real space for RTAs.
    [IRDA]: Use put_unaligned() in irlmp_do_discovery().
    [BRIDGE]: Add support for NETIF_F_HW_CSUM devices
    [NET]: Add NETIF_F_GEN_CSUM and NETIF_F_ALL_CSUM
    [TG3]: Convert to non-LLTX
    [TG3]: Remove unnecessary tx_lock
    [TCP]: Add tcp_slow_start_after_idle sysctl.
    [BNX2]: Update version and reldate
    [BNX2]: Use CPU native page size
    [BNX2]: Use compressed firmware
    [BNX2]: Add firmware decompression
    [BNX2]: Allow WoL settings on new 5708 chips
    ...

    Manual fixup for conflict in drivers/net/tulip/winbond-840.c

    Linus Torvalds
     

18 Jun, 2006

12 commits

  • The function ethtool_get_ufo was referring to ETHTOOL_GTSO instead of
    ETHTOOL_GUFO.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • In the event that our entire receive buffer is full with a series of
    chunks that represent a single gap-ack, and then we accept a chunk
    (or chunks) that fill in the gap between the ctsn and the first gap,
    we renege chunks from the end of the buffer, which effectively does
    nothing but move our gap to the end of our received tsn stream. This
    does little but move our missing tsns downstream a little, and, if the
    sender is sending sufficiently large retransmit frames, the result is a
    perpetual slowdown which can never be recovered from, since the only
    chunk that can be accepted to allow progress in the tsn stream necessitates
    that a new gap be created to make room for it. This leads to a constant
    need for retransmits, and subsequent receiver stalls. The fix I've come up
    with is to deliver the frame without reneging if we have a full receive
    buffer and the receiving socket's sk_receive_queue is empty (indicating
    that the receive buffer is being blocked by a missing tsn).

    Signed-off-by: Neil Horman
    Signed-off-by: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Neil Horman
     
  • Right now, every time we increase our rwnd by more than MTU bytes, we
    trigger a SACK. When processing large messages, this will generate a
    SACK for almost every other SCTP fragment. However since we are freeing
    the entire message at the same time, we might as well collapse the SACK
    generation to 1.

    Signed-off-by: Tsutomu Fujii
    Signed-off-by: Vlad Yasevich
    Signed-off-by: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Tsutomu Fujii
     
  • Signed-off-by: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Sridhar Samudrala
     
  • Signed-off-by: Vlad Yasevich
    Signed-off-by: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • Signed-off-by: Vlad Yasevich
    Signed-off-by: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • When using the ASSOCINFO socket option, we need to limit the number of
    maximum association retransmissions to be no greater than the sum
    of all the path retransmissions. This is specified in Section 7.1.2
    of the SCTP socket API draft.
    However, we only do this if the association has multiple paths. If
    there is only one path, the protocol stack will use the
    assoc_max_retrans setting when trying to retransmit packets.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • This patch fixes RTNLGRP_IPV6_IFINFO netlink notifications. Issue
    pointed out by Patrick McHardy.

    Signed-off-by: YOSHIFUJI Hideaki
    Acked-by: Patrick McHardy
    Signed-off-by: David S. Miller

    YOSHIFUJI Hideaki
     
  • irda_device_info->hints[] is byte-aligned but is being accessed as a u16.

    Based upon a patch by Luke Yang.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • As it is, the bridge will only ever declare NETIF_F_IP_CSUM even if all
    its constituent devices support NETIF_F_HW_CSUM. This patch fixes
    this by supporting the first one out of NETIF_F_NO_CSUM,
    NETIF_F_HW_CSUM, and NETIF_F_IP_CSUM that is supported by all
    constituent devices.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • The current stack treats NETIF_F_HW_CSUM and NETIF_F_NO_CSUM
    identically, so we test for them in quite a few places. For the sake
    of brevity, I'm adding the macro NETIF_F_GEN_CSUM for these two. We
    also test the disjunction of NETIF_F_IP_CSUM and the other two in various
    places; for that purpose I've added NETIF_F_ALL_CSUM (see the definitions
    sketched below this entry).

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
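
    The two macros are just unions of the existing feature bits, along these
    lines:

        /* Devices that can checksum anything: no checksum needed at all, or
         * full hardware checksumming over arbitrary protocols... */
        #define NETIF_F_GEN_CSUM        (NETIF_F_NO_CSUM | NETIF_F_HW_CSUM)
        /* ...and additionally those that can at least checksum IPv4. */
        #define NETIF_F_ALL_CSUM        (NETIF_F_IP_CSUM | NETIF_F_GEN_CSUM)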
     
  • A lot of people have asked for a way to disable tcp_cwnd_restart(),
    and it seems reasonable to add a sysctl to do that.

    Signed-off-by: David S. Miller

    David S. Miller