30 Jun, 2006

5 commits

  • skb_release_data() no longer has any users in other files.

    Signed-off-by: Adrian Bunk
    Signed-off-by: David S. Miller

    Adrian Bunk
     
  • This patch implements an API whereby an application can determine the
    label of its peer's Unix datagram sockets via the auxiliary data mechanism of
    recvmsg.

    Patch purpose:

    This patch enables a security-aware application to retrieve the
    security context of the peer of a Unix datagram socket. The application
    can then use this security context to determine the security context for
    processing on behalf of the peer who sent the packet.

    Patch design and implementation:

    The design and implementation are very similar to the UDP case for INET
    sockets. Basically we build upon the existing Unix domain socket API for
    retrieving user credentials. Linux offers the API for obtaining user
    credentials via ancillary messages (i.e., out-of-band/control messages
    that are bundled together with a normal message). To retrieve the security
    context, the application first asks the kernel to deliver it by setting
    the SO_PASSSEC option via setsockopt. Then the application retrieves the
    security context using the auxiliary data mechanism.

    An example server application for Unix datagram socket should look like this:

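    int toggle = 1;
    socklen_t toggle_len = sizeof(toggle);

    /* setsockopt takes the option length by value, not by pointer */
    setsockopt(sockfd, SOL_SOCKET, SO_PASSSEC, &toggle, toggle_len);
    /* msg_hdr.msg_control must point at a buffer big enough for the label */
    recvmsg(sockfd, &msg_hdr, 0);
    if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
        cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
        if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
            cmsg_hdr->cmsg_level == SOL_SOCKET &&
            cmsg_hdr->cmsg_type == SCM_SECURITY) {
            /* copy only the label bytes the kernel actually delivered */
            memcpy(scontext, CMSG_DATA(cmsg_hdr),
                   cmsg_hdr->cmsg_len - CMSG_LEN(0));
        }
    }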

    sock_setsockopt is enhanced to handle the new SO_PASSSEC socket option,
    which sets the SOCK_PASSSEC flag and allows a server socket to receive
    the security context of the peer.

    Testing:

    We have tested the patch by setting up Unix datagram client and server
    applications. We verified that the server can retrieve the security context
    using the auxiliary data mechanism of recvmsg.

    Signed-off-by: Catherine Zhang
    Acked-by: James Morris
    Signed-off-by: David S. Miller

    Catherine Zhang
     
  • Rather than having illegal_highdma as a macro when HIGHMEM is off, we
    can turn it into an inline function that returns zero. This will catch
    callers that give it bad arguments.
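    To make the difference concrete, here is a small userspace sketch. The two
    structs are hypothetical stand-ins for the kernel types and the `_macro`
    name is invented for the comparison; this is not the actual kernel code.
    The macro discards its arguments at preprocessing time, so even nonsense
    arguments compile, while the inline function still returns zero but lets
    the compiler type-check every call.

```c
/* Hypothetical stand-ins for the kernel types. */
struct net_device { int features; };
struct sk_buff { int nr_frags; };

/* Old style: the preprocessor throws the arguments away, so they are never
 * type-checked (or even evaluated). */
#define illegal_highdma_macro(dev, skb) 0

/* New style: still always returns 0 without HIGHMEM, but every call is
 * checked against the prototype. */
static inline int illegal_highdma(struct net_device *dev, struct sk_buff *skb)
{
	(void)dev;
	(void)skb;
	return 0;
}
```

    With the inline version, passing for example an `int` where a
    `struct net_device *` is expected produces a compiler diagnostic, whereas
    `illegal_highdma_macro(42, "oops")` is silently accepted.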

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch encapsulates the usage of eff_cap (in netlink_skb_params) within
    the security framework by extending security_netlink_recv to include a required
    capability parameter and converting all direct usage of eff_cap outside
    of the LSM modules to use the interface. It also updates the SELinux
    implementation of the security_netlink_send and security_netlink_recv
    hooks to take advantage of the sid in the netlink_skb_params struct.
    This also enables SELinux to perform auditing of netlink capability checks.
    Please apply, for 2.6.18 if possible.
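    A rough userspace model of the idea: a capability-only hook checks the
    sender's effective capabilities recorded in the netlink parameters, and
    callers pass the capability they require instead of testing eff_cap
    themselves. The types, `CAP_TO_MASK` and `model_netlink_recv()` are
    simplified assumptions for illustration (a single 32-bit mask);
    `CAP_NET_ADMIN` is 12 as in `<linux/capability.h>`. An LSM such as SELinux
    could add its own checks at the same point.

```c
#include <errno.h>
#include <stdint.h>

/* Simplified: all capabilities in one 32-bit mask. */
typedef uint32_t cap_mask_t;
#define CAP_TO_MASK(cap) ((cap_mask_t)1 << (cap))
#define CAP_NET_ADMIN 12   /* same number as in <linux/capability.h> */

/* Hypothetical stand-ins: only the field this example needs. */
struct netlink_skb_parms { cap_mask_t eff_cap; };
struct sk_buff { struct netlink_skb_parms cb; };

/* Capability-only hook: was 'cap' in the sender's effective set when the
 * message was sent? */
static int model_netlink_recv(const struct sk_buff *skb, int cap)
{
	if (!(skb->cb.eff_cap & CAP_TO_MASK(cap)))
		return -EPERM;
	return 0;
}

/* A netlink handler asks the hook instead of reading eff_cap itself. */
static int handle_admin_request(const struct sk_buff *skb)
{
	int err = model_netlink_recv(skb, CAP_NET_ADMIN);

	if (err)
		return err;
	/* ... process the privileged request ... */
	return 0;
}
```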

    Signed-off-by: Darrel Goeddel
    Signed-off-by: Stephen Smalley
    Acked-by: James Morris
    Signed-off-by: David S. Miller

    Darrel Goeddel
     
  • When GSO packets come from an untrusted source (e.g., a Xen guest domain),
    we need to verify the header integrity before passing it to the hardware.

    Since the first step in GSO is to verify the header, we can reuse that
    code by adding a new bit to gso_type: SKB_GSO_DODGY. Packets with this
    bit set can only be fed directly to devices with the corresponding bit
    NETIF_F_GSO_ROBUST. If the device doesn't have that bit, then the skb
    is fed to the GSO engine which will allow the packet to be sent to the
    hardware if it passes the header check.

    This patch changes the sg flag to a full features flag. The same method
    can be used to implement TSO ECN support. We simply have to mark packets
    with CWR set with SKB_GSO_ECN so that only hardware with a corresponding
    NETIF_F_TSO_ECN can accept them. The GSO engine can either fully segment
    the packet, or segment the first MTU and pass the rest to the hardware for
    further segmentation.
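    A simplified model of the dispatch decision, assuming the device already
    supports plain TSO. The bit values are illustrative, and `route_gso()`
    and its verdicts are hypothetical; `header_ok` stands in for the GSO
    engine's header verification. A packet goes straight to the hardware only
    if the device advertises every feature bit its gso_type implies, so a
    DODGY packet needs NETIF_F_GSO_ROBUST to skip the check.

```c
#define NETIF_F_GSO_SHIFT  16
#define SKB_GSO_TCPV4      (1 << 0)   /* illustrative bit values */
#define SKB_GSO_DODGY      (1 << 2)
#define NETIF_F_TSO        (SKB_GSO_TCPV4 << NETIF_F_GSO_SHIFT)
#define NETIF_F_GSO_ROBUST (SKB_GSO_DODGY << NETIF_F_GSO_SHIFT)

enum gso_verdict { SEND_DIRECT, SEND_AFTER_CHECK, DROP };

static enum gso_verdict route_gso(int gso_type, int dev_features, int header_ok)
{
	int required = gso_type << NETIF_F_GSO_SHIFT;

	/* Device has every required bit: it can cope even with dodgy headers. */
	if ((dev_features & required) == required)
		return SEND_DIRECT;
	/* Otherwise the GSO engine verifies the header before the hardware
	 * ever sees the packet. */
	return header_ok ? SEND_AFTER_CHECK : DROP;
}
```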

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

26 Jun, 2006

5 commits

  • The netpoll system currently has a rx to tx path via:

    netpoll_rx
    __netpoll_rx
    arp_reply
    netpoll_send_skb
    dev->hard_start_xmit

    This rx->tx loop places network drivers at risk of inadvertently causing a
    deadlock or BUG halt by recursively trying to acquire a spinlock that is
    used in both their rx and tx paths (this problem was originally reported
    to me in the 3c59x driver, which shares a spinlock between the
    boomerang_interrupt and boomerang_start_xmit routines).

    This patch breaks this loop by queueing ARP frames, so that they can be
    responded to after all receive operations have been completed. Tested by
    myself and the reporter with successful results.

    Specifically it was tested with netdump. Here's the BZ with details:
    https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=194055
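    The essence of the fix can be modelled in a few lines of userspace C: the
    receive path only queues ARP replies while the driver lock is held, and
    the queue is drained after receive processing has finished. All names
    here (`arp_queue`, `driver_rx()`, `service_arp_queue()` and so on) are
    hypothetical, and the counters merely record what would have happened on
    real hardware.

```c
#include <stddef.h>

#define ARP_QUEUE_LEN 16   /* hypothetical queue depth */

struct arp_queue {
	int pending[ARP_QUEUE_LEN];  /* stand-ins for queued ARP reply skbs */
	size_t count;
};

static int driver_lock_held;  /* models the lock shared by rx and tx paths */
static int replies_sent;
static int recursive_locks;   /* times tx ran while rx still held the lock */

static void driver_xmit(int reply)
{
	(void)reply;
	if (driver_lock_held) {
		recursive_locks++;   /* on real hardware: deadlock or BUG */
		return;
	}
	replies_sent++;
}

/* rx side: queue the reply instead of transmitting it immediately */
static void netpoll_rx_arp(struct arp_queue *q, int request)
{
	if (q->count < ARP_QUEUE_LEN)
		q->pending[q->count++] = request;
}

/* driver receive path: holds its lock while handing packets to netpoll */
static void driver_rx(struct arp_queue *q, int request)
{
	driver_lock_held = 1;
	netpoll_rx_arp(q, request);
	driver_lock_held = 0;
}

/* after rx has completed: send everything that was queued */
static void service_arp_queue(struct arp_queue *q)
{
	for (size_t i = 0; i < q->count; i++)
		driver_xmit(q->pending[i]);
	q->count = 0;
}
```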

    Signed-off-by: Neil Horman
    Acked-by: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Neil Horman
     
  • When transmitting a skb in netpoll_send_skb(), only retry a limited number
    of times if the device queue is stopped.
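    A minimal sketch of a bounded retry, assuming a hypothetical
    `MAX_TX_RETRIES` limit and a fake device whose queue stays stopped for a
    configurable number of polls. The retry count and back-off that netpoll
    actually uses are not shown here.

```c
#include <stdbool.h>

#define MAX_TX_RETRIES 5   /* hypothetical bound */

struct fake_dev {
	int stopped_for;   /* number of polls during which the queue is stopped */
	int transmitted;
};

static bool queue_stopped(struct fake_dev *dev)
{
	if (dev->stopped_for > 0) {
		dev->stopped_for--;
		return true;
	}
	return false;
}

/* Returns 0 on success, -1 if the queue stayed stopped for every attempt. */
static int send_with_retries(struct fake_dev *dev)
{
	for (int tries = 0; tries < MAX_TX_RETRIES; tries++) {
		if (!queue_stopped(dev)) {
			dev->transmitted++;
			return 0;
		}
		/* a real implementation would poll the device or delay here */
	}
	return -1;   /* give up instead of spinning forever */
}
```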

    Signed-off-by: Jeremy Fitzhardinge
    Acked-by: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Jeremy Fitzhardinge
     
  • skb_find_text takes a "to" argument which is supposed to limit how
    far into the skb it will search for the given text. At present,
    it seems to ignore that argument on the first skb, and instead
    return a match even if the text occurs beyond the limit.

    Patch below fixes this, after adjusting for the "from" starting
    point. This consequently fixes the netfilter string match's "--to"
    handling, which currently is broken.
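    A flat-buffer model of the corrected check (not the skb code itself, which
    feeds the search engine block by block): the scan may run past `to`, but
    any hit whose offset relative to `from` exceeds `to - from` is reported as
    no match (`UINT_MAX`). The function name `find_text()` is hypothetical.

```c
#include <limits.h>
#include <string.h>

/*
 * Search buf[from..len) for pat. Returns the match offset relative to 'from',
 * or UINT_MAX if there is no match or the match starts past the 'to' limit.
 */
static unsigned int find_text(const char *buf, unsigned int len,
			      unsigned int from, unsigned int to,
			      const char *pat)
{
	size_t plen = strlen(pat);

	if (from > to || from > len)
		return UINT_MAX;
	/* Like the first block in the original bug, this scan ignores 'to'. */
	for (unsigned int i = from; i + plen <= len; i++) {
		if (memcmp(buf + i, pat, plen) == 0) {
			unsigned int ret = i - from;
			/* The fix: reject a hit that lies beyond the limit. */
			return ret <= to - from ? ret : UINT_MAX;
		}
	}
	return UINT_MAX;
}
```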

    Signed-off-by: Phil Oester
    Signed-off-by: David S. Miller

    Phil Oester
     
  • netdev_nit can now become static.

    Signed-off-by: Adrian Bunk
    Signed-off-by: David S. Miller

    Adrian Bunk
     
  • Fix 2 problems in dev_hard_start_xmit():

    1. nskb->next needs to link back to skb->next if hard_start_xmit()
    returns non-zero.

    2. Since the total number of GSO fragments may exceed MAX_SKB_FRAGS + 1,
    it needs to stop transmitting if the netif_queue is stopped.
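    Both fixes can be seen in a userspace model of the segment loop.
    `struct sk_buff` here is a stripped-down stand-in with only `next` and an
    id, and `fake_xmit()` is a hypothetical device that accepts a fixed number
    of segments and then stops its queue.

```c
#include <stddef.h>

#define NETDEV_TX_OK   0
#define NETDEV_TX_BUSY 1

struct sk_buff { struct sk_buff *next; int id; };

struct fake_dev {
	int budget;          /* segments the hardware will still accept */
	int queue_stopped;   /* set once the budget is used up */
	int sent[16];
	int nsent;
};

static int fake_xmit(struct sk_buff *nskb, struct fake_dev *dev)
{
	if (dev->budget == 0)
		return NETDEV_TX_BUSY;
	dev->sent[dev->nsent++] = nskb->id;
	if (--dev->budget == 0)
		dev->queue_stopped = 1;
	return NETDEV_TX_OK;
}

/* 'skb' heads a list of GSO segments hanging off skb->next. */
static int xmit_gso_segments(struct sk_buff *skb, struct fake_dev *dev)
{
	do {
		struct sk_buff *nskb = skb->next;
		int rc;

		skb->next = nskb->next;   /* unlink the segment */
		nskb->next = NULL;
		rc = fake_xmit(nskb, dev);
		if (rc) {
			/* Fix 1: relink so the segment is not lost on requeue. */
			nskb->next = skb->next;
			skb->next = nskb;
			return rc;
		}
		/* Fix 2: stop as soon as the queue is stopped. */
		if (dev->queue_stopped && skb->next)
			return NETDEV_TX_BUSY;
	} while (skb->next);
	return NETDEV_TX_OK;
}
```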

    Signed-off-by: Michael Chan
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Michael Chan
     

23 Jun, 2006

9 commits

  • * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
    [NET]: Require CAP_NET_ADMIN to create tuntap devices.
    [NET]: fix net-core kernel-doc
    [TCP]: Move inclusion of to correct place in
    [IPSEC]: Handle GSO packets
    [NET]: Added GSO toggle
    [NET]: Add software TSOv4
    [NET]: Add generic segmentation offload
    [NET]: Merge TSO/UFO fields in sk_buff
    [NET]: Prevent transmission after dev_deactivate
    [IPV6] ADDRCONF: Fix default source address selection without CONFIG_IPV6_PRIVACY
    [IPV6]: Fix source address selection.
    [NET]: Avoid allocating skb in skb_pad

    Linus Torvalds
     
    list_splice_init(list, head) does unnecessary work if it is known that
    list_empty(head) == 1. We can use list_replace_init() instead.
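    For illustration, a minimal circular doubly linked list in the style of
    `<linux/list.h>`, reimplemented here, so treat it as a sketch. Because the
    destination head is known to be empty, `list_replace_init()` only has to
    re-point four links and reinitialise the old head, whereas a splice has to
    merge the moved entries with whatever the destination already holds.

```c
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *l)
{
	l->next = l;
	l->prev = l;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* new_head takes over every entry of old; its previous contents (if any)
 * are simply overwritten, and old is left as an empty list. */
static void list_replace_init(struct list_head *old, struct list_head *new_head)
{
	new_head->next = old->next;
	new_head->next->prev = new_head;
	new_head->prev = old->prev;
	new_head->prev->next = new_head;
	INIT_LIST_HEAD(old);
}
```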

    Signed-off-by: Oleg Nesterov
    Acked-by: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Warning(/var/linsrc/linux-2617-g4//include/linux/skbuff.h:304): No description found for parameter 'dma_cookie'
    Warning(/var/linsrc/linux-2617-g4//include/net/sock.h:1274): No description found for parameter 'copied_early'
    Warning(/var/linsrc/linux-2617-g4//net/core/dev.c:3309): No description found for parameter 'chan'
    Warning(/var/linsrc/linux-2617-g4//net/core/dev.c:3309): No description found for parameter 'event'

    Signed-off-by: Randy Dunlap
    Signed-off-by: David S. Miller

    Randy Dunlap
     
    This patch adds a generic segmentation offload toggle that can be turned
    on/off for each net device. For now it is only supported for TCPv4.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch adds the GSO implementation for IPv4 TCP.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch adds the infrastructure for generic segmentation offload.
    The idea is to tap into the potential savings of TSO without hardware
    support by postponing the allocation of segmented skbs until just
    before the entry point into the NIC driver.

    The same structure can be used to support software IPv6 TSO, as well as
    UFO and segmentation offload for other relevant protocols, e.g., DCCP.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Having separate fields in sk_buff for TSO/UFO (tso_size/ufo_size) is not
    going to scale if we add any more segmentation methods (e.g., DCCP). So
    let's merge them.

    They were used to tell the protocol of a packet. This function has been
    subsumed by the new gso_type field. This is essentially a set of netdev
    feature bits (shifted by 16 bits) that are required to process a specific
    skb. As such it's easy to tell whether a given device can process a GSO
    skb: you just have to AND the gso_type field (shifted back up by 16 bits)
    with the netdev's features field.

    I've made gso_type a conjunction. The idea is that you have a base type
    (e.g., SKB_GSO_TCPV4) that can be modified further to support new features.
    For example, if we add a hardware TSO type that supports ECN, drivers would
    declare NETIF_F_TSO | NETIF_F_TSO_ECN. All TSO packets with CWR set would
    have a gso_type of SKB_GSO_TCPV4 | SKB_GSO_TCPV4_ECN while all other TSO
    packets would be SKB_GSO_TCPV4. This means that only the CWR packets need
    to be emulated in software.
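    A small sketch of the AND test described above, using the ECN example.
    The `SKB_GSO_TCPV4_ECN` name comes from the text, but its bit position is
    a hypothetical choice, and `device_can_segment()` is an illustrative
    helper rather than a kernel function.

```c
#define NETIF_F_GSO_SHIFT  16
#define SKB_GSO_TCPV4      (1 << 0)
#define SKB_GSO_TCPV4_ECN  (1 << 3)   /* hypothetical bit position */
#define NETIF_F_TSO        (SKB_GSO_TCPV4 << NETIF_F_GSO_SHIFT)
#define NETIF_F_TSO_ECN    (SKB_GSO_TCPV4_ECN << NETIF_F_GSO_SHIFT)

/* gso_type holds feature bits shifted down by 16, so shifting it back up and
 * ANDing with the device features shows whether every needed bit is set. */
static int device_can_segment(int gso_type, int features)
{
	int required = gso_type << NETIF_F_GSO_SHIFT;

	return (features & required) == required;
}
```

    With this check, a plain TSO device still takes ordinary TSO packets, and
    only the CWR packets (which also carry the ECN bit) fall back to software.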

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • The dev_deactivate function has bit-rotted since the introduction of
    lockless drivers. In particular, the spin_unlock_wait call at the end
    has no effect on the xmit routine of lockless drivers.

    With a little bit of work, we can make it much more useful by providing
    the guarantee that when it returns, no more calls to the xmit routine
    of the underlying driver will be made.

    The idea is simple. There are two entry points into the xmit routine.
    The first comes from dev_queue_xmit. That one is easily stopped by
    using synchronize_rcu. This works because we set the qdisc to noop_qdisc
    before the synchronize_rcu call. That in turn causes all subsequent
    packets sent to dev_queue_xmit to be dropped. The synchronize_rcu call
    also ensures all outstanding calls leave their critical section.

    The other entry point is from qdisc_run. Since we now have a bit that
    indicates whether it's running, all we have to do is to wait until the
    bit is off.

    I've removed the loop to wait for __LINK_STATE_SCHED to clear. This is
    useless because netif_wake_queue can cause it to be set again. It is
    also harmless because we've disarmed qdisc_run.

    I've also removed the spin_unlock_wait on xmit_lock because its only
    purpose, making sure that all outstanding xmit_lock holders have
    exited, is also served by dev_watchdog_down.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • First of all it is unnecessary to allocate a new skb in skb_pad since
    the existing one is not shared. More importantly, our hard_start_xmit
    interface does not allow a new skb to be allocated since that breaks
    requeueing.

    This patch uses pskb_expand_head to expand the existing skb and linearize
    it if needed. Actually, someone should sift through every instance of
    skb_pad on a non-linear skb as they do not fit the reasons why this was
    originally created.

    Incidentally, this fixes a minor bug when the skb is cloned (tcpdump,
    TCP, etc.). As it stands, skb_pad simply writes over a cloned skb. Because
    of the position of the write it is unlikely to cause problems but still
    it's best if we don't do it.
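    A userspace model of padding in place, loosely in the spirit of using
    pskb_expand_head: the buffer object is kept, and it only gets new private
    storage when the current storage is too small or is shared with a clone.
    `struct buf`, its `cloned` flag (standing in for skb_cloned()) and
    `pad_in_place()` are hypothetical; like skb_pad, the sketch zeroes the
    pad bytes without changing `len`.

```c
#include <stdlib.h>
#include <string.h>

struct buf {
	unsigned char *data;
	size_t len;    /* bytes of packet data */
	size_t size;   /* bytes of storage behind data */
	int cloned;    /* storage shared with another buffer */
};

/* Zero 'pad' bytes after the data. Returns 0 on success, -1 on failure. */
static int pad_in_place(struct buf *b, size_t pad)
{
	size_t need = b->len + pad;

	if (b->cloned || b->size < need) {
		/* Give this buf private storage large enough for the padding,
		 * so storage shared with a clone is never written. */
		unsigned char *p = malloc(need);

		if (!p)
			return -1;
		memcpy(p, b->data, b->len);
		if (!b->cloned)
			free(b->data);   /* shared storage belongs to the clone too */
		b->data = p;
		b->size = need;
		b->cloned = 0;
	}
	memset(b->data + b->len, 0, pad);
	return 0;
}
```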

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

18 Jun, 2006

10 commits

  • The function ethtool_get_ufo was referring to ETHTOOL_GTSO instead of
    ETHTOOL_GUFO.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • The current stack treats NETIF_F_HW_CSUM and NETIF_F_NO_CSUM
    identically so we test for them in quite a few places. For the sake
    of brevity, I'm adding the macro NETIF_F_GEN_CSUM for these two. We
    also test the disjunction (bitwise OR) of NETIF_F_IP_CSUM and the other
    two in various places; for that purpose I've added NETIF_F_ALL_CSUM.
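    The two macros amount to OR-ing feature bits together. A sketch, with the
    individual bit values taken as illustrative (they match netdevice.h of
    that era as far as I know, but treat them as an assumption); the helper
    functions are hypothetical.

```c
#define NETIF_F_IP_CSUM  2   /* can checksum TCP/UDP over IPv4 only */
#define NETIF_F_NO_CSUM  4   /* does not require a checksum at all */
#define NETIF_F_HW_CSUM  8   /* can checksum any packet */

#define NETIF_F_GEN_CSUM (NETIF_F_NO_CSUM | NETIF_F_HW_CSUM)
#define NETIF_F_ALL_CSUM (NETIF_F_IP_CSUM | NETIF_F_GEN_CSUM)

/* Can the device handle checksums for an arbitrary protocol? */
static int can_csum_any(unsigned long features)
{
	return (features & NETIF_F_GEN_CSUM) != 0;
}

/* Does the device offer any form of checksum offload at all? */
static int has_csum_offload(unsigned long features)
{
	return (features & NETIF_F_ALL_CSUM) != 0;
}
```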

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
    It's better to warn and fail than to occasionally trigger a BUG on paths
    that incorrectly call skb_trim/__skb_trim on a non-linear skb.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • The linearisation operation doesn't need to be super-optimised. So we can
    replace __skb_linearize with __pskb_pull_tail which does the same thing but
    is more general.

    Also, most users of skb_linearize end up testing whether the skb is linear
    or not so it helps to make skb_linearize do just that.

    Some callers of skb_linearize also use it to copy cloned data, so it's
    useful to have a new function skb_linearize_cow to copy the data if it's
    either non-linear or cloned.

    Last but not least, I've removed the gfp argument since nobody uses it
    anymore. If it's ever needed we can easily add it back.

    Misc bugs fixed by this patch:

    * via-velocity error handling (also, no SG => no frags)
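    A userspace model of the two helpers' behaviour as described above, not
    the kernel source: the plain variant does work only for a non-linear
    buffer, and the copy-on-write variant also copies when the data is
    shared. The struct, `model_linearize()` (a stand-in for the real
    linearisation) and the helper names are hypothetical.

```c
/* Hypothetical, simplified skb: only the properties these helpers test. */
struct sk_buff {
	unsigned int data_len;  /* bytes held outside the linear area */
	int cloned;             /* data shared with another skb */
	int copies;             /* counts calls to the modelled slow path */
};

static int is_nonlinear(const struct sk_buff *skb)
{
	return skb->data_len != 0;
}

/* Stand-in for the real work: pull everything into one private buffer. */
static int model_linearize(struct sk_buff *skb)
{
	skb->data_len = 0;
	skb->cloned = 0;
	skb->copies++;
	return 0;
}

/* Does nothing (and returns 0) if the skb is already linear. */
static int linearize_if_needed(struct sk_buff *skb)
{
	return is_nonlinear(skb) ? model_linearize(skb) : 0;
}

/* Also copies shared data, so the caller may safely write to it. */
static int linearize_cow(struct sk_buff *skb)
{
	return is_nonlinear(skb) || skb->cloned ? model_linearize(skb) : 0;
}
```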

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Various drivers use xmit_lock internally to synchronise with their
    transmission routines. They do so without setting xmit_lock_owner.
    This is fine as long as netpoll is not in use.

    With netpoll it is possible for deadlocks to occur if xmit_lock_owner
    isn't set. This is because a printk that occurs while xmit_lock is held,
    with xmit_lock_owner not set, can cause netpoll to attempt to take
    xmit_lock recursively.

    While it is possible to resolve this by getting netpoll to use
    trylock, it is suboptimal because netpoll's sole objective is to
    maximise the chance of getting the printk out on the wire. So
    delaying or dropping the message is to be avoided as much as possible.

    So the only alternative is to always set xmit_lock_owner. The
    following patch does this by introducing the netif_tx_lock family of
    functions that take care of setting/unsetting xmit_lock_owner.

    I renamed xmit_lock to _xmit_lock to indicate that it should not be
    used directly. I didn't provide irq versions of the netif_tx_lock
    functions since xmit_lock is meant to be a BH-disabling lock.

    This is pretty much a straight text substitution except for a small
    bug fix in winbond. It currently uses
    netif_stop_queue/spin_unlock_wait to stop transmission. This is
    unsafe as an IRQ can potentially wake up the queue. So it is safer to
    use netif_tx_disable.

    The hamradio bits used spin_lock_irq but it is unnecessary as
    xmit_lock must never be taken in an IRQ handler.
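    A single-threaded model of why the owner field matters. `struct tx_lock`,
    `model_tx_lock()` and `netpoll_may_xmit()` are hypothetical: the lock
    helpers record the owner the way the netif_tx_lock family is described as
    doing, and the netpoll-style check refuses to transmit only when the
    current CPU already holds the lock. If a driver takes the lock directly
    without recording the owner, the check is fooled.

```c
/* 'locked' stands in for _xmit_lock, 'owner' for xmit_lock_owner
 * (-1 when not held). */
struct tx_lock {
	int locked;
	int owner;
};

/* Take the lock AND record which CPU holds it. */
static void model_tx_lock(struct tx_lock *l, int cpu)
{
	l->locked = 1;   /* a real spinlock would spin until it is free */
	l->owner = cpu;
}

static void model_tx_unlock(struct tx_lock *l)
{
	l->owner = -1;
	l->locked = 0;
}

/* Transmitting is only unsafe if this CPU already holds the lock, since
 * taking it again would deadlock. */
static int netpoll_may_xmit(const struct tx_lock *l, int this_cpu)
{
	return !(l->locked && l->owner == this_cpu);
}
```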

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
    Add a secmark field to the skbuff structure, to allow security subsystems to
    place security markings on network packets. This is similar to the nfmark
    field, except that it is intended for implementing security policy rather
    than networking policy.

    This patch was already acked in principle by Dave Miller.

    Signed-off-by: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    James Morris
     
    Any socket recv of less than this amount will not be offloaded

    Signed-off-by: Chris Leech
    Signed-off-by: David S. Miller

    Chris Leech
     
  • Adds an async_wait_queue and some additional fields to tcp_sock, and a
    dma_cookie_t to sk_buff.

    Signed-off-by: Chris Leech
    Signed-off-by: David S. Miller

    Chris Leech
     
  • Provides for pinning user space pages in memory, copying to iovecs,
    and copying from sk_buffs including fragmented and chained sk_buffs.

    Signed-off-by: Chris Leech
    Signed-off-by: David S. Miller

    Chris Leech
     
  • Attempts to allocate per-CPU DMA channels

    Signed-off-by: Chris Leech
    Signed-off-by: David S. Miller

    Chris Leech
     

27 May, 2006

1 commit


13 May, 2006

1 commit

  • The classical IP over ATM code maintains its own IPv4
    ARP table, using the standard neighbour-table code. The
    neigh_table_init function adds this neighbour table to a linked list
    of all neighbour tables which is used by the functions neigh_delete(),
    neigh_add() and neightbl_set(), all called by the netlink code.

    Once the ATM neighbour table is added to the list, there are two
    tables with family == AF_INET there, and ARP entries sent via netlink
    go into the first table with matching family. This is indeterminate
    and often wrong.

    To see the bug, on a kernel with CLIP enabled, create a standard IPv4
    ARP entry by pinging an unused address on a local subnet. Then attempt
    to complete that entry by doing

    ip neigh replace <address> lladdr <link-layer address> nud reachable

    Looking at the ARP tables by using

    ip neigh show

    will reveal two ARP entries for the same address. One of these can be
    found in /proc/net/arp, and the other in /proc/net/atm/arp.

    This patch adds a new function, neigh_table_init_no_netlink(), which
    does everything neigh_table_init() does, except add the table to
    the netlink all-arp-tables chain. In addition neigh_table_init() has a
    check that all tables on the chain have a distinct address family.
    The init call in clip.c is changed to call
    neigh_table_init_no_netlink().

    Since ATM ARP tables are rather more complicated than can currently be
    handled by the available rtattrs in the netlink protocol, no
    functionality is lost by this patch, and non-ATM ARP manipulation via
    netlink is rescued. A more complete solution would involve a rtattr
    for ATM ARP entries and some way for the netlink code to give
    neigh_add and friends more information than just address family with
    which to find the correct ARP table.
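    A userspace model of the lookup problem and the fix. The struct, the
    chain and all function names are hypothetical simplifications, and family
    2 stands for AF_INET. Lookups take the first table on the chain whose
    family matches, so a second AF_INET table makes the result ambiguous; a
    table set up with the no-netlink variant never joins the chain.

```c
#include <stddef.h>
#include <stdio.h>

struct neigh_table {
	int family;
	const char *id;
	struct neigh_table *next;
};

/* Chain searched by the (modelled) netlink handlers. */
static struct neigh_table *neigh_tables;

/* Set up a table without linking it into the netlink-visible chain. */
static void table_init_no_netlink(struct neigh_table *tbl)
{
	tbl->next = NULL;   /* hashes, timers, etc. would be set up here */
}

/* Full init: also link the table in, warning (not BUG()ing) about a
 * duplicate family. Returns 1 if that family was already present. */
static int table_init(struct neigh_table *tbl)
{
	int dup = 0;

	table_init_no_netlink(tbl);
	for (struct neigh_table *t = neigh_tables; t; t = t->next)
		if (t->family == tbl->family)
			dup = 1;
	tbl->next = neigh_tables;
	neigh_tables = tbl;
	if (dup)
		fprintf(stderr, "%s: table for family %d already registered\n",
			tbl->id, tbl->family);
	return dup;
}

/* Like the netlink handlers: the first table whose family matches wins. */
static struct neigh_table *find_table(int family)
{
	for (struct neigh_table *t = neigh_tables; t; t = t->next)
		if (t->family == family)
			return t;
	return NULL;
}
```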

    [ I've changed the assertion checking in neigh_table_init() to not
    use BUG_ON() while holding neigh_tbl_lock. Instead we remember that
    we found an existing tbl with the same family, and after dropping
    the lock we'll give a diagnostic kernel log message and a stack dump.
    -DaveM ]

    Signed-off-by: Simon Kelley
    Signed-off-by: David S. Miller

    Simon Kelley
     

11 May, 2006

1 commit

  • The last step of netdevice registration was being done by a delayed
    call, but because it was delayed, it was impossible to return any error
    code if the class_device registration failed.

    Side effects:
    * one state in registration process is unnecessary.
    * register_netdevice can sleep inside class_device registration/hotplug
    * code in netdev_run_todo only does unregistration so it is simpler.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

10 May, 2006

2 commits

  • The test used in the linkwatch does not handle wrap-arounds correctly.
    Since the intention of the code is to eliminate bursts of messages we
    can afford to delay things up to a second. Using that fact we can
    easily handle wrap-arounds by making sure that we don't delay things
    by more than one second.
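    A sketch of the wrap-safe delay computation, assuming `HZ` is 1000 and
    that the caller passes the jiffies value at which the next event is
    allowed; the helper name is hypothetical. Unsigned subtraction is modular,
    so a deadline that is already in the past yields a huge value, and
    treating anything above one second as "run now" handles wrap-around
    without ever delaying by more than a second.

```c
#define HZ 1000UL   /* assumed tick rate for the example */

/* Ticks to wait before running the linkwatch work (0 means run now). */
static unsigned long linkwatch_delay(unsigned long nextevent, unsigned long now)
{
	unsigned long delay = nextevent - now;   /* modular subtraction */

	/* A past deadline wraps to a huge value: never wait more than HZ. */
	if (delay > HZ)
		return 0;
	return delay;
}
```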

    This is based on diagnosis and a patch by Stefan Rompf.

    Signed-off-by: Herbert Xu
    Acked-by: Stefan Rompf
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • From: Alan Stern

    This chain does its own locking via the RTNL semaphore, and
    can also run recursively so adding a new mutex here was causing
    deadlocks.

    Signed-off-by: David S. Miller

    Alan Stern
     

07 May, 2006

1 commit


21 Apr, 2006

1 commit

  • * 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (21 commits)
    [PATCH] wext: Fix RtNetlink ENCODE security permissions
    [PATCH] bcm43xx: iw_priv_args names should be <16 characters
    [PATCH] bcm43xx: sysfs code cleanup
    [PATCH] bcm43xx: fix pctl slowclock limit calculation
    [PATCH] bcm43xx: fix dyn tssi2dbm memleak
    [PATCH] bcm43xx: fix config menu alignment
    [PATCH] bcm43xx wireless: fix printk format warnings
    [PATCH] softmac: report when scanning has finished
    [PATCH] softmac: fix event sending
    [PATCH] softmac: handle iw_mode properly
    [PATCH] softmac: dont send out packets while scanning
    [PATCH] softmac: return -EAGAIN from getscan while scanning
    [PATCH] bcm43xx: set trans_start on TX to prevent bogus timeouts
    [PATCH] orinoco: fix truncating commsquality RID with the latest Symbol firmware
    [PATCH] softmac: fix spinlock recursion on reassoc
    [PATCH] Revert NET_RADIO Kconfig title change
    [PATCH] wext: Fix IWENCODEEXT security permissions
    [PATCH] wireless/atmel: send WEXT scan completion events
    [PATCH] wireless/airo: clean up WEXT association and scan events
    [PATCH] softmac uses Wiress Ext.
    ...

    Linus Torvalds
     

20 Apr, 2006

3 commits

  • Add some sanity checking. truesize should be at least sizeof(struct
    sk_buff) plus the current packet length. If not, then truesize is
    seriously mangled and deserves a kernel log message.

    Currently we'll do the check for release of stream socket buffers.

    But we can add checks to more spots over time.
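    A minimal sketch of the sanity check, using a hypothetical stand-in
    struct whose padding array takes the place of the rest of the real
    sk_buff. The rule is the one stated above: truesize must be at least the
    size of the structure plus the packet length, otherwise a message is
    logged.

```c
#include <stdio.h>

/* Hypothetical stand-in for struct sk_buff with just the fields checked. */
struct skb_model {
	unsigned int len;        /* bytes of packet data */
	unsigned int truesize;   /* memory charged for this buffer */
	unsigned char pad[200];  /* placeholder for the rest of the struct */
};

/* Returns 1 if truesize is plausible, 0 (after logging) if it is mangled. */
static int truesize_sane(const struct skb_model *skb)
{
	if (skb->truesize < sizeof(*skb) + skb->len) {
		fprintf(stderr, "skb truesize bug: truesize=%u len=%u\n",
			skb->truesize, skb->len);
		return 0;
	}
	return 1;
}
```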

    Incorporating ideas from Herbert Xu.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • I've just realised that the RtNetlink code does not check the
    permission for SIOCGIWENCODE and SIOCGIWENCODEEXT, which means that
    any user can read the encryption keys. The fix is trivial and should
    go in 2.6.17 alongside the two other patches I sent you last week.

    Signed-off-by: Jean Tourrilhes
    Signed-off-by: John W. Linville

    Jean Tourrilhes
     
    Check the permissions when user space tries to read the
    encryption parameters via SIOCGIWENCODEEXT. This is trivial and
    probably should go in 2.6.17...
    Bug was found by Brian Eaton, thanks!

    Signed-off-by: Jean Tourrilhes
    Signed-off-by: John W. Linville

    Jean Tourrilhes
     

19 Apr, 2006

1 commit