11 Jun, 2010

1 commit


07 Jun, 2010

1 commit


02 Jun, 2010

1 commit

  • There are more than a dozen occurrences of the following code in the
    IPv6 stack:

    if (opt && opt->srcrt) {
            struct rt0_hdr *rt0 = (struct rt0_hdr *) opt->srcrt;
            ipv6_addr_copy(&final, &fl.fl6_dst);
            ipv6_addr_copy(&fl.fl6_dst, rt0->addr);
            final_p = &final;
    }

    Replace those with a helper. Note that the helper overrides final_p
    in all cases. This is ok as final_p was previously initialized to
    NULL when declared.
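
    A minimal userspace sketch of what such a helper could look like. The
    types and the name update_dst() below are simplified stand-ins invented
    for illustration, not the kernel's definitions. The helper returns NULL
    when no routing header is present, so assigning its result to final_p
    unconditionally is safe.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumptions). */
struct addr6 { unsigned char b[16]; };
struct rt0_hdr_mock { struct addr6 addr[1]; };          /* first hop only */
struct txoptions_mock { struct rt0_hdr_mock *srcrt; };
struct flow_mock { struct addr6 fl6_dst; };

/*
 * If a source route is present, save the real destination in *orig,
 * point the flow at the first hop and return orig; otherwise return NULL.
 */
static struct addr6 *update_dst(struct flow_mock *fl,
                                const struct txoptions_mock *opt,
                                struct addr6 *orig)
{
        assert(fl != NULL && orig != NULL);

        if (!opt || !opt->srcrt)
                return NULL;

        *orig = fl->fl6_dst;
        fl->fl6_dst = opt->srcrt->addr[0];
        return orig;
}
```

    A call site then shrinks to: final_p = update_dst(&fl, opt, &final);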

    Signed-off-by: Arnaud Ebalard
    Signed-off-by: David S. Miller

    Arnaud Ebalard
     

11 May, 2010

1 commit


29 Apr, 2010

1 commit

  • When queueing an skb to a socket, we can immediately release its dst if
    the target socket does not use IP_CMSG_PKTINFO.

    tcp_data_queue() can drop the dst too.

    This lets us benefit from a hot cache line and spares the receiver,
    possibly on another cpu, from dirtying this cache line itself.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

24 Apr, 2010

2 commits


20 Apr, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the following
    script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of the
    specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

25 Mar, 2010

1 commit


18 Jan, 2010

1 commit


08 Nov, 2009

1 commit


06 Nov, 2009

1 commit

  • struct can_proto had a capability field which wasn't ever used. It is
    dropped entirely.

    struct inet_protosw had a capability field which can be more clearly
    expressed in the code by just checking if sock->type == SOCK_RAW.
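
    A small sketch of the resulting check, assuming the only socket type
    that needs a privilege is SOCK_RAW. The function name may_create() and
    the has_net_raw flag (standing in for a CAP_NET_RAW capability test)
    are hypothetical.

```c
#include <stdbool.h>
#include <sys/socket.h>   /* SOCK_RAW, SOCK_DGRAM */

/*
 * Decide from the socket type alone whether creation is permitted,
 * instead of consulting a per-protocol capability field.
 */
static bool may_create(int sock_type, bool has_net_raw)
{
        if (sock_type == SOCK_RAW && !has_net_raw)
                return false;
        return true;
}
```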

    Signed-off-by: Eric Paris
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Eric Paris
     

19 Oct, 2009

2 commits

  • - skb_kill_datagram() can increment sk->sk_drops itself, not callers.

    - UDP on IPV4 & IPV6 dropped frames (because of bad checksum or policy checks) increment sk_drops

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • In order to have better cache layouts of struct sock (separate zones
    for rx/tx paths), we need this preliminary patch.

    The goal is to transfer fields used at lookup time to the first
    read-mostly cache line (inside struct sock_common) and move sk_refcnt
    to a separate cache line (only written by rx path).

    This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
    sport and id fields. This allows a future patch to define these
    fields as macros, like sk_refcnt, without name clashes.
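
    To illustrate why the prefix matters, here is a sketch with mock
    structures (all type names and the layout are assumptions). A macro
    named plain daddr would also rewrite unrelated uses such as a header
    field called daddr, while inet_daddr can be redirected into the common
    part of the socket without touching anything else.

```c
#include <stdint.h>

/* Mock layout: lookup keys live in the shared, read-mostly part. */
struct sock_common_mock { uint32_t skc_daddr; uint32_t skc_rcv_saddr; };
struct sock_mock { struct sock_common_mock common; };
struct inet_sock_mock { struct sock_mock sk; };

/* Prefixed names can become macros without clashing with other code. */
#define inet_daddr      sk.common.skc_daddr
#define inet_rcv_saddr  sk.common.skc_rcv_saddr

/* An unrelated structure keeps its plain field names untouched. */
struct iphdr_mock { uint32_t saddr; uint32_t daddr; };

/* Lookup-time comparison using both the macros and the plain fields. */
static int lookup_match(const struct inet_sock_mock *inet,
                        const struct iphdr_mock *iph)
{
        return inet->inet_daddr == iph->saddr &&
               inet->inet_rcv_saddr == iph->daddr;
}
```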

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

15 Oct, 2009

1 commit

  • sock_queue_rcv_skb() can update sk_drops itself, removing need for
    callers to take care of it. This is more consistent since
    sock_queue_rcv_skb() also reads sk_drops when queueing a skb.

    This adds sk_drops management to many protocols that did not care
    about it yet.
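
    The idea can be sketched with a mock receive path. struct rcv_sock and
    queue_rcv() are invented for illustration and ignore locking and
    atomics; the point is only that the drop counter is bumped in one
    place, so no caller has to remember to do it.

```c
#include <errno.h>

/* Mock socket: receive memory in use, its limit, and the drop counter. */
struct rcv_sock { unsigned int rmem_alloc, rcvbuf, drops; };

/*
 * Charge a buffer of the given size to the socket. On overflow the
 * function itself accounts the drop and returns -ENOMEM.
 */
static int queue_rcv(struct rcv_sock *sk, unsigned int truesize)
{
        if (sk->rmem_alloc + truesize > sk->rcvbuf) {
                sk->drops++;   /* accounted here for every caller */
                return -ENOMEM;
        }
        sk->rmem_alloc += truesize;
        return 0;
}
```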

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

13 Oct, 2009

1 commit

  • Create a new socket level option to report number of queue overflows

    Recently I augmented the AF_PACKET protocol to report the number of frames lost
    on the socket receive queue between any two enqueued frames. This value was
    exported via a SOL_PACKET level cmsg. After I completed that work it was
    requested that this feature be generalized so that any datagram oriented socket
    could make use of this option. As such I've created this patch. It creates a
    new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a
    SOL_SOCKET level cmsg that reports the number of times the sk_receive_queue
    overflowed between any two given frames. It also augments the AF_PACKET
    protocol to take advantage of this new feature (as it previously did not touch
    sk->sk_drops, which this patch uses to record the overflow count). Tested
    successfully by me.

    Notes:

    1) Unlike my previous patch, this patch simply records the sk_drops value, which
    is not a number of drops between packets, but rather a total number of drops.
    Deltas must be computed in user space.

    2) While this patch currently works with datagram oriented protocols, it will
    also be accepted by non-datagram oriented protocols. I'm not sure if that's
    agreeable to everyone, but my argument in favor of doing so is that, for those
    protocols which aren't applicable to this option, sk_drops will always be zero,
    and reporting no drops on a receive queue that isn't used for those
    non-participating protocols seems reasonable to me. This also saves us having
    to code in a per-protocol opt in mechanism.

    3) This applies cleanly to net-next assuming that commit
    977750076d98c7ff6cbda51858bb5a5894a9d9ab (my af packet cmsg patch) is reverted
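
    A userspace sketch of how an application might consume this option,
    assuming the cmsg payload is a 32-bit unsigned running total as note 1
    describes. The helper names are hypothetical, and the fallback value 40
    for SO_RXQ_OVFL is an assumption for headers that predate the option.
    Since the value is a total, the per-interval delta is computed by the
    caller.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SO_RXQ_OVFL
#define SO_RXQ_OVFL 40   /* assumed value for older headers */
#endif

/* Ask the kernel to attach the drop counter to received messages. */
static int enable_rxq_ovfl(int fd)
{
        int one = 1;

        return setsockopt(fd, SOL_SOCKET, SO_RXQ_OVFL, &one, sizeof(one));
}

/*
 * Scan the control messages filled in by recvmsg(). Returns 1 and stores
 * the running drop total if the cmsg is present, 0 otherwise.
 */
static int rxq_ovfl_total(struct msghdr *msg, uint32_t *total)
{
        struct cmsghdr *c;

        for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
                if (c->cmsg_level == SOL_SOCKET &&
                    c->cmsg_type == SO_RXQ_OVFL) {
                        memcpy(total, CMSG_DATA(c), sizeof(*total));
                        return 1;
                }
        }
        return 0;
}

/* Drops between two samples; unsigned arithmetic tolerates wraparound. */
static uint32_t drops_between(uint32_t prev_total, uint32_t cur_total)
{
        return cur_total - prev_total;
}
```

    A receiver would call enable_rxq_ovfl() once, pass a control buffer to
    each recvmsg(), and treat a missing cmsg as "no new information".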

    Signed-off-by: Neil Horman
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Neil Horman
     

01 Oct, 2009

1 commit

  • This provides safety against negative optlen at the type
    level instead of depending upon (sometimes non-trivial)
    checks against this sprinkled all over the place, in
    each and every implementation.
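
    A sketch of the class of bug this avoids. Both functions and OPT_MAX
    are hypothetical; they model a handler that clamps the option length
    before copying. With a signed length, a negative value slips past the
    upper-bound clamp and turns into a huge copy size, while an unsigned
    length is clamped correctly.

```c
#include <stddef.h>

#define OPT_MAX 16

/* Signed length: -1 is not greater than OPT_MAX, so it is not clamped. */
static size_t copy_len_signed(int optlen)
{
        if (optlen > OPT_MAX)
                optlen = OPT_MAX;
        return (size_t)optlen;   /* a negative value becomes enormous */
}

/* Unsigned length: the same clamp now covers every possible input. */
static size_t copy_len_unsigned(unsigned int optlen)
{
        if (optlen > OPT_MAX)
                optlen = OPT_MAX;
        return optlen;
}
```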

    Based upon work done by Arjan van de Ven and feedback
    from Linus Torvalds.

    Signed-off-by: David S. Miller

    David S. Miller
     

03 Sep, 2009

1 commit

  • Christoph Lameter pointed out that packet drops at qdisc level were not
    accounted in SNMP counters. Only if the application sets IP_RECVERR are
    drops reported to the user (-ENOBUFS errors) and SNMP counters updated.

    IP_RECVERR is used to enable extended reliable error message passing,
    but these are not needed to update system wide SNMP stats.

    This patch changes things a bit to allow SNMP counters to be updated,
    regardless of IP_RECVERR being set or not on the socket.

    Example after a UDP tx flood:
    # netstat -s
    ...
    IP:
    1487048 outgoing packets dropped
    ...
    Udp:
    ...
    SndbufErrors: 1487048

    send() syscalls do, however, still return an OK status, so as not to
    break applications.

    Note : send() manual page explicitly says for -ENOBUFS error :

    "The output queue for a network interface was full.
    This generally indicates that the interface has stopped sending,
    but may be caused by transient congestion.
    (Normally, this does not occur in Linux. Packets are just silently
    dropped when a device queue overflows.) "

    This is not true for IP_RECVERR enabled sockets : a send() syscall
    that hits a qdisc drop returns an ENOBUFS error.

    Many thanks to Christoph, David, and last but not least, Alexey !

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

14 Aug, 2009

1 commit

  • This replaces assignments of the type "int on LHS" = "u8 on RHS" with
    simpler code. The LHS can express all of the unsigned right hand side
    values, hence the assigned value cannot be negative.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     

23 Jun, 2009

1 commit


18 Jun, 2009

1 commit

  • commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
    (net: No more expensive sock_hold()/sock_put() on each tx)
    changed initial sk_wmem_alloc value.

    We need to take into account this offset when reporting
    sk_wmem_alloc to user, in PROC_FS files or various
    ioctls (SIOCOUTQ/TIOCOUTQ)
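
    A sketch of the accounting with a mock socket. It assumes the initial
    value (the offset) is 1; the structure and function names are invented
    for illustration. Everything shown to the user goes through one
    accessor that subtracts the offset.

```c
/* Mock socket holding only the write-memory counter. */
struct wmem_sock { int wmem_alloc; };

/* The counter starts at 1 (assumed bias) rather than 0. */
static void wmem_sock_init(struct wmem_sock *sk)
{
        sk->wmem_alloc = 1;
}

/* Charge a queued tx buffer to the socket. */
static void wmem_charge(struct wmem_sock *sk, int truesize)
{
        sk->wmem_alloc += truesize;
}

/* Value to report to user space: bytes queued, without the bias. */
static int wmem_alloc_get(const struct wmem_sock *sk)
{
        return sk->wmem_alloc - 1;
}
```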

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

03 Jun, 2009

1 commit

  • Define three accessors to get/set dst attached to a skb

    struct dst_entry *skb_dst(const struct sk_buff *skb)

    void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

    void skb_dst_drop(struct sk_buff *skb)
    This one should replace occurrences of:
    dst_release(skb->dst);
    skb->dst = NULL;

    Delete skb->dst field
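
    The shape of the three accessors can be sketched in userspace with
    mock types. The mock_ names, the plain integer reference count and the
    private field name are assumptions; the real implementations may
    differ. Once all users go through accessors, the underlying field can
    be renamed or re-encoded freely.

```c
#include <stddef.h>

struct dst_mock { int refcnt; };
struct skb_mock { struct dst_mock *dst_priv; };   /* not touched directly */

/* Drop one reference on a route entry, tolerating NULL. */
static void mock_dst_release(struct dst_mock *dst)
{
        if (dst)
                dst->refcnt--;
}

/* Read the dst attached to the buffer. */
static struct dst_mock *mock_skb_dst(const struct skb_mock *skb)
{
        return skb->dst_priv;
}

/* Attach a dst; the caller's reference is handed over to the buffer. */
static void mock_skb_dst_set(struct skb_mock *skb, struct dst_mock *dst)
{
        skb->dst_priv = dst;
}

/* Release the attached dst and clear the pointer in one step. */
static void mock_skb_dst_drop(struct skb_mock *skb)
{
        mock_dst_release(skb->dst_priv);
        skb->dst_priv = NULL;
}
```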

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

27 Apr, 2009

1 commit

  • The IP MIB (RFC 4293) defines stats for InOctets, OutOctets, InMcastOctets and
    OutMcastOctets:
    http://tools.ietf.org/html/rfc4293
    But it seems we don't track those in any way that is easy to separate from other
    protocols. This patch adds those missing counters to the stats file. Tested
    successfully by me.

    With help from Eric Dumazet.

    Signed-off-by: Neil Horman
    Signed-off-by: David S. Miller

    Neil Horman
     

26 Nov, 2008

1 commit

  • Pass netns to xfrm_lookup()/__xfrm_lookup(). For that pass netns
    to flow_cache_lookup() and resolver callback.

    Take it from socket or netdevice. Stub DECnet to init_net.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

09 Oct, 2008

1 commit


30 Aug, 2008

1 commit


19 Jul, 2008

1 commit


18 Jun, 2008

1 commit

  • In commits 33c732c36169d7022ad7d6eb474b0c9be43a2dc1 ([IPV4]: Add raw
    drops counter) and a92aa318b4b369091fd80433c80e62838db8bc1c ([IPV6]:
    Add raw drops counter), Wang Chen added raw drops counter for
    /proc/net/raw & /proc/net/raw6

    This patch adds this capability to UDP sockets too (/proc/net/udp &
    /proc/net/udp6).

    This means that 'RcvbufErrors' errors found in /proc/net/snmp can also
    be examined for each udp socket.

    # grep Udp: /proc/net/snmp
    Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
    Udp: 23971006 75 899420 16390693 146348 0

    # cat /proc/net/udp
    sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt ---
    uid timeout inode ref pointer drops
    75: 00000000:02CB 00000000:0000 07 00000000:00000000 00:00000000 00000000 ---
    0 0 2358 2 ffff81082a538c80 0
    111: 00000000:006F 00000000:0000 07 00000000:00000000 00:00000000 00000000 ---
    0 0 2286 2 ffff81042dd35c80 146348

    In this example, only port 111 (0x006F) was flooded by messages that
    user program could not read fast enough. 146348 messages were lost.
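
    A sketch of how a monitoring tool might pull the local port and the
    drops column out of one /proc/net/udp line. It assumes each socket
    occupies a single line in the real file (the sample above is wrapped
    for display) and that drops is the last column; parse_udp_line() is a
    hypothetical helper.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the local port (hex in the file) and the trailing drops
 * counter from one data line. Returns 1 on success, 0 otherwise
 * (for example on the header line).
 */
static int parse_udp_line(const char *line, unsigned int *port,
                          unsigned long *drops)
{
        const char *end, *start;

        /* "sl: local_address:port ..." with the port in hexadecimal. */
        if (sscanf(line, " %*d: %*x:%x", port) != 1)
                return 0;

        /* Locate the last whitespace-separated token on the line. */
        end = line + strlen(line);
        while (end > line && isspace((unsigned char)end[-1]))
                end--;
        start = end;
        while (start > line && !isspace((unsigned char)start[-1]))
                start--;
        if (start == end)
                return 0;

        return sscanf(start, "%lu", drops) == 1;
}
```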

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

15 Jun, 2008

1 commit


14 Jun, 2008

1 commit


13 Jun, 2008

1 commit

  • In changeset 22dd485022f3d0b162ceb5e67d85de7c3806aa20
    ("raw: Raw socket leak.") code was added so that we
    flush pending frames on raw sockets to avoid leaks.

    The ipv4 part was fine, but the ipv6 part was not
    done correctly. Unlike the ipv4 side, the ipv6 code
    already has a .destroy method for rawv6_prot.

    So now there were two assignments to this member, and
    what the compiler does is use the last one, effectively
    making the ipv6 parts of that changeset a NOP.

    Fix this by removing the:

    .destroy = inet6_destroy_sock,

    line, and adding an inet6_destroy_sock() call to the
    end of raw6_destroy().
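
    The fix can be sketched with mock types: a single .destroy method that
    does the raw-socket cleanup and then chains to the generic IPv6
    teardown. All mock_ names and the flag fields are invented for
    illustration.

```c
/* Mock socket recording which cleanup steps have run. */
struct destroy_sock { int pending_flushed; int inet6_destroyed; };

/* Stand-in for the generic IPv6 socket teardown. */
static void mock_inet6_destroy_sock(struct destroy_sock *sk)
{
        sk->inet6_destroyed = 1;
}

/* Stand-in for flushing corked frames still attached to the socket. */
static void mock_flush_pending(struct destroy_sock *sk)
{
        sk->pending_flushed = 1;
}

/* Protocol-specific destroy: flush first, then chain to the generic one. */
static void mock_raw6_destroy(struct destroy_sock *sk)
{
        mock_flush_pending(sk);
        mock_inet6_destroy_sock(sk);
}

struct mock_proto { void (*destroy)(struct destroy_sock *sk); };

/* Exactly one initializer for .destroy, so nothing is silently overridden. */
static const struct mock_proto mock_rawv6_prot = {
        .destroy = mock_raw6_destroy,
};
```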

    Noticed by Al Viro.

    Signed-off-by: David S. Miller
    Acked-by: YOSHIFUJI Hideaki

    David S. Miller
     

12 Jun, 2008

1 commit


05 Jun, 2008

2 commits

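  • The program below just leaks the raw kernel socket:

    #include <string.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    int main(void) {
        int fd = socket(PF_INET, SOCK_RAW, IPPROTO_UDP);
        struct sockaddr_in addr;

        memset(&addr, 0, sizeof(addr));
        inet_aton("127.0.0.1", &addr.sin_addr);
        addr.sin_family = AF_INET;
        addr.sin_port = htons(2048);
        /* MSG_MORE corks the packet; the process then exits without flushing it */
        sendto(fd, "a", 1, MSG_MORE, (struct sockaddr *)&addr, sizeof(addr));
        return 0;
    }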

    Corked packet is allocated via sock_wmalloc which holds the owner socket,
    so one should uncork it and flush all pending data on close. Do this in the
    same way as in UDP.

    Signed-off-by: Denis V. Lunev
    Acked-by: Alexey Kuznetsov
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • Signed-off-by: YOSHIFUJI Hideaki

    YOSHIFUJI Hideaki
     

13 May, 2008

1 commit

  • This patch adds needed_headroom/needed_tailroom members to struct
    net_device and updates many places that allocate skbs to use them. Not
    all of them can be converted though, and I'm sure I missed some (I
    mostly grepped for LL_RESERVED_SPACE).

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

25 Apr, 2008

1 commit


12 Apr, 2008

2 commits

  • Based on a patch from Dmitry Butskoy.

    Closes: 10437
    Signed-off-by: YOSHIFUJI Hideaki

    YOSHIFUJI Hideaki
     
  • This patch fixes a difference between IPv4 and IPv6 when sending packets
    to the unspecified address (either 0.0.0.0 or ::) when using raw or
    un-connected UDP sockets. There are two cases where IPv6 either fails
    to send anything, or sends with the destination address set to ::. For
    example:

    --> ping -c1 0.0.0.0
    PING 0.0.0.0 (127.0.0.1) 56(84) bytes of data.
    64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.032 ms

    --> ping6 -c1 ::
    PING ::(::) 56 data bytes
    ping: sendmsg: Invalid argument

    Doing a sendto("0.0.0.0") reveals:

    10:55:01.495090 IP localhost.32780 > localhost.7639: UDP, length 100

    Doing a sendto("::") reveals:

    10:56:13.262478 IP6 fe80::217:8ff:fe7d:4718.32779 > ::.7639: UDP, length 100

    If you issue a connect() first in the UDP case, it will be sent to ::1,
    similar to what happens with TCP.

    This restores the BSD-ism.
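
    A userspace sketch of the behaviour being restored, under the
    assumption that the fix amounts to treating an unspecified destination
    (::) as the loopback address (::1) before sending.
    fixup_unspecified_dst() is a hypothetical name, not the kernel's code.

```c
#include <netinet/in.h>

/* Rewrite an unspecified IPv6 destination to loopback; leave others alone. */
static void fixup_unspecified_dst(struct in6_addr *dst)
{
        if (IN6_IS_ADDR_UNSPECIFIED(dst))
                *dst = in6addr_loopback;
}
```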

    Signed-off-by: Brian Haley
    Signed-off-by: YOSHIFUJI Hideaki

    Brian Haley
     

05 Apr, 2008

1 commit