08 Aug, 2011

1 commit

  • Currently userland will barf when including linux/netlink.h unless it
    precisely includes sys/socket.h first. The issue is where the
    definition of "sa_family_t" comes from.

    We've been back and forth on how to fix this issue in the past, see:

    http://thread.gmane.org/gmane.linux.debian.devel.bugs.general/622621
    http://thread.gmane.org/gmane.linux.network/143380

    Ben Hutchings suggested we take a hint from how we handle the
    sockaddr_storage type. First we add a "__kernel_sa_family_t"
    to linux/socket.h that is always defined.

    Then if __KERNEL__ is defined, we also define "sa_family_t" as
    equal to "__kernel_sa_family_t".

    Then in places like linux/netlink.h we use __kernel_sa_family_t
    in user visible datastructures.
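
    The layering described above can be sketched in a few lines. This is a
    simplified illustration, not the actual header text: the demo_ names are
    hypothetical stand-ins for __kernel_sa_family_t, sa_family_t and
    struct sockaddr_nl, and DEMO_KERNEL stands in for __KERNEL__.

```c
/* Sketch only: every demo_ name is a hypothetical stand-in, not the real header. */

/* Always defined, for kernel and user space alike
 * (stand-in for __kernel_sa_family_t). */
typedef unsigned short demo_kernel_sa_family_t;

#ifdef DEMO_KERNEL
/* Only kernel builds get the short name (stand-in for sa_family_t);
 * user space keeps getting it from sys/socket.h. */
typedef demo_kernel_sa_family_t demo_sa_family_t;
#endif

/* A user-visible structure uses the always-available type, so the header
 * does not depend on sys/socket.h having been included first. */
struct demo_sockaddr_nl {
    demo_kernel_sa_family_t nl_family;
    unsigned short          nl_pad;
    unsigned int            nl_pid;
    unsigned int            nl_groups;
};
```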

    Reported-by: Michel Machado
    Signed-off-by: David S. Miller

    David S. Miller
     

06 Jul, 2011

1 commit


06 May, 2011

1 commit

  • This patch adds a multiple message send syscall and is the send
    version of the existing recvmmsg syscall. This is heavily
    based on the patch by Arnaldo that added recvmmsg.

    I wrote a microbenchmark to test the performance gains of using
    this new syscall:

    http://ozlabs.org/~anton/junkcode/sendmmsg_test.c

    The test was run on a ppc64 box with a 10 Gbit network card. The
    benchmark can send both UDP and RAW ethernet packets.

    64B UDP

    batch   pkts/sec
    1       804570
    2       872800  (+ 8 %)
    4       916556  (+14 %)
    8       939712  (+17 %)
    16      952688  (+18 %)
    32      956448  (+19 %)
    64      964800  (+20 %)

    64B raw socket

    batch   pkts/sec
    1       1201449
    2       1350028  (+12 %)
    4       1461416  (+22 %)
    8       1513080  (+26 %)
    16      1541216  (+28 %)
    32      1553440  (+29 %)
    64      1557888  (+30 %)

    We see a 20% improvement in throughput on UDP send and 30%
    on raw socket send.
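
    For readers who have not used the call, a minimal user-space sketch of
    batching with sendmmsg() might look like the following. It assumes Linux
    with a glibc that exposes sendmmsg() under _GNU_SOURCE; demo_send_batch
    and DEMO_BATCH are hypothetical names, and this is not the benchmark
    linked above.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define DEMO_BATCH 4   /* hypothetical maximum batch size */

/* Send `count` (at most DEMO_BATCH) copies of one payload with a single
 * sendmmsg() call. Returns the number of messages sent, or -1 on error. */
static int demo_send_batch(int fd, const void *payload, size_t len,
                           unsigned int count)
{
    struct mmsghdr msgs[DEMO_BATCH];
    struct iovec iov[DEMO_BATCH];

    if (count > DEMO_BATCH)
        count = DEMO_BATCH;

    memset(msgs, 0, sizeof(msgs));
    for (unsigned int i = 0; i < count; i++) {
        iov[i].iov_base = (void *)payload;   /* the payload is only read */
        iov[i].iov_len = len;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* One system call hands all `count` messages to the socket. */
    return sendmmsg(fd, msgs, count, 0);
}
```

    On a connected datagram socket, demo_send_batch(fd, "ping", 4, 3) would
    queue three datagrams with one syscall instead of three.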

    [ Add sparc syscall entries. -DaveM ]

    Signed-off-by: Anton Blanchard
    Signed-off-by: David S. Miller

    Anton Blanchard
     

31 Mar, 2011

1 commit


14 Jan, 2011

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (46 commits)
    hwrng: via_rng - Fix memory scribbling on some CPUs
    crypto: padlock - Move padlock.h into include/crypto
    hwrng: via_rng - Fix asm constraints
    crypto: n2 - use __devexit not __exit in n2_unregister_algs
    crypto: mark crypto workqueues CPU_INTENSIVE
    crypto: mv_cesa - dont return PTR_ERR() of wrong pointer
    crypto: ripemd - Set module author and update email address
    crypto: omap-sham - backlog handling fix
    crypto: gf128mul - Remove experimental tag
    crypto: af_alg - fix af_alg memory_allocated data type
    crypto: aesni-intel - Fixed build with binutils 2.16
    crypto: af_alg - Make sure sk_security is initialized on accept()ed sockets
    net: Add missing lockdep class names for af_alg
    include: Install linux/if_alg.h for user-space crypto API
    crypto: omap-aes - checkpatch --file warning fixes
    crypto: omap-aes - initialize aes module once per request
    crypto: omap-aes - unnecessary code removed
    crypto: omap-aes - error handling implementation improved
    crypto: omap-aes - redundant locking is removed
    crypto: omap-aes - DMA initialization fixes for OMAP off mode
    ...

    Linus Torvalds
     

07 Jan, 2011

1 commit


19 Nov, 2010

1 commit

  • This patch adds the socket family/level macros for the yet-to-be-born
    AF_ALG family. The AF_ALG family provides the user-space interface
    for the kernel crypto API.

    Signed-off-by: Herbert Xu
    Acked-by: David S. Miller

    Herbert Xu
     

29 Oct, 2010

1 commit

  • This helps protect us from overflow issues down in the
    individual protocol sendmsg/recvmsg handlers. Once
    we hit INT_MAX we truncate out the rest of the iovec
    by setting the iov_len members to zero.

    This works because:

    1) For SOCK_STREAM and SOCK_SEQPACKET sockets, partial
    writes are allowed and the application will just continue
    with another write to send the rest of the data.

    2) For datagram oriented sockets, where there must be a
    one-to-one correspondence between write() calls and
    packets on the wire, INT_MAX is going to be far larger
    than the packet size limit the protocol is going to
    check for and signal with -EMSGSIZE.
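
    The truncation step can be illustrated with a small user-space sketch.
    This is not the kernel code: demo_clamp_iovec is a hypothetical name, and
    the limit is a parameter only so the behaviour is easy to exercise (the
    message above uses INT_MAX).

```c
#include <stddef.h>
#include <sys/uio.h>

/* Clamp the total length described by an iovec array to `limit` bytes.
 * The entry that crosses the limit is shortened, and every later entry
 * ends up with iov_len == 0. Returns the clamped total. */
static size_t demo_clamp_iovec(struct iovec *iov, size_t count, size_t limit)
{
    size_t total = 0;

    for (size_t i = 0; i < count; i++) {
        size_t room = limit - total;   /* cannot underflow: total <= limit */

        if (iov[i].iov_len > room)
            iov[i].iov_len = room;
        total += iov[i].iov_len;
    }
    return total;
}
```

    With segment lengths 4, 10 and 7 and a limit of 8, the lengths become
    4, 4 and 0, so later handlers never see a total above the limit.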

    Based upon a patch by Linus Torvalds.

    Signed-off-by: David S. Miller

    David S. Miller
     

21 Oct, 2010

1 commit


28 Sep, 2010

1 commit

  • Fixes kernel bugzilla #16603

    tcp_sendmsg() truncates iov_len to an 'int', which causes a 4GB write
    to write zero bytes, for example.

    There is also the problem higher up of how verify_iovec() works. It
    wants to prevent the total length from looking like an error return
    value.

    However it does this using 'int', but syscalls return 'long' (and
    thus signed 64-bit on 64-bit machines). So it could trigger
    false-positives on 64-bit as written. So fix it to use 'long'.
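
    Both effects are easy to reproduce in user space. The sketch below uses
    hypothetical demo_ helpers and assumes a typical LP64 Linux system where
    'int' is 32 bits and 'long' is 64 bits; converting an out-of-range value
    to a signed type is implementation-defined in C, and common compilers
    keep the low bits.

```c
#include <stdint.h>

/* Narrow a 64-bit length to 'int', as code storing iov_len in an 'int'
 * would. With a 32-bit int, a length of exactly 4GB becomes 0. */
static int demo_len_as_int(uint64_t len)
{
    return (int)len;
}

/* Does the total look like a negative (error) value when held in an int? */
static int demo_negative_as_int(uint64_t total)
{
    return (int)total < 0;
}

/* The same check done in 'long', the type syscalls actually return. */
static int demo_negative_as_long(uint64_t total)
{
    return (long)total < 0;
}
```

    Under those assumptions a 3GB total is reported as negative by the 'int'
    check but not by the 'long' check, which is the false positive described
    above.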

    Reported-by: Olaf Bonorden
    Reported-by: Daniel Büse
    Reported-by: Andrew Morton
    Signed-off-by: David S. Miller

    David S. Miller
     

17 Jun, 2010

1 commit

    To keep the coming code clear and to allow both the sock
    code and the scm code to share the logic, introduce a
    function to translate from struct cred to struct ucred.

    Signed-off-by: Eric W. Biederman
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

07 Apr, 2010

1 commit


31 Mar, 2010

1 commit


27 Mar, 2010

1 commit

  • Add new flag MSG_WAITFORONE for the recvmmsg() syscall.
    When this flag is specified for a blocking socket, recvmmsg()
    will only block until at least 1 packet is available. The
    default behavior is to block until all vlen packets are
    available. This flag has no effect on non-blocking sockets
    or when used in combination with MSG_DONTWAIT.
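
    A minimal user-space sketch of the flag in use, assuming Linux with a
    glibc that exposes recvmmsg() under _GNU_SOURCE. The names
    demo_recv_wait_for_one, DEMO_WFO_VLEN and DEMO_WFO_BUFSZ are hypothetical.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define DEMO_WFO_VLEN  4    /* hypothetical batch size */
#define DEMO_WFO_BUFSZ 64   /* hypothetical per-datagram buffer size */

/* Receive up to DEMO_WFO_VLEN datagrams, blocking only until the first
 * one is available. Returns the number received, or -1 on error. */
static int demo_recv_wait_for_one(int fd,
                                  char bufs[DEMO_WFO_VLEN][DEMO_WFO_BUFSZ],
                                  unsigned int lens[DEMO_WFO_VLEN])
{
    struct mmsghdr msgs[DEMO_WFO_VLEN];
    struct iovec iov[DEMO_WFO_VLEN];
    int n;

    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < DEMO_WFO_VLEN; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = DEMO_WFO_BUFSZ;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* No timeout: the flag alone bounds how long the call blocks. */
    n = recvmmsg(fd, msgs, DEMO_WFO_VLEN, MSG_WAITFORONE, NULL);
    for (int i = 0; i < n; i++)
        lens[i] = msgs[i].msg_len;   /* bytes received in datagram i */
    return n;
}
```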

    Signed-off-by: Brandon L Black
    Acked-by: Ulrich Drepper
    Acked-by: Eric Dumazet
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Brandon L Black
     

29 Oct, 2009

1 commit

    proto_ops->getname implies copying protocol specific data
    into a storage unit (in particular, __kernel_sockaddr_storage).
    So when we implement new protocol support we should keep such
    a detail in mind (which is easy to forget about).

    Let's introduce a DECLARE_SOCKADDR helper which checks at
    build time that the storage unit is not overflowed.

    Finally, inet_getname is switched to use DECLARE_SOCKADDR
    (to show an example of usage).
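
    The idea of a build-time size check can be sketched in user space with
    C11 _Static_assert. This is a hypothetical analogue, not the kernel
    macro: DEMO_DECLARE_SOCKADDR and demo_getname are illustrative names, and
    struct sockaddr_storage stands in for __kernel_sockaddr_storage.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical analogue: refuse to compile when the target address type
 * is larger than the generic storage it is overlaid on, then declare a
 * typed pointer into that storage. */
#define DEMO_DECLARE_SOCKADDR(type, dst, src)                             \
    _Static_assert(sizeof(*(type)0) <= sizeof(struct sockaddr_storage),   \
                   "address type does not fit in sockaddr_storage");      \
    type dst = (type)(src)

/* Fill `storage` with an IPv4 loopback address, in the spirit of a
 * getname handler. Returns the length of the address written. */
static int demo_getname(struct sockaddr_storage *storage, unsigned short port)
{
    DEMO_DECLARE_SOCKADDR(struct sockaddr_in *, sin, storage);

    memset(storage, 0, sizeof(*storage));
    sin->sin_family = AF_INET;
    sin->sin_port = htons(port);
    sin->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    return (int)sizeof(*sin);
}
```

    If a protocol's address structure ever grew past the storage size, the
    function using the macro would stop compiling instead of overflowing at
    run time.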

    Signed-off-by: Cyrill Gorcunov
    Signed-off-by: David S. Miller

    Cyrill Gorcunov
     

13 Oct, 2009

1 commit

  • Meaning receive multiple messages, reducing the number of syscalls and
    net stack entry/exit operations.

    Next patches will introduce mechanisms where protocols that want to
    optimize this operation will provide an unlocked_recvmsg operation.

    This takes into account comments made by:

    . Paul Moore: sock_recvmsg is called only for the first datagram,
    sock_recvmsg_nosec is used for the rest.

    . Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
    works in the same fashion as the ppoll one.

    If the underlying protocol returns a datagram with MSG_OOB set, this
    will make recvmmsg return right away with as many datagrams (+ the OOB
    one) it has received so far.

    . Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
    datagrams and then recvmsg returns an error, recvmmsg will return
    the successfully received datagrams, store the error and return it
    in the next call.

    This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
    where we will be able to acquire the lock only at batch start and end, not at
    every underlying recvmsg call.
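
    From user space, a batch receive with a timeout might be sketched as
    follows. This assumes Linux with a glibc that exposes recvmmsg() under
    _GNU_SOURCE; demo_recv_batch, DEMO_RECV_VLEN and DEMO_RECV_BUFSZ are
    hypothetical names.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <time.h>

#define DEMO_RECV_VLEN  2    /* hypothetical batch size */
#define DEMO_RECV_BUFSZ 64   /* hypothetical per-datagram buffer size */

/* Receive up to DEMO_RECV_VLEN datagrams with one recvmmsg() call.
 * Returns the number of datagrams received, or -1 on error. */
static int demo_recv_batch(int fd,
                           char bufs[DEMO_RECV_VLEN][DEMO_RECV_BUFSZ],
                           unsigned int lens[DEMO_RECV_VLEN],
                           time_t timeout_sec)
{
    struct mmsghdr msgs[DEMO_RECV_VLEN];
    struct iovec iov[DEMO_RECV_VLEN];
    struct timespec timeout = { .tv_sec = timeout_sec, .tv_nsec = 0 };
    int n;

    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < DEMO_RECV_VLEN; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = DEMO_RECV_BUFSZ;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* One system call fills as many message slots as it can. */
    n = recvmmsg(fd, msgs, DEMO_RECV_VLEN, 0, &timeout);
    for (int i = 0; i < n; i++)
        lens[i] = msgs[i].msg_len;   /* bytes received in datagram i */
    return n;
}
```

    A return value smaller than DEMO_RECV_VLEN is not an error by itself; as
    described above, a pending error is reported by the next call.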

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     

05 Oct, 2009

1 commit

  • The following user-space program fails to compile:

    #include <linux/socket.h>
    #include <sys/socket.h>
    int main() { return 0; }

    The reason is that <linux/socket.h> tests __GLIBC__ to decide whether it
    should define various structures and macros that are now defined for
    user-space by <sys/socket.h>, but __GLIBC__ is not defined if no libc
    headers have yet been included.

    It seems safe to drop support for libc 5 now.

    Signed-off-by: Ben Hutchings
    Signed-off-by: Bastian Blank
    Signed-off-by: David S. Miller

    Ben Hutchings
     

09 Jun, 2009

1 commit


23 Apr, 2009

1 commit


21 Apr, 2009

2 commits

  • aio_write gets const struct iovec * but tun_chr_aio_write casts this to struct
    iovec * and modifies the iovec. As a result, attempts to use io_submit
    to send packets to a tun device fail with weird errors such as EINVAL.

    Since tun is the only user of skb_copy_datagram_from_iovec, we can
    fix this simply by changing the latter so that it does not
    touch the iovec passed to it.
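
    The principle of reading from an iovec without modifying it can be shown
    with a user-space sketch. This is not the kernel function:
    demo_copy_from_iovec is a hypothetical name, and it copies into a flat
    buffer rather than into an skb.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Copy `len` bytes into `dst`, starting `offset` bytes into the data
 * described by `iov`. The iovec array is const and is never modified.
 * Returns 0 on success, -1 if fewer than offset + len bytes are available
 * (in which case `dst` may hold a partial copy). */
static int demo_copy_from_iovec(void *dst, const struct iovec *iov,
                                size_t count, size_t offset, size_t len)
{
    unsigned char *out = dst;

    for (size_t i = 0; i < count && len > 0; i++) {
        size_t avail = iov[i].iov_len;

        if (offset >= avail) {        /* segment lies before the offset */
            offset -= avail;
            continue;
        }

        size_t chunk = avail - offset;
        if (chunk > len)
            chunk = len;
        memcpy(out, (const unsigned char *)iov[i].iov_base + offset, chunk);
        out += chunk;
        len -= chunk;
        offset = 0;                   /* later segments start at byte 0 */
    }
    return len == 0 ? 0 : -1;
}
```

    Progress is tracked in local variables only, so a caller that received
    the iovec as const (as aio_write does) can safely reuse it afterwards.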

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
     
  • There's an skb_copy_datagram_iovec() to copy out of a paged skb,
    but it modifies the iovec, and does not support starting
    at an offset in the destination. We want both in tun.c, so let's
    add the function.

    It's a carbon copy of skb_copy_datagram_iovec() with enough changes to
    be annoying.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
     

27 Mar, 2009

1 commit

  • …el/git/tip/linux-2.6-tip

    * 'header-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits)
    x86: headers cleanup - setup.h
    emu101k1.h: fix duplicate include of <linux/types.h>
    compiler-gcc4: conditionalize #error on __KERNEL__
    remove __KERNEL_STRICT_NAMES
    make netfilter use strict integer types
    make drm headers use strict integer types
    make MTD headers use strict integer types
    make most exported headers use strict integer types
    make exported headers use strict posix types
    unconditionally include asm/types.h from linux/types.h
    make linux/types.h as assembly safe
    Neither asm/types.h nor linux/types.h is required for arch/ia64/include/asm/fpu.h
    headers_check fix cleanup: linux/reiserfs_fs.h
    headers_check fix cleanup: linux/nubus.h
    headers_check fix cleanup: linux/coda_psdev.h
    headers_check fix: x86, setup.h
    headers_check fix: x86, prctl.h
    headers_check fix: linux/reinserfs_fs.h
    headers_check fix: linux/socket.h
    headers_check fix: linux/nubus.h
    ...

    Manually fix trivial conflicts in:
    include/linux/netfilter/xt_limit.h
    include/linux/netfilter/xt_statistic.h

    Linus Torvalds
     

27 Feb, 2009

1 commit


03 Feb, 2009

1 commit


06 Oct, 2008

1 commit


23 Sep, 2008

1 commit


27 Jul, 2008

1 commit


20 Jul, 2008

1 commit


29 Jan, 2008

2 commits


22 Oct, 2007

1 commit


17 Jul, 2007

1 commit

  • Part two in the O_CLOEXEC saga: adding support for file descriptors received
    through Unix domain sockets.

    The patch is once again pretty minimal, it introduces a new flag for recvmsg
    and passes it just like the existing MSG_CMSG_COMPAT flag. I think this bit
    is not used otherwise but the networking people will know better.

    This new flag is not recognized by recvfrom and recv. These functions cannot
    be used for that purpose and the asymmetry this introduces is not worse than
    the already existing MSG_CMSG_COMPAT situations.

    The patch must be applied on top of the patch which introduced O_CLOEXEC.
    It has to remove static from the new get_unused_fd_flags function, but since
    scm.c cannot live in a module the function still does not need to be exported.

    Here's a test program to make sure the code works. It's so much longer than
    the actual patch...

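    #include <errno.h>
    #include <error.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/un.h>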

    #ifndef O_CLOEXEC
    # define O_CLOEXEC 02000000
    #endif
    #ifndef MSG_CMSG_CLOEXEC
    # define MSG_CMSG_CLOEXEC 0x40000000
    #endif

    int
    main (int argc, char *argv[])
    {
      if (argc > 1)
        {
          int fd = atol (argv[1]);
          printf ("child: fd = %d\n", fd);
          if (fcntl (fd, F_GETFD) == 0 || errno != EBADF)
            {
              puts ("file descriptor valid in child");
              return 1;
            }
          return 0;
        }

      struct sockaddr_un sun;
      strcpy (sun.sun_path, "./testsocket");
      sun.sun_family = AF_UNIX;

      char databuf[] = "hello";
      struct iovec iov[1];
      iov[0].iov_base = databuf;
      iov[0].iov_len = sizeof (databuf);

      union
      {
        struct cmsghdr hdr;
        char bytes[CMSG_SPACE (sizeof (int))];
      } buf;
      struct msghdr msg = { .msg_iov = iov, .msg_iovlen = 1,
                            .msg_control = buf.bytes,
                            .msg_controllen = sizeof (buf) };
      struct cmsghdr *cmsg = CMSG_FIRSTHDR (&msg);

      cmsg->cmsg_level = SOL_SOCKET;
      cmsg->cmsg_type = SCM_RIGHTS;
      cmsg->cmsg_len = CMSG_LEN (sizeof (int));

      msg.msg_controllen = cmsg->cmsg_len;

      pid_t child = fork ();
      if (child == -1)
        error (1, errno, "fork");
      if (child == 0)
        {
          int sock = socket (PF_UNIX, SOCK_STREAM, 0);
          if (sock < 0)
            error (1, errno, "socket");

          if (bind (sock, (struct sockaddr *) &sun, sizeof (sun)) < 0)
            error (1, errno, "bind");
          if (listen (sock, SOMAXCONN) < 0)
            error (1, errno, "listen");

          int conn = accept (sock, NULL, NULL);
          if (conn == -1)
            error (1, errno, "accept");

          *(int *) CMSG_DATA (cmsg) = sock;
          if (sendmsg (conn, &msg, MSG_NOSIGNAL) < 0)
            error (1, errno, "sendmsg");

          return 0;
        }

      /* For a test suite this should be more robust like a
         barrier in shared memory. */
      sleep (1);

      int sock = socket (PF_UNIX, SOCK_STREAM, 0);
      if (sock < 0)
        error (1, errno, "socket");

      if (connect (sock, (struct sockaddr *) &sun, sizeof (sun)) < 0)
        error (1, errno, "connect");
      unlink (sun.sun_path);

      *(int *) CMSG_DATA (cmsg) = -1;

      if (recvmsg (sock, &msg, MSG_CMSG_CLOEXEC) < 0)
        error (1, errno, "recvmsg");

      int fd = *(int *) CMSG_DATA (cmsg);
      if (fd == -1)
        error (1, 0, "no descriptor received");

      char fdname[20];
      snprintf (fdname, sizeof (fdname), "%d", fd);
      execl ("/proc/self/exe", argv[0], fdname, NULL);
      puts ("execl failed");
      return 1;
    }

    [akpm@linux-foundation.org: Fix fastcall inconsistency noted by Michael Buesch]
    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Ulrich Drepper
    Cc: Ingo Molnar
    Cc: Michael Buesch
    Cc: Michael Kerrisk
    Acked-by: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Drepper
     

11 Jul, 2007

1 commit

  • Add struct sockaddr_pppol2tp to carry L2TP-specific address
    information for the PPPoX (PPPoL2TP) socket. Unfortunately we can't
    use the union inside struct sockaddr_pppox because the L2TP-specific
    data is larger than the current size of the union and we must preserve
    the size of struct sockaddr_pppox for binary compatibility.

    Also add a PPPIOCGL2TPSTATS ioctl to allow userspace to obtain
    L2TP counters and state from the kernel.

    Add new if_pppol2tp.h header.

    [ Modified to use aligned_u64 in statistics structure -DaveM ]

    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    James Chapman
     

27 Apr, 2007

1 commit

  • Provide AF_RXRPC sockets that can be used to talk to AFS servers, or serve
    answers to AFS clients. KerberosIV security is fully supported. The patches
    and some example test programs can be found in:

    http://people.redhat.com/~dhowells/rxrpc/

    This will eventually replace the old implementation of kernel-only RxRPC
    currently resident in net/rxrpc/.

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     

01 Mar, 2007

1 commit

  • This reverts 57a87bb0720a5cf7a9ece49a8c8ed288398fd1bb.

    As H. Peter Anvin states, this change broke klibc and it's
    not very easy to fix things up without duplicating everything
    into userspace.

    In the longer term we should have a better solution to this
    problem, but for now let's unbreak things.

    Signed-off-by: David S. Miller

    David S. Miller
     

12 Feb, 2007

1 commit


09 Feb, 2007

1 commit


03 Dec, 2006

2 commits

  • Signed-off-by: Al Viro
    Signed-off-by: David S. Miller

    Al Viro
     
  • This is a revision of the previously submitted patch, which alters
    the way files are organized and compiled in the following manner:

    * UDP and UDP-Lite now use separate object files
    * source file dependencies resolved via header files
    net/ipv{4,6}/udp_impl.h
    * order of inclusion files in udp.c/udplite.c adapted
    accordingly

    [NET/IPv4]: Support for the UDP-Lite protocol (RFC 3828)

    This patch adds support for UDP-Lite to the IPv4 stack, provided as an
    extension to the existing UDPv4 code:
    * generic routines are all located in net/ipv4/udp.c
    * UDP-Lite specific routines are in net/ipv4/udplite.c
    * MIB/statistics support in /proc/net/snmp and /proc/net/udplite
    * shared API with extensions for partial checksum coverage

    [NET/IPv6]: Extension for UDP-Lite over IPv6

    It extends the existing UDPv6 code base with support for UDP-Lite
    in the same manner as per UDPv4. In particular,
    * UDPv6 generic and shared code is in net/ipv6/udp.c
    * UDP-Litev6 specific extensions are in net/ipv6/udplite.c
    * MIB/statistics support in /proc/net/snmp6 and /proc/net/udplite6
    * support for IPV6_ADDRFORM
    * aligned the coding style of protocol initialisation with af_inet6.c
    * made the error handling in udpv6_queue_rcv_skb consistent;
    to return `-1' on error on all error cases
    * consolidation of shared code

    [NET]: UDP-Lite Documentation and basic XFRM/Netfilter support

    The UDP-Lite patch further provides
    * API documentation for UDP-Lite
    * basic xfrm support
    * basic netfilter support for IPv4 and IPv6 (LOG target)

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     

25 Apr, 2006

1 commit