09 May, 2007

1 commit


26 Apr, 2007

1 commit

  • For the common, open coded 'skb->h.raw = skb->data' operation, so that we can
    later turn skb->h.raw into a offset, reducing the size of struct sk_buff in
    64bit land while possibly keeping it as a pointer on 32bit.

    This one touches just the most simple cases:

    skb->h.raw = skb->data;
    skb->h.raw = {skb_push|[__]skb_pull}()

    The next ones will handle the slightly more "complex" cases.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     

07 Mar, 2007

1 commit

  • This reverts two changes:

    8488df894d05d6fa41c2bd298c335f944bb0e401
    248f06726e866942b3d8ca8f411f9067713b7ff8

    A backlog value of N really does mean allow "N + 1" connections
    to queue to a listening socket. This allows one to specify
    "0" as the backlog and still get 1 connection.

    Noticed by Gerrit Renker and Rick Jones.

    Signed-off-by: David S. Miller

    David S. Miller
     

03 Mar, 2007

1 commit


15 Feb, 2007

2 commits

  • The semantic effect of insert_at_head is that it would allow new registered
    sysctl entries to override existing sysctl entries of the same name. Which is
    pain for caching and the proc interface never implemented.

    I have done an audit and discovered that none of the current users of
    register_sysctl care as (excpet for directories) they do not register
    duplicate sysctl entries.

    So this patch simply removes the support for overriding existing entries in
    the sys_sysctl interface since no one uses it or cares and it makes future
    enhancments harder.

    Signed-off-by: Eric W. Biederman
    Acked-by: Ralf Baechle
    Acked-by: Martin Schwidefsky
    Cc: Russell King
    Cc: David Howells
    Cc: "Luck, Tony"
    Cc: Ralf Baechle
    Cc: Paul Mackerras
    Cc: Martin Schwidefsky
    Cc: Andi Kleen
    Cc: Jens Axboe
    Cc: Corey Minyard
    Cc: Neil Brown
    Cc: "John W. Linville"
    Cc: James Bottomley
    Cc: Jan Kara
    Cc: Trond Myklebust
    Cc: Mark Fasheh
    Cc: David Chinner
    Cc: "David S. Miller"
    Cc: Patrick McHardy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • After Al Viro (finally) succeeded in removing the sched.h #include in module.h
    recently, it makes sense again to remove other superfluous sched.h includes.
    There are quite a lot of files which include it but don't actually need
    anything defined in there. Presumably these includes were once needed for
    macros that used to live in sched.h, but moved to other header files in the
    course of cleaning it up.

    To ease the pain, this time I did not fiddle with any header files and only
    removed #includes from .c-files, which tend to cause less trouble.

    Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
    arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
    allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
    configs in arch/arm/configs on arm. I also checked that no new warnings were
    introduced by the patch (actually, some warnings are removed that were emitted
    by unnecessarily included header files).

    Signed-off-by: Tim Schmielau
    Acked-by: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tim Schmielau
     

13 Feb, 2007

1 commit

  • Many struct file_operations in the kernel can be "const". Marking them const
    moves these to the .rodata section, which avoids false sharing with potential
    dirty data. In addition it'll catch accidental writes at compile time to
    these shared resources.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

11 Feb, 2007

1 commit


09 Dec, 2006

1 commit


03 Dec, 2006

1 commit


23 Sep, 2006

2 commits


03 Aug, 2006

1 commit

  • From: Catherine Zhang

    This patch implements a cleaner fix for the memory leak problem of the
    original unix datagram getpeersec patch. Instead of creating a
    security context each time a unix datagram is sent, we only create the
    security context when the receiver requests it.

    This new design requires modification of the current
    unix_getsecpeer_dgram LSM hook and addition of two new hooks, namely,
    secid_to_secctx and release_secctx. The former retrieves the security
    context and the latter releases it. A hook is required for releasing
    the security context because it is up to the security module to decide
    how that's done. In the case of Selinux, it's a simple kfree
    operation.

    Acked-by: Stephen Smalley
    Signed-off-by: David S. Miller

    Catherine Zhang
     

22 Jul, 2006

1 commit


04 Jul, 2006

2 commits

  • The unix_get_peersec_dgram() stub should have been inlined so that it
    disappears.

    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Andrew Morton
     
  • Teach special (recursive) locking code to the lock validator. Also splits
    af_unix's sk_receive_queue.lock class from the other networking skb-queue
    locks. Has no effect on non-lockdep kernels.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Arjan van de Ven
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     

01 Jul, 2006

1 commit


30 Jun, 2006

1 commit

  • This patch implements an API whereby an application can determine the
    label of its peer's Unix datagram sockets via the auxiliary data mechanism of
    recvmsg.

    Patch purpose:

    This patch enables a security-aware application to retrieve the
    security context of the peer of a Unix datagram socket. The application
    can then use this security context to determine the security context for
    processing on behalf of the peer who sent the packet.

    Patch design and implementation:

    The design and implementation is very similar to the UDP case for INET
    sockets. Basically we build upon the existing Unix domain socket API for
    retrieving user credentials. Linux offers the API for obtaining user
    credentials via ancillary messages (i.e., out of band/control messages
    that are bundled together with a normal message). To retrieve the security
    context, the application first indicates to the kernel such desire by
    setting the SO_PASSSEC option via getsockopt. Then the application
    retrieves the security context using the auxiliary data mechanism.

    An example server application for Unix datagram socket should look like this:

    toggle = 1;
    toggle_len = sizeof(toggle);

    setsockopt(sockfd, SOL_SOCKET, SO_PASSSEC, &toggle, &toggle_len);
    recvmsg(sockfd, &msg_hdr, 0);
    if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
    cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
    if (cmsg_hdr->cmsg_len cmsg_level == SOL_SOCKET &&
    cmsg_hdr->cmsg_type == SCM_SECURITY) {
    memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext));
    }
    }

    sock_setsockopt is enhanced with a new socket option SOCK_PASSSEC to allow
    a server socket to receive security context of the peer.

    Testing:

    We have tested the patch by setting up Unix datagram client and server
    applications. We verified that the server can retrieve the security context
    using the auxiliary data mechanism of recvmsg.

    Signed-off-by: Catherine Zhang
    Acked-by: Acked-by: James Morris
    Signed-off-by: David S. Miller

    Catherine Zhang
     

26 Mar, 2006

1 commit

  • Implement the half-closed devices notifiation, by adding a new POLLRDHUP
    (and its alias EPOLLRDHUP) bit to the existing poll/select sets. Since the
    existing POLLHUP handling, that does not report correctly half-closed
    devices, was feared to be changed, this implementation leaves the current
    POLLHUP reporting unchanged and simply add a new bit that is set in the few
    places where it makes sense. The same thing was discussed and conceptually
    agreed quite some time ago:

    http://lkml.org/lkml/2003/7/12/116

    Since this new event bit is added to the existing Linux poll infrastruture,
    even the existing poll/select system calls will be able to use it. As far
    as the existing POLLHUP handling, the patch leaves it as is. The
    pollrdhup-2.6.16.rc5-0.10.diff defines the POLLRDHUP for all the existing
    archs and sets the bit in the six relevant files. The other attached diff
    is the simple change required to sys/epoll.h to add the EPOLLRDHUP
    definition.

    There is "a stupid program" to test POLLRDHUP delivery here:

    http://www.xmailserver.org/pollrdhup-test.c

    It tests poll(2), but since the delivery is same epoll(2) will work equally.

    Signed-off-by: Davide Libenzi
    Cc: "David S. Miller"
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     

21 Mar, 2006

3 commits

  • Semaphore to mutex conversion.

    The conversion was generated via scripts, and the result was validated
    automatically via a script as well.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Ingo Molnar
     
  • Semaphore to mutex conversion.

    The conversion was generated via scripts, and the result was validated
    automatically via a script as well.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Arjan van de Ven
     
  • The patch below replaces a divide by 2 with a shift -- sk_sndbuf is an
    integer, so gcc emits an idiv, which takes 10x longer than a shift by 1.
    This improves af_unix bandwidth by ~6-10K/s. Also, tidy up the comment
    to fit in 80 columns while we're at it.

    Signed-off-by: Benjamin LaHaise
    Signed-off-by: David S. Miller

    Benjamin LaHaise
     

09 Mar, 2006

1 commit

  • I have benchmarked this on an x86_64 NUMA system and see no significant
    performance difference on kernbench. Tested on both x86_64 and powerpc.

    The way we do file struct accounting is not very suitable for batched
    freeing. For scalability reasons, file accounting was
    constructor/destructor based. This meant that nr_files was decremented
    only when the object was removed from the slab cache. This is susceptible
    to slab fragmentation. With RCU based file structure, consequent batched
    freeing and a test program like Serge's, we just speed this up and end up
    with a very fragmented slab -

    llm22:~ # cat /proc/sys/fs/file-nr
    587730 0 758844

    At the same time, I see only a 2000+ objects in filp cache. The following
    patch I fixes this problem.

    This patch changes the file counting by removing the filp_count_lock.
    Instead we use a separate percpu counter, nr_files, for now and all
    accesses to it are through get_nr_files() api. In the sysctl handler for
    nr_files, we populate files_stat.nr_files before returning to user.

    Counting files as an when they are created and destroyed (as opposed to
    inside slab) allows us to correctly count open files with RCU.

    Signed-off-by: Dipankar Sarma
    Cc: "Paul E. McKenney"
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dipankar Sarma
     

10 Jan, 2006

1 commit


04 Jan, 2006

5 commits

  • Currently all network protocols need to call dev_ioctl as the default
    fallback in their ioctl implementations. This patch adds a fallback
    to dev_ioctl to sock_ioctl if the protocol returned -ENOIOCTLCMD.
    This way all the procotol ioctl handlers can be simplified and we don't
    need to export dev_ioctl.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: David S. Miller

    Christoph Hellwig
     
  • From: Benjamin LaHaise

    In af_unix, a rwlock is used to protect internal state. At least on my
    P4 with HT it is faster to use a spinlock due to the simpler memory
    barrier used to unlock. This patch raises bw_unix to ~690K/s.

    Signed-off-by: David S. Miller

    Benjamin LaHaise
     
  • I noticed that some of 'struct proto_ops' used in the kernel may share
    a cache line used by locks or other heavily modified data. (default
    linker alignement is 32 bytes, and L1_CACHE_LINE is 64 or 128 at
    least)

    This patch makes sure a 'struct proto_ops' can be declared as const,
    so that all cpus can share all parts of it without false sharing.

    This is not mandatory : a driver can still use a read/write structure
    if it needs to (and eventually a __read_mostly)

    I made a global stubstitute to change all existing occurences to make
    them const.

    This should reduce the possibility of false sharing on SMP, and
    speedup some socket system calls.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • This lock is actually taken mostly as a writer,
    so using a rwlock actually just makes performance
    worse especially on chips like the Intel P4.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • AF_UNIX stream socket performance on P4 CPUs tends to suffer due to a
    lot of pipeline flushes from atomic operations. The patch below
    removes the sock_hold() and sock_put() in unix_stream_sendmsg(). This
    should be safe as the socket still holds a reference to its peer which
    is only released after the file descriptor's final user invokes
    unix_release_sock(). The only consideration is that we must add a
    memory barrier before setting the peer initially.

    Signed-off-by: Benjamin LaHaise
    Signed-off-by: David S. Miller

    Benjamin LaHaise
     

09 Nov, 2005

1 commit

  • Most permission() calls have a struct nameidata * available. This helper
    takes that as an argument and thus makes sure we pass it down for lookup
    intents and prepares for per-mount read-only support where we need a struct
    vfsmount for checking whether a file is writeable.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

30 Aug, 2005

3 commits

  • Of this type, mostly:

    CHECK net/ipv6/netfilter.c
    net/ipv6/netfilter.c:96:12: warning: symbol 'ipv6_netfilter_init' was not declared. Should it be static?
    net/ipv6/netfilter.c:101:6: warning: symbol 'ipv6_netfilter_fini' was not declared. Should it be static?

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Lots of places just needs the states, not even linux/tcp.h, where this
    enum was, needs it.

    This speeds up development of the refactorings as less sources are
    rebuilt when things get moved from net/tcp.h.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Remove the "list" member of struct sk_buff, as it is entirely
    redundant. All SKB list removal callers know which list the
    SKB is on, so storing this in sk_buff does nothing other than
    taking up some space.

    Two tricky bits were SCTP, which I took care of, and two ATM
    drivers which Francois Romieu fixed
    up.

    Signed-off-by: David S. Miller
    Signed-off-by: Francois Romieu

    David S. Miller
     

12 Jul, 2005

1 commit

  • Move the protocol specific config options out to the specific protocols.
    With this change net/Kconfig now starts to become readable and serve as a
    good basis for further re-structuring.

    The menu structure is left almost intact, except that indention is
    fixed in most cases. Most visible are the INET changes where several
    "depends on INET" are replaced with a single ifdef INET / endif pair.

    Several new files were created to accomplish this change - they are
    small but serve the purpose that config options are now distributed
    out where they belongs.

    Signed-off-by: Sam Ravnborg
    Signed-off-by: David S. Miller

    Sam Ravnborg
     

09 Jul, 2005

1 commit

  • This is part of the grand scheme to eliminate the qlen
    member of skb_queue_head, and subsequently remove the
    'list' member of sk_buff.

    Most users of skb_queue_len() want to know if the queue is
    empty or not, and that's trivially done with skb_queue_empty()
    which doesn't use the skb_queue_head->qlen member and instead
    uses the queue list emptyness as the test.

    Signed-off-by: David S. Miller

    David S. Miller
     

20 May, 2005

1 commit

  • currently it opencodes it, but that's in the way of chaning the
    lookup_hash interface.

    I'd prefer to disallow modular af_unix over exporting lookup_create,
    but I'll leave that to you.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: David S. Miller

    Christoph Hellwig
     

26 Apr, 2005

1 commit

  • A lot of places in there are including major.h for no reason whatsoever.
    Removed. And yes, it still builds.

    The history of that stuff is often amusing. E.g. for net/core/sock.c
    the story looks so, as far as I've been able to reconstruct it: we used
    to need major.h in net/socket.c circa 1.1.early. In 1.1.13 that need
    had disappeared, along with register_chrdev(SOCKET_MAJOR, "socket",
    &net_fops) in sock_init(). Include had not. When 1.2 -> 1.3 reorg of
    net/* had moved a lot of stuff from net/socket.c to net/core/sock.c,
    this crap had followed...

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds