29 Sep, 2011

1 commit

  • Since commit 7361c36c5224 (af_unix: Allow credentials to work across
    user and pid namespaces) af_unix performance dropped a lot.

    This is because we now take a reference on pid and cred in each write(),
    and release them in read(), usually done from another process,
    eventually from another cpu. This triggers false sharing.

    # Events: 154K cycles
    #
    # Overhead Command Shared Object Symbol
    # ........ ....... .................. .........................
    #
    10.40% hackbench [kernel.kallsyms] [k] put_pid
    8.60% hackbench [kernel.kallsyms] [k] unix_stream_recvmsg
    7.87% hackbench [kernel.kallsyms] [k] unix_stream_sendmsg
    6.11% hackbench [kernel.kallsyms] [k] do_raw_spin_lock
    4.95% hackbench [kernel.kallsyms] [k] unix_scm_to_skb
    4.87% hackbench [kernel.kallsyms] [k] pid_nr_ns
    4.34% hackbench [kernel.kallsyms] [k] cred_to_ucred
    2.39% hackbench [kernel.kallsyms] [k] unix_destruct_scm
    2.24% hackbench [kernel.kallsyms] [k] sub_preempt_count
    1.75% hackbench [kernel.kallsyms] [k] fget_light
    1.51% hackbench [kernel.kallsyms] [k]
    __mutex_lock_interruptible_slowpath
    1.42% hackbench [kernel.kallsyms] [k] sock_alloc_send_pskb

    This patch includes SCM_CREDENTIALS information in a af_unix message/skb
    only if requested by the sender, [man 7 unix for details how to include
    ancillary data using sendmsg() system call]

    Note: This might break buggy applications that expected SCM_CREDENTIAL
    from an unaware write() system call, and receiver not using SO_PASSCRED
    socket option.

    If SOCK_PASSCRED is set on source or destination socket, we still
    include credentials for mere write() syscalls.

    Performance boost in hackbench : more than 50% gain on a 16 thread
    machine (2 quad-core cpus, 2 threads per core)

    hackbench 20 thread 2000

    4.228 sec instead of 9.102 sec

    Signed-off-by: Eric Dumazet
    Acked-by: Tim Chen
    Signed-off-by: David S. Miller

    Eric Dumazet
     

17 Sep, 2011

1 commit


25 Aug, 2011

1 commit

  • Patch series 109f6e39..7361c36c back in 2.6.36 added functionality to
    allow credentials to work across pid namespaces for packets sent via
    UNIX sockets. However, the atomic reference counts on pid and
    credentials caused plenty of cache bouncing when there are numerous
    threads of the same pid sharing a UNIX socket. This patch mitigates the
    problem by eliminating extraneous reference counts on pid and
    credentials on both send and receive path of UNIX sockets. I found a 2x
    improvement in hackbench's threaded case.

    On the receive path in unix_dgram_recvmsg, currently there is an
    increment of reference count on pid and credentials in scm_set_cred.
    Then there are two decrement of the reference counts. Once in scm_recv
    and once when skb_free_datagram call skb->destructor function
    unix_destruct_scm. One pair of increment and decrement of ref count on
    pid and credentials can be eliminated from the receive path. Until we
    destroy the skb, we already set a reference when we created the skb on
    the send side.

    On the send path, there are two increments of ref count on pid and
    credentials, once in scm_send and once in unix_scm_to_skb. Then there
    is a decrement of the reference counts in scm_destroy's call to
    scm_destroy_cred at the end of unix_dgram_sendmsg functions. One pair
    of increment and decrement of the reference counts can be removed so we
    only need to increment the ref counts once.

    By incorporating these changes, for hackbench running on a 4 socket
    NHM-EX machine with 40 cores, the execution of hackbench on
    50 groups of 20 threads sped up by factor of 2.

    Hackbench command used for testing:
    ./hackbench 50 thread 2000

    Signed-off-by: Tim Chen
    Signed-off-by: David S. Miller

    Tim Chen
     

25 Nov, 2010

1 commit

  • Lower SCM_MAX_FD from 255 to 253 so that allocations for scm_fp_list are
    halved. (commit f8d570a4 added two pointers in this structure)

    scm_fp_dup() should not copy whole structure (and trigger kmemcheck
    warnings), but only the used part. While we are at it, only allocate
    needed size.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

17 Jun, 2010

2 commits

  • Start capturing not only the userspace pid, uid and gid values of the
    sending process but also the struct pid and struct cred of the sending
    process as well.

    This is in preparation for properly supporting SCM_CREDENTIALS for
    sockets that have different uid and/or pid namespaces at the different
    ends.

    Signed-off-by: Eric W. Biederman
    Acked-by: Serge E. Hallyn
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Reorder the fields in scm_cookie so they pack better on 64bit.

    Signed-off-by: Eric W. Biederman
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

04 Nov, 2009

1 commit

  • This cleanup patch puts struct/union/enum opening braces,
    in first line to ease grep games.

    struct something
    {

    becomes :

    struct something {

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

06 Jul, 2009

1 commit


14 Nov, 2008

2 commits

  • Conflicts:
    security/keys/internal.h
    security/keys/process_keys.c
    security/keys/request_key.c

    Fixed conflicts above by using the non 'tsk' versions.

    Signed-off-by: James Morris

    James Morris
     
  • Wrap access to task credentials so that they can be separated more easily from
    the task_struct during the introduction of COW creds.

    Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

    Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
    sense to use RCU directly rather than a convenient wrapper; these will be
    addressed by later patches.

    Signed-off-by: David Howells
    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Cc: netdev@vger.kernel.org
    Signed-off-by: James Morris

    David Howells
     

07 Nov, 2008

1 commit

  • __scm_destroy() walks the list of file descriptors in the scm_fp_list
    pointed to by the scm_cookie argument.

    Those, in turn, can close sockets and invoke __scm_destroy() again.

    There is nothing which limits how deeply this can occur.

    The idea for how to fix this is from Linus. Basically, we do all of
    the fput()s at the top level by collecting all of the scm_fp_list
    objects hit by an fput(). Inside of the initial __scm_destroy() we
    keep running the list until it is empty.

    Signed-off-by: David S. Miller
    Signed-off-by: Linus Torvalds

    David Miller
     

20 Oct, 2007

1 commit

  • This is the largest patch in the set. Make all (I hope) the places where
    the pid is shown to or get from user operate on the virtual pids.

    The idea is:
    - all in-kernel data structures must store either struct pid itself
    or the pid's global nr, obtained with pid_nr() call;
    - when seeking the task from kernel code with the stored id one
    should use find_task_by_pid() call that works with global pids;
    - when showing pid's numerical value to the user the virtual one
    should be used, but however when one shows task's pid outside this
    task's namespace the global one is to be used;
    - when getting the pid from userspace one need to consider this as
    the virtual one and use appropriate task/pid-searching functions.

    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: nuther build fix]
    [akpm@linux-foundation.org: yet nuther build fix]
    [akpm@linux-foundation.org: remove unneeded casts]
    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Alexey Dobriyan
    Cc: Sukadev Bhattiprolu
    Cc: Oleg Nesterov
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

18 Jul, 2007

1 commit

  • The OPEN_MAX constant is an arbitrary number with no useful relation to
    anything. Nothing should be using it. SCM_MAX_FD is just an arbitrary
    constant and it should be clear that its value is chosen in net/scm.h
    and not actually derived from anything else meaningful in the system.

    Signed-off-by: Roland McGrath
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     

03 Aug, 2006

1 commit

  • From: Catherine Zhang

    This patch implements a cleaner fix for the memory leak problem of the
    original unix datagram getpeersec patch. Instead of creating a
    security context each time a unix datagram is sent, we only create the
    security context when the receiver requests it.

    This new design requires modification of the current
    unix_getsecpeer_dgram LSM hook and addition of two new hooks, namely,
    secid_to_secctx and release_secctx. The former retrieves the security
    context and the latter releases it. A hook is required for releasing
    the security context because it is up to the security module to decide
    how that's done. In the case of Selinux, it's a simple kfree
    operation.

    Acked-by: Stephen Smalley
    Signed-off-by: David S. Miller

    Catherine Zhang
     

30 Jun, 2006

1 commit

  • This patch implements an API whereby an application can determine the
    label of its peer's Unix datagram sockets via the auxiliary data mechanism of
    recvmsg.

    Patch purpose:

    This patch enables a security-aware application to retrieve the
    security context of the peer of a Unix datagram socket. The application
    can then use this security context to determine the security context for
    processing on behalf of the peer who sent the packet.

    Patch design and implementation:

    The design and implementation is very similar to the UDP case for INET
    sockets. Basically we build upon the existing Unix domain socket API for
    retrieving user credentials. Linux offers the API for obtaining user
    credentials via ancillary messages (i.e., out of band/control messages
    that are bundled together with a normal message). To retrieve the security
    context, the application first indicates to the kernel such desire by
    setting the SO_PASSSEC option via getsockopt. Then the application
    retrieves the security context using the auxiliary data mechanism.

    An example server application for Unix datagram socket should look like this:

    toggle = 1;
    toggle_len = sizeof(toggle);

    setsockopt(sockfd, SOL_SOCKET, SO_PASSSEC, &toggle, &toggle_len);
    recvmsg(sockfd, &msg_hdr, 0);
    if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
    cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
    if (cmsg_hdr->cmsg_len cmsg_level == SOL_SOCKET &&
    cmsg_hdr->cmsg_type == SCM_SECURITY) {
    memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext));
    }
    }

    sock_setsockopt is enhanced with a new socket option SOCK_PASSSEC to allow
    a server socket to receive security context of the peer.

    Testing:

    We have tested the patch by setting up Unix datagram client and server
    applications. We verified that the server can retrieve the security context
    using the auxiliary data mechanism of recvmsg.

    Signed-off-by: Catherine Zhang
    Acked-by: Acked-by: James Morris
    Signed-off-by: David S. Miller

    Catherine Zhang
     

21 Mar, 2006

1 commit

  • Instead of doing a memset then initialization of the fields of the scm
    structure, just initialize all the members explicitly. Prevent reloading
    of current on x86 and x86-64 by storing the value in a local variable for
    subsequent dereferences. This is worth a ~7KB/s increase in af_unix
    bandwidth. Note that we avoid the issues surrounding potentially
    uninitialized members of the ucred structure by constructing a struct
    ucred instead of assigning the members individually, which forces the
    compiler to zero any padding.

    [ I modified the patch not to use the aggregate assignment since
    gcc-3.4.x and earlier cannot optimize that properly at all even
    though gcc-4.0.x and later can -DaveM ]

    Signed-off-by: Benjamin LaHaise
    Signed-off-by: David S. Miller

    Benjamin LaHaise
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds