15 Jan, 2011

1 commit


02 May, 2010

1 commit

  • sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we
    need two atomic operations (and associated dirtying) per incoming
    packet.

    RCU conversion is pretty much needed :

    1) Add a new structure, called "struct socket_wq" to hold all fields
    that will need rcu_read_lock() protection (currently: a
    wait_queue_head_t and a struct fasync_struct pointer).

    [Future patch will add a list anchor for wakeup coalescing]

    2) Attach one of such structure to each "struct socket" created in
    sock_alloc_inode().

    3) Respect RCU grace period when freeing a "struct socket_wq"

    4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct
    socket_wq"

    5) Change sk_sleep() function to use new sk->sk_wq instead of
    sk->sk_sleep

    6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside
    a rcu_read_lock() section.

    7) Change all sk_has_sleeper() callers to :
    - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
    - Use wq_has_sleeper() to eventually wakeup tasks.
    - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)

    8) sock_wake_async() is modified to use rcu protection as well.

    9) Exceptions :
    macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq"
    instead of dynamically allocated ones. They don't need rcu freeing.

    Some cleanups or followups are probably needed, (possible
    sk_callback_lock conversion to a spinlock for example...).
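
    Steps 1, 4, 5 and 6 above can be pictured with a small userspace
    model. This is only a sketch under stated assumptions: the
    wait_queue_head_t stand-in, its nr_waiters counter and the field name
    fasync_list are illustrative and not taken from this message, and
    neither the RCU read-side section nor the grace-period freeing of
    step 3 is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for kernel types; the real definitions differ. */
typedef struct { int nr_waiters; } wait_queue_head_t; /* nr_waiters is hypothetical */
struct fasync_struct;

/* Step 1: the fields that need rcu_read_lock() protection live together. */
struct socket_wq {
	wait_queue_head_t wait;
	struct fasync_struct *fasync_list; /* field name is an assumption */
};

/* Step 4: struct sock points at a socket_wq instead of a bare wait queue. */
struct sock {
	struct socket_wq *sk_wq;
};

/* Step 5: sk_sleep() keeps its signature but reads through sk->sk_wq. */
static inline wait_queue_head_t *sk_sleep(struct sock *sk)
{
	assert(sk != NULL && sk->sk_wq != NULL);
	return &sk->sk_wq->wait;
}

/*
 * Step 6, simplified: report whether at least one task waits on wq.
 * The kernel helper must be called inside an rcu_read_lock() section;
 * that requirement is not modelled in this sketch.
 */
static inline int wq_has_sleeper(struct socket_wq *wq)
{
	return wq != NULL && wq->wait.nr_waiters > 0;
}
```

    Because callers already go through sk_sleep(), only the accessor body
    has to change when the field moves into "struct socket_wq".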

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

21 Apr, 2010

1 commit

  • Define a new function to return the waitqueue of a "struct sock".

    static inline wait_queue_head_t *sk_sleep(struct sock *sk)
    {
    return sk->sk_sleep;
    }

    Change all read occurrences of sk_sleep by a call to this function.

    Needed for a future RCU conversion. sk_sleep won't be a field directly
    available.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

26 Nov, 2009

1 commit

  • Generated with the following semantic patch

    @@
    struct net *n1;
    struct net *n2;
    @@
    - n1 == n2
    + net_eq(n1, n2)

    @@
    struct net *n1;
    struct net *n2;
    @@
    - n1 != n2
    + !net_eq(n1, n2)

    applied over {include,net,drivers/net}.
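
    For readers unfamiliar with the helper, the rewrite above amounts to
    routing every namespace comparison through one function. The sketch
    below is a userspace model, assuming the simplest possible body (a
    pointer comparison); "struct net" is a stand-in and
    device_visible_to() is a hypothetical caller, not a kernel function.

```c
#include <assert.h>

/* Stand-in for the kernel's struct net; contents are irrelevant here. */
struct net { int placeholder; };

/* Assumed body: two namespaces are equal when they are the same object. */
static inline int net_eq(const struct net *net1, const struct net *net2)
{
	return net1 == net2;
}

/* Hypothetical caller, written the way the semantic patch leaves it. */
static inline int device_visible_to(const struct net *dev_net,
				    const struct net *sk_net)
{
	if (!net_eq(dev_net, sk_net))
		return 0;
	return 1;
}
```

    Keeping the comparison behind a helper means its definition can vary
    with the kernel configuration without touching any call site.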

    Signed-off-by: Octavian Purdila
    Signed-off-by: David S. Miller

    Octavian Purdila
     

06 Nov, 2009

1 commit

  • The generic __sock_create function has a kern argument which allows the
    security system to make decisions based on if a socket is being created by
    the kernel or by userspace. This patch passes that flag to the
    net_proto_family specific create function, so it can do the same thing.

    Signed-off-by: Eric Paris
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Eric Paris
     

07 Oct, 2009

1 commit


01 Oct, 2009

1 commit

  • This provides safety against negative optlen at the type
    level instead of depending upon (sometimes non-trivial)
    checks against this sprinkled all over the place, in
    each and every implementation.
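
    The kind of check this replaces can be illustrated with a userspace
    sketch. Both validators below are hypothetical, as is OPT_MAX; they
    only show why an upper-bound test alone is insufficient for a signed
    length and sufficient for an unsigned one.

```c
#include <assert.h>

#define OPT_MAX 16 /* hypothetical maximum option size */

/*
 * Signed length: a negative optlen passes the upper-bound test, so every
 * implementation would also need its own "optlen < 0" check.
 */
static inline int optlen_ok_signed(int optlen)
{
	return optlen <= OPT_MAX;
}

/*
 * Unsigned length: a caller-supplied -1 arrives as a huge value and is
 * rejected by the same single comparison.
 */
static inline int optlen_ok_unsigned(unsigned int optlen)
{
	return optlen <= OPT_MAX;
}
```

    With the signed variant, optlen_ok_signed(-1) accepts the length; with
    the unsigned variant the same bit pattern is refused.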

    Based upon work done by Arjan van de Ven and feedback
    from Linus Torvalds.

    Signed-off-by: David S. Miller

    David S. Miller
     

10 Jul, 2009

1 commit

  • Adding memory barrier after the poll_wait function, paired with
    receive callbacks. Adding functions sock_poll_wait and sk_has_sleeper
    to wrap the memory barrier.

    Without the memory barrier, following race can happen.
    The race fires, when following code paths meet, and the tp->rcv_nxt
    and __add_wait_queue updates stay in CPU caches.

    CPU1                         CPU2

    sys_select                   receive packet
      ...                        ...
      __add_wait_queue           update tp->rcv_nxt
      ...                        ...
      tp->rcv_nxt check          sock_def_readable
      ...                        {
      schedule                      ...
                                    if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
                                            wake_up_interruptible(sk->sk_sleep)
                                    ...
                                 }

    If there was no cache the code would work ok, since the wait_queue and
    rcv_nxt are opposite to each other.

    Meaning that once tp->rcv_nxt is updated by CPU2, the CPU1 either already
    passed the tp->rcv_nxt check and sleeps, or will get the new value for
    tp->rcv_nxt and will return with new data mask.
    In both cases the process (CPU1) is being added to the wait queue, so the
    waitqueue_active (CPU2) call cannot miss and will wake up CPU1.

    The bad case is when the __add_wait_queue changes done by CPU1 stay in its
    cache, and so does the tp->rcv_nxt update on CPU2 side. The CPU1 will then
    end up calling schedule and sleep forever if there are no more data on the
    socket.
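
    The pairing described above is the classic "store, barrier, load"
    pattern on both sides. The following userspace sketch models it with
    C11 atomics instead of kernel primitives; the variable and function
    names are hypothetical, and a sequentially consistent fence stands in
    for the kernel's full memory barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool waiter_queued; /* models the entry added by __add_wait_queue */
static atomic_int rcv_nxt;        /* models tp->rcv_nxt */

/* CPU1: queue ourselves, full barrier, then check for new data. */
static inline bool poll_side_sees_data(int last_seen)
{
	atomic_store_explicit(&waiter_queued, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&rcv_nxt, memory_order_relaxed) != last_seen;
}

/* CPU2: publish new data, full barrier, then check for a waiter. */
static inline bool receive_side_sees_waiter(int new_rcv_nxt)
{
	atomic_store_explicit(&rcv_nxt, new_rcv_nxt, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&waiter_queued, memory_order_relaxed);
}
```

    With both fences in place, at least one side observes the other's
    store: either the poller sees the new rcv_nxt, or the receiver sees
    the waiter and wakes it. Dropping either fence allows both loads to
    return stale values, which is the lost wakeup described above.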

    Calls to poll_wait in following modules were omitted:
    net/bluetooth/af_bluetooth.c
    net/irda/af_irda.c
    net/irda/irnet/irnet_ppp.c
    net/mac80211/rc80211_pid_debugfs.c
    net/phonet/socket.c
    net/rds/af_rds.c
    net/rfkill/core.c
    net/sunrpc/cache.c
    net/sunrpc/rpc_pipe.c
    net/tipc/socket.c

    Signed-off-by: Jiri Olsa
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Jiri Olsa
     

07 Feb, 2009

1 commit

  • Fix a potential NULL dereference bug during error handling in
    rxrpc_kernel_begin_call(), whereby rxrpc_put_transport() may be handed a NULL
    pointer.

    This was found with a code checker (http://repo.or.cz/w/smatch.git/).

    Reported-by: Dan Carpenter
    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     

31 Oct, 2008

1 commit


26 Jul, 2008

1 commit

  • Removes legacy reinvent-the-wheel type thing. The generic
    machinery integrates much better to automated debugging aids
    such as kerneloops.org (and others), and is unambiguous due to
    better naming. Non-intuitively BUG_TRAP() is actually equal to
    WARN_ON() rather than BUG_ON() though some might actually be
    promoted to BUG_ON() but I left that to future.

    I could make at least one BUILD_BUG_ON conversion.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     

16 Apr, 2008

1 commit


08 Feb, 2008

1 commit


29 Jan, 2008

1 commit

  • The sock_wake_async() performs a bit different actions
    depending on "how" argument. Unfortunately this argument
    only has numerical magic values.

    I propose to give names to their constants to help people
    reading this function callers understand what's going on
    without looking into this function all the time.
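
    As an illustration of the idea, magic values 0..3 can be replaced by
    an enum so that call sites document themselves. The constant names
    and the labels returned below are assumptions made for this sketch;
    this message does not list the names that were chosen, and
    wake_reason() is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Assumed names for the four "how" values; not quoted from the patch. */
enum {
	SOCK_WAKE_IO,
	SOCK_WAKE_WAITD,
	SOCK_WAKE_SPACE,
	SOCK_WAKE_URG,
};

/* Hypothetical helper: map a "how" value to a human-readable label. */
static inline const char *wake_reason(int how)
{
	switch (how) {
	case SOCK_WAKE_IO:
		return "io";
	case SOCK_WAKE_WAITD:
		return "wait-data";
	case SOCK_WAKE_SPACE:
		return "space";
	case SOCK_WAKE_URG:
		return "urgent";
	default:
		return "unknown";
	}
}
```

    A caller then passes a named constant rather than a bare number, so
    the intent is visible without opening sock_wake_async() itself.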

    I suppose this is 2.6.25 material, but if it's not (or the
    naming seems poor/bad/awful), I can rework it against the
    current net-2.6 tree.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     

01 Nov, 2007

1 commit

  • Finally, the zero_it argument can be completely removed from
    the callers and from the function prototype.

    Besides, fix the checkpatch.pl warnings about using the
    assignments inside if-s.

    This patch is rather big, and it is a part of the previous one.
    I split it wishing to make the patches more readable. Hope
    this particular split helped.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     

17 Oct, 2007

1 commit

  • Make request_key() and co fundamentally asynchronous to make it easier for
    NFS to make use of them. There are now accessor functions that do
    asynchronous constructions, a wait function to wait for construction to
    complete, and a completion function for the key type to indicate completion
    of construction.

    Note that the construction queue is now gone. Instead, keys under
    construction are linked in to the appropriate keyring in advance, and that
    anyone encountering one must wait for it to be complete before they can use
    it. This is done automatically for userspace.

    The following auxiliary changes are also made:

    (1) Key type implementation stuff is split from linux/key.h into
    linux/key-type.h.

    (2) AF_RXRPC provides a way to allocate null rxrpc-type keys so that AFS does
    not need to call key_instantiate_and_link() directly.

    (3) Adjust the debugging macros so that they're -Wformat checked even if
    they are disabled, and make it so they can be enabled simply by defining
    __KDEBUG to be consistent with other code of mine.

    (4) Documentation.

    [alan@lxorguk.ukuu.org.uk: keys: missing word in documentation]
    Signed-off-by: David Howells
    Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

11 Oct, 2007

2 commits

  • This patch passes in the namespace a new socket should be created in
    and has the socket code do the appropriate reference counting. By
    virtue of this all socket create methods are touched. In addition
    the socket create methods are modified so that they will fail if
    you attempt to create a socket in a non-default network namespace.

    Failing if we attempt to create a socket outside of the default
    network namespace ensures that as we incrementally make the network stack
    network namespace aware we will not export functionality that someone
    has not audited and made certain is network namespace safe.
    Allowing us to partially enable network namespaces before all of the
    exotic protocols are supported.

    Any protocol layers I have missed will fail to compile because I now
    pass an extra parameter into the socket creation code.
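
    The guard described above can be sketched in userspace as follows.
    The types are stand-ins, example_create() is a hypothetical protocol
    create method, and the error code is an assumption; only the shape
    (an extra namespace argument, plus a refusal outside the default
    namespace) comes from this message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-ins for kernel types. */
struct net { int placeholder; };
struct socket { struct net *net; };

/* Models the default (initial) network namespace. */
static struct net init_net;

/* Hypothetical create method taking the new namespace argument. */
static inline int example_create(struct net *net, struct socket *sock,
				 int protocol)
{
	(void)protocol;

	/* Refuse namespaces this protocol has not been audited for. */
	if (net != &init_net)
		return -EAFNOSUPPORT; /* error code is an assumption */

	sock->net = net;
	return 0;
}
```

    A protocol that is later audited for namespace safety would drop the
    guard and start honouring the net argument.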

    [ Integrated AF_IUCV build fixes from Andrew Morton... -DaveM ]

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • This patch makes /proc/net per network namespace. It modifies the global
    variables proc_net and proc_net_stat to be per network namespace.
    The proc_net file helpers are modified to take a network namespace argument,
    and all of their callers are fixed to pass &init_net for that argument.
    This ensures that all of the /proc/net files are only visible and
    usable in the initial network namespace until the code behind them
    has been updated to handle multiple network namespaces.

    Making /proc/net per namespace is necessary as at least some files
    in /proc/net depend upon the set of network devices which is per
    network namespace, and even more files in /proc/net have contents
    that are relevant to a single network namespace.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

26 Jul, 2007

1 commit

  • This avoids use of the kernel-internal "xtime" variable directly outside
    of the actual time-related functions. Instead, use the helper functions
    that we already have available to us.

    This doesn't actually change any behaviour, but this will allow us to
    fix the fact that "xtime" isn't updated very often with CONFIG_NO_HZ
    (because much of the realtime information is maintained as separate
    offsets to 'xtime'), which has caused interfaces that use xtime directly
    to get a time that is out of sync with the real-time clock by up to a
    third of a second or so.

    Signed-off-by: John Stultz
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Linus Torvalds

    john stultz
     

20 Jul, 2007

1 commit

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt
     

19 Jul, 2007

1 commit


27 Apr, 2007

2 commits

  • Add an interface to the AF_RXRPC module so that the AFS filesystem module can
    more easily make use of the services available. AFS still opens a socket but
    then uses the action functions in lieu of sendmsg() and registers an intercept
    function to grab messages before they're queued on the socket Rx queue.

    This permits AFS (or whatever) to:

    (1) Avoid the overhead of using the recvmsg() call.

    (2) Use different keys directly on individual client calls on one socket
    rather than having to open a whole slew of sockets, one for each key it
    might want to use.

    (3) Avoid calling request_key() at the point of issue of a call or opening of
    a socket. This is done instead by AFS at the point of open(), unlink() or
    other VFS operation and the key handed through.

    (4) Request the use of something other than GFP_KERNEL to allocate memory.

    Furthermore:

    (*) The socket buffer markings used by RxRPC are made available for AFS so
    that it can interpret the cooked RxRPC messages itself.

    (*) rxgen (un)marshalling abort codes are made available.

    The following documentation for the kernel interface is added to
    Documentation/networking/rxrpc.txt:

    =========================
    AF_RXRPC KERNEL INTERFACE
    =========================

    The AF_RXRPC module also provides an interface for use by in-kernel utilities
    such as the AFS filesystem. This permits such a utility to:

    (1) Use different keys directly on individual client calls on one socket
    rather than having to open a whole slew of sockets, one for each key it
    might want to use.

    (2) Avoid having RxRPC call request_key() at the point of issue of a call or
    opening of a socket. Instead the utility is responsible for requesting a
    key at the appropriate point. AFS, for instance, would do this during VFS
    operations such as open() or unlink(). The key is then handed through
    when the call is initiated.

    (3) Request the use of something other than GFP_KERNEL to allocate memory.

    (4) Avoid the overhead of using the recvmsg() call. RxRPC messages can be
    intercepted before they get put into the socket Rx queue and the socket
    buffers manipulated directly.

    To use the RxRPC facility, a kernel utility must still open an AF_RXRPC socket,
    bind an address as appropriate and listen if it's to be a server socket, but
    then it passes this to the kernel interface functions.

    The kernel interface functions are as follows:

    (*) Begin a new client call.

    struct rxrpc_call *
    rxrpc_kernel_begin_call(struct socket *sock,
    struct sockaddr_rxrpc *srx,
    struct key *key,
    unsigned long user_call_ID,
    gfp_t gfp);

    This allocates the infrastructure to make a new RxRPC call and assigns
    call and connection numbers. The call will be made on the UDP port that
    the socket is bound to. The call will go to the destination address of a
    connected client socket unless an alternative is supplied (srx is
    non-NULL).

    If a key is supplied then this will be used to secure the call instead of
    the key bound to the socket with the RXRPC_SECURITY_KEY sockopt. Calls
    secured in this way will still share connections if at all possible.

    The user_call_ID is equivalent to that supplied to sendmsg() in the
    control data buffer. It is entirely feasible to use this to point to a
    kernel data structure.

    If this function is successful, an opaque reference to the RxRPC call is
    returned. The caller now holds a reference on this and it must be
    properly ended.

    (*) End a client call.

    void rxrpc_kernel_end_call(struct rxrpc_call *call);

    This is used to end a previously begun call. The user_call_ID is expunged
    from AF_RXRPC's knowledge and will not be seen again in association with
    the specified call.

    (*) Send data through a call.

    int rxrpc_kernel_send_data(struct rxrpc_call *call, struct msghdr *msg,
    size_t len);

    This is used to supply either the request part of a client call or the
    reply part of a server call. msg.msg_iovlen and msg.msg_iov specify the
    data buffers to be used. msg_iov may not be NULL and must point
    exclusively to in-kernel virtual addresses. msg.msg_flags may be given
    MSG_MORE if there will be subsequent data sends for this call.

    The msg must not specify a destination address, control data or any flags
    other than MSG_MORE. len is the total amount of data to transmit.

    (*) Abort a call.

    void rxrpc_kernel_abort_call(struct rxrpc_call *call, u32 abort_code);

    This is used to abort a call if it's still in an abortable state. The
    abort code specified will be placed in the ABORT message sent.

    (*) Intercept received RxRPC messages.

    typedef void (*rxrpc_interceptor_t)(struct sock *sk,
    unsigned long user_call_ID,
    struct sk_buff *skb);

    void
    rxrpc_kernel_intercept_rx_messages(struct socket *sock,
    rxrpc_interceptor_t interceptor);

    This installs an interceptor function on the specified AF_RXRPC socket.
    All messages that would otherwise wind up in the socket's Rx queue are
    then diverted to this function. Note that care must be taken to process
    the messages in the right order to maintain DATA message sequentiality.

    The interceptor function itself is provided with the address of the socket
    handling the incoming message, the ID assigned by the kernel utility
    to the call and the socket buffer containing the message.

    The skb->mark field indicates the type of message:

    MARK                            MEANING
    =============================== =======================================
    RXRPC_SKB_MARK_DATA             Data message
    RXRPC_SKB_MARK_FINAL_ACK        Final ACK received for an incoming call
    RXRPC_SKB_MARK_BUSY             Client call rejected as server busy
    RXRPC_SKB_MARK_REMOTE_ABORT     Call aborted by peer
    RXRPC_SKB_MARK_NET_ERROR        Network error detected
    RXRPC_SKB_MARK_LOCAL_ERROR      Local error encountered
    RXRPC_SKB_MARK_NEW_CALL         New incoming call awaiting acceptance

    The remote abort message can be probed with rxrpc_kernel_get_abort_code().
    The two error messages can be probed with rxrpc_kernel_get_error_number().
    A new call can be accepted with rxrpc_kernel_accept_call().

    Data messages can have their contents extracted with the usual bunch of
    socket buffer manipulation functions. A data message can be determined to
    be the last one in a sequence with rxrpc_kernel_is_data_last(). When a
    data message has been used up, rxrpc_kernel_data_delivered() should be
    called on it.

    Non-data messages should be handed to rxrpc_kernel_free_skb() to dispose
    of. It is possible to get extra refs on all types of message for later
    freeing, but this may pin the state of a call until the message is finally
    freed.

    (*) Accept an incoming call.

    struct rxrpc_call *
    rxrpc_kernel_accept_call(struct socket *sock,
    unsigned long user_call_ID);

    This is used to accept an incoming call and to assign it a call ID. This
    function is similar to rxrpc_kernel_begin_call() and calls accepted must
    be ended in the same way.

    If this function is successful, an opaque reference to the RxRPC call is
    returned. The caller now holds a reference on this and it must be
    properly ended.

    (*) Reject an incoming call.

    int rxrpc_kernel_reject_call(struct socket *sock);

    This is used to reject the first incoming call on the socket's queue with
    a BUSY message. -ENODATA is returned if there were no incoming calls.
    Other errors may be returned if the call had been aborted (-ECONNABORTED)
    or had timed out (-ETIME).

    (*) Record the delivery of a data message and free it.

    void rxrpc_kernel_data_delivered(struct sk_buff *skb);

    This is used to record a data message as having been delivered and to
    update the ACK state for the call. The socket buffer will be freed.

    (*) Free a message.

    void rxrpc_kernel_free_skb(struct sk_buff *skb);

    This is used to free a non-DATA socket buffer intercepted from an AF_RXRPC
    socket.

    (*) Determine if a data message is the last one on a call.

    bool rxrpc_kernel_is_data_last(struct sk_buff *skb);

    This is used to determine if a socket buffer holds the last data message
    to be received for a call (true will be returned if it does, false
    if not).

    The data message will be part of the reply on a client call and the
    request on an incoming call. In the latter case there will be more
    messages, but in the former case there will not.

    (*) Get the abort code from an abort message.

    u32 rxrpc_kernel_get_abort_code(struct sk_buff *skb);

    This is used to extract the abort code from a remote abort message.

    (*) Get the error number from a local or network error message.

    int rxrpc_kernel_get_error_number(struct sk_buff *skb);

    This is used to extract the error number from a message indicating either
    a local error occurred or a network error occurred.
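
    To show how the pieces documented above fit together, here is a
    userspace sketch of an interceptor that dispatches on skb->mark. It is
    not kernel code: struct sk_buff is a stand-in, the enum values are
    placeholders, the two rxrpc_kernel_*() functions are stubs that only
    count calls, and example_interceptor() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values; the real constants are defined by AF_RXRPC. */
enum {
	RXRPC_SKB_MARK_DATA,
	RXRPC_SKB_MARK_FINAL_ACK,
	RXRPC_SKB_MARK_BUSY,
	RXRPC_SKB_MARK_REMOTE_ABORT,
	RXRPC_SKB_MARK_NET_ERROR,
	RXRPC_SKB_MARK_LOCAL_ERROR,
	RXRPC_SKB_MARK_NEW_CALL,
};

struct sock;                           /* opaque in this sketch */
struct sk_buff { unsigned int mark; }; /* stand-in with only the mark field */

/* Stubs that count calls so the dispatch can be observed. */
static int nr_delivered, nr_freed;

static void rxrpc_kernel_data_delivered(struct sk_buff *skb)
{
	(void)skb;
	nr_delivered++;
}

static void rxrpc_kernel_free_skb(struct sk_buff *skb)
{
	(void)skb;
	nr_freed++;
}

/* Hypothetical interceptor matching the rxrpc_interceptor_t signature. */
static void example_interceptor(struct sock *sk, unsigned long user_call_ID,
				struct sk_buff *skb)
{
	(void)sk;
	(void)user_call_ID;

	switch (skb->mark) {
	case RXRPC_SKB_MARK_DATA:
		/* Payload would be consumed here, in sequence order. */
		rxrpc_kernel_data_delivered(skb); /* records delivery and frees */
		break;
	default:
		/* A real handler would act on the message type first. */
		rxrpc_kernel_free_skb(skb);
		break;
	}
}
```

    Data messages go through rxrpc_kernel_data_delivered() so the ACK
    state is updated, while every other message type is released with
    rxrpc_kernel_free_skb(), as the text above requires.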

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     
  • Provide AF_RXRPC sockets that can be used to talk to AFS servers, or serve
    answers to AFS clients. KerberosIV security is fully supported. The patches
    and some example test programs can be found in:

    http://people.redhat.com/~dhowells/rxrpc/

    This will eventually replace the old implementation of kernel-only RxRPC
    currently resident in net/rxrpc/.

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells