19 Jun, 2005

40 commits

  • The simplicity of the fifo qdisc allows several qdisc operations to be
    redirected to the relevant queue management function directly. Saves
    a lot of code lines and gives the pfifo a byte based backlog.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Implements an interface to be used by leaf qdiscs maintaining an internal
    skb queue. The interface maintains a backlog in bytes in addition
    to the skb_queue_len() maintained by the queue itself. Relevant statistics
    get incremented automatically. Every function comes in two variants, one
    assuming Qdisc->q is used as queue and the second taking a sk_buff_head
    as argument. Be aware that, if you use multiple queues, you still have to
    maintain the Qdisc->q.qlen counter yourself.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
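
The byte-backlog accounting described in this commit can be sketched in userspace C. This is only an illustration under stated assumptions: `demo_pkt` and `demo_queue` are hypothetical stand-ins for sk_buff and sk_buff_head, and the helper names are not the kernel's. The point is that every enqueue and dequeue updates a byte counter alongside the packet count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for sk_buff and sk_buff_head; not the kernel types. */
struct demo_pkt {
    struct demo_pkt *next;
    unsigned int len;              /* packet length in bytes */
};

struct demo_queue {
    struct demo_pkt *head, *tail;
    unsigned int qlen;             /* packets queued, like skb_queue_len() */
    unsigned int backlog;          /* bytes queued, the additional counter */
};

static void demo_queue_init(struct demo_queue *q)
{
    q->head = q->tail = NULL;
    q->qlen = 0;
    q->backlog = 0;
}

/* Append a packet and account for it in both counters. */
static void demo_enqueue_tail(struct demo_queue *q, struct demo_pkt *p)
{
    assert(p != NULL);
    p->next = NULL;
    if (q->tail)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
    q->qlen++;
    q->backlog += p->len;
}

/* Remove the head packet, or return NULL if the queue is empty. */
static struct demo_pkt *demo_dequeue_head(struct demo_queue *q)
{
    struct demo_pkt *p = q->head;

    if (!p)
        return NULL;
    q->head = p->next;
    if (!q->head)
        q->tail = NULL;
    q->qlen--;
    q->backlog -= p->len;
    p->next = NULL;
    return p;
}
```

A qdisc with several internal queues would keep one such structure per queue and, as the commit message warns, still maintain the overall packet count itself.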
     
  • This patch replaces the spin_lock_irqsave call on the receive queue
    lock in SCTP with spin_lock_bh. Despite the proliferation of
    spin_lock_irqsave calls in this stack, it is only entered from the
    IPv4/IPv6 stack and user space. That is, it is never entered from
    hardirq context.

    The call in question is only called from recvmsg which means that
    IRQs aren't disabled. Therefore it is safe to replace it with
    spin_lock_bh.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • In light of my recent patch to net/ipv4/udp.c that replaced the
    spin_lock_irq calls on the receive queue lock with spin_lock_bh,
    here is a similar patch for all other occurrences of spin_lock_irq
    on receive/error queue locks in IPv4 and IPv6.

    In these stacks, we know that they can only be entered from user
    or softirq context. Therefore it's safe to disable BH only.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch ensures that netlink events created as a result of programs
    using ioctls (such as ifconfig, route etc) contain the correct PID of
    those events.

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     
  • This patch converts "unsigned flags" to use more explicit types like u16
    instead and incrementally introduces NLMSG_NEW().

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     
  • This patch was supposed to be part of the neighbour tables related
    patchset but apparently got lost.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • This patch changes the format of the XFRM_MSG_DELSA and
    XFRM_MSG_DELPOLICY notification so that the main message
    sent is of the same format as that received by the kernel
    if the original message was via netlink. This also means
    that we won't lose the byid information carried in km_event.

    Since this user interface is introduced by Jamal's patch
    we can still afford to change it.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch rectifies some rtnetlink message builders that derive the
    flags from the pid. It is now explicit like the other cases
    which get it right. Also fixes half a dozen dumpers which did not
    set NLM_F_MULTI at all.

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     
  • Introduces a new macro NLMSG_NEW which extends NLMSG_PUT but takes
    a flags argument. NLMSG_PUT stays there for compatibility but now
    calls NLMSG_NEW with flags == 0. NLMSG_PUT_ANSWER is renamed to
    NLMSG_NEW_ANSWER which now also takes a flags argument.

    Also converts the users of NLMSG_PUT_ANSWER to use NLMSG_NEW_ANSWER
    and fixes the two direct users of __nlmsg_put to either provide
    the flags or use NLMSG_NEW(_ANSWER).

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
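
The relationship between the old and new entry points can be sketched in userspace C. This is a model, not the kernel macros: `demo_nlmsghdr` copies the field layout of struct nlmsghdr, `demo_buf` stands in for the skb, and all `demo_` names and the argument order are assumptions made for illustration. The compatibility wrapper simply forwards with flags == 0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-in with the same field layout as struct nlmsghdr. */
struct demo_nlmsghdr {
    uint32_t len;                  /* header plus payload, unpadded */
    uint16_t type;
    uint16_t flags;
    uint32_t seq;
    uint32_t pid;
};

#define DEMO_NLM_F_MULTI 0x2u      /* same value as NLM_F_MULTI */
#define DEMO_ALIGN(n) (((n) + 3u) & ~(size_t)3u)

struct demo_buf {                  /* hypothetical stand-in for the skb */
    unsigned char data[256];
    size_t used;
};

/* Reserve header plus payload; returns the header offset, or -1 if full. */
static long demo_nlmsg_new(struct demo_buf *b, uint32_t pid, uint32_t seq,
                           uint16_t type, size_t payload, uint16_t flags)
{
    struct demo_nlmsghdr h;
    size_t total = DEMO_ALIGN(sizeof(h) + payload);
    long off = (long)b->used;

    assert(b->used <= sizeof(b->data));
    if (total > sizeof(b->data) - b->used)
        return -1;
    h.len = (uint32_t)(sizeof(h) + payload);
    h.type = type;
    h.flags = flags;
    h.seq = seq;
    h.pid = pid;
    memset(b->data + b->used, 0, total);
    memcpy(b->data + b->used, &h, sizeof(h));
    b->used += total;
    return off;
}

/* Compatibility wrapper: the old call, now expressed as flags == 0. */
static long demo_nlmsg_put(struct demo_buf *b, uint32_t pid, uint32_t seq,
                           uint16_t type, size_t payload)
{
    return demo_nlmsg_new(b, pid, seq, type, payload, 0);
}

/* Copy a header back out of the buffer for inspection. */
static struct demo_nlmsghdr demo_nlmsg_read(const struct demo_buf *b, long off)
{
    struct demo_nlmsghdr h;

    memcpy(&h, b->data + off, sizeof(h));
    return h;
}
```

With this shape, a dump routine would pass DEMO_NLM_F_MULTI explicitly for each part of a multipart reply, while existing callers of the wrapper keep working unchanged.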
     
  • Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Fixes dsmark to do all configuration sanity checks first and
    only apply the changes if all of them can be applied without
    any errors. Also fixes the weak sanity checks for DSMARK_VALUE
    and DSMARK_MASK.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Only skb_trim() if 'start' is non-NULL.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • To retrieve the neighbour tables send RTM_GETNEIGHTBL with the
    NLM_F_DUMP flag set. Every neighbour table configuration is
    spread over multiple messages to avoid running into message
    size limits on systems with many interfaces. The first message
    in the sequence transports all data that is not device specific, such as
    statistics, configuration, and the default parameter set.
    This message is followed by 0..n messages carrying device
    specific parameter sets.

    Although the ordering should be sufficient, NDTA_NAME can be
    used to identify sequences. The initial message can be identified
    by checking for NDTA_CONFIG. The device specific messages do
    not contain this TLV but have NDTPA_IFINDEX set to the
    corresponding interface index.

    To change neighbour table attributes, send RTM_SETNEIGHTBL
    with NDTA_NAME set. Changeable attributes include NDTA_THRESH[1-3],
    NDTA_GC_INTERVAL, and all TLVs in NDTA_PARMS unless marked
    otherwise. Device specific parameter sets can be changed by
    setting NDTPA_IFINDEX to the interface index of the corresponding
    device.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • RTA_GET_U(32|64)(tlv)
    Assumes TLV is a u32/u64 field and returns its value.

    RTA_GET_[M]SECS(tlv)
    Assumes TLV is a u64 and transports jiffies converted
    to seconds or milliseconds and returns its value.

    RTA_PUT_U(32|64)(skb, type, value)
    Appends %value as fixed u32/u64 to %skb as TLV %type.

    RTA_PUT_[M]SECS(skb, type, jiffies)
    Converts %jiffies to secs/msecs and appends it as u64
    to %skb as TLV %type.

    RTA_PUT_STRING(skb, type, string)
    Appends %NUL terminated %string to %skb as TLV %type.

    RTA_NEST(skb, type)
    Starts a nested TLV %type and returns the nesting handle.

    RTA_NEST_END(skb, nesting_handle)
    Finishes the nested TLV %nesting_handle, must be paired
    with a matching RTA_NEST(). Returns skb->len.

    RTA_NEST_CANCEL(skb, nesting_handle)
    Cancel the nested TLV %nesting_handle and trim nested TLV
    from skb again, returns -1.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
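
The nest/end/cancel semantics listed above can be modelled in userspace C. This is a sketch under assumptions: `demo_msg` stands in for the skb, `demo_tlv_hdr` mirrors the length/type header of an rtattr, and the nesting handle is a buffer offset rather than a pointer. None of the `demo_` names exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct demo_tlv_hdr {              /* length/type header, as in struct rtattr */
    uint16_t len;
    uint16_t type;
};

struct demo_msg {                  /* hypothetical stand-in for the skb */
    unsigned char data[128];
    size_t used;
};

#define DEMO_ALIGN(n) (((n) + 3u) & ~(size_t)3u)

/* Append one TLV; returns 0, or -1 if the buffer is full. */
static int demo_put(struct demo_msg *m, uint16_t type,
                    const void *payload, size_t plen)
{
    struct demo_tlv_hdr h;
    size_t total = DEMO_ALIGN(sizeof(h) + plen);

    if (total > sizeof(m->data) - m->used)
        return -1;
    h.len = (uint16_t)(sizeof(h) + plen);
    h.type = type;
    memset(m->data + m->used, 0, total);
    memcpy(m->data + m->used, &h, sizeof(h));
    if (plen)
        memcpy(m->data + m->used + sizeof(h), payload, plen);
    m->used += total;
    return 0;
}

static int demo_put_u32(struct demo_msg *m, uint16_t type, uint32_t value)
{
    return demo_put(m, type, &value, sizeof(value));
}

/* Start a nested TLV; the handle is the offset of its header, or -1. */
static long demo_nest(struct demo_msg *m, uint16_t type)
{
    size_t start = m->used;

    if (demo_put(m, type, NULL, 0) < 0)
        return -1;
    return (long)start;
}

/* Close the nest: its length now spans everything appended since. */
static size_t demo_nest_end(struct demo_msg *m, long nest)
{
    struct demo_tlv_hdr h;

    assert(nest >= 0 && (size_t)nest + sizeof(h) <= m->used);
    memcpy(&h, m->data + nest, sizeof(h));
    h.len = (uint16_t)(m->used - (size_t)nest);
    memcpy(m->data + nest, &h, sizeof(h));
    return m->used;
}

/* Abandon the nest: trim the buffer back to where it began, return -1. */
static int demo_nest_cancel(struct demo_msg *m, long nest)
{
    if (nest >= 0)
        m->used = (size_t)nest;
    return -1;
}
```

Returning -1 from the cancel helper lets an error path both trim the buffer and propagate the failure in a single return statement.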
     
  • NLMSG_PUT_ANSWER(skb, nlcb, type, length)
    Start a new netlink message as answer to a request,
    returns the message header.

    NLMSG_END(skb, nlh)
    End a netlink message, fixes total message length,
    returns skb->len.

    NLMSG_CANCEL(skb, nlh)
    Cancel the building process and trim whole message
    from skb again, returns -1.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • This fixes the CONFIG_INET=n build failure noticed
    by Andrew Morton.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • This chunks out the accept_queue and tcp_listen_opt code and moves
    them to net/core/request_sock.c and include/net/request_sock.h, to
    make it useful for other transport protocols, DCCP being the first one
    to use it.

    Next patches will rename tcp_listen_opt to accept_sock and remove the
    inline tcp functions that just call a reqsk_queue_ function.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Ok, this one just renames some stuff to have a better namespace and to
    disassociate it from TCP:

    struct open_request -> struct request_sock
    tcp_openreq_alloc -> reqsk_alloc
    tcp_openreq_free -> reqsk_free
    tcp_openreq_fastfree -> __reqsk_free

    With this most of the infrastructure closely resembles a struct
    sock methods subset.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Kept this first changeset minimal, without changing existing names to
    ease peer review.

    Basically tcp_openreq_alloc now receives the or_calltable, that in turn
    has two new members:

    ->slab, that replaces tcp_openreq_cachep
    ->obj_size, to inform the size of the openreq descendant for
    a specific protocol

    The protocol specific fields in struct open_request were moved to a
    class hierarchy, with the things that are common to all connection
    oriented PF_INET protocols in struct inet_request_sock, the TCP ones
    in tcp_request_sock, that is an inet_request_sock, that is an
    open_request.

    I.e. this uses the same approach used for the struct sock class
    hierarchy, with sk_prot indicating if the protocol wants to use the
    open_request infrastructure by filling in sk_prot->rsk_prot with an
    or_calltable.

    Results? Performance is improved and TCP v4 now uses only 64 bytes per
    open request minisock, down from 96 without this patch :-)

    Next changeset will rename some of the structs, fields and functions
    mentioned above, struct or_calltable is way unclear, better name it
    struct request_sock_ops, s/struct open_request/struct request_sock/g,
    etc.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
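
The class hierarchy and the obj_size member described in this commit can be sketched in userspace C. The sketch borrows the later names (request_sock, request_sock_ops) with a `demo_` prefix; the struct fields are illustrative guesses, and calloc() replaces the per-protocol slab cache. Each protocol embeds its parent as the first member, so a generic pointer can be converted to the protocol-specific type.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Per-protocol operations table; only obj_size is modelled here. */
struct demo_request_sock_ops {
    size_t obj_size;               /* size of the protocol's descendant */
};

/* Base class: what every connection request has in common. */
struct demo_request_sock {
    const struct demo_request_sock_ops *ops;
};

/* Common to connection-oriented PF_INET protocols (illustrative fields). */
struct demo_inet_request_sock {
    struct demo_request_sock req;  /* must stay the first member */
    uint32_t loc_addr;
    uint32_t rmt_addr;
    uint16_t rmt_port;
};

/* TCP-specific part (illustrative fields). */
struct demo_tcp_request_sock {
    struct demo_inet_request_sock req;  /* must stay the first member */
    uint32_t rcv_isn;
    uint32_t snt_isn;
};

static const struct demo_request_sock_ops demo_tcp_ops = {
    sizeof(struct demo_tcp_request_sock)
};

/* Generic allocator: the ops table tells it how much memory to reserve. */
static struct demo_request_sock *
demo_reqsk_alloc(const struct demo_request_sock_ops *ops)
{
    struct demo_request_sock *req;

    assert(ops->obj_size >= sizeof(struct demo_request_sock));
    req = calloc(1, ops->obj_size);
    if (req)
        req->ops = ops;
    return req;
}

static void demo_reqsk_free(struct demo_request_sock *req)
{
    free(req);
}

/* Downcast; valid because the base object sits at offset zero. */
static struct demo_tcp_request_sock *
demo_tcp_rsk(struct demo_request_sock *req)
{
    return (struct demo_tcp_request_sock *)req;
}
```

Because the allocator only consults obj_size, a second protocol could supply its own ops table and descendant struct without any change to the generic code.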
     
  • This is for use with slab users that pass a dynamically allocated slab name in
    kmem_cache_create, so that before destroying the slab one can retrieve the name
    and free its memory.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Small fixup to use netlink macros instead of hardcoding.

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     
  • Herbert Xu wrote:
    > @@ -1254,6 +1326,7 @@ static int pfkey_add(struct sock *sk, st
    > if (IS_ERR(x))
    > return PTR_ERR(x);
    >
    > + xfrm_state_hold(x);

    This introduces a leak when xfrm_state_add()/xfrm_state_update()
    fail. We hold two references (one from xfrm_state_alloc(), one
    from xfrm_state_hold()), but only drop one. We need to take the
    reference because the reference from xfrm_state_alloc() can
    be dropped by __xfrm_state_delete(), so the fix is to drop both
    references on error. Same problem in xfrm_user.c.

    Signed-off-by: Patrick McHardy
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Patrick McHardy
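
The reference counting argument in this commit can be traced with a small userspace model. Everything here is hypothetical: `demo_state` tracks only a counter and a freed flag, and `insert_fails` simulates xfrm_state_add()/xfrm_state_update() returning an error. The sketch shows why the error path must drop two references.

```c
#include <assert.h>

struct demo_state {
    int refcnt;
    int freed;                     /* set once the last reference is gone */
};

/* Like an allocation: the object starts with one reference. */
static void demo_state_alloc(struct demo_state *x)
{
    x->refcnt = 1;
    x->freed = 0;
}

static void demo_state_hold(struct demo_state *x)
{
    x->refcnt++;
}

static void demo_state_put(struct demo_state *x)
{
    assert(x->refcnt > 0);
    if (--x->refcnt == 0)
        x->freed = 1;
}

/* Returns 0 on success, -1 if the simulated insertion fails. */
static int demo_state_add(struct demo_state *x, int insert_fails)
{
    /* Extra reference, so x stays valid even if a concurrent delete
     * drops the allocation reference after a successful insert. */
    demo_state_hold(x);

    if (insert_fails) {
        demo_state_put(x);         /* the reference taken just above */
        demo_state_put(x);         /* the allocation reference */
        return -1;
    }

    /* Success: the table keeps the allocation reference, drop ours. */
    demo_state_put(x);
    return 0;
}
```

Dropping only one reference on the error path would leave refcnt at 1 with nothing left to release it, which is the leak the commit describes.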
     
  • This patch removes XFRM_SAP_* and converts them over to XFRM_MSG_*.
    The netlink interface is meant to map directly onto the underlying
    xfrm subsystem. Therefore rather than using a new independent
    representation for the events we can simply use the existing ones
    from xfrm_user.

    Signed-off-by: Herbert Xu

    Herbert Xu
     
  • This patch fixes policy deletion in xfrm_user so that it sets
    km_event.data.byid. This puts xfrm_user on par with what af_key
    does in this case.

    Signed-off-by: Herbert Xu

    Herbert Xu
     
  • This patch turns km_event.data into a union. This makes code that
    uses it clearer.

    Signed-off-by: Herbert Xu

    Herbert Xu
     
  • This patch adjusts the SA state conversion in af_key such that
    XFRM_STATE_ERROR/XFRM_STATE_DEAD will be converted to SADB_STATE_DEAD
    instead of SADB_STATE_DYING.

    According to RFC 2367, SADB_STATE_DYING SAs can be turned into
    mature ones through updating their lifetime settings. Since SAs
    which are in the states XFRM_STATE_ERROR/XFRM_STATE_DEAD cannot
    be resurrected, this value is unsuitable.

    Signed-off-by: Herbert Xu

    Herbert Xu
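
The adjusted conversion can be sketched as a small mapping function. The enum names are hypothetical, the numeric SA state values follow RFC 2367, and the `dying` flag plus the handling of the valid and acquire states are assumptions added to make the example complete. Only the ERROR/DEAD mapping is taken from the commit message.

```c
#include <assert.h>

/* Hypothetical subset of the internal states. */
enum demo_xfrm_state {
    DEMO_XFRM_ACQ,
    DEMO_XFRM_VALID,
    DEMO_XFRM_ERROR,
    DEMO_XFRM_DEAD
};

/* Numeric values follow the SA state numbering in RFC 2367. */
enum demo_sadb_state {
    DEMO_SADB_LARVAL = 0,
    DEMO_SADB_MATURE = 1,
    DEMO_SADB_DYING  = 2,
    DEMO_SADB_DEAD   = 3
};

/* dying: nonzero once a soft limit has been reached (assumption). */
static enum demo_sadb_state
demo_state_to_sadb(enum demo_xfrm_state s, int dying)
{
    assert(dying == 0 || dying == 1);

    switch (s) {
    case DEMO_XFRM_VALID:
        return dying ? DEMO_SADB_DYING : DEMO_SADB_MATURE;
    case DEMO_XFRM_ACQ:
        return DEMO_SADB_LARVAL;
    case DEMO_XFRM_ERROR:
    case DEMO_XFRM_DEAD:
    default:
        /* Cannot be resurrected, so never report these as DYING. */
        return DEMO_SADB_DEAD;
    }
}
```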
     
  • This patch ensures that the hard state/policy expire notifications are
    only sent when the state/policy is successfully removed from their
    respective tables.

    As it is, it's possible for a state/policy to both expire by
    reaching a hard limit and be deleted by the user.

    Note that this behaviour isn't actually forbidden by RFC 2367.
    However, it is a quality of implementation issue.

    As an added bonus, the restructuring in this patch will help
    eventually in moving the expire notifications from softirq
    context into process context, thus improving their reliability.

    One important side-effect from this change is that SAs reaching
    their hard byte/packet limits are now deleted immediately, just
    like SAs that have reached their hard time limits.

    Previously they were announced immediately but only deleted after
    30 seconds.

    This is bad because it prevents the system from issuing an ACQUIRE
    command until the existing state is deleted by the user or expires
    after the time is up.

    In the scenario where the expire notification was lost this introduces
    a 30 second delay into the system for no good reason.

    Signed-off-by: Herbert Xu

    Herbert Xu
     
  • Here's the final patch.
    What this patch provides:

    - netlink xfrm events
    - ability to have events generated by netlink propagated to pfkey
    and vice versa.
    - fixes the acquire lets-be-happy-with-one-success issue

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: Herbert Xu

    Jamal Hadi Salim
     
  • Linus Torvalds
     
  • Linus Torvalds
     
  • Martin can maintain the DocBook system for us.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Martin Waitz
     
  • When significant delays happen during boot (e.g. with a kernel debugger,
    but the problem has also been seen in other cases) the timeout for blanking the
    console may trigger, but the work scheduler may not have been initialized
    yet. schedule_work() will oops over the null keventd_wq.

    Signed-off-by: Jan Beulich
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     
  • …git/jgarzik/libata-dev

    Linus Torvalds