11 Oct, 2007

35 commits

  • This patch makes processing of netlink user -> kernel messages synchronous.
    This change was inspired by a talk with Alexey Kuznetsov about current
    netlink message processing. He says that he was badly wrong when he
    introduced asynchronous user -> kernel communication.

    The call netlink_unicast is the only path to send a message to the kernel
    netlink socket. But, unfortunately, it is also used to send data to the
    user.

    Before this change the user message was attached to the socket queue
    and sk->sk_data_ready was called. The process was blocked until all
    pending messages were processed. The bad thing is that this processing
    may occur in an arbitrary process context.

    This patch changes the nlk->data_ready callback to take one skb and forces
    packet processing right in netlink_unicast.

    Kernel -> user path in netlink_unicast remains untouched.

    EINTR processing in netlink_run_queue was changed. It forces an rtnl_lock
    drop, but the process remains in the loop until the message is fully
    processed. So there is no need to use these kludges now.

    Signed-off-by: Denis V. Lunev
    Acked-by: Alexey Kuznetsov
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • This patch releases the lock on the state before calling x->type->output.
    It also adds the lock to the spots where it is currently needed.

    Most of those places (all except mip6) are expected to disappear with
    async crypto.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch adds locking so that when we're copying non-atomic fields such as
    life-time or coaddr to user-space we don't get a partial result.

    For af_key I've changed every instance of pfkey_xfrm_state2msg apart from
    expiration notification to include the keys and life-times. This is in-line
    with XFRM behaviour.

    The actual cases affected are:

    * pfkey_getspi: No change as we don't have any keys to copy.
    * key_notify_sa:
    + ADD/UPD: This wouldn't work otherwise.
    + DEL: It can't hurt.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Here's a good example of code duplication leading to code rot. The
    notification patch did its own netlink message creation for xfrm states.
    It duplicated code that was already in dump_one_state. Guess what, the
    next time (and the time after) when someone updated dump_one_state the
    notification path got zilch.

    This patch moves that code from dump_one_state to copy_to_user_state_extra
    and uses it in xfrm_notify_sa too. Unfortunately whoever updates this
    still needs to update xfrm_sa_len since the notification path wants to
    know the exact size for allocation.

    At least I've added a comment saying so and if someone still forgets, we'll
    have a WARN_ON telling us so.

    I also changed the security size calculation to use xfrm_user_sec_ctx since
    that's what we actually put into the skb. However it makes no practical
    difference since it has the same size as xfrm_sec_ctx.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch moves some common code that conceptually belongs to the xfrm core
    from af_key/xfrm_user into xfrm_alloc_spi.

    In particular, the spin lock on the state is now taken inside xfrm_alloc_spi.
    Previously it also protected the construction of the response PF_KEY/XFRM
    messages to user-space. This is inconsistent as other identical constructions
    are not protected by the state lock. This is bad because they in fact should
    be protected but only in certain spots (so as not to hold the lock for too
    long which may cause packet drops).

    The SPI byte order conversion has also been moved.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • There is no point in waking people up when creating/updating larval states
    because they'll just go back to sleep again as larval states by definition
    cannot be found by xfrm_state_find.

    We should only wake them up when the larvals mature or die.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Currently the x->mode->output functions store the IPv6 nh pointer in the
    skb network header. This is inconvenient because the network header then
    has to be fixed up before the packet can leave the IPsec stack. The mac
    header field is unused on output so we can use that to store this instead.

    This patch does that and removes the network header fix-up in xfrm_output.

    It also uses ipv6_hdr where appropriate in the x->type->output functions.

    There is also a minor clean-up in esp4 to make it use the same code as
    esp6 to help any subsequent effort to merge the two.

    Lastly it kills two redundant skb_set_* statements in BEET that were
    simply copied over from transport mode.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Constructs of the form

    xfrm_state_hold(x);
    foo(x);
    xfrm_state_put(x);

    tend to be broken: either foo is synchronous, in which case the extra
    reference is totally unnecessary, or foo is asynchronous, in which case
    the reference count is taken in the wrong spot.

    In the case of xfrm_secpath_reject, the function is synchronous and therefore
    we should just kill the reference count.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
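
The hold/put observation in the commit above can be illustrated with a minimal user-space sketch. The names `struct state`, `state_hold`, `state_put` and `reject_sync` are hypothetical stand-ins, not the kernel's xfrm API; the assumption is that the caller already owns a reference for the duration of the call.

```c
#include <assert.h>

/* Hypothetical stand-in for a refcounted object such as struct xfrm_state. */
struct state {
    int refcnt;
    int rejected;
};

static void state_hold(struct state *x) { x->refcnt++; }

static void state_put(struct state *x)
{
    assert(x->refcnt > 0);
    x->refcnt--;
}

/* Synchronous: finishes with x before returning and keeps no pointer to it. */
static void reject_sync(struct state *x) { x->rejected = 1; }

/* Redundant form: the caller's own reference already covers the call. */
static void caller_with_extra_ref(struct state *x)
{
    state_hold(x);
    reject_sync(x);
    state_put(x);
}

/* Equivalent form once the unnecessary hold/put pair is removed. */
static void caller_plain(struct state *x)
{
    reject_sync(x);
}
```

Both callers leave the reference count where it started. If the callee were asynchronous, it would have to take the reference itself before deferring the work, because the caller's hold/put pair is gone by the time the deferred work runs.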
     
  • The lastused update check in xfrm_output can be done just as well in
    the mode output function which is specific to RO.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Now that the only callers of xfrm_replay_notify are in xfrm, we can remove
    the export.

    This patch also removes xfrm_aevent_doreplay since it's now called in just
    one spot.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • The replay counter is one of only two remaining things in the output code
    that requires a lock on the xfrm state (the other being the crypto). This
    patch moves it into the generic xfrm_output so we can remove the lock from
    the transforms themselves.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • The functions xfrm_state_check and xfrm_state_check_space are only used by
    the output code in xfrm_output.c so we can move them over.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Most of the code in xfrm4_output_one and xfrm6_output_one is identical so
    this patch moves it into a common xfrm_output function which will live
    in net/xfrm.

    In fact this would seem to fix a bug as on IPv4 we never reset the network
    header after a transform which may upset netfilter later on.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch makes loopback_dev per network namespace. It adds
    code to create a different loopback device for each network
    namespace and to free a loopback device when a network
    namespace exits.

    This patch modifies all users of loopback_dev so they
    access it as init_net.loopback_dev, keeping all of the
    code compiling and working. A later pass will be needed to
    update the users to use something other than the initial network
    namespace.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • This patch replaces all occurrences of the static variable
    loopback_dev with a pointer loopback_dev. That provides the
    mindless, trivial, uninteresting part of the change for the
    dynamic allocation of the loopback device.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Daniel Lezcano
    Acked-by: Kirill Korotaev
    Acked-by: Benjamin Thery
    Signed-off-by: David S. Miller

    Daniel Lezcano
     
  • I was looking at Patrick's fix to inet_diag and it occurred
    to me that we're using a pointer argument to return values
    unnecessarily in netlink_run_queue. Changing it to return
    the value will allow the compiler to generate better code
    since the value won't have to be memory-backed.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
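
The difference between the two calling conventions discussed in the commit above can be sketched as follows. This is an illustration only: `run_queue_outparam` and `run_queue_ret` are hypothetical functions over a plain integer array, not the real netlink_run_queue signature, and a negative entry is an assumed "stop processing" marker.

```c
/* Out-parameter style: the result must live in memory the callee can write. */
static void run_queue_outparam(const int *queue, int n, int *qlen)
{
    int left = n;

    for (int i = 0; i < n; i++) {
        if (queue[i] < 0)       /* hypothetical "stop" marker */
            break;
        left--;                 /* one message consumed */
    }
    *qlen = left;               /* store through the pointer */
}

/* Return-value style: the count can stay in a register the whole time. */
static int run_queue_ret(const int *queue, int n)
{
    int left = n;

    for (int i = 0; i < n; i++) {
        if (queue[i] < 0)
            break;
        left--;
    }
    return left;                /* remaining, unprocessed messages */
}
```

Both variants compute the same remaining length; the second simply does not force the caller to take the address of a local variable.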
     
  • Each netlink socket will live in exactly one network namespace;
    this includes the controlling kernel sockets.

    This patch updates all of the existing netlink protocols
    to only support the initial network namespace. Requests
    by clients in other namespaces will get -ECONNREFUSED,
    as they would if the kernel did not have support for
    that netlink protocol compiled in.

    As each netlink protocol is updated to be multiple network
    namespace safe it can register multiple kernel sockets
    to acquire a presence in the rest of the network namespaces.

    The implementation in af_netlink is a simple filter implementation
    at hash table insertion and hash table look up time.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Every user of the network device notifiers is either a protocol
    stack or a pseudo device. If a protocol stack that does not have
    support for multiple network namespaces receives an event for a
    device that is not in the initial network namespace it quite possibly
    can get confused and do the wrong thing.

    To avoid problems until all of the protocol stacks are converted
    this patch modifies all netdev event handlers to ignore events on
    devices that are not in the initial network namespace.

    As the rest of the code is made network namespace aware these
    checks can be removed.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
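
The guard described in the commit above might look roughly like the following user-space sketch. `struct net`, `struct net_device`, the `nd_net` field and `my_netdev_event` are simplified, hypothetical stand-ins for the kernel types and for a real notifier callback; only the early-return check is the point.

```c
/* Simplified, hypothetical stand-ins for the kernel structures. */
struct net { int id; };
static struct net init_net = { 0 };     /* the initial network namespace */

struct net_device {
    const char *name;
    struct net *nd_net;                 /* namespace the device lives in */
};

enum { NOTIFY_DONE = 0, NOTIFY_OK = 1 };

static int handled;                     /* counts events actually processed */

/* Event handler for a protocol that is not yet namespace aware. */
static int my_netdev_event(unsigned long event, struct net_device *dev)
{
    (void)event;

    /* Ignore devices outside the initial namespace. */
    if (dev->nd_net != &init_net)
        return NOTIFY_DONE;

    handled++;
    return NOTIFY_OK;
}
```

Once a protocol becomes namespace aware, the check at the top of its handler is the only line that needs to go.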
     
  • This patch modifies the current ipsec audit layer
    by breaking it up into purpose driven audit calls.

    So far, the only audit calls made are when adding/deleting
    an SA/policy. It had been discussed to give each
    key manager its own calls to do this, but I found
    there to be much redundancy since they did the exact
    same things, except for how they got auid and sid, so I
    combined them. The audit calls below can be made by any
    key manager. Hopefully, this is ok.

    Signed-off-by: Joy Latten
    Signed-off-by: David S. Miller

    Joy Latten
     
  • Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • These functions are only used once and are a lot easier to understand if
    inlined directly into the function.

    Fixes by Masahide NAKAMURA.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Increases readability a lot.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • nlmsg_parse() puts attributes at array[type] so the indexing
    method can be simplified by removing the obscuring "- 1".

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
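
A minimal sketch of the indexing convention the commit above relies on, assuming a table with one slot per attribute type including the unused type 0. The enum values, `struct attr` and `parse_attrs` are hypothetical and do not reproduce the kernel's nlmsg_parse().

```c
#include <stddef.h>

/* Hypothetical attribute types; type 0 is reserved as "unspecified". */
enum { ATTR_UNSPEC, ATTR_ALG_AUTH, ATTR_ALG_CRYPT, ATTR_COUNT };
#define ATTR_MAX (ATTR_COUNT - 1)

struct attr {
    int type;
    const char *payload;
};

/*
 * Fill tb so that tb[type] points at the attribute of that type.
 * tb has ATTR_MAX + 1 slots, so no "- 1" adjustment is needed on lookup.
 */
static void parse_attrs(const struct attr *attrs, size_t n,
                        const struct attr *tb[ATTR_MAX + 1])
{
    for (int i = 0; i <= ATTR_MAX; i++)
        tb[i] = NULL;

    for (size_t i = 0; i < n; i++) {
        int type = attrs[i].type;

        if (type > 0 && type <= ATTR_MAX)
            tb[type] = &attrs[i];
    }
}
```

With this layout a lookup reads `tb[ATTR_ALG_CRYPT]` rather than `tb[ATTR_ALG_CRYPT - 1]`, at the cost of one unused slot.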
     
  • Adds a policy defining the minimal payload lengths for all the attributes,
    allowing most attribute validation checks to be removed from the middle
    of the code path. Makes updates more consistent as many format errors
    are recognised earlier, before any changes have been attempted.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
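
The idea of validating against a minimum-length policy before touching any state can be sketched as below. The attribute types, the lengths in `min_len` and the `apply_update` helper are assumptions made for illustration, not the policy table from the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical attribute types and their minimal payload lengths. */
enum { A_UNSPEC, A_MARK, A_ADDR, A_COUNT };
#define A_MAX (A_COUNT - 1)

struct attr {
    int type;
    size_t len;     /* payload length in bytes */
};

static const size_t min_len[A_MAX + 1] = {
    [A_MARK] = 4,
    [A_ADDR] = 16,
};

/* Check every attribute up front; nothing is modified here. */
static int validate(const struct attr *attrs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int t = attrs[i].type;

        if (t <= 0 || t > A_MAX)
            continue;                   /* not covered by the policy */
        if (attrs[i].len < min_len[t])
            return -EINVAL;             /* too short: reject the message */
    }
    return 0;
}

/* State is only changed after the whole message has passed validation. */
static int apply_update(int *state, const struct attr *attrs, size_t n)
{
    int err = validate(attrs, n);

    if (err)
        return err;                     /* state left untouched */

    *state += (int)n;                   /* hypothetical update */
    return 0;
}
```

Because the length checks run before the update, a malformed message fails without leaving a half-applied change behind.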
     
  • Uses nlmsg_parse() to parse the attributes. This actually changes
    behaviour as unknown attributes (type > MAXTYPE) no longer cause
    an error. Instead unknown attributes will be ignored henceforth
    to keep older kernels compatible with more recent userspace tools.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
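
The behaviour change described in the commit above, rejecting versus skipping unknown attribute types, can be contrasted in a small sketch. `MAXTYPE`, `parse_strict` and `parse_lenient` are hypothetical; each function returns the number of recognised attributes or a negative errno value.

```c
#include <errno.h>
#include <stddef.h>

#define MAXTYPE 3   /* hypothetical highest attribute type this build knows */

/* Old behaviour: any unknown type fails the whole message. */
static int parse_strict(const int *types, size_t n)
{
    int known = 0;

    for (size_t i = 0; i < n; i++) {
        if (types[i] > MAXTYPE)
            return -EINVAL;
        known++;
    }
    return known;
}

/* New behaviour: unknown types are skipped, the rest is still processed. */
static int parse_lenient(const int *types, size_t n)
{
    int known = 0;

    for (size_t i = 0; i < n; i++) {
        if (types[i] > MAXTYPE)
            continue;
        known++;
    }
    return known;
}
```

The lenient variant is what lets a newer userspace tool, which may send attributes an older kernel has never heard of, keep working against that kernel.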
     
  • Moves all complex message size calculations into their own inlined helper
    functions and makes use of the type-safe netlink interface.

    Using nlmsg_new() simplifies the calculation itself as it takes care
    of the netlink header length by itself.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Moves all of the SUB_POLICY ifdefs related to the attribute size
    calculation into a function.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Adds alg_len() to calculate the properly padded length of an
    algorithm attribute to simplify the code.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
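
A helper of the kind the commit above describes could look like this sketch. The struct layout and the name `algo_len` are hypothetical; the assumption, labeled here because the log does not spell it out, is that the key length is stored in bits and has to be rounded up to whole bytes.

```c
#include <stddef.h>

/* Hypothetical algorithm attribute: fixed header followed by the key. */
struct algo {
    char name[64];
    unsigned int key_len;       /* key length in bits (assumption) */
    unsigned char key[];        /* key bytes follow the header */
};

/* Total attribute length: header plus the key rounded up to whole bytes. */
static size_t algo_len(const struct algo *alg)
{
    return sizeof(*alg) + (alg->key_len + 7) / 8;
}
```

Centralising the rounding in one helper keeps every caller that sizes or copies the attribute in agreement about how many bytes it occupies.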
     
  • Also makes use of copy_sec_ctx() in another place and removes
    duplicated code.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • This simplifies successful return codes from >0 to 0.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     

14 Aug, 2007

1 commit


02 Aug, 2007

1 commit


31 Jul, 2007

2 commits

  • This patch modifies the xfrm state selection logic to use the inner
    addresses where the outer have been (incorrectly) used. This is
    required for beet mode in general and interfamily setups in both
    tunnel and beet mode.

    Signed-off-by: Joakim Koskela
    Signed-off-by: Herbert Xu
    Signed-off-by: Diego Beltrami
    Signed-off-by: Miika Komu
    Acked-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Joakim Koskela
     
  • Similar to the issue we had with template families which
    specified the inner families of policies, we need to set
    the inner families of states, as the main xfrm user, Openswan,
    leaves it as zero.

    af_key is unaffected because the inner family is set by it
    and not the KM.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

20 Jul, 2007

1 commit

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt