11 Oct, 2007

22 commits

  • This patch makes processing of netlink user -> kernel messages synchronous.
    This change was inspired by a talk with Alexey Kuznetsov about current
    netlink message processing. He says that he was badly wrong when he
    introduced asynchronous user -> kernel communication.

    The call netlink_unicast is the only path to send a message to the kernel
    netlink socket. But, unfortunately, it is also used to send data to the
    user.

    Before this change the user message was attached to the socket queue
    and sk->sk_data_ready was called. The process was blocked until all
    pending messages were processed. The bad thing is that this processing
    may occur in an arbitrary process context.

    This patch changes the nlk->data_ready callback to take one skb and forces
    packet processing right in netlink_unicast.

    Kernel -> user path in netlink_unicast remains untouched.

    EINTR processing in netlink_run_queue was changed. It forces an rtnl_lock
    drop, but the process remains in the loop until the message is fully
    processed. So there is no need to use this kludge now.

    Signed-off-by: Denis V. Lunev
    Acked-by: Alexey Kuznetsov
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • This patch adds locking so that when we're copying non-atomic fields such as
    life-time or coaddr to user-space we don't get a partial result.

    For af_key I've changed every instance of pfkey_xfrm_state2msg apart from
    expiration notification to include the keys and life-times. This is in-line
    with XFRM behaviour.

    The actual cases affected are:

    * pfkey_getspi: No change as we don't have any keys to copy.
    * key_notify_sa:
    + ADD/UPD: This wouldn't work otherwise.
    + DEL: It can't hurt.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Here's a good example of code duplication leading to code rot. The
    notification patch did its own netlink message creation for xfrm states.
    It duplicated code that was already in dump_one_state. Guess what, the
    next time (and the time after) when someone updated dump_one_state the
    notification path got zilch.

    This patch moves that code from dump_one_state to copy_to_user_state_extra
    and uses it in xfrm_notify_sa too. Unfortunately whoever updates this
    still needs to update xfrm_sa_len since the notification path wants to
    know the exact size for allocation.

    At least I've added a comment saying so and if someone still forgets, we'll
    have a WARN_ON telling us so.

    I also changed the security size calculation to use xfrm_user_sec_ctx since
    that's what we actually put into the skb. However it makes no practical
    difference since it has the same size as xfrm_sec_ctx.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch moves some common code that conceptually belongs to the xfrm core
    from af_key/xfrm_user into xfrm_alloc_spi.

    In particular, the spin lock on the state is now taken inside xfrm_alloc_spi.
    Previously it also protected the construction of the response PF_KEY/XFRM
    messages to user-space. This is inconsistent as other identical constructions
    are not protected by the state lock. This is bad because they in fact should
    be protected but only in certain spots (so as not to hold the lock for too
    long which may cause packet drops).

    The SPI byte order conversion has also been moved.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • I was looking at Patrick's fix to inet_diag and it occurred
    to me that we're using a pointer argument to return values
    unnecessarily in netlink_run_queue. Changing it to return
    the value will allow the compiler to generate better code
    since the value won't have to be memory-backed.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
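
    The idea in the entry above can be sketched in plain C. This is an
    illustration only, not the kernel's netlink_run_queue: the handler, the
    function names and the meaning of the returned count are all hypothetical.
    It contrasts reporting a result through a pointer argument with simply
    returning it, which lets the compiler keep the value in a register.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-message handler: a negative input stands for a bad message. */
int demo_handle(int msg)
{
    return msg < 0 ? -EINVAL : 0;
}

/* Before: the count is written through a pointer, so the caller's
 * variable has to live in memory while the loop runs. */
void demo_run_queue_ptr(const int *msgs, size_t n, unsigned int *processed)
{
    size_t i;

    assert(processed != NULL);
    *processed = 0;
    for (i = 0; i < n; i++) {
        if (demo_handle(msgs[i]) < 0)
            return;
        (*processed)++;
    }
}

/* After: the count is a local that is returned by value. */
unsigned int demo_run_queue_ret(const int *msgs, size_t n)
{
    unsigned int processed = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (demo_handle(msgs[i]) < 0)
            break;
        processed++;
    }
    return processed;
}
```

    Both variants compute the same count; only the way the result travels back
    to the caller differs.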
     
  • Each netlink socket will live in exactly one network namespace,
    this includes the controlling kernel sockets.

    This patch updates all of the existing netlink protocols
    to only support the initial network namespace. Requests
    by clients in other namespaces will get -ECONNREFUSED,
    as they would if the kernel did not have the support for
    that netlink protocol compiled in.

    As each netlink protocol is updated to be multiple network
    namespace safe it can register multiple kernel sockets
    to acquire a presence in the rest of the network namespaces.

    The implementation in af_netlink is a simple filter implementation
    at hash table insertion and hash table look up time.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • This patch modifies the current ipsec audit layer
    by breaking it up into purpose driven audit calls.

    So far, the only audit calls made are when adding/deleting
    an SA/policy. It had been discussed to give each
    key manager its own calls to do this, but I found
    there to be much redundancy since they did the exact
    same things, except for how they got auid and sid, so I
    combined them. The below audit calls can be made by any
    key manager. Hopefully, this is ok.

    Signed-off-by: Joy Latten
    Signed-off-by: David S. Miller

    Joy Latten
     
  • These functions are only used once and are a lot easier to understand if
    inlined directly into the function.

    Fixes by Masahide NAKAMURA.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Increases readability a lot.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • nlmsg_parse() puts attributes at array[type] so the indexing
    method can be simplified by removing the obscuring "- 1".

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
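
    A minimal sketch of the indexing scheme described above, assuming a
    simplified attribute layout. The enum values, struct demo_attr and
    demo_parse() are hypothetical stand-ins, not the kernel's definitions.
    The table has MAX + 1 slots, so an attribute of a given type is found
    at tb[type] rather than tb[type - 1].

```c
#include <assert.h>
#include <stddef.h>

enum {
    DEMO_A_UNSPEC,      /* type 0 is reserved, so tb[0] stays unused */
    DEMO_A_ALG,
    DEMO_A_ENCAP,
    DEMO_A_COUNT
};
#define DEMO_A_MAX (DEMO_A_COUNT - 1)

/* Simplified attribute header (hypothetical layout). */
struct demo_attr {
    unsigned short len;
    unsigned short type;
};

/* Fill tb[] so that tb[type] points at the attribute of that type.
 * Types outside 1..DEMO_A_MAX are skipped. */
void demo_parse(const struct demo_attr *tb[DEMO_A_MAX + 1],
                const struct demo_attr *attrs, size_t n)
{
    size_t i;

    assert(tb != NULL);
    for (i = 0; i <= (size_t)DEMO_A_MAX; i++)
        tb[i] = NULL;
    for (i = 0; i < n; i++) {
        unsigned short type = attrs[i].type;

        if (type >= 1 && type <= DEMO_A_MAX)
            tb[type] = &attrs[i];
    }
}
```

    The cost is one unused slot at index 0; the benefit is that every lookup
    uses the attribute constant directly.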
     
  • Adds a policy defining the minimal payload lengths for all the attributes,
    allowing most attribute validation checks to be removed from the middle
    of the code path. Makes updates more consistent as many format
    errors are recognised earlier, before any changes have been attempted.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
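
    The approach described above can be sketched as a table of minimal
    payload lengths that is checked once, up front. Everything here is a
    hypothetical illustration (attribute names, lengths and demo_validate()
    are assumptions), not the policy table from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum { DEMO_UNSPEC, DEMO_LIFETIME, DEMO_MARK, DEMO_COUNT };
#define DEMO_MAX (DEMO_COUNT - 1)

struct demo_attr {
    unsigned short type;
    unsigned short payload_len;
};

/* Minimal payload length per attribute type; 0 means "no requirement".
 * The numbers are made up for the example. */
static const unsigned short demo_min_len[DEMO_MAX + 1] = {
    [DEMO_LIFETIME] = 16,
    [DEMO_MARK]     = 4,
};

/* Validate every attribute before any state is modified. */
int demo_validate(const struct demo_attr *attrs, size_t n)
{
    size_t i;

    assert(attrs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        unsigned short type = attrs[i].type;

        if (type > DEMO_MAX)
            continue;           /* unknown type: ignored in this sketch */
        if (attrs[i].payload_len < demo_min_len[type])
            return -EINVAL;     /* too short: reject the whole message */
    }
    return 0;
}
```

    Because the check runs before any processing, a malformed message is
    rejected as a whole and cannot leave a half-applied update behind.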
     
  • Uses nlmsg_parse() to parse the attributes. This actually changes
    behaviour as unknown attributes (type > MAXTYPE) no longer cause
    an error. Instead unknown attributes will be ignored henceforth
    to keep older kernels compatible with more recent userspace tools.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Moves all complex message size calculation into own inlined helper
    functions and makes use of the type-safe netlink interface.

    Using nlmsg_new() simplifies the calculation itself as it takes care
    of the netlink header length by itself.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Moves all of the SUB_POLICY ifdefs related to the attribute size
    calculation into a function.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Adds alg_len() to calculate the properly padded length of an
    algorithm attribute to simplify the code.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
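
    A sketch of what such a length helper might look like, assuming an
    algorithm descriptor that stores its key length in bits followed by the
    key bytes. struct demo_algo and demo_alg_len() are hypothetical names
    used for illustration; the real structure and helper may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical algorithm descriptor: a fixed header followed by the key. */
struct demo_algo {
    char          name[64];
    unsigned int  key_len_bits;   /* key length in bits (assumption) */
    unsigned char key[];          /* key bytes follow the header */
};

/* Total attribute length: header plus the key rounded up to whole bytes. */
size_t demo_alg_len(const struct demo_algo *alg)
{
    assert(alg != NULL);
    return sizeof(*alg) + (alg->key_len_bits + 7) / 8;
}
```

    Centralising the rounding in one helper means callers that size a
    message and callers that copy the attribute cannot disagree.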
     
  • Also makes use of copy_sec_ctx() in another place and removes
    duplicated code.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • This simplifies successful return codes from >0 to 0.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     

31 Jul, 2007

1 commit

  • Similar to the issue we had with template families which
    specified the inner families of policies, we need to set
    the inner families of states as the main xfrm user Openswan
    leaves it as zero.

    af_key is unaffected because the inner family is set by it
    and not the KM.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

08 Jun, 2007

1 commit

  • Currently we check for permission before deleting entries from SAD and
    SPD (see security_xfrm_policy_delete() and security_xfrm_state_delete()).
    However we are not checking for authorization when flushing the SPD and
    the SAD completely. It was perhaps missed in the original security hooks
    patch.

    This patch adds a security check when flushing entries from the SAD and
    SPD. It runs the entire database and checks each entry for a denial.
    If the process attempting the flush is unable to remove all of the
    entries, a denial is logged and the flush function returns an error
    without removing anything.

    This is particularly useful when a process may need to create or delete
    its own xfrm entries used for things like labeled networking but that
    same process should not be able to delete other entries or flush the
    entire database.

    Signed-off-by: Joy Latten
    Signed-off-by: Eric Paris
    Signed-off-by: James Morris

    Joy Latten
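
    The all-or-nothing flush described above can be sketched as two passes
    over the database. The entry layout and the ownership-based permission
    rule below are hypothetical; in the kernel the decision comes from the
    security hooks, which this sketch does not model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_entry {
    int owner;      /* who created the entry (stand-in for a security label) */
    int in_use;     /* 1 while the entry is present in the database */
};

/* Hypothetical permission check: a caller may only delete its own entries. */
int demo_may_delete(const struct demo_entry *e, int caller)
{
    return e->owner == caller ? 0 : -EPERM;
}

int demo_flush(struct demo_entry *db, size_t n, int caller)
{
    size_t i;
    int err;

    assert(db != NULL || n == 0);

    /* Pass 1: look for a denial. Nothing is removed yet. */
    for (i = 0; i < n; i++) {
        if (!db[i].in_use)
            continue;
        err = demo_may_delete(&db[i], caller);
        if (err < 0)
            return err;
    }

    /* Pass 2: every entry was allowed, so remove them all. */
    for (i = 0; i < n; i++)
        db[i].in_use = 0;
    return 0;
}
```

    Splitting the work this way avoids a partially flushed database when the
    caller lacks permission for even one entry.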
     

05 May, 2007

2 commits


29 Apr, 2007

1 commit


27 Apr, 2007

1 commit


26 Apr, 2007

9 commits

  • On a system with a lot of SAs, counting SAD entries chews up useful
    CPU time since you need to dump the whole SAD to user space;
    i.e. something like:
    ip xfrm state ls | grep -i src | wc -l
    I have seen this take literally minutes with 40K SAs when the system
    is swapping.
    With this patch, some of the SAD info (that was already being tracked)
    is exposed to user space, i.e. you do:
    ip xfrm state count
    And you get the count; you can also pass -s on the command line and
    get the hash info.

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     
  • Spring cleaning time...

    There seem to be a lot of places in the network code that have
    extra bogus semicolons after conditionals. The most common is a
    bogus semicolon after: switch() { }

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Switch cb_lock to mutex and allow netlink kernel users to override it
    with a subsystem specific mutex for consistent locking in dump callbacks.
    All netlink_dump_start users have been audited not to rely on any
    side-effects of the previously used spinlock.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Now that all users of netlink_dump_start() use netlink_run_queue()
    to process the receive queue, it is possible to return -EINTR from
    netlink_dump_start() directly, therefore simplifying the callers.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • The error pointer argument in netlink message handlers is used
    to signal the special case where processing has to be interrupted
    because a dump was started but no error happened. Instead it is
    simpler and more clear to return -EINTR and have netlink_run_queue()
    deal with getting the queue right.

    nfnetlink passed on this error pointer to its subsystem handlers
    but only uses it to signal the start of a netlink dump. Therefore
    it can be removed there as well.

    This patch also cleans up the error handling in the affected
    message handlers to be consistent since it had to be touched anyway.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
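
    A rough sketch of the convention described above, using a toy queue of
    integer "messages". The queue type, message codes and function names are
    hypothetical; the sketch only shows a handler returning -EINTR to mean
    "stop processing, but this is not a failure" instead of signalling that
    through a separate error pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_MSG_OK   0
#define DEMO_MSG_BAD  1
#define DEMO_MSG_DUMP 2   /* a request that starts a dump */

struct demo_queue {
    const int   *msgs;
    size_t       len;
    size_t       head;    /* index of the next message to process */
    unsigned int nacks;   /* number of messages answered with an error */
};

/* Hypothetical handler: the return value alone tells the caller what to do. */
int demo_handler(int msg)
{
    if (msg == DEMO_MSG_DUMP)
        return -EINTR;    /* dump started: interrupt processing, no error */
    if (msg == DEMO_MSG_BAD)
        return -EINVAL;
    return 0;
}

void demo_run_queue(struct demo_queue *q)
{
    assert(q != NULL);
    while (q->head < q->len) {
        int err = demo_handler(q->msgs[q->head]);

        q->head++;        /* the message is consumed either way */
        if (err == -EINTR)
            return;       /* stop here; the rest is handled on a later run */
        if (err < 0)
            q->nacks++;   /* a real error would be reported to the sender */
    }
}
```

    With the special case folded into the return value, the handlers no
    longer need an extra out-parameter and the queue logic lives in one place.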
     
  • Changes netlink_rcv_skb() to skip netlink control messages and not
    pass them on to the message handler.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • netlink_rcv_skb() is changed to skip messages which don't have the
    NLM_F_REQUEST bit to avoid every netlink family having to perform this
    check on their own.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
    on 64bit architectures, allowing us to combine the 4 bytes hole left by the
    layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
    64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
    :-)

    Many calculations that previously required that skb->{transport,network,
    mac}_header be first converted to a pointer now can be done directly, being
    meaningful as offsets or pointers.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
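
    The second paragraph of the entry above can be illustrated with a toy
    buffer that stores header positions as 32-bit offsets from the start of
    the buffer instead of as pointers. struct demo_buf and its helpers are
    hypothetical and much simpler than struct sk_buff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_buf {
    unsigned char *head;             /* start of the buffer */
    uint32_t       network_header;   /* offset from head */
    uint32_t       transport_header; /* offset from head */
};

/* Converting an offset back to a pointer is a single addition. */
unsigned char *demo_transport_header(const struct demo_buf *b)
{
    assert(b != NULL && b->head != NULL);
    return b->head + b->transport_header;
}

/* A length between two headers can be computed on the offsets directly,
 * without turning either of them into a pointer first. */
uint32_t demo_network_header_len(const struct demo_buf *b)
{
    assert(b != NULL);
    return b->transport_header - b->network_header;
}
```

    Each offset takes 4 bytes regardless of pointer width, which is where
    the size saving on 64-bit architectures comes from.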
     

14 Apr, 2007

1 commit

  • When sending a security context of 50+ characters in an ACQUIRE
    message, the following kernel panic occurred.

    kernel BUG in xfrm_send_acquire at net/xfrm/xfrm_user.c:1781!
    cpu 0x3: Vector: 700 (Program Check) at [c0000000421bb2e0]
    pc: c00000000033b074: .xfrm_send_acquire+0x240/0x2c8
    lr: c00000000033b014: .xfrm_send_acquire+0x1e0/0x2c8
    sp: c0000000421bb560
    msr: 8000000000029032
    current = 0xc00000000fce8f00
    paca = 0xc000000000464b00
    pid = 2303, comm = ping
    kernel BUG in xfrm_send_acquire at net/xfrm/xfrm_user.c:1781!
    enter ? for help
    3:mon> t
    [c0000000421bb650] c00000000033538c .km_query+0x6c/0xec
    [c0000000421bb6f0] c000000000337374 .xfrm_state_find+0x7f4/0xb88
    [c0000000421bb7f0] c000000000332350 .xfrm_tmpl_resolve+0xc4/0x21c
    [c0000000421bb8d0] c0000000003326e8 .xfrm_lookup+0x1a0/0x5b0
    [c0000000421bba00] c0000000002e6ea0 .ip_route_output_flow+0x88/0xb4
    [c0000000421bbaa0] c0000000003106d8 .ip4_datagram_connect+0x218/0x374
    [c0000000421bbbd0] c00000000031bc00 .inet_dgram_connect+0xac/0xd4
    [c0000000421bbc60] c0000000002b11ac .sys_connect+0xd8/0x120
    [c0000000421bbd90] c0000000002d38d0 .compat_sys_socketcall+0xdc/0x214
    [c0000000421bbe30] c00000000000869c syscall_exit+0x0/0x40
    --- Exception: c00 (System Call) at 0000000007f0ca9c
    SP (fc0ef8f0) is in userspace

    We are using the size of the security context from xfrm_policy to determine
    how much space to allocate for the skb and then putting the security context
    from xfrm_state into the skb. We should have been using the size of the
    security context from xfrm_state to allocate the skb. The following fix does that.

    Signed-off-by: Joy Latten
    Acked-by: James Morris
    Signed-off-by: David S. Miller

    Joy Latten
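
    The mismatch described above can be reproduced in miniature: a buffer is
    sized from one context and then filled from another. The types, the
    header length and the function names are hypothetical, and the sketch
    returns -EMSGSIZE where the kernel hit a BUG.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_HDR_LEN 8    /* made-up fixed header size */

struct demo_sec_ctx {
    size_t      len;
    const char *data;
};

/* Space needed to carry this particular context. */
size_t demo_ctx_msg_size(const struct demo_sec_ctx *ctx)
{
    assert(ctx != NULL);
    return DEMO_HDR_LEN + ctx->len;
}

/* Copy the context into buf; fail if buf was sized for something smaller. */
int demo_put_ctx(unsigned char *buf, size_t buf_len,
                 const struct demo_sec_ctx *ctx)
{
    if (demo_ctx_msg_size(ctx) > buf_len)
        return -EMSGSIZE;
    memset(buf, 0, DEMO_HDR_LEN);
    memcpy(buf + DEMO_HDR_LEN, ctx->data, ctx->len);
    return 0;
}
```

    Sizing the buffer with demo_ctx_msg_size() on the same context that is
    later passed to demo_put_ctx() is the pattern the fix restores.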
     

23 Mar, 2007

1 commit

  • Turning up the warnings on gcc makes it emit warnings
    about the placement of 'inline' in function declarations.
    Here's everything that was under net/

    Signed-off-by: Dave Jones
    Signed-off-by: David S. Miller

    Dave Jones
     

08 Mar, 2007

1 commit

  • Inside pfkey_delete and xfrm_del_sa the audit hooks were not called if
    there were any permission/security failures in attempting the delete
    operation (such as permission denied from security_xfrm_state_delete).
    This patch moves the audit hook to the exit path such that all failures
    (and successes) will actually get audited.

    Signed-off-by: Eric Paris
    Acked-by: Venkat Yekkirala
    Acked-by: James Morris
    Signed-off-by: David S. Miller

    Eric Paris