16 Oct, 2007

13 commits

  • kmalloc + memset -> kzalloc in frag_alloc_queue

    Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • These ones use the generic data types too, so move
    them in one place.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • After the evictor code is consolidated there is no need in
    passing the extra pointer to the xxx_put() functions.

    The only place when it made sense was the evictor code itself.

    Maybe this change must got with the previous (or with the
    next) patch, but I try to make them shorter as much as
    possible to simplify the review (but they are still large
    anyway), so this change goes in a separate patch.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • The evictors collect some statistics for ipv4 and ipv6,
    so make it return the number of evicted queues and account
    them all at once in the caller.

    The XXX_ADD_STATS_BH() macros are just for this case,
    but maybe there are places in code, that can make use of
    them as well.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • To make in possible we need to know the exact frag queue
    size for inet_frags->mem management and two callbacks:

    * to destoy the skb (optional, used in conntracks only)
    * to free the queue itself (mandatory, but later I plan to
    move the allocation and the destruction of frag_queues
    into the common place, so this callback will most likely
    be optional too).

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • This code works with the generic data types as well, so
    move this into inet_fragment.c

    This move makes it possible to hide the secret_timer
    management and the secret_rebuild routine completely in
    the inet_fragment.c

    Introduce the ->hashfn() callback in inet_frags() to get
    the hashfun for a given inet_frag_queue() object.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Since now all the xxx_frag_kill functions now work
    with the generic inet_frag_queue data type, this can
    be moved into a common place.

    The xxx_unlink() code is moved as well.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Some sysctl variables are used to tune the frag queues
    management and it will be useful to work with them in
    a common way in the future, so move them into one
    structure, moreover they are the same for all the frag
    management codes.

    I don't place them in the existing inet_frags object,
    introduced in the previous patch for two reasons:

    1. to keep them in the __read_mostly section;
    2. not to export the whole inet_frags objects outside.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • There are some objects that are common in all the places
    which are used to keep track of frag queues, they are:

    * hash table
    * LRU list
    * rw lock
    * rnd number for hash function
    * the number of queues
    * the amount of memory occupied by queues
    * secret timer

    Move all this stuff into one structure (struct inet_frags)
    to make it possible use them uniformly in the future. Like
    with the previous patch this mostly consists of hunks like

    - write_lock(&ipfrag_lock);
    + write_lock(&ip4_frags.lock);

    To address the issue with exporting the number of queues and
    the amount of memory occupied by queues outside the .c file
    they are declared in, I introduce a couple of helpers.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Introduce the struct inet_frag_queue in include/net/inet_frag.h
    file and place there all the common fields from three structs:

    * struct ipq in ipv4/ip_fragment.c
    * struct nf_ct_frag6_queue in nf_conntrack_reasm.c
    * struct frag_queue in ipv6/reassembly.c

    After this, replace these fields on appropriate structures with
    this structure instance and fix the users to use correct names
    i.e. hunks like

    - atomic_dec(&fq->refcnt);
    + atomic_dec(&fq->q.refcnt);

    (these occupy most of the patch)

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • With all the users of the double pointers removed, this patch mops up by
    finally replacing all occurances of sk_buff ** in the netfilter API by
    sk_buff *.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch replaces unnecessary uses of skb_copy, pskb_copy and
    skb_realloc_headroom by functions such as skb_make_writable and
    pskb_expand_head.

    This allows us to remove the double pointers later.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Now that all callers of netfilter can guarantee that the skb is not shared,
    we no longer have to copy the skb in skb_make_writable.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

11 Oct, 2007

10 commits

  • This patch make processing netlink user -> kernel messages synchronious.
    This change was inspired by the talk with Alexey Kuznetsov about current
    netlink messages processing. He says that he was badly wrong when introduced
    asynchronious user -> kernel communication.

    The call netlink_unicast is the only path to send message to the kernel
    netlink socket. But, unfortunately, it is also used to send data to the
    user.

    Before this change the user message has been attached to the socket queue
    and sk->sk_data_ready was called. The process has been blocked until all
    pending messages were processed. The bad thing is that this processing
    may occur in the arbitrary process context.

    This patch changes nlk->data_ready callback to get 1 skb and force packet
    processing right in the netlink_unicast.

    Kernel -> user path in netlink_unicast remains untouched.

    EINTR processing for in netlink_run_queue was changed. It forces rtnl_lock
    drop, but the process remains in the cycle until the message will be fully
    processed. So, there is no need to use this kludges now.

    Signed-off-by: Denis V. Lunev
    Acked-by: Alexey Kuznetsov
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • There is no struct nfattr anymore, rename functions to 'nlattr'.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Get rid of the duplicated rtnetlink macros and use the generic netlink
    attribute functions. The old duplicated stuff is moved to a new header
    file that exists just for userspace.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Wrap the hard_header_parse function to simplify next step of
    header_ops conversion.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • This patch makes loopback_dev per network namespace. Adding
    code to create a different loopback device for each network
    namespace and adding the code to free a loopback device
    when a network namespace exits.

    This patch modifies all users the loopback_dev so they
    access it as init_net.loopback_dev, keeping all of the
    code compiling and working. A later pass will be needed to
    update the users to use something other than the initial network
    namespace.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • This patch replaces all occurences to the static variable
    loopback_dev to a pointer loopback_dev. That provides the
    mindless, trivial, uninteressting change part for the dynamic
    allocation for the loopback.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Daniel Lezcano
    Acked-By: Kirill Korotaev
    Acked-by: Benjamin Thery
    Signed-off-by: David S. Miller

    Daniel Lezcano
     
  • Each netlink socket will live in exactly one network namespace,
    this includes the controlling kernel sockets.

    This patch updates all of the existing netlink protocols
    to only support the initial network namespace. Request
    by clients in other namespaces will get -ECONREFUSED.
    As they would if the kernel did not have the support for
    that netlink protocol compiled in.

    As each netlink protocol is updated to be multiple network
    namespace safe it can register multiple kernel sockets
    to acquire a presence in the rest of the network namespaces.

    The implementation in af_netlink is a simple filter implementation
    at hash table insertion and hash table look up time.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Every user of the network device notifiers is either a protocol
    stack or a pseudo device. If a protocol stack that does not have
    support for multiple network namespaces receives an event for a
    device that is not in the initial network namespace it quite possibly
    can get confused and do the wrong thing.

    To avoid problems until all of the protocol stacks are converted
    this patch modifies all netdev event handlers to ignore events on
    devices that are not in the initial network namespace.

    As the rest of the code is made network namespace aware these
    checks can be removed.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • This patch makes /proc/net per network namespace. It modifies the global
    variables proc_net and proc_net_stat to be per network namespace.
    The proc_net file helpers are modified to take a network namespace argument,
    and all of their callers are fixed to pass &init_net for that argument.
    This ensures that all of the /proc/net files are only visible and
    usable in the initial network namespace until the code behind them
    has been updated to be handle multiple network namespaces.

    Making /proc/net per namespace is necessary as at least some files
    in /proc/net depend upon the set of network devices which is per
    network namespace, and even more files in /proc/net have contents
    that are relevant to a single network namespace.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

11 Sep, 2007

1 commit

  • So I've had a deadlock reported to me. I've found that the sequence of
    events goes like this:

    1) process A (modprobe) runs to remove ip_tables.ko

    2) process B (iptables-restore) runs and calls setsockopt on a netfilter socket,
    increasing the ip_tables socket_ops use count

    3) process A acquires a file lock on the file ip_tables.ko, calls remove_module
    in the kernel, which in turn executes the ip_tables module cleanup routine,
    which calls nf_unregister_sockopt

    4) nf_unregister_sockopt, seeing that the use count is non-zero, puts the
    calling process into uninterruptible sleep, expecting the process using the
    socket option code to wake it up when it exits the kernel

    4) the user of the socket option code (process B) in do_ipt_get_ctl, calls
    ipt_find_table_lock, which in this case calls request_module to load
    ip_tables_nat.ko

    5) request_module forks a copy of modprobe (process C) to load the module and
    blocks until modprobe exits.

    6) Process C. forked by request_module process the dependencies of
    ip_tables_nat.ko, of which ip_tables.ko is one.

    7) Process C attempts to lock the request module and all its dependencies, it
    blocks when it attempts to lock ip_tables.ko (which was previously locked in
    step 3)

    Theres not really any great permanent solution to this that I can see, but I've
    developed a two part solution that corrects the problem

    Part 1) Modifies the nf_sockopt registration code so that, instead of using a
    use counter internal to the nf_sockopt_ops structure, we instead use a pointer
    to the registering modules owner to do module reference counting when nf_sockopt
    calls a modules set/get routine. This prevents the deadlock by preventing set 4
    from happening.

    Part 2) Enhances the modprobe utilty so that by default it preforms non-blocking
    remove operations (the same way rmmod does), and add an option to explicity
    request blocking operation. So if you select blocking operation in modprobe you
    can still cause the above deadlock, but only if you explicity try (and since
    root can do any old stupid thing it would like.... :) ).

    Signed-off-by: Neil Horman
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Neil Horman
     

31 Jul, 2007

1 commit


25 Jul, 2007

1 commit

  • Loading one of the LOG target fails if a different target has already
    registered itself as backend for the same family. This can affect the
    ipt_LOG and ipt_ULOG modules when both are loaded.

    Reported and tested by:

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

15 Jul, 2007

5 commits


11 Jul, 2007

9 commits