13 Aug, 2009

1 commit


03 Aug, 2009

1 commit


13 Jul, 2009

2 commits

  • The function get_net_ns_by_pid(), to get a network
    namespace from a pid_t, will be required in cfg80211
    as well. Therefore, let's move it to net_namespace.c
    and export it. We can't make it a static inline in
    the !NETNS case because it needs to verify that the
    given pid even exists (and return -ESRCH).

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • All we need to take care of is using proper RCU list
    add/del primitives and inserting a synchronize_rcu()
    at one place to make sure the exit notifiers are run
    after everybody has stopped iterating the list.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
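
The first commit in this group explains that get_net_ns_by_pid() cannot be a trivial static inline stub, because it must still check that the pid exists and report -ESRCH otherwise. A minimal user-space sketch of that contract follows. The task table, the `_sketch` suffix and the `err_ptr()` helpers are hypothetical stand-ins written for illustration, not the kernel's definitions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
struct net { int id; };
struct task { int pid; struct net *net_ns; };

static struct net init_net = { 0 };
static struct task tasks[] = { { 1, &init_net }, { 42, &init_net } };

/* User-space imitation of the ERR_PTR()/IS_ERR()/PTR_ERR() idiom. */
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static int is_err(const void *p) { return (uintptr_t)p >= (uintptr_t)-4095; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }

/*
 * Even with a single namespace the lookup cannot simply return &init_net:
 * it has to confirm the pid exists and return -ESRCH when it does not.
 */
struct net *get_net_ns_by_pid_sketch(int pid)
{
    for (size_t i = 0; i < sizeof(tasks) / sizeof(tasks[0]); i++)
        if (tasks[i].pid == pid)
            return tasks[i].net_ns;
    return err_ptr(-ESRCH);
}
```

A caller would test the result with `is_err()` before using it and recover the error code with `ptr_err()`.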
     

22 May, 2009

2 commits


05 May, 2009

2 commits


03 Mar, 2009

1 commit

  • It turns out that net_alive is unnecessary, and the original problem
    that led to it being added was simply that the icmp code thought
    it was a network device and wound up being unable to handle packets
    while there were still packets in the network namespace.

    Now that icmp and tcp have been fixed to properly register themselves,
    this problem is no longer present, and we have a stronger guarantee
    that packets will not arrive in a network namespace than that provided
    by net_alive in netif_receive_skb. So remove net_alive, allowing
    packet reception to run a little faster.

    Additionally document the strong reason why network namespace cleanup
    is safe so that if something happens again someone else will have
    a chance of figuring it out.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

24 Feb, 2009

1 commit


22 Feb, 2009

1 commit

  • This patch fixes a double free when network namespace setup fails.
    The previous code does a kfree of the net_generic structure when
    one of the subsystem initializations fails.
    The 'setup_net' function does kfree(ng) and returns an error.
    The caller, 'copy_net_ns', calls net_free on error, which in turn
    calls kfree(net->gen), so this pointer is freed twice.

    This patch makes the code symmetric: net_alloc does the net_generic
    allocation and net_free frees the net_generic.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: David S. Miller

    Daniel Lezcano
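
The ownership rule this fix establishes can be sketched in user-space C: the allocator creates both objects, the matching free releases both, and the setup step only reports failure. This is an illustrative sketch, not the kernel code. It assumes calloc()/free() in place of the kernel allocators, and the `fail` flag and `live_allocations` counter exist only for the demonstration.

```c
#include <stdlib.h>

struct net_generic { unsigned int len; };
struct net { struct net_generic *gen; };

static int live_allocations; /* outstanding allocations, for the demo only */

static struct net *net_alloc(void)
{
    struct net *net = calloc(1, sizeof(*net));
    if (!net)
        return NULL;
    net->gen = calloc(1, sizeof(*net->gen));
    if (!net->gen) {
        free(net);
        return NULL;
    }
    live_allocations += 2;
    return net;
}

static void net_free(struct net *net)
{
    free(net->gen); /* the only place net->gen is released */
    free(net);
    live_allocations -= 2;
}

/* Simulated setup_net(): reports failure but frees nothing itself. */
static int setup_net(struct net *net, int fail)
{
    (void)net;
    return fail ? -1 : 0;
}

/* Simulated copy_net_ns(): the caller performs the cleanup on error. */
static struct net *copy_net_ns(int fail)
{
    struct net *net = net_alloc();
    if (!net)
        return NULL;
    if (setup_net(net, fail) < 0) {
        net_free(net);
        return NULL;
    }
    return net;
}
```

Because setup_net() never touches net->gen, the error path frees it exactly once.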
     

21 Jan, 2009

1 commit


31 Oct, 2008

2 commits


29 Oct, 2008

1 commit

  • call_rcu() will unconditionally rewrite RCU head anyway.
    Applies to
    struct neigh_parms
    struct neigh_table
    struct net
    struct cipso_v4_doi
    struct in_ifaddr
    struct in_device
    rt->u.dst

    Signed-off-by: Alexey Dobriyan
    Acked-by: Paul E. McKenney
    Signed-off-by: David S. Miller

    Alexey Dobriyan
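
Why pre-initializing the RCU head is redundant can be shown with a toy model: the queuing function assigns every field of the head before anyone reads it. The types and functions below are hypothetical, suffixed `_sketch`, and much simpler than the real call_rcu(), which defers callbacks until a grace period has elapsed.

```c
#include <stddef.h>

struct rcu_head_sketch {
    struct rcu_head_sketch *next;
    void (*func)(struct rcu_head_sketch *head);
};

static struct rcu_head_sketch *pending; /* toy callback queue */
static int callbacks_run;

/* Both fields are overwritten, so their previous contents never matter. */
static void call_rcu_sketch(struct rcu_head_sketch *head,
                            void (*func)(struct rcu_head_sketch *head))
{
    head->func = func;
    head->next = pending;
    pending = head;
}

/* Stand-in for "the grace period has ended": run everything queued. */
static void run_callbacks(void)
{
    while (pending) {
        struct rcu_head_sketch *head = pending;
        pending = head->next;
        head->func(head);
    }
}

static void count_cb(struct rcu_head_sketch *head)
{
    (void)head;
    callbacks_run++;
}
```

A head holding stale values can be passed straight to call_rcu_sketch() and still behaves correctly.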
     

15 Oct, 2008

1 commit


08 Oct, 2008

1 commit

  • Conntrack code will use it for
    a) removing expectations and helpers when corresponding module is removed, and
    b) removing conntracks when L3 protocol conntrack module is removed.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Patrick McHardy

    Alexey Dobriyan
     

21 Jun, 2008

1 commit

  • Alexey Dobriyan writes:
    > Subject: ICMP sockets destruction vs ICMP packets oops

    > After icmp_sk_exit() nuked ICMP sockets, we get an interrupt.
    > icmp_reply() wants ICMP socket.
    >
    > Steps to reproduce:
    >
    > launch shell in new netns
    > move real NIC to netns
    > setup routing
    > ping -i 0
    > exit from shell
    >
    > BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    > IP: [] icmp_sk+0x17/0x30
    > PGD 17f3cd067 PUD 17f3ce067 PMD 0
    > Oops: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC
    > CPU 0
    > Modules linked in: usblp usbcore
    > Pid: 0, comm: swapper Not tainted 2.6.26-rc6-netns-ct #4
    > RIP: 0010:[] [] icmp_sk+0x17/0x30
    > RSP: 0018:ffffffff8057fc30 EFLAGS: 00010286
    > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff81017c7db900
    > RDX: 0000000000000034 RSI: ffff81017c7db900 RDI: ffff81017dc41800
    > RBP: ffffffff8057fc40 R08: 0000000000000001 R09: 000000000000a815
    > R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff8057fd28
    > R13: ffffffff8057fd00 R14: ffff81017c7db938 R15: ffff81017dc41800
    > FS: 0000000000000000(0000) GS:ffffffff80525000(0000) knlGS:0000000000000000
    > CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    > CR2: 0000000000000000 CR3: 000000017fcda000 CR4: 00000000000006e0
    > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    > Process swapper (pid: 0, threadinfo ffffffff8053a000, task ffffffff804fa4a0)
    > Stack: 0000000000000000 ffff81017c7db900 ffffffff8057fcf0 ffffffff803fcfe4
    > ffffffff804faa38 0000000000000246 0000000000005a40 0000000000000246
    > 000000000001ffff ffff81017dd68dc0 0000000000005a40 0000000055342436
    > Call Trace:
    > [] icmp_reply+0x44/0x1e0
    > [] ? ip_route_input+0x23a/0x1360
    > [] icmp_echo+0x65/0x70
    > [] icmp_rcv+0x180/0x1b0
    > [] ip_local_deliver+0xf4/0x1f0
    > [] ip_rcv+0x33b/0x650
    > [] netif_receive_skb+0x27a/0x340
    > [] process_backlog+0x9d/0x100
    > [] net_rx_action+0x18d/0x250
    > [] __do_softirq+0x75/0x100
    > [] call_softirq+0x1c/0x30
    > [] do_softirq+0x65/0xa0
    > [] irq_exit+0x97/0xa0
    > [] do_IRQ+0xa8/0x130
    > [] ? mwait_idle+0x0/0x60
    > [] ret_from_intr+0x0/0xf
    > [] ? mwait_idle+0x4c/0x60
    > [] ? mwait_idle+0x43/0x60
    > [] ? cpu_idle+0x57/0xa0
    > [] ? rest_init+0x70/0x80
    > Code: 10 5b 41 5c 41 5d 41 5e c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53
    > 48 83 ec 08 48 8b 9f 78 01 00 00 e8 2b c7 f1 ff 89 c0 8b 04 c3 48 83 c4 08
    > 5b c9 c3 66 66 66 66 66 2e 0f 1f 84 00
    > RIP [] icmp_sk+0x17/0x30
    > RSP
    > CR2: 0000000000000000
    > ---[ end trace ea161157b76b33e8 ]---
    > Kernel panic - not syncing: Aiee, killing interrupt handler!

    Receiving packets while we are cleaning up a network namespace is a
    racy proposition. It is possible when the packet arrives that we have
    removed some but not all of the state we need to fully process it. We
    have the choice of either playing whack-a-mole with the cleanup routines
    or simply dropping packets when we don't have a network namespace to
    handle them.

    Since the check looks inexpensive in netif_receive_skb let's just
    drop the incoming packets.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
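
The approach chosen here, dropping packets once the namespace is going away rather than hardening every cleanup routine, can be sketched as follows. This is a user-space illustration under assumed names: the `alive` flag, `receive_packet()` and `cleanup_net_sketch()` are hypothetical, and the integer counter stands in for the per-namespace ICMP socket.

```c
#include <stdbool.h>
#include <stddef.h>

struct net {
    bool alive;     /* cleared before any state is torn down */
    int *icmp_sk;   /* stand-in for per-namespace ICMP state */
};

struct packet { struct net *net; };

enum rx_result { RX_DROP, RX_DELIVERED };

static enum rx_result receive_packet(const struct packet *pkt)
{
    /* One cheap check up front replaces checks in every handler. */
    if (!pkt->net->alive)
        return RX_DROP;

    *pkt->net->icmp_sk += 1; /* would crash if the state were already gone */
    return RX_DELIVERED;
}

static void cleanup_net_sketch(struct net *net)
{
    net->alive = false;  /* stop accepting packets first */
    net->icmp_sk = NULL; /* then remove the per-namespace state */
}
```

After cleanup_net_sketch() runs, a late packet is dropped instead of dereferencing the missing socket.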
     

16 Apr, 2008

1 commit

  • Make release_net/hold_net a no-op for performance-hungry people. This is
    debugging stuff and should be used in debug mode only.

    Add check for net != NULL in hold/release calls. This will be required
    later on.

    [ Added minor simplifications suggested by Brian Haley. -DaveM ]

    Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev
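
A compact way to picture this change: the weak-reference helpers only count when a debug macro is defined, and otherwise compile down to nothing, while tolerating a NULL net in both variants. The macro name NETNS_REFCNT_DEBUG and the `use_count` field are assumptions made for this sketch. The commit message itself does not name them.

```c
#include <stddef.h>

struct net {
#ifdef NETNS_REFCNT_DEBUG
    int use_count; /* weak-reference counter, debug builds only */
#endif
    int id;
};

#ifdef NETNS_REFCNT_DEBUG
static inline struct net *hold_net(struct net *net)
{
    if (net) /* NULL is tolerated, as the commit requires */
        net->use_count++;
    return net;
}

static inline void release_net(struct net *net)
{
    if (net)
        net->use_count--;
}
#else
/* Non-debug build: both helpers are no-ops. */
static inline struct net *hold_net(struct net *net)
{
    return net;
}

static inline void release_net(struct net *net)
{
    (void)net;
}
#endif
```

Building with `-DNETNS_REFCNT_DEBUG` enables the counting variant. Without it the calls cost nothing.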
     

15 Apr, 2008

2 commits

  • Add an elastic array of void * pointers to struct net.
    The access rules are simple:

    1. register the ops with register_pernet_gen_device to get
    the id of your private pointer
    2. call net_assign_generic() to put the private data on the
    struct net (most preferably this should be done in the
    ->init callback of the ops registered)
    3. do not store any private reference on the net_generic array;
    4. do not change this pointer while the net is alive;
    5. use the net_generic() to get the pointer.

    When adding a new pointer, I copy the old array, replace it
    with a new one and schedule the old for kfree after an RCU
    grace period.

    Since net_generic() reads the net->gen array inside an RCU
    read-side section, and the net->gen->ptr[x] pointer never changes
    once set, this gives us safe access to the generic pointers.

    Quoting Paul: "... RCU is protecting -only- the net_generic
    structure that net_generic() is traversing, and the [pointer]
    returned by net_generic() is protected by a reference counter
    in the upper-level struct net."

    Signed-off-by: Pavel Emelyanov
    Acked-by: Paul E. McKenney
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • To make some per-net generic pointers, we need some way to address
    them, i.e. IDs. This is a simple IDA-based ID generator for pernet
    subsystems.

    Addressing questions about potential checkpoint/restart problems:
    these IDs are "lite-offsets" within the net structure and are by no
    means supposed to be exported to the userspace.

    Since it will be used in the near future by devices only (tun,
    vlan, tunnels, bridge, etc), I make it resemble the functionality
    of register_pernet_device().

    The new id is stored in the *id pointer _before_ calling the init
    callback to make this id available in this callback.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
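
The two commits in this group fit together: an id generator hands out slots, and the elastic array maps each id to a private pointer, growing by copy-and-replace when a larger id arrives. The user-space sketch below illustrates that mechanism only. It assumes 1-based ids, uses a plain counter instead of the IDA, and frees the old array immediately where the kernel would wait for an RCU grace period. All `_sketch` names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct net_generic {
    unsigned int len;
    void *ptr[]; /* one slot per registered id */
};

struct net { struct net_generic *gen; };

static int next_id = 1; /* stand-in for the IDA-based generator */

static int alloc_gen_id(void)
{
    return next_id++;
}

static int net_assign_generic_sketch(struct net *net, int id, void *data)
{
    struct net_generic *old = net->gen, *ng;
    unsigned int old_len = old ? old->len : 0;

    if (id < 1)
        return -1;
    if ((unsigned int)id <= old_len) { /* slot already exists */
        old->ptr[id - 1] = data;
        return 0;
    }

    /* Grow: build a larger copy, publish it, then retire the old array. */
    ng = calloc(1, sizeof(*ng) + (size_t)id * sizeof(void *));
    if (!ng)
        return -1;
    ng->len = (unsigned int)id;
    if (old)
        memcpy(ng->ptr, old->ptr, old_len * sizeof(void *));
    ng->ptr[id - 1] = data;
    net->gen = ng;
    free(old); /* the kernel would defer this past an RCU grace period */
    return 0;
}

static void *net_generic_sketch(const struct net *net, int id)
{
    if (!net->gen || id < 1 || (unsigned int)id > net->gen->len)
        return NULL;
    return net->gen->ptr[id - 1];
}
```

Readers only ever see a complete array, either the old one or the new one, which is the property the RCU-protected version relies on.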
     

03 Feb, 2008

1 commit


29 Jan, 2008

1 commit

  • This patch adds a separate workqueue for cleaning up a network
    namespace. If we use the keventd workqueue to execute cleanup_net(),
    unregistering devices in IPv6 can deadlock. The cleanup code itself
    schedules work in keventd: as long as cleanup_net() hasn't returned,
    dst_gc_task() cannot run, and as long as dst_gc_task() has not run,
    there are still references pending on the net devices, so
    cleanup_net() cannot unregister them and leave the keventd workqueue.

    Signed-off-by: Benjamin Thery
    Signed-off-by: Daniel Lezcano
    Acked-by: Denis V. Lunev
    Acked-By: Kirill Korotaev
    Signed-off-by: David S. Miller

    Benjamin Thery
     

23 Jan, 2008

1 commit


13 Nov, 2007

1 commit

  • If CONFIG_NET_NS is not set, only one namespace is possible.

    This patch removes the list of pernet_operations and cleans up the code a bit.
    The list is not needed if there are no namespaces. We should just call
    the ->init method.

    Additionally, ->exit will be called on module unloading only. This
    case is safe - the code is not discarded. For in-kernel code, ->exit
    should never be called.

    Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev
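
With a single namespace, registration reduces to calling ->init once on the initial namespace, and unregistration to calling ->exit once. The sketch below illustrates that simplified path in user-space C. The `_sketch` function names and the demo callbacks are hypothetical and only mirror the shape of the kernel interface.

```c
#include <stddef.h>

struct net { int id; };

struct pernet_operations {
    int (*init)(struct net *net);
    void (*exit)(struct net *net);
};

static struct net init_net; /* the only namespace in this configuration */

/* No list of operations is kept: just run ->init on init_net. */
static int register_pernet_operations_sketch(struct pernet_operations *ops)
{
    if (ops->init == NULL)
        return 0;
    return ops->init(&init_net);
}

/* Reached only on module unload in the scenario the commit describes. */
static void unregister_pernet_operations_sketch(struct pernet_operations *ops)
{
    if (ops->exit)
        ops->exit(&init_net);
}

/* Demo callbacks that count how often they run. */
static int init_calls, exit_calls;

static int demo_init(struct net *net)
{
    (void)net;
    init_calls++;
    return 0;
}

static void demo_exit(struct net *net)
{
    (void)net;
    exit_calls++;
}
```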
     

07 Nov, 2007

1 commit

  • Because net_free is called by copy_net_ns before its declaration, the
    compiler gives an error. This patch puts net_free before copy_net_ns
    to fix this.

    The compiler error:
    net/core/net_namespace.c: In function 'copy_net_ns':
    net/core/net_namespace.c:97: error: implicit declaration of function 'net_free'
    net/core/net_namespace.c: At top level:
    net/core/net_namespace.c:104: warning: conflicting types for 'net_free'
    net/core/net_namespace.c:104: error: static declaration of 'net_free' follows non-static declaration
    net/core/net_namespace.c:97: error: previous implicit declaration of 'net_free' was here

    The error was introduced by the '[NET]: Hide the dead code in the
    net_namespace.c' patch (6a1a3b9f686bb04820a232cc1657ef2c45670709).

    Signed-off-by: Johann Felix Soden
    Signed-off-by: David S. Miller

    Johann Felix Soden
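
The ordering rule behind this fix is easy to show in isolation: a static function has to be defined (or at least declared) before its first caller, otherwise the compiler assumes an implicit non-static declaration and then rejects the later static definition. A minimal sketch of the corrected ordering, with calloc()/free() standing in for the kernel allocators and a hypothetical `fail` flag:

```c
#include <stdlib.h>

struct net { int id; };

/* Defined before its first caller, so no implicit declaration occurs. */
static void net_free(struct net *net)
{
    free(net);
}

static struct net *copy_net_ns_sketch(int fail)
{
    struct net *net = calloc(1, sizeof(*net));

    if (!net)
        return NULL;
    if (fail) {
        net_free(net); /* resolves to the static definition above */
        return NULL;
    }
    return net;
}
```

Swapping the two definitions reproduces the class of diagnostics quoted in the commit message. A forward declaration `static void net_free(struct net *net);` placed before the caller would be an alternative fix.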
     

01 Nov, 2007

4 commits

  • This cache is only required to create new namespaces,
    but we won't have them in the CONFIG_NET_NS=n case.

    Hide it under the appropriate ifdef.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • The setup_net is called for the init net namespace
    only (in the CONFIG_NET_NS=n case, of course) from the __init
    function, so mark it as __net_init so that it disappears with the
    caller after boot.

    Yet again, in a perfect world this would be under
    #ifdef CONFIG_NET_NS, but it isn't guaranteed that every
    subsystem is registered *after* the init_net_ns is set
    up. Once we are sure that we don't start registering
    them before the init net setup, we'll be able to move
    this code under the ifdef.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • The namespace creation/destruction code is never called
    if CONFIG_NET_NS is n, so it's OK to move it under the
    appropriate ifdef.

    The copy_net_ns() in the "n" case checks the flags and
    returns -EINVAL when a new net ns is requested. In a perfect
    world this stub would be in net_namespace.h, but this
    function needs to know the CLONE_NEWNET value and thus
    requires sched.h. On the other hand, this header is
    included into almost every .c file in the networking code,
    and making all this code depend on sched.h would be
    suicidal.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • When a new pernet something (subsys, device or operations) is
    being registered, the init callback is to be called for each
    namespace that currently exists in the system. During
    unregistration, the same is to be done with the exit callback.

    However, not every pernet something has both calls, but the
    check for the appropriate pointer to be not NULL is performed
    inside the for_each_net() loop.

    This is (at least) strange, so tune this.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
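
The tuning described in the last commit of this group amounts to hoisting the NULL check out of the per-namespace loop. A user-space sketch follows. The fixed-size `nets` array stands in for the for_each_net() iteration, the `_sketch` names are hypothetical, and undoing a partially completed registration is left out for brevity.

```c
#include <stddef.h>

#define NR_NETS 3

struct net { int id; };

struct pernet_operations {
    int (*init)(struct net *net);
    void (*exit)(struct net *net);
};

static struct net nets[NR_NETS]; /* stand-in for the namespace list */

static int register_ops_sketch(const struct pernet_operations *ops)
{
    if (!ops->init) /* checked once, outside the loop */
        return 0;

    for (size_t i = 0; i < NR_NETS; i++) {
        int err = ops->init(&nets[i]);
        if (err)
            return err;
    }
    return 0;
}

static void unregister_ops_sketch(const struct pernet_operations *ops)
{
    if (!ops->exit) /* same idea for the exit path */
        return;

    for (size_t i = 0; i < NR_NETS; i++)
        ops->exit(&nets[i]);
}

/* Demo callback that counts invocations. */
static int init_calls;

static int demo_init(struct net *net)
{
    (void)net;
    init_calls++;
    return 0;
}
```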
     

31 Oct, 2007

1 commit

  • When a network namespace reference is held by a network subsystem,
    and when this reference is decremented in a rcu update callback, we
    must ensure that there is no more outstanding rcu update before
    trying to free the network namespace.

    In the normal case, the rcu_barrier is called when the network namespace
    is exiting in the cleanup_net function.

    But when a network namespace creation fails, and the subsystems are
    undone (like the cleanup), the rcu_barrier is missing.

    This patch adds the missing rcu_barrier.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: David S. Miller

    Daniel Lezcano
     

11 Oct, 2007

6 commits

  • The newly created net namespace is set to 0 with memset()
    in setup_net(). The setup_net() is also called for the
    init_net_ns(), which is zeroed naturally as a global var.

    So remove this memset and allocate new nets with the
    kmem_cache_zalloc().

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Denis V. Lunev noticed that the locking rules
    for the network namespace list are overcomplicated and broken.

    In particular, register_netdev_notifier currently
    does not take any lock, making the for_each_net iteration racy
    with network namespace creation and destruction. Oops.

    The fact that we need to use for_each_net in rtnl_unlock() when
    the rtnetlink support becomes per network namespace makes designing
    the proper locking tricky. In addition we need to be able to call
    rtnl_lock() and rtnl_unlock() when we have the net_mutex held.

    After thinking about it and looking at the alternatives carefully
    it looks like the simplest and most maintainable solution is
    to remove net_list_mutex altogether, and to use the rtnl_mutex instead.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • This patch allows you to create a new network namespace
    using sys_clone, or sys_unshare.

    As the network namespace is still experimental and under development
    clone and unshare support is only made available when CONFIG_NET_NS is
    selected at compile time.

    As this patch introduces network namespace support into code paths
    that exist when CONFIG_NET is not selected, there are a few
    additions made to net_namespace.h to allow a few more functions
    to be used when the networking stack is not compiled in.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • I proposed introducing a list_for_each_entry_continue_reverse macro
    to be used in setup_net() when unrolling the failed ->init callback.

    Here is the macro and some more cleanup in the setup_net() itself
    to remove one variable from the stack :) The same thing is for the
    cleanup_net() - the existing list_for_each_entry_reverse() is used.

    Minor, but the code looks nicer.

    Signed-off-by: Pavel Emelyanov
    Acked-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • We will undo this once it is actually used.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • This is the basic infrastructure needed to support network
    namespaces. This infrastructure is:
    - Registration functions to support initializing per network
    namespace data when a network namespace is created or destroyed.

    - struct net. The network namespace data structure.
    This structure will grow as variables are made per network
    namespace but this is the minimal starting point.

    - Functions to grab a reference to the network namespace.
    I provide both get/put functions that keep a network namespace
    from being freed. And hold/release functions serve as weak references
    and will warn if their count is not zero when the data structure
    is freed. Useful for dealing with more complicated data structures
    like the ipv4 route cache.

    - A list of all of the network namespaces so we can iterate over them.

    - A slab for the network namespace data structure allowing leaks
    to be spotted.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
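
The unrolling that setup_net() performs when an ->init callback fails, mentioned in this group alongside list_for_each_entry_continue_reverse, can be sketched with a plain array in place of the kernel's linked list. Everything here is illustrative: the `_sketch` name, the demo callbacks and the `trace` buffer are hypothetical and exist only to make the call order visible.

```c
#include <stddef.h>

struct net { int id; };

struct pernet_operations {
    int (*init)(struct net *net);
    void (*exit)(struct net *net);
};

/* Records the order of callback invocations for the demo. */
static char trace[16];
static size_t trace_len;

static void log_step(char c)
{
    if (trace_len < sizeof(trace) - 1)
        trace[trace_len++] = c;
}

static int init_a(struct net *n) { (void)n; log_step('A'); return 0; }
static void exit_a(struct net *n) { (void)n; log_step('a'); }
static int init_b(struct net *n) { (void)n; log_step('B'); return 0; }
static void exit_b(struct net *n) { (void)n; log_step('b'); }
static int init_fail(struct net *n) { (void)n; log_step('C'); return -1; }

static int setup_net_sketch(struct net *net,
                            const struct pernet_operations *ops, size_t n)
{
    size_t i;
    int err = 0;

    for (i = 0; i < n; i++) {
        if (ops[i].init) {
            err = ops[i].init(net);
            if (err < 0)
                break;
        }
    }

    if (err < 0) {
        /* Walk backwards, starting just before the entry that failed. */
        while (i-- > 0)
            if (ops[i].exit)
                ops[i].exit(net);
    }
    return err;
}
```

With three operations where the third fails, the recorded order is A, B, C, then b, a: only the entries that initialized successfully are undone, in reverse.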